[PATCH] Simplify (_Float16) sqrtf((float) a) to .SQRT(a) when a is a _Float16 value.
Similar for sqrt/sqrtl.

gcc/ChangeLog:

        PR target/102464
        * match.pd: Simplify (_Float16) sqrtf((float) a) to .SQRT(a)
        when direct_internal_fn_supported_p, similar for sqrt/sqrtl.

gcc/testsuite/ChangeLog:

        PR target/102464
        * gcc.target/i386/pr102464-sqrtph.c: New test.
        * gcc.target/i386/pr102464-sqrtsh.c: New test.
---
 gcc/match.pd                                  |  6 +++--
 .../gcc.target/i386/pr102464-sqrtph.c         | 27 +++
 .../gcc.target/i386/pr102464-sqrtsh.c         | 23
 3 files changed, 54 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102464-sqrtph.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102464-sqrtsh.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 5bed2e12715..43d1c1bc0bd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6228,14 +6228,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
      BUILT_IN_ROUNDEVENL BUILT_IN_ROUNDEVEN BUILT_IN_ROUNDEVENF
      BUILT_IN_ROUNDL BUILT_IN_ROUND BUILT_IN_ROUNDF
      BUILT_IN_NEARBYINTL BUILT_IN_NEARBYINT BUILT_IN_NEARBYINTF
-     BUILT_IN_RINTL BUILT_IN_RINT BUILT_IN_RINTF)
+     BUILT_IN_RINTL BUILT_IN_RINT BUILT_IN_RINTF
+     BUILT_IN_SQRTL BUILT_IN_SQRT BUILT_IN_SQRTF)
  tos (IFN_TRUNC IFN_TRUNC IFN_TRUNC
       IFN_FLOOR IFN_FLOOR IFN_FLOOR
       IFN_CEIL IFN_CEIL IFN_CEIL
       IFN_ROUNDEVEN IFN_ROUNDEVEN IFN_ROUNDEVEN
       IFN_ROUND IFN_ROUND IFN_ROUND
       IFN_NEARBYINT IFN_NEARBYINT IFN_NEARBYINT
-      IFN_RINT IFN_RINT IFN_RINT)
+      IFN_RINT IFN_RINT IFN_RINT
+      IFN_SQRT IFN_SQRT IFN_SQRT)
  /* (_Float16) round ((doube) x) -> __built_in_roundf16 (x), etc.,
     if x is a _Float16.  */
  (simplify
diff --git a/gcc/testsuite/gcc.target/i386/pr102464-sqrtph.c b/gcc/testsuite/gcc.target/i386/pr102464-sqrtph.c
new file mode 100644
index 000..8bd19c6e65e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr102464-sqrtph.c
@@ -0,0 +1,27 @@
+/* PR target/102464.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ffast-math -ftree-vectorize" } */
+
+#include <math.h>
+void foo1 (_Float16* __restrict a, _Float16* b)
+{
+  for (int i = 0; i != 8; i++)
+    a[i] = sqrtf (b[i]);
+}
+
+void foo2 (_Float16* __restrict a, _Float16* b)
+{
+  for (int i = 0; i != 8; i++)
+    a[i] = sqrt (b[i]);
+}
+
+void foo3 (_Float16* __restrict a, _Float16* b)
+{
+  for (int i = 0; i != 8; i++)
+    a[i] = sqrtl (b[i]);
+}
+
+/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */
+/* { dg-final { scan-assembler-not "vcvtph2p\[sd\]" } } */
+/* { dg-final { scan-assembler-not "extendhfxf" } } */
+/* { dg-final { scan-assembler-times "vsqrtph\[^\n\r\]*xmm\[0-9\]" 3 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr102464-sqrtsh.c b/gcc/testsuite/gcc.target/i386/pr102464-sqrtsh.c
new file mode 100644
index 000..4cf0089a67f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr102464-sqrtsh.c
@@ -0,0 +1,23 @@
+/* PR target/102464.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16 -ffast-math" } */
+
+#include <math.h>
+_Float16 foo1 (_Float16 a)
+{
+  return sqrtf (a);
+}
+
+_Float16 foo2 (_Float16 a)
+{
+  return sqrt (a);
+}
+
+_Float16 foo3 (_Float16 a)
+{
+  return sqrtl (a);
+}
+
+/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */
+/* { dg-final { scan-assembler-not "extendhfxf" } } */
+/* { dg-final { scan-assembler-times "vsqrtsh\[^\n\r\]*xmm\[0-9\]" 3 } } */
--
2.18.1
Re: [PATCH,Fortran 1/7] Fortran: make some trans* functions static
Hi Bernhard,

what you're doing seems a useful clean-up, thanks.

One point for discussion:

-match
+static match
 gfc_match_label (void)

I have generally understood that the gfc_ prefix is for global variables and functions only. We do not always adhere to it (also since some global functions were made static previously), but I think we should stick to it, unless other people think otherwise :-)

Best regards

Thomas
Re: [PATCH] Convert strlen pass from evrp to ranger.
On 10/24/2021 8:15 PM, Jeff Law wrote:
On 10/18/2021 2:17 AM, Aldy Hernandez wrote:
On 10/18/21 12:52 AM, Jeff Law wrote:
On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote:

The following patch converts the strlen pass from evrp to ranger, leaving DOM as the last remaining user.

So is there any reason why we can't convert DOM as well? DOM's use of EVRP is pretty limited. You've mentioned FP bits before, but my recollection is those are not part of the EVRP analysis DOM uses. Hell, give me a little guidance and I'll do the work...

Not only will I take you up on that offer, but I can provide 90% of the work. Here be dragons, though (well, for me, maybe not for you ;-)).

[ ... ]

So the failure I see is a bootstrap comparison failure affecting omp-expand.c and cp/cp-gimplify.c. We end up generating different code with and without debug symbols.

Replying to myself

So we're getting different results from a call to fold_range_internal for this statement in bb #35 of expand_omp_target:

(gdb) p debug_gimple_stmt (stmt)
if (loop_171 != 0B)
259       res = fold_range_internal (r, s, NULL_TREE);
(gdb) n
283       if (idx)
(gdb) p res
$60 = true
(gdb) p r
$61 = (irange &) @0x7fffdb20: {m_num_ranges = 1 '\001', m_max_ranges = 2 '\002', m_kind = VR_RANGE, m_base = 0x7fffdb30}

vs

259       res = fold_range_internal (r, s, NULL_TREE);
(gdb)
283       if (idx)
(gdb) p res
$16 = true
(gdb) p r
$17 = (irange &) @0x7fffdba0: {m_num_ranges = 1 '\001', m_max_ranges = 2 '\002', m_kind = VR_VARYING, m_base = 0x7fffdbb0}

Anyway, not sure when I'll be able to look at this again, perhaps Wednesday. But my sense is something isn't right WRT the range of loop_171.

Jeff
[PATCH] rs6000: Fix ICE of vect cost related to V1TI [PR102767]
Hi,

As PR102767 shows, commit r12-3482 exposed an ICE in function rs6000_builtin_vectorization_cost. We claim that V1TI supports movmisalign on rs6000 (see define_expand "movmisalign"), so rs6000_builtin_support_vector_misalignment returns true for misalign 8. Later, when the cost is queried in rs6000_builtin_vectorization_cost, there are no arms to handle the V1TI input under (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN).

The proposed fix is to take V1TI into account and simply give it the doubleword cost. That cost is apparently bigger than the scalar cost, so the vectorization won't happen; the change only keeps things consistent and avoids the ICE. Another thought is to not support movmisalign for V1TI, but that sounds like a bad idea since it doesn't match reality.

Bootstrapped and regtested on powerpc64le-linux-gnu P9 and powerpc64-linux-gnu P8.

Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

        PR target/102767
        * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Consider
        V1TI mode for unaligned load and store.

gcc/testsuite/ChangeLog:

        PR target/102767
        * gcc.target/powerpc/ppc-fortran/pr102767.f90: New file.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b7ea1483da5..73d3e06c3fc 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5145,7 +5145,8 @@ rs6000_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
         if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
           {
             elements = TYPE_VECTOR_SUBPARTS (vectype);
-            if (elements == 2)
+            /* See PR102767, consider V1TI to keep consistency.  */
+            if (elements == 2 || elements == 1)
               /* Double word aligned.  */
               return 4;

@@ -5184,10 +5185,11 @@ rs6000_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
         if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
           {
-            elements = TYPE_VECTOR_SUBPARTS (vectype);
-            if (elements == 2)
-              /* Double word aligned.  */
-              return 2;
+            elements = TYPE_VECTOR_SUBPARTS (vectype);
+            /* See PR102767, consider V1TI to keep consistency.  */
+            if (elements == 2 || elements == 1)
+              /* Double word aligned.  */
+              return 2;

             if (elements == 4)
               {
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
new file mode 100644
index 000..a4122482989
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
@@ -0,0 +1,21 @@
+! { dg-require-effective-target powerpc_vsx_ok }
+! { dg-options "-mvsx -O2 -ftree-vectorize -mno-efficient-unaligned-vsx" }
+
+INTERFACE
+  FUNCTION elemental_mult (a, b, c)
+    type(*), DIMENSION(..) :: a, b, c
+  END
+END INTERFACE
+
+allocatable z
+integer, dimension(2,2) :: a, b
+call test_CFI_address
+contains
+  subroutine test_CFI_address
+    if (elemental_mult (z, x, y) .ne. 0) stop
+    a = reshape ([4,3,2,1], [2,2])
+    b = reshape ([2,3,4,5], [2,2])
+    if (elemental_mult (i, a, b) .ne. 0) stop
+  end
+end
+
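To make the failure mode concrete, here is a small standalone model of the dispatch on the number of vector elements. It is only a sketch under stated assumptions: the function names and the COST_UNHANDLED value are hypothetical stand-ins (the message above only says that no arm handled V1TI, which led to the ICE), and the four-element arm is left out because its cost depends on details the hunks do not show. Only the idea of giving one- and two-element vectors the doubleword cost is taken from the patch.

```c
#include <assert.h>

#define COST_UNHANDLED (-1)  /* Hypothetical marker for "no arm matched".  */

/* Model of the element-count dispatch before the fix: only the
   two-element (doubleword) case is recognized.  */
int misaligned_cost_before (int elements, int doubleword_cost)
{
  assert (elements > 0);
  if (elements == 2)
    return doubleword_cost;
  return COST_UNHANDLED;
}

/* Model after the fix: a one-element vector (V1TI) is costed like the
   doubleword case.  */
int misaligned_cost_after (int elements, int doubleword_cost)
{
  assert (elements > 0);
  if (elements == 2 || elements == 1)
    return doubleword_cost;
  return COST_UNHANDLED;
}
```

In this model a one-element query falls through in the "before" version and gets the doubleword cost in the "after" version, while two-element queries behave the same in both.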
[PATCH] vect: Don't update inits for simd_lane_access DRs [PR102789]
Hi,

As PR102789 shows, when the vectorizer peels for alignment in the prologue, function vect_update_inits_of_drs updates the inits of some DRs. But as the failing case shows, we shouldn't update the DR of a simd_lane_access: it has fixed-length storage meant mainly for the main loop, so the update can push the access out of bounds and make it touch unexpected elements.

I tried to test this broadly to ensure it's safe, since I was not sure whether it's reasonable to exclude all kinds of simd_lane_access DRs. The testing didn't catch any failures, so I hope this is on the right track.

Bootstrapped and regtested on:
  - x86_64-redhat-linux
  - aarch64-linux-gnu
  - powerpc64le-linux-gnu P9
  - powerpc64-linux-gnu P8 and P7.

Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

        PR tree-optimization/102789
        * tree-vect-loop-manip.c (vect_update_inits_of_drs): Do not
        update inits of simd_lane_access.

diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 4988c93fdb6..378b1026baa 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1820,7 +1820,8 @@ vect_update_inits_of_drs (loop_vec_info loop_vinfo, tree niters,
   FOR_EACH_VEC_ELT (datarefs, i, dr)
     {
       dr_vec_info *dr_info = loop_vinfo->lookup_dr (dr);
-      if (!STMT_VINFO_GATHER_SCATTER_P (dr_info->stmt))
+      if (!STMT_VINFO_GATHER_SCATTER_P (dr_info->stmt)
+          && !STMT_VINFO_SIMD_LANE_ACCESS_P (dr_info->stmt))
         vect_update_init_of_dr (dr_info, niters, code);
     }
 }
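The effect of the extra condition can be modelled outside the compiler. The following is a minimal sketch, not GCC code: struct dr_model and update_inits are hypothetical names, the two flags merely stand in for the STMT_VINFO_GATHER_SCATTER_P and STMT_VINFO_SIMD_LANE_ACCESS_P predicates, and the "advance" is a plain number rather than the niters-based expression the real function builds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much simplified stand-in for a data reference.  */
struct dr_model
{
  bool gather_scatter;    /* Models STMT_VINFO_GATHER_SCATTER_P.  */
  bool simd_lane_access;  /* Models STMT_VINFO_SIMD_LANE_ACCESS_P.  */
  long init;              /* Models the DR's initial offset.  */
};

/* Advance the init of every data reference by ADVANCE, skipping the
   kinds of references that the patched loop skips.  */
void update_inits (struct dr_model *drs, size_t n, long advance)
{
  assert (drs != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (!drs[i].gather_scatter && !drs[i].simd_lane_access)
      drs[i].init += advance;
}
```

With one ordinary reference, one gather/scatter reference and one simd-lane reference, only the ordinary one has its init moved; the simd-lane storage keeps starting at its original offset, which is the behaviour the patch aims for.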
[PATCH] rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]
If the second operand of __builtin_shuffle is a constant zero vector and the mask is one of a few specific values, the shuffle can be optimized to vspltisw+xxpermdi instead of lxv.

gcc/ChangeLog:

        * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add patterns
        match and emit for VSX xxpermdi.

gcc/testsuite/ChangeLog:

        * gcc.target/powerpc/pr102868.c: New test.
---
 gcc/config/rs6000/rs6000.c                  | 47 --
 gcc/testsuite/gcc.target/powerpc/pr102868.c | 53 +
 2 files changed, 97 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102868.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d0730253bcc..5d802c1fa96 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -23046,7 +23046,23 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
     {OPTION_MASK_P8_VECTOR,
      BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
                       : CODE_FOR_p8_vmrgew_v4sf_direct,
-     {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}};
+     {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}},
+    {OPTION_MASK_VSX,
+     (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
+                       : CODE_FOR_vsx_xxpermdi_v16qi),
+     {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}},
+    {OPTION_MASK_VSX,
+     (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
+                       : CODE_FOR_vsx_xxpermdi_v16qi),
+     {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}},
+    {OPTION_MASK_VSX,
+     (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
+                       : CODE_FOR_vsx_xxpermdi_v16qi),
+     {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31}},
+    {OPTION_MASK_VSX,
+     (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
+                       : CODE_FOR_vsx_xxpermdi_v16qi),
+     {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}}};

   unsigned int i, j, elt, which;
   unsigned char perm[16];
@@ -23169,6 +23185,27 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
           machine_mode omode = insn_data[icode].operand[0].mode;
           machine_mode imode = insn_data[icode].operand[1].mode;

+          rtx perm_idx = GEN_INT (0);
+          if (icode == CODE_FOR_vsx_xxpermdi_v16qi)
+            {
+              int perm_val = 0;
+              if (one_vec)
+                {
+                  if (perm[0] == 8)
+                    perm_val |= 2;
+                  if (perm[8] == 8)
+                    perm_val |= 1;
+                }
+              else
+                {
+                  if (perm[0] != 0)
+                    perm_val |= 2;
+                  if (perm[8] != 16)
+                    perm_val |= 1;
+                }
+              perm_idx = GEN_INT (perm_val);
+            }
+
           /* For little-endian, don't use vpkuwum and vpkuhum if the
              underlying vector type is not V4SI and V8HI, respectively.
              For example, using vpkuwum with a V8HI picks up the even
@@ -23192,7 +23229,8 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
           /* For little-endian, the two input operands must be swapped
              (or swapped back) to ensure proper right-to-left numbering
              from 0 to 2N-1.  */
-          if (swapped ^ !BYTES_BIG_ENDIAN)
+          if (swapped ^ !BYTES_BIG_ENDIAN
+              && icode != CODE_FOR_vsx_xxpermdi_v16qi)
             std::swap (op0, op1);
           if (imode != V16QImode)
             {
@@ -23203,7 +23241,10 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
             x = target;
           else
             x = gen_reg_rtx (omode);
-          emit_insn (GEN_FCN (icode) (x, op0, op1));
+          if (icode == CODE_FOR_vsx_xxpermdi_v16qi)
+            emit_insn (GEN_FCN (icode) (x, op0, op1, perm_idx));
+          else
+            emit_insn (GEN_FCN (icode) (x, op0, op1));
           if (omode != V16QImode)
             emit_move_insn (target, gen_lowpart (V16QImode, x));
           return true;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102868.c b/gcc/testsuite/gcc.target/powerpc/pr102868.c
new file mode 100644
index 000..eb45d193f66
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102868.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <altivec.h>
+vector float b = {0.0f, 0.0f, 0.0f, 0.0f};
+
+
+vector float foo1 (vector float x)
+{
+  vector int c = {0, 1, 4, 5};
+  return __builtin_shuffle (x, b, c);
+}
+
+vector float foo2 (vector float x)
+{
+  vector int c = {2, 3, 4, 5};
+  return __builtin_shuffle (x, b, c);
+}
+
+vector float foo3 (vector float x)
+{
+  vector int c = {0, 1, 6, 7};
+  return __builtin_shuffle (x, b, c);
+}
+
+vector float foo4 (vector float x)
+{
+  vector int c = {2, 3, 6, 7};
+  return __builtin_shuffle (x, b, c);
+}
+
+vector unsigned char foo5 (vector unsigned char x)
+{
+
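The selector computation added in the second hunk can be exercised in isolation. The sketch below copies the perm_val logic from the patch into a standalone function; the function name is hypothetical, and one_vec is passed in as a plain flag instead of being derived from the operands as in the real code.

```c
#include <assert.h>

/* Hypothetical standalone copy of the perm_val computation.  PERM is
   the 16-byte permutation; ONE_VEC is nonzero when both inputs are the
   same vector.  Returns the two-bit selector handed to xxpermdi.  */
int xxpermdi_selector (const unsigned char perm[16], int one_vec)
{
  assert (perm != 0);
  int perm_val = 0;
  if (one_vec)
    {
      if (perm[0] == 8)
        perm_val |= 2;
      if (perm[8] == 8)
        perm_val |= 1;
    }
  else
    {
      if (perm[0] != 0)
        perm_val |= 2;
      if (perm[8] != 16)
        perm_val |= 1;
    }
  return perm_val;
}
```

For the four two-operand masks added to the pattern table, in the order they appear in the patch, this function returns 0, 2, 1 and 3.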
Re: [PATCH v3] detect out-of-bounds stores by atomic functions [PR102453]
On 10/24/2021 5:40 PM, Martin Sebor via Gcc-patches wrote:

Attached is a revised patch for just the access warning pass to diagnose out-of-bounds stores by atomic functions, with no attr-fnspec changes.

Is this okay for trunk?

Martin

PS Just to clarify the effect of the original patch in case it wasn't: it didn't enable optimizations of atomic built-ins. It just made it possible, by first calling the new atomic_builtin_fnspec() to get the fnspec, and then by actually doing something with it. The original patch did not modify builtin_fnspec() to call the new atomic_builtin_fnspec(). But since you seem to have reservations about exposing the attribute in any form I have withdrawn the original patch and replaced it with the more limited one.

I'm letting Richi take the lead here. But I did notice a tiny nit:

+
+  /* Tyhe size in bytes of the access by the function, and the number

s/Tyhe/The/
Re: [PATCH] improve handling of aggregates in sprintf [PR 102238, 102919]
On 10/24/2021 5:43 PM, Martin Sebor via Gcc-patches wrote:

The detection of overlapping sprintf calls has a limitation that leads to both false positives (PR 102919) and negatives (PR 102238) in corner cases involving members of aggregates. The false positives result from the overlap logic not using the size of the member used as an argument to %s to constrain the length of the directive output. The false negatives are due to the logic failing to determine the identity of a member from the address or reference to the enclosing object and an offset.

The attached patch improves the detection logic to handle both sets of cases. In addition, it moves the utility functions used to implement the logic from the sprintf pass into pointer-query where they can be used for other purposes in the future (my work in progress).

Tested on x86_64-linux and by building Glibc and verifying it doesn't cause any new warnings,

Martin

gcc-102238.diff

PR tree-optimization/102238 - alias_offset in gimple-ssa-sprintf.c is broken
PR tree-optimization/102919 - spurious -Wrestrict warning for sprintf into the same member array as argument plus offset

gcc/ChangeLog:

        PR tree-optimization/102238
        PR tree-optimization/102919
        * gimple-ssa-sprintf.c (get_string_length): Add an argument.
        (array_elt_at_offset): Move to pointer-query.
        (set_aggregate_size_and_offset): New function.
        (field_at_offset): Move to pointer-query.
        (get_origin_and_offset): Rename...
        (get_origin_and_offset_r): this.  Add an argument.  Make aggregate
        handling more robust.
        (get_origin_and_offset): New.
        (alias_offset): Add an argument.
        (format_string): Use subobject size determined by get_origin_and_offset.
        * pointer-query.cc (field_at_offset): Move from gimple-ssa-sprintf.c.
        Improve/correct handling of aggregates.
        (array_elt_at_offset): Same.
        * pointer-query.h (field_at_offset): Declare.
        (array_elt_at_offset): Declare.

gcc/testsuite/ChangeLog:

        PR tree-optimization/102238
        PR tree-optimization/102919
        * gcc.dg/tree-ssa/builtin-sprintf-warn-23.c: Remove warnings.
        * gcc.dg/Wrestrict-23.c: New test.

Given you know this code better than anyone. OK.

jeff
Re: [PATCH] Convert strlen pass from evrp to ranger.
On 10/18/2021 2:17 AM, Aldy Hernandez wrote:
On 10/18/21 12:52 AM, Jeff Law wrote:
On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote:

The following patch converts the strlen pass from evrp to ranger, leaving DOM as the last remaining user.

So is there any reason why we can't convert DOM as well? DOM's use of EVRP is pretty limited. You've mentioned FP bits before, but my recollection is those are not part of the EVRP analysis DOM uses. Hell, give me a little guidance and I'll do the work...

Not only will I take you up on that offer, but I can provide 90% of the work. Here be dragons, though (well, for me, maybe not for you ;-)).

[ ... ]

So the failure I see is a bootstrap comparison failure affecting omp-expand.c and cp/cp-gimplify.c. We end up generating different code with and without debug symbols.

The real differences start in dom2 (I guess that's a positive since that's the pass the patch changes). Stripping away the DEBUG statements in the IL, then diffing the .dom2 output shows this:

*** Optimizing block #35
*** 2133,2138
--- 2133,2139
  LKUP STMT loop_1045 = PHI
  2>>> STMT loop_1045 = PHI
  STMT loop_1045 = PHI
+ Replaced 'loop_1045' with variable 'single_outer_732'
  Registering value_relation (loop_1045 == single_outer_732) (bb35) at loop_1045 = PHI
  Optimizing statement if (loop_1045 != 0B)
  Replaced 'loop_1045' with variable 'single_outer_732'
*** Optimizing statement if (loop_1045 != 0B
*** 2140,2156
  Visiting conditional with predicate: if (single_outer_732 != 0B)
  With known ranges
! single_outer_732: struct loop * [1B, +INF]
! Predicate evaluates to: 1
! 0>>> COPY loop_1045 = 0B
! COPY loop_1045 = 0B
  Optimizing block #565
! 1>>> STMT 1 = loop_1045 ne_expr 0B
! 1>>> STMT 0 = loop_1045 eq_expr 0B
  Optimizing block #36
--- 2141,2158
  Visiting conditional with predicate: if (single_outer_732 != 0B)
  With known ranges
! single_outer_732: struct loop * VARYING
! Predicate evaluates to: DON'T KNOW
! LKUP STMT single_outer_732 ne_expr 0B
! 0>>> COPY single_outer_732 = 0B
! COPY single_outer_732 = 0B
  Optimizing block #565
! 1>>> STMT 1 = single_outer_732 ne_expr 0B
! 1>>> STMT 0 = single_outer_732 eq_expr 0B
  Optimizing block #36

The first hunk is the stage1 compiler, the second is the stage2 compiler. Stage2 does a replacement of the LHS with the RHS of a degenerate PHI. But the stage1 compiler is able to statically compute a test while the stage2 compiler is not. And things cascade from there.

I think all that means there's some kind of inconsistency in the const_and_copies table between the stage1 and stage2 compilers. Not sure how that's possible, but that's what the signs point to.

jeff
Re: [PATCH] i386: Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A) and combine FADD(A, FMUL(B, C)) to FMA(B, C, A).
On Fri, Oct 22, 2021 at 1:57 PM Kong, Lingling via Gcc-patches wrote:
>
> Hi,
>
> This patch is to support transform in fast-math something like
> _mm512_add_ph(x1, _mm512_fmadd_pch(a, b, _mm512_setzero_ph())) to
> _mm512_fmadd_pch(a, b, x1).
>
> And support transform _mm512_add_ph(x1, _mm512_fmul_pch(a, b)) to
> _mm512_fmadd_pch(a, b, x1).
> Ok for master?

LGTM.
Also please add cfma_optab/conj_cfma_optab, so the vectorizer can catch some complex fma pattern match optimizations.

>
> gcc/ChangeLog:
>
>         * config/i386/sse.md (fma__fadd_fmul): Add new
>         define_insn_and_split.
>         (fma__fadd_fcmul):Likewise
>         (fma___fma_zero):Likewise
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx512fp16-complex-fma.c: New test.
> ---
>  gcc/config/i386/sse.md                        | 52 +++
>  .../gcc.target/i386/avx512fp16-complex-fma.c  | 18 +++
>  2 files changed, 70 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-complex-fma.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index fbf056bf9e6..36407ca4a59 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -5958,6 +5958,58 @@
>    [(set_attr "type" "ssemuladd")
>     (set_attr "mode" "")])
>
> +(define_insn_and_split "fma__fadd_fmul"
> +  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand")
> +        (plus:VF_AVX512FP16VL
> +          (unspec:VF_AVX512FP16VL
> +            [(match_operand:VF_AVX512FP16VL 1 "vector_operand")
> +             (match_operand:VF_AVX512FP16VL 2 "vector_operand")]
> +             UNSPEC_COMPLEX_FMUL)
> +          (match_operand:VF_AVX512FP16VL 3 "vector_operand")))]
> +  "TARGET_AVX512FP16 && flag_unsafe_math_optimizations
> +  && ix86_pre_reload_split()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +        (unspec:VF_AVX512FP16VL
> +          [(match_dup 1) (match_dup 2) (match_dup 3)]
> +           UNSPEC_COMPLEX_FMA))])
> +
> +(define_insn_and_split "fma__fadd_fcmul"
> +  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand")
> +        (plus:VF_AVX512FP16VL
> +          (unspec:VF_AVX512FP16VL
> +            [(match_operand:VF_AVX512FP16VL 1 "vector_operand")
> +             (match_operand:VF_AVX512FP16VL 2 "vector_operand")]
> +             UNSPEC_COMPLEX_FCMUL)
> +          (match_operand:VF_AVX512FP16VL 3 "vector_operand")))]
> +  "TARGET_AVX512FP16 && flag_unsafe_math_optimizations
> +  && ix86_pre_reload_split()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +        (unspec:VF_AVX512FP16VL
> +          [(match_dup 1) (match_dup 2) (match_dup 3)]
> +           UNSPEC_COMPLEX_FCMA))])
> +
> +(define_insn_and_split "fma___fma_zero"
> +  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand")
> +        (plus:VF_AVX512FP16VL
> +          (unspec:VF_AVX512FP16VL
> +            [(match_operand:VF_AVX512FP16VL 1 "vector_operand")
> +             (match_operand:VF_AVX512FP16VL 2 "vector_operand")
> +             (match_operand:VF_AVX512FP16VL 3 "const0_operand")]
> +             UNSPEC_COMPLEX_F_C_MA)
> +          (match_operand:VF_AVX512FP16VL 4 "vector_operand")))]
> +  "TARGET_AVX512FP16 && flag_unsafe_math_optimizations
> +  && ix86_pre_reload_split()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +        (unspec:VF_AVX512FP16VL
> +          [(match_dup 1) (match_dup 2) (match_dup 4)]
> +           UNSPEC_COMPLEX_F_C_MA))])
> +
>  (define_insn "___mask"
>    [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=")
>          (vec_merge:VF_AVX512FP16VL
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-complex-fma.c b/gcc/testsuite/gcc.target/i386/avx512fp16-complex-fma.c
> new file mode 100644
> index 000..2dfd369e785
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-complex-fma.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512fp16 -O2 -Ofast" } */
> +/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-not "vaddph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"} } */
> +/* { dg-final { scan-assembler-not "vfmulcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"} } */
> +/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
> +
> +#include <immintrin.h>
> +volatile __m512h x1, x2, res, a, b;
> +void extern
> +avx512f_test (void)
> +{
> +  res = _mm512_add_ph (x1, _mm512_fmadd_pch (a, b, _mm512_setzero_ph()));
> +  res = _mm512_add_ph (x1, _mm512_fcmadd_pch (a, b, _mm512_setzero_ph()));
> +
> +  res = _mm512_add_ph (x1, _mm512_fmul_pch (a, b));
> +  res = _mm512_add_ph (x1, _mm512_fcmul_pch (a, b));
> +}
> --
> 2.18.1
>

--
BR,
Hongtao
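The algebra behind the first transform quoted above can be checked with a scalar model of one complex lane. This is only a sketch: struct cplx and all function names are hypothetical, the model uses float rather than _Float16, and it does not use the AVX512-FP16 intrinsics. It shows that adding x to a complex multiply-add with a zero addend gives the same value as a single multiply-add with x as the addend. Note that the patterns in the patch are gated on flag_unsafe_math_optimizations; the inputs used here are small integers, so every intermediate result is exact.

```c
#include <assert.h>

/* Hypothetical scalar model of one complex lane.  */
struct cplx { float re, im; };

struct cplx cplx_add (struct cplx x, struct cplx y)
{
  struct cplx r = { x.re + y.re, x.im + y.im };
  return r;
}

/* Models a complex multiply-add: a * b + c.  */
struct cplx cplx_fma (struct cplx a, struct cplx b, struct cplx c)
{
  struct cplx r = { a.re * b.re - a.im * b.im + c.re,
                    a.re * b.im + a.im * b.re + c.im };
  return r;
}

/* Form as written by the user: x + fma (a, b, 0).  */
struct cplx add_of_fma_zero (struct cplx x, struct cplx a, struct cplx b)
{
  struct cplx zero = { 0.0f, 0.0f };
  return cplx_add (x, cplx_fma (a, b, zero));
}

/* Form after the combine: fma (a, b, x).  */
struct cplx fused (struct cplx x, struct cplx a, struct cplx b)
{
  return cplx_fma (a, b, x);
}

int cplx_equal (struct cplx p, struct cplx q)
{
  return p.re == q.re && p.im == q.im;
}

/* Aborts if the two forms disagree for the given inputs.  */
void check_same (struct cplx x, struct cplx a, struct cplx b)
{
  assert (cplx_equal (add_of_fma_zero (x, a, b), fused (x, a, b)));
}
```

The conjugate variants from the patch (the fcmul/fcmadd forms) follow the same shape and are not modelled here.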
Re: [PATCH] Convert strlen pass from evrp to ranger.
On 10/23/2021 3:32 PM, Jeff Law wrote:
On 10/21/2021 12:20 PM, Jeff Law wrote:

So if we're referring to those temporary const/copy propagations "escaping" into Ranger, then I would fully expect that to cause problems. Essentially they're path sensitive const/copy propagations and may not be valid on all the paths through the CFG to the statement where the propagation occurs

Yeah. Disabling the global ranger should help, plus making sure you don't use the ranger in the midst of the path sensitive changes.

I think we should first try to remove those temporary const/copy propagations. As I noted in a different follow-up, I can't remember if they were done as part of the original non-copying threader or if they enabled further optimizations in the copying threader. If it's the former, then they can go away and that would be my preference. I'll try to poke at that over the weekend.

OK. So those temporary propagations are still useful. Here's a simple example (pr36550, but there are others):

Actually, isn't pr36550 the one you already noted?

I saw a few other issues when I just removed that chunk of code.

First, we get an *execution* failure in c-torture/builtins/

That one was exceedingly strange. I didn't save it, but it'll almost definitely raise its ugly head again.

Wnon-null-4.C. We fail to thread a path and as a result trigger a bogus warning

One of the other W* tests failed with a bogus warning too. But it's fixed by some pending work from Martin S, so I didn't worry much about it.

So in all, not too bad. I may do some instrumentation for the pr36550 issue -- while it's not showing a lot of fallout in the testsuite, it would be interesting to see how it affects codegen on gcc itself.

Jeff
[PATCH] improve handling of aggregates in sprintf [PR 102238, 102919]
The detection of overlapping sprintf calls has a limitation that leads to both false positives (PR 102919) and negatives (PR 102238) in corner cases involving members of aggregates. The false positives result from the overlap logic not using the size of the member used as an argument to %s to constrain the length of the directive output. The false negatives are due to the logic failing to determine the identity of a member from the address or reference to the enclosing object and an offset.

The attached patch improves the detection logic to handle both sets of cases. In addition, it moves the utility functions used to implement the logic from the sprintf pass into pointer-query where they can be used for other purposes in the future (my work in progress).

Tested on x86_64-linux and by building Glibc and verifying it doesn't cause any new warnings,

Martin

PR tree-optimization/102238 - alias_offset in gimple-ssa-sprintf.c is broken
PR tree-optimization/102919 - spurious -Wrestrict warning for sprintf into the same member array as argument plus offset

gcc/ChangeLog:

        PR tree-optimization/102238
        PR tree-optimization/102919
        * gimple-ssa-sprintf.c (get_string_length): Add an argument.
        (array_elt_at_offset): Move to pointer-query.
        (set_aggregate_size_and_offset): New function.
        (field_at_offset): Move to pointer-query.
        (get_origin_and_offset): Rename...
        (get_origin_and_offset_r): this.  Add an argument.  Make aggregate
        handling more robust.
        (get_origin_and_offset): New.
        (alias_offset): Add an argument.
        (format_string): Use subobject size determined by get_origin_and_offset.
        * pointer-query.cc (field_at_offset): Move from gimple-ssa-sprintf.c.
        Improve/correct handling of aggregates.
        (array_elt_at_offset): Same.
        * pointer-query.h (field_at_offset): Declare.
        (array_elt_at_offset): Declare.

gcc/testsuite/ChangeLog:

        PR tree-optimization/102238
        PR tree-optimization/102919
        * gcc.dg/tree-ssa/builtin-sprintf-warn-23.c: Remove warnings.
        * gcc.dg/Wrestrict-23.c: New test.
diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 8e90b7cfc43..cc679845137 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -2024,8 +2024,8 @@ format_floating (const directive , tree arg, range_query *)
    Used by the format_string function below.  */

 static fmtresult
-get_string_length (tree str, gimple *stmt, unsigned eltsize,
-                   range_query *query)
+get_string_length (tree str, gimple *stmt, unsigned HOST_WIDE_INT max_size,
+                   unsigned eltsize, range_query *query)
 {
   if (!str)
     return fmtresult ();
@@ -2065,6 +2065,20 @@ get_string_length (tree str, gimple *stmt, unsigned eltsize,
       && (!lendata.maxbound || lenmax <= tree_to_uhwi (lendata.maxbound))
       && lenmax <= tree_to_uhwi (lendata.maxlen))
     {
+      if (max_size > 0 && max_size < HOST_WIDE_INT_MAX)
+        {
+          /* Adjust the conservative unknown/unbounded result if MAX_SIZE
+             is valid.  Set UNLIKELY to maximum in case MAX_SIZE refers
+             to a subobject.
+             TODO: This is overly conservative.  Set UNLIKELY to the size
+             of the outermost enclosing declared object.  */
+          fmtresult res (0, max_size - 1);
+          res.nonstr = lendata.decl;
+          res.range.likely = res.range.max;
+          res.range.unlikely = HOST_WIDE_INT_MAX;
+          return res;
+        }
+
       fmtresult res;
       res.nonstr = lendata.decl;
       return res;
@@ -2203,110 +2217,80 @@ format_character (const directive , tree arg, range_query *query)
   return res.adjust_for_width_or_precision (dir.width);
 }

-/* Determine the offset *INDEX of the first byte of an array element of
-   TYPE (possibly recursively) into which the byte offset OFF points.
-   On success set *INDEX to the offset of the first byte and return type.
-   Otherwise, if no such element can be found, return null.  */
+/* If TYPE is an array or struct or union, increment *FLDOFF by the starting
+   offset of the member that *OFF point into and set *FLDSIZE to its size
+   in bytes and decrement *OFF by the same.  Otherwise do nothing.  */

-static tree
-array_elt_at_offset (tree type, HOST_WIDE_INT off, HOST_WIDE_INT *index)
+static void
+set_aggregate_size_and_offset (tree type, HOST_WIDE_INT *fldoff,
+                               HOST_WIDE_INT *fldsize, HOST_WIDE_INT *off)
 {
-  gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
-
-  tree eltype = type;
-  while (TREE_CODE (TREE_TYPE (eltype)) == ARRAY_TYPE)
-    eltype = TREE_TYPE (eltype);
-
-  if (TYPE_MODE (TREE_TYPE (eltype)) != TYPE_MODE (char_type_node))
-    eltype = TREE_TYPE (eltype);
-
-  if (eltype == type)
-    {
-      *index = 0;
-      return type;
-    }
-
-  HOST_WIDE_INT typsz = int_size_in_bytes (type);
-  HOST_WIDE_INT eltsz = int_size_in_bytes (eltype);
-  if (off < typsz * eltsz)
+  /* The byte offset of the most basic struct member the byte
+     offset *OFF corresponds to, or for a (multidimensional)
+     array member, the byte offset of the array element.  */
+  if (TREE_CODE (type) == ARRAY_TYPE
+      &&
[PATCH v3] detect out-of-bounds stores by atomic functions [PR102453]
Attached is a revised patch for just the access warning pass to diagnose out-of-bounds stores by atomic functions, with no attr-fnspec changes. Is this okay for trunk? Martin PS Just to clarify the effect of the original patch in case it wasn't: it didn't enable optimizations of atomic built-ins. It just made it possible, by first calling the new atomic_builtin_fnspec() to get the fnspec, and then by actually doing something with it. The original patch did not modify builtin_fnspec() to call the new atomic_builtin_fnspec(). But since you seem to have reservations about exposing the attribute in any form I have withdrawn the original patch and replaced it with the more limited one. On 10/13/21 2:15 AM, Richard Biener wrote: On Tue, Oct 12, 2021 at 9:44 PM Martin Sebor wrote: On 10/12/21 12:52 AM, Richard Biener wrote: On Mon, Oct 11, 2021 at 11:25 PM Martin Sebor wrote: The attached change extends GCC's warnings for out-of-bounds stores to cover atomic (and __sync) built-ins. Rather than hardcoding the properties of these built-ins just for the sake of the out-of-bounds detection, on the assumption that it might be useful for future optimizations as well, I took the approach of extending class attr_fnspec to express their special property that they encode the size of the access in their name. I also took the liberty of making attr_fnspec assignable (something the rest of my patch relies on), and updating some comments for the characters the class uses to encode function properties, based on my understanding of their purpose. Tested on x86_64-linux. Hmm, so you place 'A' at an odd place (where the return value is specified), but you do not actually specify the behavior on the return value. 
Shouldn't + 'A' specifies that the function atomically accesses a constant + 1 << N bytes where N is indicated by character 3+2i maybe read 'A' specifies that the function returns the memory pointed to by argument one of size 1 << N bytes where N is indicated by character 3+2i accessed atomically ? I didn't think the return value would be interesting because in general (parallel accesses) it's not related (in an observable way) to the value of the dereferenced operand. Not all the built-ins also return a value (e.g., atomic_store), and whether or not one does return the argument would need to be encoded somehow because it cannot be determined from the return type (__atomic_compare_exchange and __atomic_test_and_set return bool that's not necessarily the value of the operand). Also, since the functions return the operand value either before or after the update, we'd need another letter to describe that. (This alone could be dealt with simply by using 'A' and 'a', but that's not enough for the other cases.) So with all these possibilities I don't think encoding the return value at this point is worthwhile. If/when this enhancement turns out to be used for optimization and we think encoding the return value would be helpful, I'd say let's revisit it then. The accessor APIs should make it a fairly straightforward exercise. I thought it would be useful for points-to analysis since knowing how the return value is composed improves the points-to result for it. Note that IPA mod-ref now synthesizes fn-spec and might make use of 'A' if it were not narrowly defined. Sure it's probably difficult to fully specify the RMW cycle that's eventually done but since we have a way to specify a non-constant size of accesses as passed by a parameter it would be nice to allow specifying a constant size anyhow. It just occurred to me we could use "fake" parameters to encode those, so for void foo (int *); use like ". 
R2c4" saying that parameter 1 is read with the size specified by (non-existing) parameter 2 which is specified as 'c'onstant 1 << 4. Alternatively a constant size specification could use alternate encoding 'a' to 'f'. That said, if 'A' is not supposed to specify the return value it shouldn't be in the return value specification... I also wonder if it's necessary to constrain this to 'atomic' accesses for the purpose of the patch and whether that detail could be omitted to eventually make more use of it? I pondered the same question but I couldn't think of any other built-ins with similar semantics (read-modify-write, return a result either pre- or post-modification), so I opted for simplicity. I am open to generalizing it if/when there is a function I could test it with, although I'm not sure the current encoding scheme has enough letters and letter positions to describe the effects in their full generality. Likewise + '0'...'9' specifies the size of value written/read is given either + by the specified argument, or for atomic functions, by + 2 ^ N where N is the constant value denoted by the character should mention (excluding '0') for the argument position. Sure, I'll update the comment if you think
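For readers following the thread: the property under discussion is that the sized atomic built-ins encode the access size in their name (__atomic_store_4 stores 4 bytes, __sync_fetch_and_add_16 touches 16), which is what lets a warning pass compare the store against the size of the destination object. A minimal sketch of that idea in plain C follows. It is not the code from the patch: atomic_access_size and store_out_of_bounds are hypothetical helper names made up for illustration, and the real pass presumably works on GCC's internal built-in codes rather than on name strings.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a GCC API): return the access size in bytes
   encoded in the numeric suffix of a sized atomic/__sync built-in name,
   or 0 if the name has no such suffix (e.g. the generic "_n" forms).  */
static size_t
atomic_access_size (const char *name)
{
  const char *us = strrchr (name, '_');
  if (us == NULL || us[1] == '\0')
    return 0;

  char *end;
  unsigned long n = strtoul (us + 1, &end, 10);
  if (*end != '\0')
    return 0;			/* Suffix is not purely numeric.  */

  /* Only the power-of-two sizes 1, 2, 4, 8 and 16 are accepted.  */
  if (n == 0 || n > 16 || (n & (n - 1)) != 0)
    return 0;
  return (size_t) n;
}

/* Hypothetical check: nonzero if a store by NAME at byte offset OFF
   would write past the end of an object of OBJSIZE bytes.  */
static int
store_out_of_bounds (const char *name, size_t objsize, size_t off)
{
  size_t sz = atomic_access_size (name);
  return sz != 0 && (off > objsize || sz > objsize - off);
}
```

With these assumptions, a call such as store_out_of_bounds ("__atomic_store_4", 2, 0) reports a problem, since a 4-byte store cannot fit in a 2-byte object, while the same query for "__atomic_store_2" does not.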
Re: [PATCH,Fortran 0/7] delete some unused decls, make static
On Mon, 25 Oct 2021 00:30:16 +0200
Bernhard Reutner-Fischer wrote:

> Hi!
>
> Quickly skimming through the frontend headers.

I'm also attaching the other view for the fortran FE after the header
cleanup:

python3 $topsrc/contrib/unused_functions.py gcc/fortran/ \ grep -v "gt_"

for a guesstimate list of
Symbol 'foo' declared extern but never referenced externally

Down to about 50 for f951 as we want to keep the debug ones of course.

For other language frontends see the head of the script; Back then
there was no D nor modula2, and a go sample is missing, too. Should be
rather straight forward if anyone is curious. You can just abbreviate
the list of objects that are used to link your frontend. Archives are
supposedly handled fine, at least last time i tried.

HTH,

gcc/fortran/match.o: Symbol 'type_param_spec_list' declared extern but never referenced externally
gcc/fortran/openmp.o: Symbol 'gfc_free_expr_list(gfc_expr_list*)' declared extern but never referenced externally
gcc/fortran/openmp.o: Symbol 'gfc_free_omp_declare_simd(gfc_omp_declare_simd*)' declared extern but never referenced externally
gcc/fortran/openmp.o: Symbol 'gfc_match_omp_context_selector(gfc_omp_set_selector*)' declared extern but never referenced externally
gcc/fortran/openmp.o: Symbol 'gfc_match_omp_context_selector_specification(gfc_omp_declare_variant*)' declared extern but never referenced externally
gcc/fortran/trans-expr.o: Symbol 'gfc_class_vtab_hash_get(tree_node*)' declared extern but never referenced externally
gcc/fortran/trans-expr.o: Symbol 'gfc_class_vtab_extends_get(tree_node*)' declared extern but never referenced externally
gcc/fortran/trans-expr.o: Symbol 'gfc_vptr_extends_get(tree_node*)' declared extern but never referenced externally
gcc/fortran/trans-expr.o: Symbol 'gfc_class_vtab_def_init_get(tree_node*)' declared extern but never referenced externally
gcc/fortran/trans-expr.o: Symbol 'gfc_vptr_def_init_get(tree_node*)' declared extern but never referenced externally
gcc/fortran/trans-expr.o: Symbol 'gfc_class_vtab_copy_get(tree_node*)' declared extern but never referenced externally
gcc/fortran/trans-expr.o: Symbol 'gfc_vptr_copy_get(tree_node*)' declared extern but never referenced externally
gcc/fortran/trans-expr.o: Symbol 'gfc_vptr_final_get(tree_node*)' declared extern but never referenced externally
gcc/fortran/trans-expr.o: Symbol 'gfc_class_vtab_deallocate_get(tree_node*)' declared extern but never referenced externally
gcc/fortran/trans-decl.o: Symbol 'module_decl_hasher::hash(tree_node*)' declared extern but never referenced externally
gcc/fortran/trans-decl.o: Symbol 'gfor_fndecl_set_args' declared extern but never referenced externally
gcc/fortran/trans-decl.o: Symbol 'gfor_fndecl_set_convert' declared extern but never referenced externally
gcc/fortran/trans-decl.o: Symbol 'gfor_fndecl_set_record_marker' declared extern but never referenced externally
gcc/fortran/trans-decl.o: Symbol 'gfor_fndecl_set_max_subrecord_length' declared extern but never referenced externally
gcc/fortran/simplify.o: Symbol 'gfc_simplify_get_team(gfc_expr*)' declared extern but never referenced externally
gcc/fortran/simplify.o: Symbol 'simplify_ieee_selected_real_kind(gfc_expr*)' declared extern but never referenced externally
gcc/fortran/simplify.o: Symbol 'simplify_ieee_support(gfc_expr*)' declared extern but never referenced externally
gcc/fortran/decl.o: Symbol 'gfc_match_null(gfc_expr**)' declared extern but never referenced externally
gcc/fortran/decl.o: Symbol 'gfc_insert_kind_parameter_exprs(gfc_expr*)' declared extern but never referenced externally
gcc/fortran/decl.o: Symbol 'check_bind_name_identifier(char**)' declared extern but never referenced externally
gcc/fortran/decl.o: Symbol 'gfc_mod_pointee_as(gfc_array_spec*)' declared extern but never referenced externally
gcc/fortran/module.o: Symbol 'mio_symbol_ref(gfc_symbol**)' declared extern but never referenced externally
gcc/fortran/module.o: Symbol 'mio_interface_rest(gfc_interface**)' declared extern but never referenced externally
gcc/fortran/trans-intrinsic.o: Symbol 'specific_intrinsic_symbol(gfc_expr*)' declared extern but never referenced externally
gcc/fortran/resolve.o: Symbol 'gfc_elemental(gfc_symbol*)' declared extern but never referenced externally
gcc/fortran/trans-openmp.o: Symbol 'gfc_trans_oacc_declare(gfc_code*)' declared extern but never referenced externally
gcc/fortran/primary.o: Symbol 'matching_actual_arglist' declared extern but never referenced externally
gcc/fortran/symbol.o: Symbol 'gfc_drop_last_undo_checkpoint()' declared extern but never referenced externally
gcc/fortran/symbol.o: Symbol 'gfc_restore_last_undo_checkpoint()' declared extern but never referenced externally
gcc/fortran/symbol.o: Symbol 'gfc_get_ultimate_derived_super_type(gfc_symbol*)' declared extern but never referenced externally
gcc/fortran/gfortranspec.o: Symbol 'lang_specific_pre_link()' declared extern but never
[PATCH,Fortran 1/7] Fortran: make some trans* functions static
From: Bernhard Reutner-Fischer This makes some trans* functions static and deletes declarations of functions that either do not exist anymore like gfc_get_function_decl or that are unused like gfc_check_any_c_kind. gcc/fortran/ChangeLog: * expr.c (is_non_empty_structure_constructor): Make static. * gfortran.h (gfc_check_any_c_kind): Delete. * match.c (gfc_match_label): Make static. * match.h (gfc_match_label): Delete declaration. * scanner.c (file_changes_cur, file_changes_count, file_changes_allocated): Make static. * trans-expr.c (gfc_get_character_len): Make static. (gfc_class_len_or_zero_get): Make static. (VTAB_GET_FIELD_GEN): Undefine. (gfc_get_class_array_ref): Make static. (gfc_finish_interface_mapping): Make static. * trans-types.c (gfc_check_any_c_kind): Delete. (pfunc_type_node, dtype_type_node, gfc_get_ppc_type): Make static. * trans-types.h (gfc_get_ppc_type): Delete declaration. * trans.c (gfc_msg_wrong_return): Delete. * trans.h (gfc_class_len_or_zero_get, gfc_class_vtab_extends_get, gfc_vptr_extends_get, gfc_get_class_array_ref, gfc_get_character_len, gfc_finish_interface_mapping, gfc_msg_wrong_return, gfc_get_function_decl): Delete declaration. --- gcc/fortran/expr.c| 2 +- gcc/fortran/gfortran.h| 1 - gcc/fortran/match.c | 2 +- gcc/fortran/match.h | 1 - gcc/fortran/scanner.c | 4 ++-- gcc/fortran/trans-expr.c | 10 +- gcc/fortran/trans-types.c | 25 +++-- gcc/fortran/trans-types.h | 1 - gcc/fortran/trans.c | 1 - gcc/fortran/trans.h | 11 --- 10 files changed, 12 insertions(+), 46 deletions(-) diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c index b19d3a26c60..4dea840e348 100644 --- a/gcc/fortran/expr.c +++ b/gcc/fortran/expr.c @@ -4817,7 +4817,7 @@ gfc_apply_init (gfc_typespec *ts, symbol_attribute *attr, gfc_expr *init) /* Check whether an expression is a structure constructor and whether it has other values than NULL. 
*/ -bool +static bool is_non_empty_structure_constructor (gfc_expr * e) { if (e->expr_type != EXPR_STRUCTURE) diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 66192c07d8c..f7662c59a5d 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -3284,7 +3284,6 @@ bool gfc_check_character_range (gfc_char_t, int); extern bool gfc_seen_div0; /* trans-types.c */ -bool gfc_check_any_c_kind (gfc_typespec *); int gfc_validate_kind (bt, int, bool); int gfc_get_int_kind_from_width_isofortranenv (int size); int gfc_get_real_kind_from_width_isofortranenv (int size); diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c index 53a575e616e..91cde55d7a1 100644 --- a/gcc/fortran/match.c +++ b/gcc/fortran/match.c @@ -599,7 +599,7 @@ cleanup: it. We also make sure the symbol does not refer to another (active) block. A matched label is pointed to by gfc_new_block. */ -match +static match gfc_match_label (void) { char name[GFC_MAX_SYMBOL_LEN + 1]; diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h index 21e94f79d95..eb9459ea99c 100644 --- a/gcc/fortran/match.h +++ b/gcc/fortran/match.h @@ -47,7 +47,6 @@ match gfc_match_space (void); match gfc_match_eos (void); match gfc_match_small_literal_int (int *, int *); match gfc_match_st_label (gfc_st_label **); -match gfc_match_label (void); match gfc_match_small_int (int *); match gfc_match_small_int_expr (int *, gfc_expr **); match gfc_match_name (char *); diff --git a/gcc/fortran/scanner.c b/gcc/fortran/scanner.c index 5a450692ba3..69b81ab97f8 100644 --- a/gcc/fortran/scanner.c +++ b/gcc/fortran/scanner.c @@ -78,8 +78,8 @@ static struct gfc_file_change gfc_linebuf *lb; int line; } *file_changes; -size_t file_changes_cur, file_changes_count; -size_t file_changes_allocated; +static size_t file_changes_cur, file_changes_count; +static size_t file_changes_allocated; static gfc_char_t *last_error_char; diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index 2d7f9e0fb91..e7aec3845d3 100644 --- 
a/gcc/fortran/trans-expr.c +++ b/gcc/fortran/trans-expr.c @@ -45,7 +45,7 @@ along with GCC; see the file COPYING3. If not see /* Calculate the number of characters in a string. */ -tree +static tree gfc_get_character_len (tree type) { tree len; @@ -278,7 +278,7 @@ gfc_class_len_get (tree decl) /* Try to get the _len component of a class. When the class is not unlimited poly, i.e. no _len field exists, then return a zero node. */ -tree +static tree gfc_class_len_or_zero_get (tree decl) { tree len; @@ -382,7 +382,7 @@ VTAB_GET_FIELD_GEN (def_init, VTABLE_DEF_INIT_FIELD) VTAB_GET_FIELD_GEN (copy, VTABLE_COPY_FIELD) VTAB_GET_FIELD_GEN (final, VTABLE_FINAL_FIELD) VTAB_GET_FIELD_GEN (deallocate, VTABLE_DEALLOCATE_FIELD) - +#undef VTAB_GET_FIELD_GEN /* The size field is returned as an array
[PATCH,Fortran 2/7] Fortran: make some match* functions static
From: Bernhard Reutner-Fischer gfc_match_small_int_expr was unused, delete it. gfc_match_gcc_unroll should use gfc_match_small_literal_int and then gfc_match_small_int can be deleted since it will be unused. gcc/fortran/ChangeLog: * decl.c (gfc_match_old_kind_spec, set_com_block_bind_c, set_verify_bind_c_sym, set_verify_bind_c_com_block, get_bind_c_idents, gfc_match_suffix, gfc_get_type_attr_spec, (check_extended_derived_type): Make static. (gfc_match_gcc_unroll): Add comment. * match.c (gfc_match_small_int_expr): Delete definition. * match.h (gfc_match_small_int_expr): Delete declaration. (gfc_match_name_C, gfc_match_old_kind_spec, set_com_block_bind_c, set_verify_bind_c_sym, set_verify_bind_c_com_block, get_bind_c_idents, gfc_match_suffix, gfc_get_type_attr_spec): Delete declaration. --- gcc/fortran/decl.c | 15 --- gcc/fortran/match.c | 26 -- gcc/fortran/match.h | 9 - 3 files changed, 8 insertions(+), 42 deletions(-) diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index 6043e100fbb..1e034d1b344 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -3128,7 +3128,7 @@ cleanup: This assumes that the byte size is equal to the kind number for non-COMPLEX types, and equal to twice the kind number for COMPLEX. */ -match +static match gfc_match_old_kind_spec (gfc_typespec *ts) { match m; @@ -5867,7 +5867,7 @@ set_binding_label (const char **dest_label, const char *sym_name, /* Set the status of the given common block as being BIND(C) or not, depending on the given parameter, is_bind_c. */ -void +static void set_com_block_bind_c (gfc_common_head *com_block, int is_bind_c) { com_block->is_bind_c = is_bind_c; @@ -6055,7 +6055,7 @@ verify_bind_c_sym (gfc_symbol *tmp_sym, gfc_typespec *ts, the type is C interoperable. Errors are reported by the functions used to set/test these fields. 
*/ -bool +static bool set_verify_bind_c_sym (gfc_symbol *tmp_sym, int num_idents) { bool retval = true; @@ -6075,7 +6075,7 @@ set_verify_bind_c_sym (gfc_symbol *tmp_sym, int num_idents) /* Set the fields marking the given common block as BIND(C), including a binding label, and report any errors encountered. */ -bool +static bool set_verify_bind_c_com_block (gfc_common_head *com_block, int num_idents) { bool retval = true; @@ -6095,7 +6095,7 @@ set_verify_bind_c_com_block (gfc_common_head *com_block, int num_idents) /* Retrieve the list of one or more identifiers that the given bind(c) attribute applies to. */ -bool +static bool get_bind_c_idents (void) { char name[GFC_MAX_SYMBOL_LEN + 1]; @@ -6804,7 +6804,7 @@ match_result (gfc_symbol *function, gfc_symbol **result) clause and BIND(C), either one, or neither. The draft does not require them to come in a specific order. */ -match +static match gfc_match_suffix (gfc_symbol *sym, gfc_symbol **result) { match is_bind_c; /* Found bind(c). */ @@ -10116,7 +10116,7 @@ check_extended_derived_type (char *name) not a handled attribute, and MATCH_YES otherwise. TODO: More error checking on attribute conflicts needs to be done. */ -match +static match gfc_get_type_attr_spec (symbol_attribute *attr, char *name) { /* See if the derived type is marked as private. */ @@ -11794,6 +11794,7 @@ gfc_match_gcc_unroll (void) { int value; + /* FIXME: use gfc_match_small_literal_int instead, delete small_int */ if (gfc_match_small_int () == MATCH_YES) { if (value < 0 || value > USHRT_MAX) diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c index 91cde55d7a1..5d07f897e45 100644 --- a/gcc/fortran/match.c +++ b/gcc/fortran/match.c @@ -530,32 +530,6 @@ gfc_match_small_int (int *value) } -/* This function is the same as the gfc_match_small_int, except that - we're keeping the pointer to the expr. 
This function could just be - removed and the previously mentioned one modified, though all calls - to it would have to be modified then (and there were a number of - them). Return MATCH_ERROR if fail to extract the int; otherwise, - return the result of gfc_match_expr(). The expr (if any) that was - matched is returned in the parameter expr. */ - -match -gfc_match_small_int_expr (int *value, gfc_expr **expr) -{ - match m; - int i; - - m = gfc_match_expr (expr); - if (m != MATCH_YES) -return m; - - if (gfc_extract_int (*expr, , 1)) -m = MATCH_ERROR; - - *value = i; - return m; -} - - /* Matches a statement label. Uses gfc_match_small_literal_int() to do most of the work. */ diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h index eb9459ea99c..e9368db281d 100644 --- a/gcc/fortran/match.h +++ b/gcc/fortran/match.h @@ -48,9 +48,7 @@ match gfc_match_eos (void); match gfc_match_small_literal_int (int *, int *); match gfc_match_st_label (gfc_st_label **); match gfc_match_small_int (int *);
[PATCH,Fortran 5/7] Fortran: Delete unused decl in trans-stmt.h
From: Bernhard Reutner-Fischer

gcc/fortran/ChangeLog:

	* trans-stmt.h (gfc_trans_deallocate_array): Delete.
---
 gcc/fortran/trans-stmt.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/fortran/trans-stmt.h b/gcc/fortran/trans-stmt.h
index 1a24d9b4cdc..e824caf4d08 100644
--- a/gcc/fortran/trans-stmt.h
+++ b/gcc/fortran/trans-stmt.h
@@ -66,7 +66,6 @@ tree gfc_trans_sync_team (gfc_code *);
 tree gfc_trans_where (gfc_code *);
 tree gfc_trans_allocate (gfc_code *);
 tree gfc_trans_deallocate (gfc_code *);
-tree gfc_trans_deallocate_array (tree);
 
 /* trans-openmp.c */
 tree gfc_trans_omp_directive (gfc_code *);
-- 
2.33.0
[PATCH,Fortran 4/7] Fortran: make some trans-array functions static
From: Bernhard Reutner-Fischer gcc/fortran/ChangeLog: * trans-array.c (gfc_trans_scalarized_loop_end): Make static. * trans-array.h (gfc_trans_scalarized_loop_end, gfc_conv_tmp_ref, gfc_conv_array_transpose): Delete declaration. --- gcc/fortran/trans-array.c | 2 +- gcc/fortran/trans-array.h | 6 -- 2 files changed, 1 insertion(+), 7 deletions(-) diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c index bceb8b24ba4..5ceb261b698 100644 --- a/gcc/fortran/trans-array.c +++ b/gcc/fortran/trans-array.c @@ -4161,7 +4161,7 @@ gfc_start_scalarized_body (gfc_loopinfo * loop, stmtblock_t * pbody) /* Generates the actual loop code for a scalarization loop. */ -void +static void gfc_trans_scalarized_loop_end (gfc_loopinfo * loop, int n, stmtblock_t * pbody) { diff --git a/gcc/fortran/trans-array.h b/gcc/fortran/trans-array.h index 1d3dc4819eb..12068c742a5 100644 --- a/gcc/fortran/trans-array.h +++ b/gcc/fortran/trans-array.h @@ -118,8 +118,6 @@ void gfc_copy_loopinfo_to_se (gfc_se *, gfc_loopinfo *); /* Marks the start of a scalarized expression, and declares loop variables. */ void gfc_start_scalarized_body (gfc_loopinfo *, stmtblock_t *); -/* Generates one actual loop for a scalarized expression. */ -void gfc_trans_scalarized_loop_end (gfc_loopinfo *, int, stmtblock_t *); /* Generates the actual loops for a scalarized expression. */ void gfc_trans_scalarizing_loops (gfc_loopinfo *, stmtblock_t *); /* Mark the end of the main loop body and the start of the copying loop. */ @@ -137,8 +135,6 @@ tree gfc_build_null_descriptor (tree); void gfc_conv_array_ref (gfc_se *, gfc_array_ref *, gfc_expr *, locus *); /* Translate a reference to a temporary array. */ void gfc_conv_tmp_array_ref (gfc_se * se); -/* Translate a reference to an array temporary. */ -void gfc_conv_tmp_ref (gfc_se *); /* Calculate the overall offset, including subreferences. 
*/ void gfc_get_dataptr_offset (stmtblock_t*, tree, tree, tree, bool, gfc_expr*); @@ -149,8 +145,6 @@ void gfc_conv_expr_descriptor (gfc_se *, gfc_expr *); /* Convert an array for passing as an actual function parameter. */ void gfc_conv_array_parameter (gfc_se *, gfc_expr *, bool, const gfc_symbol *, const char *, tree *); -/* Evaluate and transpose a matrix expression. */ -void gfc_conv_array_transpose (gfc_se *, gfc_expr *); /* These work with both descriptors and descriptorless arrays. */ tree gfc_conv_array_data (tree); -- 2.33.0
[PATCH,Fortran 6/7] Fortran: Delete unused decl in trans-types.h
From: Bernhard Reutner-Fischer

gcc/fortran/ChangeLog:

	* trans-types.h (gfc_convert_function_code): Delete.
---
 gcc/fortran/trans-types.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/fortran/trans-types.h b/gcc/fortran/trans-types.h
index 1b43503092b..3bc236cad0d 100644
--- a/gcc/fortran/trans-types.h
+++ b/gcc/fortran/trans-types.h
@@ -65,9 +65,6 @@ enum gfc_packed {
   PACKED_STATIC
 };
 
-/* be-function.c */
-void gfc_convert_function_code (gfc_namespace *);
-
 /* trans-types.c */
 void gfc_init_kinds (void);
 void gfc_init_types (void);
-- 
2.33.0
[PATCH,Fortran 7/7] Fortran: Delete unused decl in intrinsic.h
From: Bernhard Reutner-Fischer

gcc/fortran/ChangeLog:

	* intrinsic.h (gfc_check_sum, gfc_resolve_atan2d, gfc_resolve_kill,
	gfc_resolve_kill_sub): Delete declaration.
---
 gcc/fortran/intrinsic.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/gcc/fortran/intrinsic.h b/gcc/fortran/intrinsic.h
index 2148f89e194..7511df1 100644
--- a/gcc/fortran/intrinsic.h
+++ b/gcc/fortran/intrinsic.h
@@ -168,7 +168,6 @@ bool gfc_check_spread (gfc_expr *, gfc_expr *, gfc_expr *);
 bool gfc_check_srand (gfc_expr *);
 bool gfc_check_stat (gfc_expr *, gfc_expr *);
 bool gfc_check_storage_size (gfc_expr *, gfc_expr *);
-bool gfc_check_sum (gfc_expr *, gfc_expr *, gfc_expr *);
 bool gfc_check_symlnk (gfc_expr *, gfc_expr *);
 bool gfc_check_team_number (gfc_expr *);
 bool gfc_check_transf_bit_intrins (gfc_actual_arglist *);
@@ -459,7 +458,6 @@ void gfc_resolve_asinh (gfc_expr *, gfc_expr *);
 void gfc_resolve_atan (gfc_expr *, gfc_expr *);
 void gfc_resolve_atanh (gfc_expr *, gfc_expr *);
 void gfc_resolve_atan2 (gfc_expr *, gfc_expr *, gfc_expr *);
-void gfc_resolve_atan2d (gfc_expr *, gfc_expr *, gfc_expr *);
 void gfc_resolve_atomic_def (gfc_code *);
 void gfc_resolve_atomic_ref (gfc_code *);
 void gfc_resolve_besn (gfc_expr *, gfc_expr *, gfc_expr *);
@@ -542,7 +540,6 @@ void gfc_resolve_rshift (gfc_expr *, gfc_expr *, gfc_expr *);
 void gfc_resolve_lshift (gfc_expr *, gfc_expr *, gfc_expr *);
 void gfc_resolve_ishft (gfc_expr *, gfc_expr *, gfc_expr *);
 void gfc_resolve_ishftc (gfc_expr *, gfc_expr *, gfc_expr *, gfc_expr *);
-void gfc_resolve_kill (gfc_expr *, gfc_expr *, gfc_expr *);
 void gfc_resolve_lbound (gfc_expr *, gfc_expr *, gfc_expr *, gfc_expr *);
 void gfc_resolve_lcobound (gfc_expr *, gfc_expr *, gfc_expr *, gfc_expr *);
 void gfc_resolve_len (gfc_expr *, gfc_expr *, gfc_expr *);
@@ -658,7 +655,6 @@ void gfc_resolve_gmtime (gfc_code *);
 void gfc_resolve_hostnm_sub (gfc_code *);
 void gfc_resolve_idate (gfc_code *);
 void gfc_resolve_itime (gfc_code *);
-void gfc_resolve_kill_sub (gfc_code *);
 void gfc_resolve_lstat_sub (gfc_code *);
 void gfc_resolve_ltime (gfc_code *);
 void gfc_resolve_mvbits (gfc_code *);
-- 
2.33.0
[PATCH,Fortran 3/7] Fortran: make some constructor* functions static
From: Bernhard Reutner-Fischer

gfc_constructor_expr_foreach and gfc_constructor_swap were just stubs.

gcc/fortran/ChangeLog:

	* constructor.c (gfc_constructor_get_base): Make static.
	(gfc_constructor_expr_foreach, gfc_constructor_swap): Delete.
	* constructor.h (gfc_constructor_get_base): Remove declaration.
	(gfc_constructor_expr_foreach, gfc_constructor_swap): Delete.
---
 gcc/fortran/constructor.c | 20 ++--
 gcc/fortran/constructor.h | 10 --
 2 files changed, 2 insertions(+), 28 deletions(-)

diff --git a/gcc/fortran/constructor.c b/gcc/fortran/constructor.c
index 3e4377a5ad3..4b5a748271b 100644
--- a/gcc/fortran/constructor.c
+++ b/gcc/fortran/constructor.c
@@ -85,7 +85,8 @@ gfc_constructor_get (void)
   return c;
 }
 
-gfc_constructor_base gfc_constructor_get_base (void)
+static gfc_constructor_base
+gfc_constructor_get_base (void)
 {
   return splay_tree_new (splay_tree_compare_ints, NULL, node_free);
 }
@@ -209,23 +210,6 @@ gfc_constructor_lookup_expr (gfc_constructor_base base, int offset)
 }
 
 
-int
-gfc_constructor_expr_foreach (gfc_constructor *ctor ATTRIBUTE_UNUSED,
-                              int(*f)(gfc_expr *) ATTRIBUTE_UNUSED)
-{
-  gcc_assert (0);
-  return 0;
-}
-
-void
-gfc_constructor_swap (gfc_constructor *ctor ATTRIBUTE_UNUSED,
-                      int n ATTRIBUTE_UNUSED, int m ATTRIBUTE_UNUSED)
-{
-  gcc_assert (0);
-}
-
-
-
 gfc_constructor *
 gfc_constructor_first (gfc_constructor_base base)
 {
diff --git a/gcc/fortran/constructor.h b/gcc/fortran/constructor.h
index 85a72dcfc0e..25cd6a8f192 100644
--- a/gcc/fortran/constructor.h
+++ b/gcc/fortran/constructor.h
@@ -23,8 +23,6 @@ along with GCC; see the file COPYING3. If not see
 /* Get a new constructor structure. */
 gfc_constructor *gfc_constructor_get (void);
 
-gfc_constructor_base gfc_constructor_get_base (void);
-
 /* Copy a constructor structure. */
 gfc_constructor_base gfc_constructor_copy (gfc_constructor_base base);
 
@@ -64,14 +62,6 @@ gfc_constructor *gfc_constructor_lookup (gfc_constructor_base base, int n);
 */
 gfc_expr *gfc_constructor_lookup_expr (gfc_constructor_base base, int n);
 
-
-int gfc_constructor_expr_foreach (gfc_constructor *ctor, int(*)(gfc_expr *));
-
-
-void gfc_constructor_swap (gfc_constructor *ctor, int n, int m);
-
-
-
 /* Get the first constructor node in the constructure structure.
    Returns NULL if there is no such expression. */
 gfc_constructor *gfc_constructor_first (gfc_constructor_base base);
-- 
2.33.0
[PATCH,Fortran 0/7] delete some unused decls, make static
Hi!

Quickly skimming through the frontend headers.
There are a couple of declarations for functions that do not have
definitions. And there are a couple of functions that can be static.

Notes i took while at it / TODOs:
- get rid of VTAB_GET_FIELD_GEN and unused extern decls
- The last block of gfc_trans_vla_one_sizepos() could simply use
  gfc_evaluate_now_function_scope(). Passing down unshared val of course.
- trans-expr.c has one use of gfc_evaluate_now_loc() but should simply
  use gfc_evaluate_now() there; Calling ...now_loc(input_location) is
  superfluous
- s/mane/name/;# in git grep -w mane gcc/fortran/
- delete gfc_match_small_int, use gfc_match_small_literal_int instead
- move gfc_match_null definition up before first user, make it static
  and delete decl from match.h
- gfc_cpp_add_include_path_after move up, make static, rm external decl
- gfc_walk_array_ref move up, make static, rm external decl
- delete unused gfc_copy_only_alloc_comp ?
- delete unused gfc_conv_descriptor_attribute ?
- gfc_build_nan str arg is "" always. Delete parameter and handling?
- delete unused gfc_simplify_get_team or wire it up in intrinsics,
  get_team handling (instead of the NULL..)

Anyone who does coarrays might want to fill in the missing get_team()
simplify and add an appropriate test. That's the only thing that i will
not do as i once was more into MPI and verbs so won't ever do coarrays ;)

Bootstraps fine, regression tests running over night.
Ok for trunk if it passes?

thanks,

Bernhard Reutner-Fischer (7):
  Fortran: make some trans* functions static
  Fortran: make some match* functions static
  Fortran: make some constructor* functions static
  Fortran: make some trans-array functions static
  Fortran: Delete unused decl in trans-stmt.h
  Fortran: Delete unused decl in trans-types.h
  Fortran: Delete unused decl in intrinsic.h

 gcc/fortran/constructor.c | 20 ++--
 gcc/fortran/constructor.h | 10 --
 gcc/fortran/decl.c        | 15 ---
 gcc/fortran/expr.c        |  2 +-
 gcc/fortran/gfortran.h    |  1 -
 gcc/fortran/intrinsic.h   |  4 ----
 gcc/fortran/match.c       | 28 +---
 gcc/fortran/match.h       | 10 --
 gcc/fortran/scanner.c     |  4 ++--
 gcc/fortran/trans-array.c |  2 +-
 gcc/fortran/trans-array.h |  6 --
 gcc/fortran/trans-expr.c  | 10 +-
 gcc/fortran/trans-stmt.h  |  1 -
 gcc/fortran/trans-types.c | 25 +++--
 gcc/fortran/trans-types.h |  4 ----
 gcc/fortran/trans.c       |  1 -
 gcc/fortran/trans.h       | 11 ---
 17 files changed, 23 insertions(+), 131 deletions(-)

-- 
2.33.0
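As a small illustration of the kind of cleanup this series performs, here is a sketch in plain C. The file and function names (util.c, util.h, helper_square, util_sum_squares, util_gone) are hypothetical and have nothing to do with the Fortran frontend; the point is only the pattern: a function used in a single translation unit becomes static and loses its header declaration, and a declaration with no definition anywhere is simply deleted.

```c
/* Hypothetical "util.h" before the cleanup declared three functions:

     int helper_square (int);                  // only used in util.c
     int util_sum_squares (const int *, int);  // the real interface
     int util_gone (void);                     // no definition anywhere

   After the cleanup only util_sum_squares remains in the header.  */

/* Hypothetical "util.c" after the cleanup.  The helper is now
   file-local, so it no longer shows up as an extern symbol that is
   never referenced from another object file.  */
static int
helper_square (int x)
{
  return x * x;
}

/* Sum of the squares of the N elements of V.  */
int
util_sum_squares (const int *v, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += helper_square (v[i]);
  return sum;
}
```

Callers of util_sum_squares are unaffected; the change only narrows what the object file exports, which is what a report like the one from contrib/unused_functions.py is meant to flag.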
Re: [PATCH v2 0/4] libffi: Sync with upstream
On Sun, Oct 24, 2021 at 1:36 PM Iain Sandoe wrote:
>
> Hi H.J.
>
> > On 19 Oct 2021, at 19:01, H.J. Lu via Gcc-patches wrote:
> >
> > On Tue, Oct 19, 2021 at 8:03 AM David Edelsohn wrote:
> >>
> >> My colleague built GCC, including GCC Go, with your patch:
> >>
> >> "I was able to build libgo and test it partially. The results are
> >> similar to the current master without libffi updates. But 64bit tests
> >> aren't working in both cases. It's related to LIBPATH issues..."
> >>
> >
> > Thanks for checking. I will rebase and retest. If there is no regression,
> > I will check them in.
>
> At r12-4638.
>
> It seems that there are quite a few m32 libffi fails on x86_64-linux-gnu
> [gcc123].
>
> It seems that all-languages bootstrap is broken on BE powerpc64-linux-gnu
> [gcc110]:
>
> ../../../src-patched/libffi/src/powerpc/linux64_closure.S:404: Error:
> unrecognized opcode: `lvx'
> make[4]: *** [src/powerpc/linux64_closure.lo] Error 1
>
> could you take a look please?

Does it fail in libffi upstream? If yes, please open an issue in
libffi upstream. If not, why does it fail in GCC?

-- 
H.J.
Re: [PATCH v2 0/4] libffi: Sync with upstream
Hi H.J.

> On 19 Oct 2021, at 19:01, H.J. Lu via Gcc-patches wrote:
>
> On Tue, Oct 19, 2021 at 8:03 AM David Edelsohn wrote:
>>
>> My colleague built GCC, including GCC Go, with your patch:
>>
>> "I was able to build libgo and test it partially. The results are
>> similar to the current master without libffi updates. But 64bit tests
>> aren't working in both cases. It's related to LIBPATH issues..."
>>
>
> Thanks for checking. I will rebase and retest. If there is no regression,
> I will check them in.

At r12-4638.

It seems that there are quite a few m32 libffi fails on x86_64-linux-gnu
[gcc123].

It seems that all-languages bootstrap is broken on BE powerpc64-linux-gnu
[gcc110]:

../../../src-patched/libffi/src/powerpc/linux64_closure.S:404: Error: unrecognized opcode: `lvx'
make[4]: *** [src/powerpc/linux64_closure.lo] Error 1

could you take a look please?
thanks
Iain
Re: [PATCH] PR fortran/102917 - PDT type parameters are not restricted to default integer
On Sun, Oct 24, 2021 at 09:00:52PM +0200, Harald Anlauf wrote:
> Dear Fortranners, Steve,
>
> I've created PR 102917 for tracking this issue and packaged
> the attached patch.
>
> Regtested on x86_64-pc-linux-gnu. OK mainline?
>

Thanks for picking this up. The patch looks good to me,
but you may want to have Thomas or Tobias cast a quick
glance over it.

-- 
Steve
RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
Hi, Attached is a new version of the patch, mainly for improving performance and simplifying the code. First, regarding the comments: > -Original Message- > From: Richard Biener > Sent: Friday, October 1, 2021 9:00 PM > To: Di Zhao OS > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with > "equivalence map" for condition prediction > > On Thu, Sep 16, 2021 at 8:13 PM Di Zhao OS > wrote: > > > > Sorry about updating on this after so long. It took me much time to work > > out a > > new plan and pass the tests. > > > > The new idea is to use one variable to represent a set of equal variables at > > some basic-block. This variable is called a "equivalence head" or > > "equiv-head" > > in the code. (There's no-longer a "equivalence map".) > > > > - Initially an SSA_NAME's "equivalence head" is its value number. Temporary > > equivalence heads are recorded as unary NOP_EXPR results in the > > vn_nary_op_t > > map. Besides, when inserting into vn_nary_op_t map, make the new result at > > front of the vn_pval list, so that when searching for a variable's > > equivalence head, the first result represents the largest equivalence set > > at > > current location. > > - In vn_ssa_aux_t, maintain a list of references to valid_info->nary entry. > > For recorded equivalences, the reference is result->entry; for normal > > N-ary > > operations, the reference is operand->entry. > > - When recording equivalences, if one side A is constant or has more refs, > > make > > it the new equivalence head of the other side B. Traverse B's ref-list, > > if a > > variable C's previous equiv-head is B, update to A. And re-insert B's > > n-ary > > operations by replacing B with A. > > - When inserting and looking for the results of n-ary operations, insert and > > lookup by the operands' equiv-heads. > > ... > > > > Thanks, > > Di Zhao > > > > > > Extend FRE with temporary equivalences. > > Comments on the patch: > > + /* nary_ref count. 
*/
> + unsigned num_nary_ref;
> +
>
> I think a unsigned short should be enough and that would nicely
> pack after value_id together with the bitfield (maybe change that
> to unsigned short :1 then).

Changed num_nary_ref to unsigned short and moved it after value_id.

> @@ -7307,17 +7839,23 @@ process_bb (rpo_elim , basic_block bb,
> tree val = gimple_simplify (gimple_cond_code (last),
> boolean_type_node, lhs, rhs,
> NULL, vn_valueize);
> + vn_nary_op_t vnresult = NULL;
> /* If the condition didn't simplfy see if we have recorded
>    an expression from sofar taken edges. */
> if (! val || TREE_CODE (val) != INTEGER_CST)
> {
> - vn_nary_op_t vnresult;
>
> looks like you don't need vnresult outside of the if()?

vnresult is reused later to record equivalences generated by PHI nodes.

> +/* Find predicated value of vn_nary_op by the operands' equivalences. Return
> + * NULL_TREE if no known result is found. */
> +
> +static tree
> +find_predicated_value_by_equivs (vn_nary_op_t vno, basic_block bb,
> +vn_nary_op_t *vnresult)
> +{
> + lookup_equiv_heads (vno->length, vno->op, vno->op, bb);
> + tree result
> += simplify_nary_op (vno->length, vno->opcode, vno->op, vno->type);
>
> why is it necessary to simplify here? It looks like the caller
> already does this.

In the new patch, I changed the code a little to remove the redundant calculation.

> I wonder whether it's valid to always perform find_predicated_value_by_equivs
> from inside vn_nary_op_get_predicated_value instead of bolting it to only
> a single place?

I removed the function find_predicated_value_by_equivs and inlined its code. Because lookup_equiv_head uses vn_nary_op_get_predicated_value, I left vn_nary_op_get_predicated_value unchanged. Instead, operands are set to their equiv-heads in init_vn_nary_op_from_stmt, so predicates are always inserted and looked up by equiv-heads.
> + > +static vn_nary_op_t > +val_equiv_insert (tree op1, tree op2, edge e) > +{ > > + if (is_gimple_min_invariant (lhs)) > +std::swap (lhs, rhs); > + if (is_gimple_min_invariant (lhs) || TREE_CODE (lhs) != SSA_NAME) > +/* Possible if invoked from record_equiv_from_previous_cond. */ > +return NULL; > > Better formulate all of the above in terms of only SSA_NAME checks since... > > + /* Make the hand-side with more recorded n-ary expressions new > + * equivalence-head, to make fewer re-insertions. */ > + if (TREE_CODE (rhs) == SSA_NAME > + && VN_INFO (rhs)->num_nary_ref < VN_INFO (lhs)->num_nary_ref) > +std::swap (lhs, rhs); > > here LHS needs to be an SSA_NAME. Tried to fix this in the new patch. > + /* Record equivalence as unary NOP_EXPR. */ > + vn_nary_op_t val > += vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR,
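The "equivalence head" bookkeeping described earlier in this message can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the patch's code: variables are plain integers, `head`, `nrefs`, `equiv_init`, `equiv_head` and `record_equiv` are hypothetical names, and the per-basic-block scoping of temporary equivalences, constants, and the vn_nary_op_t hash table are left out entirely. It shows only the rule "the side with more recorded n-ary references becomes the new head, and everything that pointed at the old head is updated".

```c
#include <assert.h>

#define NVARS 8

/* head[v]: current equivalence head of variable v (hypothetical model). */
static int head[NVARS];
/* nrefs[v]: how many recorded n-ary expressions mention v (hypothetical). */
static int nrefs[NVARS];

/* Initially every variable is its own equivalence head. */
static void
equiv_init (void)
{
  for (int v = 0; v < NVARS; v++)
    {
      head[v] = v;
      nrefs[v] = 0;
    }
}

/* Return the variable that currently represents V's equivalence set. */
static int
equiv_head (int v)
{
  assert (v >= 0 && v < NVARS);
  return head[v];
}

/* Record A == B.  The head with more references stays the head, so that
   fewer expressions would have to be re-inserted under a new operand.  */
static void
record_equiv (int a, int b)
{
  a = equiv_head (a);
  b = equiv_head (b);
  if (a == b)
    return;

  /* Keep the side with more recorded references as the head.  */
  if (nrefs[a] < nrefs[b])
    {
      int tmp = a;
      a = b;
      b = tmp;
    }

  /* Every variable whose head was B now has head A.  */
  for (int v = 0; v < NVARS; v++)
    if (head[v] == b)
      head[v] = a;

  /* Model "re-insert B's n-ary operations with B replaced by A".  */
  nrefs[a] += nrefs[b];
  nrefs[b] = 0;
}
```

In this toy model a lookup of an n-ary operation would first map each operand through `equiv_head` and then consult the expression table, which mirrors the "insert and look up by the operands' equiv-heads" step of the description.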
[PATCH] PR fortran/102917 - PDT type parameters are not restricted to default integer
Dear Fortranners, Steve,

I've created PR 102917 for tracking this issue and packaged the attached patch.

Regtested on x86_64-pc-linux-gnu. OK for mainline?

Thanks,
Harald

> Sent: Friday, 22 October 2021 at 22:25
> From: "Steve Kargl"
> To: "Harald Anlauf"
> Cc: fort...@gcc.gnu.org
> Subject: Re: PDT type parameters are not restricted to default integer
>
> On Fri, Oct 22, 2021 at 10:16:05PM +0200, Harald Anlauf wrote:
> > Hi Steve,
> >
> > On 22.10.21 at 21:35, Steve Kargl via Fortran wrote:
> > > Here's an obvious quick fix. Please apply.
> > >
> > >
> > > diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
> > > index 6043e100fbb..e889bb44142 100644
> > > --- a/gcc/fortran/decl.c
> > > +++ b/gcc/fortran/decl.c
> > > @@ -5619,14 +5619,6 @@ match_attr_spec (void)
> > > m = MATCH_ERROR;
> > > goto cleanup;
> > > }
> > > - if (current_ts.kind != gfc_default_integer_kind)
> > > - {
> > > - gfc_error ("Component with LEN attribute at %C must be "
> > > - "default integer kind (%d)",
> > > - gfc_default_integer_kind);
> > > - m = MATCH_ERROR;
> > > - goto cleanup;
> > > - }
> > > }
> > > else
> > > {
> >
> > I think you are right. We should always have allowed any integer kind.
> >
> > However, have you checked whether this change introduces regressions?
> > If you don't, somebody else will. Please open a PR, then.
> >
>
> It seems that pdt_4.f03 will fail with the above patch because
> it explicitly tests for this error message. That's the only
> failure in the testsuite. For the record, F2003, page 48,
>
>    R435 type-param-def-stmt is INTEGER [ kind-selector ] , ...
>
>    Each type parameter is itself of type integer. If its kind selector
>    is omitted, the kind type parameter is default integer.
>
> Now that I think about and look, there is a nearby similar gcc_error()
> for KIND. This should be removed too.
> > -- > Steve > Fortran: do not restrict PDT KIND and LEN type parameters to default integer gcc/fortran/ChangeLog: PR fortran/102917 * decl.c (match_attr_spec): Remove invalid integer kind checks on KIND and LEN attributes of PDTs. gcc/testsuite/ChangeLog: PR fortran/102917 * gfortran.dg/pdt_4.f03: Adjust testcase. diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index 6043e100fbb..ce61e53eb7b 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -5592,14 +5592,6 @@ match_attr_spec (void) m = MATCH_ERROR; goto cleanup; } - if (current_ts.kind != gfc_default_integer_kind) - { - gfc_error ("Component with KIND attribute at %C must be " - "default integer kind (%d)", - gfc_default_integer_kind); - m = MATCH_ERROR; - goto cleanup; - } } else if (d == DECL_LEN) { @@ -5619,14 +5611,6 @@ match_attr_spec (void) m = MATCH_ERROR; goto cleanup; } - if (current_ts.kind != gfc_default_integer_kind) - { - gfc_error ("Component with LEN attribute at %C must be " - "default integer kind (%d)", - gfc_default_integer_kind); - m = MATCH_ERROR; - goto cleanup; - } } else { diff --git a/gcc/testsuite/gfortran.dg/pdt_4.f03 b/gcc/testsuite/gfortran.dg/pdt_4.f03 index c1af65a5248..37412e4ca82 100644 --- a/gcc/testsuite/gfortran.dg/pdt_4.f03 +++ b/gcc/testsuite/gfortran.dg/pdt_4.f03 @@ -28,9 +28,9 @@ end module type :: bad_pdt (a,b, c, d) ! { dg-error "does not have a component" } real, kind :: a! { dg-error "must be INTEGER" } -INTEGER(8), kind :: b ! { dg-error "be default integer kind" } +INTEGER(8), kind :: b real, LEN :: c ! { dg-error "must be INTEGER" } -INTEGER(8), LEN :: d ! { dg-error "be default integer kind" } +INTEGER(8), LEN :: d end type type :: mytype (a,b)
[committed] hppa: Revise -mdisable-fpregs option and add new -msoft-mult option
The Linux kernel on hppa is built with -mdisable-fpregs to inhibit the use of the floating-point registers. However, I noticed that the 64-bit kernel was using floating-point registers for hardware integer multiplication (xmpyu). It turned out this was because various DImode routines in libgcc (e.g., __muldi3) were built with hardware integer multiplication enabled. This turned out not to be a problem as currently the kernel saves the floating-point registers in syscalls, etc. But it was the intention that the floating-point registers not be used in kernel code. It also turned out that -mdisable-fpregs didn't disable use of the floating-point registers as documented. The -msoft-float option does that. What the kernel needs is an option to disable hardware integer multiplication. This is sufficient to avoid the use of the floating-point registers. It appears -mdisable-fpregs was originally intended to disable use of xmpyu but its operation got confused with time. The attached change has been tested on hppa2.0w-hp-hpux11.11, hppa64-hp-hpux11.11 and hppa-unknown-linux-gnu. I also checked that libgcc can be built with -msoft-mult. It currently is not configured to build successfully with -msoft-float. Committed to trunk and gcc-11 branch. Dave --- Revise -mdisable-fpregs option and add new -msoft-mult option The behavior of the -mdisable-fpregs is confusing in that it doesn't disable the use of the floating-point registers in all situations. The -msoft-float disables the use of the floating-point registers in all situations. The Linux kernel only needs to disable use of the xmpyu instruction to avoid using the floating-point registers. This change revises the -mdisable-fpregs option to disable the use of the floating-point registers in all situations. It is now equivalent to the -msoft-float option. A new -msoft-mult option is added to disable use of the xmpyu instruction. 
The libgcc library can be compiled with the -msoft-mult option to avoid using hardware integer multiplication. 2021-10-24 John David Anglin gcc/ChangeLog: * config/pa/pa-d.c (pa_d_handle_target_float_abi): Don't check TARGET_DISABLE_FPREGS. * config/pa/pa.c (fix_range): Use MASK_SOFT_FLOAT instead of MASK_DISABLE_FPREGS. (hppa_rtx_costs): Don't check TARGET_DISABLE_FPREGS. Adjust cost of hardware integer multiplication. (pa_conditional_register_usage): Don't check TARGET_DISABLE_FPREGS. * config/pa/pa.h (INT14_OK_STRICT): Likewise. * config/pa/pa.md: Don't check TARGET_DISABLE_FPREGS. Check TARGET_SOFT_FLOAT in patterns that use xmpyu instruction. * config/pa/pa.opt (mdisable-fpregs): Change target mask to SOFT_FLOAT. Revise comment. (msoft-mult): New option. diff --git a/gcc/config/pa/pa-d.c b/gcc/config/pa/pa-d.c index 6802738e85b..14ef8cae343 100644 --- a/gcc/config/pa/pa-d.c +++ b/gcc/config/pa/pa-d.c @@ -47,7 +47,7 @@ pa_d_handle_target_float_abi (void) { const char *abi; - if (TARGET_DISABLE_FPREGS || TARGET_SOFT_FLOAT) + if (TARGET_SOFT_FLOAT) abi = "soft"; else abi = "hard"; diff --git a/gcc/config/pa/pa.c b/gcc/config/pa/pa.c index d13021ad94a..21b812e9be7 100644 --- a/gcc/config/pa/pa.c +++ b/gcc/config/pa/pa.c @@ -497,7 +497,7 @@ fix_range (const char *const_str) break; if (i > FP_REG_LAST) -target_flags |= MASK_DISABLE_FPREGS; +target_flags |= MASK_SOFT_FLOAT; } /* Implement the TARGET_OPTION_OVERRIDE hook.
*/ @@ -1578,14 +1578,14 @@ hppa_rtx_costs (rtx x, machine_mode mode, int outer_code, } else if (mode == DImode) { - if (TARGET_PA_11 && !TARGET_DISABLE_FPREGS && !TARGET_SOFT_FLOAT) - *total = COSTS_N_INSNS (32); + if (TARGET_PA_11 && !TARGET_SOFT_FLOAT && !TARGET_SOFT_MULT) + *total = COSTS_N_INSNS (25); else *total = COSTS_N_INSNS (80); } else { - if (TARGET_PA_11 && !TARGET_DISABLE_FPREGS && !TARGET_SOFT_FLOAT) + if (TARGET_PA_11 && !TARGET_SOFT_FLOAT && !TARGET_SOFT_MULT) *total = COSTS_N_INSNS (8); else *total = COSTS_N_INSNS (20); @@ -10627,7 +10627,7 @@ pa_conditional_register_usage (void) for (i = 33; i < 56; i += 2) fixed_regs[i] = call_used_regs[i] = 1; } - if (TARGET_DISABLE_FPREGS || TARGET_SOFT_FLOAT) + if (TARGET_SOFT_FLOAT) { for (i = FP_REG_FIRST; i <= FP_REG_LAST; i++) fixed_regs[i] = call_used_regs[i] = 1; diff --git a/gcc/config/pa/pa.h b/gcc/config/pa/pa.h index fbb96045a51..7a313d617b0 100644 --- a/gcc/config/pa/pa.h +++ b/gcc/config/pa/pa.h @@ -833,7 +833,6 @@ extern int may_call_alloca; #define INT14_OK_STRICT \ (TARGET_SOFT_FLOAT \ - || TARGET_DISABLE_FPREGS\ || (TARGET_PA_20 && !TARGET_ELF32)) /* The macros
Re: [PATCH] Try to resolve paths in threader without looking further back.
On 10/24/21 6:57 PM, Jeff Law wrote:

>> Ug... we could put the test back, check for some random large number, and come up with a more satisfactory test later? ;-)
> I thought our "counting" based tests could only check equality (ie, expect to see this string precisely N times). Though if we could check that # threads realized was > some low water mark, that'd probably be better than what we've got right now.

Andrew actually had a patch for a dejagnu construct doing just that (scan-tree-dump-minimum), but I just noticed it didn't work quite right for this test.

This is a bit embarrassing, but upon further analysis I've just noticed that the number of threadable candidates has been exploding over the past year, but the ones that actually make it past the block copier restrictions plus rewire_first_differing_edge, etc, only changed by 1 with this patch.

So perhaps we don't need to bend over backward (just yet anyhow). I can leave the simple gimple FE test since I've already coded it. Up to you.

How does this look?
Aldy

From e268fc18d8773269c4493949d14b6dbf39112b03 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez
Date: Wed, 20 Oct 2021 07:29:25 +0200
Subject: [PATCH] Try to resolve paths in threader without looking further back.

Sometimes we can solve a candidate path without having to recurse further back. This can mostly happen in fully resolving mode, because we can ask the ranger what the range on entry to the path is, but there's no reason this can't always apply. This one-liner removes the fully-resolving restriction.

I'm tickled pink to see how many things we now get quite early in the compilation. I actually had to disable jump threading entirely for a few tests because the early threader was catching things disturbingly early. Also, as Richi predicted, I saw a lot of pre-VRP cleanups happening.

I was going to commit this as obvious, but I think the test changes merit discussion.

We've been playing games with gcc.dg/tree-ssa/ssa-thread-11.c for quite some time.
Every time a threading pass gets smarter, we push the check further down the pipeline. We've officially run out of dumb threading passes to disable ;-). In the last year we've gone up from a handful of threads to 34 threads with the current combination of options. I doubt this is testing anything useful anymore, so I've removed it.

Similarly for gcc.dg/tree-ssa/ssa-dom-thread-4.c. We used to thread 3 jump threads, but they were disallowed because of loop rotation. Then we started catching more jump threads in VRP2 threading so we tested there. With this patch though, we triple the number of threads found from 11 to 31. I believe this test has outlived its usefulness, and I've removed it. Note that even though we have these outrageous possibilities for this test, the block copier ultimately chops them down (23 survive though).

Tested on x86-64 Linux.

gcc/ChangeLog:

	* tree-ssa-threadbackward.c (back_threader::find_paths_to_names): Always try to resolve path without looking back.
	* tree-ssa-threadupdate.c (dump_jump_thread): Indicate whether edge is a back edge.

gcc/testsuite/ChangeLog:

	* gcc.dg/graphite/scop-dsyr2k-2.c: Adjust for jump threading changes.
	* gcc.dg/graphite/scop-dsyr2k.c: Same.
	* gcc.dg/graphite/scop-dsyrk-2.c: Same.
	* gcc.dg/graphite/scop-dsyrk.c: Same.
	* gcc.dg/tree-ssa/pr20701.c: Same.
	* gcc.dg/tree-ssa/pr20702.c: Same.
	* gcc.dg/tree-ssa/pr21086.c: Same.
	* gcc.dg/tree-ssa/pr25382.c: Same.
	* gcc.dg/tree-ssa/pr58480.c: Same.
	* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
	* gcc.dg/tree-ssa/vrp08.c: Same.
	* gcc.dg/tree-ssa/vrp55.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Removed.
	* gcc.dg/tree-ssa/ssa-thread-11.c: Removed.
	* gcc.dg/uninit-pr89230-1.c: xfail.
--- gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c | 1 + gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c | 1 + gcc/testsuite/gcc.dg/graphite/scop-dsyrk-2.c | 1 + gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c| 1 + gcc/testsuite/gcc.dg/tree-ssa/pr20701.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr20702.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr21086.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr25382.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr58480.c | 2 +- .../gcc.dg/tree-ssa/ssa-dom-thread-4.c| 60 --- .../gcc.dg/tree-ssa/ssa-dom-thread-7.c| 2 +- gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c | 50 .../gcc.dg/tree-ssa/ssa-thread-backedge.c | 32 ++ .../gcc.dg/tree-ssa/ssa-vrp-thread-1.c| 4 +- gcc/testsuite/gcc.dg/tree-ssa/vrp08.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/vrp55.c | 6 +- gcc/testsuite/gcc.dg/uninit-pr89230-1.c | 3 +- gcc/tree-ssa-threadbackward.c | 4 +- gcc/tree-ssa-threadupdate.c | 3 + 19 files changed, 55 insertions(+), 125 deletions(-) delete mode 100644
Re: [PATCH] Try to resolve paths in threader without looking further back.
On October 24, 2021 6:57:05 PM GMT+02:00, Jeff Law via Gcc-patches wrote:
>
>
>On 10/21/2021 9:53 PM, Aldy Hernandez wrote:
>>
>>
>> >
>> > Phew, I think we're finally converging on a useful set of
>> threading tests :).
>> >
>> > OK for trunk?
>> Mostly, I just worry about losing the key test for the FSM
>> optimization.
>>
>>
>> With the provided test, the forward threaders can't thread through the
>> backedge and into the switch. Disabling the other threaders was just a
>> precaution. I just wanted to make sure it happened late because of the
>> loop restrictions we have in place. I could enable the forward
>> threaders to prove they can't get it.
>Right. There was a time when the forward threaders handled the
>backedge, but it's much better handled by the backwards threader.
>
>> I could add more cases and check that we have N or more threads
>> through the back edges. ...and if it makes you feel safer, we could even
>> convert the test to gimple and test the specific thread sequence. It's
>> just that the gimple FE test is bound to get large and difficult to
>> decipher if I start adding many switch cases.
>I would love if we could turn the testcase into a gimple based test. I
>just shudder at the thought of trying to pull that together. And yes,
>it's awful hard to decipher, both in terms of test behavior and in terms
>of what the key jump threads are.
>
>>
>> I'm just trying to avoid a huge test with 40 potential threads where
>> no one really knows how many we should get... as every threading pass
>> opens up possibilities for other passes.
>Understood. To some degree it's inherent in the problem. The smarter
>our threaders get the more likely they are to discover new
>opportunities, so there's clearly a maintenance burden to these tests
>over time. It's made worse by the interactions with BRANCH_COST as well
>as the heuristics for switch conversion.
>
>Gimple based tests would significantly help the latter issues, but I
>don't know how to tackle the problem of exposing more jump threads as
>our threaders get better.

Well, you'd feed the specific GIMPLE to a single threading pass and check its dump file. With GIMPLE based tests you can nearly do unit testing...

>>
>> Ug... we could put the test back, check for some random large
>> number, and come up with a more satisfactory test later? ;-)

Maybe we can dump the source location of conditions we thread through when dumping the threading pass.

>I thought our "counting" based tests could only check equality (ie,
>expect to see this string precisely N times). Though if we could check
>that # threads realized was > some low water mark, that'd probably be
>better than what we've got right now.
>
>
>jeff
[committed] hppa: Don't use 'G' constraint in integer move patterns
The 'G' constraint only matches a float zero, so it will never match in integer move patterns. Tested on hppa-unknown-linux-gnu. Committed to active branches. Dave --- Don't use 'G' constraint in integer move patterns The 'G' constraint only matches a float zero. 2021-10-24 John David Anglin gcc/ChangeLog: * config/pa/pa.md: Don't use 'G' constraint in integer move patterns. diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md index 5cda3b79933..c1864524b38 100644 --- a/gcc/config/pa/pa.md +++ b/gcc/config/pa/pa.md @@ -2186,14 +2186,14 @@ [(set (match_operand:SI 0 "move_dest_operand" "=r,r,r,r,r,r,Q,!*q,!r,!*f,*f,T,?r,?*f") (match_operand:SI 1 "move_src_operand" - "A,rG,J,N,K,RQ,rM,!rM,!*q,!*fM,RT,*f,*f,r"))] + "A,r,J,N,K,RQ,rM,!rM,!*q,!*fM,RT,*f,*f,r"))] "(register_operand (operands[0], SImode) || reg_or_0_operand (operands[1], SImode)) && !TARGET_SOFT_FLOAT && !TARGET_64BIT" "@ ldw RT'%A1,%0 - copy %r1,%0 + copy %1,%0 ldi %1,%0 ldil L'%1,%0 {zdepi|depwi,z} %Z1,%0 @@ -2214,14 +2214,14 @@ [(set (match_operand:SI 0 "move_dest_operand" "=r,r,r,r,r,r,Q,!*q,!r,!*f,*f,T") (match_operand:SI 1 "move_src_operand" - "A,rG,J,N,K,RQ,rM,!rM,!*q,!*fM,RT,*f"))] + "A,r,J,N,K,RQ,rM,!rM,!*q,!*fM,RT,*f"))] "(register_operand (operands[0], SImode) || reg_or_0_operand (operands[1], SImode)) && !TARGET_SOFT_FLOAT && TARGET_64BIT" "@ ldw RT'%A1,%0 - copy %r1,%0 + copy %1,%0 ldi %1,%0 ldil L'%1,%0 {zdepi|depwi,z} %Z1,%0 @@ -2240,14 +2240,14 @@ [(set (match_operand:SI 0 "move_dest_operand" "=r,r,r,r,r,r,Q,!*q,!r") (match_operand:SI 1 "move_src_operand" - "A,rG,J,N,K,RQ,rM,!rM,!*q"))] + "A,r,J,N,K,RQ,rM,!rM,!*q"))] "(register_operand (operands[0], SImode) || reg_or_0_operand (operands[1], SImode)) && TARGET_SOFT_FLOAT && TARGET_64BIT" "@ ldw RT'%A1,%0 - copy %r1,%0 + copy %1,%0 ldi %1,%0 ldil L'%1,%0 {zdepi|depwi,z} %Z1,%0 @@ -2381,13 +2381,13 @@ [(set (match_operand:SI 0 "move_dest_operand" "=r,r,r,r,r,r,Q,!*q,!r") (match_operand:SI 1 "move_src_operand" - "A,rG,J,N,K,RQ,rM,!rM,!*q"))] + 
"A,r,J,N,K,RQ,rM,!rM,!*q"))] "(register_operand (operands[0], SImode) || reg_or_0_operand (operands[1], SImode)) && TARGET_SOFT_FLOAT" "@ ldw RT'%A1,%0 - copy %r1,%0 + copy %1,%0 ldi %1,%0 ldil L'%1,%0 {zdepi|depwi,z} %Z1,%0 @@ -2909,11 +2909,11 @@ [(set (match_operand:HI 0 "move_dest_operand" "=r,r,r,r,r,Q,!*q,!r") (match_operand:HI 1 "move_src_operand" - "rG,J,N,K,RQ,rM,!rM,!*q"))] + "r,J,N,K,RQ,rM,!rM,!*q"))] "(register_operand (operands[0], HImode) || reg_or_0_operand (operands[1], HImode))" "@ - copy %r1,%0 + copy %1,%0 ldi %1,%0 ldil L'%1,%0 {zdepi|depwi,z} %Z1,%0 @@ -3069,11 +3069,11 @@ [(set (match_operand:QI 0 "move_dest_operand" "=r,r,r,r,r,Q,!*q,!r") (match_operand:QI 1 "move_src_operand" - "rG,J,N,K,RQ,rM,!rM,!*q"))] + "r,J,N,K,RQ,rM,!rM,!*q"))] "(register_operand (operands[0], QImode) || reg_or_0_operand (operands[1], QImode))" "@ - copy %r1,%0 + copy %1,%0 ldi %1,%0 ldil L'%1,%0 {zdepi|depwi,z} %Z1,%0 @@ -4221,13 +4221,13 @@ [(set (match_operand:DI 0 "move_dest_operand" "=r,r,r,r,r,r,Q,!*q,!r,!*f,*f,T") (match_operand:DI 1 "move_src_operand" - "A,rG,J,N,K,RQ,rM,!rM,!*q,!*fM,RT,*f"))] + "A,r,J,N,K,RQ,rM,!rM,!*q,!*fM,RT,*f"))] "(register_operand (operands[0], DImode) || reg_or_0_operand (operands[1], DImode)) && !TARGET_SOFT_FLOAT && TARGET_64BIT" "@ ldd RT'%A1,%0 - copy %r1,%0 + copy %1,%0 ldi %1,%0 ldil L'%1,%0 depdi,z %z1,%0 @@ -4246,13 +4246,13 @@ [(set (match_operand:DI 0 "move_dest_operand" "=r,r,r,r,r,r,Q,!*q,!r") (match_operand:DI 1 "move_src_operand" - "A,rG,J,N,K,RQ,rM,!rM,!*q"))] + "A,r,J,N,K,RQ,rM,!rM,!*q"))] "(register_operand (operands[0], DImode) || reg_or_0_operand (operands[1], DImode)) && TARGET_SOFT_FLOAT && TARGET_64BIT" "@ ldd RT'%A1,%0 - copy %r1,%0 + copy %1,%0 ldi %1,%0 ldil L'%1,%0 depdi,z %z1,%0
Re: [PATCH] Try to resolve paths in threader without looking further back.
On Sun, 24 Oct 2021 10:57:05 -0600 Jeff Law via Gcc-patches wrote:

> I thought our "counting" based tests could only check equality (ie,
> expect to see this string precisely N times). Though if we could check
> that # threads realized was > some low water mark, that'd probably be
> better than what we've got right now.

That's what I was about to suggest yesterday but didn't dare, with a reference to testsuite/lib/scanasm.exp (see "proc object-size") for the comparison.
Re: (!HELP NEEDED) Where is the doc for the format strings in gcc (for example, %q+D, ...)
On Wed, Oct 20, 2021 at 10:57 AM Marek Polacek via Gcc-patches wrote:
>
> On Wed, Oct 20, 2021 at 03:49:09PM +, Qing Zhao via Gcc-patches wrote:
> > Hi,
> >
> > In GCC, there are many utility routines for reporting error, warning, or
> > information, for example:
> >
> > warning (0, "weak declaration of %q+D not supported", decl);
> > warning_at (stmtloc, OPT_Wmaybe_uninitialized, "%qE may be used
> > uninitialized", ptr));
> > inform (loc, "in a call to %qT declared with " "attribute %<%s%>", fntype,
> > access_str);
> > error ("%qD is unavailable: %s", node, (const char *) msg);
> >
> > There are format-strings inside them, “%q+D”, “%qE”, “%qT”, “%qD”, etc,
> > where can I find a doc for the details of
> > These format-strings? Or which source files I should read to understand the
> > details?
>
> You can take a look at cp/error.c:
>
> /* Called from output_format -- during diagnostic message processing --
>    to handle C++ specific format specifier with the following meanings:
>    %A function argument-list.
>    %C tree code.
>    %D declaration.
>    %E expression.
>    %F function declaration.
>    %H type difference (from).
>    %I type difference (to).
>    %L language as used in extern "lang".
>    %O binary operator.
>    %P function parameter whose position is indicated by an integer.
>    %Q assignment operator.
>    %S substitution (template + args)
>    %T type.
>    %V cv-qualifier.
>    %X exception-specification. */
> static bool
> cp_printer (pretty_printer *pp, text_info *text, const char *spec,
>
> or c/c-objc-common.c:
>
> /* Called during diagnostic message formatting process to print a
>    source-level entity onto BUFFER. The meaning of the format specifiers
>    is as follows:
>    %D: a general decl,
>    %E: an identifier or expression,
>    %F: a function declaration,
>    %T: a type.
>    %V: a list of type qualifiers from a tree.
>    %v: an explicit list of type qualifiers
>    %#v: an explicit list of type qualifiers of a function type.
>
>    Please notice when called, the `%' part was already skipped by the
>    diagnostic machinery. */
> static bool
> c_tree_printer (pretty_printer *pp, text_info *text, const char *spec,
>
> Marek
>

Note that this is bug 92435: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92435
Re: [PATCH] Try to resolve paths in threader without looking further back.
On 10/21/2021 9:53 PM, Aldy Hernandez wrote:

> > Phew, I think we're finally converging on a useful set of threading tests :).
> >
> > OK for trunk?
> Mostly, I just worry about losing the key test for the FSM optimization.
>
> With the provided test, the forward threaders can't thread through the backedge and into the switch. Disabling the other threaders was just a precaution. I just wanted to make sure it happened late because of the loop restrictions we have in place. I could enable the forward threaders to prove they can't get it.

Right. There was a time when the forward threaders handled the backedge, but it's much better handled by the backwards threader.

> I could add more cases and check that we have N or more threads through the back edges. ...and if it makes you feel safer, we could even convert the test to gimple and test the specific thread sequence. It's just that the gimple FE test is bound to get large and difficult to decipher if I start adding many switch cases.

I would love if we could turn the testcase into a gimple based test. I just shudder at the thought of trying to pull that together. And yes, it's awful hard to decipher, both in terms of test behavior and in terms of what the key jump threads are.

> I'm just trying to avoid a huge test with 40 potential threads where no one really knows how many we should get... as every threading pass opens up possibilities for other passes.

Understood. To some degree it's inherent in the problem. The smarter our threaders get the more likely they are to discover new opportunities, so there's clearly a maintenance burden to these tests over time. It's made worse by the interactions with BRANCH_COST as well as the heuristics for switch conversion.

Gimple based tests would significantly help the latter issues, but I don't know how to tackle the problem of exposing more jump threads as our threaders get better.

> Ug... we could put the test back, check for some random large number, and come up with a more satisfactory test later? ;-)

I thought our "counting" based tests could only check equality (ie, expect to see this string precisely N times). Though if we could check that # threads realized was > some low water mark, that'd probably be better than what we've got right now.

jeff
[PATCH] x86_64: Implement V1TI mode shifts/rotates by a constant
This patch provides RTL expanders to implement logical shifts and rotates of 128-bit values (stored in vector integer registers) by constant bit counts. Previously, GCC would transfer these values to a pair of scalar registers (TImode) via memory to perform the operation, then transfer the result back via memory. Instead these operations are now expanded using (between 1 and 5) SSE2 vector instructions.

Logical shifts by multiples of 8 can be implemented using x86_64's pslldq/psrldq instruction:

ashl_8:
        pslldq  $1, %xmm0
        ret
lshr_32:
        psrldq  $4, %xmm0
        ret

Logical shifts by greater than 64 can use pslldq/psrldq $8, followed by a psllq/psrlq for the remaining bits:

ashl_111:
        pslldq  $8, %xmm0
        psllq   $47, %xmm0
        ret
lshr_127:
        psrldq  $8, %xmm0
        psrlq   $63, %xmm0
        ret

The remaining logical shifts make use of the following idiom:

ashl_1:
        movdqa  %xmm0, %xmm1
        psllq   $1, %xmm0
        pslldq  $8, %xmm1
        psrlq   $63, %xmm1
        por     %xmm1, %xmm0
        ret
lshr_15:
        movdqa  %xmm0, %xmm1
        psrlq   $15, %xmm0
        psrldq  $8, %xmm1
        psllq   $49, %xmm1
        por     %xmm1, %xmm0
        ret

Rotates by multiples of 32 can use x86_64's pshufd:

rotr_32:
        pshufd  $57, %xmm0, %xmm0
        ret
rotr_64:
        pshufd  $78, %xmm0, %xmm0
        ret
rotr_96:
        pshufd  $147, %xmm0, %xmm0
        ret

Rotates by multiples of 8 (other than multiples of 32) can make use of both pslldq and psrldq, followed by por:

rotr_8:
        movdqa  %xmm0, %xmm1
        psrldq  $1, %xmm0
        pslldq  $15, %xmm1
        por     %xmm1, %xmm0
        ret
rotr_112:
        movdqa  %xmm0, %xmm1
        psrldq  $14, %xmm0
        pslldq  $2, %xmm1
        por     %xmm1, %xmm0
        ret

And the remaining rotates use one or two pshufd, followed by a psrld/pslld/por sequence:

rotr_1:
        movdqa  %xmm0, %xmm1
        pshufd  $57, %xmm0, %xmm0
        psrld   $1, %xmm1
        pslld   $31, %xmm0
        por     %xmm1, %xmm0
        ret
rotr_63:
        pshufd  $78, %xmm0, %xmm1
        pshufd  $57, %xmm0, %xmm0
        pslld   $1, %xmm1
        psrld   $31, %xmm0
        por     %xmm1, %xmm0
        ret
rotr_111:
        pshufd  $147, %xmm0, %xmm1
        pslld   $17, %xmm0
        psrld   $15, %xmm1
        por     %xmm1, %xmm0
        ret

The new test case, sse2-v1ti-shift.c, is a run-time check to confirm that the results of V1TImode shifts/rotates by
constants, exactly match the expected results of TImode operations, for various input test vectors. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check with no new failures. Ok for mainline? 2021-10-24 Roger Sayle gcc/ChangeLog * config/i386/i386-expand.c (ix86_expand_v1ti_shift): New helper function to expand V1TI mode logical shifts by integer constants. (ix86_expand_v1ti_rotate): New helper function to expand V1TI mode rotations by integer constants. * config/i386/i386-protos.h (ix86_expand_v1ti_shift, ix86_expand_v1ti_rotate): Prototype new functions here. * config/i386/sse.md (ashlv1ti3, lshrv1ti3, rotlv1ti3, rotrv1ti3): New TARGET_SSE2 expanders to implement V1TI shifts and rotations. gcc/testsuite/ChangeLog * gcc.target/i386/sse2-v1ti-shift.c: New test case. Thanks in advance, Roger -- diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 56dd99b..4c3800e 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -6157,6 +6157,169 @@ ix86_split_lshr (rtx *operands, rtx scratch, machine_mode mode) } } +/* Expand V1TI mode shift (of rtx_code CODE) by constant. */ +void ix86_expand_v1ti_shift (enum rtx_code code, rtx operands[]) +{ + HOST_WIDE_INT bits = INTVAL (operands[2]) & 127; + rtx op1 = force_reg (V1TImode, operands[1]); + + if (bits == 0) +{ + emit_move_insn (operands[0], op1); + return; +} + + if ((bits & 7) == 0) +{ + rtx tmp = gen_reg_rtx (V1TImode); + if (code == ASHIFT) +emit_insn (gen_sse2_ashlv1ti3 (tmp, op1, GEN_INT (bits))); + else + emit_insn (gen_sse2_lshrv1ti3 (tmp, op1, GEN_INT (bits))); + emit_move_insn (operands[0], tmp); + return; +} + + rtx tmp1 = gen_reg_rtx (V1TImode); + if (code == ASHIFT) +emit_insn (gen_sse2_ashlv1ti3 (tmp1, op1, GEN_INT (64))); + else +emit_insn (gen_sse2_lshrv1ti3 (tmp1, op1, GEN_INT (64))); + + /* tmp2 is operands[1] shifted by 64, in V2DImode. 
*/ + rtx tmp2 = gen_reg_rtx (V2DImode); + emit_move_insn (tmp2, gen_lowpart (V2DImode, tmp1)); + + /* tmp3 will be the V2DImode result. */ + rtx tmp3 = gen_reg_rtx (V2DImode); + + if (bits > 64) +{ + if (code == ASHIFT) + emit_insn (gen_ashlv2di3 (tmp3, tmp2, GEN_INT (bits - 64))); + else + emit_insn (gen_lshrv2di3 (tmp3, tmp2, GEN_INT (bits - 64))); +} +
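The logical-shift idioms described in this message (psllq/psrlq on each 64-bit lane, pslldq/psrldq $8 to move a whole lane, then por) can be checked with a small scalar model. This is only a sketch for reasoning about the bit arithmetic, not the expander code: `v128`, `model_ashl`, `model_lshr` and the lower-case helper names are hypothetical, the 128-bit value is modelled as two 64-bit halves, and byte-multiple shifts below 64 simply take the general path here rather than a single pslldq/psrldq.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value as two 64-bit lanes (hypothetical model type).  */
typedef struct { uint64_t lo, hi; } v128;

/* psllq/psrlq: shift each 64-bit lane independently, 0 < n < 64.  */
static v128 psllq (v128 x, unsigned n)
{
  assert (n > 0 && n < 64);
  v128 r = { x.lo << n, x.hi << n };
  return r;
}

static v128 psrlq (v128 x, unsigned n)
{
  assert (n > 0 && n < 64);
  v128 r = { x.lo >> n, x.hi >> n };
  return r;
}

/* pslldq $8 / psrldq $8: move the whole register by one 64-bit lane.  */
static v128 pslldq8 (v128 x) { v128 r = { 0, x.lo }; return r; }
static v128 psrldq8 (v128 x) { v128 r = { x.hi, 0 }; return r; }

static v128 por (v128 a, v128 b)
{
  v128 r = { a.lo | b.lo, a.hi | b.hi };
  return r;
}

/* 128-bit logical left shift by a constant, following the ashl_* idioms.  */
static v128 model_ashl (v128 x, unsigned bits)
{
  bits &= 127;
  if (bits == 0)
    return x;
  if (bits == 64)
    return pslldq8 (x);
  if (bits > 64)                       /* e.g. ashl_111: pslldq $8; psllq $47 */
    return psllq (pslldq8 (x), bits - 64);
  /* e.g. ashl_1: psllq $1 | (pslldq $8; psrlq $63) */
  return por (psllq (x, bits), psrlq (pslldq8 (x), 64 - bits));
}

/* 128-bit logical right shift by a constant, following the lshr_* idioms.  */
static v128 model_lshr (v128 x, unsigned bits)
{
  bits &= 127;
  if (bits == 0)
    return x;
  if (bits == 64)
    return psrldq8 (x);
  if (bits > 64)                       /* e.g. lshr_127: psrldq $8; psrlq $63 */
    return psrlq (psrldq8 (x), bits - 64);
  /* e.g. lshr_15: psrlq $15 | (psrldq $8; psllq $49) */
  return por (psrlq (x, bits), psllq (psrldq8 (x), 64 - bits));
}
```

The second operand of `por` in the general case carries the bits that cross the 64-bit lane boundary, which is why the idiom needs the extra movdqa copy and the complementary shift count (63 for a shift by 1, 49 for a shift by 15).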
[Committed] Correct testcase gcc.target/bfin/20090914-3.c
This patch cures the testsuite failure of bfin/20090914-3.c, which currently FAILs on bfin-elf with "(test for excess errors)" due to: 20090914-3.c:3:1: warning: return type defaults to 'int' [-Wimplicit-int] which is obviously not what this code was intended to test. Fixed by turning the code into a function returning the final "fract32" result; simply specifying an "int" return type for main would result in the entire function being optimized away, as the result is unused. Checked-in as obvious. 2021-10-24 Roger Sayle gcc/testsuite/ChangeLog * gcc.target/bfin/20090914-3.c: Tweak test case. Roger -- diff --git a/gcc/testsuite/gcc.target/bfin/20090914-3.c b/gcc/testsuite/gcc.target/bfin/20090914-3.c index fb0a9e1..6be5528 100644 --- a/gcc/testsuite/gcc.target/bfin/20090914-3.c +++ b/gcc/testsuite/gcc.target/bfin/20090914-3.c @@ -1,10 +1,11 @@ /* { dg-do compile { target bfin-*-* } } */ typedef long fract32; -main() { +fract32 foo() { fract32 val_tmp; fract32 val1 = 0x7FFF; fract32 val2 = 0x4000; val_tmp = __builtin_bfin_mult_fr1x32x32 (0x0667, val1); val2 = __builtin_bfin_mult_fr1x32x32 (0x7999, val2); val2 = __builtin_bfin_add_fr1x32 (val_tmp, val2); + return val2; }
[committed] doc: No longer generate old.html
Jonathan pointed this out to me while removing a link from the
installation documentation to the no longer existing old.html page.

At first I was puzzled, but a bit of debugging made me realize where
the (now) empty old.html page still was coming from.

Fixed thusly, and I'll add some code to detect such situations should
they ever occur in the future.

Gerald

commit 4bd4138141330030b18960c204ebc1787cdaddf3
Author: Gerald Pfeifer
Date:   Sun Oct 24 11:48:29 2021 +0200

    doc: No longer generate old.html

    Commit 431d26e1dd18c1146d3d4dcd3b45a3b04f7f7d59 removed
    doc/install-old.texi, alas we still tried to generate the associated
    web page old.html - which then turned out empty.

    Simply remove this from the list of pages to be generated.

    gcc:
            * doc/install.texi2html: Do not generate old.html any longer.

diff --git a/gcc/doc/install.texi2html b/gcc/doc/install.texi2html
index 09bbbc425cd..001a869d0ea 100755
--- a/gcc/doc/install.texi2html
+++ b/gcc/doc/install.texi2html
@@ -46,9 +46,9 @@ fi
   echo "@set srcdir $SOURCEDIR/.."
 ) > $DESTDIR/gcc-vers.texi
 
-for x in index.html specific.html prerequisites.html download.html configure.html \
-         build.html test.html finalinstall.html binaries.html old.html \
-         gfdl.html
+for x in index.html specific.html prerequisites.html download.html \
+         configure.html build.html test.html finalinstall.html \
+         binaries.html gfdl.html
 do
     define=`echo $x | sed -e 's/\.//g'`
     echo "define = $define"
[committed] doc: Remove details around Itanium on GNU/Linux and Windows
While debugging an issue Jonathan reported I noticed we still have those references to way old versions of GNU/Linux and Windows from the early days of Itanium, which really do not add value - now gone they are. Gerald gcc: * doc/install.texi (Specific): Remove obsolete details around GNU/Linux on Itanium. (Specific): Remove reference to Windows for Itanium. --- gcc/doc/install.texi | 11 --- 1 file changed, 11 deletions(-) diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 38f96bf5a89..36c8280d7da 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -4198,15 +4198,6 @@ If you are using the installed system libunwind library with @option{--with-system-libunwind}, then you must use libunwind 0.98 or later. -None of the following versions of GCC has an ABI that is compatible -with any of the other versions in this list, with the exception that -Red Hat 2.96 and Trillian 000171 are compatible with each other: -3.1, 3.0.2, 3.0.1, 3.0, Red Hat 2.96, and Trillian 000717. -This primarily affects C++ programs and programs that create shared libraries. -GCC 3.1 or later is recommended for compiling linux, the kernel. -As of version 3.1 GCC is believed to be fully ABI compliant, and hence no -more major ABI changes are expected. - @html @end html @@ -5083,8 +5074,6 @@ GCC contains support for x86-64 using the mingw-w64 runtime library, available from @uref{https://mingw-w64.org/doku.php}. This library should be used with the target triple x86_64-pc-mingw32. -Presently Windows for Itanium is not supported. - @subheading Windows CE Windows CE is supported as a target only on Hitachi SuperH (sh-wince-pe), and MIPS (mips-wince-pe). -- 2.33.0
[PATCH] Fix PR 102908: wrongly removing null pointer loads
From: Andrew Pinski

Just like PR 100382, here we have a DCE removing a null pointer load
which is still needed.  In this case, execute_fixup_cfg removes a
store (correctly) and then removes the null load (incorrectly) because
it does not check stmt_unremovable_because_of_non_call_eh_p.  This
patch adds the check in a similar way to the patch that fixed
PR 100382.

gcc/ChangeLog:

        * tree-ssa-dce.c (simple_dce_from_worklist): Check
        stmt_unremovable_because_of_non_call_eh_p also before removing
        the statement.
---
 gcc/tree-ssa-dce.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index 372e0691ae6..1281e67489c 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -1828,6 +1828,11 @@ simple_dce_from_worklist (bitmap worklist)
          if (gimple_has_side_effects (t))
            continue;
 
+         /* Don't remove statements that are needed for non-call
+            eh to work.  */
+         if (stmt_unremovable_because_of_non_call_eh_p (cfun, t))
+           continue;
+
          /* Add uses to the worklist.  */
          ssa_op_iter iter;
          use_operand_p use_p;
--
2.17.1
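
The effect of the extra guard can be illustrated with a toy model in
plain C.  This is only a sketch under loud assumptions: struct toy_stmt
and toy_simple_dce are hypothetical and have nothing to do with GCC's
real gimple data structures; the may_throw_non_call flag merely stands
in for whatever stmt_unremovable_because_of_non_call_eh_p reports.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy statement record.  NOT GCC's gimple representation.  */
struct toy_stmt
{
  bool has_side_effects;    /* Stand-in for gimple_has_side_effects.  */
  bool may_throw_non_call;  /* Stand-in for the non-call EH check.  */
  bool result_used;         /* Something still uses the result.  */
  bool removed;             /* Set once the statement is deleted.  */
};

/* Delete statements whose result is unused, unless one of the guards
   says the statement must stay.  Returns the number of deletions.  */
static size_t toy_simple_dce (struct toy_stmt *stmts, size_t n)
{
  size_t removed = 0;
  for (size_t i = 0; i < n; i++)
    {
      struct toy_stmt *t = &stmts[i];
      if (t->removed || t->result_used)
        continue;
      if (t->has_side_effects)
        continue;
      /* The guard this patch adds, in spirit: a load that may trap
         and be caught via non-call EH has to survive even though
         its value is dead.  */
      if (t->may_throw_non_call)
        continue;
      t->removed = true;
      removed++;
    }
  return removed;
}
```

Without the third guard, a dead load marked may_throw_non_call would be
deleted together with the ordinary dead statements, which corresponds to
the wrong removal described above.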
Re: [PATCH] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS
Hi Richard,

On Sun, 2021-10-24 08:36:36 +0200, Richard Biener wrote:
> On October 23, 2021 10:00:05 PM GMT+02:00, Jan-Benedict Glaw wrote:
> >On Tue, 2021-09-21 16:25:19 +0200, Richard Biener via Gcc-patches wrote:
> >> I have built all targets from contrib/config-list.mk to make sure we
> >> don't run into the #error and the following makes the STABS usage
> >> explicit for pdp11 and hppa with SOM.
> >
> >I'm running build tests based on config-list.mk as well and see a good
> >number of targets failing, all about the same, ie. for moxie-elf:
>
> That's odd.  I did test the patch using config-list.mk - the patch
> sat in the commit tree for quite a while since that exercise (but
> unchanged), but I doubt anything significant changed in between.
>
> >[all 2021-10-17 00:01:19] /usr/lib/gcc-snapshot/bin/g++ -fno-PIE -c
> >-DIN_GCC_FRONTEND -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE
> >-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
> >-Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag
> >-Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long
> >-Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common
> >-DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/.
> >-I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include
> >-I../../gcc/gcc/../libcody -I../../gcc/gcc/../libdecnumber
> >-I../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber
> >-I../../gcc/gcc/../libbacktrace -o default-d.o -MT default-d.o -MMD -MP
> >-MF ./.deps/default-d.TPo ../../gcc/gcc/config/default-d.c
> >[all 2021-10-17 00:01:19] In file included from ./tm_d.h:9,
> >[all 2021-10-17 00:01:19] from
> >../../gcc/gcc/config/default-d.c:22:
> >[all 2021-10-17 00:01:19] ../../gcc/gcc/defaults.h:908:2: error: #error You
> >must define PREFERRED_DEBUGGING_TYPE if DWARF is not supported
>
> Is that building the D frontend?  I remember restricting the builds to C...

Probably.  I configure as

  .../gcc/configure --target=moxie-elf --enable-werror-always --enable-languages=all --disable-gcov --disable-shared --disable-threads --without-headers --prefix=/var/lib/laminar/run/gcc-moxie-elf/13/toolchain-install

MfG, JBG
Re: [PATCH] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS
On October 23, 2021 10:00:05 PM GMT+02:00, Jan-Benedict Glaw wrote:
>Hi Richard,
>
>On Tue, 2021-09-21 16:25:19 +0200, Richard Biener via Gcc-patches wrote:
>> I have built all targets from contrib/config-list.mk to make sure we
>> don't run into the #error and the following makes the STABS usage
>> explicit for pdp11 and hppa with SOM.
>
>I'm running build tests based on config-list.mk as well and see a good
>number of targets failing, all about the same, ie. for moxie-elf:

That's odd.  I did test the patch using config-list.mk - the patch sat
in the commit tree for quite a while since that exercise (but
unchanged), but I doubt anything significant changed in between.

>[all 2021-10-17 00:01:19] /usr/lib/gcc-snapshot/bin/g++ -fno-PIE -c
>-DIN_GCC_FRONTEND -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE
>-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing
>-Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute
>-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
>-Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I.
>-I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include
>-I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libcody
>-I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/dpd
>-I../libdecnumber -I../../gcc/gcc/../libbacktrace -o default-d.o -MT
>default-d.o -MMD -MP -MF ./.deps/default-d.TPo ../../gcc/gcc/config/default-d.c
>[all 2021-10-17 00:01:19] In file included from ./tm_d.h:9,
>[all 2021-10-17 00:01:19] from
>../../gcc/gcc/config/default-d.c:22:
>[all 2021-10-17 00:01:19] ../../gcc/gcc/defaults.h:908:2: error: #error You
>must define PREFERRED_DEBUGGING_TYPE if DWARF is not supported

Is that building the D frontend?  I remember restricting the builds to C...

I will check what's up with this next week.

>[all 2021-10-17 00:01:19] 908 | #error You must define
>PREFERRED_DEBUGGING_TYPE if DWARF is not supported
>[all 2021-10-17 00:01:19] | ^
>[all 2021-10-17 00:01:20] make[1]: *** [Makefile:2330: default-d.o] Error 1
>[all 2021-10-17 00:01:21] make[1]: Leaving directory
>'/var/lib/laminar/run/gcc-moxie-elf/13/toolchain-build/gcc'
>[all 2021-10-17 00:01:21] make: *** [Makefile:4423: all-gcc] Error 2
>
>Shall I try to ping all the maintainers?
>
>MfG, JBG
>
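
The #error in the log above suggests a preprocessor fallback of roughly
the following shape: default to DWARF when the target supports it, and
otherwise require an explicit choice.  This is a minimal sketch, not the
actual contents of gcc/defaults.h; every TOY_ name is hypothetical and
exists only so the fragment compiles on its own.

```c
/* Toy stand-ins for debug-format identifiers.  Hypothetical values,
   not GCC's real enumeration.  */
#define TOY_NO_DEBUG     0
#define TOY_DWARF2_DEBUG 1
#define TOY_DBX_DEBUG    2

/* A hypothetical target header would define this when the target
   supports DWARF.  Remove it to see the #error below fire.  */
#define TOY_DWARF2_DEBUGGING_INFO 1

/* Fallback: if the target did not pick a preferred format, default
   to DWARF when available, else refuse to build.  */
#ifndef TOY_PREFERRED_DEBUGGING_TYPE
# if defined (TOY_DWARF2_DEBUGGING_INFO)
#  define TOY_PREFERRED_DEBUGGING_TYPE TOY_DWARF2_DEBUG
# else
#  error You must define TOY_PREFERRED_DEBUGGING_TYPE if DWARF is not supported
# endif
#endif

/* Accessor so the selected value can be inspected at run time.  */
static int toy_preferred_debugging_type (void)
{
  return TOY_PREFERRED_DEBUGGING_TYPE;
}
```

In a scheme like this, a translation unit that reaches the fallback
without the DWARF macro being visible fails at compile time, which is
the kind of failure the build log shows for default-d.c.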