Re: [PATCH v3] Fix incomplete computation in fill_always_executed_in_1
On 2021/8/24 16:20, Richard Biener wrote: > On Tue, 24 Aug 2021, Xionghu Luo wrote: > >> >> >> On 2021/8/19 20:11, Richard Biener wrote: - class loop *inn_loop = loop; if (ALWAYS_EXECUTED_IN (loop->header) == NULL) { @@ -3232,19 +3231,6 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call) to disprove this if possible). */ if (bb->flags & BB_IRREDUCIBLE_LOOP) break; - -if (!flow_bb_inside_loop_p (inn_loop, bb)) - break; - -if (bb->loop_father->header == bb) - { -if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) - break; - -/* In a loop that is always entered we may proceed anyway. - But record that we entered it and stop once we leave it. */ -inn_loop = bb->loop_father; - } } while (1) >>> I'm not sure this will work correct (I'm not sure how the existing >>> code makes it so either...). That said, I can't poke any hole >>> into the change. What I see is that definitely >>> >>> if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) >>> last = bb; >>> >>> if (bitmap_bit_p (contains_call, bb->index)) >>> break; >>> >>> doesn't work reliably since the DOM ordering will process blocks >>> A B and C in random order for >>> >>> for (;;) >>> { >>> if (cond) >>> { >>> A: foo (); >>> } >>> else B:; >>> C:; >>> } >>> >>> and thus we can end up setting 'last' to C_before_ processing >>> 'A' and thus arriving at the call foo () ... >>> >>> get_loop_body_in_dom_order does some "special sauce" but not >>> to address the above problem - but it might be that a subtle >>> issue like the above is the reason for the inner loop handling. >>> The inner loop block order does_not_ adhere to this "special sauce", >>> that is - the "Additionally, if a basic block s dominates >>> the latch, then only blocks dominated by s are be after it." >>> guarantee holds for the outer loop latch, not for the inner. >>> >>> Digging into the history of fill_always_executed_in_1 doesn't >>> reveal anything - the inner loop handling has been present >>> since introduction by Zdenek - but usually Zdenek has a reason >>> for doing things as he does;) >> >> Yes, this is really complicated usage, thanks for point it out. :) >> I constructed two cases to verify this with inner loop includes "If A; else >> B; C". >> Finding that fill_sons_in_loop in get_loop_body_in_dom_order will also checks >> whether the bb domintes outer loop’s latch, if C dominate outer loop’s latch, >> C is postponed, the access order is ABC, 'last' won’t be set to C if A or B >> contains call; > > But it depends on the order of visiting ABC and that's hard to put into > a testcase since it depends on the order of edges and the processing > of the dominance computation. ABC are simply unordered with respect > to a dominator walk. > >> Otherwise if C doesn’t dominate outer loop’s latch in fill_sons_in_loop, >> the access order is CAB, but 'last' also won’t be updated to C in >> fill_always_executed_in_1 >> since there is also dominate check, then if A or B contains call, it could >> break >> successfully. >> >> C won't be set to ALWAYS EXECUTED for both circumstance. >> >>> >>> Note it might be simply a measure against quadratic complexity, >>> esp. since with your patch we also dive into not always executed >>> subloops as you remove the >>> >>> if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) >>> break; >>> >>> check. I suggest to evaluate behavior of the patch on a testcase >>> like >>> >>> void foo (int n, int **k) >>> { >>> for (int i = 0; i < n; ++i) >>> if (k[0][i]) >>> for (int j = 0; j < n; ++j) >>> if (k[1][j]) >>> for (int l = 0; l < n; ++l) >>> if (k[2][l]) >>> ... >>> } >> >> Theoretically the complexity is changing from L1(bbs) to >> L1(bbs)+L2(bbs)+L3(bbs)+…+Ln(bbs), >> so fill_always_executed_in_1's execution time is supposed to be increase from >> O(n) to O(n2)? The time should depend on loop depth and bb counts. I also >> drafted a >> test case has 73-depth loop function with 25 no-ipa function copies each >> compiled >> in lim2 and lim4 dependently. Total execution time of >> fill_always_executed_in_1 is >> increased from 32ms to 58ms, almost doubled but not quadratic? > > It's more like n + (n-1) + (n-2) + ... + 1 which is n^2/2 but that's still > O(n^2). > >> It seems reasonable to see compiling time getting longer since most bbs are >> checked >> more but a MUST to ensure early break correctly in every loop level... >> Though loop nodes could be huge, loop depth will never be so large in actual >> code? > >
Re: [PATCH] Fold more shuffle builtins to VEC_PERM_EXPR.
On Thu, Aug 26, 2021 at 12:57 PM liuhongt wrote: > > This patch is a follow-up to [1], it fold all shufps/shufpd builtins into > gimple. Of course for non-mask or mask all-ones version. > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2019-May/521983.html > > gcc/ > PR target/98167 > PR target/43147 > * config/i386/i386.c (ix86_gimple_fold_builtin): Fold > IX86_BUILTIN_SHUFPD512, IX86_BUILTIN_SHUFPS512, > IX86_BUILTIN_SHUFPD256, IX86_BUILTIN_SHUFPS, > IX86_BUILTIN_SHUFPS256. > (ix86_masked_all_ones): New function. > > gcc/testsuite/ > * gcc.target/i386/avx512f-vshufpd-1.c: Adjust testcase. > * gcc.target/i386/avx512f-vshufps-1.c: Adjust testcase. > * gcc.target/i386/pr43147.c: New test. > --- > gcc/config/i386/i386.c| 90 ++- > .../gcc.target/i386/avx512f-vshufpd-1.c | 3 +- > .../gcc.target/i386/avx512f-vshufps-1.c | 3 +- > gcc/testsuite/gcc.target/i386/pr43147.c | 15 > 4 files changed, 87 insertions(+), 24 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr43147.c > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index ebec8668758..f3eed9f2426 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -17541,6 +17541,20 @@ ix86_vector_shift_count (tree arg1) >return NULL_TREE; > } > > +/* Return true if arg_mask is all ones, arg_vec is corresponding vector. */ > +static bool > +ix86_masked_all_ones (unsigned HOST_WIDE_INT elems, tree arg_mask) > +{ > + if (TREE_CODE (arg_mask) != INTEGER_CST) > +return false; > + > + unsigned HOST_WIDE_INT mask = TREE_INT_CST_LOW (arg_mask); > + if ((mask | (HOST_WIDE_INT_M1U << elems)) != HOST_WIDE_INT_M1U) > +return false; > + > + return true; > +} > + > static tree > ix86_fold_builtin (tree fndecl, int n_args, >tree *args, bool ignore ATTRIBUTE_UNUSED) > @@ -18026,6 +18040,7 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi) >enum tree_code tcode; >unsigned HOST_WIDE_INT count; >bool is_vshift; > + unsigned HOST_WIDE_INT elems; > >switch (fn_code) > { > @@ -18349,17 +18364,11 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi) >gcc_assert (n_args >= 2); >arg0 = gimple_call_arg (stmt, 0); >arg1 = gimple_call_arg (stmt, 1); > - if (n_args > 2) > - { > - /* This is masked shift. Only optimize if the mask is all ones. */ > - tree argl = gimple_call_arg (stmt, n_args - 1); > - if (!tree_fits_uhwi_p (argl)) > - break; > - unsigned HOST_WIDE_INT mask = tree_to_uhwi (argl); > - unsigned elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > - if ((mask | (HOST_WIDE_INT_M1U << elems)) != HOST_WIDE_INT_M1U) > - break; > - } > + elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > + /* For masked shift, only optimize if the mask is all ones. */ > + if (n_args > 2 > + && !ix86_masked_all_ones (elems, gimple_call_arg (stmt, n_args - > 1))) > + break; >if (is_vshift) > { > if (TREE_CODE (arg1) != VECTOR_CST) > @@ -18408,25 +18417,62 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi) > } >break; > > +case IX86_BUILTIN_SHUFPD512: > +case IX86_BUILTIN_SHUFPS512: > case IX86_BUILTIN_SHUFPD: > +case IX86_BUILTIN_SHUFPD256: > +case IX86_BUILTIN_SHUFPS: > +case IX86_BUILTIN_SHUFPS256: > + arg0 = gimple_call_arg (stmt, 0); > + elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > + /* This is masked shuffle. Only optimize if the mask is all ones. */ > + if (n_args > 3 > + && !ix86_masked_all_ones (elems, > + gimple_call_arg (stmt, n_args - 1))) > + break; >arg2 = gimple_call_arg (stmt, 2); >if (TREE_CODE (arg2) == INTEGER_CST) > { > + unsigned HOST_WIDE_INT shuffle_mask = TREE_INT_CST_LOW (arg2); > + /* Check valid imm, refer to gcc.target/i386/testimm-10.c. */ > + if (shuffle_mask > 255) > + return false; > + > + machine_mode imode = GET_MODE_INNER (TYPE_MODE (TREE_TYPE (arg0))); > location_t loc = gimple_location (stmt); > - unsigned HOST_WIDE_INT imask = TREE_INT_CST_LOW (arg2); > - arg0 = gimple_call_arg (stmt, 0); > + tree itype = (imode == E_DFmode > + ? long_long_integer_type_node : integer_type_node); > + tree vtype = build_vector_type (itype, elems); > + tree_vector_builder elts (vtype, elems, 1); > + > + > + /* Transform integer shuffle_mask to vector perm_mask which > +is used by vec_perm_expr, refer to shuflp[sd]256/512 in sse.md. > */ > + for (unsigned i = 0; i != elems; i++) > + { > + unsigned
[PATCH] Fold more shuffle builtins to VEC_PERM_EXPR.
This patch is a follow-up to [1], it fold all shufps/shufpd builtins into gimple. Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. [1] https://gcc.gnu.org/pipermail/gcc-patches/2019-May/521983.html gcc/ PR target/98167 PR target/43147 * config/i386/i386.c (ix86_gimple_fold_builtin): Fold IX86_BUILTIN_SHUFPD512, IX86_BUILTIN_SHUFPS512, IX86_BUILTIN_SHUFPD256, IX86_BUILTIN_SHUFPS, IX86_BUILTIN_SHUFPS256. (ix86_masked_all_ones): New function. gcc/testsuite/ * gcc.target/i386/avx512f-vshufpd-1.c: Adjust testcase. * gcc.target/i386/avx512f-vshufps-1.c: Adjust testcase. * gcc.target/i386/pr43147.c: New test. --- gcc/config/i386/i386.c| 90 ++- .../gcc.target/i386/avx512f-vshufpd-1.c | 3 +- .../gcc.target/i386/avx512f-vshufps-1.c | 3 +- gcc/testsuite/gcc.target/i386/pr43147.c | 15 4 files changed, 87 insertions(+), 24 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr43147.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ebec8668758..f3eed9f2426 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -17541,6 +17541,20 @@ ix86_vector_shift_count (tree arg1) return NULL_TREE; } +/* Return true if arg_mask is all ones, arg_vec is corresponding vector. */ +static bool +ix86_masked_all_ones (unsigned HOST_WIDE_INT elems, tree arg_mask) +{ + if (TREE_CODE (arg_mask) != INTEGER_CST) +return false; + + unsigned HOST_WIDE_INT mask = TREE_INT_CST_LOW (arg_mask); + if ((mask | (HOST_WIDE_INT_M1U << elems)) != HOST_WIDE_INT_M1U) +return false; + + return true; +} + static tree ix86_fold_builtin (tree fndecl, int n_args, tree *args, bool ignore ATTRIBUTE_UNUSED) @@ -18026,6 +18040,7 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi) enum tree_code tcode; unsigned HOST_WIDE_INT count; bool is_vshift; + unsigned HOST_WIDE_INT elems; switch (fn_code) { @@ -18349,17 +18364,11 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi) gcc_assert (n_args >= 2); arg0 = gimple_call_arg (stmt, 0); arg1 = gimple_call_arg (stmt, 1); - if (n_args > 2) - { - /* This is masked shift. Only optimize if the mask is all ones. */ - tree argl = gimple_call_arg (stmt, n_args - 1); - if (!tree_fits_uhwi_p (argl)) - break; - unsigned HOST_WIDE_INT mask = tree_to_uhwi (argl); - unsigned elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); - if ((mask | (HOST_WIDE_INT_M1U << elems)) != HOST_WIDE_INT_M1U) - break; - } + elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); + /* For masked shift, only optimize if the mask is all ones. */ + if (n_args > 2 + && !ix86_masked_all_ones (elems, gimple_call_arg (stmt, n_args - 1))) + break; if (is_vshift) { if (TREE_CODE (arg1) != VECTOR_CST) @@ -18408,25 +18417,62 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi) } break; +case IX86_BUILTIN_SHUFPD512: +case IX86_BUILTIN_SHUFPS512: case IX86_BUILTIN_SHUFPD: +case IX86_BUILTIN_SHUFPD256: +case IX86_BUILTIN_SHUFPS: +case IX86_BUILTIN_SHUFPS256: + arg0 = gimple_call_arg (stmt, 0); + elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); + /* This is masked shuffle. Only optimize if the mask is all ones. */ + if (n_args > 3 + && !ix86_masked_all_ones (elems, + gimple_call_arg (stmt, n_args - 1))) + break; arg2 = gimple_call_arg (stmt, 2); if (TREE_CODE (arg2) == INTEGER_CST) { + unsigned HOST_WIDE_INT shuffle_mask = TREE_INT_CST_LOW (arg2); + /* Check valid imm, refer to gcc.target/i386/testimm-10.c. */ + if (shuffle_mask > 255) + return false; + + machine_mode imode = GET_MODE_INNER (TYPE_MODE (TREE_TYPE (arg0))); location_t loc = gimple_location (stmt); - unsigned HOST_WIDE_INT imask = TREE_INT_CST_LOW (arg2); - arg0 = gimple_call_arg (stmt, 0); + tree itype = (imode == E_DFmode + ? long_long_integer_type_node : integer_type_node); + tree vtype = build_vector_type (itype, elems); + tree_vector_builder elts (vtype, elems, 1); + + + /* Transform integer shuffle_mask to vector perm_mask which +is used by vec_perm_expr, refer to shuflp[sd]256/512 in sse.md. */ + for (unsigned i = 0; i != elems; i++) + { + unsigned sel_idx; + /* Imm[1:0](if VL > 128, then use Imm[3:2],Imm[5:4],Imm[7:6]) +provide 2 select constrols for each element of the +destination. */ + if (imode == E_DFmode) + sel_idx = (i & 1) * elems + (i & ~1) + + ((shuffle_mask >> i) & 1); +
Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
On Thu, Aug 26, 2021 at 7:16 AM Jeff Law wrote: > > > > On 8/24/2021 3:44 AM, Hongtao Liu via Gcc-patches wrote: > > On Tue, Aug 24, 2021 at 5:40 PM Hongtao Liu wrote: > > On Tue, Aug 17, 2021 at 9:52 AM Hongtao Liu wrote: > > On Mon, Aug 9, 2021 at 4:34 PM Hongtao Liu wrote: > > On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches > wrote: > > On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford > wrote: > > Richard Biener via Gcc-patches writes: > > On Fri, Aug 6, 2021 at 5:32 AM liuhongt wrote: > > Hi: > --- > OK, I think sth is amiss here upthread. insv/extv do look like they > are designed > to work on integer modes (but docs do not say anything about this here). > In fact the caller of extract_bit_field_using_extv is named > extract_integral_bit_field. Of course nothing seems to check what kind of > modes we're dealing with, but we're for example happily doing > expand_shift in 'mode'. In the extract_integral_bit_field call 'mode' is > some integer mode and op0 is HFmode? From the above I get it's > the other way around? In that case we should wrap the > call to extract_integral_bit_field, extracting in an integer mode with the > same size as 'mode' and then converting the result as (subreg:HF (reg:HI > ...)). > --- > This is a separate patch as a follow up of upper comments. > > gcc/ChangeLog: > > * expmed.c (extract_bit_field_1): Wrap the call to > extract_integral_bit_field, extracting in an integer mode with > the same size as 'tmode' and then converting the result > as (subreg:tmode (reg:imode)). > > gcc/testsuite/ChangeLog: > * gcc.target/i386/float16-5.c: New test. > --- > gcc/expmed.c | 19 +++ > gcc/testsuite/gcc.target/i386/float16-5.c | 12 > 2 files changed, 31 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c > > diff --git a/gcc/expmed.c b/gcc/expmed.c > index 3143f38e057..72790693ef0 100644 > --- a/gcc/expmed.c > +++ b/gcc/expmed.c > @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, > poly_uint64 bitnum, >op0_mode = opt_scalar_int_mode (); > } > > + /* Make sure we are playing with integral modes. Pun with subregs > + if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE > + in extract_integral_bit_field. */ > + if (int_mode_for_mode (tmode).exists () > > check !INTEGRAL_MODE_P (tmode) before, that should be slightly > cheaper. Then imode should always be != tmode. Maybe > even GET_MDOE_CLASS (tmode) != MODE_INT since I'm not sure > how it behaves for composite modes. > > Of course the least surprises would happen when we restrict this > to FLOAT_MODE_P (tmode). > > Richard - any preferences? > > If the bug is that extract_integral_bit_field is being called with > a non-integral mode parameter, then it looks odd that we can still > fall through to it without an integral mode (when exists is false). > > If calling extract_integral_bit_field without an integral mode is > a bug then I think we should have: > > int_mode_for_mode (mode).require () > > whenever mode is not already SCALAR_INT_MODE_P/is_a. > Ideally we'd make the mode parameter scalar_int_mode too. > > extract_integral_bit_field currently has: > > /* Find a correspondingly-sized integer field, so we can apply > shifts and masks to it. */ > scalar_int_mode int_mode; > if (!int_mode_for_mode (tmode).exists (_mode)) > /* If this fails, we should probably push op0 out to memory and then >do a load. */ > int_mode = int_mode_for_mode (mode).require (); > > which would seem to be redundant after this change. > > I'm not sure what exactly the bug is, but extract_integral_bit_field ends > up creating a lowpart subreg that's not allowed and that ICEs (and I > can't see a way to check beforehand). So it seems to me at least > part of that function doesn't expect non-integral extraction modes. > > But who knows - the code is older than I am (OK, not, but older than > my involvment in GCC ;)) > > How about attached patch w/ below changelog > > gcc/ChangeLog: > > * expmed.c (extract_bit_field_1): Make sure we're playing with > integral modes before call extract_integral_bit_field. > (extract_integral_bit_field): Add a parameter of type > scalar_int_mode which corresponds to of tmode. > And call extract_and_convert_fixed_bit_field instead of > extract_fixed_bit_field and convert_extracted_bit_field. > (extract_and_convert_fixed_bit_field): New function, it's a > combination of extract_fixed_bit_field and > convert_extracted_bit_field. > > gcc/testsuite/ChangeLog: > * gcc.target/i386/float16-5.c: New test. > > I'd like to ping this patch, or maybe we can use the patch before with > richi's comments. > > Rebased and ping^2, there are plenty of avx512fp16 patches blocked by > this patch, i'd like someone to help
Re: [Patch] Fix cygming-crtend.c build warning due to -Wprio-ctor-dtor
On 8/25/21 11:06 PM, Jeff Law wrote: On 8/25/2021 10:46 AM, Jonathan Yong via Gcc-patches wrote: Attached patches OK? cygming-crtend.c: fix build warnings libgcc/Changelog: * config/i386/cygming-crtend.c: Fix register_frame_ctor and register_frame_dtor warnings. extend.texi: add note about reserved ctor/dtor priorities gcc/Changelog: * doc/extend.texi: Add note about reserved priorities to the constructor attribute. Yes. Both are OK. jeff Done, pushed to master, thanks. OpenPGP_0x713B5FE29C145D45.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set
Hi Bill, Thanks for your comments. Hi Segher, Here is the ChangeLog and patch diff. Thanks. 2021-08-25 Haochen Gui gcc/ * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP, VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions. gcc/testsuite/ * gcc.target/powerpc/vec-minmax-1.c: New test. * gcc.target/powerpc/vec-minmax-2.c: Likewise. diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c index b4e13af4dc6..90527734ceb 100644 --- a/gcc/config/rs6000/rs6000-call.c +++ b/gcc/config/rs6000/rs6000-call.c @@ -12159,6 +12159,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) return true; /* flavors of vec_min. */ case VSX_BUILTIN_XVMINDP: + case ALTIVEC_BUILTIN_VMINFP: + if (!flag_finite_math_only || flag_signed_zeros) + return false; + /* Fall through to MIN_EXPR. */ + gcc_fallthrough (); case P8V_BUILTIN_VMINSD: case P8V_BUILTIN_VMINUD: case ALTIVEC_BUILTIN_VMINSB: @@ -12167,7 +12172,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) case ALTIVEC_BUILTIN_VMINUB: case ALTIVEC_BUILTIN_VMINUH: case ALTIVEC_BUILTIN_VMINUW: - case ALTIVEC_BUILTIN_VMINFP: arg0 = gimple_call_arg (stmt, 0); arg1 = gimple_call_arg (stmt, 1); lhs = gimple_call_lhs (stmt); @@ -12177,6 +12181,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) return true; /* flavors of vec_max. */ case VSX_BUILTIN_XVMAXDP: + case ALTIVEC_BUILTIN_VMAXFP: + if (!flag_finite_math_only || flag_signed_zeros) + return false; + /* Fall through to MAX_EXPR. */ + gcc_fallthrough (); case P8V_BUILTIN_VMAXSD: case P8V_BUILTIN_VMAXUD: case ALTIVEC_BUILTIN_VMAXSB: @@ -12185,7 +12194,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) case ALTIVEC_BUILTIN_VMAXUB: case ALTIVEC_BUILTIN_VMAXUH: case ALTIVEC_BUILTIN_VMAXUW: - case ALTIVEC_BUILTIN_VMAXFP: arg0 = gimple_call_arg (stmt, 0); arg1 = gimple_call_arg (stmt, 1); lhs = gimple_call_lhs (stmt); diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c new file mode 100644 index 000..547798fd65c --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c @@ -0,0 +1,53 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ +/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */ + +/* This test verifies that float or double vec_min/max are bound to + xv[min|max][d|s]p instructions when fast-math is not set. */ + + +#include + +#ifdef _BIG_ENDIAN + const int PREF_D = 0; +#else + const int PREF_D = 1; +#endif + +double vmaxd (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_max (va, vb), PREF_D); +} + +double vmind (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_min (va, vb), PREF_D); +} + +#ifdef _BIG_ENDIAN + const int PREF_F = 0; +#else + const int PREF_F = 3; +#endif + +float vmaxf (float a, float b) +{ + vector float va = vec_promote (a, PREF_F); + vector float vb = vec_promote (b, PREF_F); + return vec_extract (vec_max (va, vb), PREF_F); +} + +float vminf (float a, float b) +{ + vector float va = vec_promote (a, PREF_F); + vector float vb = vec_promote (b, PREF_F); + return vec_extract (vec_min (va, vb), PREF_F); +} diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c new file mode 100644 index 000..4c6f4365830 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c @@ -0,0 +1,51 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -ffast-math" } */ +/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxsmincdp\M} 2 } } */ + +/* This test verifies that float or double vec_min/max can be converted + to scalar comparison when fast-math is set. */ + + +#include + +#ifdef _BIG_ENDIAN + const int PREF_D = 0; +#else + const int PREF_D = 1; +#endif + +double vmaxd (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_max (va, vb), PREF_D); +} + +double vmind (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); +
Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
On Tue, Aug 24, 2021 at 7:39 PM Richard Biener wrote: > > On Tue, Aug 24, 2021 at 11:38 AM Hongtao Liu wrote: > > > > On Tue, Aug 24, 2021 at 5:40 PM Hongtao Liu wrote: > > > > > > On Tue, Aug 17, 2021 at 9:52 AM Hongtao Liu wrote: > > > > > > > > On Mon, Aug 9, 2021 at 4:34 PM Hongtao Liu wrote: > > > > > > > > > > On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches > > > > > wrote: > > > > > > > > > > > > On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford > > > > > > wrote: > > > > > > > > > > > > > > Richard Biener via Gcc-patches writes: > > > > > > > > On Fri, Aug 6, 2021 at 5:32 AM liuhongt > > > > > > > > wrote: > > > > > > > >> > > > > > > > >> Hi: > > > > > > > >> --- > > > > > > > >> OK, I think sth is amiss here upthread. insv/extv do look > > > > > > > >> like they > > > > > > > >> are designed > > > > > > > >> to work on integer modes (but docs do not say anything about > > > > > > > >> this here). > > > > > > > >> In fact the caller of extract_bit_field_using_extv is named > > > > > > > >> extract_integral_bit_field. Of course nothing seems to check > > > > > > > >> what kind of > > > > > > > >> modes we're dealing with, but we're for example happily doing > > > > > > > >> expand_shift in 'mode'. In the extract_integral_bit_field > > > > > > > >> call 'mode' is > > > > > > > >> some integer mode and op0 is HFmode? From the above I get it's > > > > > > > >> the other way around? In that case we should wrap the > > > > > > > >> call to extract_integral_bit_field, extracting in an integer > > > > > > > >> mode with the > > > > > > > >> same size as 'mode' and then converting the result as > > > > > > > >> (subreg:HF (reg:HI ...)). > > > > > > > >> --- > > > > > > > >> This is a separate patch as a follow up of upper comments. > > > > > > > >> > > > > > > > >> gcc/ChangeLog: > > > > > > > >> > > > > > > > >> * expmed.c (extract_bit_field_1): Wrap the call to > > > > > > > >> extract_integral_bit_field, extracting in an integer > > > > > > > >> mode with > > > > > > > >> the same size as 'tmode' and then converting the result > > > > > > > >> as (subreg:tmode (reg:imode)). > > > > > > > >> > > > > > > > >> gcc/testsuite/ChangeLog: > > > > > > > >> * gcc.target/i386/float16-5.c: New test. > > > > > > > >> --- > > > > > > > >> gcc/expmed.c | 19 > > > > > > > >> +++ > > > > > > > >> gcc/testsuite/gcc.target/i386/float16-5.c | 12 > > > > > > > >> 2 files changed, 31 insertions(+) > > > > > > > >> create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c > > > > > > > >> > > > > > > > >> diff --git a/gcc/expmed.c b/gcc/expmed.c > > > > > > > >> index 3143f38e057..72790693ef0 100644 > > > > > > > >> --- a/gcc/expmed.c > > > > > > > >> +++ b/gcc/expmed.c > > > > > > > >> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, > > > > > > > >> poly_uint64 bitsize, poly_uint64 bitnum, > > > > > > > >>op0_mode = opt_scalar_int_mode (); > > > > > > > >> } > > > > > > > >> > > > > > > > >> + /* Make sure we are playing with integral modes. Pun with > > > > > > > >> subregs > > > > > > > >> + if we aren't. When tmode is HFmode, op0 is SImode, there > > > > > > > >> will be ICE > > > > > > > >> + in extract_integral_bit_field. */ > > > > > > > >> + if (int_mode_for_mode (tmode).exists () > > > > > > > > > > > > > > > > check !INTEGRAL_MODE_P (tmode) before, that should be slightly > > > > > > > > cheaper. Then imode should always be != tmode. Maybe > > > > > > > > even GET_MDOE_CLASS (tmode) != MODE_INT since I'm not sure > > > > > > > > how it behaves for composite modes. > > > > > > > > > > > > > > > > Of course the least surprises would happen when we restrict this > > > > > > > > to FLOAT_MODE_P (tmode). > > > > > > > > > > > > > > > > Richard - any preferences? > > > > > > > > > > > > > > If the bug is that extract_integral_bit_field is being called with > > > > > > > a non-integral mode parameter, then it looks odd that we can still > > > > > > > fall through to it without an integral mode (when exists is > > > > > > > false). > > > > > > > > > > > > > > If calling extract_integral_bit_field without an integral mode is > > > > > > > a bug then I think we should have: > > > > > > > > > > > > > > int_mode_for_mode (mode).require () > > > > > > > > > > > > > > whenever mode is not already > > > > > > > SCALAR_INT_MODE_P/is_a. > > > > > > > Ideally we'd make the mode parameter scalar_int_mode too. > > > > > > > > > > > > > > extract_integral_bit_field currently has: > > > > > > > > > > > > > > /* Find a correspondingly-sized integer field, so we can apply > > > > > > > shifts and masks to it. */ > > > > > > > scalar_int_mode int_mode; > > > > > > > if (!int_mode_for_mode (tmode).exists (_mode)) > > > > > > > /* If this fails, we should probably push op0 out to memory > > > > > > > and then > > > > > > >do a load.
[r12-3142 Regression] FAIL: 20_util/enable_shared_from_this/89303.cc (test for excess errors) on Linux/x86_64
On Linux/x86_64, 5c85f29537662f1f4195a102cbf0182ffa32d8ac is the first bad commit commit 5c85f29537662f1f4195a102cbf0182ffa32d8ac Author: Jan Hubicka Date: Wed Aug 25 21:43:07 2021 +0200 Merge load/stores in ipa-modref summaries caused FAIL: 20_util/enable_shared_from_this/89303.cc (test for excess errors) FAIL: g++.dg/torture/pr89303.C -O1 (internal compiler error) FAIL: g++.dg/torture/pr89303.C -O1 (test for excess errors) with GCC configured with To reproduce: $ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=20_util/enable_shared_from_this/89303.cc --target_board='unix{-m32}'" $ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=20_util/enable_shared_from_this/89303.cc --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg-torture.exp=g++.dg/torture/pr89303.C --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg-torture.exp=g++.dg/torture/pr89303.C --target_board='unix{-m32\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
[r12-3144 Regression] FAIL: gcc.dg/tree-ssa/pr64130.c scan-tree-dump evrp "int \\[-8589934591, -2\\]" on Linux/x86_64
On Linux/x86_64, ed3de423f1694d30f90c024fb6e19e2c6323 is the first bad commit commit ed3de423f1694d30f90c024fb6e19e2c6323 Author: Martin Sebor Date: Wed Aug 25 14:36:13 2021 -0600 Avoid printing range table header alone. caused FAIL: gcc.dg/tree-ssa/evrp1.c scan-tree-dump evrp "\\[5, \\+INF" FAIL: gcc.dg/tree-ssa/evrp2.c scan-tree-dump evrp "\\[4, 7\\]" FAIL: gcc.dg/tree-ssa/evrp3.c scan-tree-dump evrp "\\[1, 10\\]" FAIL: gcc.dg/tree-ssa/evrp4.c scan-tree-dump evrp "\\[1B, \\+INF\\]" FAIL: gcc.dg/tree-ssa/evrp6.c scan-tree-dump evrp "\\[12, \\+INF" FAIL: gcc.dg/tree-ssa/pr64130.c scan-tree-dump evrp "int \\[2, 8589934591\\]" FAIL: gcc.dg/tree-ssa/pr64130.c scan-tree-dump evrp "int \\[-8589934591, -2\\]" with GCC configured with To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp1.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp1.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp1.c --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp1.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp2.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp2.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp2.c --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp2.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp3.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp3.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp3.c --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp3.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp4.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp4.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp4.c --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp4.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp6.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp6.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp6.c --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp6.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr64130.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr64130.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr64130.c --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr64130.c --target_board='unix{-m64\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
[PATCH] Inline IBM long double __gcc_qsub
rs6000: inline ldouble __gcc_qsub While performing some tests of IEEE 128 float for PPC64LE, Michael Meissner noticed that __gcc_qsub is substantially slower than __gcc_qadd. __gcc_qsub valls __gcc_add with the second operand negated. Because the functions normally are invoked through libgcc shared object, the extra PLT overhead has a large impact on the overall time of the function. Instead of trying to be fancy with function decorations to prevent interposition, this patch inlines the definition of __gcc_qadd into __gcc_qsub with the negation propagated through the function. libgcc/ChangeLog: * config/rs6000/ibm-ldouble.c (__gcc_qsub): Inline negated __gcc_qadd. diff --git a/libgcc/config/rs6000/ibm-ldouble.c b/libgcc/config/rs6000/ibm-ldouble.c index 4c13453f975..ed74900e5c3 100644 --- a/libgcc/config/rs6000/ibm-ldouble.c +++ b/libgcc/config/rs6000/ibm-ldouble.c @@ -158,9 +158,42 @@ __gcc_qadd (double a, double aa, double c, double cc) } IBM128_TYPE -__gcc_qsub (double a, double b, double c, double d) +__gcc_qsub (double a, double aa, double c, double cc) { - return __gcc_qadd (a, b, -c, -d); + double xh, xl, z, q, zz; + + z = a - c; + + if (nonfinite (z)) +{ + if (fabs (z) != inf()) + return z; + z = -cc + aa - c + a; + if (nonfinite (z)) + return z; + xh = z; /* Will always be DBL_MAX. */ + zz = aa - cc; + if (fabs(a) > fabs(c)) + xl = a - z - c + zz; + else + xl = -c - z + a + zz; +} + else +{ + q = a - z; + zz = q - c + (a - (q + z)) + aa - cc; + + /* Keep -0 result. */ + if (zz == 0.0) + return z; + + xh = z + zz; + if (nonfinite (xh)) + return xh; + + xl = z - xh + zz; +} + return pack_ldouble (xh, xl); } #ifdef __NO_FPRS__
Re: [PATCH] improve note location and refactor warn_uninit
On 8/25/21 5:00 PM, Jeff Law wrote: On 8/25/2021 2:03 PM, Martin Sebor via Gcc-patches wrote: Richard, some time ago you mentioned you'd had trouble getting -Wuninitialized to print the note pointing to the uninitialized variable. I said I'd look into it, and so I did. The attached patch simplifies the warn_uninit() function to get rid of some redundant cruft: besides a few pointless null tests and a duplicate argument it also does away with ancient code that's responsible for the problem you saw. It turns out that tests for the problem are already in the test suite but have been xfailed since the day they were added, so the patch simply removes the xfails. I'll go ahead and commit this cleanup as obvious later this week unless you have suggestions for further changes. Tested on x86_64-linux. I'm going to object to installing this "as obvious". I don't see how anyone could come to the conclusion this is obvious. As I explained, the patch just a) removes unnecessary and/or unused code, b) moves declarations closer to their point of initialization, and c) gets rid of the linemap_location_before_p() tests that prevent the note that Richard was looking for from being printed. It has no effect besides that and it seems rather trivial to me. But I don't want to reopen the whole "obviousness debate" again. Please wait for review. Fine. Martin
Re: [PATCH] avoid printing range table header alone
On 8/25/21 1:34 PM, Andrew MacLeod wrote: On 8/25/21 3:15 PM, Martin Sebor wrote: On 8/25/21 11:46 AM, Andrew MacLeod wrote: On 8/25/21 1:25 PM, Martin Sebor wrote: I noticed that the output of -fdump-tree-walloca (and probably all passes that call gimple_ranger::export_global_ranges()) includes the following two lines for all functions, even when there's nothing else of relevance after them: Exported global range table === I was a bit puzzled by it at first, wondering if it was leftover debugging output, until I checked the code and realized it's a header that may be (but isn't always) followed by the contents of the table. To reduce clutter in the dump the attached patch changes the function to print the header only when it is followed by at least one line of output. (Another tweak to reduce even further the amount of output printed by default that might be worth considering is to print the table only when TDF_DETAILS is set.) Martin Except checking x == 1 won't work if ssa_name(1) was not updated. Just set a flag if the header hasn't been printed yet. tried and true. I guess you can protect it behind details if you want. Doh! I've fixed that. While testing I came across another instance of the same pattern so I've made the same change there as well. I also took the liberty to make the output slightly more consistent between the two dumps. Attached is what I'm planning to commit. Please let me know if you have any other suggestions. OK, thats fine. I got ahead of myself and pushed the change in r12-3144 before my second test run finished. A few tree-ssa tests needed adjusting after the TDF_DETAILS change. I fixed those r12-3152. FWIW, I see duplicate output in the EVRP1 dump that looks just like it could be the output under Non-varying global ranges. I wonder if it's the result of the block in ranger_cache::fill_block_cache guarded by if (DEBUG_RANGE_CACHE). Martin The cache debug mechanism hasnt been fully reworked yet... i moved GORI and the ranger to the trace class.. I haven't decided yet what I want to change in the cache. its a bit dated. regardless, if your sources are up to date, DEBUG_RANGE_CACHE is now guarded by: #define DEBUG_RANGE_CACHE (dump_file && (param_evrp_mode & EVRP_MODE_CACHE) \ == EVRP_MODE_CACHE) so you'd only see that output if you specify --param=evrp-mode=cache or debug It's not a big deal but the output I noticed is actually coming from gimple_ranger::dump_bb (). In the test I happened to look at it shows up just before Non-varying global ranges: It's attached just to show you what I mean. Martin ;; Function foo (foo, funcdef_no=0, decl_uid=1943, cgraph_uid=1, symbol_order=0) ;; 1 loops found ;; ;; Loop 0 ;; header 0, latch 1 ;; depth 0, outer -1 ;; nodes: 0 1 2 ;; 2 succs { 1 } Substituting values and folding statements Folding statement: _3 = *x_2(D); Not folded Folding statement: return _3; Not folded === BB 2 : _3 = *x_2(D); return _3; char foo (char * x) { char _3; : _3 = *x_2(D); return _3; } ;; Function bar (bar, funcdef_no=1, decl_uid=1946, cgraph_uid=2, symbol_order=1) ;; 1 loops found ;; ;; Loop 0 ;; header 0, latch 1 ;; depth 0, outer -1 ;; nodes: 0 1 2 ;; 2 succs { 1 } Substituting values and folding statements Folding statement: saved_stack.1_16 = __builtin_stack_save (); Not folded Folding statement: _18 = *x_17(D); Not folded Folding statement: _1 = (long int) _18; Not folded Folding statement: _2 = _1 + -1; Registering value_relation (_2 < _1) (bb2) at _2 = _1 + -1; Not folded Folding statement: _3 = (sizetype) _2; Not folded Folding statement: _4 = (sizetype) _18; Not folded Folding statement: _5 = (bitsizetype) _4; Not folded Folding statement: _6 = _5 * 64; Not folded Folding statement: _8 = _4 * 8; Not folded Folding statement: arr.0_26 = __builtin_alloca_with_align (_8, 64); Not folded Folding statement: __builtin_stack_restore (saved_stack.1_16); Not folded Folding statement: return; Not folded === BB 2 Relational : (_2 < _1) : saved_stack.1_16 = __builtin_stack_save (); _18 = *x_17(D); _1 = (long int) _18; _2 = _1 + -1; _3 = (sizetype) _2; _4 = (sizetype) _18; _5 = (bitsizetype) _4; _6 = _5 * 64; _8 = _4 * 8; arr.0_26 = __builtin_alloca_with_align (_8, 64); __builtin_stack_restore (saved_stack.1_16); return; _1 : long int [-128, 127] _2 : long int [-129, 126] _3 : sizetype [0, 126][18446744073709551487, +INF] _4 : sizetype [0, 127][18446744073709551488, +INF] _5 : bitsizetype [0, 127][18446744073709551488, 18446744073709551615] _6 : bitsizetype [0, 8128][0x3fe000, 0x3fffc0] _8 : sizetype [0, 1016][18446744073709550592, 18446744073709551608] arr.0_26 : void *[0:D.1956] * [1B, +INF] Non-varying global ranges: =: _1 : long int [-128, 127] _2 : long
Re: [PATCH 14/34] rs6000: Add remaining overloads
On Thu, Jul 29, 2021 at 08:31:01AM -0500, Bill Schmidt wrote: > * config/rs6000/rs6000-overload.def: Add remaining overloads. > +; TODO: Note that the entry for VEC_ADDE currently gets ignored in > +; altivec_resolve_overloaded_builtin. Revisit whether we can remove > +; that. We still need to register the legal builtin forms here. > +[VEC_ADDE, vec_adde, __builtin_vec_adde] > + vsq __builtin_vec_adde (vsq, vsq, vsq); > +VADDEUQM VADDEUQM_VSQ > + vuq __builtin_vec_adde (vuq, vuq, vuq); > +VADDEUQM VADDEUQM_VUQ I'm not sure what this means. "Currently" is the problem I think. Do you mean that the existing code (before this patch) ignores it already? > +; XVRSPIP{TARGET_VSX};VRFIP > +[VEC_CEIL, vec_ceil, __builtin_vec_ceil] > + vf __builtin_vec_ceil (vf); > +VRFIP > + vd __builtin_vec_ceil (vd); > +XVRDPIP Is that a comment you forgot to remove, or is there work to be done here? > +[VEC_CMPEQ, vec_cmpeq, __builtin_vec_cmpeq] > +; XVCMPEQSP{TARGET_VSX};VCMPEQFP And this. > +; XVCMPEQSP_P{TARGET_VSX};VCMPEQFP_P > +[VEC_CMPEQ_P, SKIP, __builtin_vec_vcmpeq_p] And more! And more later. It isn't clear to me at all what those comments mean, and they are formatted haphazardly, so looks like a WIP? > +; Note that the entries for VEC_MUL are currently ignored. See rs6000-c.c: > +; altivec_resolve_overloaded_builtin, where there is special-case code for > +; VEC_MUL. TODO: Is this really necessary? Investigate. Seven missing > +; prototypes here...no corresponding builtins. Also added "vmulld" in P10 > +; which could be used instead of MUL_V2DI, conditionally? Space after "..." :-P > +; Opportunity for improvement: We can use XVRESP instead of VREFP for > +; TARGET_VSX. We would need conditional dispatch to allow two possibilities. > +; Some syntax like "XVRESP{TARGET_VSX};VREFP". > +; TODO. > +[VEC_RE, vec_re, __builtin_vec_re] Don't we already anyway? The only difference is whether all VSRs are allowed or only the VRs, no? The RTL generated is just the same? Or maybe I am overlooking something :-) > +; ** > +; ** > +; Deprecated overloads that should never have existed at all > +; ** > +; ** The coding conventions say not to use showy block comments like that, but it seems appropriate here :-) Okay for trunk with the looked at. Please don't repost this one. Thanks! Segher
Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
On 8/24/2021 3:44 AM, Hongtao Liu via Gcc-patches wrote: On Tue, Aug 24, 2021 at 5:40 PM Hongtao Liu wrote: On Tue, Aug 17, 2021 at 9:52 AM Hongtao Liu wrote: On Mon, Aug 9, 2021 at 4:34 PM Hongtao Liu wrote: On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches wrote: On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford wrote: Richard Biener via Gcc-patches writes: On Fri, Aug 6, 2021 at 5:32 AM liuhongt wrote: Hi: --- OK, I think sth is amiss here upthread. insv/extv do look like they are designed to work on integer modes (but docs do not say anything about this here). In fact the caller of extract_bit_field_using_extv is named extract_integral_bit_field. Of course nothing seems to check what kind of modes we're dealing with, but we're for example happily doing expand_shift in 'mode'. In the extract_integral_bit_field call 'mode' is some integer mode and op0 is HFmode? From the above I get it's the other way around? In that case we should wrap the call to extract_integral_bit_field, extracting in an integer mode with the same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)). --- This is a separate patch as a follow up of upper comments. gcc/ChangeLog: * expmed.c (extract_bit_field_1): Wrap the call to extract_integral_bit_field, extracting in an integer mode with the same size as 'tmode' and then converting the result as (subreg:tmode (reg:imode)). gcc/testsuite/ChangeLog: * gcc.target/i386/float16-5.c: New test. --- gcc/expmed.c | 19 +++ gcc/testsuite/gcc.target/i386/float16-5.c | 12 2 files changed, 31 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c diff --git a/gcc/expmed.c b/gcc/expmed.c index 3143f38e057..72790693ef0 100644 --- a/gcc/expmed.c +++ b/gcc/expmed.c @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum, op0_mode = opt_scalar_int_mode (); } + /* Make sure we are playing with integral modes. Pun with subregs + if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE + in extract_integral_bit_field. */ + if (int_mode_for_mode (tmode).exists () check !INTEGRAL_MODE_P (tmode) before, that should be slightly cheaper. Then imode should always be != tmode. Maybe even GET_MDOE_CLASS (tmode) != MODE_INT since I'm not sure how it behaves for composite modes. Of course the least surprises would happen when we restrict this to FLOAT_MODE_P (tmode). Richard - any preferences? If the bug is that extract_integral_bit_field is being called with a non-integral mode parameter, then it looks odd that we can still fall through to it without an integral mode (when exists is false). If calling extract_integral_bit_field without an integral mode is a bug then I think we should have: int_mode_for_mode (mode).require () whenever mode is not already SCALAR_INT_MODE_P/is_a. Ideally we'd make the mode parameter scalar_int_mode too. extract_integral_bit_field currently has: /* Find a correspondingly-sized integer field, so we can apply shifts and masks to it. */ scalar_int_mode int_mode; if (!int_mode_for_mode (tmode).exists (_mode)) /* If this fails, we should probably push op0 out to memory and then do a load. */ int_mode = int_mode_for_mode (mode).require (); which would seem to be redundant after this change. I'm not sure what exactly the bug is, but extract_integral_bit_field ends up creating a lowpart subreg that's not allowed and that ICEs (and I can't see a way to check beforehand). So it seems to me at least part of that function doesn't expect non-integral extraction modes. But who knows - the code is older than I am (OK, not, but older than my involvment in GCC ;)) How about attached patch w/ below changelog gcc/ChangeLog: * expmed.c (extract_bit_field_1): Make sure we're playing with integral modes before call extract_integral_bit_field. (extract_integral_bit_field): Add a parameter of type scalar_int_mode which corresponds to of tmode. And call extract_and_convert_fixed_bit_field instead of extract_fixed_bit_field and convert_extracted_bit_field. (extract_and_convert_fixed_bit_field): New function, it's a combination of extract_fixed_bit_field and convert_extracted_bit_field. gcc/testsuite/ChangeLog: * gcc.target/i386/float16-5.c: New test. I'd like to ping this patch, or maybe we can use the patch before with richi's comments. Rebased and ping^2, there are plenty of avx512fp16 patches blocked by this patch, i'd like someone to help review this patch. Please ignore the former attached patch, should be the patch attached here. Richard. + && imode != tmode + && imode != GET_MODE (op0)) +{ + rtx ret = extract_integral_bit_field (op0, op0_mode, +
[committed] libstdc++: Add another non-reserved name to tests
Signed-off-by: Jonathan Wakely libstdc++-v3/ChangeLog: * testsuite/17_intro/names.cc: Check 'sz'. Tested x86_64-linux. Committed to trunk. commit ea5674687ac45fe7242c57220b699337899881f0 Author: Jonathan Wakely Date: Wed Aug 25 23:19:25 2021 libstdc++: Add another non-reserved name to tests Signed-off-by: Jonathan Wakely libstdc++-v3/ChangeLog: * testsuite/17_intro/names.cc: Check 'sz'. diff --git a/libstdc++-v3/testsuite/17_intro/names.cc b/libstdc++-v3/testsuite/17_intro/names.cc index b0cc21d6cda..b945511e088 100644 --- a/libstdc++-v3/testsuite/17_intro/names.cc +++ b/libstdc++-v3/testsuite/17_intro/names.cc @@ -108,6 +108,7 @@ #define func ( #define tmp ( +#define sz ( #if __cplusplus < 201103L #define uses_allocator (
Re: [PATCH] Use non-numbered clones for target_clones.
On 8/25/2021 6:23 AM, Martin Liška wrote: Hello. There's updated version of the patch that: - strips '.$number' suffix for default clones - skips numbering for both declarations and definitions of target clones. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin 0001-Use-non-numbered-clones-for-target_clones.patch From 99fdcd317cec7e3742e161f43951a67421506df7 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Fri, 20 Aug 2021 16:35:18 +0200 Subject: [PATCH] Use non-numbered clones for target_clones. gcc/ChangeLog: * cgraph.h (create_version_clone_with_body): Add new parameter. * cgraphclones.c: Likewise. * multiple_target.c (create_dispatcher_calls): Do not use numbered suffixes. (create_target_clone): Likewise here. gcc/testsuite/ChangeLog: * gcc.target/i386/mvc5.c: Scan assembly names. * gcc.target/i386/mvc7.c: Likewise. * gcc.target/i386/pr95778-1.c: Update scanned patterns. * gcc.target/i386/pr95778-2.c: Likewise. OK jeff
[committed] libstdc++: Fix names.cc test failures on Windows
The Windows CRT headers define structs with members called f, x, y etc so don't check those. There are also lots of unnecessary function parameters in mingw headers using non-reserved names, e.g. uses p and z as parameters of mingw_gettimeofday uses j as a parameter of imaxabs uses l, o and func as parameter names Those should be fixed in the headers instead. Signed-off-by: Jonathan Wakely libstdc++-v3/ChangeLog: * testsuite/17_intro/names.cc: Adjust for Windows. Tested powerpc64le-linux. Committed to trunk. commit f1a08f4d7839a0197a28ecd9ca6d4400769da3a7 Author: Jonathan Wakely Date: Wed Aug 25 22:27:22 2021 libstdc++: Fix names.cc test failures on Windows The Windows CRT headers define structs with members called f, x, y etc so don't check those. There are also lots of unnecessary function parameters in mingw headers using non-reserved names, e.g. uses p and z as parameters of mingw_gettimeofday uses j as a parameter of imaxabs uses l, o and func as parameter names Those should be fixed in the headers instead. Signed-off-by: Jonathan Wakely libstdc++-v3/ChangeLog: * testsuite/17_intro/names.cc: Adjust for Windows. diff --git a/libstdc++-v3/testsuite/17_intro/names.cc b/libstdc++-v3/testsuite/17_intro/names.cc index 3cbe0d2da54..b0cc21d6cda 100644 --- a/libstdc++-v3/testsuite/17_intro/names.cc +++ b/libstdc++-v3/testsuite/17_intro/names.cc @@ -287,4 +287,14 @@ #endif // __VXWORKS__ +#ifdef _WIN32 +#undef Value +// defines _CRT_FLOAT::f +#undef f +// defines _CRT_DOUBLE::x and _LONGDOUBLE::x +#undef x +// defines _complex::x and _complex::y +#undef y +#endif + #include
[committed] libstdc++: Fix non-reserved names in
Signed-off-by: Jonathan Wakely libstdc++-v3/ChangeLog: * include/std/valarray: Uglify 'func' parameters. * testsuite/17_intro/names.cc: Add 'func' to checks. Tested powerpc64le-linux. Committed to trunk. commit 0163bbaaef119ef9e98c4b3dcba159609f77c818 Author: Jonathan Wakely Date: Wed Aug 25 22:24:54 2021 libstdc++: Fix non-reserved names in Signed-off-by: Jonathan Wakely libstdc++-v3/ChangeLog: * include/std/valarray: Uglify 'func' parameters. * testsuite/17_intro/names.cc: Add 'func' to checks. diff --git a/libstdc++-v3/include/std/valarray b/libstdc++-v3/include/std/valarray index ad3e14ebe52..5adc94282ee 100644 --- a/libstdc++-v3/include/std/valarray +++ b/libstdc++-v3/include/std/valarray @@ -536,25 +536,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION * @brief Apply a function to the array. * * Returns a new valarray with elements assigned to the result of - * applying func to the corresponding element of this array. The new + * applying __func to the corresponding element of this array. The new * array has the same size as this one. * - * @param func Function of Tp returning Tp to apply. + * @param __func Function of Tp returning Tp to apply. * @return New valarray with transformed elements. */ - _Expr<_ValFunClos<_ValArray, _Tp>, _Tp> apply(_Tp func(_Tp)) const; + _Expr<_ValFunClos<_ValArray, _Tp>, _Tp> apply(_Tp __func(_Tp)) const; /** * @brief Apply a function to the array. * * Returns a new valarray with elements assigned to the result of - * applying func to the corresponding element of this array. The new + * applying __func to the corresponding element of this array. The new * array has the same size as this one. * - * @param func Function of const Tp& returning Tp to apply. + * @param __func Function of const Tp& returning Tp to apply. * @return New valarray with transformed elements. */ - _Expr<_RefFunClos<_ValArray, _Tp>, _Tp> apply(_Tp func(const _Tp&)) const; + _Expr<_RefFunClos<_ValArray, _Tp>, _Tp> apply(_Tp __func(const _Tp&)) const; /** * @brief Resize array. @@ -1062,18 +1062,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template inline _Expr<_ValFunClos<_ValArray, _Tp>, _Tp> -valarray<_Tp>::apply(_Tp func(_Tp)) const +valarray<_Tp>::apply(_Tp __func(_Tp)) const { typedef _ValFunClos<_ValArray, _Tp> _Closure; - return _Expr<_Closure, _Tp>(_Closure(*this, func)); + return _Expr<_Closure, _Tp>(_Closure(*this, __func)); } template inline _Expr<_RefFunClos<_ValArray, _Tp>, _Tp> -valarray<_Tp>::apply(_Tp func(const _Tp &)) const +valarray<_Tp>::apply(_Tp __func(const _Tp &)) const { typedef _RefFunClos<_ValArray, _Tp> _Closure; - return _Expr<_Closure, _Tp>(_Closure(*this, func)); + return _Expr<_Closure, _Tp>(_Closure(*this, __func)); } #define _DEFINE_VALARRAY_UNARY_OPERATOR(_Op, _Name) \ diff --git a/libstdc++-v3/testsuite/17_intro/names.cc b/libstdc++-v3/testsuite/17_intro/names.cc index aca7a8e5812..3cbe0d2da54 100644 --- a/libstdc++-v3/testsuite/17_intro/names.cc +++ b/libstdc++-v3/testsuite/17_intro/names.cc @@ -106,6 +106,7 @@ #endif #define z ( +#define func ( #define tmp ( #if __cplusplus < 201103L
Re: [PATCH] analyzer: Recognize __builtin_free as a matching deallocator
On 8/25/2021 9:16 AM, Siddhesh Poyarekar wrote: On 8/25/21 5:44 PM, Matthias Klose wrote: On 7/28/21 1:44 PM, David Malcolm via Gcc-patches wrote: On Wed, 2021-07-28 at 10:34 +0530, Siddhesh Poyarekar wrote: Recognize __builtin_free as being equivalent to free when passed into __attribute__((malloc ())), similar to how it is treated when it is encountered as a call. This fixes spurious warnings in glibc where xmalloc family of allocators as well as reallocarray, memalign, etc. are declared to have __builtin_free as the free function. gcc/analyzer/ChangeLog: * sm-malloc.cc (malloc_state_machine::get_or_create_deallocator): Recognize __builtin_free. gcc/testsuite/ChangeLog: * gcc.dg/analyzer/attr-malloc-1.c (compatible_alloc, compatible_alloc2): New extern allocator declarations. (test_9, test_10): New tests. Looks good to me, thanks Dave Please could this be backported to all active branches? Sure, it looks like only gcc11 needs this since malloc attribute matching seems recent. David, I've never done a backport before, may I just cherry-pick, push and post a [committed] patch on list or does it need to go through review? In general it's fine to cherry-pick fixes from the mainline to a release branch that fix regressions, incorrect code and the like. This doesn't fall into one of those categories -- but given this is limited to the analyzer, I think it's fine to cherry-pick into the release branch as long as David is OK with it as well. jeff
Re: [Patch] Fix cygming-crtend.c build warning due to -Wprio-ctor-dtor
On 8/25/2021 10:46 AM, Jonathan Yong via Gcc-patches wrote: Attached patches OK? cygming-crtend.c: fix build warnings libgcc/Changelog: * config/i386/cygming-crtend.c: Fix register_frame_ctor and register_frame_dtor warnings. extend.texi: add note about reserved ctor/dtor priorities gcc/Changelog: * doc/extend.texi: Add note about reserved priorities to the constructor attribute. Yes. Both are OK. jeff
Re: [PATCH] diagnostics: Fix sporadic test failure
On 8/25/2021 12:07 PM, H.J. Lu wrote: On Sat, May 29, 2021 at 1:03 PM Jeff Law via Gcc-patches wrote: On 5/29/2021 1:55 PM, Bernd Edlinger wrote: On 5/29/21 9:31 PM, Jeff Law wrote: On 5/28/2021 6:38 AM, Bernd Edlinger wrote: Hi, it turns out to be reproducible this way: COLUMNS=80 make check-gcc-c RUNTESTFLAGS="plugin.exp=diagnostic*" Running /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/plugin/plugin.exp ... FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c -fplugin=./diagnostic_plugin_test_tree_expression_range.so 1 blank line(s) in output FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c -fplugin=./diagnostic_plugin_test_tree_expression_range.so expected multiline pattern lines 550-551 not found: " __builtin_types_compatible_p \(long, int\) \+ f \(i\)\);.*\n ~\^~~\n" FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c -fplugin=./diagnostic_plugin_test_tree_expression_range.so (test for excess errors) a lot more errors happen with COLUMNS=20. Tested on x86_64-pc-linux-gnu. Is it OK for trunk? Thanks Bernd. 2021-05-28 Bernd Edlinger * gcc.dg/plugin/diagnostic_plugin_show_trees.c (plugin_init): Fix caret_max_with. * gcc.dg/plugin/diagnostic_plugin_test_inlining.c (plugin_init): Likewise. * gcc.dg/plugin/diagnostic_plugin_test_paths.c (plugin_init): Likewise. * gcc.dg/plugin/diagnostic_plugin_test_string_literals.c (plugin_init): Likewise. * gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c (plugin_init): Likewise. So while you've got a patch here, you haven't indicated what the problem actually was. I'm guessing caret_max_width was uninitialized and thus we got random-ish values. Presumably those randomish-values are what caused tests to occasionally appear to truncate their output and fail? If that's the case, this is fine. If it's something deeper, please provide a bit of background to help in evaluating the patch. Yes, the problem is just the function get_terminal_width in diagnostic.c, can somehow learn the X-terminal's window dimensions where the make check is started: int get_terminal_width (void) { const char * s = getenv ("COLUMNS"); if (s != NULL) { int n = atoi (s); if (n > 0) return n; } #ifdef TIOCGWINSZ struct winsize w; w.ws_col = 0; if (ioctl (0, TIOCGWINSZ, ) == 0 && w.ws_col > 0) return w.ws_col; #endif return INT_MAX; } and this is used to initialize context->caret_max_width in diagnostic_set_caret_max_width, and possibly also other places. This causes a small variation in the output that is trips the test cases. It is just an extra newline in some cases, that I have not debugged why exactly this happens, but I assume this is intentional to make the diagnostics spanning multiple lines better readable to a human. Thanks. So for the testsuite we certainly don't want to be doing that :-) OK for the trunk, thanks for chasing it down -- I've been seeing these failures, but haven't had the time to chase down a root cause. I'd like to backport it to GCC 11 branch to avoid random failures on GCC 11 branch: https://gcc.gnu.org/pipermail/gcc-regression/2021-August/075244.html Sure. jeff
Re: [PATCH] declare get_range_query attribute returns_nonnull
On 8/25/2021 5:01 PM, Martin Sebor via Gcc-patches wrote: Andrew, based on your remarks in our discussion Re: enable ranger and caching in pass_waccess, I've added some comments to struct function and get_range_query() and declared the latter with attribute returns_nonnull to make it explicit both to readers and to GCC that the x_range_query member is never null. Tested on x86_64-linux. Martin gcc-get_range_query.diff gcc/ChangeLog: * function.h (function): Add comments. (get_range_query): Same. Add attribute returns nonnull. OK jeff
[PATCH] declare get_range_query attribute returns_nonnull
Andrew, based on your remarks in our discussion Re: enable ranger and caching in pass_waccess, I've added some comments to struct function and get_range_query() and declared the latter with attribute returns_nonnull to make it explicit both to readers and to GCC that the x_range_query member is never null. Tested on x86_64-linux. Martin gcc/ChangeLog: * function.h (function): Add comments. (get_range_query): Same. Add attribute returns nonnull. diff --git a/gcc/function.h b/gcc/function.h index 0db51775e7c..36003e7576a 100644 --- a/gcc/function.h +++ b/gcc/function.h @@ -312,7 +312,8 @@ struct GTY(()) function { /* Range query mechanism for functions. The default is to pick up global ranges. If a pass wants on-demand ranges OTOH, it must - call enable/disable_ranger(). */ + call enable/disable_ranger(). The pointer is never null. It + should be queried by calling get_range_query(). */ range_query * GTY ((skip)) x_range_query; /* Last statement uid. */ @@ -719,10 +720,10 @@ extern const char *current_function_name (void); extern void used_types_insert (tree); /* Returns the currently active range access class. When there is no active - range class, global ranges are used. */ + range class, global ranges are used. Never returns null. */ -inline range_query * -get_range_query (struct function *fun) +ATTRIBUTE_RETURNS_NONNULL inline range_query * +get_range_query (const struct function *fun) { return fun->x_range_query; }
Re: [PATCH] improve note location and refactor warn_uninit
On 8/25/2021 2:03 PM, Martin Sebor via Gcc-patches wrote: Richard, some time ago you mentioned you'd had trouble getting -Wuninitialized to print the note pointing to the uninitialized variable. I said I'd look into it, and so I did. The attached patch simplifies the warn_uninit() function to get rid of some redundant cruft: besides a few pointless null tests and a duplicate argument it also does away with ancient code that's responsible for the problem you saw. It turns out that tests for the problem are already in the test suite but have been xfailed since the day they were added, so the patch simply removes the xfails. I'll go ahead and commit this cleanup as obvious later this week unless you have suggestions for further changes. Tested on x86_64-linux. I'm going to object to installing this "as obvious". I don't see how anyone could come to the conclusion this is obvious. Please wait for review. jeff
Re: [PATCH 13/34] rs6000: Add Cell builtins
On Thu, Jul 29, 2021 at 08:31:00AM -0500, Bill Schmidt wrote: > * config/rs6000/rs6000-builtin-new.def: Add cell stanza. This one is fine, too. Thanks! Segher
Re: [PATCH 12/34] rs6000: Add miscellaneous builtins
On Thu, Jul 29, 2021 at 08:30:59AM -0500, Bill Schmidt wrote: > * config/rs6000/rs6000-builtin-new.def: Add ieee128-hw, dfp, > crypto, and htm stanzas. Okay for trunk. Thanks! Segher
Re: [PATCH 11/34] rs6000: Add MMA builtins
Hi! On Thu, Jul 29, 2021 at 08:30:58AM -0500, Bill Schmidt wrote: > * config/rs6000/rs6000-builtin-new.def: Add mma stanza. Okay for trunk. Thanks! Segher
Re: [PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.
On Wed, Aug 25, 2021 at 05:29:22PM -0500, Segher Boessenkool wrote: > On Wed, Aug 25, 2021 at 02:22:06PM -0400, Michael Meissner wrote: > > On Wed, Aug 25, 2021 at 12:44:16PM -0500, Segher Boessenkool wrote: > > > Out of interest, did you notice any scheduling differences with this? > > > > I don't use the built-ins so I wouldn't notice a difference. I noticed > > this as > > part of the next patch to add support for XXSPLTIDP (and ultimately > > XXSPLTIW in > > a future patch). The XXSPLTIDP instruction allows loading up many SFmode, > > DFmode, and V2DFmode constants. The XXSPLTIW instruction allows loading up > > certain V16QImode, V8HImode, V4SImode, and V4SFmode constants. > > Yeah, you might notice scheduling differences when "normal" code starts > using this. Builtins are meh :-) Yep, that's my thought. It just noticed the vecsimple stuff as I was adding the xxsplti{w,dp,32dx} support. > > However, I suspect if you aren't running spec on an otherwise > > idle machine, things will change where XXSPLTIDP will be more of a win by > > eliminating the loads. > > It has more bandwidth and it is less contested, yup. It does have the > usual "prefixed" gotchas though. Note, it replaces one prefixed instruction (plfd) with another (xxspltidp). > > While XXSPLTIDP by itself is positive, unfortunately, there is a regression > > in > > cactuBSSN_r (3%) when I add XXSPLTIW (but not XXSPLTIDP) that I'm trying to > > track down. > > > > If I add both instructions, several of the benchmarks improve (including > > xalancbmk by 11% and x264_r by 27%), but cactuBSSN_r has the 3% regression > > and > > fotonik3d_r also has a new 3% regression. > > > > Given that many more programs use floating point constants than vector > > constants (66,000 XXSPLTID's created vs. 5,000 XXSPLTIW's), I figure to push > > the XXSPLTIDP now, and try to figure out the differences before submitting > > the > > XXSPLTIW patch. > > It would be interesting to figure out a pattern behind the regressions :-) Definately. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.
On Wed, Aug 25, 2021 at 02:22:06PM -0400, Michael Meissner wrote: > On Wed, Aug 25, 2021 at 12:44:16PM -0500, Segher Boessenkool wrote: > > Out of interest, did you notice any scheduling differences with this? > > I don't use the built-ins so I wouldn't notice a difference. I noticed this > as > part of the next patch to add support for XXSPLTIDP (and ultimately XXSPLTIW > in > a future patch). The XXSPLTIDP instruction allows loading up many SFmode, > DFmode, and V2DFmode constants. The XXSPLTIW instruction allows loading up > certain V16QImode, V8HImode, V4SImode, and V4SFmode constants. Yeah, you might notice scheduling differences when "normal" code starts using this. Builtins are meh :-) > However, I suspect if you aren't running spec on an otherwise > idle machine, things will change where XXSPLTIDP will be more of a win by > eliminating the loads. It has more bandwidth and it is less contested, yup. It does have the usual "prefixed" gotchas though. > While XXSPLTIDP by itself is positive, unfortunately, there is a regression in > cactuBSSN_r (3%) when I add XXSPLTIW (but not XXSPLTIDP) that I'm trying to > track down. > > If I add both instructions, several of the benchmarks improve (including > xalancbmk by 11% and x264_r by 27%), but cactuBSSN_r has the 3% regression and > fotonik3d_r also has a new 3% regression. > > Given that many more programs use floating point constants than vector > constants (66,000 XXSPLTID's created vs. 5,000 XXSPLTIW's), I figure to push > the XXSPLTIDP now, and try to figure out the differences before submitting the > XXSPLTIW patch. It would be interesting to figure out a pattern behind the regressions :-) Segher
[PATCH] Fix float128-call.c test for power8 IEEE 128 and power10.
>From 327273dfeec5c000f3c33ca7b88ee0097fd33586 Mon Sep 17 00:00:00 2001 From: Michael Meissner Date: Wed, 25 Aug 2021 00:31:35 -0400 Subject: [PATCH] Fix float128-call.c test for power8 IEEE 128 and power10. I built a compiler on a little endian power8 system where the default long double was IEEE 128-bit instead of IBM 128-bit. I discovered that on power8, we would generate a lxvd2x and xxpermdi to deal with the endianess instead of the Altivec lxv. In addition, I noticed the constant that was being loaded (1.0q) could be loaded by the lxvkq instruction. I rewrote the test to handle all forms of vector load and store that can be generated. And I changed the constant to be one that lxvkq does not support. I did bootstrap tests on the following systems, and the the test ran in all environments (each of the systems were configured for the cpu mentioned): 1) Little endian power9 with IBM 128-bit long double 2) Little endian power9 with IEEE 128-bit long double 3) Little endian power8 with IBM 128-bit long double 4) Little endian power8 with IEEE 128-bit long double 5) Little endian power10 with IBM 128-bit long double 6) Little endian power10 with IEEE 128-bit long double 7) Big endianpower8 with IBM 128-bit long double Can I check this patch into the master branch and later backport it to GCC-11? 2021-08-25 Michael Meissner gcc/testsuite/ * gcc.target/powerpc/float128-call.c: Fix test for IEEE 128-bit long double and power10. --- .../gcc.target/powerpc/float128-call.c| 27 +-- 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/gcc/testsuite/gcc.target/powerpc/float128-call.c b/gcc/testsuite/gcc.target/powerpc/float128-call.c index b64ffc68bfa..d1cf47e4298 100644 --- a/gcc/testsuite/gcc.target/powerpc/float128-call.c +++ b/gcc/testsuite/gcc.target/powerpc/float128-call.c @@ -6,22 +6,33 @@ #error "-mfloat128 is not supported." #endif +/* Pick a constant to load that cannot be generated by the power10 lxvkq + instruction. */ #ifdef __LONG_DOUBLE_IEEE128__ #define TYPE long double -#define ONE 1.0L +#define TEN 10.0L #else #define TYPE __float128 -#define ONE 1.0Q +#define TEN 10.0Q #endif /* Test to make sure vector registers are used for passing IEEE 128-bit floating point values and returning them. Also make sure the 'q' suffix is - handled. */ -TYPE one (void) { return ONE; } + handled for __float128. */ +TYPE one (void) { return TEN; } void store (TYPE a, TYPE *p) { *p = a; } -/* { dg-final { scan-assembler {\mlxvd2x 34\M} {target be} } } */ -/* { dg-final { scan-assembler {\mstxvd2x 34\M} {target be} } } */ -/* { dg-final { scan-assembler {\mlvx 2\M} {target le} } } */ -/* { dg-final { scan-assembler {\mstvx 2\M} {target le} } } */ +/* This regexp captures the different vector load/stores that can be generated: + + lxvd2x -- big endian power7/power8, little endian power8 + lvx -- Altivec + lxv -- power9 + plxv-- power10 + lxvx-- X-form variant. + stxvd2x -- big endian power7/power8, little endian power8 + stvx-- Altivec + lxvx-- power9/power10. */ + +/* { dg-final { scan-assembler {\mlxvd2x 34\M|\mlvx 2\M|\mp?lxvx? 34\M} } } */ +/* { dg-final { scan-assembler {\mstxvd2x 34\M|\mstvx 2\M|\mstxvx 34\M} } } */ -- 2.31.1 -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
[committed] libstdc++: Fix conditions for optimizing uninitialized algos [PR102064]
While laying some groundwork for constexpr std::vector, I noticed some bugs in the std::uninitialized_xxx algorithms. The conditions being checked for optimizing trivial cases were not quite right, as shown in the examples in the PR. This consolidates the checks into a single macro. The macro has appropriate definitions for C++98 or for later standards, to avoid a #if everywhere the checks are used. For C++11 and later the check makes a call to a new function doing a static_assert to ensure we don't use assignment in cases where construction would have been invalid. Extracting that check to a separate function will be useful for constexpr std::vector, as that can't use std::uninitialized_copy directly because it isn't constexpr). The consolidated checks mean that some slight variations in static assert message are gone, as there is only one place that does the assert now. That required adjusting some tests. As part of that the redundant 89164_c++17.cc test was merged into 89164.cc which is compiled as C++17 by default now, but can also use other -std options if the C++17-specific error is made conditional with a target selector. Signed-off-by: Jonathan Wakely libstdc++-v3/ChangeLog: PR libstdc++/102064 * include/bits/stl_uninitialized.h (_GLIBCXX_USE_ASSIGN_FOR_INIT): Define macro to check conditions for optimizing trivial cases. (__check_constructible): New function to do static assert. (uninitialized_copy, uninitialized_fill, uninitialized_fill_n): Use new macro. * testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc: Adjust dg-error pattern. * testsuite/23_containers/vector/cons/89164.cc: Likewise. Add C++17-specific checks from 89164_c++17.cc. * testsuite/23_containers/vector/cons/89164_c++17.cc: Removed. * testsuite/20_util/specialized_algorithms/uninitialized_copy/102064.cc: New test. * testsuite/20_util/specialized_algorithms/uninitialized_copy_n/102064.cc: New test. * testsuite/20_util/specialized_algorithms/uninitialized_fill/102064.cc: New test. * testsuite/20_util/specialized_algorithms/uninitialized_fill_n/102064.cc: New test. Tested powerpc64le-linux. Committed to trunk. commit ead408529d7a69873a7c14dd12fa043cd5862253 Author: Jonathan Wakely Date: Wed Aug 25 21:10:48 2021 libstdc++: Fix conditions for optimizing uninitialized algos [PR102064] While laying some groundwork for constexpr std::vector, I noticed some bugs in the std::uninitialized_xxx algorithms. The conditions being checked for optimizing trivial cases were not quite right, as shown in the examples in the PR. This consolidates the checks into a single macro. The macro has appropriate definitions for C++98 or for later standards, to avoid a #if everywhere the checks are used. For C++11 and later the check makes a call to a new function doing a static_assert to ensure we don't use assignment in cases where construction would have been invalid. Extracting that check to a separate function will be useful for constexpr std::vector, as that can't use std::uninitialized_copy directly because it isn't constexpr). The consolidated checks mean that some slight variations in static assert message are gone, as there is only one place that does the assert now. That required adjusting some tests. As part of that the redundant 89164_c++17.cc test was merged into 89164.cc which is compiled as C++17 by default now, but can also use other -std options if the C++17-specific error is made conditional with a target selector. Signed-off-by: Jonathan Wakely libstdc++-v3/ChangeLog: PR libstdc++/102064 * include/bits/stl_uninitialized.h (_GLIBCXX_USE_ASSIGN_FOR_INIT): Define macro to check conditions for optimizing trivial cases. (__check_constructible): New function to do static assert. (uninitialized_copy, uninitialized_fill, uninitialized_fill_n): Use new macro. * testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc: Adjust dg-error pattern. * testsuite/23_containers/vector/cons/89164.cc: Likewise. Add C++17-specific checks from 89164_c++17.cc. * testsuite/23_containers/vector/cons/89164_c++17.cc: Removed. * testsuite/20_util/specialized_algorithms/uninitialized_copy/102064.cc: New test. * testsuite/20_util/specialized_algorithms/uninitialized_copy_n/102064.cc: New test. * testsuite/20_util/specialized_algorithms/uninitialized_fill/102064.cc: New test. * testsuite/20_util/specialized_algorithms/uninitialized_fill_n/102064.cc: New test. diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h
[committed] libstdc++: Remove __gnu_cxx::rope::erase(size_type) [PR102048]
This function claims to remove a single character at index p, but it actually removes p+1 characters beginning at p. So r.erase(0) removes the first character, but r.erase(1) removes the second and third, and r.erase(2) removes the second, third and fourth. This is not a useful API. The overload is present in the SGI STL header that we imported, but it isn't documented in the API reference. The erase overloads that are documented are: erase(const iterator& p) erase(const iterator& f, const iterator& l) erase(size_type i, size_type n); Having an erase(size_type p) overload that erases a single character (as the comment says it does) might be useful, but would be inconsistent with std::basic_string::erase(size_type p = 0, size_type n = npos), which erases from p to the end of the string when called with a single argument. Since the function isn't part of the documented API, doesn't do what it claims to do (or anything useful) and "fixing" it would leave it inconsistent with basic_string, I'm just removing that overload. libstdc++-v3/ChangeLog: PR libstdc++/102048 * include/ext/rope (rope::erase(size_type)): Remove broken function. Tested powerpc64le-linux. Committed to trunk. commit 2cd229dec8d6716938de5052479d059d306969da Author: Jonathan Wakely Date: Wed Aug 25 16:42:49 2021 libstdc++: Remove __gnu_cxx::rope::erase(size_type) [PR102048] This function claims to remove a single character at index p, but it actually removes p+1 characters beginning at p. So r.erase(0) removes the first character, but r.erase(1) removes the second and third, and r.erase(2) removes the second, third and fourth. This is not a useful API. The overload is present in the SGI STL header that we imported, but it isn't documented in the API reference. The erase overloads that are documented are: erase(const iterator& p) erase(const iterator& f, const iterator& l) erase(size_type i, size_type n); Having an erase(size_type p) overload that erases a single character (as the comment says it does) might be useful, but would be inconsistent with std::basic_string::erase(size_type p = 0, size_type n = npos), which erases from p to the end of the string when called with a single argument. Since the function isn't part of the documented API, doesn't do what it claims to do (or anything useful) and "fixing" it would leave it inconsistent with basic_string, I'm just removing that overload. libstdc++-v3/ChangeLog: PR libstdc++/102048 * include/ext/rope (rope::erase(size_type)): Remove broken function. diff --git a/libstdc++-v3/include/ext/rope b/libstdc++-v3/include/ext/rope index 9681dbc6225..d4406a942b6 100644 --- a/libstdc++-v3/include/ext/rope +++ b/libstdc++-v3/include/ext/rope @@ -2401,11 +2401,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION this->_M_tree_ptr = __result; } - // Erase, single character - void - erase(size_type __p) - { erase(__p, __p + 1); } - // Insert, iterator variants. iterator insert(const iterator& __p, const rope& __r)
Re: [PING][PATCH] enable ranger and caching in pass_waccess
On 8/25/21 10:14 AM, Andrew MacLeod wrote: On 8/25/21 11:20 AM, Martin Sebor wrote: Ping: Andrew, did I answer your questions? Do you (or anyone else) have any other comments on the latest patch below? https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577865.html I wasn't attempting to block it, its outside my review purview.. I was merely commenting that you should not need a pointer to a range_query at all anymore Are you planning to transition to using the get_range_query() interface instead of keeping a range_query pointer in the pointer_query class? This pass and to a smaller extent the pointer_query class that's used by it and the strlen pass are still a work in progress. I also still need to convert the strlen pass to use Ranger and I expect it will take some changes to pointer_query. So at that point, if going through get_range_query (cfun) everywhere is what you recommend, I'm happy to do it. absolutely. you should not need to even know whether you have a ranger instance running or not. get_range_query will give you whichever is active, and there is ALWAYS something active.. defaulting to the global version. the code in get_range() seems to be the end of the call chain which uses the pointer and should be consolidated to something much simpler if (rvals && stmt) { if (!rvals->range_of_expr (vr, val, stmt)) return NULL_TREE; <...> // ?? This entire function should use get_range_query or get_global_range_query (), // instead of doing something different for RVALS and global ranges. if (!get_global_range_query ()->range_of_expr (vr, val) || vr.undefined_p ()) return NULL_TREE; This entire section can boil down to something like if (!get_range_query (cfun)->range_of_expr (vr, val, stmt)) return NULL; So get_range_query (cfun) will never be null? And no special handling of INTEGER_CST is needed, or checking for SSA_NAME. Nice. Attached is another revision of the same patch with this function simplified. (FYI: I expect the whole function to eventually go away, and to make other similar simplifications, but I'm not there yet.) PS There has been an effort to get rid of global variables from GCC, or, as the first step, to avoid accessing them directly(*). If and when that happens, it seems like each pass will have to store either the ranger instance as a member (directly or indirectly, via a member of a class that stores it) or the function passed to pass::execute() if it wants to access either. [*] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573975.html The patch at the link above wasn't approved but IIUC removing globals from GCC is still a goal. I have no idea what direction that is going, but the net effect will be the same in the end. You'll be calling get_range_query() with either the function pointer, or the pass pointer, or something.. but you should never need to pass around a pointer to either a ranger or range-query. As I said earlier, this will likely be a pass property and accessed thru the pass eventually... but until then, use what we have.. cfun. It'll be trivial to transition down the road to whatever the ultimate solution is. Okay, I'll do that in my future changes. Martin Enable ranger and caching in pass_waccess. gcc/ChangeLog: * gimple-ssa-warn-access.cc (get_size_range): Add argument. (check_access): Pass additional argument. (check_memop_access): Remove template and make a member function. (maybe_check_dealloc_call): Make a pass_waccess member function. (class pass_waccess): Add, rename, and remove members. (pass_waccess::pass_waccess): Adjust to name change. (pass_waccess::~pass_waccess): Same. (check_alloca): Make a member function. (check_alloc_size_call): Same. (check_strcat): Same. (check_strncat): Same. (check_stxcpy): Same. (check_stxncpy): Same. (check_strncmp): Same. (maybe_warn_rdwr_sizes): Rename... (pass_waccess::maybe_check_access_sizes): ...to this. (pass_waccess::check_call): Adjust to name changes. (pass_waccess::maybe_check_dealloc_call): Make a pass_waccess member function. (pass_waccess::execute): Adjust to name changes. * gimple-ssa-warn-access.h (check_memop_access): Remove. * pointer-query.cc (access_ref::phi): Handle null pointer. (access_ref::inform_access): Same. (pointer_query::put_ref): Modify a cached value, not a copy of it. (pointer_query::dump): New function. (compute_objsize_r): Avoid overwriting access_ref::bndrng. Cache more results. * pointer-query.h (pointer_query::dump): Declare. * tree-ssa-strlen.c (get_range): Simplify. Use function query. (dump_strlen_info): Use function query. (printf_strlen_execute): Factor code out into pointer_query::put_ref. gcc/testsuite/ChangeLog: * gcc.dg/Wstringop-overflow-11.c: Remove xfails. * gcc.dg/Wstringop-overflow-12.c: Same. * gcc.dg/Wstringop-overflow-43.c: Add xfails. * gcc.dg/Wstringop-overflow-73.c: New
Re: [PATCH] Fix PR c++/66590: incorrect warning "reaches end of non-void function" for switch
On 8/13/21 2:40 PM, apinski--- via Gcc-patches wrote: From: Andrew Pinski So the problem here is there is code in the C++ front-end not to add a break statement (to the IR) if the previous block does not fall through. The problem is the code which does the check to see if the block may fallthrough does not check a CLEANUP_STMT; it assumes it is always fall through. Anyways this adds the code for the case of a CLEANUP_STMT that is only for !CLEANUP_EH_ONLY (the try/finally case). OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. OK, thanks. gcc/cp/ChangeLog: * cp-objcp-common.c (cxx_block_may_fallthru): Handle CLEANUP_STMT for the case which will be try/finally. gcc/testsuite/ChangeLog: * g++.dg/warn/Wreturn-5.C: New test. --- gcc/cp/cp-objcp-common.c | 9 + gcc/testsuite/g++.dg/warn/Wreturn-5.C | 15 +++ 2 files changed, 24 insertions(+) create mode 100644 gcc/testsuite/g++.dg/warn/Wreturn-5.C diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c index beef0123b04..43888507b85 100644 --- a/gcc/cp/cp-objcp-common.c +++ b/gcc/cp/cp-objcp-common.c @@ -317,6 +317,15 @@ cxx_block_may_fallthru (const_tree stmt) return true; return block_may_fallthru (ELSE_CLAUSE (stmt)); +case CLEANUP_STMT: + /* Just handle the try/finally cases. */ + if (!CLEANUP_EH_ONLY (stmt)) + { + return (block_may_fallthru (CLEANUP_BODY (stmt)) + && block_may_fallthru (CLEANUP_EXPR (stmt))); + } + return true; + default: return c_block_may_fallthru (stmt); } diff --git a/gcc/testsuite/g++.dg/warn/Wreturn-5.C b/gcc/testsuite/g++.dg/warn/Wreturn-5.C new file mode 100644 index 000..543e33e905d --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Wreturn-5.C @@ -0,0 +1,15 @@ +// PR C++/66590 +// { dg-do compile } +// { dg-options "-Wall" } + +struct A{ ~A();}; + +int f(int x) +{ +A a; +switch (x) +{ +case 1: { A tmp; return 1; } break; +default: return 0; +} +} // { dg-bogus "control reaches end of non-void function" }
Re: [llvm-dev] [PATCH] Add optional _Float16 support
On Wed, Aug 25, 2021 at 8:36 AM H.J. Lu wrote: > On Mon, Aug 23, 2021 at 10:55 PM John McCall wrote: > > On Thu, Jul 29, 2021 at 9:40 AM H.J. Lu wrote: > >> On Tue, Jul 13, 2021 at 9:24 AM H.J. Lu wrote: > >> > On Tue, Jul 13, 2021 at 8:41 AM Joseph Myers > wrote: > >> > > On Tue, 13 Jul 2021, H.J. Lu wrote: > >> > > > On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei < > pengfei.w...@intel.com> wrote: > >> > > > > > >> > > > > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 > registers. > >> > > > > > >> > > > > Can you please explain the behavior here? Is there difference > between _Float16 and _Complex _Float16 when return? I.e., > >> > > > > 1, In which case will _Float16 values return in both %xmm0 and > %xmm1? > >> > > > > 2, For a single _Float16 value, are both real part and > imaginary part returned in %xmm0? Or returned in %xmm0 and %xmm1 > respectively? > >> > > > > >> > > > Here is the v2 patch to add the missing _Float16 bits. The PDF > file is at > >> > > > > >> > > > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI > >> > > > >> > > This PDF shows _Complex _Float16 as having a size of 2 bytes > (should be > >> > > 4-byte size, 2-byte alignment). > >> > > > >> > > It also seems to change double from 4-byte to 8-byte alignment, > which is > >> > > wrong. And it's inconsistent about whether it covers the long > double = > >> > > double (Android) case - it shows that case for _Complex long double > but > >> > > not for long double itself. > >> > > >> > Here is the v3 patch with the fixes. I also updated the PDF file. > >> > >> Here is the final patch I checked in. _Complex _Float16 is changed to > return > >> in XMM0 register. The new PDF file is at > >> > >> https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI > > > > > > This should be explicit that the real part is returned in bits 0..15 and > the imaginary part is returned in bits 16..31, or however we conventionally > designate subcomponents of a vector. > > How about this? > > diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex > index 860ff66..8f527c1 100644 > --- a/low-level-sys-info.tex > +++ b/low-level-sys-info.tex > @@ -457,6 +457,9 @@ and \texttt{unions}) are always returned in memory. > & \texttt{__float128} & memory \\ > \hline > & \texttt{_Complex _Float16} & \reg{xmm0} \\ > +& & The real part is returned in bits 0..15. The imaginary part is > +returned \\ > +& & in bits 16..31.\\ > \cline{2-3} > Complex & \texttt{_Complex float} & \EDX:\EAX \\ > floating- & & The real part is returned in \EAX. The imaginary part is > > > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/89eb3e52c7e5eadd58f7597508e13f34/intel386-psABI-2021-08-25.pdf Looks good to me, thanks. John.
Re: [PATCH] c++: Fix up value initialization of structs with zero width bitfields [PR102019]
On 8/23/21 4:16 PM, Jakub Jelinek wrote: Hi! The removal of remove_zero_width_bit_fields, in addition to triggering some ABI issues that need solving anyway (ABI incompatibility between C and C++) also resulted in UB inside of gcc, we now call build_zero_init which calls build_int_cst on an integral type with TYPE_PRECISION of 0. Fixed by ignoring the zero width bitfields. I understand build_value_init_noctor wants to initialize to 0 even unnamed bitfields (of non-zero width), at least until we have some CONSTRUCTOR flag that says that even all the padding bits should be cleared. Bootstrapped/regtested on x86_64-linux, ok for trunk? OK. 2021-08-23 Jakub Jelinek PR c++/102019 * init.c (build_value_init_noctor): Ignore unnamed zero-width bitfields. --- gcc/cp/init.c.jj2021-07-29 13:24:42.698012879 +0200 +++ gcc/cp/init.c 2021-08-23 14:43:08.791947395 +0200 @@ -427,6 +427,11 @@ build_value_init_noctor (tree type, tsub == NULL_TREE)) continue; + /* Ignore unnamed zero-width bitfields. */ + if (DECL_UNNAMED_BIT_FIELD (field) + && integer_zerop (DECL_SIZE (field))) + continue; + /* We could skip vfields and fields of types with user-defined constructors, but I think that won't improve performance at all; it should be simpler in general just Jakub
Re: [PATCH] c++: Fix cp_tree_equal for template value args using dependent sizeof/alignof/noexcept expressions
On 8/21/21 12:55 AM, Barrett Adair via Gcc-patches wrote: This patch fixes AST comparison for trailing return types using dependent sizeof/alignof/noexcept expressions as template value arguments. I believe this bug is over a decade old, hailing from GCC 4.6. I found it over 5 years ago and sat on the repro until I had time to fix it myself. Thanks! The new test cases demonstrate redeclarations mistakenly parsed as ambiguous overloads. These fail with gcc, but pass with clang. The last test case concerns a GNU extension for alignof (hence the -Wno-pedantic). After applying this patch, bootstrap succeeds, the new tests pass, and my native "make -k check" outputs look essentially similar to these based on the same commit: https://gcc.gnu.org/pipermail/gcc-testresults/2021-August/715032.html (except I only waited for a few thousand of the libstdc++ tests to pass). I think it makes sense to now split the switch-case blocks here since there isn't much shared logic left, and I have a patch to do that if desired, but I think the patch below is the cleanest for initial review. I studied the history and churn of the comparing_specializations hacks. While I am still not entirely certain this is the correct change to make there, the change seems at least superficially reasonable to me. I haven't been able to break it yet. Perhaps someone more familiar with this area of the compiler can weigh in. See also the existing cp_unevaluated_operand push/pop for these EXPR codes in cp_walk_tree. I am awaiting a response to the copyright assignment request sent to ass...@gnu.org. Until that goes through, you can use DCO sign-off (https://gcc.gnu.org/dco.html). commit fe64ca143dfb9021b4c47abb6382827c13e1f0cd Author: Barrett Adair Date: Fri Aug 20 15:37:36 2021 -0500 Please generate your patch with git format-patch for easier application. It also looks like the patch was corrupted by your email client; a simple fix for that is to attach the patch instead of pasting it in the body of your message. Or 'git help format-patch' has suggestions for ways to make including the patch inline work. I mostly use git send-email myself, or attachments when using send-email would be too complicated. Fix cp_tree_equal for template value args using dependent sizeof/alignof/noexcept expressions diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c index 3c62dd74380..fce2a46cfa8 100644 --- a/gcc/cp/tree.c +++ b/gcc/cp/tree.c @@ -3903,40 +3903,41 @@ cp_tree_equal (tree t1, tree t2) as being equivalent to anything. */ if (VAR_P (o1) && DECL_NAME (o1) == NULL_TREE && !DECL_RTL_SET_P (o1)) /*Nop*/; else if (VAR_P (o2) && DECL_NAME (o2) == NULL_TREE && !DECL_RTL_SET_P (o2)) /*Nop*/; else if (!cp_tree_equal (o1, o2)) return false; return cp_tree_equal (TREE_OPERAND (t1, 1), TREE_OPERAND (t2, 1)); } case PARM_DECL: /* For comparing uses of parameters in late-specified return types with an out-of-class definition of the function, but can also come up for expressions that involve 'this' in a member function template. */ if (comparing_specializations + && !cp_unevaluated_operand This looks dangerous; I would expect it to mean that in template struct A { }; template void f(T t1) { A a; } template void g(T t2) { A a; } the two As are treated as equivalent; that's what the comparing_specializations code is trying to prevent. && DECL_CONTEXT (t1) != DECL_CONTEXT (t2)) /* When comparing hash table entries, only an exact match is good enough; we don't want to replace 'this' with the version from another function. But be more flexible with parameters with identical contexts. */ return false; if (same_type_p (TREE_TYPE (t1), TREE_TYPE (t2))) { if (DECL_ARTIFICIAL (t1) ^ DECL_ARTIFICIAL (t2)) return false; if (CONSTRAINT_VAR_P (t1) ^ CONSTRAINT_VAR_P (t2)) return false; if (DECL_ARTIFICIAL (t1) || (DECL_PARM_LEVEL (t1) == DECL_PARM_LEVEL (t2) && DECL_PARM_INDEX (t1) == DECL_PARM_INDEX (t2))) return true; } return false; @@ -3974,63 +3975,68 @@ cp_tree_equal (tree t1, tree t2) return true; case CONSTRAINT_INFO: return cp_tree_equal (CI_ASSOCIATED_CONSTRAINTS (t1), CI_ASSOCIATED_CONSTRAINTS (t2)); case CHECK_CONSTR: return (CHECK_CONSTR_CONCEPT (t1) == CHECK_CONSTR_CONCEPT (t2) && comp_template_args (CHECK_CONSTR_ARGS (t1), CHECK_CONSTR_ARGS (t2))); case TREE_VEC: /* These are template args. Really we should be getting the caller to do this as it knows it to be true. */ if (!comp_template_args (t1, t2, NULL, NULL, false)) return false; return true; case SIZEOF_EXPR: case ALIGNOF_EXPR: +case NOEXCEPT_EXPR: { tree o1 = TREE_OPERAND (t1, 0); tree o2 = TREE_OPERAND (t2, 0); if (code1 ==
[PATCH] improve note location and refactor warn_uninit
Richard, some time ago you mentioned you'd had trouble getting -Wuninitialized to print the note pointing to the uninitialized variable. I said I'd look into it, and so I did. The attached patch simplifies the warn_uninit() function to get rid of some redundant cruft: besides a few pointless null tests and a duplicate argument it also does away with ancient code that's responsible for the problem you saw. It turns out that tests for the problem are already in the test suite but have been xfailed since the day they were added, so the patch simply removes the xfails. I'll go ahead and commit this cleanup as obvious later this week unless you have suggestions for further changes. Tested on x86_64-linux. Martin Refactor warn_uninit() code, improve note location. Related: PR tree-optimization/17506 - warning about uninitialized variable points to wrong location PR testsuite/37182 - Revision 139286 caused gcc.dg/pr17506.c and gcc.dg/uninit-15.c gcc/ChangeLog: PR tree-optimization/17506 PR testsuite/37182 * tree-ssa-uninit.c (warn_uninit): Refactor and simplify. (warn_uninit_phi_uses): Remove argument from calls to warn_uninit. (warn_uninitialized_vars): Same. Reduce visibility of locals. (warn_uninitialized_phi): Same. gcc/testsuite/ChangeLog: PR tree-optimization/17506 PR testsuite/37182 * gcc.dg/diagnostic-tree-expr-ranges-2.c: Add expected output. * gcc.dg/uninit-15-O0.c: Remove xfail. * gcc.dg/uninit-15.c: Same. diff --git a/gcc/testsuite/gcc.dg/diagnostic-tree-expr-ranges-2.c b/gcc/testsuite/gcc.dg/diagnostic-tree-expr-ranges-2.c index 302e233a04a..1cbcad5ac7d 100644 --- a/gcc/testsuite/gcc.dg/diagnostic-tree-expr-ranges-2.c +++ b/gcc/testsuite/gcc.dg/diagnostic-tree-expr-ranges-2.c @@ -3,21 +3,25 @@ int test_uninit_1 (void) { - int result; - return result; /* { dg-warning "uninitialized" } */ -/* { dg-begin-multiline-output "" } - return result; - ^~ + int result_1; /* { dg-message "declared here" } */ + return result_1; /* { dg-warning "uninitialized" } */ + /* { dg-begin-multiline-output "" } + return result_1; + ^~~~ + int result_1; + ^~~~ { dg-end-multiline-output "" } */ } int test_uninit_2 (void) { - int result; - result += 3; /* { dg-warning "uninitialized" } */ -/* { dg-begin-multiline-output "" } - result += 3; - ~~~^~~~ + int result_2; /* { dg-message "declared here" } */ + result_2 += 3;/* { dg-warning "uninitialized" } */ + /* { dg-begin-multiline-output "" } + result_2 += 3; + ~^~~~ + int result_2; + ^~~~ { dg-end-multiline-output "" } */ - return result; + return result_2; } diff --git a/gcc/testsuite/gcc.dg/uninit-15-O0.c b/gcc/testsuite/gcc.dg/uninit-15-O0.c index 36d96348617..1ee1388 100644 --- a/gcc/testsuite/gcc.dg/uninit-15-O0.c +++ b/gcc/testsuite/gcc.dg/uninit-15-O0.c @@ -14,7 +14,7 @@ void baz(); void bar() { -int j; /* { dg-message "was declared here" {} { xfail *-*-* } } */ +int j; /* { dg-message "declared here" } */ for (; foo(j); ++j) /* { dg-warning "is used uninitialized" } */ baz(); } diff --git a/gcc/testsuite/gcc.dg/uninit-15.c b/gcc/testsuite/gcc.dg/uninit-15.c index 85cb116f349..7352662f627 100644 --- a/gcc/testsuite/gcc.dg/uninit-15.c +++ b/gcc/testsuite/gcc.dg/uninit-15.c @@ -20,7 +20,7 @@ void baz (void); void bar (void) { - int j; /* { dg-message "note: 'j' was declared here" "" { xfail *-*-* } } */ + int j;/* { dg-message "note: 'j' was declared here" } */ for (; foo (j); ++j) /* { dg-warning "'j' is used uninitialized" } */ baz (); } diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c index ad2cf48819b..3f9eab74e68 100644 --- a/gcc/tree-ssa-uninit.c +++ b/gcc/tree-ssa-uninit.c @@ -131,20 +131,19 @@ uninit_undefined_value_p (tree t) again for plain uninitialized variables, since optimization may have changed conditionally uninitialized to unconditionally uninitialized. */ -/* Emit a warning for EXPR based on variable VAR at the point in the - program T, an SSA_NAME, is used being uninitialized. The exact - warning text is in MSGID and DATA is the gimple stmt with info about - the location in source code. When DATA is a GIMPLE_PHI, PHIARG_IDX - gives which argument of the phi node to take the location from. WC - is the warning code. */ +/* Emit warning OPT for variable VAR at the point in the program where + the SSA_NAME T is being used uninitialized. The warning text is in + MSGID and STMT is the statement that does the uninitialized read. + PHI_ARG_LOC is the location of the PHI argument if T and VAR are one, + or UNKNOWN_LOCATION otherwise. */ static void -warn_uninit (enum opt_code wc, tree t, tree expr, tree var, - const char *gmsgid, void *data, location_t phiarg_loc) +warn_uninit (opt_code opt, tree t, tree var, const char *gmsgid, + gimple *context, location_t phi_arg_loc =
[PATCH] Generate XXSPLTIDP on power10.
Generate XXSPLTIDP on power10. This patch implements XXSPLTIDP support for SF and DF scalar constants and V2DF vector constants. The XXSPLTIDP instruction is given a 32-bit immediate that is converted to a vector of two DFmode constants. The immediate is in SFmode format, so only constants that fit as SFmode values can be loaded with XXSPLTIDP. I added a new constraint (eF) to match constants that can be loaded with the XXSPLTIDP instruction. I have added a temporary switch (-mxxspltidp) to control whether or not the XXSPLTIDP instruction is generated. I added 3 new tests to test loading up SF/DF scalar and V2DF vector constants. I have tested this with bootstrap compilers on power10 systems and there was no regression. I have built GCC with these patches on little endian power9 and big endian power8 systems, and there were no regressions. In addition, I have built and run the full Spec 2017 rate suite, comparing with the patches enabled and not enabled. There were roughly 66,000 XXSPLTIDP's generated in the rate build for Spec 2017. On a stand-alone system that is running single threaded, blender_r has a 1.9% increase in performance, and rest of the benchmarks are performance neutral. However, I would expect that in a real world scenario, switching to use XXSPLTIDP will increase performance due to removing all of the loads. Can I check this into the master branch? 2021-08-25 Michael Meissner gcc/ * config/rs6000/constraints.md (eF): New constraint. * config/rs6000/predicates.md (easy_fp_constant): If we can load the scalar constant with XXSPLTIDP, the floating point constant is easy. (xxspltidp_operand): New predicate. (easy_vector_constant): If we can generate XXSPLTIDP, mark the vector constant as easy. * config/rs6000/rs6000-protos.h (xxspltidp_constant_p): New declaration. (prefixed_permute_p): Likewise. * config/rs6000/rs6000.c (xxspltidp_constant_p): New function. (output_vec_const_move): Add support for XXSPLTIDP. (prefixed_permute_p): New function. * config/rs6000/rs6000.md (prefixed attribute): Add support for permute prefixed instructions. (movsf_hardfloat): Add XXSPLTIDP support. (mov_hardfloat32, FMOVE64 iterator): Likewise. (mov_hardfloat64, FMOVE64 iterator): Likewise. * config/rs6000/rs6000.opt (-mxxspltidp): New switch. * config/rs6000/vsx.md (vsx_move_64bit): Add XXSPLTIDP support. (vsx_move_32bit): Likewise. (vsx_splat_v2df_xxspltidp): New insn. (XXSPLTIDP): New mode iterator. (xxspltidp__internal): New insn and splits. (xxspltidp__inst): Replace xxspltidp_v2df_inst with an iterated form that also does SFmode, and DFmode. gcc/testsuite/ * gcc.target/powerpc/vec-splat-constant-sf.c: New test. * gcc.target/powerpc/vec-splat-constant-df.c: New test. * gcc.target/powerpc/vec-splat-constant-v2df.c: New test. --- gcc/config/rs6000/constraints.md | 5 + gcc/config/rs6000/predicates.md | 17 +++ gcc/config/rs6000/rs6000-protos.h | 2 + gcc/config/rs6000/rs6000.c| 106 ++ gcc/config/rs6000/rs6000.md | 45 +--- gcc/config/rs6000/rs6000.opt | 4 + gcc/config/rs6000/vsx.md | 64 ++- .../powerpc/vec-splat-constant-df.c | 60 ++ .../powerpc/vec-splat-constant-sf.c | 60 ++ .../powerpc/vec-splat-constant-v2df.c | 64 +++ 10 files changed, 405 insertions(+), 22 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md index c8cff1a3038..ea2e4a267c3 100644 --- a/gcc/config/rs6000/constraints.md +++ b/gcc/config/rs6000/constraints.md @@ -208,6 +208,11 @@ (define_constraint "P" (and (match_code "const_int") (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x1"))) +;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP +(define_constraint "eF" + "A vector constant that can be loaded with the XXSPLTIDP instruction." + (match_operand 0 "xxspltidp_operand")) + ;; 34-bit signed integer constant (define_constraint "eI" "A signed 34-bit integer constant if prefixed instructions are supported." diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 956e42bc514..134243e404b 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -601,6 +601,11 @@ (define_predicate "easy_fp_constant" if (TARGET_VSX && op == CONST0_RTX (mode)) return 1; + /* If we have the ISA 3.1
Re: [PATCH] avoid printing range table header alone
On 8/25/21 3:15 PM, Martin Sebor wrote: On 8/25/21 11:46 AM, Andrew MacLeod wrote: On 8/25/21 1:25 PM, Martin Sebor wrote: I noticed that the output of -fdump-tree-walloca (and probably all passes that call gimple_ranger::export_global_ranges()) includes the following two lines for all functions, even when there's nothing else of relevance after them: Exported global range table === I was a bit puzzled by it at first, wondering if it was leftover debugging output, until I checked the code and realized it's a header that may be (but isn't always) followed by the contents of the table. To reduce clutter in the dump the attached patch changes the function to print the header only when it is followed by at least one line of output. (Another tweak to reduce even further the amount of output printed by default that might be worth considering is to print the table only when TDF_DETAILS is set.) Martin Except checking x == 1 won't work if ssa_name(1) was not updated. Just set a flag if the header hasn't been printed yet. tried and true. I guess you can protect it behind details if you want. Doh! I've fixed that. While testing I came across another instance of the same pattern so I've made the same change there as well. I also took the liberty to make the output slightly more consistent between the two dumps. Attached is what I'm planning to commit. Please let me know if you have any other suggestions. OK, thats fine. FWIW, I see duplicate output in the EVRP1 dump that looks just like it could be the output under Non-varying global ranges. I wonder if it's the result of the block in ranger_cache::fill_block_cache guarded by if (DEBUG_RANGE_CACHE). Martin The cache debug mechanism hasnt been fully reworked yet... i moved GORI and the ranger to the trace class.. I haven't decided yet what I want to change in the cache. its a bit dated. regardless, if your sources are up to date, DEBUG_RANGE_CACHE is now guarded by: #define DEBUG_RANGE_CACHE (dump_file && (param_evrp_mode & EVRP_MODE_CACHE) \ == EVRP_MODE_CACHE) so you'd only see that output if you specify --param=evrp-mode=cache or debug Andrew
Re: [PATCH] avoid printing range table header alone
On 8/25/21 11:46 AM, Andrew MacLeod wrote: On 8/25/21 1:25 PM, Martin Sebor wrote: I noticed that the output of -fdump-tree-walloca (and probably all passes that call gimple_ranger::export_global_ranges()) includes the following two lines for all functions, even when there's nothing else of relevance after them: Exported global range table === I was a bit puzzled by it at first, wondering if it was leftover debugging output, until I checked the code and realized it's a header that may be (but isn't always) followed by the contents of the table. To reduce clutter in the dump the attached patch changes the function to print the header only when it is followed by at least one line of output. (Another tweak to reduce even further the amount of output printed by default that might be worth considering is to print the table only when TDF_DETAILS is set.) Martin Except checking x == 1 won't work if ssa_name(1) was not updated. Just set a flag if the header hasn't been printed yet. tried and true. I guess you can protect it behind details if you want. Doh! I've fixed that. While testing I came across another instance of the same pattern so I've made the same change there as well. I also took the liberty to make the output slightly more consistent between the two dumps. Attached is what I'm planning to commit. Please let me know if you have any other suggestions. FWIW, I see duplicate output in the EVRP1 dump that looks just like it could be the output under Non-varying global ranges. I wonder if it's the result of the block in ranger_cache::fill_block_cache guarded by if (DEBUG_RANGE_CACHE). Martin Avoid printing range table header alone. gcc/ChangeLog: * gimple-range-cache.cc (ssa_global_cache::dump): Avoid printing range table header alone. * gimple-range.cc (gimple_ranger::export_global_ranges): Same. diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc index 4138d0556c6..facf981c15d 100644 --- a/gcc/gimple-range-cache.cc +++ b/gcc/gimple-range-cache.cc @@ -628,20 +628,32 @@ ssa_global_cache::clear () void ssa_global_cache::dump (FILE *f) { - unsigned x; - int_range_max r; - fprintf (f, "Non-varying global ranges:\n"); - fprintf (f, "=:\n"); - for ( x = 1; x < num_ssa_names; x++) -if (gimple_range_ssa_p (ssa_name (x)) && - get_global_range (r, ssa_name (x)) && !r.varying_p ()) - { - print_generic_expr (f, ssa_name (x), TDF_NONE); - fprintf (f, " : "); - r.dump (f); - fprintf (f, "\n"); - } - fputc ('\n', f); + /* Cleared after the table header has been printed. */ + bool print_header = true; + for (unsigned x = 1; x < num_ssa_names; x++) +{ + int_range_max r; + if (gimple_range_ssa_p (ssa_name (x)) && + get_global_range (r, ssa_name (x)) && !r.varying_p ()) + { + if (print_header) + { + /* Print the header only when there's something else + to print below. */ + fprintf (f, "Non-varying global ranges:\n"); + fprintf (f, "==\n"); + print_header = false; + } + + print_generic_expr (f, ssa_name (x), TDF_NONE); + fprintf (f, " : "); + r.dump (f); + fprintf (f, "\n"); + } +} + + if (!print_header) +fputc ('\n', f); } // -- diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc index ef3afeacc90..fb23d567971 100644 --- a/gcc/gimple-range.cc +++ b/gcc/gimple-range.cc @@ -259,16 +259,11 @@ gimple_ranger::range_of_stmt (irange , gimple *s, tree name) void gimple_ranger::export_global_ranges () { - unsigned x; - int_range_max r; - if (dump_file) -{ - fprintf (dump_file, "Exported global range table\n"); - fprintf (dump_file, "===\n"); -} - - for ( x = 1; x < num_ssa_names; x++) + /* Cleared after the table header has been printed. */ + bool print_header = true; + for (unsigned x = 1; x < num_ssa_names; x++) { + int_range_max r; tree name = ssa_name (x); if (name && !SSA_NAME_IN_FREE_LIST (name) && gimple_range_ssa_p (name) @@ -276,21 +271,29 @@ gimple_ranger::export_global_ranges () && !r.varying_p()) { bool updated = update_global_range (r, name); + if (!updated || !dump_file || !(dump_flags & TDF_DETAILS)) + continue; - if (updated && dump_file) + if (print_header) { - value_range vr = r; - print_generic_expr (dump_file, name , TDF_SLIM); - fprintf (dump_file, " --> "); - vr.dump (dump_file); + /* Print the header only when there's something else + to print below. */ + fprintf (dump_file, "Exported global range table:\n"); + fprintf (dump_file, "\n"); + print_header = false; + } + + value_range vr = r; + print_generic_expr (dump_file, name , TDF_SLIM); + fprintf (dump_file, " : "); + vr.dump (dump_file); + fprintf (dump_file, "\n"); +
*PING* – Re: [Patch] Fortran: Fix Bind(C) char-len check, add ptr-contiguous check
Early *PING*. (I also should still review several Fortan patches... There are lots of patches waiting for review :-/) On 20.08.21 19:24, Tobias Burnus wrote: The following is about interoperability (BIND(C)) only. * The patch adds a missing check for pointer + contiguous. (Rejected to avoid copy-in issues? Or checking issues?) * And it corrects an issue regarding len > 1 characters. While subroutine foo(x) character(len=2) :: x(*) is valid Fortran code (the argument can be "abce" or ['a','b','c','d'] or ...) – and would work also with bind(C) as the len=2 does not need to be passed as hidden argument as len is a constant. However, it is not valid nonetheless. OK? Comments? Tobias PS: Referenced locations in the standard (F2018): C1554 If proc-language-binding-spec is specified for a procedure, each of its dummy arguments shall be an interoperable procedure (18.3.6) or a variable that is interoperable (18.3.4, 18.3.5), assumed-shape, assumed-rank, assumed-type, of type CHARACTER with assumed length, or that has the ALLOCATABLE or POINTER attribute. 18.3.1: "... If the type is character, the length type parameter is interoperable if and only if its value is one. ..." "18.3.4 Interoperability of scalar variables": "... A named scalar Fortran variable is interoperable ... if it is of type character12its length is not assumed or declared by an expression that is not a constant expression." 18.3.5: Likewise but for arrays. 18.3.6 "... Fortran procedure interface is interoperable with a C function prototype ..." "(5) any dummy argument without the VALUE attribute corresponds to a formal parameter of the prototype that is of a pointer type, and either • the dummy argument is interoperable with an entity of the referenced type ..." (Remark: those are passed as byte stream) "• the dummy argument is a nonallocatable nonpointer variable of type CHARACTER with assumed character length and the formal parameter is a pointer to CFI_cdesc_t, • the dummy argument is allocatable, assumed-shape, assumed-rank, or a pointer without the CONTIGUOUS attribute, and the formal parameter is a pointer to CFI_cdesc_t, or (Remark: those two use an array descriptor, also for explicit-size/assumed-size arrays or for scalars.) • the dummy argument is assumed-type ..." - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Re: [PATCH] configure: Allow a host makefile fragment to override PIE flag settings.
> On 25 Aug 2021, at 19:22, H.J. Lu wrote: > > On Wed, Aug 25, 2021 at 11:10 AM Iain Sandoe wrote: >> >> >> >>> On 25 Aug 2021, at 18:56, H.J. Lu via Gcc-patches >>> wrote: >>> >>> On Wed, Aug 25, 2021 at 10:51 AM H.J. Lu wrote: On Wed, Aug 25, 2021 at 10:42 AM Iain Sandoe wrote: > > Hi, > >> On 20 Aug 2021, at 11:29, Richard Sandiford >> wrote: >> Maybe it would be easier to have the makefile fragments determine something like CODE_MODEL_CFLAGS, which can be "-fPIC", "-mdynamic-no-pic", etc., and use: COMPILER += $(NO_PIE_CFLAGS) $(CODE_MODEL_CFLAGS) >>> >>> OK. I have misgivings about this - the problem is that: >>> >>> -fPIC -fno-PIE != -fno-PIE -fPIC, which is not obvious to many folks - >>> who expect that >>> the “last edition of a flag will be the one in force”. >>> >>> So the PIE-ness and the PIC-ness are decoupled in the configury but >>> they need to be >>> ordered specifically for targets that want PIC code by default (FWIW, I >>> don’t think Darwin >>> is the only default-PIC case here, from discussions on irc). >> >> Yeah, that's what the above was supposed to achieve. In other words, >> if you force non-PIE, you also need to follow that by >> $(CODE_MODEL_CFLAGS), >> which restates whatever the base code model is. >> >> If it's the decoupling you're worried about, then an alternative would >> be to have: >> >> NO_PIE_CFLAGS="-fno-PIE \$(CODE_MODEL_CFLAGS)” > > I’d like to ask a couple of questions (of HJ who introduced the no-PIE > logic) before implementing this. > > A. We use no-PIE for cc1* because that is needed to handle the PCH > implementation (which relies on the executables being loaded at the same > addresses each time). > > B. It’s certainly not obvious to me why we need to build code to run on > $build to be no-PIE - I don’t see any such dependencies in the generators > etc. > > - So Question1 - HJ what was the motivation for making the XXX_BUILD_XXX > adopt no-PIE? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934 > —— > > Independently of this we seem to be building the objects for $host thus: > > $(CXX) (NO_PIE_CFLAGS) -c $(ALL_CXXFLAGS) etc. > > but we build for $build thus: > > $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE > $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS) > > which means that code model flags in $ALL_CXXFLAGS are overridden for > $build, but active for $host > ^^ this is actually what causes the Darwin build fail - since on Darwin > we cannot build static linked code for user-space processes. > > in any event that’s inconsistent (unless there’s a reason that it should > be different). > > > > below are extracts from gcc/Makefile *on linux* which demonstrates the > different ordering. > > AFAICT, > NO_PIE_CFLAGS_FOR_BUILD, NO_PIE_FLAG_FOR_BUILD are dead variables? >> >> ^^ what was the intention for these? >> > Question 2 : HJ, what was your intention for how a configuration would > request PIC code (for example) for things to run on $build? >>> >>> We need to fix >>> >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934 >> >> The need for no-PIE for the exectuables that consume PCH is completely clear >> (but a target can choose not to use PCH - and configure >> —disable-libstdcxx-pch). >> >> The PR doesn’t explain: >> >> 1. why it’s being enabled for the generators (and other $build code) which >> do not use PCH (AFAIK) >> >> 2. why the flags ordering is different for $build and $host. >> >> I am completely happy to make the fix Richard suggested - but we seem to be >> adding complexity rather than simplifying things; as noted in the PR there >> are targets that cannot use no-PIE and therefore have to disable PCH anyway. > > Please follow: > > https://gcc.gnu.org/pipermail/gcc-patches/2015-October/432180.html I don’t see answers to the questions above in that thead or the PR - there is nothing so far in this that explains why no-PIE has been forced on for the stuff in gcc/build but OK.. I’ll look at implementing the patch suggested by Richard, thanks Iain
Re: [PATCH] configure: Allow a host makefile fragment to override PIE flag settings.
On Wed, Aug 25, 2021 at 11:10 AM Iain Sandoe wrote: > > > > > On 25 Aug 2021, at 18:56, H.J. Lu via Gcc-patches > > wrote: > > > > On Wed, Aug 25, 2021 at 10:51 AM H.J. Lu wrote: > >> > >> On Wed, Aug 25, 2021 at 10:42 AM Iain Sandoe > >> wrote: > >>> > >>> Hi, > >>> > On 20 Aug 2021, at 11:29, Richard Sandiford > wrote: > > >> Maybe it would be easier to have the makefile fragments determine > >> something like CODE_MODEL_CFLAGS, which can be "-fPIC", > >> "-mdynamic-no-pic", > >> etc., and use: > >> > >> COMPILER += $(NO_PIE_CFLAGS) $(CODE_MODEL_CFLAGS) > > > > OK. I have misgivings about this - the problem is that: > > > > -fPIC -fno-PIE != -fno-PIE -fPIC, which is not obvious to many folks - > > who expect that > > the “last edition of a flag will be the one in force”. > > > > So the PIE-ness and the PIC-ness are decoupled in the configury but > > they need to be > > ordered specifically for targets that want PIC code by default (FWIW, I > > don’t think Darwin > > is the only default-PIC case here, from discussions on irc). > > Yeah, that's what the above was supposed to achieve. In other words, > if you force non-PIE, you also need to follow that by > $(CODE_MODEL_CFLAGS), > which restates whatever the base code model is. > > If it's the decoupling you're worried about, then an alternative would > be to have: > > NO_PIE_CFLAGS="-fno-PIE \$(CODE_MODEL_CFLAGS)” > >>> > >>> I’d like to ask a couple of questions (of HJ who introduced the no-PIE > >>> logic) before implementing this. > >>> > >>> A. We use no-PIE for cc1* because that is needed to handle the PCH > >>> implementation (which relies on the executables being loaded at the same > >>> addresses each time). > >>> > >>> B. It’s certainly not obvious to me why we need to build code to run on > >>> $build to be no-PIE - I don’t see any such dependencies in the generators > >>> etc. > >>> > >>> - So Question1 - HJ what was the motivation for making the XXX_BUILD_XXX > >>> adopt no-PIE? > >> > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934 > >> > >>> —— > >>> > >>> Independently of this we seem to be building the objects for $host thus: > >>> > >>> $(CXX) (NO_PIE_CFLAGS) -c $(ALL_CXXFLAGS) etc. > >>> > >>> but we build for $build thus: > >>> > >>> $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE > >>> $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS) > >>> > >>> which means that code model flags in $ALL_CXXFLAGS are overridden for > >>> $build, but active for $host > >>> ^^ this is actually what causes the Darwin build fail - since on Darwin > >>> we cannot build static linked code for user-space processes. > >>> > >>> in any event that’s inconsistent (unless there’s a reason that it should > >>> be different). > >>> > >>> > >>> > >>> below are extracts from gcc/Makefile *on linux* which demonstrates the > >>> different ordering. > >>> > >>> AFAICT, > >>> NO_PIE_CFLAGS_FOR_BUILD, NO_PIE_FLAG_FOR_BUILD are dead variables? > > ^^ what was the intention for these? > > >>> Question 2 : HJ, what was your intention for how a configuration would > >>> request PIC code (for example) for things to run on $build? > > > > We need to fix > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934 > > The need for no-PIE for the exectuables that consume PCH is completely clear > (but a target can choose not to use PCH - and configure > —disable-libstdcxx-pch). > > The PR doesn’t explain: > > 1. why it’s being enabled for the generators (and other $build code) which do > not use PCH (AFAIK) > > 2. why the flags ordering is different for $build and $host. > > I am completely happy to make the fix Richard suggested - but we seem to be > adding complexity rather than simplifying things; as noted in the PR there > are targets that cannot use no-PIE and therefore have to disable PCH anyway. Please follow: https://gcc.gnu.org/pipermail/gcc-patches/2015-October/432180.html -- H.J.
Re: [PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.
On Wed, Aug 25, 2021 at 12:44:16PM -0500, Segher Boessenkool wrote: > Hi Mike, > > On Wed, Aug 25, 2021 at 12:37:14PM -0400, Michael Meissner wrote: > > I noticed that the built-functions for xxspltiw, xxspltidp, xxsplti32dx, > > xxpermx, and xxeval all used the 'vecsimple' type. These instructions are > > permute instructions (3 cycle latency) and should use 'vecperm' instead. > > They are all executed on the PM pipe currently, yup. If this changes > later we'll have to fix it, but that is for then :-) > > > While I was at it, I changed the UNSPEC name for xxspltidp to be > > UNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID. > > In the future please do separate things as separate patches. > > > * config/rs6000/vsx.md (UNSPEC_XXSPLTIDP): Rename from > > UNSPEC_XXSPLTID. > > * config/rs6000/vsx.md (UNSPEC_XXSPLTID): Rename to... > (UNSPEC_XXSPLTIDP): ... this. > > > (xxspltidp_v2df): Use vecperm type attribute. Use > > UUNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID. > > Typo ("UU"). > > Okay for trunk with those trivial fixes. Also okay for backport to 11, > it is trivial enough. Thanks! Thanks. > Out of interest, did you notice any scheduling differences with this? I don't use the built-ins so I wouldn't notice a difference. I noticed this as part of the next patch to add support for XXSPLTIDP (and ultimately XXSPLTIW in a future patch). The XXSPLTIDP instruction allows loading up many SFmode, DFmode, and V2DFmode constants. The XXSPLTIW instruction allows loading up certain V16QImode, V8HImode, V4SImode, and V4SFmode constants. I'm trying to iron out the slow-downs, and I wanted the scheduler to know it needed to add insns between the XXSPLTIDP to load the constant and its use if it can. Right now, I'm seeing a slight boost in blender_r with XXSPLTIDP (over doing a load). However, I suspect if you aren't running spec on an otherwise idle machine, things will change where XXSPLTIDP will be more of a win by eliminating the loads. While XXSPLTIDP by itself is positive, unfortunately, there is a regression in cactuBSSN_r (3%) when I add XXSPLTIW (but not XXSPLTIDP) that I'm trying to track down. If I add both instructions, several of the benchmarks improve (including xalancbmk by 11% and x264_r by 27%), but cactuBSSN_r has the 3% regression and fotonik3d_r also has a new 3% regression. Given that many more programs use floating point constants than vector constants (66,000 XXSPLTID's created vs. 5,000 XXSPLTIW's), I figure to push the XXSPLTIDP now, and try to figure out the differences before submitting the XXSPLTIW patch. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] configure: Allow a host makefile fragment to override PIE flag settings.
> On 25 Aug 2021, at 18:56, H.J. Lu via Gcc-patches > wrote: > > On Wed, Aug 25, 2021 at 10:51 AM H.J. Lu wrote: >> >> On Wed, Aug 25, 2021 at 10:42 AM Iain Sandoe wrote: >>> >>> Hi, >>> On 20 Aug 2021, at 11:29, Richard Sandiford wrote: >> Maybe it would be easier to have the makefile fragments determine >> something like CODE_MODEL_CFLAGS, which can be "-fPIC", >> "-mdynamic-no-pic", >> etc., and use: >> >> COMPILER += $(NO_PIE_CFLAGS) $(CODE_MODEL_CFLAGS) > > OK. I have misgivings about this - the problem is that: > > -fPIC -fno-PIE != -fno-PIE -fPIC, which is not obvious to many folks - > who expect that > the “last edition of a flag will be the one in force”. > > So the PIE-ness and the PIC-ness are decoupled in the configury but they > need to be > ordered specifically for targets that want PIC code by default (FWIW, I > don’t think Darwin > is the only default-PIC case here, from discussions on irc). Yeah, that's what the above was supposed to achieve. In other words, if you force non-PIE, you also need to follow that by $(CODE_MODEL_CFLAGS), which restates whatever the base code model is. If it's the decoupling you're worried about, then an alternative would be to have: NO_PIE_CFLAGS="-fno-PIE \$(CODE_MODEL_CFLAGS)” >>> >>> I’d like to ask a couple of questions (of HJ who introduced the no-PIE >>> logic) before implementing this. >>> >>> A. We use no-PIE for cc1* because that is needed to handle the PCH >>> implementation (which relies on the executables being loaded at the same >>> addresses each time). >>> >>> B. It’s certainly not obvious to me why we need to build code to run on >>> $build to be no-PIE - I don’t see any such dependencies in the generators >>> etc. >>> >>> - So Question1 - HJ what was the motivation for making the XXX_BUILD_XXX >>> adopt no-PIE? >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934 >> >>> —— >>> >>> Independently of this we seem to be building the objects for $host thus: >>> >>> $(CXX) (NO_PIE_CFLAGS) -c $(ALL_CXXFLAGS) etc. >>> >>> but we build for $build thus: >>> >>> $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE >>> $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS) >>> >>> which means that code model flags in $ALL_CXXFLAGS are overridden for >>> $build, but active for $host >>> ^^ this is actually what causes the Darwin build fail - since on Darwin we >>> cannot build static linked code for user-space processes. >>> >>> in any event that’s inconsistent (unless there’s a reason that it should be >>> different). >>> >>> >>> >>> below are extracts from gcc/Makefile *on linux* which demonstrates the >>> different ordering. >>> >>> AFAICT, >>> NO_PIE_CFLAGS_FOR_BUILD, NO_PIE_FLAG_FOR_BUILD are dead variables? ^^ what was the intention for these? >>> Question 2 : HJ, what was your intention for how a configuration would >>> request PIC code (for example) for things to run on $build? > > We need to fix > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934 The need for no-PIE for the exectuables that consume PCH is completely clear (but a target can choose not to use PCH - and configure —disable-libstdcxx-pch). The PR doesn’t explain: 1. why it’s being enabled for the generators (and other $build code) which do not use PCH (AFAIK) 2. why the flags ordering is different for $build and $host. I am completely happy to make the fix Richard suggested - but we seem to be adding complexity rather than simplifying things; as noted in the PR there are targets that cannot use no-PIE and therefore have to disable PCH anyway. thanks Iain > > first. > > H.J. >>> thanks >>> Iain >>> >>> --- >>> >>> >>> ALL_CXXFLAGS = $(T_CFLAGS) $(CFLAGS-$@) $(CXXFLAGS) $(INTERNAL_CFLAGS) \ >>> $(COVERAGE_FLAGS) $(ALIASING_FLAGS) $(NOEXCEPTION_FLAGS) \ >>> >>> -- >>> # Native compiler for the build machine and its switches. >>> CC_FOR_BUILD = $(CC) >>> CXX_FOR_BUILD = $(CXX) >>> NO_PIE_CFLAGS_FOR_BUILD = >>> NO_PIE_FLAG_FOR_BUILD = >>> BUILD_CFLAGS= $(ALL_CFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE >>> BUILD_CXXFLAGS = $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE >>> BUILD_NO_PIE_CFLAGS = $(NO_PIE_CFLAGS) >>> BUILD_CFLAGS += $(BUILD_NO_PIE_CFLAGS) >>> BUILD_CXXFLAGS += $(BUILD_NO_PIE_CFLAGS) >>> >>> # Native compiler that we use. This may be C++ some day. >>> COMPILER_FOR_BUILD = $(CXX_FOR_BUILD) >>> BUILD_COMPILERFLAGS = $(BUILD_CXXFLAGS) >>> >>> # Native linker that we use. >>> LINKER_FOR_BUILD = $(CXX_FOR_BUILD) >>> BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS) >>> >>> # Native linker and preprocessor flags. For x-fragment overrides. >>> BUILD_LDFLAGS=$(LDFLAGS) >>> BUILD_NO_PIE_FLAG = $(NO_PIE_FLAG) >>> BUILD_LDFLAGS += $(BUILD_NO_PIE_FLAG) >>> BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \ >>>-I$(srcdir)/../include $(CPPINC)
Re: [PATCH] diagnostics: Fix sporadic test failure
On Sat, May 29, 2021 at 1:03 PM Jeff Law via Gcc-patches wrote: > > > > On 5/29/2021 1:55 PM, Bernd Edlinger wrote: > > > > On 5/29/21 9:31 PM, Jeff Law wrote: > >> > >> On 5/28/2021 6:38 AM, Bernd Edlinger wrote: > >>> Hi, > >>> > >>> it turns out to be reproducible this way: > >>> > >>> COLUMNS=80 make check-gcc-c RUNTESTFLAGS="plugin.exp=diagnostic*" > >>> > >>> Running /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/plugin/plugin.exp ... > >>> FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c > >>>-fplugin=./diagnostic_plugin_test_tree_expression_range.so 1 blank > >>> line(s) in output > >>> FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c > >>>-fplugin=./diagnostic_plugin_test_tree_expression_range.so expected > >>> multiline pattern lines 550-551 not found: " > >>> __builtin_types_compatible_p \(long, int\) \+ f \(i\)\);.*\n > >>> ~\^~~\n" > >>> FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c > >>>-fplugin=./diagnostic_plugin_test_tree_expression_range.so (test for > >>> excess errors) > >>> > >>> a lot more errors happen with COLUMNS=20. > >>> > >>> Tested on x86_64-pc-linux-gnu. > >>> Is it OK for trunk? > >>> > >>> > >>> Thanks > >>> Bernd. > >>> > >>> > >>> 2021-05-28 Bernd Edlinger > >>> > >>> * gcc.dg/plugin/diagnostic_plugin_show_trees.c (plugin_init): Fix > >>> caret_max_with. > >>> * gcc.dg/plugin/diagnostic_plugin_test_inlining.c > >>> (plugin_init): Likewise. > >>> * gcc.dg/plugin/diagnostic_plugin_test_paths.c (plugin_init): > >>> Likewise. > >>> * gcc.dg/plugin/diagnostic_plugin_test_string_literals.c > >>> (plugin_init): Likewise. > >>> * gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c > >>> (plugin_init): Likewise. > >> So while you've got a patch here, you haven't indicated what the problem > >> actually was. I'm guessing caret_max_width was uninitialized and thus we > >> got random-ish values. Presumably those randomish-values are what caused > >> tests to occasionally appear to truncate their output and fail? > >> > >> If that's the case, this is fine. If it's something deeper, please > >> provide a bit of background to help in evaluating the patch. > >> > > Yes, the problem is just the function get_terminal_width in diagnostic.c, > > can > > somehow learn the X-terminal's window dimensions where the make check is > > started: > > > > int > > get_terminal_width (void) > > { > >const char * s = getenv ("COLUMNS"); > >if (s != NULL) { > > int n = atoi (s); > > if (n > 0) > >return n; > >} > > > > #ifdef TIOCGWINSZ > >struct winsize w; > >w.ws_col = 0; > >if (ioctl (0, TIOCGWINSZ, ) == 0 && w.ws_col > 0) > > return w.ws_col; > > #endif > > > >return INT_MAX; > > } > > > > and this is used to initialize context->caret_max_width in > > diagnostic_set_caret_max_width, > > and possibly also other places. This causes a small variation in the output > > that > > is trips the test cases. It is just an extra newline in some cases, that I > > have not > > debugged why exactly this happens, but I assume this is intentional to make > > the > > diagnostics spanning multiple lines better readable to a human. > Thanks. So for the testsuite we certainly don't want to be doing that > :-) OK for the trunk, thanks for chasing it down -- I've been seeing > these failures, but haven't had the time to chase down a root cause. > I'd like to backport it to GCC 11 branch to avoid random failures on GCC 11 branch: https://gcc.gnu.org/pipermail/gcc-regression/2021-August/075244.html Thanks. -- H.J.
Merge stores/loads in modref summaries
Hi, this patch adds logic needed to merge neighbouring accesses in ipa-modref summaries. This helps analyzing array initializers and similar code. It is bit of work, since it breaks the fact that modref tree makes a good lattice for dataflow: the access ranges can be extended indefinitely. For this reason I added counter tracking number of adjustments and a cap to limit them during the dataflow. This triggers in: void recurse (char *p, int n) { *p = 0; if (n) recurse (p+1,n-1); } Where we work now hard enugh to determine: access: Parm 0 param offset:0 offset:0 size:8 max_size:-1 adjusted 8 times which is correct access info saying that param0 can be accessed from byte 0 in 8bit accesses with unknown max_size. --param max-modref-accesses is now hit 8 times instead of 45 before the patch. We hit --param param=modref-max-adjustments once for fft algorithm (where the recursion really seems to walk array) and max-bases once for late modref and 9 times for IPA modref (it works on types rather than alias sets so it is more likely to hit the limit). I would be happy for suggestions how to simplify the merging logic. It is bit convoluted since I need to know if I am going to adjust the range and need to deal with poly_ints and possibly unknown sizes. Incrementally I will add logic on improving behaviour when limits are met instead of just giving up on the analysis. With the patch I get following cc1plus stats: Alias oracle query stats: refs_may_alias_p: 83135089 disambiguations, 101581194 queries ref_maybe_used_by_call_p: 590484 disambiguations, 84157326 queries call_may_clobber_ref_p: 345434 disambiguations, 349295 queries nonoverlapping_component_refs_p: 0 disambiguations, 39520 queries nonoverlapping_refs_since_match_p: 33266 disambiguations, 66411 must overlaps, 100574 queries aliasing_component_refs_p: 66251 disambiguations, 9920037 queries TBAA oracle: 31033174 disambiguations 93485041 queries 14359693 are in alias set 0 11930606 queries asked about the same object 129 queries asked about the same alias set 0 access volatile 34218393 are dependent in the DAG 1943046 are aritificially in conflict with void * Modref stats: modref use: 26293 disambiguations, 705198 queries modref clobber: 1828340 disambiguations, 21213011 queries 4748965 tbaa queries (0.223870 per modref query) 711083 base compares (0.033521 per modref query) PTA query stats: pt_solution_includes: 13119524 disambiguations, 33183481 queries pt_solutions_intersect: 1510541 disambiguations, 15368102 queries this would suggest quite large improvement over my last run https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577962.html (about 12% on overall disambiguation count) but I also updated my setup so part of the increase may be accounted for diferent libraries. The overall size of modref access lists is about halved on cc1plus that looks promising though. It may be that we less often hit the limit on number of querries done in tree-ssa-alias. Bootstrapped/regtested x86_64-linux. I plan to commit the patch after bit more testing. gcc/ChangeLog: * doc/invoke.texi: Document --param modref-max-adjustments * ipa-modref-tree.c (test_insert_search_collapse): Update. (test_merge): Update. * ipa-modref-tree.h (struct modref_access_node): Add adjustments; (modref_access_node::operator==): Fix handling of access ranges. (modref_access_node::contains): Constify parameter; handle also mismatched parm offsets. (modref_access_node::update): New function. (modref_access_node::merge0: New function. (unspecified_modref_access_node): Update constructor. (modref_ref_node::insert_access): Add record_adjustments parameter; handle merging. (modref_ref_node::try_merge_with): New private function. (modref_tree::insert): New record_adjustments parameter. (modref_tree::merge): New record_adjustments parameter. (modref_tree::copy_from): Update. * ipa-modref.c (dump_access): Dump adjustments field. (get_access): Update constructor. (record_access): Update call of insert. (record_access_lto): Update call of insert. (merge_call_side_effects): Add record_adjustments parameter. (get_access_for_fnspec): Update. (process_fnspec): Update. (analyze_call): Update. (analyze_function): Update. (read_modref_records): Update. (ipa_merge_modref_summary_after_inlining): Update. (propagate_unknown_call): Update. (modref_propagate_in_scc): Update. * params.opt: (param-max-modref-adjustments=): New. gcc/testsuite/ChangeLog: * gcc.dg/ipa/modref-1.c: Update testcase. * gcc.dg/tree-ssa/modref-4.c: Update testcase. * gcc.dg/tree-ssa/modref-8.c: New test. diff --git
Re: [PATCH] configure: Allow a host makefile fragment to override PIE flag settings.
On Wed, Aug 25, 2021 at 10:51 AM H.J. Lu wrote: > > On Wed, Aug 25, 2021 at 10:42 AM Iain Sandoe wrote: > > > > Hi, > > > > > On 20 Aug 2021, at 11:29, Richard Sandiford > > > wrote: > > > > > >>> Maybe it would be easier to have the makefile fragments determine > > >>> something like CODE_MODEL_CFLAGS, which can be "-fPIC", > > >>> "-mdynamic-no-pic", > > >>> etc., and use: > > >>> > > >>> COMPILER += $(NO_PIE_CFLAGS) $(CODE_MODEL_CFLAGS) > > >> > > >> OK. I have misgivings about this - the problem is that: > > >> > > >> -fPIC -fno-PIE != -fno-PIE -fPIC, which is not obvious to many folks - > > >> who expect that > > >> the “last edition of a flag will be the one in force”. > > >> > > >> So the PIE-ness and the PIC-ness are decoupled in the configury but they > > >> need to be > > >> ordered specifically for targets that want PIC code by default (FWIW, I > > >> don’t think Darwin > > >> is the only default-PIC case here, from discussions on irc). > > > > > > Yeah, that's what the above was supposed to achieve. In other words, > > > if you force non-PIE, you also need to follow that by > > > $(CODE_MODEL_CFLAGS), > > > which restates whatever the base code model is. > > > > > > If it's the decoupling you're worried about, then an alternative would > > > be to have: > > > > > > NO_PIE_CFLAGS="-fno-PIE \$(CODE_MODEL_CFLAGS)” > > > > I’d like to ask a couple of questions (of HJ who introduced the no-PIE > > logic) before implementing this. > > > > A. We use no-PIE for cc1* because that is needed to handle the PCH > > implementation (which relies on the executables being loaded at the same > > addresses each time). > > > > B. It’s certainly not obvious to me why we need to build code to run on > > $build to be no-PIE - I don’t see any such dependencies in the generators > > etc. > > > > - So Question1 - HJ what was the motivation for making the XXX_BUILD_XXX > > adopt no-PIE? > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934 > > > —— > > > > Independently of this we seem to be building the objects for $host thus: > > > > $(CXX) (NO_PIE_CFLAGS) -c $(ALL_CXXFLAGS) etc. > > > > but we build for $build thus: > > > > $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE > > $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS) > > > > which means that code model flags in $ALL_CXXFLAGS are overridden for > > $build, but active for $host > > ^^ this is actually what causes the Darwin build fail - since on Darwin we > > cannot build static linked code for user-space processes. > > > > in any event that’s inconsistent (unless there’s a reason that it should be > > different). > > > > > > > > below are extracts from gcc/Makefile *on linux* which demonstrates the > > different ordering. > > > > AFAICT, > > NO_PIE_CFLAGS_FOR_BUILD, NO_PIE_FLAG_FOR_BUILD are dead variables? > > > > Question 2 : HJ, what was your intention for how a configuration would > > request PIC code (for example) for things to run on $build? We need to fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934 first. H.J. > > thanks > > Iain > > > > --- > > > > > > ALL_CXXFLAGS = $(T_CFLAGS) $(CFLAGS-$@) $(CXXFLAGS) $(INTERNAL_CFLAGS) \ > > $(COVERAGE_FLAGS) $(ALIASING_FLAGS) $(NOEXCEPTION_FLAGS) \ > > > > -- > > # Native compiler for the build machine and its switches. > > CC_FOR_BUILD = $(CC) > > CXX_FOR_BUILD = $(CXX) > > NO_PIE_CFLAGS_FOR_BUILD = > > NO_PIE_FLAG_FOR_BUILD = > > BUILD_CFLAGS= $(ALL_CFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE > > BUILD_CXXFLAGS = $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE > > BUILD_NO_PIE_CFLAGS = $(NO_PIE_CFLAGS) > > BUILD_CFLAGS += $(BUILD_NO_PIE_CFLAGS) > > BUILD_CXXFLAGS += $(BUILD_NO_PIE_CFLAGS) > > > > # Native compiler that we use. This may be C++ some day. > > COMPILER_FOR_BUILD = $(CXX_FOR_BUILD) > > BUILD_COMPILERFLAGS = $(BUILD_CXXFLAGS) > > > > # Native linker that we use. > > LINKER_FOR_BUILD = $(CXX_FOR_BUILD) > > BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS) > > > > # Native linker and preprocessor flags. For x-fragment overrides. > > BUILD_LDFLAGS=$(LDFLAGS) > > BUILD_NO_PIE_FLAG = $(NO_PIE_FLAG) > > BUILD_LDFLAGS += $(BUILD_NO_PIE_FLAG) > > BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \ > > -I$(srcdir)/../include $(CPPINC) $(CPPFLAGS) > > -- > > # This is the variable actually used when we compile. If you change this, > > # you probably want to update BUILD_CFLAGS in configure.ac > > ALL_CFLAGS = $(T_CFLAGS) $(CFLAGS-$@) \ > > > > build/%.o : # dependencies provided by explicit rule later > > $(COMPILER_FOR_BUILD) -c $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) \ > > -o $@ $< > > > > this has > > $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE > > $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS) > > NO_PIE_CFLAGS_FOR_BUILD is apparently ignored > > > > # Rule for the generator programs: > > $(genprog:%=build/gen%$(build_exeext)): build/gen%$(build_exeext): > > build/gen%.o $(BUILD_LIBDEPS) >
Re: [PATCH] configure: Allow a host makefile fragment to override PIE flag settings.
On Wed, Aug 25, 2021 at 10:42 AM Iain Sandoe wrote: > > Hi, > > > On 20 Aug 2021, at 11:29, Richard Sandiford > > wrote: > > > >>> Maybe it would be easier to have the makefile fragments determine > >>> something like CODE_MODEL_CFLAGS, which can be "-fPIC", > >>> "-mdynamic-no-pic", > >>> etc., and use: > >>> > >>> COMPILER += $(NO_PIE_CFLAGS) $(CODE_MODEL_CFLAGS) > >> > >> OK. I have misgivings about this - the problem is that: > >> > >> -fPIC -fno-PIE != -fno-PIE -fPIC, which is not obvious to many folks - > >> who expect that > >> the “last edition of a flag will be the one in force”. > >> > >> So the PIE-ness and the PIC-ness are decoupled in the configury but they > >> need to be > >> ordered specifically for targets that want PIC code by default (FWIW, I > >> don’t think Darwin > >> is the only default-PIC case here, from discussions on irc). > > > > Yeah, that's what the above was supposed to achieve. In other words, > > if you force non-PIE, you also need to follow that by $(CODE_MODEL_CFLAGS), > > which restates whatever the base code model is. > > > > If it's the decoupling you're worried about, then an alternative would > > be to have: > > > > NO_PIE_CFLAGS="-fno-PIE \$(CODE_MODEL_CFLAGS)” > > I’d like to ask a couple of questions (of HJ who introduced the no-PIE logic) > before implementing this. > > A. We use no-PIE for cc1* because that is needed to handle the PCH > implementation (which relies on the executables being loaded at the same > addresses each time). > > B. It’s certainly not obvious to me why we need to build code to run on > $build to be no-PIE - I don’t see any such dependencies in the generators etc. > > - So Question1 - HJ what was the motivation for making the XXX_BUILD_XXX > adopt no-PIE? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934 > —— > > Independently of this we seem to be building the objects for $host thus: > > $(CXX) (NO_PIE_CFLAGS) -c $(ALL_CXXFLAGS) etc. > > but we build for $build thus: > > $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE > $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS) > > which means that code model flags in $ALL_CXXFLAGS are overridden for $build, > but active for $host > ^^ this is actually what causes the Darwin build fail - since on Darwin we > cannot build static linked code for user-space processes. > > in any event that’s inconsistent (unless there’s a reason that it should be > different). > > > > below are extracts from gcc/Makefile *on linux* which demonstrates the > different ordering. > > AFAICT, > NO_PIE_CFLAGS_FOR_BUILD, NO_PIE_FLAG_FOR_BUILD are dead variables? > > Question 2 : HJ, what was your intention for how a configuration would > request PIC code (for example) for things to run on $build? > > thanks > Iain > > --- > > > ALL_CXXFLAGS = $(T_CFLAGS) $(CFLAGS-$@) $(CXXFLAGS) $(INTERNAL_CFLAGS) \ > $(COVERAGE_FLAGS) $(ALIASING_FLAGS) $(NOEXCEPTION_FLAGS) \ > > -- > # Native compiler for the build machine and its switches. > CC_FOR_BUILD = $(CC) > CXX_FOR_BUILD = $(CXX) > NO_PIE_CFLAGS_FOR_BUILD = > NO_PIE_FLAG_FOR_BUILD = > BUILD_CFLAGS= $(ALL_CFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE > BUILD_CXXFLAGS = $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE > BUILD_NO_PIE_CFLAGS = $(NO_PIE_CFLAGS) > BUILD_CFLAGS += $(BUILD_NO_PIE_CFLAGS) > BUILD_CXXFLAGS += $(BUILD_NO_PIE_CFLAGS) > > # Native compiler that we use. This may be C++ some day. > COMPILER_FOR_BUILD = $(CXX_FOR_BUILD) > BUILD_COMPILERFLAGS = $(BUILD_CXXFLAGS) > > # Native linker that we use. > LINKER_FOR_BUILD = $(CXX_FOR_BUILD) > BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS) > > # Native linker and preprocessor flags. For x-fragment overrides. > BUILD_LDFLAGS=$(LDFLAGS) > BUILD_NO_PIE_FLAG = $(NO_PIE_FLAG) > BUILD_LDFLAGS += $(BUILD_NO_PIE_FLAG) > BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \ > -I$(srcdir)/../include $(CPPINC) $(CPPFLAGS) > -- > # This is the variable actually used when we compile. If you change this, > # you probably want to update BUILD_CFLAGS in configure.ac > ALL_CFLAGS = $(T_CFLAGS) $(CFLAGS-$@) \ > > build/%.o : # dependencies provided by explicit rule later > $(COMPILER_FOR_BUILD) -c $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) \ > -o $@ $< > > this has > $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE > $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS) > NO_PIE_CFLAGS_FOR_BUILD is apparently ignored > > # Rule for the generator programs: > $(genprog:%=build/gen%$(build_exeext)): build/gen%$(build_exeext): > build/gen%.o $(BUILD_LIBDEPS) > +$(LINKER_FOR_BUILD) $(BUILD_LINKERFLAGS) $(BUILD_LDFLAGS) -o $@ \ > $(filter-out $(BUILD_LIBDEPS), $^) $(BUILD_LIBS) > > -- > build/genversion$(build_exeext): build/genversion.o > +$(LINKER_FOR_BUILD) $(BUILD_LINKERFLAGS) $(BUILD_LDFLAGS) \ > build/genversion.o -o $@ > = > > # The name of the compiler to use. > COMPILER = $(CXX) > COMPILER_FLAGS =
Re: [PATCH] avoid printing range table header alone
On 8/25/21 1:25 PM, Martin Sebor wrote: I noticed that the output of -fdump-tree-walloca (and probably all passes that call gimple_ranger::export_global_ranges()) includes the following two lines for all functions, even when there's nothing else of relevance after them: Exported global range table === I was a bit puzzled by it at first, wondering if it was leftover debugging output, until I checked the code and realized it's a header that may be (but isn't always) followed by the contents of the table. To reduce clutter in the dump the attached patch changes the function to print the header only when it is followed by at least one line of output. (Another tweak to reduce even further the amount of output printed by default that might be worth considering is to print the table only when TDF_DETAILS is set.) Martin Except checking x == 1 won't work if ssa_name(1) was not updated. Just set a flag if the header hasn't been printed yet. tried and true. I guess you can protect it behind details if you want. Andrew
Re: [PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.
Hi Mike, On Wed, Aug 25, 2021 at 12:37:14PM -0400, Michael Meissner wrote: > I noticed that the built-functions for xxspltiw, xxspltidp, xxsplti32dx, > xxpermx, and xxeval all used the 'vecsimple' type. These instructions are > permute instructions (3 cycle latency) and should use 'vecperm' instead. They are all executed on the PM pipe currently, yup. If this changes later we'll have to fix it, but that is for then :-) > While I was at it, I changed the UNSPEC name for xxspltidp to be > UNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID. In the future please do separate things as separate patches. > * config/rs6000/vsx.md (UNSPEC_XXSPLTIDP): Rename from > UNSPEC_XXSPLTID. * config/rs6000/vsx.md (UNSPEC_XXSPLTID): Rename to... (UNSPEC_XXSPLTIDP): ... this. > (xxspltidp_v2df): Use vecperm type attribute. Use > UUNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID. Typo ("UU"). Okay for trunk with those trivial fixes. Also okay for backport to 11, it is trivial enough. Thanks! Out of interest, did you notice any scheduling differences with this? Segher
Re: [PATCH] configure: Allow a host makefile fragment to override PIE flag settings.
Hi, > On 20 Aug 2021, at 11:29, Richard Sandiford wrote: > >>> Maybe it would be easier to have the makefile fragments determine >>> something like CODE_MODEL_CFLAGS, which can be "-fPIC", "-mdynamic-no-pic", >>> etc., and use: >>> >>> COMPILER += $(NO_PIE_CFLAGS) $(CODE_MODEL_CFLAGS) >> >> OK. I have misgivings about this - the problem is that: >> >> -fPIC -fno-PIE != -fno-PIE -fPIC, which is not obvious to many folks - who >> expect that >> the “last edition of a flag will be the one in force”. >> >> So the PIE-ness and the PIC-ness are decoupled in the configury but they >> need to be >> ordered specifically for targets that want PIC code by default (FWIW, I >> don’t think Darwin >> is the only default-PIC case here, from discussions on irc). > > Yeah, that's what the above was supposed to achieve. In other words, > if you force non-PIE, you also need to follow that by $(CODE_MODEL_CFLAGS), > which restates whatever the base code model is. > > If it's the decoupling you're worried about, then an alternative would > be to have: > > NO_PIE_CFLAGS="-fno-PIE \$(CODE_MODEL_CFLAGS)” I’d like to ask a couple of questions (of HJ who introduced the no-PIE logic) before implementing this. A. We use no-PIE for cc1* because that is needed to handle the PCH implementation (which relies on the executables being loaded at the same addresses each time). B. It’s certainly not obvious to me why we need to build code to run on $build to be no-PIE - I don’t see any such dependencies in the generators etc. - So Question1 - HJ what was the motivation for making the XXX_BUILD_XXX adopt no-PIE? —— Independently of this we seem to be building the objects for $host thus: $(CXX) (NO_PIE_CFLAGS) -c $(ALL_CXXFLAGS) etc. but we build for $build thus: $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS) which means that code model flags in $ALL_CXXFLAGS are overridden for $build, but active for $host ^^ this is actually what causes the Darwin build fail - since on Darwin we cannot build static linked code for user-space processes. in any event that’s inconsistent (unless there’s a reason that it should be different). below are extracts from gcc/Makefile *on linux* which demonstrates the different ordering. AFAICT, NO_PIE_CFLAGS_FOR_BUILD, NO_PIE_FLAG_FOR_BUILD are dead variables? Question 2 : HJ, what was your intention for how a configuration would request PIC code (for example) for things to run on $build? thanks Iain --- ALL_CXXFLAGS = $(T_CFLAGS) $(CFLAGS-$@) $(CXXFLAGS) $(INTERNAL_CFLAGS) \ $(COVERAGE_FLAGS) $(ALIASING_FLAGS) $(NOEXCEPTION_FLAGS) \ -- # Native compiler for the build machine and its switches. CC_FOR_BUILD = $(CC) CXX_FOR_BUILD = $(CXX) NO_PIE_CFLAGS_FOR_BUILD = NO_PIE_FLAG_FOR_BUILD = BUILD_CFLAGS= $(ALL_CFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE BUILD_CXXFLAGS = $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE BUILD_NO_PIE_CFLAGS = $(NO_PIE_CFLAGS) BUILD_CFLAGS += $(BUILD_NO_PIE_CFLAGS) BUILD_CXXFLAGS += $(BUILD_NO_PIE_CFLAGS) # Native compiler that we use. This may be C++ some day. COMPILER_FOR_BUILD = $(CXX_FOR_BUILD) BUILD_COMPILERFLAGS = $(BUILD_CXXFLAGS) # Native linker that we use. LINKER_FOR_BUILD = $(CXX_FOR_BUILD) BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS) # Native linker and preprocessor flags. For x-fragment overrides. BUILD_LDFLAGS=$(LDFLAGS) BUILD_NO_PIE_FLAG = $(NO_PIE_FLAG) BUILD_LDFLAGS += $(BUILD_NO_PIE_FLAG) BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \ -I$(srcdir)/../include $(CPPINC) $(CPPFLAGS) -- # This is the variable actually used when we compile. If you change this, # you probably want to update BUILD_CFLAGS in configure.ac ALL_CFLAGS = $(T_CFLAGS) $(CFLAGS-$@) \ build/%.o : # dependencies provided by explicit rule later $(COMPILER_FOR_BUILD) -c $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) \ -o $@ $< this has $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS) NO_PIE_CFLAGS_FOR_BUILD is apparently ignored # Rule for the generator programs: $(genprog:%=build/gen%$(build_exeext)): build/gen%$(build_exeext): build/gen%.o $(BUILD_LIBDEPS) +$(LINKER_FOR_BUILD) $(BUILD_LINKERFLAGS) $(BUILD_LDFLAGS) -o $@ \ $(filter-out $(BUILD_LIBDEPS), $^) $(BUILD_LIBS) -- build/genversion$(build_exeext): build/genversion.o +$(LINKER_FOR_BUILD) $(BUILD_LINKERFLAGS) $(BUILD_LDFLAGS) \ build/genversion.o -o $@ = # The name of the compiler to use. COMPILER = $(CXX) COMPILER_FLAGS = $(CXXFLAGS) # If HOST_LIBS is set, then the user is controlling the libraries to -- CET_HOST_FLAGS = -fcf-protection COMPILER += $(CET_HOST_FLAGS) -- # We don't want to compile the compilers with -fPIE, it make PCH fail. COMPILER += $(NO_PIE_CFLAGS) -- # A list of all the language-specific executables. COMPILERS = gnat1$(exeext) cc1$(exeext)
[PATCH] avoid printing range table header alone
I noticed that the output of -fdump-tree-walloca (and probably all passes that call gimple_ranger::export_global_ranges()) includes the following two lines for all functions, even when there's nothing else of relevance after them: Exported global range table === I was a bit puzzled by it at first, wondering if it was leftover debugging output, until I checked the code and realized it's a header that may be (but isn't always) followed by the contents of the table. To reduce clutter in the dump the attached patch changes the function to print the header only when it is followed by at least one line of output. (Another tweak to reduce even further the amount of output printed by default that might be worth considering is to print the table only when TDF_DETAILS is set.) Martin Avoid printing range table header alone. gcc/ChangeLog: * gimple-range.cc (gimple_ranger::export_global_ranges): diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc index ef3afeacc90..521d7cee5f0 100644 --- a/gcc/gimple-range.cc +++ b/gcc/gimple-range.cc @@ -259,16 +259,9 @@ gimple_ranger::range_of_stmt (irange , gimple *s, tree name) void gimple_ranger::export_global_ranges () { - unsigned x; - int_range_max r; - if (dump_file) -{ - fprintf (dump_file, "Exported global range table\n"); - fprintf (dump_file, "===\n"); -} - - for ( x = 1; x < num_ssa_names; x++) + for (unsigned x = 1; x < num_ssa_names; x++) { + int_range_max r; tree name = ssa_name (x); if (name && !SSA_NAME_IN_FREE_LIST (name) && gimple_range_ssa_p (name) @@ -276,21 +269,28 @@ gimple_ranger::export_global_ranges () && !r.varying_p()) { bool updated = update_global_range (r, name); + if (!updated || !dump_file) + continue; - if (updated && dump_file) + if (x == 1) { - value_range vr = r; - print_generic_expr (dump_file, name , TDF_SLIM); - fprintf (dump_file, " --> "); - vr.dump (dump_file); + /* Print the header only when there's something else + to print below. */ + fprintf (dump_file, "Exported global range table\n"); + fprintf (dump_file, "===\n"); + } + + value_range vr = r; + print_generic_expr (dump_file, name , TDF_SLIM); + fprintf (dump_file, " --> "); + vr.dump (dump_file); + fprintf (dump_file, "\n"); + int_range_max same = vr; + if (same != r) + { + fprintf (dump_file, " irange : "); + r.dump (dump_file); fprintf (dump_file, "\n"); - int_range_max same = vr; - if (same != r) - { - fprintf (dump_file, " irange : "); - r.dump (dump_file); - fprintf (dump_file, "\n"); - } } } }
Re: [PATCH] analyzer: Impose recursion limit on indirect calls.
On Wed, 2021-08-25 at 21:25 +0530, Ankur Saini wrote: > > > > On 25-Aug-2021, at 8:35 PM, David Malcolm > > wrote: > > > > On Wed, 2021-08-25 at 20:31 +0530, Ankur Saini wrote: > > > > > > > > > > On 25-Aug-2021, at 7:24 PM, Martin Liška > > > > wrote: > > > > > > > > On 8/25/21 15:22, David Malcolm via Gcc-patches wrote: > > > > > On Wed, 2021-08-25 at 13:39 +0530, Ankur Saini wrote: > > > > > > This should also fix the failing regression found in PR > > > > > > analyzer/101980. > > > > > > > > > > > > - The patch is in sync with current master > > > > > > - successfully bootstrapped and tested on x86_64-linux-gnu. > > > > > > > > > > > The patch is OK for trunk. > > > > > Thanks for fixing this > > > > > Dave > > > > > > > > Hello. > > > > > > > > Quite accidentally, but I noticed the patch violates GNU coding > > > > style: > > > > > > > > $ git show 43a5d46feabd93ba78983919234f05f5fc9a0982 | > > > > ./contrib/check_GNU_style.py - > > > > > > > > === ERROR type #1: blocks of 8 spaces should be replaced with > > > > tabs > > > > (8 error(s)) === > > > > > > > > gcc/analyzer/engine.cc:3064:0: that exceed it further. > > > > > > > > gcc/analyzer/engine.cc:3065:0: This is something of a > > > > blunt > > > > workaround, but it only > > > > > > > > gcc/analyzer/engine.cc:3066:0: applies to recursion > > > > (and > > > > mutual recursion), not to > > > > > > > > gcc/analyzer/engine.cc:3067:0: general call stacks. */ > > > > > > > > gcc/analyzer/engine.cc:3069:0: > > > > > param_analyzer_max_recursion_depth) > > > > > > > > gcc/analyzer/engine.cc:3071:0:if (logger) > > > > > > > > gcc/analyzer/engine.cc:3072:0: logger->log ("rejecting > > > > call edge: recursion limit exceeded"); > > > > > > > > gcc/analyzer/engine.cc:3073:0:return false; > > > > > > > > > > Looks like my editor again converted all the tabs to spaces. > > > > > > btw, I can also see a lot of other places where 8 spaces are not > > > begin converted to tabs, should I also change that accordingly or > > > leave them the way it is and just update this patch ? > > > > It's usually best to split out bugfixes from formatting/whitespace > > changes, so maybe do it as two patches: > > > > (1) an updated version of this patch to fix the recursion issue, > > using > > the correct whitespace for the lines that are touched > > unfortunately, I already checked in and pushed the changes to the > master before this was pointed out. Fair enough. > But I can add them to the whitespace issue patch. Sounds good. Dave > > > > > (2) a patch that fixes whitespaces issues, but doesn't change the > > behavior of the code > > > > > > Dave >
[Patch] Fix cygming-crtend.c build warning due to -Wprio-ctor-dtor
Attached patches OK? cygming-crtend.c: fix build warnings libgcc/Changelog: * config/i386/cygming-crtend.c: Fix register_frame_ctor and register_frame_dtor warnings. extend.texi: add note about reserved ctor/dtor priorities gcc/Changelog: * doc/extend.texi: Add note about reserved priorities to the constructor attribute. From e9758ef8c03e617eafe13ca6512fd7cd0256abc7 Mon Sep 17 00:00:00 2001 From: Jonathan Yong <10wa...@gmail.com> Date: Wed, 25 Aug 2021 16:33:36 + Subject: [PATCH 1/2] cygming-crtend.c: fix build warnings libgcc/Changelog: * config/i386/cygming-crtend.c: Fix register_frame_ctor and register_frame_dtor warnings. Signed-off-by: Jonathan Yong <10wa...@gmail.com> --- libgcc/config/i386/cygming-crtend.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/libgcc/config/i386/cygming-crtend.c b/libgcc/config/i386/cygming-crtend.c index c7ba109a04a..4ab63427ee2 100644 --- a/libgcc/config/i386/cygming-crtend.c +++ b/libgcc/config/i386/cygming-crtend.c @@ -56,7 +56,10 @@ static EH_FRAME_SECTION_CONST int __FRAME_END__[] extern void __gcc_register_frame (void); extern void __gcc_deregister_frame (void); +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wprio-ctor-dtor" static void register_frame_ctor (void) __attribute__ ((constructor (0))); +#pragma GCC diagnostic pop static void register_frame_ctor (void) @@ -65,7 +68,10 @@ register_frame_ctor (void) } #if !DEFAULT_USE_CXA_ATEXIT +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wprio-ctor-dtor" static void deregister_frame_dtor (void) __attribute__ ((destructor (0))); +#pragma GCC diagnostic pop static void deregister_frame_dtor (void) -- 2.33.0 From 12b18651af252fb111e1a7375fe09fbc69987922 Mon Sep 17 00:00:00 2001 From: Jonathan Yong <10wa...@gmail.com> Date: Wed, 25 Aug 2021 16:36:14 + Subject: [PATCH 2/2] extend.texi: add note about reserved ctor/dtor priorities gcc/Changelog: * doc/extend.texi: Add note about reserved priorities to the constructor attribute. Signed-off-by: Jonathan Yong <10wa...@gmail.com> --- gcc/doc/extend.texi | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 49df8e6dc38..251a10302b4 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -2787,16 +2787,16 @@ On some targets the attributes also accept an integer argument to specify a priority to control the order in which constructor and destructor functions are run. A constructor with a smaller priority number runs before a constructor with a larger -priority number; the opposite relationship holds for destructors. So, -if you have a constructor that allocates a resource and a destructor -that deallocates the same resource, both functions typically have the -same priority. The priorities for constructor and destructor -functions are the same as those specified for namespace-scope C++ -objects (@pxref{C++ Attributes}). However, at present, the order in which -constructors for C++ objects with static storage duration and functions -decorated with attribute @code{constructor} are invoked is unspecified. -In mixed declarations, attribute @code{init_priority} can be used to -impose a specific ordering. +priority number; the opposite relationship holds for destructors. Note +that priorities 0-100 are reserved. So, if you have a constructor that +allocates a resource and a destructor that deallocates the same +resource, both functions typically have the same priority. The +priorities for constructor and destructor functions are the same as +those specified for namespace-scope C++ objects (@pxref{C++ Attributes}). +However, at present, the order in which constructors for C++ objects +with static storage duration and functions decorated with attribute +@code{constructor} are invoked is unspecified. In mixed declarations, +attribute @code{init_priority} can be used to impose a specific ordering. Using the argument forms of the @code{constructor} and @code{destructor} attributes on targets where the feature is not supported is rejected with -- 2.33.0 OpenPGP_0x713B5FE29C145D45.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
[PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.
Make xxsplti*, xpermx, xxeval be vecperm type. I noticed that the built-functions for xxspltiw, xxspltidp, xxsplti32dx, xxpermx, and xxeval all used the 'vecsimple' type. These instructions are permute instructions (3 cycle latency) and should use 'vecperm' instead. While I was at it, I changed the UNSPEC name for xxspltidp to be UNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID. I've built bootstrap compilers with this fix applied on little endian power9, little endian power10, and big endian power8. Can I check this patch into the master branch. I would also like to check this patch into the gcc-11 branch after a suitable period of time. Can I do this also? Note, the patch for gcc-11 will be different because the patch to move the xx* built-ins from altivec.md to vsx.md has not been back ported to gcc 11. 2021-08-24 Michael Meissner gcc/ * config/rs6000/vsx.md (UNSPEC_XXSPLTIDP): Rename from UNSPEC_XXSPLTID. (xxspltiw_v4si): Use vecperm type attribute. (xxspltiw_v4si_inst): Use vecperm type attribute. (xxspltiw_v4sf_inst): Likewise. (xxspltidp_v2df): Use vecperm type attribute. Use UUNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID. (xxspltidp_v2df_inst): Likewise. (xxsplti32dx_v4si): Use vecperm type attribute. (xxsplti32dx_v4si_inst): Likewise. (xxsplti32dx_v4sf_inst): Likewise. (xxblend_): Likewise. (xxpermx): Likewise. (xxpermx_inst): Likewise. (xxeval): Likewise. --- gcc/config/rs6000/vsx.md | 26 +- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index e4ca6e94d49..bf033e31c1c 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -374,7 +374,7 @@ (define_c_enum "unspec" UNSPEC_VDIVEU UNSPEC_XXEVAL UNSPEC_XXSPLTIW - UNSPEC_XXSPLTID + UNSPEC_XXSPLTIDP UNSPEC_XXSPLTI32DX UNSPEC_XXBLEND UNSPEC_XXPERMX @@ -6414,7 +6414,7 @@ (define_insn "xxspltiw_v4si" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple") + [(set_attr "type" "vecperm") (set_attr "prefixed" "yes")]) (define_expand "xxspltiw_v4sf" @@ -6434,14 +6434,14 @@ (define_insn "xxspltiw_v4sf_inst" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple") + [(set_attr "type" "vecperm") (set_attr "prefixed" "yes")]) ;; XXSPLTIDP built-in function support (define_expand "xxspltidp_v2df" [(set (match_operand:V2DF 0 "register_operand" ) (unspec:V2DF [(match_operand:SF 1 "const_double_operand")] -UNSPEC_XXSPLTID))] +UNSPEC_XXSPLTIDP))] "TARGET_POWER10" { long value = rs6000_const_f32_to_i32 (operands[1]); @@ -6452,10 +6452,10 @@ (define_expand "xxspltidp_v2df" (define_insn "xxspltidp_v2df_inst" [(set (match_operand:V2DF 0 "register_operand" "=wa") (unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")] -UNSPEC_XXSPLTID))] +UNSPEC_XXSPLTIDP))] "TARGET_POWER10" "xxspltidp %x0,%1" - [(set_attr "type" "vecsimple") + [(set_attr "type" "vecperm") (set_attr "prefixed" "yes")]) ;; XXSPLTI32DX built-in function support @@ -6476,7 +6476,7 @@ (define_expand "xxsplti32dx_v4si" GEN_INT (index), operands[3])); DONE; } - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecperm")]) (define_insn "xxsplti32dx_v4si_inst" [(set (match_operand:V4SI 0 "register_operand" "=wa") @@ -6486,7 +6486,7 @@ (define_insn "xxsplti32dx_v4si_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple") + [(set_attr "type" "vecperm") (set_attr "prefixed" "yes")]) (define_expand "xxsplti32dx_v4sf" @@ -6515,7 +6515,7 @@ (define_insn "xxsplti32dx_v4sf_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple") + [(set_attr "type" "vecperm") (set_attr "prefixed" "yes")]) ;; XXBLEND built-in function support @@ -6527,7 +6527,7 @@ (define_insn "xxblend_" UNSPEC_XXBLEND))] "TARGET_POWER10" "xxblendv %x0,%x1,%x2,%x3" - [(set_attr "type" "vecsimple") + [(set_attr "type" "vecperm") (set_attr "prefixed" "yes")]) ;; XXPERMX built-in function support @@ -6562,7 +6562,7 @@ (define_expand "xxpermx" DONE; } - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecperm")]) (define_insn "xxpermx_inst" [(set (match_operand:V2DI 0 "register_operand" "+v") @@ -6573,7 +6573,7 @@ (define_insn "xxpermx_inst" UNSPEC_XXPERMX))] "TARGET_POWER10" "xxpermx %x0,%x1,%x2,%x3,%4" - [(set_attr "type" "vecsimple") + [(set_attr "type" "vecperm") (set_attr "prefixed" "yes")]) ;; XXEVAL built-in function support @@ -6586,6
Re: [PING][PATCH] enable ranger and caching in pass_waccess
On 8/25/21 11:20 AM, Martin Sebor wrote: Ping: Andrew, did I answer your questions? Do you (or anyone else) have any other comments on the latest patch below? https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577865.html I wasn't attempting to block it, its outside my review purview.. I was merely commenting that you should not need a pointer to a range_query at all anymore Are you planning to transition to using the get_range_query() interface instead of keeping a range_query pointer in the pointer_query class? This pass and to a smaller extent the pointer_query class that's used by it and the strlen pass are still a work in progress. I also still need to convert the strlen pass to use Ranger and I expect it will take some changes to pointer_query. So at that point, if going through get_range_query (cfun) everywhere is what you recommend, I'm happy to do it. absolutely. you should not need to even know whether you have a ranger instance running or not. get_range_query will give you whichever is active, and there is ALWAYS something active.. defaulting to the global version. the code in get_range() seems to be the end of the call chain which uses the pointer and should be consolidated to something much simpler if (rvals && stmt) { if (!rvals->range_of_expr (vr, val, stmt)) return NULL_TREE; <...> // ?? This entire function should use get_range_query or get_global_range_query (), // instead of doing something different for RVALS and global ranges. if (!get_global_range_query ()->range_of_expr (vr, val) || vr.undefined_p ()) return NULL_TREE; This entire section can boil down to something like if (!get_range_query (cfun)->range_of_expr (vr, val, stmt)) return NULL; PS There has been an effort to get rid of global variables from GCC, or, as the first step, to avoid accessing them directly(*). If and when that happens, it seems like each pass will have to store either the ranger instance as a member (directly or indirectly, via a member of a class that stores it) or the function passed to pass::execute() if it wants to access either. [*] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573975.html The patch at the link above wasn't approved but IIUC removing globals from GCC is still a goal. I have no idea what direction that is going, but the net effect will be the same in the end. You'll be calling get_range_query() with either the function pointer, or the pass pointer, or something.. but you should never need to pass around a pointer to either a ranger or range-query. As I said earlier, this will likely be a pass property and accessed thru the pass eventually... but until then, use what we have.. cfun. It'll be trivial to transition down the road to whatever the ultimate solution is. Andrew
Re: [PATCH] analyzer: Impose recursion limit on indirect calls.
> On 25-Aug-2021, at 8:35 PM, David Malcolm wrote: > > On Wed, 2021-08-25 at 20:31 +0530, Ankur Saini wrote: >> >> >>> On 25-Aug-2021, at 7:24 PM, Martin Liška wrote: >>> >>> On 8/25/21 15:22, David Malcolm via Gcc-patches wrote: On Wed, 2021-08-25 at 13:39 +0530, Ankur Saini wrote: > This should also fix the failing regression found in PR > analyzer/101980. > > - The patch is in sync with current master > - successfully bootstrapped and tested on x86_64-linux-gnu. > The patch is OK for trunk. Thanks for fixing this Dave >>> >>> Hello. >>> >>> Quite accidentally, but I noticed the patch violates GNU coding >>> style: >>> >>> $ git show 43a5d46feabd93ba78983919234f05f5fc9a0982 | >>> ./contrib/check_GNU_style.py - >>> >>> === ERROR type #1: blocks of 8 spaces should be replaced with tabs >>> (8 error(s)) === >>> >>> gcc/analyzer/engine.cc:3064:0: that exceed it further. >>> >>> gcc/analyzer/engine.cc:3065:0: This is something of a blunt >>> workaround, but it only >>> >>> gcc/analyzer/engine.cc:3066:0: applies to recursion (and >>> mutual recursion), not to >>> >>> gcc/analyzer/engine.cc:3067:0: general call stacks. */ >>> >>> gcc/analyzer/engine.cc:3069:0: > >>> param_analyzer_max_recursion_depth) >>> >>> gcc/analyzer/engine.cc:3071:0:if (logger) >>> >>> gcc/analyzer/engine.cc:3072:0: logger->log ("rejecting >>> call edge: recursion limit exceeded"); >>> >>> gcc/analyzer/engine.cc:3073:0:return false; >>> >> >> Looks like my editor again converted all the tabs to spaces. >> >> btw, I can also see a lot of other places where 8 spaces are not >> begin converted to tabs, should I also change that accordingly or >> leave them the way it is and just update this patch ? > > It's usually best to split out bugfixes from formatting/whitespace > changes, so maybe do it as two patches: > > (1) an updated version of this patch to fix the recursion issue, using > the correct whitespace for the lines that are touched unfortunately, I already checked in and pushed the changes to the master before this was pointed out. But I can add them to the whitespace issue patch. > > (2) a patch that fixes whitespaces issues, but doesn't change the > behavior of the code > > > Dave
[PING][PATCH] enable ranger and caching in pass_waccess
Ping: Andrew, did I answer your questions? Do you (or anyone else) have any other comments on the latest patch below? https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577865.html On 8/20/21 4:16 PM, Martin Sebor wrote: On 8/20/21 7:09 AM, Andrew MacLeod wrote: On 8/19/21 7:09 PM, Martin Sebor via Gcc-patches wrote: The attached patch changes the new access warning pass to use the per-function ranger instance. To do that it makes a number of the global static functions members of the pass (that involved moving one to a later point in the file, increasing the diff; the body of the function hasn't changed otherwise). Still more functions remain. At the same time, the patch also enables the simple pointer_query cache to avoid repeatedly recomputing the properties of related pointers into the same objects, and makes the cache more effective (trunk fails to cache a bunch of intermediate results). Finally, the patch enhances the debugging support for the cache. Other than the ranger/caching the changes have no user-visible effect. Why are you calling enable/disable ranger if you are passing a ranger instance around instead of using the get_range_query (cfun)->range* calls? The pass stores an instance of the pointer_query class which in turn stores a pointer to range_query (which is a copy of the ranger). So storing it also in pass_waccess isn't necessary and can be removed. I've made that change in the attached update. I'm not sure the corresponding pointer should at some point also be removed from the pointer_query class and replaced by calls to get_range_query (cfun). If so, that would take some surgery to the strlen pass which also uses pointer_query and isn't quite ready to make this switch. Are you planning to transition to using the get_range_query() interface instead of keeping a range_query pointer in the pointer_query class? This pass and to a smaller extent the pointer_query class that's used by it and the strlen pass are still a work in progress. I also still need to convert the strlen pass to use Ranger and I expect it will take some changes to pointer_query. So at that point, if going through get_range_query (cfun) everywhere is what you recommend, I'm happy to do it. Anyway, attached is an updated revision with the m_ranger member removed and a few helpers changed to take a range_query argument to use the pointer_query member instead. It was retested on x86_64-linux. Martin PS There has been an effort to get rid of global variables from GCC, or, as the first step, to avoid accessing them directly(*). If and when that happens, it seems like each pass will have to store either the ranger instance as a member (directly or indirectly, via a member of a class that stores it) or the function passed to pass::execute() if it wants to access either. [*] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573975.html The patch at the link above wasn't approved but IIUC removing globals from GCC is still a goal.
Re: [PATCH] analyzer: Recognize __builtin_free as a matching deallocator
On 8/25/21 5:44 PM, Matthias Klose wrote: On 7/28/21 1:44 PM, David Malcolm via Gcc-patches wrote: On Wed, 2021-07-28 at 10:34 +0530, Siddhesh Poyarekar wrote: Recognize __builtin_free as being equivalent to free when passed into __attribute__((malloc ())), similar to how it is treated when it is encountered as a call. This fixes spurious warnings in glibc where xmalloc family of allocators as well as reallocarray, memalign, etc. are declared to have __builtin_free as the free function. gcc/analyzer/ChangeLog: * sm-malloc.cc (malloc_state_machine::get_or_create_deallocator): Recognize __builtin_free. gcc/testsuite/ChangeLog: * gcc.dg/analyzer/attr-malloc-1.c (compatible_alloc, compatible_alloc2): New extern allocator declarations. (test_9, test_10): New tests. Looks good to me, thanks Dave Please could this be backported to all active branches? Sure, it looks like only gcc11 needs this since malloc attribute matching seems recent. David, I've never done a backport before, may I just cherry-pick, push and post a [committed] patch on list or does it need to go through review? Thanks, Siddhesh
Re: [PATCH] analyzer: Impose recursion limit on indirect calls.
On Wed, 2021-08-25 at 20:31 +0530, Ankur Saini wrote: > > > > On 25-Aug-2021, at 7:24 PM, Martin Liška wrote: > > > > On 8/25/21 15:22, David Malcolm via Gcc-patches wrote: > > > On Wed, 2021-08-25 at 13:39 +0530, Ankur Saini wrote: > > > > This should also fix the failing regression found in PR > > > > analyzer/101980. > > > > > > > > - The patch is in sync with current master > > > > - successfully bootstrapped and tested on x86_64-linux-gnu. > > > > > > > The patch is OK for trunk. > > > Thanks for fixing this > > > Dave > > > > Hello. > > > > Quite accidentally, but I noticed the patch violates GNU coding > > style: > > > > $ git show 43a5d46feabd93ba78983919234f05f5fc9a0982 | > > ./contrib/check_GNU_style.py - > > > > === ERROR type #1: blocks of 8 spaces should be replaced with tabs > > (8 error(s)) === > > > > gcc/analyzer/engine.cc:3064:0: that exceed it further. > > > > gcc/analyzer/engine.cc:3065:0: This is something of a blunt > > workaround, but it only > > > > gcc/analyzer/engine.cc:3066:0: applies to recursion (and > > mutual recursion), not to > > > > gcc/analyzer/engine.cc:3067:0: general call stacks. */ > > > > gcc/analyzer/engine.cc:3069:0: > > > param_analyzer_max_recursion_depth) > > > > gcc/analyzer/engine.cc:3071:0:if (logger) > > > > gcc/analyzer/engine.cc:3072:0: logger->log ("rejecting > > call edge: recursion limit exceeded"); > > > > gcc/analyzer/engine.cc:3073:0:return false; > > > > Looks like my editor again converted all the tabs to spaces. > > btw, I can also see a lot of other places where 8 spaces are not > begin converted to tabs, should I also change that accordingly or > leave them the way it is and just update this patch ? It's usually best to split out bugfixes from formatting/whitespace changes, so maybe do it as two patches: (1) an updated version of this patch to fix the recursion issue, using the correct whitespace for the lines that are touched (2) a patch that fixes whitespaces issues, but doesn't change the behavior of the code Dave
Re: [PATCH] analyzer: Impose recursion limit on indirect calls.
On 8/25/21 15:22, David Malcolm via Gcc-patches wrote: On Wed, 2021-08-25 at 13:39 +0530, Ankur Saini wrote: This should also fix the failing regression found in PR analyzer/101980. - The patch is in sync with current master - successfully bootstrapped and tested on x86_64-linux-gnu. The patch is OK for trunk. Thanks for fixing this Dave Hello. Quite accidentally, but I noticed the patch violates GNU coding style: $ git show 43a5d46feabd93ba78983919234f05f5fc9a0982 | ./contrib/check_GNU_style.py - === ERROR type #1: blocks of 8 spaces should be replaced with tabs (8 error(s)) === gcc/analyzer/engine.cc:3064:0: that exceed it further. gcc/analyzer/engine.cc:3065:0: This is something of a blunt workaround, but it only gcc/analyzer/engine.cc:3066:0: applies to recursion (and mutual recursion), not to gcc/analyzer/engine.cc:3067:0: general call stacks. */ gcc/analyzer/engine.cc:3069:0: > param_analyzer_max_recursion_depth) gcc/analyzer/engine.cc:3071:0:if (logger) gcc/analyzer/engine.cc:3072:0: logger->log ("rejecting call edge: recursion limit exceeded"); gcc/analyzer/engine.cc:3073:0:return false; Martin
Re: Ping: [PATCH] diagnostics: Support for -finput-charset [PR93067]
On Tue, 2021-08-24 at 19:28 -0400, Lewis Hyatt wrote: > On Tue, Aug 24, 2021 at 6:51 PM David Malcolm > wrote: > > > > On Tue, 2021-08-24 at 08:17 -0400, Lewis Hyatt wrote: > > > Hello- > > > > > > I thought it might be a good time to check on this patch please? > > > Thanks! > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576449.html > > > > > > -Lewis > > > > I went through that latest version of the patch and have no further > > suggestions - I like the changes you made to incorporate the > > changes I > > had made to input.c. > > > > The latest version of the patch is OK for trunk. > > > > It might be an idea to rebase it and retest it before pushing it, > > to > > make sure nothing significant has changed in the last few weeks. > > > > Thanks for your work on this, and sorry again for the delay in > > reviewing it. > > > > Dave > > > > > > OK great, thanks for your time. I will push after retesting. > > BTW, do you think it would be worthwhile to work on the other half of > encoding support, i.e. translating from UTF-8 to the user's locale, > when outputting diagnostics? I have probably 90% of a patch that does > this, however it complexifies things a bit and I am not sure if it is > really worth the trouble. What is rather manageable (that my patch in > progress does now) is to replace non-translatable characters with > something like UCN escapes. What is not so easy, is to do this and > preserve the alignment of carets and label lines and such... this > requires making the display width of a character also > locale-dependent, which concept doesn't exist currently. Adding that > feels like a lot of complication for what would be a little-used > feature... Anyway, if you think a patch that does the translation > without preserving the alignment would be useful, I could finish it > up > and send it. Otherwise I was kinda inclined to forget about it. > Thanks! Maybe post the patch you have so far, making clear that it's unfinished/work-in-progress? I think I'll find it easier to think about this with a patch in hand rather than a description of a patch, and also, that way the list archives will have a copy of your work in case we do want to finish it at some later point (rather than it just being on your hard drive). Thanks Dave
Re: [PATCH] analyzer: Impose recursion limit on indirect calls.
On Wed, 2021-08-25 at 13:39 +0530, Ankur Saini wrote: > This should also fix the failing regression found in PR > analyzer/101980. > > - The patch is in sync with current master > - successfully bootstrapped and tested on x86_64-linux-gnu. > The patch is OK for trunk. Thanks for fixing this Dave
Re: [llvm-dev] [PATCH] Add optional _Float16 support
On Mon, Aug 23, 2021 at 10:55 PM John McCall wrote: > > On Thu, Jul 29, 2021 at 9:40 AM H.J. Lu wrote: >> >> On Tue, Jul 13, 2021 at 9:24 AM H.J. Lu wrote: >> > >> > On Tue, Jul 13, 2021 at 8:41 AM Joseph Myers >> > wrote: >> > > >> > > On Tue, 13 Jul 2021, H.J. Lu wrote: >> > > >> > > > On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei >> > > > wrote: >> > > > > >> > > > > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 >> > > > > > registers. >> > > > > >> > > > > Can you please explain the behavior here? Is there difference >> > > > > between _Float16 and _Complex _Float16 when return? I.e., >> > > > > 1, In which case will _Float16 values return in both %xmm0 and %xmm1? >> > > > > 2, For a single _Float16 value, are both real part and imaginary >> > > > > part returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively? >> > > > >> > > > Here is the v2 patch to add the missing _Float16 bits. The PDF file >> > > > is at >> > > > >> > > > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI >> > > >> > > This PDF shows _Complex _Float16 as having a size of 2 bytes (should be >> > > 4-byte size, 2-byte alignment). >> > > >> > > It also seems to change double from 4-byte to 8-byte alignment, which is >> > > wrong. And it's inconsistent about whether it covers the long double = >> > > double (Android) case - it shows that case for _Complex long double but >> > > not for long double itself. >> > >> > Here is the v3 patch with the fixes. I also updated the PDF file. >> >> Here is the final patch I checked in. _Complex _Float16 is changed to >> return >> in XMM0 register. The new PDF file is at >> >> https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI > > > This should be explicit that the real part is returned in bits 0..15 and the > imaginary part is returned in bits 16..31, or however we conventionally > designate subcomponents of a vector. > > John. How about this? diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex index 860ff66..8f527c1 100644 --- a/low-level-sys-info.tex +++ b/low-level-sys-info.tex @@ -457,6 +457,9 @@ and \texttt{unions}) are always returned in memory. & \texttt{__float128} & memory \\ \hline & \texttt{_Complex _Float16} & \reg{xmm0} \\ +& & The real part is returned in bits 0..15. The imaginary part is +returned \\ +& & in bits 16..31.\\ \cline{2-3} Complex & \texttt{_Complex float} & \EDX:\EAX \\ floating- & & The real part is returned in \EAX. The imaginary part is https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/89eb3e52c7e5eadd58f7597508e13f34/intel386-psABI-2021-08-25.pdf -- H.J.
Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set
Hi Haochen, Thanks for the updates! This looks good to me; please await Segher's response. Bill On 8/25/21 2:06 AM, HAO CHEN GUI wrote: Hi, I refined the patch according to Bill's advice. I pasted the ChangeLog and diff file here. If it doesn't work, please let me know. Thanks. 2021-08-25 Haochen Gui gcc/ * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP, VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions. gcc/testsuite/ * gcc.target/powerpc/vec-minmax-1.c: New test. * gcc.target/powerpc/vec-minmax-2.c: Likewise. diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c index b4e13af4dc6..90527734ceb 100644 --- a/gcc/config/rs6000/rs6000-call.c +++ b/gcc/config/rs6000/rs6000-call.c @@ -12159,6 +12159,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) return true; /* flavors of vec_min. */ case VSX_BUILTIN_XVMINDP: + case ALTIVEC_BUILTIN_VMINFP: + if (!flag_finite_math_only || flag_signed_zeros) + return false; + /* Fall through to MIN_EXPR. */ + gcc_fallthrough (); case P8V_BUILTIN_VMINSD: case P8V_BUILTIN_VMINUD: case ALTIVEC_BUILTIN_VMINSB: @@ -12167,7 +12172,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) case ALTIVEC_BUILTIN_VMINUB: case ALTIVEC_BUILTIN_VMINUH: case ALTIVEC_BUILTIN_VMINUW: - case ALTIVEC_BUILTIN_VMINFP: arg0 = gimple_call_arg (stmt, 0); arg1 = gimple_call_arg (stmt, 1); lhs = gimple_call_lhs (stmt); @@ -12177,6 +12181,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) return true; /* flavors of vec_max. */ case VSX_BUILTIN_XVMAXDP: + case ALTIVEC_BUILTIN_VMAXFP: + if (!flag_finite_math_only || flag_signed_zeros) + return false; + /* Fall through to MAX_EXPR. */ + gcc_fallthrough (); case P8V_BUILTIN_VMAXSD: case P8V_BUILTIN_VMAXUD: case ALTIVEC_BUILTIN_VMAXSB: @@ -12185,7 +12194,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) case ALTIVEC_BUILTIN_VMAXUB: case ALTIVEC_BUILTIN_VMAXUH: case ALTIVEC_BUILTIN_VMAXUW: - case ALTIVEC_BUILTIN_VMAXFP: arg0 = gimple_call_arg (stmt, 0); arg1 = gimple_call_arg (stmt, 1); lhs = gimple_call_lhs (stmt); diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c new file mode 100644 index 000..da92f059aea --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c @@ -0,0 +1,53 @@ +/* { dg-do compile { target { powerpc64le-*-* } } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ +/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */ + +/* This test verifies that float or double vec_min/max are bound to + xv[min|max][d|s]p instructions when fast-math is not set. */ + + +#include + +#ifdef _BIG_ENDIAN + const int PREF_D = 0; +#else + const int PREF_D = 1; +#endif + +double vmaxd (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_max (va, vb), PREF_D); +} + +double vmind (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_min (va, vb), PREF_D); +} + +#ifdef _BIG_ENDIAN + const int PREF_F = 0; +#else + const int PREF_F = 3; +#endif + +float vmaxf (float a, float b) +{ + vector float va = vec_promote (a, PREF_F); + vector float vb = vec_promote (b, PREF_F); + return vec_extract (vec_max (va, vb), PREF_F); +} + +float vminf (float a, float b) +{ + vector float va = vec_promote (a, PREF_F); + vector float vb = vec_promote (b, PREF_F); + return vec_extract (vec_min (va, vb), PREF_F); +} diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c new file mode 100644 index 000..d318b933181 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c @@ -0,0 +1,51 @@ +/* { dg-do compile { target { powerpc64le-*-* } } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -ffast-math" } */ +/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxsmincdp\M} 2 } } */ + +/* This test verifies that float or double vec_min/max can be converted + to scalar comparison when fast-math is set. */ + + +#include + +#ifdef _BIG_ENDIAN + const int PREF_D = 0; +#else + const int PREF_D = 1; +#endif + +double vmaxd (double a, double b) +{ + vector double va = vec_promote (a,
Re: [PATCH] Use non-numbered clones for target_clones.
Hello. There's updated version of the patch that: - strips '.$number' suffix for default clones - skips numbering for both declarations and definitions of target clones. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, MartinFrom 99fdcd317cec7e3742e161f43951a67421506df7 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Fri, 20 Aug 2021 16:35:18 +0200 Subject: [PATCH] Use non-numbered clones for target_clones. gcc/ChangeLog: * cgraph.h (create_version_clone_with_body): Add new parameter. * cgraphclones.c: Likewise. * multiple_target.c (create_dispatcher_calls): Do not use numbered suffixes. (create_target_clone): Likewise here. gcc/testsuite/ChangeLog: * gcc.target/i386/mvc5.c: Scan assembly names. * gcc.target/i386/mvc7.c: Likewise. * gcc.target/i386/pr95778-1.c: Update scanned patterns. * gcc.target/i386/pr95778-2.c: Likewise. Co-Authored-By: Stefan Kneifel --- gcc/cgraph.h | 5 - gcc/cgraphclones.c| 11 --- gcc/multiple_target.c | 16 +++- gcc/testsuite/gcc.target/i386/mvc5.c | 4 gcc/testsuite/gcc.target/i386/mvc7.c | 8 ++-- gcc/testsuite/gcc.target/i386/pr95778-1.c | 4 ++-- gcc/testsuite/gcc.target/i386/pr95778-2.c | 4 ++-- 7 files changed, 33 insertions(+), 19 deletions(-) diff --git a/gcc/cgraph.h b/gcc/cgraph.h index d54a258c2ee..4cdb3738b4d 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -1006,13 +1006,16 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node that will promote value of the attribute DECL_FUNCTION_SPECIFIC_TARGET of the declaration. + If VERSION_DECL is set true, use clone_function_name_numbered for the + function clone. Otherwise, use clone_function_name. + Return the new version's cgraph node. */ cgraph_node *create_version_clone_with_body (vec redirect_callers, vec *tree_map, ipa_param_adjustments *param_adjustments, bitmap bbs_to_copy, basic_block new_entry_block, const char *clone_name, - tree target_attributes = NULL_TREE); + tree target_attributes = NULL_TREE, bool version_decl = true); /* Insert a new cgraph_function_version_info node into cgraph_fnver_htab corresponding to cgraph_node. */ diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c index b16e68194e1..ae91dccd31d 100644 --- a/gcc/cgraphclones.c +++ b/gcc/cgraphclones.c @@ -987,6 +987,9 @@ cgraph_node::create_version_clone (tree new_decl, that will promote value of the attribute DECL_FUNCTION_SPECIFIC_TARGET of the declaration. + If VERSION_DECL is set true, use clone_function_name_numbered for the + function clone. Otherwise, use clone_function_name. + Return the new version's cgraph node. */ cgraph_node * @@ -995,7 +998,7 @@ cgraph_node::create_version_clone_with_body vec *tree_map, ipa_param_adjustments *param_adjustments, bitmap bbs_to_copy, basic_block new_entry_block, const char *suffix, - tree target_attributes) + tree target_attributes, bool version_decl) { tree old_decl = decl; cgraph_node *new_version_node = NULL; @@ -1016,8 +1019,10 @@ cgraph_node::create_version_clone_with_body new_decl = copy_node (old_decl); /* Generate a new name for the new version. */ - DECL_NAME (new_decl) = clone_function_name_numbered (old_decl, suffix); - SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl)); + tree fnname = (version_decl ? clone_function_name_numbered (old_decl, suffix) + : clone_function_name (old_decl, suffix)); + DECL_NAME (new_decl) = fnname; + SET_DECL_ASSEMBLER_NAME (new_decl, fnname); SET_DECL_RTL (new_decl, NULL); DECL_VIRTUAL_P (new_decl) = 0; diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c index 6c0565880c5..28d5f95dca5 100644 --- a/gcc/multiple_target.c +++ b/gcc/multiple_target.c @@ -166,9 +166,8 @@ create_dispatcher_calls (struct cgraph_node *node) } } - symtab->change_decl_assembler_name (node->decl, - clone_function_name_numbered ( - node->decl, "default")); + tree fname = clone_function_name (node->decl, "default"); + symtab->change_decl_assembler_name (node->decl, fname); if (node->definition) { @@ -309,9 +308,9 @@ create_target_clone (cgraph_node *node, bool definition, char *name, if (definition) { - new_node = node->create_version_clone_with_body (vNULL, NULL, - NULL, NULL, - NULL, name, attributes); + new_node + = node->create_version_clone_with_body (vNULL, NULL, NULL, NULL, NULL, + name, attributes, false); if (new_node == NULL) return NULL; new_node->force_output = true; @@ -322,9 +321,8 @@ create_target_clone (cgraph_node *node, bool definition, char *name, new_node = cgraph_node::get_create (new_decl); DECL_ATTRIBUTES (new_decl) = attributes; /* Generate a new name for the new version. */
Re: [GCC-11] [PATCH 0/5] Finish and general-regs-only
On Wed, Aug 25, 2021 at 12:34 AM Uros Bizjak wrote: > > On Tue, Aug 24, 2021 at 4:57 PM H.J. Lu wrote: > > > > On Sun, Aug 15, 2021 at 11:11 PM Richard Biener > > wrote: > > > > > > On Fri, Aug 13, 2021 at 3:51 PM H.J. Lu wrote: > > > > > > > > and target("general-regs-only") function attribute > > > > were added to GCC 11. But their implementations are incomplete. I'd > > > > like to backport the following patches to GCC 11 branch to finish them. > > > > > > Fine with me if x86 maintainers do not disagree (also see one comment I > > > have > > > on the -mwait adding patch). > > > > Hi Uros, Honza, > > > > Do you have any comments? The updated -mwait patch with LTO_minor_version > > bump is at: > > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577471.html > > I don't have any comments, but IIRC, approved changes can be > backported from mainline to release branches without additional > approval. I am checking them in. Thanks. > Uros. > > > Thanks. > > > > H.J. > > > > H.J. Lu (5): > > > > x86: Add -mmwait for -mgeneral-regs-only > > > > x86: Use crc32 target option for CRC32 intrinsics > > > > x86: Remove OPTION_MASK_ISA_SSE4_2 from CRC32 _builtin functions > > > > x86: Enable the GPR only instructions for -mgeneral-regs-only > > > > : Add pragma GCC target("general-regs-only") > > > > > > > > gcc/common/config/i386/i386-common.c | 45 ++- > > > > gcc/config.gcc | 6 +- > > > > gcc/config/i386/i386-builtin.def | 8 +- > > > > gcc/config/i386/i386-builtins.c| 4 +- > > > > gcc/config/i386/i386-c.c | 2 + > > > > gcc/config/i386/i386-options.c | 12 + > > > > gcc/config/i386/i386.c | 6 +- > > > > gcc/config/i386/i386.h | 2 + > > > > gcc/config/i386/i386.md| 4 +- > > > > gcc/config/i386/i386.opt | 4 + > > > > gcc/config/i386/ia32intrin.h | 42 ++- > > > > gcc/config/i386/mwaitintrin.h | 52 +++ > > > > gcc/config/i386/pmmintrin.h| 13 +- > > > > gcc/config/i386/serializeintrin.h | 7 +- > > > > gcc/config/i386/sse.md | 4 +- > > > > gcc/config/i386/x86gprintrin.h | 13 + > > > > gcc/doc/extend.texi| 5 + > > > > gcc/doc/invoke.texi| 8 +- > > > > gcc/testsuite/gcc.target/i386/crc32-6.c| 13 + > > > > gcc/testsuite/gcc.target/i386/monitor-2.c | 27 ++ > > > > gcc/testsuite/gcc.target/i386/pr101492-1.c | 10 + > > > > gcc/testsuite/gcc.target/i386/pr101492-2.c | 10 + > > > > gcc/testsuite/gcc.target/i386/pr101492-3.c | 10 + > > > > gcc/testsuite/gcc.target/i386/pr101492-4.c | 12 + > > > > gcc/testsuite/gcc.target/i386/pr99744-3.c | 13 + > > > > gcc/testsuite/gcc.target/i386/pr99744-4.c | 357 + > > > > gcc/testsuite/gcc.target/i386/pr99744-5.c | 25 ++ > > > > gcc/testsuite/gcc.target/i386/pr99744-6.c | 23 ++ > > > > gcc/testsuite/gcc.target/i386/pr99744-7.c | 12 + > > > > gcc/testsuite/gcc.target/i386/pr99744-8.c | 13 + > > > > 30 files changed, 717 insertions(+), 45 deletions(-) > > > > create mode 100644 gcc/config/i386/mwaitintrin.h > > > > create mode 100644 gcc/testsuite/gcc.target/i386/crc32-6.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/monitor-2.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-1.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-2.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-3.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-4.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-3.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-4.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-5.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-6.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-7.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-8.c > > > > > > > > -- > > > > 2.31.1 > > > > > > > > > > > > -- > > H.J. -- H.J.
Re: [PATCH] analyzer: Recognize __builtin_free as a matching deallocator
On 7/28/21 1:44 PM, David Malcolm via Gcc-patches wrote: > On Wed, 2021-07-28 at 10:34 +0530, Siddhesh Poyarekar wrote: >> Recognize __builtin_free as being equivalent to free when passed into >> __attribute__((malloc ())), similar to how it is treated when it is >> encountered as a call. This fixes spurious warnings in glibc where >> xmalloc family of allocators as well as reallocarray, memalign, >> etc. are declared to have __builtin_free as the free function. >> >> gcc/analyzer/ChangeLog: >> * sm-malloc.cc >> (malloc_state_machine::get_or_create_deallocator): Recognize >> __builtin_free. >> >> gcc/testsuite/ChangeLog: >> * gcc.dg/analyzer/attr-malloc-1.c (compatible_alloc, >> compatible_alloc2): New extern allocator declarations. >> (test_9, test_10): New tests. > > Looks good to me, thanks > Dave > > Please could this be backported to all active branches?
Re: [Patch v2] C, C++, Fortran, OpenMP: Add support for device-modifiers for 'omp target device'
Hi Jakub, I applied all your suggested changes and checked for no test regressions on x86_64-linux with nvptx offloading. The revised patch is attached. Do you think that it's ok to commit the code? Thanks, Marcel Am 23.08.2021 um 19:47 schrieb Jakub Jelinek: On Fri, Aug 20, 2021 at 09:18:32PM +0200, Marcel Vollweiler wrote: --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -15864,37 +15864,81 @@ c_parser_omp_clause_map (c_parser *parser, tree list) } /* OpenMP 4.0: - device ( expression ) */ +>>> device ( expression ) Please remove all the >s. + + OpenMP 5.0: + device ( [device-modifier :] integer-expression ) + + device-modifier: + ancestor | device_num */ + /* A requires directive with the reverse_offload clause must be + specified. */ + if ((omp_requires_mask & OMP_REQUIRES_REVERSE_OFFLOAD) == 0) +{ + c_parser_error (parser, "a % directive with the " + "% clause must be " + "specified"); [BI think this diagnostics is confusing, it tells the user that it has to do something but doesn't tell why. It is also not a parser error. So I think it should be instead error_at (tok->location, "% device modifier not " "preceded by % directive " "with % clause"); + parens.skip_until_found_close (parser); + return list; +} + ancestor = true; +} + if (!INTEGRAL_TYPE_P (TREE_TYPE (t))) +{ + c_parser_error (parser, "expected integer expression"); + return list; } + check_no_duplicate_clause (list, OMP_CLAUSE_DEVICE, "device"); + + c = build_omp_clause (clause_loc, OMP_CLAUSE_DEVICE); + + OMP_CLAUSE_DEVICE_ID (c) = t; + OMP_CLAUSE_CHAIN (c) = list; + OMP_CLAUSE_DEVICE_ANCESTOR (c) = ancestor; + + list = c; return list; } diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index 5349ef1..b4d8d81 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -15139,6 +15139,22 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) case OMP_CLAUSE_COLLAPSE: case OMP_CLAUSE_FINAL: case OMP_CLAUSE_DEVICE: + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE + && OMP_CLAUSE_DEVICE_ANCESTOR (c)) +{ + t = OMP_CLAUSE_DEVICE_ID (c); + if (TREE_CODE (t) == INTEGER_CST + && wi::to_widest (t) != 1) +{ + error_at (OMP_CLAUSE_LOCATION (c), +"the % clause expression must evaluate to " +"%<1%>"); + remove = true; + break; +} +} + /* FALLTHRU */ For the C FE, I'd suggest to move this to the c_parser_omp_clause_device routine like other similar checking is done there too. And you can use if (TREE_CODE (t) == INTEGER_CST && !integer_onep (t)) + error_at (tok->location, "a % directive with the " + "% clause must be " + "specified"); See above. @@ -38562,6 +38601,7 @@ cp_parser_omp_clause_device (cp_parser *parser, tree list, c = build_omp_clause (location, OMP_CLAUSE_DEVICE); OMP_CLAUSE_DEVICE_ID (c) = t; OMP_CLAUSE_CHAIN (c) = list; + OMP_CLAUSE_DEVICE_ANCESTOR (c) = ancestor; But in C++ the INTEGER_CST checking shouldn't be done here, because the argument could be type or value dependent. --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -7334,6 +7334,15 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort) "% id must be integral"); remove = true; } + else if (OMP_CLAUSE_DEVICE_ANCESTOR (c) + && TREE_CODE (t) == INTEGER_CST + && wi::to_widest (t) != 1) !integer_onep (t) + if (!(gfc_current_ns->omp_requires & OMP_REQ_REVERSE_OFFLOAD)) +{ + gfc_error ("a % directive with the " + "% clause must be " + "specified at %C"); See above. + else if (gfc_match ("%e )", >device) == MATCH_YES) +{ +} + else Better != MATCH_YES and drop the {} else ? +{ + gfc_error ("Expected integer expression or a single device-" + "modifier % or % at %C"); + break; +} + continue; +} if ((mask & OMP_CLAUSE_DEVICE) && openacc && gfc_match ("device ( ") == MATCH_YES + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE + && OMP_CLAUSE_DEVICE_ANCESTOR (c)) +{ + if (code != OMP_TARGET) +{ +error_at (OMP_CLAUSE_LOCATION (c), + "% clause with % is only " +
[PATCH] tree-optimization/102046 - fix SLP build from scalars with patterns
When we swap operands for SLP builds we lose track where exactly pattern defs are - but we fail to update the any_pattern member of the operands info. Do so conservatively. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-08-25 Richard Biener PR tree-optimization/102046 * tree-vect-slp.c (vect_build_slp_tree_2): Conservatively update ->any_pattern when swapping operands. * gcc.dg/vect/pr102046.c: New testcase. --- gcc/testsuite/gcc.dg/vect/pr102046.c | 19 +++ gcc/tree-vect-slp.c | 4 2 files changed, 23 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/vect/pr102046.c diff --git a/gcc/testsuite/gcc.dg/vect/pr102046.c b/gcc/testsuite/gcc.dg/vect/pr102046.c new file mode 100644 index 000..ae48b497dc0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr102046.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O3 -fvect-cost-model=dynamic" } */ +/* { dg-additional-options "-march=btver2" { target x86_64-*-* i?86-*-* } } */ + +struct S +{ + unsigned a, b; +}; + +struct S g; + +void +foo (struct S *o) +{ + struct S s = g; + s.b *= 3; + s.a -= s.a / 2; + *o = s; +} diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index edc11c62793..4d688c7a267 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -2311,6 +2311,10 @@ out: } if (dump_enabled_p ()) dump_printf (MSG_NOTE, "\n"); + /* After swapping some operands we lost track whether an +operand has any pattern defs so be conservative here. */ + if (oprnds_info[0]->any_pattern || oprnds_info[1]->any_pattern) + oprnds_info[0]->any_pattern = oprnds_info[1]->any_pattern = true; /* And try again with scratch 'matches' ... */ bool *tem = XALLOCAVEC (bool, group_size); if ((child = vect_build_slp_tree (vinfo, oprnd_info->def_stmts, -- 2.31.1
Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set
On 25/8/2021 下午 4:17, HAO CHEN GUI via Gcc-patches wrote: Hi Kewen, Thanks for your advice. On 25/8/2021 下午 3:50, Kewen.Lin wrote: Hi Haochen, on 2021/8/25 下午3:06, HAO CHEN GUI via Gcc-patches wrote: Hi, I refined the patch according to Bill's advice. I pasted the ChangeLog and diff file here. If it doesn't work, please let me know. Thanks. 2021-08-25 Haochen Gui gcc/ IIUC, this patch is for PR93127, one line for PR is missing here. The patch does comes from the PR, but it doesn't work as the PR requires. So I am not sure if I should add the PR number. * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP, VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions. gcc/testsuite/ Same, need a PR line. * gcc.target/powerpc/vec-minmax-1.c: New test. * gcc.target/powerpc/vec-minmax-2.c: Likewise. Maybe it's better to use pr93127-{1,2}.c for case names? ... --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c @@ -0,0 +1,53 @@ +/* { dg-do compile { target { powerpc64le-*-* } } } */ I guess this "powerpc64le" isn't intentional? The test case has the macro to distinguish endianess, I assume we want this to be compiled on BE? If so, we just put the line below instead? It should be tested on BE as well. I will replace it with 'powerpc*-*-* && lp64'. It should be 'powerpc*-*-*'. Thanks again! /* { dg-do compile } */ And it needs extra testing on BE as well. :) Thanks for fixing this! BR, Kewen +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ +/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */ + +/* This test verifies that float or double vec_min/max are bound to + xv[min|max][d|s]p instructions when fast-math is not set. */ + + +#include + +#ifdef _BIG_ENDIAN + const int PREF_D = 0; +#else + const int PREF_D = 1; +#endif + +double vmaxd (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_max (va, vb), PREF_D); +} + +double vmind (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_min (va, vb), PREF_D); +} + +#ifdef _BIG_ENDIAN + const int PREF_F = 0; +#else + const int PREF_F = 3; +#endif + +float vmaxf (float a, float b) +{ + vector float va = vec_promote (a, PREF_F); + vector float vb = vec_promote (b, PREF_F); + return vec_extract (vec_max (va, vb), PREF_F); +} + +float vminf (float a, float b) +{ + vector float va = vec_promote (a, PREF_F); + vector float vb = vec_promote (b, PREF_F); + return vec_extract (vec_min (va, vb), PREF_F); +} diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c new file mode 100644 index 000..d318b933181 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c @@ -0,0 +1,51 @@ +/* { dg-do compile { target { powerpc64le-*-* } } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -ffast-math" } */ +/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxsmincdp\M} 2 } } */ + +/* This test verifies that float or double vec_min/max can be converted + to scalar comparison when fast-math is set. */ + + +#include + +#ifdef _BIG_ENDIAN + const int PREF_D = 0; +#else + const int PREF_D = 1; +#endif + +double vmaxd (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_max (va, vb), PREF_D); +} + +double vmind (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_min (va, vb), PREF_D); +} + +#ifdef _BIG_ENDIAN + const int PREF_F = 0; +#else + const int PREF_F = 3; +#endif + +float vmaxf (float a, float b) +{ + vector float va = vec_promote (a, PREF_F); + vector float vb = vec_promote (b, PREF_F); + return vec_extract (vec_max (va, vb), PREF_F); +} + +float vminf (float a, float b) +{ + vector float va = vec_promote (a, PREF_F); + vector float vb = vec_promote (b, PREF_F); + return vec_extract (vec_min (va, vb), PREF_F); +}
Re: [PATCH] Change illegitimate constant into memref of constant pool in change_zero_ext.
On Wed, Aug 25, 2021 at 5:14 AM Segher Boessenkool wrote: > > Hi! > > On Tue, Aug 24, 2021 at 04:55:30PM +0800, liuhongt wrote: > > This patch extend change_zero_ext to change illegitimate constant > > into constant pool, this will enable simplification of below: > > It should be in a separate function. recog_for_combine will call both. > But not both for the same RTL! This is important. Originally, combine > tried only one thing for every combination of input instructions it got. > Every extra attempt causes quite a bit more garbage (not to mention it > takes time as well, recog isn't super cheap), so we should try to not > make recog_for_combine exponential in the variants it tries.. > > And of course the function name should always be descriptive of what a > function does :-) > > > change_zero_ext (rtx pat) > > @@ -11417,6 +11418,23 @@ change_zero_ext (rtx pat) > > { > >rtx x = **iter; > >scalar_int_mode mode, inner_mode; > > + machine_mode const_mode = GET_MODE (x); > > + > > + /* Change illegitimate constant into memref of constant pool. */ > > + if (CONSTANT_P (x) > > + && !const_vec_duplicate_p (x) > > This is x86-specific? It makes no sense in generic code, anyway. Or if > it does it needs a big fat comment :-) > > > + && const_mode != BLKmode > > + && GET_CODE (x) != HIGH > > + && GET_MODE_SIZE (const_mode).is_constant () > > + && !targetm.legitimate_constant_p (const_mode, x) > > + && !targetm.cannot_force_const_mem (const_mode, x)) > > You should only test that it did not recog, and then force a constant > to memory. You do not want to do this for every constant (rotate by 42 > won't likely match better if you force 42 to memory) so some > sophistication will help here, but please do not make it target- > specific. > > > + { > > + x = force_const_mem (GET_MODE (x), x); > > That mode is const_mode. > > > Ideally you will have some example where it benefits some other target, > too. Running recog twice for a big fraction of all combine attempts, > for no benefit at all, is not a good idea. The zext* thing is there > because combine *itself* creates a lot of extra zext*, whether those > exist for the target or not. So this isn't obvious precedent (and that > wouldn't mean it is a good idea anyway ;-) ) When trying to construct a testcase for another backend, I realized that the issue can also be solved by folding _mm_shuffle_ps into gimple vec_perm_expr. Let me try that way. > > > Segher -- BR, Hongtao
Re: [PATCH] Change illegitimate constant into memref of constant pool in change_zero_ext.
On Wed, Aug 25, 2021 at 5:14 AM Segher Boessenkool wrote: > > Hi! > > On Tue, Aug 24, 2021 at 04:55:30PM +0800, liuhongt wrote: > > This patch extend change_zero_ext to change illegitimate constant > > into constant pool, this will enable simplification of below: > > It should be in a separate function. recog_for_combine will call both. > But not both for the same RTL! This is important. Originally, combine > tried only one thing for every combination of input instructions it got. > Every extra attempt causes quite a bit more garbage (not to mention it > takes time as well, recog isn't super cheap), so we should try to not > make recog_for_combine exponential in the variants it tries.. > > And of course the function name should always be descriptive of what a > function does :-) > Yes, will do. > > change_zero_ext (rtx pat) > > @@ -11417,6 +11418,23 @@ change_zero_ext (rtx pat) > > { > >rtx x = **iter; > >scalar_int_mode mode, inner_mode; > > + machine_mode const_mode = GET_MODE (x); > > + > > + /* Change illegitimate constant into memref of constant pool. */ > > + if (CONSTANT_P (x) > > + && !const_vec_duplicate_p (x) > > This is x86-specific? It makes no sense in generic code, anyway. Or if > it does it needs a big fat comment :-) > > > + && const_mode != BLKmode > > + && GET_CODE (x) != HIGH > > + && GET_MODE_SIZE (const_mode).is_constant () > > + && !targetm.legitimate_constant_p (const_mode, x) > > + && !targetm.cannot_force_const_mem (const_mode, x)) > > You should only test that it did not recog, and then force a constant > to memory. You do not want to do this for every constant (rotate by 42 > won't likely match better if you force 42 to memory) so some > sophistication will help here, but please do not make it target- > specific. When it comes to change_zero_ext, insn_code_number must < 0, constant pool is used as a second choice. -cut-- if (insn_code_number >= 0 || check_asm_operands (pat)) return insn_code_number; void *marker = get_undo_marker (); bool changed = false; if (GET_CODE (pat) == SET) changed = change_zero_ext (pat); ---cut end- > > > + { > > + x = force_const_mem (GET_MODE (x), x); > > That mode is const_mode. > > > Ideally you will have some example where it benefits some other target, > too. Running recog twice for a big fraction of all combine attempts, > for no benefit at all, is not a good idea. The zext* thing is there This optimization is similar, pass_combine extracts const_vector from constant pool and simplify it to another const_vector, but doesn't realize the simplified const_vector should also be in constant pool. Maybe I should just restrict this optimization to const_vector only. > because combine *itself* creates a lot of extra zext*, whether those > exist for the target or not. So this isn't obvious precedent (and that > wouldn't mean it is a good idea anyway ;-) ) > > > Segher -- BR, Hongtao
Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set
Hi Kewen, Thanks for your advice. On 25/8/2021 下午 3:50, Kewen.Lin wrote: Hi Haochen, on 2021/8/25 下午3:06, HAO CHEN GUI via Gcc-patches wrote: Hi, I refined the patch according to Bill's advice. I pasted the ChangeLog and diff file here. If it doesn't work, please let me know. Thanks. 2021-08-25 Haochen Gui gcc/ IIUC, this patch is for PR93127, one line for PR is missing here. The patch does comes from the PR, but it doesn't work as the PR requires. So I am not sure if I should add the PR number. * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP, VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions. gcc/testsuite/ Same, need a PR line. * gcc.target/powerpc/vec-minmax-1.c: New test. * gcc.target/powerpc/vec-minmax-2.c: Likewise. Maybe it's better to use pr93127-{1,2}.c for case names? ... --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c @@ -0,0 +1,53 @@ +/* { dg-do compile { target { powerpc64le-*-* } } } */ I guess this "powerpc64le" isn't intentional? The test case has the macro to distinguish endianess, I assume we want this to be compiled on BE? If so, we just put the line below instead? It should be tested on BE as well. I will replace it with 'powerpc*-*-* && lp64'. /* { dg-do compile } */ And it needs extra testing on BE as well. :) Thanks for fixing this! BR, Kewen +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ +/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */ + +/* This test verifies that float or double vec_min/max are bound to + xv[min|max][d|s]p instructions when fast-math is not set. */ + + +#include + +#ifdef _BIG_ENDIAN + const int PREF_D = 0; +#else + const int PREF_D = 1; +#endif + +double vmaxd (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_max (va, vb), PREF_D); +} + +double vmind (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_min (va, vb), PREF_D); +} + +#ifdef _BIG_ENDIAN + const int PREF_F = 0; +#else + const int PREF_F = 3; +#endif + +float vmaxf (float a, float b) +{ + vector float va = vec_promote (a, PREF_F); + vector float vb = vec_promote (b, PREF_F); + return vec_extract (vec_max (va, vb), PREF_F); +} + +float vminf (float a, float b) +{ + vector float va = vec_promote (a, PREF_F); + vector float vb = vec_promote (b, PREF_F); + return vec_extract (vec_min (va, vb), PREF_F); +} diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c new file mode 100644 index 000..d318b933181 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c @@ -0,0 +1,51 @@ +/* { dg-do compile { target { powerpc64le-*-* } } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -ffast-math" } */ +/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxsmincdp\M} 2 } } */ + +/* This test verifies that float or double vec_min/max can be converted + to scalar comparison when fast-math is set. */ + + +#include + +#ifdef _BIG_ENDIAN + const int PREF_D = 0; +#else + const int PREF_D = 1; +#endif + +double vmaxd (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_max (va, vb), PREF_D); +} + +double vmind (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_min (va, vb), PREF_D); +} + +#ifdef _BIG_ENDIAN + const int PREF_F = 0; +#else + const int PREF_F = 3; +#endif + +float vmaxf (float a, float b) +{ + vector float va = vec_promote (a, PREF_F); + vector float vb = vec_promote (b, PREF_F); + return vec_extract (vec_max (va, vb), PREF_F); +} + +float vminf (float a, float b) +{ + vector float va = vec_promote (a, PREF_F); + vector float vb = vec_promote (b, PREF_F); + return vec_extract (vec_min (va, vb), PREF_F); +}
[PATCH] analyzer: Impose recursion limit on indirect calls.
This should also fix the failing regression found in PR analyzer/101980. - The patch is in sync with current master - successfully bootstrapped and tested on x86_64-linux-gnu. fix.patch Description: Binary data Thanks - Ankur
Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set
Hi Haochen, on 2021/8/25 下午3:06, HAO CHEN GUI via Gcc-patches wrote: > Hi, > > I refined the patch according to Bill's advice. I pasted the ChangeLog > and diff file here. If it doesn't work, please let me know. Thanks. > > 2021-08-25 Haochen Gui > > gcc/ IIUC, this patch is for PR93127, one line for PR is missing here. > * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): > Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP, > VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions. > > gcc/testsuite/ Same, need a PR line. > * gcc.target/powerpc/vec-minmax-1.c: New test. > * gcc.target/powerpc/vec-minmax-2.c: Likewise. > Maybe it's better to use pr93127-{1,2}.c for case names? ... > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c > @@ -0,0 +1,53 @@ > +/* { dg-do compile { target { powerpc64le-*-* } } } */ I guess this "powerpc64le" isn't intentional? The test case has the macro to distinguish endianess, I assume we want this to be compiled on BE? If so, we just put the line below instead? /* { dg-do compile } */ And it needs extra testing on BE as well. :) Thanks for fixing this! BR, Kewen > +/* { dg-require-effective-target powerpc_p9vector_ok } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ > +/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */ > + > +/* This test verifies that float or double vec_min/max are bound to > + xv[min|max][d|s]p instructions when fast-math is not set. */ > + > + > +#include > + > +#ifdef _BIG_ENDIAN > + const int PREF_D = 0; > +#else > + const int PREF_D = 1; > +#endif > + > +double vmaxd (double a, double b) > +{ > + vector double va = vec_promote (a, PREF_D); > + vector double vb = vec_promote (b, PREF_D); > + return vec_extract (vec_max (va, vb), PREF_D); > +} > + > +double vmind (double a, double b) > +{ > + vector double va = vec_promote (a, PREF_D); > + vector double vb = vec_promote (b, PREF_D); > + return vec_extract (vec_min (va, vb), PREF_D); > +} > + > +#ifdef _BIG_ENDIAN > + const int PREF_F = 0; > +#else > + const int PREF_F = 3; > +#endif > + > +float vmaxf (float a, float b) > +{ > + vector float va = vec_promote (a, PREF_F); > + vector float vb = vec_promote (b, PREF_F); > + return vec_extract (vec_max (va, vb), PREF_F); > +} > + > +float vminf (float a, float b) > +{ > + vector float va = vec_promote (a, PREF_F); > + vector float vb = vec_promote (b, PREF_F); > + return vec_extract (vec_min (va, vb), PREF_F); > +} > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c > b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c > new file mode 100644 > index 000..d318b933181 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c > @@ -0,0 +1,51 @@ > +/* { dg-do compile { target { powerpc64le-*-* } } } */ > +/* { dg-require-effective-target powerpc_p9vector_ok } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power9 -ffast-math" } */ > +/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 2 } } */ > +/* { dg-final { scan-assembler-times {\mxsmincdp\M} 2 } } */ > + > +/* This test verifies that float or double vec_min/max can be converted > + to scalar comparison when fast-math is set. */ > + > + > +#include > + > +#ifdef _BIG_ENDIAN > + const int PREF_D = 0; > +#else > + const int PREF_D = 1; > +#endif > + > +double vmaxd (double a, double b) > +{ > + vector double va = vec_promote (a, PREF_D); > + vector double vb = vec_promote (b, PREF_D); > + return vec_extract (vec_max (va, vb), PREF_D); > +} > + > +double vmind (double a, double b) > +{ > + vector double va = vec_promote (a, PREF_D); > + vector double vb = vec_promote (b, PREF_D); > + return vec_extract (vec_min (va, vb), PREF_D); > +} > + > +#ifdef _BIG_ENDIAN > + const int PREF_F = 0; > +#else > + const int PREF_F = 3; > +#endif > + > +float vmaxf (float a, float b) > +{ > + vector float va = vec_promote (a, PREF_F); > + vector float vb = vec_promote (b, PREF_F); > + return vec_extract (vec_max (va, vb), PREF_F); > +} > + > +float vminf (float a, float b) > +{ > + vector float va = vec_promote (a, PREF_F); > + vector float vb = vec_promote (b, PREF_F); > + return vec_extract (vec_min (va, vb), PREF_F); > +} >
Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]
On Tue, Aug 24, 2021 at 5:22 PM Hongyu Wang wrote: > > Hi Uros, > > Sorry for the late update. I have tried adjusting the combine pass but > found it is not easy to modify shift const, so I came up with an > alternative solution with your patch. It matches the non-canonical > zero-extend in ix86_decompose_address and adjust ix86_rtx_cost to > combine below pattern > > (set (reg:DI 85) >(and:DI (ashift:DI (reg:DI 87) >(const_int 1 [0x1])) >(const_int 4294967294 [0xfffe]))) > > Survived bootstrap and regtest on x86-64-linux. Ok for master? gcc/ChangeLog: PR target/101716 * config/i386/i386.c (ix86_live_on_entry): Adjust comment. (ix86_decompose_address): Remove retval check for ASHIFT, allow non-canonical zero extend if AND mask covers ASHIFT count. (ix86_legitimate_address_p): Adjust condition for decompose. (ix86_rtx_costs): Adjust cost for lea with non-canonical zero-extend. OK. Thanks, Uros. > Uros Bizjak 于2021年8月16日周一 下午5:26写道: > > > > > On Mon, Aug 16, 2021 at 11:18 AM Hongyu Wang wrote: > > > > > > > So, the question is if the combine pass really needs to zero-extend > > > > with 0xfffe, the left shift << 1 guarantees zero in the LSB, so > > > > 0x should be better and in line with canonical zero-extension > > > > RTX. > > > > > > The shift mask is generated in simplify_shift_const_1: > > > > > > mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode), > > > int_result_mode); > > > rtx count_rtx = gen_int_shift_amount (int_result_mode, count); > > > mask_rtx > > > = simplify_const_binary_operation (code, int_result_mode, > > > mask_rtx, count_rtx); > > > > > > Can we adjust the count for ashift if nonzero_bits overlaps it? > > > > > > > Also, ix86_decompose_address accepts ASHIFT RTX when ASHIFT is > > > > embedded in the PLUS chain, but naked ASHIFT is rejected (c.f. the > > > > call in ix86_legitimate_address_p) for some (historic?) reason. It > > > > looks to me that this restriction is not necessary, since > > > > ix86_legitimize_address can canonicalize ASHIFT RTXes without > > > > problems. The attached patch that survives bootstrap and regtest can > > > > help in your case. > > > > > > We have a split to transform ashift to mult, I'm afraid it could not > > > help this issue. > > > > If you want existing *lea to accept ASHIFT RTX, it uses > > address_no_seg_operand predicate which uses address_operand predicate, > > which calls ix86_legitimate_address_p, which ATM rejects ASHIFT RTXes. > > > > Uros.
Re: [GCC-11] [PATCH 0/5] Finish and general-regs-only
On Tue, Aug 24, 2021 at 4:57 PM H.J. Lu wrote: > > On Sun, Aug 15, 2021 at 11:11 PM Richard Biener > wrote: > > > > On Fri, Aug 13, 2021 at 3:51 PM H.J. Lu wrote: > > > > > > and target("general-regs-only") function attribute > > > were added to GCC 11. But their implementations are incomplete. I'd > > > like to backport the following patches to GCC 11 branch to finish them. > > > > Fine with me if x86 maintainers do not disagree (also see one comment I have > > on the -mwait adding patch). > > Hi Uros, Honza, > > Do you have any comments? The updated -mwait patch with LTO_minor_version > bump is at: > > https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577471.html I don't have any comments, but IIRC, approved changes can be backported from mainline to release branches without additional approval. Uros. > Thanks. > > H.J. > > > H.J. Lu (5): > > > x86: Add -mmwait for -mgeneral-regs-only > > > x86: Use crc32 target option for CRC32 intrinsics > > > x86: Remove OPTION_MASK_ISA_SSE4_2 from CRC32 _builtin functions > > > x86: Enable the GPR only instructions for -mgeneral-regs-only > > > : Add pragma GCC target("general-regs-only") > > > > > > gcc/common/config/i386/i386-common.c | 45 ++- > > > gcc/config.gcc | 6 +- > > > gcc/config/i386/i386-builtin.def | 8 +- > > > gcc/config/i386/i386-builtins.c| 4 +- > > > gcc/config/i386/i386-c.c | 2 + > > > gcc/config/i386/i386-options.c | 12 + > > > gcc/config/i386/i386.c | 6 +- > > > gcc/config/i386/i386.h | 2 + > > > gcc/config/i386/i386.md| 4 +- > > > gcc/config/i386/i386.opt | 4 + > > > gcc/config/i386/ia32intrin.h | 42 ++- > > > gcc/config/i386/mwaitintrin.h | 52 +++ > > > gcc/config/i386/pmmintrin.h| 13 +- > > > gcc/config/i386/serializeintrin.h | 7 +- > > > gcc/config/i386/sse.md | 4 +- > > > gcc/config/i386/x86gprintrin.h | 13 + > > > gcc/doc/extend.texi| 5 + > > > gcc/doc/invoke.texi| 8 +- > > > gcc/testsuite/gcc.target/i386/crc32-6.c| 13 + > > > gcc/testsuite/gcc.target/i386/monitor-2.c | 27 ++ > > > gcc/testsuite/gcc.target/i386/pr101492-1.c | 10 + > > > gcc/testsuite/gcc.target/i386/pr101492-2.c | 10 + > > > gcc/testsuite/gcc.target/i386/pr101492-3.c | 10 + > > > gcc/testsuite/gcc.target/i386/pr101492-4.c | 12 + > > > gcc/testsuite/gcc.target/i386/pr99744-3.c | 13 + > > > gcc/testsuite/gcc.target/i386/pr99744-4.c | 357 + > > > gcc/testsuite/gcc.target/i386/pr99744-5.c | 25 ++ > > > gcc/testsuite/gcc.target/i386/pr99744-6.c | 23 ++ > > > gcc/testsuite/gcc.target/i386/pr99744-7.c | 12 + > > > gcc/testsuite/gcc.target/i386/pr99744-8.c | 13 + > > > 30 files changed, 717 insertions(+), 45 deletions(-) > > > create mode 100644 gcc/config/i386/mwaitintrin.h > > > create mode 100644 gcc/testsuite/gcc.target/i386/crc32-6.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/monitor-2.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-1.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-2.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-3.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-4.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-3.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-4.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-5.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-6.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-7.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-8.c > > > > > > -- > > > 2.31.1 > > > > > > > -- > H.J.
Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set
Hi, I refined the patch according to Bill's advice. I pasted the ChangeLog and diff file here. If it doesn't work, please let me know. Thanks. 2021-08-25 Haochen Gui gcc/ * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP, VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions. gcc/testsuite/ * gcc.target/powerpc/vec-minmax-1.c: New test. * gcc.target/powerpc/vec-minmax-2.c: Likewise. diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c index b4e13af4dc6..90527734ceb 100644 --- a/gcc/config/rs6000/rs6000-call.c +++ b/gcc/config/rs6000/rs6000-call.c @@ -12159,6 +12159,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) return true; /* flavors of vec_min. */ case VSX_BUILTIN_XVMINDP: + case ALTIVEC_BUILTIN_VMINFP: + if (!flag_finite_math_only || flag_signed_zeros) + return false; + /* Fall through to MIN_EXPR. */ + gcc_fallthrough (); case P8V_BUILTIN_VMINSD: case P8V_BUILTIN_VMINUD: case ALTIVEC_BUILTIN_VMINSB: @@ -12167,7 +12172,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) case ALTIVEC_BUILTIN_VMINUB: case ALTIVEC_BUILTIN_VMINUH: case ALTIVEC_BUILTIN_VMINUW: - case ALTIVEC_BUILTIN_VMINFP: arg0 = gimple_call_arg (stmt, 0); arg1 = gimple_call_arg (stmt, 1); lhs = gimple_call_lhs (stmt); @@ -12177,6 +12181,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) return true; /* flavors of vec_max. */ case VSX_BUILTIN_XVMAXDP: + case ALTIVEC_BUILTIN_VMAXFP: + if (!flag_finite_math_only || flag_signed_zeros) + return false; + /* Fall through to MAX_EXPR. */ + gcc_fallthrough (); case P8V_BUILTIN_VMAXSD: case P8V_BUILTIN_VMAXUD: case ALTIVEC_BUILTIN_VMAXSB: @@ -12185,7 +12194,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) case ALTIVEC_BUILTIN_VMAXUB: case ALTIVEC_BUILTIN_VMAXUH: case ALTIVEC_BUILTIN_VMAXUW: - case ALTIVEC_BUILTIN_VMAXFP: arg0 = gimple_call_arg (stmt, 0); arg1 = gimple_call_arg (stmt, 1); lhs = gimple_call_lhs (stmt); diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c new file mode 100644 index 000..da92f059aea --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c @@ -0,0 +1,53 @@ +/* { dg-do compile { target { powerpc64le-*-* } } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ +/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */ + +/* This test verifies that float or double vec_min/max are bound to + xv[min|max][d|s]p instructions when fast-math is not set. */ + + +#include + +#ifdef _BIG_ENDIAN + const int PREF_D = 0; +#else + const int PREF_D = 1; +#endif + +double vmaxd (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_max (va, vb), PREF_D); +} + +double vmind (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_min (va, vb), PREF_D); +} + +#ifdef _BIG_ENDIAN + const int PREF_F = 0; +#else + const int PREF_F = 3; +#endif + +float vmaxf (float a, float b) +{ + vector float va = vec_promote (a, PREF_F); + vector float vb = vec_promote (b, PREF_F); + return vec_extract (vec_max (va, vb), PREF_F); +} + +float vminf (float a, float b) +{ + vector float va = vec_promote (a, PREF_F); + vector float vb = vec_promote (b, PREF_F); + return vec_extract (vec_min (va, vb), PREF_F); +} diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c new file mode 100644 index 000..d318b933181 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c @@ -0,0 +1,51 @@ +/* { dg-do compile { target { powerpc64le-*-* } } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -ffast-math" } */ +/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxsmincdp\M} 2 } } */ + +/* This test verifies that float or double vec_min/max can be converted + to scalar comparison when fast-math is set. */ + + +#include + +#ifdef _BIG_ENDIAN + const int PREF_D = 0; +#else + const int PREF_D = 1; +#endif + +double vmaxd (double a, double b) +{ + vector double va = vec_promote (a, PREF_D); + vector double vb = vec_promote (b, PREF_D); + return vec_extract (vec_max (va, vb), PREF_D); +} + +double vmind (double a, double b) +{ + vector double va = vec_promote (a,
Re: [PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]
On Wed, Aug 25, 2021 at 2:14 PM Kong, Lingling via Gcc-patches wrote: > > Hi, > > For avx512f_scattersi, mask operand only affect set src, we > need to refine the pattern to let gcc know mask register also affect the dest. > So we put mask operand into UNSPEC_VSIBADDR. > > Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}. > Ok for master? > > gcc/ChangeLog: > > *config/i386/sse.md (scattersi): Add mask operand to > UNSPEC_VSIBADDR. > (scattersi): Likewise. > (*avx512f_scattersi): Merge mask operand > to set_dest. > (*avx512f_scatterdi): Likewise > > gcc/testsuite/ChangeLog: > > *gcc.target/i386/avx512f-pr101472.c: New test. > *gcc.target/i386/avx512vl-pr101472.c: Ditto. Please follow GCC Coding Convention ChanLog which is described in https://gcc.gnu.org/codingconventions.html#ChangeLogs. -= gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2], - operands[4]), UNSPEC_VSIBADDR); += gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2], + operands[4], operands[1]), UNSPEC_VSIBADDR); Lines shall be at most 80 columns. }) (define_insn "*avx512f_scattersi" @@ -24214,10 +24214,11 @@ [(unspec:P [(match_operand:P 0 "vsib_address_operand" "Tv") (match_operand: 2 "register_operand" "v") - (match_operand:SI 4 "const1248_operand" "n")] + (match_operand:SI 4 "const1248_operand" "n") + (match_operand: 6 "register_operand" "1")] UNSPEC_VSIBADDR)]) (unspec:VI48F - [(match_operand: 6 "register_operand" "1") + [(match_dup 6) (match_operand:VI48F 3 "register_operand" "v")] UNSPEC_SCATTER)) (clobber (match_scratch: 1 "="))] @@ -24243,8 +24244,8 @@ "TARGET_AVX512F" { operands[5] -= gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2], - operands[4]), UNSPEC_VSIBADDR); += gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2], + operands[4], operands[1]), UNSPEC_VSIBADDR); Ditto. }) -- BR, Hongtao
[PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]
Hi, For avx512f_scattersi, mask operand only affect set src, we need to refine the pattern to let gcc know mask register also affect the dest. So we put mask operand into UNSPEC_VSIBADDR. Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}. Ok for master? gcc/ChangeLog: *config/i386/sse.md (scattersi): Add mask operand to UNSPEC_VSIBADDR. (scattersi): Likewise. (*avx512f_scattersi): Merge mask operand to set_dest. (*avx512f_scatterdi): Likewise gcc/testsuite/ChangeLog: *gcc.target/i386/avx512f-pr101472.c: New test. *gcc.target/i386/avx512vl-pr101472.c: Ditto. 0001-i386-Fix-wrong-optimization-for-consecutive-masked-s.patch Description: 0001-i386-Fix-wrong-optimization-for-consecutive-masked-s.patch
[PATCH] i386: Fix _mm512_fpclass_ps_mask in O0 [PR 101471]
Hi, For _mm512_fpclass_ps_mask in O0, mask should be (__mmask16)-1 instead of (__mmask8)-1). Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. Ok for master? gcc/ChangeLog: * gcc/config/i386/avx512dqintrin.h : fix _mm512_fpclass_ps_mask define in O0 gcc/testsuite/ChangeLog: * gcc.target/i386/avx512f-pr101471.c: add new test 0001-i386-Fix-_mm512_fpclass_ps_mask-in-O0-PR-101471.patch Description: 0001-i386-Fix-_mm512_fpclass_ps_mask-in-O0-PR-101471.patch