PING^2 [PATCH] x86: Add cmpmemsi for -minline-all-stringops
On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu wrote: > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu wrote: > > > > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak wrote: > > > > > > On Sun, May 17, 2020 at 7:06 PM H.J. Lu wrote: > > > > > > > > Duplicate the cmpstrn pattern for cmpmem. The only difference is that > > > > the length argument of cmpmem is guaranteed to be less than or equal to > > > > lengths of 2 memory areas. Since "repz cmpsb" can be much slower than > > > > memcmp function implemented with vector instruction, see > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > > > > > > > > expand cmpmem to "repz cmpsb" only with -mgeneral-regs-only. > > > > > > If there is no benefit compared to the library implementation, then > > > enable these patterns only when -minline-all-stringops is used. > > > > Fixed. > > > > > Eventually these should be reimplemented with SSE4 string instructions. > > > > > > Honza is the author of the block handling x86 system, I'll leave the > > > review to him. > > > > We used to expand memcmp to "repz cmpsb" via cmpstrnsi. It was changed > > by > > > > commit 9b0f6f5e511ca512e4faeabc81d2fd3abad9b02f > > Author: Nick Clifton > > Date: Fri Aug 12 16:26:11 2011 + > > > > builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi pattern. > > > > * builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi > > pattern. > > * doc/md.texi (cmpstrn): Note that the comparison stops if both > > fetched bytes are zero. > > (cmpstr): Likewise. > > (cmpmem): Note that the comparison does not stop if both of the > > fetched bytes are zero. > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95151 > > > > is a regression. > > > > Honza, can you take a look at this? > > > > PING: > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546921.html > PING. -- H.J.
PING^2 [PATCH] x86: Inline strncmp only with -minline-all-stringops
On Wed, Aug 19, 2020 at 6:25 AM H.J. Lu wrote: > > On Wed, Jul 15, 2020 at 10:42:27AM -0700, H.J. Lu wrote: > > Expand strncmp to "repz cmpsb" only with -minline-all-stringops since > > "repz cmpsb" can be much slower than strncmp function implemented with > > vector instructions, see > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > > > > gcc/ > > > > PR target/95458 > > * config/i386/i386-expand.c (ix86_expand_cmpstrn_or_cmpmem): > > Return false for -mno-inline-all-stringops. > > > > gcc/testsuite/ > > > > PR target/95458 > > * gcc.target/i386/pr95458-1.c: New test. > > * gcc.target/i386/pr95458-2.c: Likewise. > > Expand strncmp to "repz cmpsb" only with -minline-all-stringops since > "repz cmpsb" can be much slower than strncmp function implemented with > vector instructions, see > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > > gcc/ > > PR target/95458 > * config/i386/i386.md (cmpstrnsi): Expand only with > TARGET_INLINE_ALL_STRINGOPS. > > gcc/testsuite/ > > PR target/95458 > * gcc.target/i386/pr95458-1.c: New test. > * gcc.target/i386/pr95458-2.c: Likewise. > --- > gcc/config/i386/i386.md | 8 +++- > gcc/testsuite/gcc.target/i386/pr95458-1.c | 11 +++ > gcc/testsuite/gcc.target/i386/pr95458-2.c | 7 +++ > 3 files changed, 25 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr95458-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr95458-2.c > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > index fb677e17817..f3fbed81c4a 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -18007,7 +18007,13 @@ (define_expand "cmpstrnsi" > { >rtx addr1, addr2, countreg, align, out; > > - if (optimize_insn_for_size_p () && !TARGET_INLINE_ALL_STRINGOPS) > + /* Expand strncmp only with -minline-all-stringops since > + "repz cmpsb" can be much slower than strncmp functions > + implemented with vector instructions, see > + > + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > + */ > + if (!TARGET_INLINE_ALL_STRINGOPS) > FAIL; > >/* Can't use this if the user has appropriated ecx, esi or edi. */ > diff --git a/gcc/testsuite/gcc.target/i386/pr95458-1.c > b/gcc/testsuite/gcc.target/i386/pr95458-1.c > new file mode 100644 > index 000..231a4787dce > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr95458-1.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -minline-all-stringops" } */ > + > +int > +func (char *d, unsigned int l) > +{ > + return __builtin_strncmp (d, "foo", l) ? 1 : 2; > +} > + > +/* { dg-final { scan-assembler-not "call\[\\t \]*_?strncmp" } } */ > +/* { dg-final { scan-assembler "cmpsb" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/pr95458-2.c > b/gcc/testsuite/gcc.target/i386/pr95458-2.c > new file mode 100644 > index 000..1a620444770 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr95458-2.c > @@ -0,0 +1,7 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mno-inline-all-stringops" } */ > + > +#include "pr95458-1.c" > + > +/* { dg-final { scan-assembler "call\[\\t \]*_?strncmp" } } */ > +/* { dg-final { scan-assembler-not "cmpsb" } } */ > -- > 2.26.2 > PING. -- H.J.
Re: [PATCH] fixincludes/fixfixes.c: Fix 'set but not used' warning.
On 9/11/20 6:21 AM, Christophe Lyon via Gcc-patches wrote: > pz_tmp_base and pz_tmp_dot are always set, but used only when > _PC_NAME_MAX is defined. > > This patch moves their declaration and definition undef #ifdef > _PC_NAME_MAX to avoid this warning. > > 2020-09-11 Torbjörn SVENSSON > Christophe Lyon > > fixincludes/ > * fixfixes.c (pz_tmp_base, pz_tmp_dot): Define only with > _PC_NAME_MAX. OK. I assume Christophe can commit the change. Jeff pEpkey.asc Description: application/pgp-keys
Re: [PATCH] Compile gcc.target/i386/pr78904-4a.c with -mtune=generic
On 9/15/20 8:32 AM, H.J. Lu via Gcc-patches wrote: > commit e95395926a84a2406faefe0995295d199d595440 > Author: Uros Bizjak > Date: Thu Jun 18 20:12:48 2020 +0200 > > i386: Fix mode of ZERO_EXTRACT RTXes, remove ext_register_operand > predicate. > > caused > > FAIL: gcc.target/i386/pr78904-4a.c scan-assembler [ \t]movb[\t ]+%.h, t > > when compiled with --target_board='unix{-m32\ -march=cascadelake}'. With > -mtune=generic: > > movzwl 4(%esp), %edx > movl8(%esp), %eax > movb%dh, t(%eax) > ret > > With -mtune=cascadelake: > > movzbl 5(%esp), %edx > movl8(%esp), %eax > movb%dl, t(%eax) > ret > > Add -mtune=generic for --target_board='unix{-m32\ -march=cascadelake}'. OK with an appropriate ChangeLog entry on the commit. jeff pEpkey.asc Description: application/pgp-keys
Re: [PATCH] Increase rtx_cost of sse_to_integer in skylake_cost.
On 9/15/20 9:20 PM, Hongtao Liu via Gcc-patches wrote: > Hi: > Rtx cost of sse_to_integer would be used by pass_stv as a > measurement for the scalar-to-vector transformation. As > https://gcc.gnu.org/pipermail/gcc-patches/2019-August/528839.html > indicates, movement between sse regs and gprs should be much expensive > than movement inside gprs(which is 2 as default). This patch would > also fix "pr96861". > > Bootstrap is ok, regression test is ok for both "i386.exp=* > --target_board='unix{-m32,}'" and "i386.exp=* > --target_board='unix{-m32\ -march=cascadelake,-m64\ > -march=cascadelake}"". > No big impact on SPEC2017. > Ok for trunk? > > gcc/ChangeLog > > PR target/96861 > * config/i386/x86-tune-costs.h (skylake_cost): increase rtx > cost of sse_to_integer from 2 to 6. > > gcc/testsuite > > * gcc.target/i386/pr95021-3.c: Add -mtune=generic. I'll defer to HJ's judgement here. If he's OK with it, then it's fine by me. jeff pEpkey.asc Description: application/pgp-keys
Re: [PATCH] gcc: Make strchr return value pointers const
On 9/7/20 6:13 PM, JonY via Gcc-patches wrote: > On 9/4/20 12:47 PM, Martin Storsjö wrote: >> Hi, >> >> On Fri, 4 Sep 2020, Jakub Jelinek wrote: >> >>> On Tue, Sep 01, 2020 at 04:01:42PM +0300, Martin Storsjö wrote: This fixes compilation of codepaths for dos-like filesystems with Clang. When built with clang, it treats C input files as C++ when the compiler driver is invoked in C++ mode, triggering errors when the return value of strchr() on a pointer to const is assigned to a pointer to non-const variable. >>> Not really specific to clang, e.g. glibc does that in its headers too >>> as the C++ standard mandates that (and I guess mingw should do that too). >>> This matches similar variables outside of the ifdefs for dos-like path handling. 2020-09-01 Martin Storsjö gcc/Changelog: * dwarf2out.c (file_name_acquire): Make a strchr return value pointer to const. libcpp/Changelog: * files.c (remap_filename): Make a strchr return value pointer to const. >>> LGTM. And it is short enough not to need copyright assignment, so ok for >>> trunk. >> Thanks! Can someone commit this for me? >> >> // Martin > Ping can anyone commit this? > > Are platform maintainers allowed to push general changes like these? If > so I can push soon. If it's been ack'd by a maintainer, yes. Jakub definitely qualifies as a maintainer, so feel free to push it on Martin's behalf. jeff > pEpkey.asc Description: application/pgp-keys
Re: [PATCH] doc: use @code{} instead of @samp{@command{}} around 'date %s'
On 9/11/20 4:41 PM, Sergei Trofimovich via Gcc-patches wrote: > From: Sergei Trofimovich > > Before the change 'man gcc' rendered "SOURCE_DATE_EPOCH" section as: > ... the output of @command{date +%s} on GNU/Linux ... > After the change it renders as: > ... the output of "date +%s" on GNU/Linux ... > > gcc/ChangeLog: > > * doc/cppenv.texi: Use @code{} instead of @samp{@command{}} > around 'date %s'. OK jeff pEpkey.asc Description: application/pgp-keys
Re: [PATCH] document -Wuninitialized for allocated objects
On 9/15/20 11:06 AM, Martin Sebor via Gcc-patches wrote: > The attached patch updates the manual to mention that Wuninitialized > and -Wmaybe-uninitialized are issued for both auto and allocated > objects, as well as for passing pointers to uninitialized objects > to const-qualified parameters. Both of these features are GCC 11 > enhancements. > > Martin > > patches-gcc-wuninit-allocated.diff > > Document -Wuninitialized for allocated objects. > > gcc/ChangeLog: > > * doc/invoke.texi (-Wuninitialized): Document -Wuninitialized for > allocated objects. >(-Wmaybe-uninitialized): Same. OK jeff pEpkey.asc Description: application/pgp-keys
Re: [PATCH] work harder to avoid -Wuninitialized for empty structs (PR 96295)
On 9/15/20 1:00 PM, Martin Sebor via Gcc-patches wrote: > The -Wuninitialized/-Wmaybe-uninitialized enhancement to warn when > a pointer or reference to an uninitialized object is passed to > a const-qualified function argument tries to avoid triggering for > objects of empty types. However, the suppression is incomplete > and lets the warning trigger in some corner cases. The attached > patch extends the suppression to those as well. > > Tested on x86_64-linux. I will plan to commit it later this week > if there are no objections. > > Martin > > gcc-96295.diff > > Work harder to avoid -Wuninitialized for objects of empty structs (PR > middle-end/96295). > > Resolves: > PR middle-end/96295 - -Wmaybe-uninitialized warning for range operator with > reference to an empty struct > > gcc/ChangeLog: > > PR middle-end/96295 > * tree-ssa-uninit.c (maybe_warn_operand): Work harder to avoid > warning for objects of empty structs > > gcc/testsuite/ChangeLog: > > PR middle-end/96295 > * g++.dg/warn/Wuninitialized-11.C: New test. FWIW, we had a build in Fedora rawhide fail just in the last couple weeks because of the enhanced uninitialized warning. I've already forgotten the package, but it was a true positive, though sadly it was just in the package's testsuite and thus we didn't actually find/fix a user visible package bug. OK for the trunk. Thanks. jeff pEpkey.asc Description: application/pgp-keys
Re: [RS6000] rs6000_rtx_costs cost IOR
On Wed, Sep 16, 2020 at 07:02:06PM -0500, Segher Boessenkool wrote: > Hi! > > On Tue, Sep 15, 2020 at 10:49:44AM +0930, Alan Modra wrote: > > * config/rs6000/rs6000.c (rs6000_rtx_costs): Cost IOR. > > > case IOR: > > - /* FIXME */ > >*total = COSTS_N_INSNS (1); > > - return true; > > Hey this was okay for over five years :-) > > > + left = XEXP (x, 0); > > + if (GET_CODE (left) == AND > > + && CONST_INT_P (XEXP (left, 1))) > > Add a comment that this is the integer insert insns? > > > + // rotlsi3_insert_5 > > But use /* comments */. > > > + /* Test both regs even though the one in the mask is > > +constrained to be equal to the output. Increasing > > +cost may well result in rejecting an invalid insn > > +earlier. */ > > Is that ever actually useful? Possibly not in this particular case, but I did see cases where invalid insns were rejected early by costing non-reg sub-expressions. Beside that, the default position on rtx_costs paths that return true should be to cost any sub-expressions unless you know for sure they are zero cost. And yes, we fail to do that for some cases, eg. mul_highpart. > So this new block is pretty huge. Can it easily be factored to a > separate function? Just the insert insns part, not all IOR. Done in my local tree. > Okay for trunk with the comments changed to the correct syntax, and > factoring masked insert out to a separate function pre-approved if you > want to do that. Thanks! I'll hold off committing until the whole rtx_costs patch series is reviewed (not counting the rotate_and_mask_constant patch). -- Alan Modra Australia Development Lab, IBM
Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
On 9/15/20 1:47 PM, Martin Sebor wrote: > Overflowing the size of a dynamic allocation (e.g., malloc or VLA) > can lead to a subsequent buffer overflow corrupting the heap or > stack. The attached patch diagnoses a subset of these cases where > the overflow/wraparound is still detectable. > > Besides regtesting GCC on x86_64-linux I also verified the warning > doesn't introduce any false positives into Glibc or Binutils/GDB > builds on the same target. > > Martin > > gcc-96838.diff > > PR middle-end/96838 - missing warning on integer overflow in calls to > allocation functions > > gcc/ChangeLog: > > PR middle-end/96838 > * calls.c (eval_size_vflow): New function. > (get_size_range): Call it. Add argument. > (maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound. > * calls.h (get_size_range): Add argument. > > gcc/testsuite/ChangeLog: > > PR middle-end/96838 > * gcc.dg/Walloc-size-larger-than-19.c: New test. > * gcc.dg/Walloc-size-larger-than-20.c: New test. If an attacker can control an integer overflow that feeds an allocation, then they can do all kinds of bad things. In fact, when my son was asking me attack vectors, this is one I said I'd look at if I were a bad guy. I'm a bit surprised you can't just query the range of the argument and get the overflow status out of that range, but I don't see that in the APIs. How painful would it be to make that part of the API? The conceptual model would be to just ask for the range of the argument to malloc which would include the range and a status bit indicating the computation might have overflowed. jeff pEpkey.asc Description: application/pgp-keys
Re: [PATCH] -mno-xsave should imply -mno-avx since -mavx implies -mxsave
Thanks! On Wed, Sep 16, 2020 at 8:57 PM Uros Bizjak wrote: > > > gcc/ChangeLog > > > > * common/config/i386/i386-common.c > > (OPTION_MASK_ISA_AVX_UNSET): Remove OPTION_MASK_ISA_XSAVE_UNSET. > > (OPTION_MASK_ISA_XSAVE_UNSET): Add OPTION_MASK_ISA_AVX_UNSET. > > > > gcc/testsuite/ChangeLog > > > > * gcc.target/i386/xsave-avx-1.c: New test. > > OK for mainline and backports. > > Thanks, > Uros. -- BR, Hongtao
Re: [PATCH] Increase rtx_cost of sse_to_integer in skylake_cost.
Thanks. On Wed, Sep 16, 2020 at 8:54 PM Uros Bizjak wrote: > > > gcc/ChangeLog > > > > PR target/96861 > > * config/i386/x86-tune-costs.h (skylake_cost): increase rtx > > cost of sse_to_integer from 2 to 6. > > > > gcc/testsuite > > > > * gcc.target/i386/pr95021-3.c: Add -mtune=generic. > > OK. > > Thanks, > Uros. -- BR, Hongtao
Re: [PATCH] c++: premature analysis of requires-expression [PR96410]
On 9/16/20 6:11 PM, Patrick Palka wrote: On Wed, 16 Sep 2020, Jason Merrill wrote: On 9/16/20 1:34 PM, Patrick Palka wrote: On Thu, 13 Aug 2020, Jason Merrill wrote: On 8/13/20 11:21 AM, Patrick Palka wrote: On Mon, 10 Aug 2020, Jason Merrill wrote: On 8/10/20 2:18 PM, Patrick Palka wrote: On Mon, 10 Aug 2020, Patrick Palka wrote: In the below testcase, semantic analysis of the requires-expressions in the generic lambda must be delayed until instantiation of the lambda because the requirements depend on the lambda's template arguments. But tsubst_requires_expr does semantic analysis even during regeneration of the lambda, which leads to various bogus errors and ICEs since some subroutines aren't prepared to handle dependent/template trees. This patch adjusts subroutines of tsubst_requires_expr to avoid doing some problematic semantic analyses when processing_template_decl. In particular, expr_noexcept_p generally can't be checked on a dependent expression. Next, tsubst_nested_requirement should avoid checking satisfaction when processing_template_decl. And similarly for convert_to_void (called from tsubst_valid_expression_requirement). I wonder if, instead of trying to do a partial substitution into a requires-expression at all, we want to use the PACK_EXPANSION_EXTRA_ARGS/IF_STMT_EXTRA_ARGS mechanism to remember the arguments for later satisfaction? Like this? -- >8 -- Subject: [PATCH] c++: requires-expressions and partial instantiation [PR96410] This patch makes tsubst_requires_expr avoid substituting into a requires-expression when partially instantiating a generic lambda. This is necessary in general to ensure that we check its requirements in lexical order (see the first testcase below). Instead, template arguments are saved in the requires-expression until all arguments can be substituted into it at once, using a mechanism similar to PACK_EXPANSION_EXTRA_ARGS. This change incidentally also fixes the two mentioned PRs -- the problem there is that tsubst_requires_expr was performing semantic checks on templated trees, and some of the checks are not prepared to handle such trees. With this patch tsubst_requires_expr, no longer does any semantic checking at all when processing_template_decl. gcc/cp/ChangeLog: PR c++/96409 PR c++/96410 * constraint.cc (tsubst_requires_expr): Use REQUIRES_EXPR_PARMS and REQUIRES_EXPR_REQS. Use REQUIRES_EXPR_EXTRA_ARGS and add_extra_args to defer substitution until we have all the template arguments. (finish_requires_expr): Adjust the call to build_min so that REQUIRES_EXPR_EXTRA_ARGS gets set to NULL_TREE. * cp-tree.def (REQUIRES_EXPR): Give it a third operand. * cp-tree.h (REQUIRES_EXPR_PARMS, REQUIRES_EXPR_REQS, REQUIRES_EXPR_EXTRA_ARGS): Define. (add_extra_args): Declare. gcc/testsuite/ChangeLog: PR c++/96409 PR c++/96410 * g++.dg/cpp2a/concepts-lambda13.C: New test. * g++.dg/cpp2a/concepts-lambda14.C: New test. --- gcc/cp/constraint.cc | 21 +++- gcc/cp/cp-tree.def| 8 +++--- gcc/cp/cp-tree.h | 16 .../g++.dg/cpp2a/concepts-lambda13.C | 18 + .../g++.dg/cpp2a/concepts-lambda14.C | 25 +++ 5 files changed, 79 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda13.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda14.C diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc index ad9d47070e3..9a06d763554 100644 --- a/gcc/cp/constraint.cc +++ b/gcc/cp/constraint.cc @@ -2175,7 +2175,19 @@ tsubst_requires_expr (tree t, tree args, /* A requires-expression is an unevaluated context. */ cp_unevaluated u; - tree parms = TREE_OPERAND (t, 0); + args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args); + if (processing_template_decl) +{ + /* We're partially instantiating a generic lambda. Substituting into +this requires-expression now may cause its requirements to get +checked out of order, so instead just remember the template +arguments and wait until we can substitute them all at once. */ + t = copy_node (t); + REQUIRES_EXPR_EXTRA_ARGS (t) = args; Don't we need build_extra_args, in case the requires_expr is in a local_specializations context like a function body? Ah yes probably... though I haven't yet been able to come up with a concrete testcase that demonstrates we need it. It seems we get away without it because a requires-expression is an unevaluated operand, and many callers of retrieve_local_specialization have fallback code that triggers only when inside an unevaluated operand, e.g. tsubst_copy does: r = retrieve_local_specialization (t); if (r == NULL_TREE) { ... /* This
Re: libbacktrace: Add support for MiniDebugInfo
On Wed, Sep 16, 2020 at 8:54 AM Alex Coplan wrote: > > On 16/09/2020 07:26, Ian Lance Taylor wrote: > > On Wed, Sep 16, 2020, 4:03 AM Alex Coplan wrote: > > > > > Hi Ian, > > > > > > On 14/09/2020 14:12, Ian Lance Taylor via Gcc-patches wrote: > > > > This patch to libbacktrace adds support for MiniDebugInfo, as > > > > requested in PR 93608. > > > > > > This appears to introduce a failure in the libbacktrace testsuite > > > (observed on both x86 and aarch64): > > > > > > ../gcc/libbacktrace/../test-driver: line 107: 7905 Segmentation fault > > > (core dumped) "$@" > $log_file 2>&1 > > > FAIL: mtest_minidebug > > > > > > > > I tested on x86 without seeing anything like this. Can you give me any > > more details? Thanks. > > Sure. On an Ubuntu 18.04 / x86-64 system with current trunk, configuring > with: > > ~/toolchain/src/gcc/configure \ > --prefix=`pwd` \ > --enable-languages=c,c++ \ > --disable-multilib \ > --disable-bootstrap > > running `make && make check-libbacktrace` gives the failure described > above. Here is the contents of libbacktrace/test-suite.log: > > = > package-unused version-unused: ./test-suite.log > = > > # TOTAL: 33 > # PASS: 32 > # SKIP: 0 > # XFAIL: 0 > # FAIL: 1 > # XPASS: 0 > # ERROR: 0 > > .. contents:: :depth: 2 > > FAIL: mtest_minidebug > = > > no debug info in ELF executable > no debug info in ELF executable > no debug info in ELF executable > no debug info in ELF executable > no debug info in ELF executable > no debug info in ELF executable > no debug info in ELF executable > test1: not enough frames; got 0, expected at least 3 > test1: [0]: got expected f3 > test1: [1]: got expected f2 > FAIL mtest_minidebug (exit status: 139) > > --- > > Let me know if you need any more info. Thanks. I think the problem arises when libbacktrace is unable to find debug info for any shared library involved in the link. This patch should fix it. Bootstrapped and ran libbacktrace tests on x86_64-pc-linux-gnu and aarch64-linux-gnu. Committed to mainline. Ian PR libbacktrace/97080 * fileline.c (backtrace_syminfo_to_full_callback): New function. (backtrace_syminfo_to_full_error_callback): New function. * elf.c (elf_nodebug): Call syminfo_fn if possible. * internal.h (struct backtrace_call_full): Define. (backtrace_syminfo_to_full_callback): Declare. (backtrace_syminfo_to_full_error_callback): Declare. * mtest.c (f3): Only check all[i] if data.index permits. diff --git a/libbacktrace/elf.c b/libbacktrace/elf.c index dd004708246..941f820d944 100644 --- a/libbacktrace/elf.c +++ b/libbacktrace/elf.c @@ -547,18 +547,6 @@ elf_crc32_file (struct backtrace_state *state, int descriptor, return ret; } -/* A dummy callback function used when we can't find any debug info. */ - -static int -elf_nodebug (struct backtrace_state *state ATTRIBUTE_UNUSED, -uintptr_t pc ATTRIBUTE_UNUSED, -backtrace_full_callback callback ATTRIBUTE_UNUSED, -backtrace_error_callback error_callback, void *data) -{ - error_callback (data, "no debug info in ELF executable", -1); - return 0; -} - /* A dummy callback function used when we can't find a symbol table. */ @@ -571,6 +559,33 @@ elf_nosyms (struct backtrace_state *state ATTRIBUTE_UNUSED, error_callback (data, "no symbol table in ELF executable", -1); } +/* A callback function used when we can't find any debug info. */ + +static int +elf_nodebug (struct backtrace_state *state, uintptr_t pc, +backtrace_full_callback callback, +backtrace_error_callback error_callback, void *data) +{ + if (state->syminfo_fn != NULL && state->syminfo_fn != elf_nosyms) +{ + struct backtrace_call_full bdata; + + /* Fetch symbol information so that we can least get the +function name. */ + + bdata.full_callback = callback; + bdata.full_error_callback = error_callback; + bdata.full_data = data; + bdata.ret = 0; + state->syminfo_fn (state, pc, backtrace_syminfo_to_full_callback, +backtrace_syminfo_to_full_error_callback, ); + return bdata.ret; +} + + error_callback (data, "no debug info in ELF executable", -1); + return 0; +} + /* Compare struct elf_symbol for qsort. */ static int diff --git a/libbacktrace/fileline.c b/libbacktrace/fileline.c index be62b9899c5..cd1e10dd58c 100644 --- a/libbacktrace/fileline.c +++ b/libbacktrace/fileline.c @@ -317,3 +317,30 @@ backtrace_syminfo (struct backtrace_state *state, uintptr_t pc, state->syminfo_fn (state, pc, callback, error_callback, data); return 1; } + +/* A backtrace_syminfo_callback that can call into a + backtrace_full_callback, used when we have a symbol table but no + debug info. */ + +void +backtrace_syminfo_to_full_callback (void *data, uintptr_t pc, + const char *symname, +
Re: [RS6000] rs6000_rtx_costs cost IOR
Hi! On Tue, Sep 15, 2020 at 10:49:44AM +0930, Alan Modra wrote: > * config/rs6000/rs6000.c (rs6000_rtx_costs): Cost IOR. > case IOR: > - /* FIXME */ >*total = COSTS_N_INSNS (1); > - return true; Hey this was okay for over five years :-) > + left = XEXP (x, 0); > + if (GET_CODE (left) == AND > + && CONST_INT_P (XEXP (left, 1))) Add a comment that this is the integer insert insns? > + // rotlsi3_insert_5 But use /* comments */. > + /* Test both regs even though the one in the mask is > + constrained to be equal to the output. Increasing > + cost may well result in rejecting an invalid insn > + earlier. */ Is that ever actually useful? So this new block is pretty huge. Can it easily be factored to a separate function? Just the insert insns part, not all IOR. Okay for trunk with the comments changed to the correct syntax, and factoring masked insert out to a separate function pre-approved if you want to do that. Thanks! Segher
Re: [RS6000] rs6000_rtx_costs multi-insn constants
On Tue, Sep 15, 2020 at 10:49:43AM +0930, Alan Modra wrote: > This small patch to rs6000_rtx_const considerably improves code (_costs) > generated for large constants in 64-bit code, teaching gcc that it is > better to load a constant from memory than to generate a sequence of > up to five dependent instructions. Note that the rs6000 backend does > generate large constants as loads from memory at expand time but > optimisation passes replace them with SETs of the value due to not > having correct costs. > > PR 94393 > * config/rs6000/rs6000.c (rs6000_rtx_costs): Cost multi-insn > constants. Okay for trunk. Note that some p10 insns take a floating point immediate, but those need to be handled specially anyway. Thanks! Segher
Re: [RS6000] rs6000_rtx_costs comment
Hi! This took a while to digest, sorry. On Tue, Sep 15, 2020 at 10:49:42AM +0930, Alan Modra wrote: > + 1) Calls from places like optabs.c:avoid_expensive_constant will > + come here with OUTER_CODE set to an operation such as AND with X > + being a CONST_INT or other CONSTANT_P type. This will be compared > + against set_src_cost, where we'll come here with OUTER_CODE as SET > + and X the same constant. This (and similar) reasons are why I still haven't made set_src_cost based on insn_cost -- it is in some places compared to some rtx_cost. > + 2) Calls from places like combine:distribute_and_simplify_rtx are > + asking whether a possibly quite complex SET_SRC can be implemented > + more cheaply than some other logically equivalent SET_SRC. It is comparing the set_src_cost of two equivalent formulations, yeah. This is one place where set_src_cost can be pretty easily replaced by insn_cost (combine uses that in most other places, already, and that was a quite useful change). > + 3) Calls from places like default_noce_conversion_profitable_p will > + come here via seq_cost and pass the pattern of a SET insn in X. The pattern of the single SET in any instruction that is single_set, yeah. > + Presuming the insn is valid and set_dest a reg, rs6000_rtx_costs > + will next see the SET_SRC. The overall cost should be comparable > + to rs6000_insn_cost since the code is comparing one insn sequence > + (some of which may be costed by insn_cost) against another insn > + sequence. Yes. And our rtx_cost misses incredibly many cases, but most common things are handled okay. > + 4) Calls from places like cprop.c:try_replace_reg will come here > + with OUTER_CODE as INSN, and X either a valid pattern of a SET or > + one where some registers have been replaced with constants. The > + replacements may make the SET invalid, for example if > + (set (reg1) (and (reg2) (const_int 0xfff))) > + replaces reg2 as > + (set (reg1) (and (symbol_ref) (const_int 0xfff))) > + then the replacement can't be implemented in one instruction and > + really the cost should be higher by one instruction. However, > + the cost for invalid insns doesn't matter much except that a > + higher cost may lead to their rejection earlier. Yup. This uses set_rtx_cost, which also ideally will use insn_cost one day. > + 5) fwprop.c:should_replace_address puts yet another wrinkle on this > + function, where we prefer an address calculation that is more > + complex yet has the same address_cost. In this case "more > + complex" is determined by having a higher set_src_cost. So for > + example, if we want a plain (reg) address to be replaced with > + (plus (reg) (const)) when possible then PLUS needs to cost more > + than zero here. */ Maybe it helps if you more prominenty mention set_rtx_cost and set_src_cost? Either way, okay for trunk. Thanks! Segher
Re: [Patch] OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)
Hi Honza – some input would be really helpful! Hi Jakub – updated version below. On 9/16/20 12:36 PM, Jakub Jelinek wrote: I think you want Honza on this primarily, I'm always lost in the cgraph alias code. (Likewise as this thread shows) + while (node->alias_target) +node = symtab_node::get (node->alias_target); + node = node->ultimate_alias_target (); I think the above is either you walk the aliases yourself, or use ultimate_alias_target, but not both. I think we need to distinguish between: * aliases which end up with the same symbol name and are stored in the ref_list; example: cpp_implicit_alias. * aliases like with the alias attribute, which is handled via alias_target and have different names. Just experimentally: * The 'while (node->alias_target)' properly resolves the attribute testcase (libgomp.c-c++-common/pr96390.c). Here, ultimate_alias_target () does not help as node->analyzed == 0. * The 'node->ultimate_alias_target ()' works for the cpp_implicit_alias case (libgomp.c++/pr96390.C). Just looking at the alias target does not help as in this case, alias_target == NULL. And the second thing is, I'm not sure how the aliases behave if the ultimate alias target is properly marked as omp declare target, but some of the aliases are not. The offloaded code will still call the alias, so do we somehow arrange for the aliases to be also emitted into the offloading LTO IL? [...] I wonder if the aliases that are needed shouldn't be marked node->offloadable and have "omp declare target" attribute added for them too. Done now. Okay – or do we find more issues? Tobias - Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390) gcc/ChangeLog: PR middle-end/96390 * omp-offload.c (omp_discover_declare_target_tgt_fn_r): Handle alias nodes. libgomp/ChangeLog: PR middle-end/96390 * testsuite/libgomp.c++/pr96390.C: New test. * testsuite/libgomp.c-c++-common/pr96390.c: New test. gcc/omp-offload.c| 50 libgomp/testsuite/libgomp.c++/pr96390.C | 49 +++ libgomp/testsuite/libgomp.c-c++-common/pr96390.c | 26 3 files changed, 117 insertions(+), 8 deletions(-) diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c index 32c2485abd4..02dec44b8ce 100644 --- a/gcc/omp-offload.c +++ b/gcc/omp-offload.c @@ -196,21 +196,55 @@ omp_declare_target_var_p (tree decl) static tree omp_discover_declare_target_tgt_fn_r (tree *tp, int *walk_subtrees, void *data) { - if (TREE_CODE (*tp) == FUNCTION_DECL - && !omp_declare_target_fn_p (*tp) - && !lookup_attribute ("omp declare target host", DECL_ATTRIBUTES (*tp))) + if (TREE_CODE (*tp) == FUNCTION_DECL) { - tree id = get_identifier ("omp declare target"); - if (!DECL_EXTERNAL (*tp) && DECL_SAVED_TREE (*tp)) - ((vec *) data)->safe_push (*tp); - DECL_ATTRIBUTES (*tp) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (*tp)); + tree decl = *tp; symtab_node *node = symtab_node::get (*tp); if (node != NULL) { - node->offloadable = 1; + /* First, find final FUNCTION_DECL; find final alias target and there + ensure alias like cpp_implicit_alias are resolved by calling + ultimate_alias_target; the latter does not resolve alias_target as + node->analyzed = 0. */ + symtab_node *orig_node = node; + while (node->alias_target) + node = symtab_node::get (node->alias_target); + node = node->ultimate_alias_target (); + decl = node->decl; + + if (omp_declare_target_fn_p (decl) + || lookup_attribute ("omp declare target host", + DECL_ATTRIBUTES (decl))) + return NULL_TREE; + if (ENABLE_OFFLOADING) g->have_offload = true; + + /* Now mark original node and all alias targets for offloading. */ + node->offloadable = 1; + if (orig_node != node) + { + tree id = get_identifier ("omp declare target"); + while (orig_node->alias_target) + { + orig_node = orig_node->ultimate_alias_target (); + DECL_ATTRIBUTES (orig_node->decl) + = tree_cons (id, NULL_TREE, + DECL_ATTRIBUTES (orig_node->decl)); + orig_node = symtab_node::get (orig_node->alias_target); + } + } } + else if (omp_declare_target_fn_p (decl) + || lookup_attribute ("omp declare target host", +DECL_ATTRIBUTES (decl))) + return NULL_TREE; + + tree id = get_identifier ("omp declare target"); + if (!DECL_EXTERNAL (decl) && DECL_SAVED_TREE (decl)) + ((vec *) data)->safe_push (decl); + DECL_ATTRIBUTES (decl) = tree_cons (id, NULL_TREE, + DECL_ATTRIBUTES (decl)); } else if (TYPE_P (*tp)) *walk_subtrees = 0; diff --git a/libgomp/testsuite/libgomp.c++/pr96390.C b/libgomp/testsuite/libgomp.c++/pr96390.C new file
[committed] analyzer: fix state explosions due to SCC bug
Debugging the state explosion of the very large switch statement in gcc.dg/analyzer/pr96653.c showed that the worklist was failing to order the exploded nodes correctly; the in-edges at the join point after the switch were not getting processed together, but were instead being rocessed in smaller batches, bloating the exploded graph until the per-point limit was reached. The root cause turned out to be a bug in creating the strongly-connected components for the supergraph: the code was considering interprocedural edges as well as intraprocedural edges, leading to unpredictable misorderings of the SCC and worklist, leading to bloating of the exploded graph. This patch fixes the SCC creation so it only considers intraprocedural edges within the supergraph. It also tweaks worklist::key_t::cmp to give higher precedence to call_string over differences within a supernode, since enodes with different call_strings can't be merges. In practise, none of my test cases were affected by this latter change, though it seems to be the right thing to do. With this patch, the very large switch statement in gcc.dg/analyzer/pr96653.c is handled in a single call to exploded_graph::maybe_process_run_of_before_supernode_enodes: merged 358 in-enodes into 2 out-enode(s) at SN: 402 and that testcase no longer hits the per-program-point limits. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Also lightly tested on powerpc64-linux-gnu, arm-unknown-eabi, and aarch64-unknown-linux-gnu. Pushed to master as r11-3247-gfd111c419d146ee47c7df9a36a535e8d843d4802. gcc/analyzer/ChangeLog: * engine.cc (strongly_connected_components::strong_connect): Only consider intraprocedural edges when creating SCCs. (worklist::key_t::cmp): Add comment. Treat call_string differences as more important than differences of program_point within a supernode. gcc/testsuite/ChangeLog: PR analyzer/96653 * gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c: Update expected number of exploded nodes. * gcc.dg/analyzer/malloc-vs-local-1a.c: Update expected number of exploded nodes. * gcc.dg/analyzer/pr96653.c: Remove -Wno-analyzer-too-complex. --- gcc/analyzer/engine.cc| 22 ++- .../loop-0-up-to-n-by-1-with-iter-obj.c | 3 ++- .../gcc.dg/analyzer/malloc-vs-local-1a.c | 20 - gcc/testsuite/gcc.dg/analyzer/pr96653.c | 3 +-- 4 files changed, 29 insertions(+), 19 deletions(-) diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc index 637a990da4a..d03e23a9b6e 100644 --- a/gcc/analyzer/engine.cc +++ b/gcc/analyzer/engine.cc @@ -1615,6 +1615,9 @@ strongly_connected_components::strong_connect (unsigned index) superedge *sedge; FOR_EACH_VEC_ELT (v_snode->m_succs, i, sedge) { + if (sedge->get_kind () != SUPEREDGE_CFG_EDGE + && sedge->get_kind () != SUPEREDGE_INTRAPROCEDURAL_CALL) + continue; supernode *w_snode = sedge->m_dest; per_node_data *w = _per_node[w_snode->m_index]; if (w->m_index == -1) @@ -1690,7 +1693,14 @@ worklist::add_node (exploded_node *enode) /* Comparator for implementing worklist::key_t comparison operators. Return negative if KA is before KB Return positive if KA is after KB - Return 0 if they are equal. */ + Return 0 if they are equal. + + The ordering of the worklist is critical for performance and for + avoiding node explosions. Ideally we want all enodes at a CFG join-point + with the same callstring to be sorted next to each other in the worklist + so that a run of consecutive enodes can be merged and processed "in bulk" + rather than individually or pairwise, minimizing the number of new enodes + created. */ int worklist::key_t::cmp (const worklist::key_t , const worklist::key_t ) @@ -1742,6 +1752,11 @@ worklist::key_t::cmp (const worklist::key_t , const worklist::key_t ) gcc_assert (snode_a == snode_b); + /* The points might vary by callstring; try sorting by callstring. */ + int cs_cmp = call_string::cmp (call_string_a, call_string_b); + if (cs_cmp) +return cs_cmp; + /* Order within supernode via program point. */ int within_snode_cmp = function_point::cmp_within_supernode (point_a.get_function_point (), @@ -1749,11 +1764,6 @@ worklist::key_t::cmp (const worklist::key_t , const worklist::key_t ) if (within_snode_cmp) return within_snode_cmp; - /* The points might vary by callstring; try sorting by callstring. */ - int cs_cmp = call_string::cmp (call_string_a, call_string_b); - if (cs_cmp) -return cs_cmp; - /* Otherwise, we ought to have the same program_point. */ gcc_assert (point_a == point_b); diff --git a/gcc/testsuite/gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c b/gcc/testsuite/gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c index 0172c9b324c..2b0352711ae 100644 ---
[committed] analyzer: show SCC ids in .dot dumps
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to master as d2c4d5199cf277becc1f377536973815d1c9519c. gcc/analyzer/ChangeLog: * engine.cc (supernode_cluster::dump_dot): Show the SCC id in the per-supernode clusters in FILENAME.eg.dot output. (exploded_graph_annotator::add_node_annotations): Show the SCC of the supernode in FILENAME.supernode.eg.dot output. * exploded-graph.h (worklist::scc_id): New. (exploded_graph::get_scc_id): New. --- gcc/analyzer/engine.cc| 6 -- gcc/analyzer/exploded-graph.h | 9 + 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc index 53fafb58633..637a990da4a 100644 --- a/gcc/analyzer/engine.cc +++ b/gcc/analyzer/engine.cc @@ -3288,8 +3288,9 @@ public: (const void *)this); gv->indent (); gv->println ("style=\"dashed\";"); -gv->println ("label=\"SN: %i (bb: %i)\";", -m_supernode->m_index, m_supernode->m_bb->index); +gv->println ("label=\"SN: %i (bb: %i; scc: %i)\";", +m_supernode->m_index, m_supernode->m_bb->index, +args.m_eg.get_scc_id (*m_supernode)); int i; exploded_node *enode; @@ -4040,6 +4041,7 @@ public: gv->begin_td (); pp_string (pp, "BEFORE"); +pp_printf (pp, " (scc: %i)", m_eg.get_scc_id (n)); gv->end_td (); unsigned i; diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h index 5d4c3190283..04e878fbdfc 100644 --- a/gcc/analyzer/exploded-graph.h +++ b/gcc/analyzer/exploded-graph.h @@ -652,6 +652,10 @@ public: exploded_node *take_next (); exploded_node *peek_next (); void add_node (exploded_node *enode); + int get_scc_id (const supernode ) const + { +return m_scc.get_scc_id (snode.m_index); + } private: class key_t @@ -783,6 +787,11 @@ public: const call_string_data_map_t *get_per_call_string_data () const { return _per_call_string_data; } + int get_scc_id (const supernode ) const + { +return m_worklist.get_scc_id (node); + } + private: void print_bar_charts (pretty_printer *pp) const; -- 2.26.2
[committed] analyzer: bulk merger/processing of runs of nodes at CFG join points
Prior to this patch the analyzer worklist considered only one node or two nodes at a time, processing and/or merging state individually or pairwise. This could lead to explosions of merger nodes at CFG join points, especially after switch statements, which could have large numbers of in-edges, and thus large numbers of merger exploded_nodes could be created, exceeding the per-point limit and thus stopping analysis with -Wanalyzer-too-complex. This patch special-cases the handling for runs of consecutive nodes in the worklist at a CFG join point, processing and merging them all together. The patch fixes a state explosion seen in bzip2.c seen when attempting to reproduce PR analyzer/95188, in a switch statement in a loop for argument parsing. With this patch, the analyzer successfully consolidates the state after the argument parsing to a single exploded node. In gcc.dg/analyzer/pr96653.c there is a switch statement with over 300 cases which leads to hitting the per-point limit. With this patch the consolidation code doesn't manage to merge all of them due to other worklist-ordering bugs, and it still hits the per-point limits, but it does manage some very long consolidations: merged 2 in-enodes into 2 out-enode(s) at SN: 403 merged 2 in-enodes into 2 out-enode(s) at SN: 403 merged 2 in-enodes into 1 out-enode(s) at SN: 11 merged 29 in-enodes into 1 out-enode(s) at SN: 35 merged 6 in-enodes into 1 out-enode(s) at SN: 41 merged 31 in-enodes into 1 out-enode(s) at SN: 35 and with a followup patch to fix an SCC issue it manages: merged 358 in-enodes into 2 out-enode(s) at SN: 402 The patch appears to fix the failure on non-x86_64 of: gcc.dg/analyzer/pr93032-mztools.c (test for excess errors) which is PR analyzer/96616. Unfortunately, the patch introduces a memory leak false positive in gcc.dg/analyzer/pr94851-1.c, but this appears to be a pre-existing bug that was hidden by state-merging failures. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Also lightly tested on powerpc64-linux-gnu, arm-unknown-eabi, and aarch64-unknown-linux-gnu to verify fix for PR analyzer/96616. Pushed to master as r11-3245-gb28491dc2d79763ecbff4f0b9f1f3e48a443be1d. gcc/analyzer/ChangeLog: * engine.cc (exploded_node::dump_dot): Show STATUS_BULK_MERGED. (exploded_graph::process_worklist): Call maybe_process_run_of_before_supernode_enodes. (exploded_graph::maybe_process_run_of_before_supernode_enodes): New. (exploded_graph_annotator::print_enode): Show STATUS_BULK_MERGED. * exploded-graph.h (enum exploded_node::status): Add STATUS_BULK_MERGED. gcc/testsuite/ChangeLog: * gcc.dg/analyzer/bzip2-arg-parse-1.c: New test. * gcc.dg/analyzer/loop-n-down-to-1-by-1.c: Remove xfail. * gcc.dg/analyzer/pr94851-1.c: Add xfail. --- gcc/analyzer/engine.cc| 207 ++ gcc/analyzer/exploded-graph.h | 4 + .../gcc.dg/analyzer/bzip2-arg-parse-1.c | 95 .../gcc.dg/analyzer/loop-n-down-to-1-by-1.c | 4 +- gcc/testsuite/gcc.dg/analyzer/pr94851-1.c | 3 +- 5 files changed, 310 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/analyzer/bzip2-arg-parse-1.c diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc index 5903b19b774..53fafb58633 100644 --- a/gcc/analyzer/engine.cc +++ b/gcc/analyzer/engine.cc @@ -848,6 +848,8 @@ exploded_node::dump_dot (graphviz_out *gv, const dump_args_t ) const pp_printf (pp, "EN: %i", m_index); if (m_status == STATUS_MERGER) pp_string (pp, " (merger)"); + else if (m_status == STATUS_BULK_MERGED) +pp_string (pp, " (bulk merged)"); pp_newline (pp); if (args.show_enode_details_p (*this)) @@ -2211,6 +2213,12 @@ exploded_graph::process_worklist () if (logger) logger->log ("next to process: EN: %i", node->m_index); + /* If we have a run of nodes that are before-supernode, try merging and +processing them together, rather than pairwise or individually. */ + if (flag_analyzer_state_merge && node != m_origin) + if (maybe_process_run_of_before_supernode_enodes (node)) + goto handle_limit; + /* Avoid exponential explosions of nodes by attempting to merge nodes that are at the same program point and which have sufficiently similar state. */ @@ -2340,6 +2348,7 @@ exploded_graph::process_worklist () process_node (node); +handle_limit: /* Impose a hard limit on the number of exploded nodes, to ensure that the analysis terminates in the face of pathological state explosion (or bugs). @@ -2367,6 +2376,201 @@ exploded_graph::process_worklist () } } +/* Attempt to process a consecutive run of sufficiently-similar nodes in + the worklist at a CFG join-point (having already popped ENODE from the + head of the worklist). + + If ENODE's point is of the form
[committed] analyzer: add program_point::get_next
Avoid some future copy-and-paste by introducing a function. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to master as b9b5fc0c2175b34131d9fd0805b1b307f754f4f0. gcc/analyzer/ChangeLog: * engine.cc (exploded_graph::process_node) : Simplify by using program_point::get_next. * program-point.cc (program_point::get_next): New. * program-point.h (program_point::get_next): New decl. --- gcc/analyzer/engine.cc| 24 gcc/analyzer/program-point.cc | 29 + gcc/analyzer/program-point.h | 2 ++ 3 files changed, 35 insertions(+), 20 deletions(-) diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc index 8f5c5143ca5..5903b19b774 100644 --- a/gcc/analyzer/engine.cc +++ b/gcc/analyzer/engine.cc @@ -2458,26 +2458,10 @@ exploded_graph::process_node (exploded_node *node) ); } - if (point.get_supernode ()->m_stmts.length () > 0) - { - program_point next_point - = program_point::before_stmt (point.get_supernode (), 0, - point.get_call_string ()); - exploded_node *next - = get_or_create_node (next_point, next_state, node); - if (next) - add_edge (node, next, NULL); - } - else - { - program_point next_point - = program_point::after_supernode (point.get_supernode (), - point.get_call_string ()); - exploded_node *next = get_or_create_node (next_point, next_state, - node); - if (next) - add_edge (node, next, NULL); - } + program_point next_point (point.get_next ()); + exploded_node *next = get_or_create_node (next_point, next_state, node); + if (next) + add_edge (node, next, NULL); } break; case PK_BEFORE_STMT: diff --git a/gcc/analyzer/program-point.cc b/gcc/analyzer/program-point.cc index 2c0a4c42bf5..ef19e6e38e5 100644 --- a/gcc/analyzer/program-point.cc +++ b/gcc/analyzer/program-point.cc @@ -529,6 +529,35 @@ function_point::next_stmt () } } +/* For those program points for which there is a uniquely-defined + successor, return it. */ + +program_point +program_point::get_next () const +{ + switch (m_function_point.get_kind ()) +{ +default: + gcc_unreachable (); +case PK_ORIGIN: +case PK_AFTER_SUPERNODE: + gcc_unreachable (); /* Not uniquely defined. */ +case PK_BEFORE_SUPERNODE: + if (get_supernode ()->m_stmts.length () > 0) + return before_stmt (get_supernode (), 0, get_call_string ()); + else + return after_supernode (get_supernode (), get_call_string ()); +case PK_BEFORE_STMT: + { + unsigned next_idx = get_stmt_idx (); + if (next_idx < get_supernode ()->m_stmts.length ()) + return before_stmt (get_supernode (), next_idx, get_call_string ()); + else + return after_supernode (get_supernode (), get_call_string ()); + } +} +} + #if CHECKING_P namespace selftest { diff --git a/gcc/analyzer/program-point.h b/gcc/analyzer/program-point.h index cb11478468d..97fd0a5e9de 100644 --- a/gcc/analyzer/program-point.h +++ b/gcc/analyzer/program-point.h @@ -294,6 +294,8 @@ public: /* For before_stmt, go to next stmt. */ void next_stmt () { m_function_point.next_stmt (); } + program_point get_next () const; + private: function_point m_function_point; call_string m_call_string; -- 2.26.2
[committed] analyzer: show program point in -Wanalyzer-too-complex
I found this useful when debugging. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to master as 6dd96e24ea3cb9919fedd4da35fbfd36ed98b0ea. gcc/analyzer/ChangeLog: * engine.cc (exploded_graph::get_or_create_node): Show the program point when issuing -Wanalyzer-too-complex due to hitting the per-program-point limit. --- gcc/analyzer/engine.cc | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc index 49701b74fd4..8f5c5143ca5 100644 --- a/gcc/analyzer/engine.cc +++ b/gcc/analyzer/engine.cc @@ -1982,6 +1982,7 @@ exploded_graph::get_or_create_node (const program_point , > param_analyzer_max_enodes_per_program_point) { pretty_printer pp; + point.print (, format (false)); print_enode_indices (, per_point_data->m_enodes); if (logger) logger->log ("not creating enode; too many at program point: %s", -- 2.26.2
[committed] analyzer: getchar has no side-effects
Seen whilst debugging another issue, where the analyzer was assuming conservatively that a call to getchar could clobber a global. This is handled for most of the other stdio functions by the list in sm-file.cc Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to master as e097c9ab83192fc2f738ec6426a275282e9a51ea. gcc/analyzer/ChangeLog: * region-model.cc (region_model::on_call_pre): Treat getchar as having no side-effects. gcc/testsuite/ChangeLog: * gcc.dg/analyzer/getchar-1.c: New test. --- gcc/analyzer/region-model.cc | 5 + gcc/testsuite/gcc.dg/analyzer/getchar-1.c | 19 +++ 2 files changed, 24 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/analyzer/getchar-1.c diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc index d53272e4332..1312391557d 100644 --- a/gcc/analyzer/region-model.cc +++ b/gcc/analyzer/region-model.cc @@ -732,6 +732,11 @@ region_model::on_call_pre (const gcall *call, region_model_context *ctxt) return impl_call_calloc (cd); else if (is_named_call_p (callee_fndecl, "alloca", call, 1)) return impl_call_alloca (cd); + else if (is_named_call_p (callee_fndecl, "getchar", call, 0)) + { + /* No side-effects (tracking stream state is out-of-scope +for the analyzer). */ + } else if (is_named_call_p (callee_fndecl, "memset", call, 3)) { impl_call_memset (cd); diff --git a/gcc/testsuite/gcc.dg/analyzer/getchar-1.c b/gcc/testsuite/gcc.dg/analyzer/getchar-1.c new file mode 100644 index 000..25595e0786e --- /dev/null +++ b/gcc/testsuite/gcc.dg/analyzer/getchar-1.c @@ -0,0 +1,19 @@ +#include +#include "analyzer-decls.h" + +int test_1 (void) +{ + int c = getchar (); + return c; +} + +int glob_2; +int test_2 (void) +{ + int c; + glob_2 = 42; + __analyzer_eval (glob_2 == 42); /* { dg-warning "TRUE" } */ + c = getchar (); + __analyzer_eval (glob_2 == 42); /* { dg-warning "TRUE" } */ + return c; +} -- 2.26.2
Re: [PATCH] c++: premature analysis of requires-expression [PR96410]
On Wed, 16 Sep 2020, Jason Merrill wrote: > On 9/16/20 1:34 PM, Patrick Palka wrote: > > On Thu, 13 Aug 2020, Jason Merrill wrote: > > > > > On 8/13/20 11:21 AM, Patrick Palka wrote: > > > > On Mon, 10 Aug 2020, Jason Merrill wrote: > > > > > > > > > On 8/10/20 2:18 PM, Patrick Palka wrote: > > > > > > On Mon, 10 Aug 2020, Patrick Palka wrote: > > > > > > > > > > > > > In the below testcase, semantic analysis of the > > > > > > > requires-expressions > > > > > > > in > > > > > > > the generic lambda must be delayed until instantiation of the > > > > > > > lambda > > > > > > > because the requirements depend on the lambda's template > > > > > > > arguments. > > > > > > > But > > > > > > > tsubst_requires_expr does semantic analysis even during > > > > > > > regeneration > > > > > > > of > > > > > > > the lambda, which leads to various bogus errors and ICEs since > > > > > > > some > > > > > > > subroutines aren't prepared to handle dependent/template trees. > > > > > > > > > > > > > > This patch adjusts subroutines of tsubst_requires_expr to avoid > > > > > > > doing > > > > > > > some problematic semantic analyses when processing_template_decl. > > > > > > > In particular, expr_noexcept_p generally can't be checked on a > > > > > > > dependent > > > > > > > expression. Next, tsubst_nested_requirement should avoid checking > > > > > > > satisfaction when processing_template_decl. And similarly for > > > > > > > convert_to_void (called from tsubst_valid_expression_requirement). > > > > > > > > > > I wonder if, instead of trying to do a partial substitution into a > > > > > requires-expression at all, we want to use the > > > > > PACK_EXPANSION_EXTRA_ARGS/IF_STMT_EXTRA_ARGS mechanism to remember the > > > > > arguments for later satisfaction? > > > > Like this? > > > > -- >8 -- > > > > Subject: [PATCH] c++: requires-expressions and partial instantiation > > [PR96410] > > > > This patch makes tsubst_requires_expr avoid substituting into a > > requires-expression when partially instantiating a generic lambda. This > > is necessary in general to ensure that we check its requirements in > > lexical order (see the first testcase below). Instead, template > > arguments are saved in the requires-expression until all arguments can > > be substituted into it at once, using a mechanism similar to > > PACK_EXPANSION_EXTRA_ARGS. > > > > This change incidentally also fixes the two mentioned PRs -- the problem > > there is that tsubst_requires_expr was performing semantic checks on > > templated trees, and some of the checks are not prepared to handle such > > trees. With this patch tsubst_requires_expr, no longer does any > > semantic checking at all when processing_template_decl. > > > > gcc/cp/ChangeLog: > > > > PR c++/96409 > > PR c++/96410 > > * constraint.cc (tsubst_requires_expr): Use REQUIRES_EXPR_PARMS > > and REQUIRES_EXPR_REQS. Use REQUIRES_EXPR_EXTRA_ARGS and > > add_extra_args to defer substitution until we have all the > > template arguments. > > (finish_requires_expr): Adjust the call to build_min so that > > REQUIRES_EXPR_EXTRA_ARGS gets set to NULL_TREE. > > * cp-tree.def (REQUIRES_EXPR): Give it a third operand. > > * cp-tree.h (REQUIRES_EXPR_PARMS, REQUIRES_EXPR_REQS, > > REQUIRES_EXPR_EXTRA_ARGS): Define. > > (add_extra_args): Declare. > > > > gcc/testsuite/ChangeLog: > > > > PR c++/96409 > > PR c++/96410 > > * g++.dg/cpp2a/concepts-lambda13.C: New test. > > * g++.dg/cpp2a/concepts-lambda14.C: New test. > > --- > > gcc/cp/constraint.cc | 21 +++- > > gcc/cp/cp-tree.def| 8 +++--- > > gcc/cp/cp-tree.h | 16 > > .../g++.dg/cpp2a/concepts-lambda13.C | 18 + > > .../g++.dg/cpp2a/concepts-lambda14.C | 25 +++ > > 5 files changed, 79 insertions(+), 9 deletions(-) > > create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda13.C > > create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda14.C > > > > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc > > index ad9d47070e3..9a06d763554 100644 > > --- a/gcc/cp/constraint.cc > > +++ b/gcc/cp/constraint.cc > > @@ -2175,7 +2175,19 @@ tsubst_requires_expr (tree t, tree args, > > /* A requires-expression is an unevaluated context. */ > > cp_unevaluated u; > > - tree parms = TREE_OPERAND (t, 0); > > + args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args); > > + if (processing_template_decl) > > +{ > > + /* We're partially instantiating a generic lambda. Substituting into > > +this requires-expression now may cause its requirements to get > > +checked out of order, so instead just remember the template > > +arguments and wait until we can substitute them all at once. */ > > + t = copy_node (t); > > + REQUIRES_EXPR_EXTRA_ARGS (t) = args; > > Don't we need
Re: [patch] Fix dangling references in thunks at -O0
This introduces an ICE building the glibc testsuite for alpha (bisected), s390 and sparc (symptoms appear the same, not bisected to confirm the exact revision). See bug 97078. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] c++: premature analysis of requires-expression [PR96410]
On 9/16/20 1:34 PM, Patrick Palka wrote: On Thu, 13 Aug 2020, Jason Merrill wrote: On 8/13/20 11:21 AM, Patrick Palka wrote: On Mon, 10 Aug 2020, Jason Merrill wrote: On 8/10/20 2:18 PM, Patrick Palka wrote: On Mon, 10 Aug 2020, Patrick Palka wrote: In the below testcase, semantic analysis of the requires-expressions in the generic lambda must be delayed until instantiation of the lambda because the requirements depend on the lambda's template arguments. But tsubst_requires_expr does semantic analysis even during regeneration of the lambda, which leads to various bogus errors and ICEs since some subroutines aren't prepared to handle dependent/template trees. This patch adjusts subroutines of tsubst_requires_expr to avoid doing some problematic semantic analyses when processing_template_decl. In particular, expr_noexcept_p generally can't be checked on a dependent expression. Next, tsubst_nested_requirement should avoid checking satisfaction when processing_template_decl. And similarly for convert_to_void (called from tsubst_valid_expression_requirement). I wonder if, instead of trying to do a partial substitution into a requires-expression at all, we want to use the PACK_EXPANSION_EXTRA_ARGS/IF_STMT_EXTRA_ARGS mechanism to remember the arguments for later satisfaction? Like this? -- >8 -- Subject: [PATCH] c++: requires-expressions and partial instantiation [PR96410] This patch makes tsubst_requires_expr avoid substituting into a requires-expression when partially instantiating a generic lambda. This is necessary in general to ensure that we check its requirements in lexical order (see the first testcase below). Instead, template arguments are saved in the requires-expression until all arguments can be substituted into it at once, using a mechanism similar to PACK_EXPANSION_EXTRA_ARGS. This change incidentally also fixes the two mentioned PRs -- the problem there is that tsubst_requires_expr was performing semantic checks on templated trees, and some of the checks are not prepared to handle such trees. With this patch tsubst_requires_expr, no longer does any semantic checking at all when processing_template_decl. gcc/cp/ChangeLog: PR c++/96409 PR c++/96410 * constraint.cc (tsubst_requires_expr): Use REQUIRES_EXPR_PARMS and REQUIRES_EXPR_REQS. Use REQUIRES_EXPR_EXTRA_ARGS and add_extra_args to defer substitution until we have all the template arguments. (finish_requires_expr): Adjust the call to build_min so that REQUIRES_EXPR_EXTRA_ARGS gets set to NULL_TREE. * cp-tree.def (REQUIRES_EXPR): Give it a third operand. * cp-tree.h (REQUIRES_EXPR_PARMS, REQUIRES_EXPR_REQS, REQUIRES_EXPR_EXTRA_ARGS): Define. (add_extra_args): Declare. gcc/testsuite/ChangeLog: PR c++/96409 PR c++/96410 * g++.dg/cpp2a/concepts-lambda13.C: New test. * g++.dg/cpp2a/concepts-lambda14.C: New test. --- gcc/cp/constraint.cc | 21 +++- gcc/cp/cp-tree.def| 8 +++--- gcc/cp/cp-tree.h | 16 .../g++.dg/cpp2a/concepts-lambda13.C | 18 + .../g++.dg/cpp2a/concepts-lambda14.C | 25 +++ 5 files changed, 79 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda13.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda14.C diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc index ad9d47070e3..9a06d763554 100644 --- a/gcc/cp/constraint.cc +++ b/gcc/cp/constraint.cc @@ -2175,7 +2175,19 @@ tsubst_requires_expr (tree t, tree args, /* A requires-expression is an unevaluated context. */ cp_unevaluated u; - tree parms = TREE_OPERAND (t, 0); + args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args); + if (processing_template_decl) +{ + /* We're partially instantiating a generic lambda. Substituting into +this requires-expression now may cause its requirements to get +checked out of order, so instead just remember the template +arguments and wait until we can substitute them all at once. */ + t = copy_node (t); + REQUIRES_EXPR_EXTRA_ARGS (t) = args; Don't we need build_extra_args, in case the requires_expr is in a local_specializations context like a function body? + return t; +} + + tree parms = REQUIRES_EXPR_PARMS (t); if (parms) { parms = tsubst_constraint_variables (parms, args, info); @@ -2183,14 +2195,11 @@ tsubst_requires_expr (tree t, tree args, return boolean_false_node; } - tree reqs = TREE_OPERAND (t, 1); + tree reqs = REQUIRES_EXPR_REQS (t); reqs = tsubst_requirement_body (reqs, args, info); if (reqs == error_mark_node) return boolean_false_node; - if (processing_template_decl) -return finish_requires_expr (cp_expr_location (t), parms, reqs);
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
Segher and Richard, Now there are two major concerns from the discussion so far: 1. (From Richard): Inserting zero insns should be done after pass_thread_prologue_and_epilogue since later passes (for example, pass_regrename) might introduce new used caller-saved registers. So, we should do this in the beginning of pass_late_compilation (some targets wouldn’t cope with doing it later). 2. (From Segher): The inserted zero insns should stay together with the return, no other insns should move in-between zero insns and return insns. Otherwise, a valid gadget could be formed. I think that both of the above 2 concerns are important and should be addressed for the correct implementation. In order to support 1, we cannot implementing it in “targetm.gen_return()” and “targetm.gen_simple_return()” since “targetm.gen_return()” and “targetm.gen_simple_return()” are called during pass_thread_prologue_and_epilogue, at that time, the use information still not correct. In order to support 2, enhancing EPILOGUE_USES to include the zeroed registgers is NOT enough to prevent all the zero insns from moving around. More restrictions need to be added to these new zero insns. (I think that marking these new zeroed registers as “unspec_volatile” at RTL level is necessary to prevent them from deleting from moving around). So, based on the above, I propose the following approach that will resolve the above 2 concerns: 1. Add 2 new target hooks: A. targetm.pro_epilogue_use (reg) This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to prevent deleting register setting instructions in prologue and epilogue. B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs) This hook will gen a sequence of zeroing insns that zero the registers that specified in NEED_ZEROED_HARDREGS. A default handler of “gen_zero_call_used_regs” could be defined in middle end, which use mov insns to zero registers, and then use “targetm.pro_epilogue_use(reg)” to mark each zeroed registers. 2. Add a new pass, pass_zero_call_used_regs, in the beginning of pass_late_compilation. This pass will search all “return”s, and compute the hard register set for zeroing, “need_zeroed_hardregs”, based on data flow information, user request, and function abi. Then call targetm.gen_zero_call_used_regs(need_zeroed_hardregs). 3. X86 backend will implement a special version for “gen_zero_call_used_regs”, and “pro_epilogue_use”. Let me know if you have any more comment on this approach. thanks. Qing > On Sep 16, 2020, at 5:35 AM, Segher Boessenkool > wrote: > > On Tue, Sep 15, 2020 at 08:51:57PM -0500, Qing Zhao wrote: >>> On Sep 15, 2020, at 6:09 PM, Segher Boessenkool >>> wrote: >>> If you want the zeroing insns to stay with the return, you have to >>> express that in RTL. >> >> What do you mean by “express that in RTL”? >> Could you please explain this in more details? > > Exactly as I say: you need to tell in the RTL that the insns should stay > together. > > Easiest is to just make it one RTL insn. There are other ways, but > those do not help anything here afaics. > >> Do you mean to implement this in “targetm.gen_return” and >> “targetm.gen_simple_return”? > > That is the easiest way, yes. > >>> Anything else is extremely fragile. > > > Segher
Re: [PATCH] aarch64: Add extend-as-extract-with-shift pattern [PR96998]
Hi! On Thu, Sep 10, 2020 at 07:18:01PM +0100, Richard Sandiford wrote: > It's all a bit unfortunate really. Having different representations > for shifts inside and outside MEMs is the Second Great RTL Mistake. Yes... All targets with something that computes shifted addresses (like in a LEA style insn) needs to have that insn in two representations. Can this still be fixed? > (The first was modeless const_ints. :-)) I *like* those. Almost always. Stockholm syndrome? These days a mode on them should not cost much (in storage size). It should simplify quite some code, too. > If we do that, we should be able to remove the handling of > extract-based addresses in aarch64_classify_index & co. If we do what? I don't follow, sorry. (Patch to combine sounds fine fwiw; patches welcome, as always.) Segher
Re: [PING][PATCH] improve validation of attribute arguments (PR c/78666)
On 9/15/20 5:15 PM, Joseph Myers wrote: On Wed, 9 Sep 2020, Martin Sebor via Gcc-patches wrote: Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552500.html Aldy provided a bunch of comments on this patch but I'm still looking for a formal approval. This patch is OK. Some testing revealed that the code has different semantics for strings: it compares them including all nuls embedded in them, while I think the right semantics for attributes is to only consider strings up and including the first nul (i.e., apply the usual string semantics). A test case for this corner case is as follows: __attribute__ ((section ("foo\0bar"))) void f (void); __attribute__ ((section ("foo"))) void f (void) { } Without operand_equal_p() GCC accepts this and puts f in section foo. With the operand_equal_p() change above, it complains: ignoring attribute ‘section ("foo\x00bar")’ because it conflicts with previous ‘section ("foor")’ [-Wattributes] I would rather not change the behavior in this corner case just to save a few lines of code. If it's thought that string arguments to attributes (some or all) should be interpreted differently than in other contexts it seems that such a change should be made separately and documented in the manual. I think that for at least the section attribute, embedded NULs are suspect. In that case, so are wide strings, but for some attributes wide strings should be accepted and handled sensibly (but aren't, see bug 91182). I agree. Clang has the same "bug" as GCC would if it used operand_equal_p(). Besides embedded nuls, other characters are problematic and cause (sometimes confusing) assembler errors. Examples are the empty string, embedded whitespace, or mismatched quotes. It would be nice to do some basic validation and print a nice warning for the most common problems. Martin
Re: Problem with static const objects and LTO
On 9/16/20 11:52 AM, Joseph Myers wrote: > On Wed, 16 Sep 2020, Jeff Law via Gcc-patches wrote: > >> ISTM this is a lot like the problem we have where we inline functions >> with static data. To fix those we use STB_GNU_UNIQUE. But I don't see >> any code in the C front-end which would utilize STB_GNU_UNIQUE. It's >> support seems limited to C++. >> >> >> How is this supposed to work for C? > C inline functions don't try to address this. The standard has a rule "An > inline definition of a function with external linkage shall not contain a > definition of a modifiable object with static or thread storage duration, > and shall not contain a reference to an identifier with internal > linkage.", which avoids some cases of this, but you can still get multiple > copies of objects (with static storage duration, not modifiable, no > linkage, i.e. static const defined inside an inline function) as noted in > a footnote "Since an inline definition is distinct from the corresponding > external definition and from any other corresponding inline definitions in > other translation units, all corresponding objects with static storage > duration are also distinct in each of the definitions.". I was kindof guessing C would end up with something like the above -- I'm just happy its as well documented as it is. Thanks! The more I ponder this problem the more worried I get. Before LTO we could consider the TU an indivisible unit and if the main program referenced something in the TU, then a static link of that TU would bring the entire TU into the main program and that would satisfy all references to all symbols defined by the TU, including those in any DSOs used by the main program. So it's an indivisible unit at runtime too. I could change internal data structures, rebuild the static library & main program (leaving the DSOs alone) and it'd just work. In an LTO world the TU isn't indivisible anymore. LTO will happily discard things which don't appear to be used. So parts of the TU may be in the main program, other parts may be in DSOs used by the main program. This can mean that objects are unexpectedly being passed across DSO boundaries and the details of those objects has, in effect, become part of the ABI. If I was to change an internal data structure, build the static library and main program we could get bad behavior because an instance of that data structure could be passed to a DSO because the main executable is "incomplete" and we end up calling copies of routines from the (not rebuilt) DSOs. Jeff pEpkey.asc Description: application/pgp-keys
Re: [PATCH] preprocessor: Fix ICE with too long line in fmtwarn [PR96935]
On Wed, Sep 16, 2020 at 03:15:19PM -0400, David Malcolm via Gcc-patches wrote: > On Wed, 2020-09-16 at 11:16 -0400, Marek Polacek wrote: > > Here we ICE in char_span::subspan because the offset it gets is -1. > > It's -1 because get_substring_ranges_for_loc gets a location whose > > column was 0. That only happens in testcases like the attached where > > we're dealing with extremely long lines (at least 4065 chars it > > seems). > > This does happen in practice, though, so it's not just a theoretical > > problem (e.g. when building the SU2 suite). > > > > Fixed by checking that the column get_substring_ranges_for_loc gets > > is > > sane, akin to other checks in that function. > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10? > > [...] > > Thanks for the patch. > > > index d573b90341a..ea3d4f34220 100644 > > --- a/gcc/input.c > > +++ b/gcc/input.c > > @@ -1461,6 +1461,8 @@ get_substring_ranges_for_loc (cpp_reader > > *pfile, > >size_t literal_length = finish.column - start.column + 1; > > > >/* Ensure that we don't crash if we got the wrong > > location. */ > > + if (start.column < 1) > > + return "line is too long"; > > Nit: I believe this could also happen for very large source files when > locations exceed LINE_MAP_MAX_LOCATION_WITH_COLS. > > So the error message should read "zero start column", or even just > "start.column < 1" I think. (if we have a negative column number then > something has gone badly wrong; not sure if that's expressible via > #line directives) > > OK with that change. > Dave Thanks for the quick review, pushed with the "zero start column" change. Marek
[OG10] Backporting + merge of GCC 10 into branch
OG10 = devel/omp/gcc-10 e0e1d281472 Fortran: OpenMP - fix simd with (last)private (PR97061) 13ae89bc9c9 Merge remote-tracking branch 'origin/releases/gcc-10' into devel/omp/gcc-10 5a7ae4f073f [OpenMP] Fix mapping of artificial variables (PR94874) 3a6cb32b2c7 OpenMP/Fortran: Permit impure ELEMENTAL in omp directives 2dcc1721877 libgomp/target.c: Silence -Wuninitialized warning 76e74f04063 Merge remote-tracking branch 'origin/releases/gcc-10' into devel/omp/gcc-10 - Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Re: Commit 052204fac580 was never sent to gcc-patches
Hi! On Wed, Sep 16, 2020 at 08:37:34PM +0200, Andrea Corallo wrote: > Segher Boessenkool writes: > > > ... and it causes testsuite regressions on Power. We haven't determined > > et if it actually is worse code, but there are testcases that trip on > > it. Either way, all patches should be send to gcc-patches@, whether > > pre-approved or not. > > > > Please correct this. Thanks! > > the patch has been discussed here > https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553839.html Ah, with a different title. And I also searched for your name, but for some reason that did not work. Huh. > If you think it is better I've no problem with reverting it otherwise > I'll be happy to help solving the caused issue. No, please keep it; as I said, we don't yet know if it actually is a problem for us, it just makes testcases fail. Thanks, Segher
Re: [PATCH] preprocessor: Fix ICE with too long line in fmtwarn [PR96935]
On Wed, 2020-09-16 at 11:16 -0400, Marek Polacek wrote: > Here we ICE in char_span::subspan because the offset it gets is -1. > It's -1 because get_substring_ranges_for_loc gets a location whose > column was 0. That only happens in testcases like the attached where > we're dealing with extremely long lines (at least 4065 chars it > seems). > This does happen in practice, though, so it's not just a theoretical > problem (e.g. when building the SU2 suite). > > Fixed by checking that the column get_substring_ranges_for_loc gets > is > sane, akin to other checks in that function. > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10? [...] Thanks for the patch. > index d573b90341a..ea3d4f34220 100644 > --- a/gcc/input.c > +++ b/gcc/input.c > @@ -1461,6 +1461,8 @@ get_substring_ranges_for_loc (cpp_reader > *pfile, >size_t literal_length = finish.column - start.column + 1; > >/* Ensure that we don't crash if we got the wrong > location. */ > + if (start.column < 1) > + return "line is too long"; Nit: I believe this could also happen for very large source files when locations exceed LINE_MAP_MAX_LOCATION_WITH_COLS. So the error message should read "zero start column", or even just "start.column < 1" I think. (if we have a negative column number then something has gone badly wrong; not sure if that's expressible via #line directives) OK with that change. Dave >if (line.length () < (start.column - 1 + literal_length)) > return "line is not wide enough"; > [...]
Re: [PING 2][PATCH 2/5] C front end support to detect out-of-bounds accesses to array parameters
On 9/15/20 5:02 PM, Joseph Myers wrote: On Wed, 9 Sep 2020, Martin Sebor via Gcc-patches wrote: Joseph, do you have any concerns with or comments on the most recent patch or is it okay as is? https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552266.html I'm not yet convinced by the logic for extracting an array bound from a parameter declared using a typedef for an array type. Say you have typedef int A[3]; void f (A *x[*]); so an argument that is an array, using [*], of pointers to arrays, where those latter arrays are specified using the typedef. As I read the logic, first the pointer declarator is handled (ignored), then the array declarator results in [*] being stored in spec, then the "if (pd->kind == cdk_id)" handling comes into play - and because spec is "*" and vbchain is NULL_TREE, the upper bound of A gets extracted, but the upper bound of A should be irrelevant here because it's a type that's the target of a pointer. The information from parm->specs->type logically comes before, not after, the information from the declarator. I see, you're right. These arrays of pointers and pointers to arrays are going to be the death of me... Thanks for the test case. I've tweaked the function some more to handle it and added it to the test suite. As far as I can see, if one declaration gets part of the parameter type (involving VLAs) from a typedef and another declaration gets that part of the type directly in the declaration, the two spec strings constructed might differ in the number of VLA bounds mentioned in the spec strings. Is the code using those strings robust to handling the case where some of the VLA bounds are missing because they came from a typedef? Ugh. I'd completely forgotten those are possible. I've added code to the function to handle those as well, and a new test to verify. Hopefully I didn't open a can of worms by doing that. Attached is an updated revision of the patch. Besides the tweaks above it also contains a cosmetic change to the warning issued for mismatches in unspecified VLA bounds: it points at the decl with more of them to guide the user to specify them rather than make them all unspecified. Martin [2/5] - C front end support to detect out-of-bounds accesses to array parameters. gcc/c-family/ChangeLog: PR c/50584 * c-common.h (warn_parm_array_mismatch): Declare new function. (has_attribute): Move declaration of an existing function. (build_attr_access_from_parms): Declare new function. * c-warn.c (parm_array_as_string): Define new function. (plus_one): Define new function. (warn_parm_ptrarray_mismatch): Define new function. (warn_parm_array_mismatch): Define new function. (vla_bound_parm_decl): New function. * c.opt (-Warray-parameter, -Wvla-parameter): New options. * c-pretty-print.c (pp_c_type_qualifier_list): Don't print array type qualifiers here... (c_pretty_printer::direct_abstract_declarator): ...but instead print them in brackets here. Also print [static]. Strip extraneous expressions from VLA bounds. gcc/c/ChangeLog: PR c/50584 * c-decl.c (lookup_last_decl): Define new function. (c_decl_attributes): Call it. (start_decl): Add argument and use it. (finish_decl): Call build_attr_access_from_parms and decl_attributes. (get_parm_array_spec): Define new function. (push_parm_decl): Call get_parm_array_spec. (start_function): Call warn_parm_array_mismatch. Build attribute access and add it to current function. * c-parser.c (c_parser_declaration_or_fndef): Diagnose mismatches in forms of array parameters. * c-tree.h (start_decl): Add argument. gcc/ChangeLog: PR c/50584 * calls.c (initialize_argument_information): Remove assertion. * doc/invoke.texi (-Warray-parameter, -Wvla-parameter): Document. * tree-pretty-print.c (dump_generic_node): Correct handling of qualifiers. gcc/testsuite/ChangeLog: PR c/50584 * c-c++-common/Warray-bounds-6.c: Correct C++ declaration, adjust text of expected diagnostics. * gcc.dg/Wbuiltin-declaration-mismatch-9.c: Prune expected warning. * gcc.dg/Warray-parameter-2.c: New test. * gcc.dg/Warray-parameter-3.c: New test. * gcc.dg/Warray-parameter-4.c: New test. * gcc.dg/Warray-parameter-5.c: New test. * gcc.dg/Warray-parameter.c: New test. * gcc.dg/Wvla-parameter-2.c: New test. * gcc.dg/Wvla-parameter-3.c: New test. * gcc.dg/Wvla-parameter-4.c: New test. * gcc.dg/Wvla-parameter.c: New test. diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index 4fc64bc4aa6..a61b25f95f1 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -1321,6 +1321,7 @@ extern void c_do_switch_warnings (splay_tree, location_t, tree, tree, bool); extern void warn_for_omitted_condop (location_t, tree); extern bool warn_for_restrict (unsigned, tree *, unsigned); extern void warn_for_address_or_pointer_of_packed_member (tree, tree); +extern void warn_parm_array_mismatch (location_t, tree, tree); /* Places where an lvalue, or modifiable lvalue, may be required.
Re: [PATCH] rs6000: Add rs6000_cfun_pcrel_p
On 9/16/20 12:09 PM, Segher Boessenkool wrote: On Wed, Sep 16, 2020 at 08:36:35AM -0500, Bill Schmidt wrote: This is a cleanup requested by Segher in a previous review. Most uses of rs6000_pcrel_p are called for the current function. A specialized version for cfun is more efficient for these uses. Then rename that one to rs6000_function_pcrel_p or similar? People will keep trying to use the shorter name otherwise ;-) (Actually all uses are called for the current function, so now rs6000_pcrel_p is an orphan. However, I think it's worth keeping around in case we have a late discovery where we need it.) (Heck, you could even use that short name for the cfun one, if you want. Sorry i didn't think of that before.) Okay for trunk with or without such further changes. Thanks! Great, I will make both of those changes and commit. Thanks, Bill Segher
Re: [PATCH] Allow copying of symbolic ranges to an irange.
On 9/16/20 12:25 PM, Aldy Hernandez wrote: >> // Swap min/max if they are out of order. Return TRUE if further > these seems OK, but can't there be anti-ranges with symbolics too? ie > ~[a_12, a_12] > The code for that just does: > > else if (src.kind () == VR_ANTI_RANGE) > set (src.min (), src.max (), VR_ANTI_RANGE); > > That should just go to varying I guess? Ah, you're right. I've tweaked the patch and have added a corresponding test. > > The conversion to legacy anti-range code also seems a little suspect in > some cases... > > Finally, we theoretically shouldn't be accessing 'min()' and 'max()' > fields in a multirange, which also looks like might happen in the final > else clause. > > I wonder if it might be less complex to simply have 2 routines, like > copy_to_legacy() and copy_from_legacy() instead of trying to handle > then together? I do find it seems to require more thinking than it > should to follow the cases :-) Sigh, yes. That code is too clever for its own good. I'll work on it as a follow-up. OK pending tests? OK. but do follow it up. Andrew
[pushed] libbacktrace, Mach-O : Support PowerPC archs.
Hi, This adds the PPC architecture variants for Mach-O libbacktrace. With this (as for X86 and Arm) when dsymutil is run on the binary we get a basic usable backtrace. Testsuite results on powerpc-apple-darwin9 are the same as for X86: * btest fails (TBC why) * dwarf5 tests fail because dsymutil does not handle that so far. tested on x86_64-darwin16, powerpc-darwin9, approved by Ian on a github pull request. pushed to master, thanks Iain libbacktrace/ChangeLog: * macho.c (MACH_O_CPU_TYPE_PPC): New. (MACH_O_CPU_TYPE_PPC64): New. Add compile-tests for powerpc to the Mach-O variants. --- libbacktrace/macho.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/libbacktrace/macho.c b/libbacktrace/macho.c index 241d54b5e5e..1f8bc50238f 100644 --- a/libbacktrace/macho.c +++ b/libbacktrace/macho.c @@ -128,9 +128,11 @@ struct macho_fat_arch_64 #define MACH_O_CPU_TYPE_X86 7 #define MACH_O_CPU_TYPE_ARM 12 +#define MACH_O_CPU_TYPE_PPC 18 #define MACH_O_CPU_TYPE_X86_64 (MACH_O_CPU_TYPE_X86 | MACH_O_CPU_ARCH_ABI64) #define MACH_O_CPU_TYPE_ARM64 (MACH_O_CPU_TYPE_ARM | MACH_O_CPU_ARCH_ABI64) +#define MACH_O_CPU_TYPE_PPC64 (MACH_O_CPU_TYPE_PPC | MACH_O_CPU_ARCH_ABI64) /* The header of a load command. */ @@ -776,6 +778,10 @@ macho_add_fat (struct backtrace_state *state, const char *filename, cputype = MACH_O_CPU_TYPE_ARM64; #elif defined (__arm__) cputype = MACH_O_CPU_TYPE_ARM; +#elif defined (__ppc__) + cputype = MACH_O_CPU_TYPE_PPC; +#elif defined (__ppc64__) + cputype = MACH_O_CPU_TYPE_PPC64; #else error_callback (data, "unknown Mach-O architecture", 0); goto fail; -- 2.24.1
Re: c++: local-scope OMP UDR reductions have no template head
On Wed, Sep 16, 2020 at 02:38:22PM -0400, Nathan Sidwell wrote: > On 9/16/20 2:31 PM, Jakub Jelinek wrote: > > On Wed, Sep 16, 2020 at 02:11:32PM -0400, Nathan Sidwell wrote: > > > Jakub, > > > are you ok with the bool return from cp_check_omp_declare_reduction? That > > > seemed like the least invasive change. > > > > > > This corrects the earlier problems with removing the template header > > > from local omp reductions. And it uncovered a latent bug. When we > > > tsubst such a decl, we immediately tsubst its body. > > > cp_check_omp_declare_reduction gets a success return value to gate > > > that instantiation. > > > > > > udr-2.C got a further error, as the omp checking machinery doesn't > > > appear to turn the reduction into an error mark when failing. I > > > didn't dig into that further. udr-3.C appears to have been invalid > > > and accidentally worked. > > > > The attached patch doesn't match the ChangeLog... > > here's the right one, problem with ordering by time or name :( LGTM, if both make check-c++-all RUNTESTFLAGS='gomp.exp goacc.exp' in gcc/ and make check RUNTESTFLAGS=c++.exp in libgomp/ build dirs shows any regressions, no objections. Jakub
Re: c++: local-scope OMP UDR reductions have no template head
On 9/16/20 2:31 PM, Jakub Jelinek wrote: On Wed, Sep 16, 2020 at 02:11:32PM -0400, Nathan Sidwell wrote: Jakub, are you ok with the bool return from cp_check_omp_declare_reduction? That seemed like the least invasive change. This corrects the earlier problems with removing the template header from local omp reductions. And it uncovered a latent bug. When we tsubst such a decl, we immediately tsubst its body. cp_check_omp_declare_reduction gets a success return value to gate that instantiation. udr-2.C got a further error, as the omp checking machinery doesn't appear to turn the reduction into an error mark when failing. I didn't dig into that further. udr-3.C appears to have been invalid and accidentally worked. The attached patch doesn't match the ChangeLog... here's the right one, problem with ordering by time or name :( nathan -- Nathan Sidwell diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h index 6e4de7d0c4b..5b727348e51 100644 --- i/gcc/cp/cp-tree.h +++ w/gcc/cp/cp-tree.h @@ -7228,7 +7228,7 @@ extern void simplify_aggr_init_expr (tree *); extern void finalize_nrv (tree *, tree, tree); extern tree omp_reduction_id (enum tree_code, tree, tree); extern tree cp_remove_omp_priv_cleanup_stmt (tree *, int *, void *); -extern void cp_check_omp_declare_reduction (tree); +extern bool cp_check_omp_declare_reduction (tree); extern void finish_omp_declare_simd_methods (tree); extern tree finish_omp_clauses (tree, enum c_omp_region_type); extern tree push_omp_privatization_clauses (bool); diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c index 289452ab740..cfe5ff4a94f 100644 --- i/gcc/cp/pt.c +++ w/gcc/cp/pt.c @@ -227,6 +227,7 @@ static tree canonicalize_expr_argument (tree, tsubst_flags_t); static tree make_argument_pack (tree); static void register_parameter_specializations (tree, tree); static tree enclosing_instantiation_of (tree tctx); +static void instantiate_body (tree pattern, tree args, tree d, bool nested); /* Make the current scope suitable for access checking when we are processing T. T can be FUNCTION_DECL for instantiated function @@ -6073,10 +6074,7 @@ push_template_decl_real (tree decl, bool is_friend) retrofit_lang_decl (decl); if (DECL_LANG_SPECIFIC (decl) && !(VAR_OR_FUNCTION_DECL_P (decl) - && DECL_LOCAL_DECL_P (decl) - /* OMP reductions still need a template header. */ - && !(TREE_CODE (decl) == FUNCTION_DECL - && DECL_OMP_DECLARE_REDUCTION_P (decl + && DECL_LOCAL_DECL_P (decl))) DECL_TEMPLATE_INFO (decl) = info; } @@ -13714,8 +13712,7 @@ tsubst_function_decl (tree t, tree args, tsubst_flags_t complain, gcc_assert (DECL_TEMPLATE_INFO (t) != NULL_TREE || DECL_LOCAL_DECL_P (t)); - if (DECL_LOCAL_DECL_P (t) - && !DECL_OMP_DECLARE_REDUCTION_P (t)) + if (DECL_LOCAL_DECL_P (t)) { if (tree spec = retrieve_local_specialization (t)) return spec; @@ -13970,8 +13967,7 @@ tsubst_function_decl (tree t, tree args, tsubst_flags_t complain, && !uses_template_parms (argvec)) tsubst_default_arguments (r, complain); } - else if (DECL_LOCAL_DECL_P (r) - && !DECL_OMP_DECLARE_REDUCTION_P (r)) + else if (DECL_LOCAL_DECL_P (r)) { if (!cp_unevaluated_operand) register_local_specialization (r, t); @@ -18083,7 +18079,8 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl, DECL_CONTEXT (decl) = global_namespace; pushdecl (decl); DECL_CONTEXT (decl) = current_function_decl; - cp_check_omp_declare_reduction (decl); + if (cp_check_omp_declare_reduction (decl)) + instantiate_body (pattern_decl, args, decl, true); } else { @@ -25448,15 +25445,24 @@ register_parameter_specializations (tree pattern, tree inst) } /* Instantiate the body of D using PATTERN with ARGS. We have - already determined PATTERN is the correct template to use. */ + already determined PATTERN is the correct template to use. + NESTED_P is true if this is a nested function, in which case + PATTERN will be a FUNCTION_DECL not a TEMPLATE_DECL. */ static void -instantiate_body (tree pattern, tree args, tree d) +instantiate_body (tree pattern, tree args, tree d, bool nested_p) { - gcc_checking_assert (TREE_CODE (pattern) == TEMPLATE_DECL); - - tree td = pattern; - tree code_pattern = DECL_TEMPLATE_RESULT (td); + tree td = NULL_TREE; + tree code_pattern = pattern; + + if (!nested_p) +{ + td = pattern; + code_pattern = DECL_TEMPLATE_RESULT (td); +} + else +/* Only OMP reductions are nested. */ +gcc_checking_assert (DECL_OMP_DECLARE_REDUCTION_P (code_pattern)); vec omp_privatization_save; if (current_function_decl) @@ -25489,9 +25495,10 @@ instantiate_body (tree pattern, tree args, tree d) instantiate_decl do not try to instantiate it again. */ DECL_TEMPLATE_INSTANTIATED (d) = 1; - /* Regenerate the declaration in case the template has been modified - by a
Re: Commit 052204fac580 was never sent to gcc-patches
Segher Boessenkool writes: > ... and it causes testsuite regressions on Power. We haven't determined > et if it actually is worse code, but there are testcases that trip on > it. Either way, all patches should be send to gcc-patches@, whether > pre-approved or not. > > Please correct this. Thanks! Hi Segher, the patch has been discussed here https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553839.html If you think it is better I've no problem with reverting it otherwise I'll be happy to help solving the caused issue. Regards Andrea
Re: c++: local-scope OMP UDR reductions have no template head
On Wed, Sep 16, 2020 at 02:11:32PM -0400, Nathan Sidwell wrote: > Jakub, > are you ok with the bool return from cp_check_omp_declare_reduction? That > seemed like the least invasive change. > > This corrects the earlier problems with removing the template header > from local omp reductions. And it uncovered a latent bug. When we > tsubst such a decl, we immediately tsubst its body. > cp_check_omp_declare_reduction gets a success return value to gate > that instantiation. > > udr-2.C got a further error, as the omp checking machinery doesn't > appear to turn the reduction into an error mark when failing. I > didn't dig into that further. udr-3.C appears to have been invalid > and accidentally worked. The attached patch doesn't match the ChangeLog... > gcc/cp/ > * cp-tree.h (cp_check_omp_declare_reduction): Return bool. > * semantics.c (cp_check_omp_declare_reduction): Return true on for > success. > * pt.c (push_template_decl_real): OMP reductions do not get a > template header. > (tsubst_function_decl): Remove special casing for local decl omp > reductions. > (tsubst_expr): Call instantiate_body for a local omp reduction. > (instantiate_body): Add nested_p parm, and deal with such > instantiations. > (instantiate_decl): Reject FUNCTION_SCOPE entities, adjust > instantiate_body call. > gcc/testsuite/ > * g++.dg/gomp/udr-2.C: Add additional expected error. > libgomp/ > * testsuite/libgomp.c++/udr-3.C: Add missing ctor. > > -- > Nathan Sidwell > diff --git i/gcc/c-family/c-opts.c w/gcc/c-family/c-opts.c > index 23934416f64..24e21ce0d0c 100644 > --- i/gcc/c-family/c-opts.c > +++ w/gcc/c-family/c-opts.c > @@ -1129,7 +1129,10 @@ c_common_post_options (const char **pfilename) >input_location = UNKNOWN_LOCATION; > >*pfilename = this_input_filename > -= cpp_read_main_file (parse_in, in_fnames[0], !cpp_opts->preprocessed); > += cpp_read_main_file (parse_in, in_fnames[0], > + /* We'll inject preamble pieces if this is > + not preprocessed. */ > + !cpp_opts->preprocessed); >/* Don't do any compilation or preprocessing if there is no input file. */ >if (this_input_filename == NULL) > { > @@ -1453,6 +1456,7 @@ c_finish_options (void) > = linemap_check_ordinary (linemap_add (line_table, LC_RENAME, 0, > _(""), 0)); >cb_file_change (parse_in, bltin_map); > + linemap_line_start (line_table, 0, 1); > >/* Make sure all of the builtins about to be declared have >BUILTINS_LOCATION has their location_t. */ > @@ -1476,6 +1480,7 @@ c_finish_options (void) > = linemap_check_ordinary (linemap_add (line_table, LC_RENAME, 0, > _(""), 0)); >cb_file_change (parse_in, cmd_map); > + linemap_line_start (line_table, 0, 1); > >/* All command line defines must have the same location. */ >cpp_force_token_locations (parse_in, cmd_map->start_location); Jakub
Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB with SIMT LANE [PR95654]
On Wed, 16 Sep 2020, Tom de Vries wrote: > [ cc-ing author omp support for nvptx. ] The issue looks familiar. I recognized it back in 2017 (and LLVM people recognized it too for their GPU targets). In an attempt to get agreement to fix the issue "properly" for GCC I found a similar issue that affects all targets, not just offloading, and filed it as PR 80053. (yes, there are no addressable labels involved in offloading, but nevertheless the nature of the middle-end issue is related) Alexander
Commit 052204fac580 was never sent to gcc-patches
... and it causes testsuite regressions on Power. We haven't determined et if it actually is worse code, but there are testcases that trip on it. Either way, all patches should be send to gcc-patches@, whether pre-approved or not. Please correct this. Thanks! Segher
c++: local-scope OMP UDR reductions have no template head
Jakub, are you ok with the bool return from cp_check_omp_declare_reduction? That seemed like the least invasive change. This corrects the earlier problems with removing the template header from local omp reductions. And it uncovered a latent bug. When we tsubst such a decl, we immediately tsubst its body. cp_check_omp_declare_reduction gets a success return value to gate that instantiation. udr-2.C got a further error, as the omp checking machinery doesn't appear to turn the reduction into an error mark when failing. I didn't dig into that further. udr-3.C appears to have been invalid and accidentally worked. gcc/cp/ * cp-tree.h (cp_check_omp_declare_reduction): Return bool. * semantics.c (cp_check_omp_declare_reduction): Return true on for success. * pt.c (push_template_decl_real): OMP reductions do not get a template header. (tsubst_function_decl): Remove special casing for local decl omp reductions. (tsubst_expr): Call instantiate_body for a local omp reduction. (instantiate_body): Add nested_p parm, and deal with such instantiations. (instantiate_decl): Reject FUNCTION_SCOPE entities, adjust instantiate_body call. gcc/testsuite/ * g++.dg/gomp/udr-2.C: Add additional expected error. libgomp/ * testsuite/libgomp.c++/udr-3.C: Add missing ctor. -- Nathan Sidwell diff --git i/gcc/c-family/c-opts.c w/gcc/c-family/c-opts.c index 23934416f64..24e21ce0d0c 100644 --- i/gcc/c-family/c-opts.c +++ w/gcc/c-family/c-opts.c @@ -1129,7 +1129,10 @@ c_common_post_options (const char **pfilename) input_location = UNKNOWN_LOCATION; *pfilename = this_input_filename -= cpp_read_main_file (parse_in, in_fnames[0], !cpp_opts->preprocessed); += cpp_read_main_file (parse_in, in_fnames[0], + /* We'll inject preamble pieces if this is + not preprocessed. */ + !cpp_opts->preprocessed); /* Don't do any compilation or preprocessing if there is no input file. */ if (this_input_filename == NULL) { @@ -1453,6 +1456,7 @@ c_finish_options (void) = linemap_check_ordinary (linemap_add (line_table, LC_RENAME, 0, _(""), 0)); cb_file_change (parse_in, bltin_map); + linemap_line_start (line_table, 0, 1); /* Make sure all of the builtins about to be declared have BUILTINS_LOCATION has their location_t. */ @@ -1476,6 +1480,7 @@ c_finish_options (void) = linemap_check_ordinary (linemap_add (line_table, LC_RENAME, 0, _(""), 0)); cb_file_change (parse_in, cmd_map); + linemap_line_start (line_table, 0, 1); /* All command line defines must have the same location. */ cpp_force_token_locations (parse_in, cmd_map->start_location);
Re: Problem with static const objects and LTO
On Wed, 16 Sep 2020, Jeff Law via Gcc-patches wrote: > ISTM this is a lot like the problem we have where we inline functions > with static data. To fix those we use STB_GNU_UNIQUE. But I don't see > any code in the C front-end which would utilize STB_GNU_UNIQUE. It's > support seems limited to C++. > > > How is this supposed to work for C? C inline functions don't try to address this. The standard has a rule "An inline definition of a function with external linkage shall not contain a definition of a modifiable object with static or thread storage duration, and shall not contain a reference to an identifier with internal linkage.", which avoids some cases of this, but you can still get multiple copies of objects (with static storage duration, not modifiable, no linkage, i.e. static const defined inside an inline function) as noted in a footnote "Since an inline definition is distinct from the corresponding external definition and from any other corresponding inline definitions in other translation units, all corresponding objects with static storage duration are also distinct in each of the definitions.". -- Joseph S. Myers jos...@codesourcery.com
Re: Problem with static const objects and LTO
On 9/16/20 11:32 AM, H.J. Lu wrote: > On Wed, Sep 16, 2020 at 10:24 AM Jeff Law wrote: >> >> On 9/16/20 11:13 AM, H.J. Lu wrote: >>> On Wed, Sep 16, 2020 at 10:10 AM Jeff Law wrote: On 9/16/20 11:05 AM, H.J. Lu wrote: > On Wed, Sep 16, 2020 at 9:53 AM Jeff Law via Gcc-patches > wrote: >> Consider a TU with file scoped "static const object utf8_sb_map". A >> routine within the TU will stuff _sb_map into an object, something >> like: >> >> fu (...) >> >> { >> >> if (cond) >> >> dfa->sb_char = utf8_sb_map; >> >> else >> >> dfa->sb_char = malloc (...); >> >> } >> >> >> There is another routine in the TU which looks like >> >> bar (...) >> >> { >> >> if (dfa->sb_char != utf8_sb_map) >> >> free (dfa->sb_char); >> >> } >> >> >> Now imagine that the TU is compiled (with LTO) into a static library, >> libgl.a and there's a DSO (libdso.so) which gets linked against libgl.a >> and references the first routine (fu). We get a copy of fu in the DSO >> along with a copy of utf8_sb_map. >> >> >> Then imagine there's a main executable that dynamicly links against >> libdso.so, then links statically against libgl.a. Assume the main >> executable does not directly reference fu(), but does call a routine in >> libdso.so which eventually calls fu(). Also assume the main executable >> directly calls bar(). Again, remember we're compiling with LTO, so we >> don't suck in the entire TU, just the routines/data we need. >> >> >> In this scenario, both libdso.so and the main executable are going to a >> copy of utf8_sb_map and they'll be at different addresses. So when the >> main executable calls into libdso.so which in turn calls libdso's copy >> of fu() which stuffs the address of utf8_sb_map from the DSO into >> dfa->sb_char. Later the main executable calls bar() that's in the main >> executable. It does the comparison to see if dfa->sb_char is equal to >> utf8_sb_map -- but it's using the main executable's copy of utf8_sb_map >> and naturally free() blows us because it was passed a static object, not >> a malloc'd object. >> >> >> ISTM this is a lot like the problem we have where we inline functions >> with static data. To fix those we use STB_GNU_UNIQUE. But I don't see >> any code in the C front-end which would utilize STB_GNU_UNIQUE. It's >> support seems limited to C++. >> >> >> How is this supposed to work for C? >> >> >> Jeff >> >> > Can you group utf8_sb_map, fu and bar together so that they are defined > together? They're all defined within the same TU in gnulib. It's the LTO dead/unreachable code elimination that results in just parts of the TU being copied into the DSO and a different set copied into the main executable. In many ways LTO makes this look a lot like the static data member problems we've had to deal with in the C++ world. >>> In this case, LTO should treat them as in a single group. Removing >>> one group member should remove the whole group. Keep one member >>> should keep the whole group. >> Do you mean ensure they're all in a partition together? I think that >> might work in the immediate term, but is probably brittle in the long >> term. I'd tend to lean towards forcing these static data objects to be >> STB_GNU_UNIQUE -- that seems more robust to me. > Isn't STB_GNU_UNIQUE binding global? How does it work with > > static const int foo; > > and > > static const double foo; > > in different files? It's global in the sense that it uniqifies across TU boundaries, which is its purpose. Consider a function scoped static object in a template or C++ inlined function. We can end up with multiple copies in different TUs and we have to collapse them to a single representative object. We'd need to do the same thing here. Without thinking a whole lot about it, but ISTM that if LTO pulls in a function that references a static object, then that static object would need to be STB_GNU_UNIQUE. For something like you've shown above, I'd *hope* we give an ODR error :-) Jeff pEpkey.asc Description: application/pgp-keys
[PATCH] [OG10] Xfail libgomp.oacc-fortran/privatized-ref-2.f90 when offloading to nvptx
Hello I have committed this patch to xfail libgomp.oacc-fortran/privatized-ref-2.f90 when the offload target is nvptx, as the generated code has some alloca calls which are currently not supported by nvptx (PR65181). Kwok commit 1245f6f615fa08d2ab4165598c9db72c4dad4467 Author: Kwok Cheung Yeung Date: Wed Sep 16 10:19:35 2020 -0700 XFAIL libgomp.oacc-fortran/privatized-ref-2.f90 on nvptx The testcase uses alloca, which is not currently supported on nvptx (see PR65181). 2020-09-16 Kwok Cheung Yeung libgomp/ * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: XFAIL on nvptx. diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp index 6e313aa..890a4e2 100644 --- a/libgomp/ChangeLog.omp +++ b/libgomp/ChangeLog.omp @@ -1,5 +1,9 @@ 2020-09-16 Kwok Cheung Yeung + * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: XFAIL on nvptx. + +2020-09-16 Kwok Cheung Yeung + * testsuite/libgomp.oacc-c++/privatized-ref-2.C (workers, vectors): Reduce number of workers to 16. * testsuite/libgomp.oacc-c++/privatized-ref-3.C (workers, vectors): diff --git a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 index ca8fbe8..658ab9e 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 @@ -1,4 +1,5 @@ ! { dg-do run } +! { dg-xfail-if "no alloca support" { offload_target_nvptx } } program main implicit none (type, external)
Re: [PATCH] c++: premature analysis of requires-expression [PR96410]
On Thu, 13 Aug 2020, Jason Merrill wrote: > On 8/13/20 11:21 AM, Patrick Palka wrote: > > On Mon, 10 Aug 2020, Jason Merrill wrote: > > > > > On 8/10/20 2:18 PM, Patrick Palka wrote: > > > > On Mon, 10 Aug 2020, Patrick Palka wrote: > > > > > > > > > In the below testcase, semantic analysis of the requires-expressions > > > > > in > > > > > the generic lambda must be delayed until instantiation of the lambda > > > > > because the requirements depend on the lambda's template arguments. > > > > > But > > > > > tsubst_requires_expr does semantic analysis even during regeneration > > > > > of > > > > > the lambda, which leads to various bogus errors and ICEs since some > > > > > subroutines aren't prepared to handle dependent/template trees. > > > > > > > > > > This patch adjusts subroutines of tsubst_requires_expr to avoid doing > > > > > some problematic semantic analyses when processing_template_decl. > > > > > In particular, expr_noexcept_p generally can't be checked on a > > > > > dependent > > > > > expression. Next, tsubst_nested_requirement should avoid checking > > > > > satisfaction when processing_template_decl. And similarly for > > > > > convert_to_void (called from tsubst_valid_expression_requirement). > > > > > > I wonder if, instead of trying to do a partial substitution into a > > > requires-expression at all, we want to use the > > > PACK_EXPANSION_EXTRA_ARGS/IF_STMT_EXTRA_ARGS mechanism to remember the > > > arguments for later satisfaction? Like this? -- >8 -- Subject: [PATCH] c++: requires-expressions and partial instantiation [PR96410] This patch makes tsubst_requires_expr avoid substituting into a requires-expression when partially instantiating a generic lambda. This is necessary in general to ensure that we check its requirements in lexical order (see the first testcase below). Instead, template arguments are saved in the requires-expression until all arguments can be substituted into it at once, using a mechanism similar to PACK_EXPANSION_EXTRA_ARGS. This change incidentally also fixes the two mentioned PRs -- the problem there is that tsubst_requires_expr was performing semantic checks on templated trees, and some of the checks are not prepared to handle such trees. With this patch tsubst_requires_expr, no longer does any semantic checking at all when processing_template_decl. gcc/cp/ChangeLog: PR c++/96409 PR c++/96410 * constraint.cc (tsubst_requires_expr): Use REQUIRES_EXPR_PARMS and REQUIRES_EXPR_REQS. Use REQUIRES_EXPR_EXTRA_ARGS and add_extra_args to defer substitution until we have all the template arguments. (finish_requires_expr): Adjust the call to build_min so that REQUIRES_EXPR_EXTRA_ARGS gets set to NULL_TREE. * cp-tree.def (REQUIRES_EXPR): Give it a third operand. * cp-tree.h (REQUIRES_EXPR_PARMS, REQUIRES_EXPR_REQS, REQUIRES_EXPR_EXTRA_ARGS): Define. (add_extra_args): Declare. gcc/testsuite/ChangeLog: PR c++/96409 PR c++/96410 * g++.dg/cpp2a/concepts-lambda13.C: New test. * g++.dg/cpp2a/concepts-lambda14.C: New test. --- gcc/cp/constraint.cc | 21 +++- gcc/cp/cp-tree.def| 8 +++--- gcc/cp/cp-tree.h | 16 .../g++.dg/cpp2a/concepts-lambda13.C | 18 + .../g++.dg/cpp2a/concepts-lambda14.C | 25 +++ 5 files changed, 79 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda13.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda14.C diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc index ad9d47070e3..9a06d763554 100644 --- a/gcc/cp/constraint.cc +++ b/gcc/cp/constraint.cc @@ -2175,7 +2175,19 @@ tsubst_requires_expr (tree t, tree args, /* A requires-expression is an unevaluated context. */ cp_unevaluated u; - tree parms = TREE_OPERAND (t, 0); + args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args); + if (processing_template_decl) +{ + /* We're partially instantiating a generic lambda. Substituting into +this requires-expression now may cause its requirements to get +checked out of order, so instead just remember the template +arguments and wait until we can substitute them all at once. */ + t = copy_node (t); + REQUIRES_EXPR_EXTRA_ARGS (t) = args; + return t; +} + + tree parms = REQUIRES_EXPR_PARMS (t); if (parms) { parms = tsubst_constraint_variables (parms, args, info); @@ -2183,14 +2195,11 @@ tsubst_requires_expr (tree t, tree args, return boolean_false_node; } - tree reqs = TREE_OPERAND (t, 1); + tree reqs = REQUIRES_EXPR_REQS (t); reqs = tsubst_requirement_body (reqs, args, info); if (reqs == error_mark_node) return boolean_false_node; - if (processing_template_decl) -return finish_requires_expr
Re: Problem with static const objects and LTO
On Wed, Sep 16, 2020 at 10:24 AM Jeff Law wrote: > > > On 9/16/20 11:13 AM, H.J. Lu wrote: > > On Wed, Sep 16, 2020 at 10:10 AM Jeff Law wrote: > >> > >> On 9/16/20 11:05 AM, H.J. Lu wrote: > >>> On Wed, Sep 16, 2020 at 9:53 AM Jeff Law via Gcc-patches > >>> wrote: > Consider a TU with file scoped "static const object utf8_sb_map". A > routine within the TU will stuff _sb_map into an object, something > like: > > fu (...) > > { > > if (cond) > > dfa->sb_char = utf8_sb_map; > > else > > dfa->sb_char = malloc (...); > > } > > > There is another routine in the TU which looks like > > bar (...) > > { > > if (dfa->sb_char != utf8_sb_map) > > free (dfa->sb_char); > > } > > > Now imagine that the TU is compiled (with LTO) into a static library, > libgl.a and there's a DSO (libdso.so) which gets linked against libgl.a > and references the first routine (fu). We get a copy of fu in the DSO > along with a copy of utf8_sb_map. > > > Then imagine there's a main executable that dynamicly links against > libdso.so, then links statically against libgl.a. Assume the main > executable does not directly reference fu(), but does call a routine in > libdso.so which eventually calls fu(). Also assume the main executable > directly calls bar(). Again, remember we're compiling with LTO, so we > don't suck in the entire TU, just the routines/data we need. > > > In this scenario, both libdso.so and the main executable are going to a > copy of utf8_sb_map and they'll be at different addresses. So when the > main executable calls into libdso.so which in turn calls libdso's copy > of fu() which stuffs the address of utf8_sb_map from the DSO into > dfa->sb_char. Later the main executable calls bar() that's in the main > executable. It does the comparison to see if dfa->sb_char is equal to > utf8_sb_map -- but it's using the main executable's copy of utf8_sb_map > and naturally free() blows us because it was passed a static object, not > a malloc'd object. > > > ISTM this is a lot like the problem we have where we inline functions > with static data. To fix those we use STB_GNU_UNIQUE. But I don't see > any code in the C front-end which would utilize STB_GNU_UNIQUE. It's > support seems limited to C++. > > > How is this supposed to work for C? > > > Jeff > > > >>> Can you group utf8_sb_map, fu and bar together so that they are defined > >>> together? > >> They're all defined within the same TU in gnulib. It's the LTO > >> dead/unreachable code elimination that results in just parts of the TU > >> being copied into the DSO and a different set copied into the main > >> executable. In many ways LTO makes this look a lot like the static data > >> member problems we've had to deal with in the C++ world. > > In this case, LTO should treat them as in a single group. Removing > > one group member should remove the whole group. Keep one member > > should keep the whole group. > > Do you mean ensure they're all in a partition together? I think that > might work in the immediate term, but is probably brittle in the long > term. I'd tend to lean towards forcing these static data objects to be > STB_GNU_UNIQUE -- that seems more robust to me. Isn't STB_GNU_UNIQUE binding global? How does it work with static const int foo; and static const double foo; in different files? -- H.J.
Re: Problem with static const objects and LTO
On 9/16/20 11:13 AM, H.J. Lu wrote: > On Wed, Sep 16, 2020 at 10:10 AM Jeff Law wrote: >> >> On 9/16/20 11:05 AM, H.J. Lu wrote: >>> On Wed, Sep 16, 2020 at 9:53 AM Jeff Law via Gcc-patches >>> wrote: Consider a TU with file scoped "static const object utf8_sb_map". A routine within the TU will stuff _sb_map into an object, something like: fu (...) { if (cond) dfa->sb_char = utf8_sb_map; else dfa->sb_char = malloc (...); } There is another routine in the TU which looks like bar (...) { if (dfa->sb_char != utf8_sb_map) free (dfa->sb_char); } Now imagine that the TU is compiled (with LTO) into a static library, libgl.a and there's a DSO (libdso.so) which gets linked against libgl.a and references the first routine (fu). We get a copy of fu in the DSO along with a copy of utf8_sb_map. Then imagine there's a main executable that dynamicly links against libdso.so, then links statically against libgl.a. Assume the main executable does not directly reference fu(), but does call a routine in libdso.so which eventually calls fu(). Also assume the main executable directly calls bar(). Again, remember we're compiling with LTO, so we don't suck in the entire TU, just the routines/data we need. In this scenario, both libdso.so and the main executable are going to a copy of utf8_sb_map and they'll be at different addresses. So when the main executable calls into libdso.so which in turn calls libdso's copy of fu() which stuffs the address of utf8_sb_map from the DSO into dfa->sb_char. Later the main executable calls bar() that's in the main executable. It does the comparison to see if dfa->sb_char is equal to utf8_sb_map -- but it's using the main executable's copy of utf8_sb_map and naturally free() blows us because it was passed a static object, not a malloc'd object. ISTM this is a lot like the problem we have where we inline functions with static data. To fix those we use STB_GNU_UNIQUE. But I don't see any code in the C front-end which would utilize STB_GNU_UNIQUE. It's support seems limited to C++. How is this supposed to work for C? Jeff >>> Can you group utf8_sb_map, fu and bar together so that they are defined >>> together? >> They're all defined within the same TU in gnulib. It's the LTO >> dead/unreachable code elimination that results in just parts of the TU >> being copied into the DSO and a different set copied into the main >> executable. In many ways LTO makes this look a lot like the static data >> member problems we've had to deal with in the C++ world. > In this case, LTO should treat them as in a single group. Removing > one group member should remove the whole group. Keep one member > should keep the whole group. Do you mean ensure they're all in a partition together? I think that might work in the immediate term, but is probably brittle in the long term. I'd tend to lean towards forcing these static data objects to be STB_GNU_UNIQUE -- that seems more robust to me. jeff > pEpkey.asc Description: application/pgp-keys
Re: Problem with static const objects and LTO
On Wed, Sep 16, 2020 at 10:10 AM Jeff Law wrote: > > > On 9/16/20 11:05 AM, H.J. Lu wrote: > > On Wed, Sep 16, 2020 at 9:53 AM Jeff Law via Gcc-patches > > wrote: > >> > >> Consider a TU with file scoped "static const object utf8_sb_map". A > >> routine within the TU will stuff _sb_map into an object, something > >> like: > >> > >> fu (...) > >> > >> { > >> > >> if (cond) > >> > >> dfa->sb_char = utf8_sb_map; > >> > >> else > >> > >> dfa->sb_char = malloc (...); > >> > >> } > >> > >> > >> There is another routine in the TU which looks like > >> > >> bar (...) > >> > >> { > >> > >> if (dfa->sb_char != utf8_sb_map) > >> > >> free (dfa->sb_char); > >> > >> } > >> > >> > >> Now imagine that the TU is compiled (with LTO) into a static library, > >> libgl.a and there's a DSO (libdso.so) which gets linked against libgl.a > >> and references the first routine (fu). We get a copy of fu in the DSO > >> along with a copy of utf8_sb_map. > >> > >> > >> Then imagine there's a main executable that dynamicly links against > >> libdso.so, then links statically against libgl.a. Assume the main > >> executable does not directly reference fu(), but does call a routine in > >> libdso.so which eventually calls fu(). Also assume the main executable > >> directly calls bar(). Again, remember we're compiling with LTO, so we > >> don't suck in the entire TU, just the routines/data we need. > >> > >> > >> In this scenario, both libdso.so and the main executable are going to a > >> copy of utf8_sb_map and they'll be at different addresses. So when the > >> main executable calls into libdso.so which in turn calls libdso's copy > >> of fu() which stuffs the address of utf8_sb_map from the DSO into > >> dfa->sb_char. Later the main executable calls bar() that's in the main > >> executable. It does the comparison to see if dfa->sb_char is equal to > >> utf8_sb_map -- but it's using the main executable's copy of utf8_sb_map > >> and naturally free() blows us because it was passed a static object, not > >> a malloc'd object. > >> > >> > >> ISTM this is a lot like the problem we have where we inline functions > >> with static data. To fix those we use STB_GNU_UNIQUE. But I don't see > >> any code in the C front-end which would utilize STB_GNU_UNIQUE. It's > >> support seems limited to C++. > >> > >> > >> How is this supposed to work for C? > >> > >> > >> Jeff > >> > >> > > Can you group utf8_sb_map, fu and bar together so that they are defined > > together? > > They're all defined within the same TU in gnulib. It's the LTO > dead/unreachable code elimination that results in just parts of the TU > being copied into the DSO and a different set copied into the main > executable. In many ways LTO makes this look a lot like the static data > member problems we've had to deal with in the C++ world. In this case, LTO should treat them as in a single group. Removing one group member should remove the whole group. Keep one member should keep the whole group. -- H.J.
Re: [PATCH] rs6000: Add rs6000_cfun_pcrel_p
On Wed, Sep 16, 2020 at 08:36:35AM -0500, Bill Schmidt wrote: > This is a cleanup requested by Segher in a previous review. Most > uses of rs6000_pcrel_p are called for the current function. A > specialized version for cfun is more efficient for these uses. Then rename that one to rs6000_function_pcrel_p or similar? People will keep trying to use the shorter name otherwise ;-) > (Actually all uses are called for the current function, so now > rs6000_pcrel_p is an orphan. However, I think it's worth keeping > around in case we have a late discovery where we need it.) (Heck, you could even use that short name for the cfun one, if you want. Sorry i didn't think of that before.) Okay for trunk with or without such further changes. Thanks! Segher
Re: Problem with static const objects and LTO
On 9/16/20 11:05 AM, H.J. Lu wrote: > On Wed, Sep 16, 2020 at 9:53 AM Jeff Law via Gcc-patches > wrote: >> >> Consider a TU with file scoped "static const object utf8_sb_map". A >> routine within the TU will stuff _sb_map into an object, something >> like: >> >> fu (...) >> >> { >> >> if (cond) >> >> dfa->sb_char = utf8_sb_map; >> >> else >> >> dfa->sb_char = malloc (...); >> >> } >> >> >> There is another routine in the TU which looks like >> >> bar (...) >> >> { >> >> if (dfa->sb_char != utf8_sb_map) >> >> free (dfa->sb_char); >> >> } >> >> >> Now imagine that the TU is compiled (with LTO) into a static library, >> libgl.a and there's a DSO (libdso.so) which gets linked against libgl.a >> and references the first routine (fu). We get a copy of fu in the DSO >> along with a copy of utf8_sb_map. >> >> >> Then imagine there's a main executable that dynamicly links against >> libdso.so, then links statically against libgl.a. Assume the main >> executable does not directly reference fu(), but does call a routine in >> libdso.so which eventually calls fu(). Also assume the main executable >> directly calls bar(). Again, remember we're compiling with LTO, so we >> don't suck in the entire TU, just the routines/data we need. >> >> >> In this scenario, both libdso.so and the main executable are going to a >> copy of utf8_sb_map and they'll be at different addresses. So when the >> main executable calls into libdso.so which in turn calls libdso's copy >> of fu() which stuffs the address of utf8_sb_map from the DSO into >> dfa->sb_char. Later the main executable calls bar() that's in the main >> executable. It does the comparison to see if dfa->sb_char is equal to >> utf8_sb_map -- but it's using the main executable's copy of utf8_sb_map >> and naturally free() blows us because it was passed a static object, not >> a malloc'd object. >> >> >> ISTM this is a lot like the problem we have where we inline functions >> with static data. To fix those we use STB_GNU_UNIQUE. But I don't see >> any code in the C front-end which would utilize STB_GNU_UNIQUE. It's >> support seems limited to C++. >> >> >> How is this supposed to work for C? >> >> >> Jeff >> >> > Can you group utf8_sb_map, fu and bar together so that they are defined > together? They're all defined within the same TU in gnulib. It's the LTO dead/unreachable code elimination that results in just parts of the TU being copied into the DSO and a different set copied into the main executable. In many ways LTO makes this look a lot like the static data member problems we've had to deal with in the C++ world. jeff pEpkey.asc Description: application/pgp-keys
Re: [PATCH] aarch64: Add extend-as-extract-with-shift pattern [PR96998]
Hi Richard, On 10/09/2020 19:18, Richard Sandiford wrote: > Alex Coplan writes: > > Hello, > > > > Since r11-2903-g6b3034eaba83935d9f6dfb20d2efbdb34b5b00bf introduced a > > canonicalization from mult to shift on address reloads, a missing > > pattern in the AArch64 backend was exposed. > > > > In particular, we were missing the ashift variant of > > *add__multp2 (this mult variant is redundant and was > > removed in r11-3033-g2f8ae301f6a125f50b0a758047fcddae7b68daa8). > > > > This patch adds that missing pattern (fixing PR96998), updates > > aarch64_is_extend_from_extract() to work for the shift pattern (instead > > of the redundant mult variant) and updates callers in the cost > > calculations to apply the costing for the shift variant instead. > > Hmm. I think we should instead fix this in combine. Ok, thanks for the review. Please find a revised patch attached which fixes this in combine instead. > If we do that, we should be able to remove the handling of > extract-based addresses in aarch64_classify_index & co. Nice. I've tried to remove all the redundant extract-based address handling in the AArch64 backend as a result. The revised patch also fixes up a comment in combine which appears to have suffered from bitrot. Testing: * Bootstrap and regtest on aarch64-none-linux-gnu, arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu. * Regtested an x64 -> aarch64-none-elf cross with -mabi=ilp32. * Checked that we no longer ICE when building linux-next on AArch64. OK for trunk? Thanks, Alex --- gcc/ChangeLog: * combine.c (expand_compound_operation): Tweak variable name in comment to match source. (make_extraction): Handle mult by power of two in addition to ashift. * config/aarch64/aarch64.c (aarch64_is_extend_from_extract): Delete. (aarch64_classify_index): Remove redundant extend-as-extract handling. (aarch64_strip_extend): Likewise. (aarch64_rtx_arith_op_extract_p): Likewise, remove now-unused parameter. Update callers... (aarch64_rtx_costs): ... here. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr96998.c: New test. diff --git a/gcc/combine.c b/gcc/combine.c index c88382e..cd8e544 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x) } /* If we reach here, we want to return a pair of shifts. The inner - shift is a left shift of BITSIZE - POS - LEN bits. The outer - shift is a right shift of BITSIZE - LEN bits. It is arithmetic or + shift is a left shift of MODEWIDTH - POS - LEN bits. The outer + shift is a right shift of MODEWIDTH - LEN bits. It is arithmetic or logical depending on the value of UNSIGNEDP. If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be @@ -7650,20 +7650,29 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos, is_mode = GET_MODE (SUBREG_REG (inner)); inner = SUBREG_REG (inner); } - else if (GET_CODE (inner) == ASHIFT + else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT) && CONST_INT_P (XEXP (inner, 1)) - && pos_rtx == 0 && pos == 0 - && len > UINTVAL (XEXP (inner, 1))) -{ - /* We're extracting the least significant bits of an rtx -(ashift X (const_int C)), where LEN > C. Extract the -least significant (LEN - C) bits of X, giving an rtx -whose mode is MODE, then shift it left C times. */ - new_rtx = make_extraction (mode, XEXP (inner, 0), -0, 0, len - INTVAL (XEXP (inner, 1)), -unsignedp, in_dest, in_compare); - if (new_rtx != 0) - return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1)); + && pos_rtx == 0 && pos == 0) +{ + const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1)); + const bool mult = GET_CODE (inner) == MULT; + const int shift_amt = mult ? exact_log2 (ci) : ci; + + if (shift_amt > 0 && len > (unsigned)shift_amt) + { + /* We're extracting the least significant bits of an rtx +(ashift X (const_int C)) or (mult X (const_int (2^C))), +where LEN > C. Extract the least significant (LEN - C) bits +of X, giving an rtx whose mode is MODE, then shift it left +C times. */ + new_rtx = make_extraction (mode, XEXP (inner, 0), +0, 0, len - shift_amt, +unsignedp, in_dest, in_compare); + if (new_rtx) + return mult + ? gen_rtx_MULT (mode, new_rtx, XEXP (inner, 1)) + : gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1)); + } } else if (GET_CODE (inner) == TRUNCATE /* If trying or potentionally trying to extract diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index b251f39..56738ae 100644 --- a/gcc/config/aarch64/aarch64.c +++
Re: Problem with static const objects and LTO
On Wed, Sep 16, 2020 at 9:53 AM Jeff Law via Gcc-patches wrote: > > > Consider a TU with file scoped "static const object utf8_sb_map". A > routine within the TU will stuff _sb_map into an object, something > like: > > fu (...) > > { > > if (cond) > > dfa->sb_char = utf8_sb_map; > > else > > dfa->sb_char = malloc (...); > > } > > > There is another routine in the TU which looks like > > bar (...) > > { > > if (dfa->sb_char != utf8_sb_map) > > free (dfa->sb_char); > > } > > > Now imagine that the TU is compiled (with LTO) into a static library, > libgl.a and there's a DSO (libdso.so) which gets linked against libgl.a > and references the first routine (fu). We get a copy of fu in the DSO > along with a copy of utf8_sb_map. > > > Then imagine there's a main executable that dynamicly links against > libdso.so, then links statically against libgl.a. Assume the main > executable does not directly reference fu(), but does call a routine in > libdso.so which eventually calls fu(). Also assume the main executable > directly calls bar(). Again, remember we're compiling with LTO, so we > don't suck in the entire TU, just the routines/data we need. > > > In this scenario, both libdso.so and the main executable are going to a > copy of utf8_sb_map and they'll be at different addresses. So when the > main executable calls into libdso.so which in turn calls libdso's copy > of fu() which stuffs the address of utf8_sb_map from the DSO into > dfa->sb_char. Later the main executable calls bar() that's in the main > executable. It does the comparison to see if dfa->sb_char is equal to > utf8_sb_map -- but it's using the main executable's copy of utf8_sb_map > and naturally free() blows us because it was passed a static object, not > a malloc'd object. > > > ISTM this is a lot like the problem we have where we inline functions > with static data. To fix those we use STB_GNU_UNIQUE. But I don't see > any code in the C front-end which would utilize STB_GNU_UNIQUE. It's > support seems limited to C++. > > > How is this supposed to work for C? > > > Jeff > > Can you group utf8_sb_map, fu and bar together so that they are defined together? -- H.J.
Problem with static const objects and LTO
Consider a TU with file scoped "static const object utf8_sb_map". A routine within the TU will stuff _sb_map into an object, something like: fu (...) { if (cond) dfa->sb_char = utf8_sb_map; else dfa->sb_char = malloc (...); } There is another routine in the TU which looks like bar (...) { if (dfa->sb_char != utf8_sb_map) free (dfa->sb_char); } Now imagine that the TU is compiled (with LTO) into a static library, libgl.a and there's a DSO (libdso.so) which gets linked against libgl.a and references the first routine (fu). We get a copy of fu in the DSO along with a copy of utf8_sb_map. Then imagine there's a main executable that dynamicly links against libdso.so, then links statically against libgl.a. Assume the main executable does not directly reference fu(), but does call a routine in libdso.so which eventually calls fu(). Also assume the main executable directly calls bar(). Again, remember we're compiling with LTO, so we don't suck in the entire TU, just the routines/data we need. In this scenario, both libdso.so and the main executable are going to a copy of utf8_sb_map and they'll be at different addresses. So when the main executable calls into libdso.so which in turn calls libdso's copy of fu() which stuffs the address of utf8_sb_map from the DSO into dfa->sb_char. Later the main executable calls bar() that's in the main executable. It does the comparison to see if dfa->sb_char is equal to utf8_sb_map -- but it's using the main executable's copy of utf8_sb_map and naturally free() blows us because it was passed a static object, not a malloc'd object. ISTM this is a lot like the problem we have where we inline functions with static data. To fix those we use STB_GNU_UNIQUE. But I don't see any code in the C front-end which would utilize STB_GNU_UNIQUE. It's support seems limited to C++. How is this supposed to work for C? Jeff pEpkey.asc Description: application/pgp-keys
Re: [PATCH] Allow copying of symbolic ranges to an irange.
>> // Swap min/max if they are out of order. Return TRUE if further > these seems OK, but can't there be anti-ranges with symbolics too? ie > ~[a_12, a_12] > The code for that just does: > > else if (src.kind () == VR_ANTI_RANGE) > set (src.min (), src.max (), VR_ANTI_RANGE); > > That should just go to varying I guess? Ah, you're right. I've tweaked the patch and have added a corresponding test. > > The conversion to legacy anti-range code also seems a little suspect in > some cases... > > Finally, we theoretically shouldn't be accessing 'min()' and 'max()' > fields in a multirange, which also looks like might happen in the final > else clause. > > I wonder if it might be less complex to simply have 2 routines, like > copy_to_legacy() and copy_from_legacy() instead of trying to handle > then together? I do find it seems to require more thinking than it > should to follow the cases :-) Sigh, yes. That code is too clever for its own good. I'll work on it as a follow-up. OK pending tests? Aldy gcc/ChangeLog: * range-op.cc (multi_precision_range_tests): Normalize symbolics when copying to a multi-range. * value-range.cc (irange::copy_legacy_range): Add test. --- gcc/range-op.cc| 15 +++ gcc/value-range.cc | 19 +-- 2 files changed, 32 insertions(+), 2 deletions(-) diff --git a/gcc/range-op.cc b/gcc/range-op.cc index fbf78be0a3c..3ab268f101e 100644 --- a/gcc/range-op.cc +++ b/gcc/range-op.cc @@ -3453,6 +3453,21 @@ multi_precision_range_tests () small = big; ASSERT_TRUE (small == int_range<1> (INT (21), INT (21), VR_ANTI_RANGE)); + // Copying a legacy symbolic to an int_range should normalize the + // symbolic at copy time. + { +tree ssa = make_ssa_name (integer_type_node); +value_range legacy_range (ssa, INT (25)); +int_range<2> copy = legacy_range; +ASSERT_TRUE (copy == int_range<2> (vrp_val_min (integer_type_node), + INT (25))); + +// Test that copying ~[abc_23, abc_23] to a multi-range yields varying. +legacy_range = value_range (ssa, ssa, VR_ANTI_RANGE); +copy = legacy_range; +ASSERT_TRUE (copy.varying_p ()); + } + range3_tests (); } diff --git a/gcc/value-range.cc b/gcc/value-range.cc index 20aa4f114c9..ed2c322ded9 100644 --- a/gcc/value-range.cc +++ b/gcc/value-range.cc @@ -92,7 +92,12 @@ irange::copy_legacy_range (const irange ) else if (src.varying_p ()) set_varying (src.type ()); else if (src.kind () == VR_ANTI_RANGE) -set (src.min (), src.max (), VR_ANTI_RANGE); +{ + if (src.legacy_mode_p () && !range_has_numeric_bounds_p ()) + set_varying (src.type ()); + else + set (src.min (), src.max (), VR_ANTI_RANGE); +} else if (legacy_mode_p () && src.maybe_anti_range ()) { int_range<3> tmp (src); @@ -101,7 +106,17 @@ irange::copy_legacy_range (const irange ) VR_ANTI_RANGE); } else -set (src.min (), src.max (), VR_RANGE); +{ + // If copying legacy to int_range, normalize any symbolics. + if (src.legacy_mode_p () && !range_has_numeric_bounds_p ()) + { + value_range cst (src); + cst.normalize_symbolics (); + set (cst.min (), cst.max ()); + return; + } + set (src.min (), src.max ()); +} } // Swap min/max if they are out of order. Return TRUE if further -- 2.26.2
[PATCH][Arm] Enable MVE SIMD modes for vectorization
Hi all, This patch enables SIMD modes for MVE auto-vectorization. In this patch, the integer and float MVE SIMD modes are returned by arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when MVE or MVE_FLOAT is enabled. Then the expanders for auto-vectorization can be used for generating MVE SIMD code. This patch also fixes bugs in MVE vreiterpretq_*.c tests which are revealed by the enabled MVE SIMD modes. The tests are for checking the MVE reinterpret intrinsics. There are two functions in each of the tests. The two functions contain the pattern of identical code so that they are folded in icf pass. Because of icf, the instruction count only checks one function which is 8. However when the SIMD modes are enabled, the estimation of the code size becomes smaller so that inlining is applied after icf, then the instruction count becomes 16 which causes failure of the tests. Because the icf is not the expected pattern to be tested but causes above issues, -fno-ipa-icf is applied to the tests to avoid unstable instruction count. This patch is separated from https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html because this part is not strongly connected to the aim of that one so that causing confusion. Regtested and bootstraped. Is it OK for trunk please? Thanks Dennis gcc/ChangeLog: 2020-09-15 Dennis Zhang * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes. gcc/testsuite/ChangeLog: 2020-09-15 Dennis Zhang * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional option -fno-ipa-icf and change the instruction count from 8 to 16. * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise. * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise. * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index dd78141519e..c50d5aca6a9 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -28964,6 +28964,30 @@ arm_preferred_simd_mode (scalar_mode mode) default:; } + if (TARGET_HAVE_MVE) +switch (mode) + { + case QImode: + return V16QImode; + case HImode: + return V8HImode; + case SImode: + return V4SImode; + + default:; + } + + if (TARGET_HAVE_MVE_FLOAT) +switch (mode) + { + case HFmode: + return V8HFmode; + case SFmode: + return V4SFmode; + + default:; + } + return word_mode; } diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c index f59f69734ed..2398d894861 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c @@ -1,6 +1,6 @@ /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ /* { dg-add-options arm_v8_1m_mve_fp } */ -/* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-O2 -fno-ipa-icf" } */ #include "arm_mve.h" int8x16_t value1; @@ -41,4 +41,4 @@ foo1 () return vaddq_f16 (r7, vreinterpretq_f16 (value9)); } -/* { dg-final { scan-assembler-times "vadd.f16" 8 } } */ +/* { dg-final { scan-assembler-times "vadd.f16" 16 } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c index dac47c7e924..5a58dc6eb4c 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c @@ -1,6 +1,6 @@ /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ /* { dg-add-options arm_v8_1m_mve_fp } */ -/* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-O2 -fno-ipa-icf" } */ #include "arm_mve.h" int16x8_t value1; @@ -41,4 +41,4 @@ foo1 () return vaddq_f32 (r7, vreinterpretq_f32 (value9)); } -/* { dg-final { scan-assembler-times "vadd.f32" 8 } } */ +/* { dg-final { scan-assembler-times "vadd.f32" 16 } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c index edc2f2f3bc6..9ab05e95420 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c @@ -1,6 +1,6 @@ /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ /* { dg-add-options arm_v8_1m_mve_fp } */ -/* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-O2
Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB with SIMT LANE [PR95654]
On 9/16/20 1:24 PM, Alexander Monakov wrote: > Can you supply a tar with tree dumps for me to look at please? > Hi Alexander, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654#c6 . > Also, if you can check if the problem can be triggered without a > collapsed loop (e.g. try removing collapse(2), remove mentions of > d2) and if so supply dumps from that instead, I'd appreciate that too. > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654#c8 Thanks, - Tom > Alexander > > On Wed, 16 Sep 2020, Tom de Vries wrote: > >> [ cc-ing author omp support for nvptx. ] >> >> On 9/16/20 12:39 PM, Tobias Burnus wrote: >>> Hi Tom, hi Richard, hello all, >>> >>> @Richard: does it look okay from the ME side? >>> @Tom: Can you check which IFN_GOMP_SIMT should be >>> excluded with -ftracer? >>> >>> Pre-remark, I do not know much about SIMT – except that they >>> only appear with nvptx and somehow relate to lanes on the >>> GPU. >>> >>> In any case, as the testcase libgomp.fortran/pr66199-5.f90 shows, >>> if a basic block with GOMP_SIMT_VOTE_ANY in it is duplicated, >>> which happens via -ftracer for this testcase, the result is wrong. >>> >>> The testcase ignores all the loop work but via "lastprivate" takes >>> the upper loop bound (as assigned to the loop indices); instead of >>> the expected 32*32 = 1024, either some number (like 4 or very large >>> or negative) is returned. >>> >>> While GOMP_SIMT_VOTE_ANY fixes the issue for this testcase, I >>> have the feeling that at least GOMP_SIMT_LAST_LANE should be >>> not copied - but I might be wrong. >>> >>> Tom: Do you think GOMP_SIMT_LAST_LANE should be removed from >>> that list – or any of the following added as well? >>> GOMP_USE_SIMT, GOMP_SIMT_ENTER, GOMP_SIMT_ENTER_ALLOC, GOMP_SIMT_EXIT, >>> GOMP_SIMT_VF, GOMP_SIMT_ORDERED_PRED, GOMP_SIMT_XCHG_BFLY, >>> GOMP_SIMT_XCHG_IDX >>> >>> OK for mainline? >>> >>> Tobias >>> >>> - >>> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / >>> Germany >>> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, >>> Alexander Walter
Re: libbacktrace: Add support for MiniDebugInfo
On 16/09/2020 07:26, Ian Lance Taylor wrote: > On Wed, Sep 16, 2020, 4:03 AM Alex Coplan wrote: > > > Hi Ian, > > > > On 14/09/2020 14:12, Ian Lance Taylor via Gcc-patches wrote: > > > This patch to libbacktrace adds support for MiniDebugInfo, as > > > requested in PR 93608. > > > > This appears to introduce a failure in the libbacktrace testsuite > > (observed on both x86 and aarch64): > > > > ../gcc/libbacktrace/../test-driver: line 107: 7905 Segmentation fault > > (core dumped) "$@" > $log_file 2>&1 > > FAIL: mtest_minidebug > > > > I tested on x86 without seeing anything like this. Can you give me any > more details? Thanks. Sure. On an Ubuntu 18.04 / x86-64 system with current trunk, configuring with: ~/toolchain/src/gcc/configure \ --prefix=`pwd` \ --enable-languages=c,c++ \ --disable-multilib \ --disable-bootstrap running `make && make check-libbacktrace` gives the failure described above. Here is the contents of libbacktrace/test-suite.log: = package-unused version-unused: ./test-suite.log = # TOTAL: 33 # PASS: 32 # SKIP: 0 # XFAIL: 0 # FAIL: 1 # XPASS: 0 # ERROR: 0 .. contents:: :depth: 2 FAIL: mtest_minidebug = no debug info in ELF executable no debug info in ELF executable no debug info in ELF executable no debug info in ELF executable no debug info in ELF executable no debug info in ELF executable no debug info in ELF executable test1: not enough frames; got 0, expected at least 3 test1: [0]: got expected f3 test1: [1]: got expected f2 FAIL mtest_minidebug (exit status: 139) --- Let me know if you need any more info. Thanks, Alex
Re: [aarch64] Backport missing NEON intrinsics to GCC8
Thanks Christophe for reporting the errors. On 9/16/20, 7:45 AM, "Kyrylo Tkachov" wrote: > > The new tests vld1x3 and vld1x4 fail on arm, I am seeing new failures > > on gcc-8 and gcc-9 branches > > after r8-10451, r8-10452 and r9-8874. > > Is that expected/fixed later in the backport series? > > (on the release branches, my validations are running for every commit) > > Hmm, IIRC they're not implemented for arm IIRC so they should be xfailed or > skipped on arm. When I look at the log for those files there are no additions. Why does arm execute tests from gcc.target/aarch64/ on gcc9 and gcc8? Why arm does not fail with those extra tests on gcc10 and on master? I'm still trying to figure out how to properly fix this. Thanks, Sebastian
[PATCH] preprocessor: Fix ICE with too long line in fmtwarn [PR96935]
Here we ICE in char_span::subspan because the offset it gets is -1. It's -1 because get_substring_ranges_for_loc gets a location whose column was 0. That only happens in testcases like the attached where we're dealing with extremely long lines (at least 4065 chars it seems). This does happen in practice, though, so it's not just a theoretical problem (e.g. when building the SU2 suite). Fixed by checking that the column get_substring_ranges_for_loc gets is sane, akin to other checks in that function. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10? gcc/ChangeLog: PR preprocessor/96935 * input.c (get_substring_ranges_for_loc): Return if start.column is less than 1. gcc/testsuite/ChangeLog: PR preprocessor/96935 * gcc.dg/format/pr96935.c: New test. --- gcc/input.c | 2 ++ gcc/testsuite/gcc.dg/format/pr96935.c | 9 + 2 files changed, 11 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/format/pr96935.c diff --git a/gcc/input.c b/gcc/input.c index d573b90341a..ea3d4f34220 100644 --- a/gcc/input.c +++ b/gcc/input.c @@ -1461,6 +1461,8 @@ get_substring_ranges_for_loc (cpp_reader *pfile, size_t literal_length = finish.column - start.column + 1; /* Ensure that we don't crash if we got the wrong location. */ + if (start.column < 1) + return "line is too long"; if (line.length () < (start.column - 1 + literal_length)) return "line is not wide enough"; diff --git a/gcc/testsuite/gcc.dg/format/pr96935.c b/gcc/testsuite/gcc.dg/format/pr96935.c new file mode 100644 index 000..c31109704be --- /dev/null +++ b/gcc/testsuite/gcc.dg/format/pr96935.c @@ -0,0 +1,9 @@ +/* PR preprocessor/96935 */ +/* { dg-do compile } */ +/* { dg-options "-O" } */ + +void +foo (char const *name) +{ + /*
Re: [PATCH] IRA: Don't make a global register eliminable
On Wed, Sep 16, 2020 at 7:46 AM Richard Sandiford wrote: > > "H.J. Lu" writes: > > On Tue, Sep 15, 2020 at 7:44 AM Richard Sandiford > > wrote: > >> > >> Thanks for looking at this. > >> > >> "H.J. Lu" writes: > >> > commit 1bcb4c4faa4bd6b1c917c75b100d618faf9e628c > >> > Author: Richard Sandiford > >> > Date: Wed Oct 2 07:37:10 2019 + > >> > > >> > [LRA] Don't make eliminable registers live (PR91957) > >> > > >> > didn't make eliminable registers live which breaks > >> > > >> > register void *cur_pro asm("reg"); > >> > > >> > where "reg" is an eliminable register. Make fixed eliminable registers > >> > live to fix it. > >> > >> I don't think fixedness itself is the issue here: it's usual for at > >> least some registers involved in eliminations to be fixed registers. > >> > >> I think what makes this case different is instead that cur_pro/ebp > >> is a global register. But IMO things have already gone wrong if we > >> think that a global register is eliminable. > >> > >> So I wonder if instead we should check global_regs at the beginning of: > >> > >> for (i = 0; i < fp_reg_count; i++) > >> if (!TEST_HARD_REG_BIT (crtl->asm_clobbers, > >> HARD_FRAME_POINTER_REGNUM + i)) > >> { > >> SET_HARD_REG_BIT (eliminable_regset, > >> HARD_FRAME_POINTER_REGNUM + i); > >> if (frame_pointer_needed) > >> SET_HARD_REG_BIT (ira_no_alloc_regs, > >> HARD_FRAME_POINTER_REGNUM + i); > >> } > >> else if (frame_pointer_needed) > >> error ("%s cannot be used in % here", > >> reg_names[HARD_FRAME_POINTER_REGNUM + i]); > >> else > >> df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true); > >> > >> (ira_setup_eliminable_regset), and handle the global_regs[] case in > >> the same way as the else case, i.e. short-circuiting both of the ifs. > >> > > > > Like this? > > Sorry for the delay. I was testing this in parallel. > > Bootstrapped & regression-tested on x86_64-linux-gnu. > Thanks. -- H.J.
c++: Avoid confusing 'nested' name
instantiate_body has a local var call 'nested', which indicates that this instantiation was caused during the body of some function -- not necessarily its containing scope. That's confusing, let's just use 'current_function_decl' directly. Then we can also simplify the push_to_top_level logic, which /does/ indicate whether this is an actual nested function. (C++ does not have nested functions, but OMP ODRs fall into that category. A follow up patch will use that more usual meaning of 'nested' wrt to functions.) gcc/cp/ * pt.c (instantiate_body): Remove 'nested' var, simplify push_to_top logic. pushing to trunk -- Nathan Sidwell diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c index e8aa950b8fa..289452ab740 100644 --- i/gcc/cp/pt.c +++ w/gcc/cp/pt.c @@ -25458,17 +25458,15 @@ instantiate_body (tree pattern, tree args, tree d) tree td = pattern; tree code_pattern = DECL_TEMPLATE_RESULT (td); - tree fn_context = decl_function_context (d); - if (LAMBDA_FUNCTION_P (d)) -/* tsubst_lambda_expr resolved any references to enclosing functions. */ -fn_context = NULL_TREE; - bool nested = current_function_decl != NULL_TREE; - bool push_to_top = !(nested && fn_context == current_function_decl); - vec omp_privatization_save; - if (nested) + if (current_function_decl) save_omp_privatization_clauses (omp_privatization_save); + bool push_to_top += !(current_function_decl + && !LAMBDA_FUNCTION_P (d) + && decl_function_context (d) == current_function_decl); + if (push_to_top) push_to_top_level (); else @@ -25595,7 +25593,7 @@ instantiate_body (tree pattern, tree args, tree d) else pop_function_context (); - if (nested) + if (current_function_decl) restore_omp_privatization_clauses (omp_privatization_save); }
Re: [PATCH] IRA: Don't make a global register eliminable
"H.J. Lu" writes: > On Tue, Sep 15, 2020 at 7:44 AM Richard Sandiford > wrote: >> >> Thanks for looking at this. >> >> "H.J. Lu" writes: >> > commit 1bcb4c4faa4bd6b1c917c75b100d618faf9e628c >> > Author: Richard Sandiford >> > Date: Wed Oct 2 07:37:10 2019 + >> > >> > [LRA] Don't make eliminable registers live (PR91957) >> > >> > didn't make eliminable registers live which breaks >> > >> > register void *cur_pro asm("reg"); >> > >> > where "reg" is an eliminable register. Make fixed eliminable registers >> > live to fix it. >> >> I don't think fixedness itself is the issue here: it's usual for at >> least some registers involved in eliminations to be fixed registers. >> >> I think what makes this case different is instead that cur_pro/ebp >> is a global register. But IMO things have already gone wrong if we >> think that a global register is eliminable. >> >> So I wonder if instead we should check global_regs at the beginning of: >> >> for (i = 0; i < fp_reg_count; i++) >> if (!TEST_HARD_REG_BIT (crtl->asm_clobbers, >> HARD_FRAME_POINTER_REGNUM + i)) >> { >> SET_HARD_REG_BIT (eliminable_regset, >> HARD_FRAME_POINTER_REGNUM + i); >> if (frame_pointer_needed) >> SET_HARD_REG_BIT (ira_no_alloc_regs, >> HARD_FRAME_POINTER_REGNUM + i); >> } >> else if (frame_pointer_needed) >> error ("%s cannot be used in % here", >> reg_names[HARD_FRAME_POINTER_REGNUM + i]); >> else >> df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true); >> >> (ira_setup_eliminable_regset), and handle the global_regs[] case in >> the same way as the else case, i.e. short-circuiting both of the ifs. >> > > Like this? Sorry for the delay. I was testing this in parallel. Bootstrapped & regression-tested on x86_64-linux-gnu. Thanks, Richard >From af4499845d26fe65573b21197a79fd22fd38694e Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Tue, 15 Sep 2020 06:23:26 -0700 Subject: [PATCH] ira: Fix elimination for global hard FPs [PR91957] MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If the hard frame pointer is being used as a global register, we should skip the usual handling for eliminations. As the comment says, the register cannot in that case be eliminated (or eliminated to) and is already marked live where appropriate. Doing this removes the duplicate error for gcc.target/i386/pr82673.c. The “cannot be used in 'asm' here” message is meant to be for asm statements rather than register asms, and the function that the error is reported against doesn't use asm. gcc/ 2020-09-16 Richard Sandiford PR middle-end/91957 * ira.c (ira_setup_eliminable_regset): Skip the special elimination handling of the hard frame pointer if the hard frame pointer is fixed. gcc/testsuite/ 2020-09-16 H.J. Lu Richard Sandiford PR middle-end/91957 * g++.target/i386/pr97054.C: New test. * gcc.target/i386/pr82673.c: Remove redundant extra message. --- gcc/ira.c | 8 ++- gcc/testsuite/g++.target/i386/pr97054.C | 96 + gcc/testsuite/gcc.target/i386/pr82673.c | 2 +- 3 files changed, 103 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/g++.target/i386/pr97054.C diff --git a/gcc/ira.c b/gcc/ira.c index a759f3c2aa8..27d1b3c857d 100644 --- a/gcc/ira.c +++ b/gcc/ira.c @@ -2310,8 +2310,12 @@ ira_setup_eliminable_regset (void) if (!HARD_FRAME_POINTER_IS_FRAME_POINTER) { for (i = 0; i < fp_reg_count; i++) - if (!TEST_HARD_REG_BIT (crtl->asm_clobbers, - HARD_FRAME_POINTER_REGNUM + i)) + if (global_regs[HARD_FRAME_POINTER_REGNUM + i]) + /* Nothing to do: the register is already treated as live +where appropriate, and cannot be eliminated. */ + ; + else if (!TEST_HARD_REG_BIT (crtl->asm_clobbers, +HARD_FRAME_POINTER_REGNUM + i)) { SET_HARD_REG_BIT (eliminable_regset, HARD_FRAME_POINTER_REGNUM + i); diff --git a/gcc/testsuite/g++.target/i386/pr97054.C b/gcc/testsuite/g++.target/i386/pr97054.C new file mode 100644 index 000..d0693af2a42 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr97054.C @@ -0,0 +1,96 @@ +// { dg-do run { target { ! ia32 } } } +// { dg-require-effective-target fstack_protector } +// { dg-options "-O2 -fno-strict-aliasing -msse4.2 -mfpmath=sse -fPIC -fstack-protector-strong -O2" } + +struct p2_icode *ipc; +register int pars asm("r13"); +register struct processor *cur_pro asm("rbp"); +register int a asm("rbx"); +register int c asm("r14"); +typedef long lina_t; +typedef long la_t; +typedef processor processor_t; +typedef p2_icode p2_icode_t; +typedef enum
Re: libbacktrace: Add support for MiniDebugInfo
On Wed, Sep 16, 2020, 4:03 AM Alex Coplan wrote: > Hi Ian, > > On 14/09/2020 14:12, Ian Lance Taylor via Gcc-patches wrote: > > This patch to libbacktrace adds support for MiniDebugInfo, as > > requested in PR 93608. > > This appears to introduce a failure in the libbacktrace testsuite > (observed on both x86 and aarch64): > > ../gcc/libbacktrace/../test-driver: line 107: 7905 Segmentation fault > (core dumped) "$@" > $log_file 2>&1 > FAIL: mtest_minidebug I tested on x86 without seeing anything like this. Can you give me any more details? Thanks. Ian > > MiniDebugInfo stores compressed symbol tables for an executable, where > > the executable is otherwise stripped. It is documented at > > https://fedoraproject.org/wiki/Features/MiniDebugInfo and > > https://sourceware.org/gdb/current/onlinedocs/gdb/MiniDebugInfo.html. > > > > Unfortunately, although debug info processors are already required to > > handle data compressed using zlib aka gzip in order to handle zdebug > > and SHT_COMPRESSED, the MiniDebugInfo implementation choose to instead > > compress the debug information with xz which uses LZMA2. This in > > effect forces all debug info processors to add a second decompression > > algorithm. Note to future developers: do not do this. > > > > This libbacktrace implementation includes a new small LZMA > > decompressor that just decompresses from one buffer to another. It is > > not heavily optimized, and on my laptop runs at about 78% of the speed > > of using liblzma directly. I think this is a decent tradeoff for > > keeping the code reasonably small. > > > > Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed > > to mainline. > > > > Ian > > Thanks, > Alex >
c++: Break out actual instantiation from instantiate_decl
This refactors instantiate_decl, breaking out the actual instantiation work to instantiate_body. That'll allow me to address the OMP UDR issue, but it also means we have slightly neater code in instantiate_decl anyway. gcc/cp/ * pt.c (instantiate_body): New, broken out of .. (instantiate_decl): ... here. Call it. pushing to trunk -- Nathan Sidwell diff --git c/gcc/cp/pt.c w/gcc/cp/pt.c index 1aea105edd5..e8aa950b8fa 100644 --- c/gcc/cp/pt.c +++ w/gcc/cp/pt.c @@ -25447,6 +25447,158 @@ register_parameter_specializations (tree pattern, tree inst) gcc_assert (!spec_parm); } +/* Instantiate the body of D using PATTERN with ARGS. We have + already determined PATTERN is the correct template to use. */ + +static void +instantiate_body (tree pattern, tree args, tree d) +{ + gcc_checking_assert (TREE_CODE (pattern) == TEMPLATE_DECL); + + tree td = pattern; + tree code_pattern = DECL_TEMPLATE_RESULT (td); + + tree fn_context = decl_function_context (d); + if (LAMBDA_FUNCTION_P (d)) +/* tsubst_lambda_expr resolved any references to enclosing functions. */ +fn_context = NULL_TREE; + bool nested = current_function_decl != NULL_TREE; + bool push_to_top = !(nested && fn_context == current_function_decl); + + vec omp_privatization_save; + if (nested) +save_omp_privatization_clauses (omp_privatization_save); + + if (push_to_top) +push_to_top_level (); + else +{ + gcc_assert (!processing_template_decl); + push_function_context (); + cp_unevaluated_operand = 0; + c_inhibit_evaluation_warnings = 0; +} + + if (VAR_P (d)) +{ + /* The variable might be a lambda's extra scope, and that + lambda's visibility depends on D's. */ + maybe_commonize_var (d); + determine_visibility (d); +} + + /* Mark D as instantiated so that recursive calls to + instantiate_decl do not try to instantiate it again. */ + DECL_TEMPLATE_INSTANTIATED (d) = 1; + + /* Regenerate the declaration in case the template has been modified + by a subsequent redeclaration. */ + regenerate_decl_from_template (d, td, args); + + /* We already set the file and line above. Reset them now in case + they changed as a result of calling regenerate_decl_from_template. */ + input_location = DECL_SOURCE_LOCATION (d); + + if (VAR_P (d)) +{ + tree init; + bool const_init = false; + + /* Clear out DECL_RTL; whatever was there before may not be right + since we've reset the type of the declaration. */ + SET_DECL_RTL (d, NULL); + DECL_IN_AGGR_P (d) = 0; + + /* The initializer is placed in DECL_INITIAL by + regenerate_decl_from_template so we don't need to + push/pop_access_scope again here. Pull it out so that + cp_finish_decl can process it. */ + init = DECL_INITIAL (d); + DECL_INITIAL (d) = NULL_TREE; + DECL_INITIALIZED_P (d) = 0; + + /* Clear DECL_EXTERNAL so that cp_finish_decl will process the + initializer. That function will defer actual emission until + we have a chance to determine linkage. */ + DECL_EXTERNAL (d) = 0; + + /* Enter the scope of D so that access-checking works correctly. */ + bool enter_context = DECL_CLASS_SCOPE_P (d); + if (enter_context) +push_nested_class (DECL_CONTEXT (d)); + + const_init = DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (code_pattern); + cp_finish_decl (d, init, const_init, NULL_TREE, 0); + + if (enter_context) +pop_nested_class (); +} + else if (TREE_CODE (d) == FUNCTION_DECL && DECL_DEFAULTED_FN (code_pattern)) +synthesize_method (d); + else if (TREE_CODE (d) == FUNCTION_DECL) +{ + /* Set up the list of local specializations. */ + local_specialization_stack lss (push_to_top ? lss_blank : lss_copy); + tree block = NULL_TREE; + + /* Set up context. */ + if (DECL_OMP_DECLARE_REDUCTION_P (code_pattern) + && TREE_CODE (DECL_CONTEXT (code_pattern)) == FUNCTION_DECL) + block = push_stmt_list (); + else + start_preparsed_function (d, NULL_TREE, SF_PRE_PARSED); + + perform_instantiation_time_access_checks (code_pattern, args); + + /* Create substitution entries for the parameters. */ + register_parameter_specializations (code_pattern, d); + + /* Substitute into the body of the function. */ + if (DECL_OMP_DECLARE_REDUCTION_P (code_pattern)) + tsubst_omp_udr (DECL_SAVED_TREE (code_pattern), args, + tf_warning_or_error, DECL_TI_TEMPLATE (d)); + else + { + tsubst_expr (DECL_SAVED_TREE (code_pattern), args, + tf_warning_or_error, DECL_TI_TEMPLATE (d), + /*integral_constant_expression_p=*/false); + + /* Set the current input_location to the end of the function + so that finish_function knows where we are. */ + input_location + = DECL_STRUCT_FUNCTION (code_pattern)->function_end_locus; + + /* Remember if we saw an infinite loop in the template. */ +
Re: [PATCH, 1/3, OpenMP] Target mapping changes for OpenMP 5.0, front-end parts
Ping this patch set. Thanks, Chung-Lin On 2020/9/1 9:16 PM, Chung-Lin Tang wrote: Hi Jakub, this patch set implements parts of the target mapping changes introduced in OpenMP 5.0, mainly the attachment requirements for pointer-based list items, and the clause ordering. The first patch here are the C/C++ front-end changes. The entire set of changes has been tested for without regressions for the compiler and libgomp. Hope this is ready to commit to master. Thanks, Chung-Lin gcc/c-family/ * c-common.h (c_omp_adjust_clauses): New declaration. * c-omp.c (c_omp_adjust_clauses): New function. gcc/c/ * c-parser.c (c_parser_omp_target_data): Add use of new c_omp_adjust_clauses function. Add GOMP_MAP_ATTACH_DETACH as handled map clause kind. (c_parser_omp_target_enter_data): Likewise. (c_parser_omp_target_exit_data): Likewise. (c_parser_omp_target): Likewise. * c-typeck.c (handle_omp_array_sections): Adjust COMPONENT_REF case to use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. (c_finish_omp_clauses): Adjust bitmap checks to allow struct decl and same struct field access to co-exist on OpenMP construct. gcc/cp/ * parser.c (cp_parser_omp_target_data): Add use of new c_omp_adjust_clauses function. Add GOMP_MAP_ATTACH_DETACH as handled map clause kind. (cp_parser_omp_target_enter_data): Likewise. (cp_parser_omp_target_exit_data): Likewise. (cp_parser_omp_target): Likewise. * semantics.c (handle_omp_array_sections): Adjust COMPONENT_REF case to use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. Fix interaction between reference case and attach/detach. (finish_omp_clauses): Adjust bitmap checks to allow struct decl and same struct field access to co-exist on OpenMP construct.
Re: [Patch] Fortran: OpenMP - fix simd with (last)private (PR97061
On Wed, Sep 16, 2020 at 04:08:16PM +0200, Tobias Burnus wrote: > Using linear (added by the compiler) + explicit lastprivate gives an error. > > To quote Jakub from the PR: > "If it is the FE that adds the linear clause, then it > shouldn't when there is explicit lastprivate clause. > This is a new thing in OpenMP 5.0, in 4.5 it was invalid to > make the iterate anything but linear in this case, now > it can be made private or lastprivate." > > Actually, the check which permitted this was committed > early during GCC 11 development, but the some updates of > trans-openmp.c were missing. > > OK? Yes, thanks. > Fortran: OpenMP - fix simd with (last)private (PR97061) > > gcc/fortran/ChangeLog: > > PR fortran/97061 > * trans-openmp.c (gfc_trans_omp_do): Handle simd with (last)private. > > gcc/testsuite/ChangeLog: > > PR fortran/97061 > * gfortran.dg/gomp/openmp-simd-6.f90: New test. Jakub
[Patch] Fortran: OpenMP - fix simd with (last)private (PR97061
Using linear (added by the compiler) + explicit lastprivate gives an error. To quote Jakub from the PR: "If it is the FE that adds the linear clause, then it shouldn't when there is explicit lastprivate clause. This is a new thing in OpenMP 5.0, in 4.5 it was invalid to make the iterate anything but linear in this case, now it can be made private or lastprivate." Actually, the check which permitted this was committed early during GCC 11 development, but the some updates of trans-openmp.c were missing. OK? Tobias - Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter Fortran: OpenMP - fix simd with (last)private (PR97061) gcc/fortran/ChangeLog: PR fortran/97061 * trans-openmp.c (gfc_trans_omp_do): Handle simd with (last)private. gcc/testsuite/ChangeLog: PR fortran/97061 * gfortran.dg/gomp/openmp-simd-6.f90: New test. gcc/fortran/trans-openmp.c | 37 -- gcc/testsuite/gfortran.dg/gomp/openmp-simd-6.f90 | 62 2 files changed, 85 insertions(+), 14 deletions(-) diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index 9ec0df204ac..378088a9d04 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -4401,20 +4401,29 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock, if (clauses) { gfc_omp_namelist *n = NULL; - if (op != EXEC_OMP_DISTRIBUTE) - for (n = clauses->lists[(op == EXEC_OMP_SIMD && collapse == 1) -? OMP_LIST_LINEAR : OMP_LIST_LASTPRIVATE]; + if (op == EXEC_OMP_SIMD && collapse == 1) + for (n = clauses->lists[OMP_LIST_LINEAR]; n != NULL; n = n->next) if (code->ext.iterator->var->symtree->n.sym == n->sym) - break; - if (n != NULL) - dovar_found = 1; - else if (n == NULL && op != EXEC_OMP_SIMD) + { + dovar_found = 3; + break; + } + if (n == NULL && op != EXEC_OMP_DISTRIBUTE) + for (n = clauses->lists[OMP_LIST_LASTPRIVATE]; + n != NULL; n = n->next) + if (code->ext.iterator->var->symtree->n.sym == n->sym) + { + dovar_found = 2; + break; + } + if (n == NULL) for (n = clauses->lists[OMP_LIST_PRIVATE]; n != NULL; n = n->next) if (code->ext.iterator->var->symtree->n.sym == n->sym) - break; - if (n != NULL) - dovar_found++; + { + dovar_found = 1; + break; + } } /* Evaluate all the expressions in the iterator. */ @@ -4512,7 +4521,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock, if (orig_decls) TREE_VEC_ELT (orig_decls, i) = dovar_decl; - if (dovar_found == 2 + if (dovar_found == 3 && op == EXEC_OMP_SIMD && collapse == 1 && !simple) @@ -4536,7 +4545,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock, omp_clauses = gfc_trans_add_clause (tmp, omp_clauses); } if (!simple) - dovar_found = 2; + dovar_found = 3; } else if (!dovar_found && !simple) { @@ -4544,7 +4553,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock, OMP_CLAUSE_DECL (tmp) = dovar_decl; omp_clauses = gfc_trans_add_clause (tmp, omp_clauses); } - if (dovar_found == 2) + if (dovar_found > 1) { tree c = NULL; @@ -4612,7 +4621,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock, } if (!simple) { - if (op != EXEC_OMP_SIMD) + if (op != EXEC_OMP_SIMD || dovar_found == 1) tmp = build_omp_clause (input_location, OMP_CLAUSE_PRIVATE); else if (collapse == 1) { diff --git a/gcc/testsuite/gfortran.dg/gomp/openmp-simd-6.f90 b/gcc/testsuite/gfortran.dg/gomp/openmp-simd-6.f90 new file mode 100644 index 000..361e0dad343 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/openmp-simd-6.f90 @@ -0,0 +1,62 @@ +! { dg-additional-options "-fdump-tree-original" } +! +! PR fortran/97061 + +integer function f3 (a1, b1, u) + implicit none + integer :: a1, b1, d1 + integer u(0:1023) + !$omp teams distribute parallel do simd default(none) firstprivate (a1, b1) shared(u) lastprivate(d1) + do d1 = a1, b1-1 + u(d1) = 5 + end do +end + +subroutine foo(n, m, u) + implicit none + integer :: hh, ii, jj, n, m + integer u(0:1023) + !$omp simd private(ii) + do ii = n, m +u(ii) = 5 + end do + !$omp simd linear(jj:1) + do jj = 2, m+n +u(jj) = 6 + end do + !$omp simd + do hh = 2, m+n +u(hh) = 6 + end do +end + +subroutine bar(n, m, u) + implicit none + integer :: kkk, lll, ooo, ppp, n, m + integer u(:,:) + !$omp simd lastprivate(kkk) lastprivate(lll) collapse(2) + do kkk = n, m +do lll = n, m + u(kkk, lll) = 5 +end do + end do + !$omp simd private(kkk) private(lll) collapse(2) + do ooo = n, m +do ppp = n, m + u(ooo, ppp) = 5 +end do + end do +end + + +! { dg-final { scan-tree-dump-times "#pragma omp teams
[PATCH, OpenMP 5, C++] Implement implicit mapping of this[:1] (PR92120)
Hi Jakub, this patch implements automatically adding map(tofrom: this[:1]) to omp target regions inside non-static member functions, as specified in OpenMP 5.0. This patch factors away some parts of cp_parser_omp_target, into a new finish_omp_target function, and implements the new clause adding there. For target regions in normal non-static member functions, the case is more simple. For the inside lambda function case, this is implemented by copying the entire __closure as a "to" map first (and yeah, this patch allows target regions inside lambda functions to largely work, but since it's just a copying of __closure, the capture by reference case still shouldn't work yet). __closure->this is then implemented by an always_pointer map clause. I've added two testcases, as both compiler scan testcases and libgomp executable test. Testing of g++ and libgomp both are regression free with nvptx offloading. Is this okay for trunk? Thanks, Chung-Lin 2020-09-16 Chung-Lin Tang PR middle-end/92120 gcc/cp/ * cp-tree.h (finish_omp_target): New declaration. (set_omp_target_this_expr): Likewise. * lambda.c (lambda_expr_this_capture): Add call to set_omp_target_this_expr. * parser.c (cp_parser_omp_target): Factor out code, change to call finish_omp_target, add re-initing call to set_omp_target_this_expr. * semantics.c (omp_target_this_expr): New static variable. (finish_non_static_data_member): Add call to set_omp_target_this_expr. (finish_this_expr): Likewise. (set_omp_target_this_expr): New function to set omp_target_this_expr. (finish_omp_target): New function with code merged from cp_parser_omp_target, plus code to add this and __closure map clauses for OpenMP. gcc/testsuite/ * g++.dg/gomp/target-this-1.C: New testcase. * g++.dg/gomp/target-this-2.C: New testcase. libgomp/testsuite/ * libgomp.c++/target-this-1.C: New testcase. * libgomp.c++/target-this-2.C: New testcase. diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 6e4de7d0c4b..81e72449856 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -7241,6 +7241,11 @@ extern tree finish_omp_structured_block (tree); extern tree finish_oacc_data (tree, tree); extern tree finish_oacc_host_data (tree, tree); extern tree finish_omp_construct (enum tree_code, tree, tree); + +extern tree finish_omp_target (location_t, tree, tree, bool); +extern void set_omp_target_this_expr (tree); + + extern tree begin_omp_parallel (void); extern tree finish_omp_parallel(tree, tree); extern tree begin_omp_task (void); diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c index c94fe8edb8e..aea5f5adc52 100644 --- a/gcc/cp/lambda.c +++ b/gcc/cp/lambda.c @@ -842,6 +842,9 @@ lambda_expr_this_capture (tree lambda, int add_capture_p) type cast (_expr.cast_ 5.4) to the type of 'this'. [ The cast ensures that the transformed expression is an rvalue. ] */ result = rvalue (result); + + /* Acknowledge to OpenMP target that 'this' was referenced. */ + set_omp_target_this_expr (result); } return result; diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index fba3fcc0c4c..46de8e6cb65 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -40742,8 +40742,6 @@ static bool cp_parser_omp_target (cp_parser *parser, cp_token *pragma_tok, enum pragma_context context, bool *if_p) { - tree *pc = NULL, stmt; - if (flag_openmp) omp_requires_mask = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED); @@ -40796,6 +40794,7 @@ cp_parser_omp_target (cp_parser *parser, cp_token *pragma_tok, keep_next_level (true); tree sb = begin_omp_structured_block (), ret; unsigned save = cp_parser_begin_omp_structured_block (parser); + set_omp_target_this_expr (NULL_TREE); switch (ccode) { case OMP_TEAMS: @@ -40847,15 +40846,9 @@ cp_parser_omp_target (cp_parser *parser, cp_token *pragma_tok, cclauses[C_OMP_CLAUSE_SPLIT_TARGET] = tc; } } - tree stmt = make_node (OMP_TARGET); - TREE_TYPE (stmt) = void_type_node; - OMP_TARGET_CLAUSES (stmt) = cclauses[C_OMP_CLAUSE_SPLIT_TARGET]; - OMP_TARGET_BODY (stmt) = body; - OMP_TARGET_COMBINED (stmt) = 1; - SET_EXPR_LOCATION (stmt, pragma_tok->location); - add_stmt (stmt); - pc = _TARGET_CLAUSES (stmt); - goto check_clauses; + finish_omp_target (pragma_tok->location, +cclauses[C_OMP_CLAUSE_SPLIT_TARGET], body, true); + return true; } else if (!flag_openmp) /* flag_openmp_simd */ { @@ -40892,46 +40885,13 @@
[PATCH] Canonicalize real X - CST to X + -CST
This removes the canonicalization of X + -CST to X - CST to facilitate SLP vectorization analysis where we currently treat the added testcase as two-operation plus blend vectorization opportunity. While there may be the possibility to avoid this in the vectorizer itself the canonicalization is somewhat premature - it's motivation is that a FP constant of both signs is common and thus canonicalizing to positive values optimizes constant pool. But if you mix, say, x * -7.1 and x - 7.1 this backfires - this instead looks like a local optimization to be applied in RTL CSE or even LRA load inheritance. I filed PR97071 for this. Bootstrapped and tested on x86_64-unknown-linux-gnu, any objections? Thanks, Richard. 2020-09-16 Richard Biener * fold-const.c (fold_binary_loc): Remove condition on REAL_CST for A - B -> A + (-B) transform. (negate_expr_p): Remove condition on REAL_CST. * match.pd (negate_expr_p): Likewise. (X + -C -> X - C): Remove. * gcc.dg/vect/slp-49.c: New testcase. * gcc.dg/tree-ssa/pr90356-1.c: Adjust. * gcc.dg/tree-ssa/pr90356-3.c: Likewise. * gcc.dg/tree-ssa/pr90356-4.c: Likewise. * gcc.target/i386/pr54855-3.c: Likewise. * gfortran.dg/reassoc_1.f90: Likewise. * gfortran.dg/reassoc_2.f90: Likewise. --- gcc/fold-const.c | 7 ++- gcc/match.pd | 11 +-- gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/pr90356-3.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr90356-4.c | 2 +- gcc/testsuite/gcc.dg/vect/slp-49.c| 15 +++ gcc/testsuite/gcc.target/i386/pr54855-3.c | 2 +- gcc/testsuite/gfortran.dg/reassoc_1.f90 | 2 +- gcc/testsuite/gfortran.dg/reassoc_2.f90 | 2 +- 9 files changed, 25 insertions(+), 22 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/slp-49.c diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 0cc80adf632..ec369169a4c 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -401,7 +401,7 @@ negate_expr_p (tree t) case REAL_CST: /* We want to canonicalize to positive real constants. Pretend that only negative ones can be easily negated. */ - return REAL_VALUE_NEGATIVE (TREE_REAL_CST (t)); + return true; case COMPLEX_CST: return negate_expr_p (TREE_REALPART (t)) @@ -10962,10 +10962,7 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type, /* A - B -> A + (-B) if B is easily negatable. */ if (negate_expr_p (op1) && ! TYPE_OVERFLOW_SANITIZED (type) - && ((FLOAT_TYPE_P (type) - /* Avoid this transformation if B is a positive REAL_CST. */ - && (TREE_CODE (op1) != REAL_CST - || REAL_VALUE_NEGATIVE (TREE_REAL_CST (op1 + && (FLOAT_TYPE_P (type) || INTEGRAL_TYPE_P (type))) return fold_build2_loc (loc, PLUS_EXPR, type, fold_convert_loc (loc, type, arg0), diff --git a/gcc/match.pd b/gcc/match.pd index 7d63bb973cb..9f14d50ae2d 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1307,8 +1307,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (negate @0) (if (!TYPE_OVERFLOW_SANITIZED (type (match negate_expr_p - REAL_CST - (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (t) + REAL_CST) /* VECTOR_CST handling of non-wrapping types would recurse in unsupported ways. */ (match negate_expr_p @@ -3326,14 +3325,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* Canonicalization of binary operations. */ -/* Convert X + -C into X - C. */ -(simplify - (plus @0 REAL_CST@1) - (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1))) - (with { tree tem = const_unop (NEGATE_EXPR, type, @1); } - (if (!TREE_OVERFLOW (tem) || !flag_trapping_math) -(minus @0 { tem; }) - /* Convert x+x into x*2. */ (simplify (plus @0 @0) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c index c3a15ea21af..ac95ec56810 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c @@ -2,8 +2,8 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fno-rounding-math -fsignaling-nans -fsigned-zeros -fdump-tree-optimized" } */ /* { dg-final { scan-tree-dump-times "x_\[0-9]*.D. \\+ 0.0;" 12 "optimized" } } */ -/* { dg-final { scan-tree-dump-times "y_\[0-9]*.D. - 0.0;" 4 "optimized" } } */ -/* { dg-final { scan-tree-dump-times " \[+-] 0.0;" 16 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "y_\[0-9]*.D. \\+ -0.0;" 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times " \\+ -?0.0;" 16 "optimized" } } */ double f1 (double x) { return (x + 0.0) + 0.0; } double f2 (double y) { return (y + (-0.0)) + (-0.0); } diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr90356-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr90356-3.c index e658130c69f..951971b95ee 100644 ---
Re: [PATCH V2] vec: don't select partial vectors when looping on full vectors
Richard Sandiford writes: > OK with two incredibly petty comments fixed: [...] Installed with the two suggestions as 052204fac58. Thanks for reviewing! Andrea
[PATCH] IRA: Don't make a global register eliminable
On Tue, Sep 15, 2020 at 7:44 AM Richard Sandiford wrote: > > Thanks for looking at this. > > "H.J. Lu" writes: > > commit 1bcb4c4faa4bd6b1c917c75b100d618faf9e628c > > Author: Richard Sandiford > > Date: Wed Oct 2 07:37:10 2019 + > > > > [LRA] Don't make eliminable registers live (PR91957) > > > > didn't make eliminable registers live which breaks > > > > register void *cur_pro asm("reg"); > > > > where "reg" is an eliminable register. Make fixed eliminable registers > > live to fix it. > > I don't think fixedness itself is the issue here: it's usual for at > least some registers involved in eliminations to be fixed registers. > > I think what makes this case different is instead that cur_pro/ebp > is a global register. But IMO things have already gone wrong if we > think that a global register is eliminable. > > So I wonder if instead we should check global_regs at the beginning of: > > for (i = 0; i < fp_reg_count; i++) > if (!TEST_HARD_REG_BIT (crtl->asm_clobbers, > HARD_FRAME_POINTER_REGNUM + i)) > { > SET_HARD_REG_BIT (eliminable_regset, > HARD_FRAME_POINTER_REGNUM + i); > if (frame_pointer_needed) > SET_HARD_REG_BIT (ira_no_alloc_regs, > HARD_FRAME_POINTER_REGNUM + i); > } > else if (frame_pointer_needed) > error ("%s cannot be used in % here", > reg_names[HARD_FRAME_POINTER_REGNUM + i]); > else > df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true); > > (ira_setup_eliminable_regset), and handle the global_regs[] case in > the same way as the else case, i.e. short-circuiting both of the ifs. > Like this? Thanks. -- H.J. From d1c852691d245d90666c1b95fc7f8db2b241 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Tue, 15 Sep 2020 04:17:54 -0700 Subject: [PATCH] IRA: Don't make a global register eliminable commit 1bcb4c4faa4bd6b1c917c75b100d618faf9e628c Author: Richard Sandiford Date: Wed Oct 2 07:37:10 2019 + [LRA] Don't make eliminable registers live (PR91957) didn't make eliminable registers live which breaks register void *cur_pro asm("reg"); where "reg" is an eliminable register. Update ira_setup_eliminable_regset to avoid making global registers eliminable. gcc/ PR middle-end/97054 * ira.c (ira_setup_eliminable_regset): Don't make a global register eliminable. gcc/testsuite/ PR middle-end/97054 * g++.target/i386/pr97054.C: New test. --- gcc/ira.c | 30 gcc/testsuite/g++.target/i386/pr97054.C | 96 + 2 files changed, 114 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/g++.target/i386/pr97054.C diff --git a/gcc/ira.c b/gcc/ira.c index a759f3c2aa8..506c6e6783e 100644 --- a/gcc/ira.c +++ b/gcc/ira.c @@ -2309,21 +2309,27 @@ ira_setup_eliminable_regset (void) } if (!HARD_FRAME_POINTER_IS_FRAME_POINTER) { + /* NB: Don't make a global register eliminable. */ for (i = 0; i < fp_reg_count; i++) - if (!TEST_HARD_REG_BIT (crtl->asm_clobbers, -HARD_FRAME_POINTER_REGNUM + i)) + if (global_regs[HARD_FRAME_POINTER_REGNUM + i]) + df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true); + else { - SET_HARD_REG_BIT (eliminable_regset, - HARD_FRAME_POINTER_REGNUM + i); - if (frame_pointer_needed) - SET_HARD_REG_BIT (ira_no_alloc_regs, -HARD_FRAME_POINTER_REGNUM + i); + if (!TEST_HARD_REG_BIT (crtl->asm_clobbers, +HARD_FRAME_POINTER_REGNUM + i)) + { + SET_HARD_REG_BIT (eliminable_regset, + HARD_FRAME_POINTER_REGNUM + i); + if (frame_pointer_needed) + SET_HARD_REG_BIT (ira_no_alloc_regs, +HARD_FRAME_POINTER_REGNUM + i); + } + else if (frame_pointer_needed) + error ("%s cannot be used in % here", + reg_names[HARD_FRAME_POINTER_REGNUM + i]); + else + df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true); } - else if (frame_pointer_needed) - error ("%s cannot be used in % here", - reg_names[HARD_FRAME_POINTER_REGNUM + i]); - else - df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true); } } diff --git a/gcc/testsuite/g++.target/i386/pr97054.C b/gcc/testsuite/g++.target/i386/pr97054.C new file mode 100644 index 000..d0693af2a42 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr97054.C @@ -0,0 +1,96 @@ +// { dg-do run { target { ! ia32 } } } +// { dg-require-effective-target fstack_protector } +// { dg-options "-O2 -fno-strict-aliasing -msse4.2 -mfpmath=sse -fPIC -fstack-protector-strong -O2" } + +struct p2_icode *ipc; +register int pars asm("r13"); +register struct processor *cur_pro asm("rbp"); +register int a asm("rbx"); +register int c asm("r14"); +typedef long lina_t; +typedef long la_t; +typedef processor processor_t; +typedef p2_icode p2_icode_t; +typedef enum { + P2_Return_Action_Next, +}
[PATCH] rs6000: Add rs6000_cfun_pcrel_p
This is a cleanup requested by Segher in a previous review. Most uses of rs6000_pcrel_p are called for the current function. A specialized version for cfun is more efficient for these uses. (Actually all uses are called for the current function, so now rs6000_pcrel_p is an orphan. However, I think it's worth keeping around in case we have a late discovery where we need it.) Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Is this okay for trunk? Thanks, Bill 2020-09-16 Bill Schmidt gcc/ * config/rs6000/predicates.md (current_file_function_operand): Use rs6000_cfun_pcrel_p. * config/rs6000/rs6000-logue.c (rs6000_decl_ok_for_sibcall): Likewise. (rs6000_global_entry_point_prologue_needed_p): Likewise. (rs6000_output_function_prologue): Likewise. * config/rs6000/rs6000-protos.h (rs6000_cfun_pcrel_p): New prototype. * config/rs6000/rs6000.c (rs6000_legitimize_tls_address): Use rs6000_cfun_pcrel_p. (rs6000_indirect_call_template_1): Likewise. (rs6000_longcall_ref): Likewise. (rs6000_call_aix): Likewise. (rs6000_sibcall_aix): Likewise. (rs6000_cfun_pcrel_p): New function. * config/rs6000/rs6000.md (*pltseq_plt_pcrel): Use rs6000_cfun_pcrel_p. (*call_local): Likewise. (*call_value_local): Likewise. (*call_nonlocal_aix): Likewise. (*call_value_nonlocal_aix): Likewise. (*call_indirect_pcrel): Likewise. (*call_value_indirect_pcrel): Likewise. --- gcc/config/rs6000/predicates.md | 2 +- gcc/config/rs6000/rs6000-logue.c | 6 +++--- gcc/config/rs6000/rs6000-protos.h | 1 + gcc/config/rs6000/rs6000.c| 32 --- gcc/config/rs6000/rs6000.md | 14 +++--- 5 files changed, 33 insertions(+), 22 deletions(-) diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 2709e46f7e5..974a045cce0 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -1056,7 +1056,7 @@ (define_predicate "current_file_function_operand" && SYMBOL_REF_DECL (op) != NULL && TREE_CODE (SYMBOL_REF_DECL (op)) == FUNCTION_DECL && (rs6000_fndecl_pcrel_p (SYMBOL_REF_DECL (op)) -!= rs6000_pcrel_p (cfun)))"))) +!= rs6000_cfun_pcrel_p ()))"))) ;; Return 1 if this operand is a valid input for a move insn. (define_predicate "input_operand" diff --git a/gcc/config/rs6000/rs6000-logue.c b/gcc/config/rs6000/rs6000-logue.c index 5a2cb7fdf2c..7534b7f17dd 100644 --- a/gcc/config/rs6000/rs6000-logue.c +++ b/gcc/config/rs6000/rs6000-logue.c @@ -1085,7 +1085,7 @@ rs6000_decl_ok_for_sibcall (tree decl) r2 for its caller's TOC. Such a function may make sibcalls to any function, whether local or external, without restriction based on TOC-save/restore rules. */ - if (rs6000_pcrel_p (cfun)) + if (rs6000_cfun_pcrel_p ()) return true; /* Otherwise, under the AIX or ELFv2 ABIs we can't allow sibcalls @@ -2562,7 +2562,7 @@ rs6000_global_entry_point_prologue_needed_p (void) return false; /* PC-relative functions never generate a global entry point prologue. */ - if (rs6000_pcrel_p (cfun)) + if (rs6000_cfun_pcrel_p ()) return false; /* Ensure we have a global entry point for thunks. ??? We could @@ -3978,7 +3978,7 @@ rs6000_output_function_prologue (FILE *file) fputs ("\n", file); } - else if (rs6000_pcrel_p (cfun)) + else if (rs6000_cfun_pcrel_p ()) { const char *name = XSTR (XEXP (DECL_RTL (current_function_decl), 0), 0); /* All functions compiled to use PC-relative addressing will diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 02e4d71922f..e75f8944d56 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -153,6 +153,7 @@ extern rtx rs6000_allocate_stack_temp (machine_mode, bool, bool); extern align_flags rs6000_loop_align (rtx); extern void rs6000_split_logical (rtx [], enum rtx_code, bool, bool, bool); extern bool rs6000_pcrel_p (struct function *); +extern bool rs6000_cfun_pcrel_p (void); extern bool rs6000_fndecl_pcrel_p (const_tree); /* Different PowerPC instruction formats that are used by GCC. There are diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 6d0c5509b0e..f4b3e417fdd 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -8780,7 +8780,7 @@ rs6000_legitimize_tls_address (rtx addr, enum tls_model model) dest = gen_reg_rtx (Pmode); if (model == TLS_MODEL_LOCAL_EXEC - && (rs6000_tls_size == 16 || rs6000_pcrel_p (cfun))) + && (rs6000_tls_size == 16 || rs6000_cfun_pcrel_p ())) { rtx tlsreg; @@ -8827,7 +8827,7 @@
Re: [PATCH] debug: Pass --gdwarf-N to assembler if fixed gas is detected during configure
On Wed, Sep 16, 2020 at 02:33:03PM +0200, Mark Wielaard wrote: > > 2020-09-15 Jakub Jelinek > > > > * configure.ac (HAVE_AS_GDWARF_5_DEBUG_FLAG, > > HAVE_AS_WORKING_DWARF_4_FLAG): New tests. > > * gcc.c (ASM_DEBUG_DWARF_OPTION): Define. > > (ASM_DEBUG_SPEC): Use ASM_DEBUG_DWARF_OPTION instead of > > "--gdwarf2". Use %{cond:opt1;:opt2} style. > > (ASM_DEBUG_OPTION_DWARF_OPT): Define. > > (ASM_DEBUG_OPTION_SPEC): Define. > > (asm_debug_option): New variable. > > (asm_options): Add "%(asm_debug_option)". > > (static_specs): Add asm_debug_option entry. > > (static_spec_functions): Add dwarf-version-gt. > > (debug_level_greater_than_spec_func): New function. > > * config/darwin.h (ASM_DEBUG_OPTION_SPEC): Define. > > * config/darwin9.h (ASM_DEBUG_OPTION_SPEC): Redefine. > > * config.in: Regenerated. > > * configure: Regenerated. > > I tested against the binutils-2_35-branch which will become 2.35.1 next > week. The configure tests succeed there. So I think you can change the > [elf,2,36,0] to [elf,2,35,1]. No problem with such a change, although it isn't that important, because most of the people don't use in-tree builds and for out of tree builds the actual test rather than version comparison is used instead. Jakub
Re: [PATCH] rs6000: Fix misnamed built-in
On 9/16/20 5:32 AM, Segher Boessenkool wrote: On Tue, Sep 15, 2020 at 08:38:42PM -0500, Bill Schmidt wrote: The description in rs6000-builtin.def provides for a builtin named __builtin_altivec_xst_len_r. However, it is hand-defined in altivec_init_builtins as __builtin_xst_len_r, against the usual naming practice. Fix that. Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions; committed as obvious. Does no one use this? Should we provide an alias with the old (wrong) name? What does the documentation say it should be? It is obvious of course, except for that "might break users' code" thing ;-) Sorry, I should have mentioned this. It's unlikely that people are using the incorrect name, as it is undocumented, and hidden by the supported overloaded interfaces. The only documented interface is "vec_xl_len_r", which remains unchanged (and the internal hookup from that to this is unchanged by the patch). If anyone's using the undocumented incorrect interface, I'd rather that get discovered and fixed. Thanks, Bill Segher
[PATCH] libstdc++: use lt_host_flags for libstdc++.la
For platforms like Mingw and Cygwin, cygwin refuses to generate the shared library without using -no-undefined. Attached patch makes sure the right flags are used, since libtool is already used to link libstdc++. Patch OK? From 4ba039687182fccd67e1170f89e259e1c4a58eeb Mon Sep 17 00:00:00 2001 From: Jonathan Yong <10wa...@gmail.com> Date: Sun, 22 Mar 2020 09:56:58 +0800 Subject: [PATCH 1/1] libstdc++: use lt_host_flags for libstdc++.la Signed-off-by: Jonathan Yong <10wa...@gmail.com> --- libstdc++-v3/src/Makefile.am | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/src/Makefile.am b/libstdc++-v3/src/Makefile.am index a139adc81b3..498f533f3d3 100644 --- a/libstdc++-v3/src/Makefile.am +++ b/libstdc++-v3/src/Makefile.am @@ -107,7 +107,7 @@ libstdc___la_DEPENDENCIES = \ libstdc___la_LDFLAGS = \ -version-info $(libtool_VERSION) ${version_arg} -lm -libstdc___la_LINK = $(CXXLINK) $(libstdc___la_LDFLAGS) +libstdc___la_LINK = $(CXXLINK) $(libstdc___la_LDFLAGS) $(lt_host_flags) # Use special rules for compatibility-ldbl.cc compilation, as we need to # pass -mlong-double-64. -- 2.11.4.GIT signature.asc Description: OpenPGP digital signature
Re: [PATCH] -mno-xsave should imply -mno-avx since -mavx implies -mxsave
> gcc/ChangeLog > > * common/config/i386/i386-common.c > (OPTION_MASK_ISA_AVX_UNSET): Remove OPTION_MASK_ISA_XSAVE_UNSET. > (OPTION_MASK_ISA_XSAVE_UNSET): Add OPTION_MASK_ISA_AVX_UNSET. > > gcc/testsuite/ChangeLog > > * gcc.target/i386/xsave-avx-1.c: New test. OK for mainline and backports. Thanks, Uros.
Re: [PATCH] Increase rtx_cost of sse_to_integer in skylake_cost.
> gcc/ChangeLog > > PR target/96861 > * config/i386/x86-tune-costs.h (skylake_cost): increase rtx > cost of sse_to_integer from 2 to 6. > > gcc/testsuite > > * gcc.target/i386/pr95021-3.c: Add -mtune=generic. OK. Thanks, Uros.
RE: [aarch64] Backport missing NEON intrinsics to GCC8
> -Original Message- > From: Christophe Lyon > Sent: 16 September 2020 13:43 > To: Pop, Sebastian > Cc: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org > Subject: Re: [aarch64] Backport missing NEON intrinsics to GCC8 > > On Wed, 16 Sep 2020 at 07:21, Pop, Sebastian via Gcc-patches > wrote: > > > > Thanks Kyrill for your review. > > > > I committed the patches to the gcc-8 branch: > > > https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=2c55e6caa9432b2c1f081 > cb3aeddd36abec03233 > > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a4004f62d60ada3a20dbf301 > 46ca461047a575cc > > > > and to the gcc-9 branch: > > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=c5aca0333b723d5e2036659 > 3cd01047d105f54e4 > > > > Sebastian > > > > Hi Sebastian, > > The new tests vld1x3 and vld1x4 fail on arm, I am seeing new failures > on gcc-8 and gcc-9 branches > after r8-10451, r8-10452 and r9-8874. > Is that expected/fixed later in the backport series? > (on the release branches, my validations are running for every commit) Hmm, IIRC they're not implemented for arm IIRC so they should be xfailed or skipped on arm. Kyrill > > Thanks, > > Christophe > > > > On 9/15/20, 7:46 AM, "Kyrylo Tkachov" wrote: > > > > Hi Sebastian, > > > > This patch implements missing intrinsics. > > I'm okay with this being applied to the GCC 8 branch as these intrinsics > have been defined in ACLE for a long time. > > It is arguably a bug that they've been missing from GCC8. > > Their implementation is fairly self-contained we haven't had any bugs > reported against these in my recollection. > > > > So ok on the grounds that it's a bug-fix. > > Thanks, > > Kyrill > > > > From: Pop, Sebastian > > Sent: 11 September 2020 20:54 > > To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov > > > Subject: [aarch64] Backport missing NEON intrinsics to GCC8 > > > > Hi, > > > > gcc-8 branch is missing NEON intrinsics for loads and stores. > > Attached patches pass bootstrap and regression testing on Graviton2 > aarch64-linux. > > > > Ok to commit to gcc-8 branch? > > > > Thanks, > > Sebastian > >
Re: [aarch64] Backport missing NEON intrinsics to GCC8
On Wed, 16 Sep 2020 at 07:21, Pop, Sebastian via Gcc-patches wrote: > > Thanks Kyrill for your review. > > I committed the patches to the gcc-8 branch: > https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=2c55e6caa9432b2c1f081cb3aeddd36abec03233 > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a4004f62d60ada3a20dbf30146ca461047a575cc > > and to the gcc-9 branch: > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=c5aca0333b723d5e20366593cd01047d105f54e4 > > Sebastian > Hi Sebastian, The new tests vld1x3 and vld1x4 fail on arm, I am seeing new failures on gcc-8 and gcc-9 branches after r8-10451, r8-10452 and r9-8874. Is that expected/fixed later in the backport series? (on the release branches, my validations are running for every commit) Thanks, Christophe > On 9/15/20, 7:46 AM, "Kyrylo Tkachov" wrote: > > Hi Sebastian, > > This patch implements missing intrinsics. > I'm okay with this being applied to the GCC 8 branch as these intrinsics > have been defined in ACLE for a long time. > It is arguably a bug that they've been missing from GCC8. > Their implementation is fairly self-contained we haven't had any bugs > reported against these in my recollection. > > So ok on the grounds that it's a bug-fix. > Thanks, > Kyrill > > From: Pop, Sebastian > Sent: 11 September 2020 20:54 > To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov > Subject: [aarch64] Backport missing NEON intrinsics to GCC8 > > Hi, > > gcc-8 branch is missing NEON intrinsics for loads and stores. > Attached patches pass bootstrap and regression testing on Graviton2 > aarch64-linux. > > Ok to commit to gcc-8 branch? > > Thanks, > Sebastian >
Re: [PATCH] debug: Pass --gdwarf-N to assembler if fixed gas is detected during configure
Hi, On Tue, 2020-09-15 at 20:40 +0200, Jakub Jelinek wrote: > Ok, here it is in patch form. > I've briefly tested it, with the older binutils I have around (no --gdwarf-N > support), with latest gas (--gdwarf-N that can be passed to as even when > compiling C/C++ etc. code and emitting .debug_line) and latest gas with > Mark's fix > reverted (--gdwarf-N support, but can only pass it to as when assembling > user .s/.S files, not when compiling C/C++ etc.). > Will bootstrap/regtest (with the older binutils) later tonight. > > 2020-09-15 Jakub Jelinek > > * configure.ac (HAVE_AS_GDWARF_5_DEBUG_FLAG, > HAVE_AS_WORKING_DWARF_4_FLAG): New tests. > * gcc.c (ASM_DEBUG_DWARF_OPTION): Define. > (ASM_DEBUG_SPEC): Use ASM_DEBUG_DWARF_OPTION instead of > "--gdwarf2". Use %{cond:opt1;:opt2} style. > (ASM_DEBUG_OPTION_DWARF_OPT): Define. > (ASM_DEBUG_OPTION_SPEC): Define. > (asm_debug_option): New variable. > (asm_options): Add "%(asm_debug_option)". > (static_specs): Add asm_debug_option entry. > (static_spec_functions): Add dwarf-version-gt. > (debug_level_greater_than_spec_func): New function. > * config/darwin.h (ASM_DEBUG_OPTION_SPEC): Define. > * config/darwin9.h (ASM_DEBUG_OPTION_SPEC): Redefine. > * config.in: Regenerated. > * configure: Regenerated. I tested against the binutils-2_35-branch which will become 2.35.1 next week. The configure tests succeed there. So I think you can change the [elf,2,36,0] to [elf,2,35,1]. Also tested that they fail with an older binutils (2.32). Cheers, Mark
Re: [PATCH PR93334][RFC]Skip output dep if values stored are byte wise the same
On Tue, Sep 15, 2020 at 2:28 PM bin.cheng via Gcc-patches wrote: > > Hi, > As suggested by PR93334 comments, this patch adds an interface identifying > output dependence which can be skipped in terms of reordering and skip it in > loop distribution. It also adds a new test case. Any comment? Looks good to me. Thanks, Richard. > Thanks, > bin
Re: [PATCH] Fix testcases to avoid plusminus-with-convert pattern (PR 97066)
On Wed, Sep 16, 2020 at 10:54 AM Feng Xue OS via Gcc-patches wrote: > > With the new pattern rule (T)(A) +- (T)(B) -> (T)(A +- B), > some testcases are simplified and could not keep expected > code pattern as test-check. Minor changes are made to those > cases to avoid simplification effect of the rule. > > Tested on x86_64-linux and aarch64-linux. OK. > Feng > --- > 2020-09-16 Feng Xue > > gcc/testsuite/ > PR testsuite/97066 > * gcc.dg/ifcvt-3.c: Modified to suppress simplification. > * gcc.dg/tree-ssa/20030807-10.c: Likewise.
Re: [PATCH] libstdc++: use a link test to test for -Wl,-z,relro
On 9/13/20 3:37 PM, JonY wrote: > On 9/10/20 2:23 PM, JonY wrote: >> Do a link test instead of just a grep. The linker can >> support multiple targets, but not all targets can use it. >> >> Cygwin/MinGW ld can support ELF but the PE format for Windows itself >> does not support such a feature. Attached patch OK? >> >> I'm not confident with regenerating configure due to some unrelated >> changes added, can someone else help with that? >> > > Ping? > Ping 2? signature.asc Description: OpenPGP digital signature
Re: [PATCH] IBM Z: Fix *vec_tf_to_v1tf constraints
On 03.09.20 08:39, Ilya Leoshkevich wrote: > Bootstrapped (with BOOT_CFLAGS='-g -O2 -Wno-error=maybe-uninitialized') > and regtested on s390x-redhat-linux. Ok for master? > > > > Certain alternatives of *vec_tf_to_v1tf use "v" constraint for its > TFmode source operand. Therefore it is assigned to VEC_REGS class, and > when it is reloaded using *movtf_64, whose relevant alternatives need > FP_REGS, LRA loops and ICE happens. The reason is that register class > mismatch causes LRA to emit another reload, which triggers this issue > again. > > Fix by using "f" constraint, which is more appropriate for FP register > pairs anyway. > > gcc/ChangeLog: > > 2020-09-02 Ilya Leoshkevich > > * config/s390/vector.md(*vec_tf_to_v1tf): Use "f" instead of "v" > for the source operand. Ok. Thanks! Andreas > --- > gcc/config/s390/vector.md | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md > index 131bbda09bc..2573b7d980a 100644 > --- a/gcc/config/s390/vector.md > +++ b/gcc/config/s390/vector.md > @@ -567,7 +567,7 @@ (define_insn "*vec_splats_bswap_elem" > ; single vector register. > (define_insn "*vec_tf_to_v1tf" >[(set (match_operand:V1TF 0 "nonimmediate_operand" > "=v,v,R,v,v") > - (vec_duplicate:V1TF (match_operand:TF 1 "general_operand" > "v,R,v,G,d")))] > + (vec_duplicate:V1TF (match_operand:TF 1 "general_operand" > "f,R,f,G,d")))] >"TARGET_VX" >"@ > vmrhg\t%v0,%1,%N1 >
Re: openmp question
On Wed, Sep 16, 2020 at 08:00:01AM -0400, Nathan Sidwell wrote: > is libgomp/testsuite/libgomp.c++/udr-3.C well formed? > > Specifically V::baz? > > If you remove the templatedness and plug in typedefs for T & D on its only > instantiation (as the attached does), you get the following errors: > > devvm293:414>g++ -std=c++11 -c -fopenmp udr-3.C > udr-3.C: In member function 'void V::baz()': > udr-3.C:112:27: error: no matching function for call to 'W::W(int)' > 112 | initializer (omp_priv (12)) > | ^ I guess you're right that W::W(int) should be defined. And the question is why it isn't diagnosed during instantiation, sure. The initialized (omp_priv (12)) etc. clause means that for the type(s) for which the declare reduction is defined, when omp-low.c privatizes the variable, it should be done as if the source contained W var (12); This is done by attaching code to call the right ctors/dtors to the clauses and omp-low.c emitting those sequences. Jakub
openmp question
Jakub, is libgomp/testsuite/libgomp.c++/udr-3.C well formed? Specifically V::baz? If you remove the templatedness and plug in typedefs for T & D on its only instantiation (as the attached does), you get the following errors: devvm293:414>g++ -std=c++11 -c -fopenmp udr-3.C udr-3.C: In member function 'void V::baz()': udr-3.C:112:27: error: no matching function for call to 'W::W(int)' 112 | initializer (omp_priv (12)) | ^ udr-3.C:88:3: note: candidate: 'W::W()' 88 | W () : v (6) {} | ^ udr-3.C:88:3: note: candidate expects 0 arguments, 1 provided udr-3.C:85:8: note: candidate: 'constexpr W::W(const W&)' 85 | struct W |^ udr-3.C:85:8: note: no known conversion for argument 1 from 'int' to 'const W&' udr-3.C:115:26: error: no matching function for call to 'W::W(int)' 115 | initializer (omp_priv (9)) | ^ udr-3.C:88:3: note: candidate: 'W::W()' 88 | W () : v (6) {} | ^ udr-3.C:88:3: note: candidate expects 0 arguments, 1 provided udr-3.C:85:8: note: candidate: 'constexpr W::W(const W&)' 85 | struct W |^ udr-3.C:85:8: note: no known conversion for argument 1 from 'int' to 'const W&' It would seem to me you should get the same missing ctor errors when it's an instantiated template. And such errors are what I observe with my WIP dealing with my recent breakage of that test. (adding the missing ctor makes the test pass) Also, it seems cp_check_omp_declare_reduction (d) doesn;t set the reduction to a suitably formed error_mark_node on failure-- so downstream consumers can see the rejected cases. Is there a reason for that? nathan -- Nathan Sidwell // { dg-do run } extern "C" void abort (); void dblinit (double *p) { *p = 2.0; } namespace NS { template struct U { void foo (U &, bool); U (); }; template struct S { int s; #pragma omp declare reduction (foo : U<0>, S : omp_out.foo (omp_in, false)) #pragma omp declare reduction (foo : int : omp_out += omp_in) \ initializer (omp_priv = N + 2) #pragma omp declare reduction (foo : double : omp_out += omp_in) \ initializer (dblinit (_priv)) void baz (int v) { S s; int q = 0; if (s.s != 6 || v != 0) abort (); s.s = 20; double d = 4.0; #pragma omp parallel num_threads (4) reduction (foo : s, v, d) \ reduction (::NS::U::operator + : q) { if (s.s != 6 || q != 0 || v != N + 2 || d != 2.0) abort (); asm volatile ("" : "+m" (s.s), "+r" (q), "+r" (v)); s.s++; q++; v++; } if (s.s != 20 + q * 7 || (N + 3) * q != v || d != 4.0 + 2.0 * q) abort (); } void foo (S ) { s += x.s; } void foo (S , bool y) { s += x.s; if (y) abort (); } S (const S ) { s = x.s + 1; } S (const S , bool y) { s = x.s + 2; if (y) abort (); } S () { s = 6; } S (int x) { s = x; } ~S (); }; #pragma omp declare reduction (bar : S<1> : omp_out.foo (omp_in)) \ initializer (omp_priv (8)) } template NS::S::~S () { if (s < 6) abort (); s = -1; /* Ensure the above store is not DSEd. */ asm volatile ("" : : "r" () : "memory"); } template struct T : public NS::S { void baz () { NS::S s; int q = 0; if (s.s != 6) abort (); #pragma omp parallel num_threads (4) reduction (foo:s) \ reduction (+: q) { if (s.s != 6 || q != 0) abort (); asm volatile ("" : "+m" (s.s), "+r" (q)); s.s += 2; q++; } if (s.s != 6 + q * 8) abort (); } }; struct W { int v; W () : v (6) {} // ADD to fix: W (int i) : v (i) {} ~W () {} }; // Remove templatedness // template struct V { using T =NS::S<0>; // Types instantiated oer using D= double; // Likewise #pragma omp declare reduction (baz: T: omp_out.s += omp_in.s) \ initializer (omp_priv (11)) #pragma omp declare reduction (baz: D: omp_out += omp_in) \ initializer (dblinit (_priv)) static void dblinit (D *x) { *x = 3.0; } void baz () { T t; V v; int q = 0; D d = 4.0; if (t.s != 6 || v.v != 4) abort (); #pragma omp declare reduction (+ : V, W : omp_out.v -= omp_in.v) \ initializer (omp_priv (12)) { #pragma omp declare reduction (+ : W, V : omp_out.v += omp_in.v) \ initializer (omp_priv (9)) #pragma omp parallel num_threads (4) reduction (+: v, q) \ reduction (baz: t, d) { if (t.s != 11 || v.v != 9 || q != 0 || d != 3.0) abort (); asm volatile ("" : "+m" (t.s), "+m" (v.v), "+r" (q)); t.s += 2; v.v += 3; q++; } if (t.s != 6 + 13 * q || v.v != 4 + 12 * q || d != 4.0 + 3.0 * q) abort (); } } int v; V () : v (4) {} V (int x) : v (x) {} ~V () {} }; int main () { NS::S<0> u; u.baz (0); T<2> t; t.baz (); NS::S<1> s; int q = 0; if (s.s != 6) abort (); // Test ADL #pragma omp parallel num_threads (4) reduction (bar:s) reduction (+:q) { if (s.s != 8 || q != 0) abort (); asm volatile ("" : "+m" (s.s), "+r" (q));
Re: [PATCH] Ignore the clobbered stack pointer in asm statment
On Wed, Sep 16, 2020 at 12:34:50PM +0100, Richard Sandiford wrote: > Jakub Jelinek via Gcc-patches writes: > > On Mon, Sep 14, 2020 at 08:57:18AM -0700, H.J. Lu via Gcc-patches wrote: > >> Something like this for GCC 8 and 9. > > > > Guess my preference would be to do this everywhere and then let's discuss if > > we change the warning into error there or keep it being deprecated. > > Agreed FWIW. On turning it into an error: I think it might be better > to wait a bit longer if we can. Ok. The patch is ok for trunk and affected release branches after a week. > > Though, let's see what others want to say about this. > > > >> Add sp_is_clobbered_by_asm to rtl_data to inform backends that the stack > >> pointer is clobbered by asm statement. > >> > >> gcc/ > >> > >>PR target/97032 > >>* cfgexpand.c (asm_clobber_reg_kind): Set sp_is_clobbered_by_asm > >>to true if the stack pointer is clobbered by asm statement. > >>* emit-rtl.h (rtl_data): Add sp_is_clobbered_by_asm. > >>* config/i386/i386.c (ix86_get_drap_rtx): Set need_drap to true > >>if the stack pointer is clobbered by asm statement. > >> > >> gcc/testsuite/ > >> > >>PR target/97032 > >>* gcc.target/i386/pr97032.c: New test. Jakub
[PATCH] C-SKY: Add -msim option
gcc/ChangeLog: * config/csky/csky.opt (msim): New. * doc/invoke.texi (C-SKY Options): Document -msim. * config/csky/csky-elf.h (LIB_SPEC): Add simulator runtime. --- gcc/config/csky/csky-elf.h | 10 -- gcc/config/csky/csky.opt | 4 gcc/doc/invoke.texi| 7 ++- 3 files changed, 18 insertions(+), 3 deletions(-) diff --git a/gcc/config/csky/csky-elf.h b/gcc/config/csky/csky-elf.h index 15a0e73..a79d757 100644 --- a/gcc/config/csky/csky-elf.h +++ b/gcc/config/csky/csky-elf.h @@ -70,8 +70,14 @@ %{EL:-EL} -X" #undef LIB_SPEC -#define LIB_SPEC \ - "%{pthread:-lpthread} -lc %{mccrt:-lcc-rt}" +#define LIB_SPEC "\ +%{pthread:-lpthread} \ +--start-group \ +-lc \ +%{msim:-lsemi}%{!msim:-lnosys} \ +--end-group \ +%{mccrt:-lcc-rt} \ +" /* FIXME add this to LIB_SPEC when need */ /* %{!shared:%{profile:-lc_p}%{!profile:-lc}}" */ diff --git a/gcc/config/csky/csky.opt b/gcc/config/csky/csky.opt index 60b51e5..505a764 100644 --- a/gcc/config/csky/csky.opt +++ b/gcc/config/csky/csky.opt @@ -192,3 +192,7 @@ Set the branch costs to roughly the specified number of instructions. msched-prolog Target Report Var(flag_sched_prolog) Init(0) Permit scheduling of function prologue and epilogue sequences. + +msim +Target +Use the simulator runtime. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 6d9ff2c..9176c83 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -829,7 +829,7 @@ Objective-C and Objective-C++ Dialects}. -mdsp -medsp -mvdsp @gol -mdiv -msmart -mhigh-registers -manchor @gol -mpushpop -mmultiple-stld -mconstpool -mstack-size -mccrt @gol --mbranch-cost=@var{n} -mcse-cc -msched-prolog} +-mbranch-cost=@var{n} -mcse-cc -msched-prolog -msim} @emph{Darwin Options} @gccoptlist{-all_load -allowable_client -arch -arch_errors_fatal @gol @@ -20835,6 +20835,11 @@ this option can result in code that is not compliant with the C-SKY V2 ABI prologue requirements and that cannot be debugged or backtraced. It is disabled by default. +@item -msim +@opindex msim +Links the library libsemi.a which is in compatible with simulator. Applicable +to ELF compiler only. + @end table @node Darwin Options -- 1.9.1
Re: [PATCH] aarch64: Fix ICE on fpsr fpcr getters [PR96968]
Andrea Corallo writes: > @@ -2034,6 +2034,16 @@ aarch64_expand_fpsr_fpcr_setter (int unspec, > machine_mode mode, tree exp) >emit_insn (gen_aarch64_set (unspec, mode, op)); > } > > +/* Expand a fpsr or fpcr getter (depending on UNSPEC) using MODE. > + Return the target. */ > +static rtx > +aarch64_expand_fpsr_fpcr_getter (int unspec, machine_mode mode) > +{ > + rtx target = gen_reg_rtx (mode); > + emit_insn (gen_aarch64_get (unspec, mode, target)); > + return target; > +} I agree this is functionally correct, but if a valid target has been given to the caller, it's generally better to use it. So IMO it would be better to use the expand_insn machinery, passing the original target to create_output_operand. Thanks, Richard > + > /* Expand an expression EXP that calls built-in function FCODE, > with result going to TARGET if that's convenient. IGNORE is true > if the result of the builtin is ignored. */ > @@ -2048,26 +2058,22 @@ aarch64_general_expand_builtin (unsigned int fcode, > tree exp, rtx target, >switch (fcode) > { > case AARCH64_BUILTIN_GET_FPCR: > - emit_insn (gen_aarch64_get (UNSPECV_GET_FPCR, SImode, target)); > - return target; > + return aarch64_expand_fpsr_fpcr_getter (UNSPECV_GET_FPCR, SImode); > case AARCH64_BUILTIN_SET_FPCR: >aarch64_expand_fpsr_fpcr_setter (UNSPECV_SET_FPCR, SImode, exp); >return target; > case AARCH64_BUILTIN_GET_FPSR: > - emit_insn (gen_aarch64_get (UNSPECV_GET_FPSR, SImode, target)); > - return target; > + return aarch64_expand_fpsr_fpcr_getter (UNSPECV_GET_FPSR, SImode); > case AARCH64_BUILTIN_SET_FPSR: >aarch64_expand_fpsr_fpcr_setter (UNSPECV_SET_FPSR, SImode, exp); >return target; > case AARCH64_BUILTIN_GET_FPCR64: > - emit_insn (gen_aarch64_get (UNSPECV_GET_FPCR, DImode, target)); > - return target; > + return aarch64_expand_fpsr_fpcr_getter (UNSPECV_GET_FPCR, DImode); > case AARCH64_BUILTIN_SET_FPCR64: >aarch64_expand_fpsr_fpcr_setter (UNSPECV_SET_FPCR, DImode, exp); >return target; > case AARCH64_BUILTIN_GET_FPSR64: > - emit_insn (gen_aarch64_get (UNSPECV_GET_FPSR, DImode, target)); > - return target; > + return aarch64_expand_fpsr_fpcr_getter (UNSPECV_GET_FPSR, DImode); > case AARCH64_BUILTIN_SET_FPSR64: >aarch64_expand_fpsr_fpcr_setter (UNSPECV_SET_FPSR, DImode, exp); >return target; > diff --git a/gcc/testsuite/gcc.target/aarch64/pr96968.c > b/gcc/testsuite/gcc.target/aarch64/pr96968.c > new file mode 100644 > index 000..21ffd955153 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr96968.c > @@ -0,0 +1,28 @@ > +/* { dg-options "-O1" } */ > + > +void > +fpsr_getter (void) > +{ > + unsigned int fpsr = __builtin_aarch64_get_fpsr (); > +} > + > +void > +fpsr64_getter (void) > +{ > + unsigned long fpsr = __builtin_aarch64_get_fpsr64 (); > +} > + > +void > +fpcr_getter (void) > +{ > + unsigned int fpcr = __builtin_aarch64_get_fpcr (); > +} > + > +void > +fpcr64_getter (void) > +{ > + unsigned long fpcr = __builtin_aarch64_get_fpcr64 (); > +} > + > +/* { dg-final { scan-assembler-times {\tmrs\tx0, fpsr\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tmrs\tx0, fpcr\n} 2 } } */
Re: [PATCH 3/3] C-SKY: Refine target name for elf target test
All of the three pathes have been pushed to trunk. Thanks, Cooper On 9/16/20 6:34 PM, Jojo R wrote: gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_profiling_available): Refine name of elf target. --- gcc/testsuite/lib/target-supports.exp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 6881b66..60f76db 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -698,7 +698,7 @@ proc check_profiling_available { test_what } { || [istarget avr-*-*] || [istarget bfin-*-*] || [istarget cris-*-*] -|| [istarget csky-*-elf] +|| [istarget csky-*-elf*] || [istarget fido-*-elf] || [istarget h8300-*-*] || [istarget lm32-*-*]
Re: [PATCH] Ignore the clobbered stack pointer in asm statment
Jakub Jelinek via Gcc-patches writes: > On Mon, Sep 14, 2020 at 08:57:18AM -0700, H.J. Lu via Gcc-patches wrote: >> Something like this for GCC 8 and 9. > > Guess my preference would be to do this everywhere and then let's discuss if > we change the warning into error there or keep it being deprecated. Agreed FWIW. On turning it into an error: I think it might be better to wait a bit longer if we can. Richard > > Though, let's see what others want to say about this. > >> Add sp_is_clobbered_by_asm to rtl_data to inform backends that the stack >> pointer is clobbered by asm statement. >> >> gcc/ >> >> PR target/97032 >> * cfgexpand.c (asm_clobber_reg_kind): Set sp_is_clobbered_by_asm >> to true if the stack pointer is clobbered by asm statement. >> * emit-rtl.h (rtl_data): Add sp_is_clobbered_by_asm. >> * config/i386/i386.c (ix86_get_drap_rtx): Set need_drap to true >> if the stack pointer is clobbered by asm statement. >> >> gcc/testsuite/ >> >> PR target/97032 >> * gcc.target/i386/pr97032.c: New test. > > Jakub
Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB with SIMT LANE [PR95654]
Can you supply a tar with tree dumps for me to look at please? Also, if you can check if the problem can be triggered without a collapsed loop (e.g. try removing collapse(2), remove mentions of d2) and if so supply dumps from that instead, I'd appreciate that too. Alexander On Wed, 16 Sep 2020, Tom de Vries wrote: > [ cc-ing author omp support for nvptx. ] > > On 9/16/20 12:39 PM, Tobias Burnus wrote: > > Hi Tom, hi Richard, hello all, > > > > @Richard: does it look okay from the ME side? > > @Tom: Can you check which IFN_GOMP_SIMT should be > > excluded with -ftracer? > > > > Pre-remark, I do not know much about SIMT – except that they > > only appear with nvptx and somehow relate to lanes on the > > GPU. > > > > In any case, as the testcase libgomp.fortran/pr66199-5.f90 shows, > > if a basic block with GOMP_SIMT_VOTE_ANY in it is duplicated, > > which happens via -ftracer for this testcase, the result is wrong. > > > > The testcase ignores all the loop work but via "lastprivate" takes > > the upper loop bound (as assigned to the loop indices); instead of > > the expected 32*32 = 1024, either some number (like 4 or very large > > or negative) is returned. > > > > While GOMP_SIMT_VOTE_ANY fixes the issue for this testcase, I > > have the feeling that at least GOMP_SIMT_LAST_LANE should be > > not copied - but I might be wrong. > > > > Tom: Do you think GOMP_SIMT_LAST_LANE should be removed from > > that list – or any of the following added as well? > > GOMP_USE_SIMT, GOMP_SIMT_ENTER, GOMP_SIMT_ENTER_ALLOC, GOMP_SIMT_EXIT, > > GOMP_SIMT_VF, GOMP_SIMT_ORDERED_PRED, GOMP_SIMT_XCHG_BFLY, > > GOMP_SIMT_XCHG_IDX > > > > OK for mainline? > > > > Tobias > > > > - > > Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / > > Germany > > Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, > > Alexander Walter >
Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]
On Wed, Sep 16, 2020 at 10:31:54AM +0200, Richard Biener wrote: > On Tue, Sep 15, 2020 at 6:18 PM Segher Boessenkool > wrote: > > > > On Tue, Sep 15, 2020 at 08:51:09AM +0200, Richard Biener wrote: > > > On Tue, Sep 15, 2020 at 5:56 AM luoxhu wrote: > > > > > u[n % 4] = i; > > > > > > > > > > I guess. Is the % 4 mandated by the vec_insert semantics btw? > > > > (As an aside -- please use "& 3" instead: that works fine if n is signed > > as well, but modulo doesn't. Maybe that is in the patch already, I > > didn't check, sorry.) > > > > > note this is why I asked about the actual CPU instruction - as I read > > > Seghers mail > > > the instruction modifies a vector register, not memory. > > > > But note that the builtin is not the same as the machine instruction -- > > here there shouldn't be a difference if compiling for a new enough ISA, > > but the builtin is available on anything with at least AltiVec. > > Well, given we're trying to improve instruction selection that very much > should depend on the ISA. Thus if the target says it cannot vec_set > to a register in a variable position then we don't want to pretend it can. Of course :-) We shouldn't look at what the machine insn can do, but also not at what the source level constructs can; we should just ask the target code itself. Segher
Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB with SIMT LANE [PR95654]
[ cc-ing author omp support for nvptx. ] On 9/16/20 12:39 PM, Tobias Burnus wrote: > Hi Tom, hi Richard, hello all, > > @Richard: does it look okay from the ME side? > @Tom: Can you check which IFN_GOMP_SIMT should be > excluded with -ftracer? > > Pre-remark, I do not know much about SIMT – except that they > only appear with nvptx and somehow relate to lanes on the > GPU. > > In any case, as the testcase libgomp.fortran/pr66199-5.f90 shows, > if a basic block with GOMP_SIMT_VOTE_ANY in it is duplicated, > which happens via -ftracer for this testcase, the result is wrong. > > The testcase ignores all the loop work but via "lastprivate" takes > the upper loop bound (as assigned to the loop indices); instead of > the expected 32*32 = 1024, either some number (like 4 or very large > or negative) is returned. > > While GOMP_SIMT_VOTE_ANY fixes the issue for this testcase, I > have the feeling that at least GOMP_SIMT_LAST_LANE should be > not copied - but I might be wrong. > > Tom: Do you think GOMP_SIMT_LAST_LANE should be removed from > that list – or any of the following added as well? > GOMP_USE_SIMT, GOMP_SIMT_ENTER, GOMP_SIMT_ENTER_ALLOC, GOMP_SIMT_EXIT, > GOMP_SIMT_VF, GOMP_SIMT_ORDERED_PRED, GOMP_SIMT_XCHG_BFLY, > GOMP_SIMT_XCHG_IDX > > OK for mainline? > > Tobias > > - > Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / > Germany > Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, > Alexander Walter