PING^2 [PATCH] x86: Add cmpmemsi for -minline-all-stringops

2020-09-16 Thread H.J. Lu via Gcc-patches
On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu  wrote:
>
> On Tue, May 19, 2020 at 5:14 AM H.J. Lu  wrote:
> >
> > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak  wrote:
> > >
> > > On Sun, May 17, 2020 at 7:06 PM H.J. Lu  wrote:
> > > >
> > > > Duplicate the cmpstrn pattern for cmpmem.  The only difference is that
> > > > the length argument of cmpmem is guaranteed to be less than or equal to
> > > > lengths of 2 memory areas.  Since "repz cmpsb" can be much slower than
> > > > memcmp function implemented with vector instruction, see
> > > >
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> > > >
> > > > expand cmpmem to "repz cmpsb" only with -mgeneral-regs-only.
> > >
> > > If there is no benefit compared to the library implementation, then
> > > enable these patterns only when -minline-all-stringops is used.
> >
> > Fixed.
> >
> > > Eventually these should be reimplemented with SSE4 string instructions.
> > >
> > > Honza is the author of the block handling x86 system, I'll leave the
> > > review to him.
> >
> > We used to expand memcmp to "repz cmpsb" via cmpstrnsi.  It was changed
> > by
> >
> > commit 9b0f6f5e511ca512e4faeabc81d2fd3abad9b02f
> > Author: Nick Clifton 
> > Date:   Fri Aug 12 16:26:11 2011 +
> >
> > builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi pattern.
> >
> > * builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi
> > pattern.
> > * doc/md.texi (cmpstrn): Note that the comparison stops if both
> > fetched bytes are zero.
> > (cmpstr): Likewise.
> > (cmpmem): Note that the comparison does not stop if both of the
> > fetched bytes are zero.
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95151
> >
> > is a regression.
> >
> > Honza, can you take a look at this?
> >
>
> PING:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546921.html
>

PING.

-- 
H.J.


PING^2 [PATCH] x86: Inline strncmp only with -minline-all-stringops

2020-09-16 Thread H.J. Lu via Gcc-patches
On Wed, Aug 19, 2020 at 6:25 AM H.J. Lu  wrote:
>
> On Wed, Jul 15, 2020 at 10:42:27AM -0700, H.J. Lu wrote:
> > Expand strncmp to "repz cmpsb" only with -minline-all-stringops since
> > "repz cmpsb" can be much slower than strncmp function implemented with
> > vector instructions, see
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> >
> > gcc/
> >
> >   PR target/95458
> >   * config/i386/i386-expand.c (ix86_expand_cmpstrn_or_cmpmem):
> >   Return false for -mno-inline-all-stringops.
> >
> > gcc/testsuite/
> >
> >   PR target/95458
> >   * gcc.target/i386/pr95458-1.c: New test.
> >   * gcc.target/i386/pr95458-2.c: Likewise.
>
> Expand strncmp to "repz cmpsb" only with -minline-all-stringops since
> "repz cmpsb" can be much slower than strncmp function implemented with
> vector instructions, see
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
>
> gcc/
>
> PR target/95458
> * config/i386/i386.md (cmpstrnsi): Expand only with
> TARGET_INLINE_ALL_STRINGOPS.
>
> gcc/testsuite/
>
> PR target/95458
> * gcc.target/i386/pr95458-1.c: New test.
> * gcc.target/i386/pr95458-2.c: Likewise.
> ---
>  gcc/config/i386/i386.md   |  8 +++-
>  gcc/testsuite/gcc.target/i386/pr95458-1.c | 11 +++
>  gcc/testsuite/gcc.target/i386/pr95458-2.c |  7 +++
>  3 files changed, 25 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr95458-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr95458-2.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index fb677e17817..f3fbed81c4a 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -18007,7 +18007,13 @@ (define_expand "cmpstrnsi"
>  {
>rtx addr1, addr2, countreg, align, out;
>
> -  if (optimize_insn_for_size_p () && !TARGET_INLINE_ALL_STRINGOPS)
> +  /* Expand strncmp only with -minline-all-stringops since
> + "repz cmpsb" can be much slower than strncmp functions
> + implemented with vector instructions, see
> +
> + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> +   */
> +  if (!TARGET_INLINE_ALL_STRINGOPS)
>  FAIL;
>
>/* Can't use this if the user has appropriated ecx, esi or edi.  */
> diff --git a/gcc/testsuite/gcc.target/i386/pr95458-1.c 
> b/gcc/testsuite/gcc.target/i386/pr95458-1.c
> new file mode 100644
> index 000..231a4787dce
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr95458-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -minline-all-stringops" } */
> +
> +int
> +func (char *d, unsigned int l)
> +{
> +  return __builtin_strncmp (d, "foo", l) ? 1 : 2;
> +}
> +
> +/* { dg-final { scan-assembler-not "call\[\\t \]*_?strncmp" } } */
> +/* { dg-final { scan-assembler "cmpsb" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr95458-2.c 
> b/gcc/testsuite/gcc.target/i386/pr95458-2.c
> new file mode 100644
> index 000..1a620444770
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr95458-2.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-inline-all-stringops" } */
> +
> +#include "pr95458-1.c"
> +
> +/* { dg-final { scan-assembler "call\[\\t \]*_?strncmp" } } */
> +/* { dg-final { scan-assembler-not "cmpsb" } } */
> --
> 2.26.2
>

PING.

-- 
H.J.


Re: [PATCH] fixincludes/fixfixes.c: Fix 'set but not used' warning.

2020-09-16 Thread Jeff Law via Gcc-patches

On 9/11/20 6:21 AM, Christophe Lyon via Gcc-patches wrote:
> pz_tmp_base and pz_tmp_dot are always set, but used only when
> _PC_NAME_MAX is defined.
>
> This patch moves their declaration and definition undef #ifdef
> _PC_NAME_MAX to avoid this warning.
>
> 2020-09-11  Torbjörn SVENSSON  
>   Christophe Lyon  
>
>   fixincludes/
>   * fixfixes.c (pz_tmp_base, pz_tmp_dot): Define only with
>   _PC_NAME_MAX.

OK.  I assume Christophe can commit the change.


Jeff



pEpkey.asc
Description: application/pgp-keys


Re: [PATCH] Compile gcc.target/i386/pr78904-4a.c with -mtune=generic

2020-09-16 Thread Jeff Law via Gcc-patches

On 9/15/20 8:32 AM, H.J. Lu via Gcc-patches wrote:
> commit e95395926a84a2406faefe0995295d199d595440
> Author: Uros Bizjak 
> Date:   Thu Jun 18 20:12:48 2020 +0200
>
> i386: Fix mode of ZERO_EXTRACT RTXes, remove ext_register_operand 
> predicate.
>
> caused
>
> FAIL: gcc.target/i386/pr78904-4a.c scan-assembler [ \t]movb[\t ]+%.h, t
>
> when compiled with --target_board='unix{-m32\ -march=cascadelake}'.  With
> -mtune=generic:
>
>   movzwl  4(%esp), %edx
>   movl8(%esp), %eax
>   movb%dh, t(%eax)
>   ret
>
> With -mtune=cascadelake:
>
>   movzbl  5(%esp), %edx
>   movl8(%esp), %eax
>   movb%dl, t(%eax)
>   ret
>
> Add -mtune=generic for --target_board='unix{-m32\ -march=cascadelake}'.

OK with an appropriate ChangeLog entry on the commit.

jeff




pEpkey.asc
Description: application/pgp-keys


Re: [PATCH] Increase rtx_cost of sse_to_integer in skylake_cost.

2020-09-16 Thread Jeff Law via Gcc-patches

On 9/15/20 9:20 PM, Hongtao Liu via Gcc-patches wrote:
> Hi:
>   Rtx cost of sse_to_integer would be used by pass_stv as a
> measurement for the scalar-to-vector transformation. As
> https://gcc.gnu.org/pipermail/gcc-patches/2019-August/528839.html
> indicates, movement between sse regs and gprs should be much expensive
> than movement inside gprs(which is 2 as default). This patch would
> also fix "pr96861".
>
>   Bootstrap is ok, regression test is ok for both "i386.exp=*
> --target_board='unix{-m32,}'" and "i386.exp=*
> --target_board='unix{-m32\ -march=cascadelake,-m64\
> -march=cascadelake}"".
>   No big impact on SPEC2017.
>   Ok for trunk?
>
> gcc/ChangeLog
>
> PR target/96861
> * config/i386/x86-tune-costs.h (skylake_cost): increase rtx
> cost of sse_to_integer from 2 to 6.
>
> gcc/testsuite
>
> * gcc.target/i386/pr95021-3.c: Add -mtune=generic.

I'll defer to HJ's judgement here.  If he's OK with it, then it's fine
by me.


jeff




pEpkey.asc
Description: application/pgp-keys


Re: [PATCH] gcc: Make strchr return value pointers const

2020-09-16 Thread Jeff Law via Gcc-patches

On 9/7/20 6:13 PM, JonY via Gcc-patches wrote:
> On 9/4/20 12:47 PM, Martin Storsjö wrote:
>> Hi,
>>
>> On Fri, 4 Sep 2020, Jakub Jelinek wrote:
>>
>>> On Tue, Sep 01, 2020 at 04:01:42PM +0300, Martin Storsjö wrote:
 This fixes compilation of codepaths for dos-like filesystems
 with Clang. When built with clang, it treats C input files as C++
 when the compiler driver is invoked in C++ mode, triggering errors
 when the return value of strchr() on a pointer to const is assigned
 to a pointer to non-const variable.
>>> Not really specific to clang, e.g. glibc does that in its headers too
>>> as the C++ standard mandates that (and I guess mingw should do that too).
>>>
 This matches similar variables outside of the ifdefs for dos-like
 path handling.

 2020-09-01  Martin Storsjö  

 gcc/Changelog:
     * dwarf2out.c (file_name_acquire): Make a strchr return value
     pointer to const.

 libcpp/Changelog:
     * files.c (remap_filename): Make a strchr return value pointer
     to const.
>>> LGTM.  And it is short enough not to need copyright assignment, so ok for
>>> trunk.
>> Thanks! Can someone commit this for me?
>>
>> // Martin
> Ping can anyone commit this?
>
> Are platform maintainers allowed to push general changes like these? If
> so I can push soon.

If it's been ack'd by a maintainer, yes.  Jakub definitely qualifies as
a maintainer, so feel free to push it on Martin's behalf.


jeff

>


pEpkey.asc
Description: application/pgp-keys


Re: [PATCH] doc: use @code{} instead of @samp{@command{}} around 'date %s'

2020-09-16 Thread Jeff Law via Gcc-patches

On 9/11/20 4:41 PM, Sergei Trofimovich via Gcc-patches wrote:
> From: Sergei Trofimovich 
>
> Before the change 'man gcc' rendered "SOURCE_DATE_EPOCH" section as:
> ... the output of @command{date +%s} on GNU/Linux ...
> After the change it renders as:
> ... the output of "date +%s" on GNU/Linux ...
>
> gcc/ChangeLog:
>
>   * doc/cppenv.texi: Use @code{} instead of @samp{@command{}}
>   around 'date %s'.

OK

jeff




pEpkey.asc
Description: application/pgp-keys


Re: [PATCH] document -Wuninitialized for allocated objects

2020-09-16 Thread Jeff Law via Gcc-patches

On 9/15/20 11:06 AM, Martin Sebor via Gcc-patches wrote:
> The attached patch updates the manual to mention that Wuninitialized
> and -Wmaybe-uninitialized are issued for both auto and allocated
> objects, as well as for passing pointers to uninitialized objects
> to const-qualified parameters.  Both of these features are GCC 11
> enhancements.
>
> Martin
>
> patches-gcc-wuninit-allocated.diff
>
> Document -Wuninitialized for allocated objects.
>
> gcc/ChangeLog:
>
>   * doc/invoke.texi (-Wuninitialized): Document -Wuninitialized for
>   allocated objects.
>(-Wmaybe-uninitialized): Same.

OK

jeff




pEpkey.asc
Description: application/pgp-keys


Re: [PATCH] work harder to avoid -Wuninitialized for empty structs (PR 96295)

2020-09-16 Thread Jeff Law via Gcc-patches

On 9/15/20 1:00 PM, Martin Sebor via Gcc-patches wrote:
> The -Wuninitialized/-Wmaybe-uninitialized enhancement to warn when
> a pointer or reference to an uninitialized object is passed to
> a const-qualified function argument tries to avoid triggering for
> objects of empty types.  However, the suppression is incomplete
> and lets the warning trigger in some corner cases.  The attached
> patch extends the suppression to those as well.
>
> Tested on x86_64-linux.  I will plan to commit it later this week
> if there are no objections.
>
> Martin
>
> gcc-96295.diff
>
> Work harder to avoid -Wuninitialized for objects of empty structs (PR 
> middle-end/96295).
>
> Resolves:
> PR middle-end/96295 - -Wmaybe-uninitialized warning for range operator with
> reference to an empty struct
>
> gcc/ChangeLog:
>
>   PR middle-end/96295
>   * tree-ssa-uninit.c (maybe_warn_operand): Work harder to avoid
>   warning for objects of empty structs
>
> gcc/testsuite/ChangeLog:
>
>   PR middle-end/96295
>   * g++.dg/warn/Wuninitialized-11.C: New test.

FWIW, we had a build in Fedora rawhide fail just in the last couple
weeks because of the enhanced uninitialized warning.  I've already
forgotten the package, but it was a true positive, though sadly it was
just in the package's testsuite and thus we didn't actually find/fix a
user visible package bug.


OK for the trunk.  Thanks.


jeff




pEpkey.asc
Description: application/pgp-keys


Re: [RS6000] rs6000_rtx_costs cost IOR

2020-09-16 Thread Alan Modra via Gcc-patches
On Wed, Sep 16, 2020 at 07:02:06PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Sep 15, 2020 at 10:49:44AM +0930, Alan Modra wrote:
> > * config/rs6000/rs6000.c (rs6000_rtx_costs): Cost IOR.
> 
> >  case IOR:
> > -  /* FIXME */
> >*total = COSTS_N_INSNS (1);
> > -  return true;
> 
> Hey this was okay for over five years :-)
> 
> > +  left = XEXP (x, 0);
> > +  if (GET_CODE (left) == AND
> > + && CONST_INT_P (XEXP (left, 1)))
> 
> Add a comment that this is the integer insert insns?
> 
> > + // rotlsi3_insert_5
> 
> But use /* comments */.
> 
> > + /* Test both regs even though the one in the mask is
> > +constrained to be equal to the output.  Increasing
> > +cost may well result in rejecting an invalid insn
> > +earlier.  */
> 
> Is that ever actually useful?

Possibly not in this particular case, but I did see cases where
invalid insns were rejected early by costing non-reg sub-expressions.

Beside that, the default position on rtx_costs paths that return true
should be to cost any sub-expressions unless you know for sure they
are zero cost.  And yes, we fail to do that for some cases,
eg. mul_highpart.

> So this new block is pretty huge.  Can it easily be factored to a
> separate function?  Just the insert insns part, not all IOR.

Done in my local tree.

> Okay for trunk with the comments changed to the correct syntax, and
> factoring masked insert out to a separate function pre-approved if you
> want to do that.  Thanks!

I'll hold off committing until the whole rtx_costs patch series is
reviewed (not counting the rotate_and_mask_constant patch).

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)

2020-09-16 Thread Jeff Law via Gcc-patches

On 9/15/20 1:47 PM, Martin Sebor wrote:
> Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
> can lead to a subsequent buffer overflow corrupting the heap or
> stack.  The attached patch diagnoses a subset of these cases where
> the overflow/wraparound is still detectable.
>
> Besides regtesting GCC on x86_64-linux I also verified the warning
> doesn't introduce any false positives into Glibc or Binutils/GDB
> builds on the same target.
>
> Martin
>
> gcc-96838.diff
>
> PR middle-end/96838 - missing warning on integer overflow in calls to 
> allocation functions
>
> gcc/ChangeLog:
>
>   PR middle-end/96838
>   * calls.c (eval_size_vflow): New function.
>   (get_size_range): Call it.  Add argument.
>   (maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
>   * calls.h (get_size_range): Add argument.
>
> gcc/testsuite/ChangeLog:
>
>   PR middle-end/96838
>   * gcc.dg/Walloc-size-larger-than-19.c: New test.
>   * gcc.dg/Walloc-size-larger-than-20.c: New test.

If an attacker can control an integer overflow that feeds an allocation,
then they can do all kinds of bad things.  In fact, when my son was
asking me attack vectors, this is one I said I'd look at if I were a bad
guy.


I'm a bit surprised you can't just query the range of the argument and
get the overflow status out of that range, but I don't see that in the
APIs.  How painful would it be to make that part of the API?  The
conceptual model would be to just ask for the range of the argument to
malloc which would include the range and a status bit indicating the
computation might have overflowed.


jeff




pEpkey.asc
Description: application/pgp-keys


Re: [PATCH] -mno-xsave should imply -mno-avx since -mavx implies -mxsave

2020-09-16 Thread Hongtao Liu via Gcc-patches
Thanks!

On Wed, Sep 16, 2020 at 8:57 PM Uros Bizjak  wrote:
>
> > gcc/ChangeLog
> >
> > * common/config/i386/i386-common.c
> > (OPTION_MASK_ISA_AVX_UNSET): Remove OPTION_MASK_ISA_XSAVE_UNSET.
> > (OPTION_MASK_ISA_XSAVE_UNSET): Add OPTION_MASK_ISA_AVX_UNSET.
> >
> > gcc/testsuite/ChangeLog
> >
> > * gcc.target/i386/xsave-avx-1.c: New test.
>
> OK for mainline and backports.
>
> Thanks,
> Uros.



-- 
BR,
Hongtao


Re: [PATCH] Increase rtx_cost of sse_to_integer in skylake_cost.

2020-09-16 Thread Hongtao Liu via Gcc-patches
Thanks.

On Wed, Sep 16, 2020 at 8:54 PM Uros Bizjak  wrote:
>
> > gcc/ChangeLog
> >
> > PR target/96861
> > * config/i386/x86-tune-costs.h (skylake_cost): increase rtx
> > cost of sse_to_integer from 2 to 6.
> >
> > gcc/testsuite
> >
> > * gcc.target/i386/pr95021-3.c: Add -mtune=generic.
>
> OK.
>
> Thanks,
> Uros.



-- 
BR,
Hongtao


Re: [PATCH] c++: premature analysis of requires-expression [PR96410]

2020-09-16 Thread Jason Merrill via Gcc-patches

On 9/16/20 6:11 PM, Patrick Palka wrote:

On Wed, 16 Sep 2020, Jason Merrill wrote:


On 9/16/20 1:34 PM, Patrick Palka wrote:

On Thu, 13 Aug 2020, Jason Merrill wrote:


On 8/13/20 11:21 AM, Patrick Palka wrote:

On Mon, 10 Aug 2020, Jason Merrill wrote:


On 8/10/20 2:18 PM, Patrick Palka wrote:

On Mon, 10 Aug 2020, Patrick Palka wrote:


In the below testcase, semantic analysis of the
requires-expressions
in
the generic lambda must be delayed until instantiation of the
lambda
because the requirements depend on the lambda's template
arguments.
But
tsubst_requires_expr does semantic analysis even during
regeneration
of
the lambda, which leads to various bogus errors and ICEs since
some
subroutines aren't prepared to handle dependent/template trees.

This patch adjusts subroutines of tsubst_requires_expr to avoid
doing
some problematic semantic analyses when processing_template_decl.
In particular, expr_noexcept_p generally can't be checked on a
dependent
expression.  Next, tsubst_nested_requirement should avoid checking
satisfaction when processing_template_decl.  And similarly for
convert_to_void (called from tsubst_valid_expression_requirement).


I wonder if, instead of trying to do a partial substitution into a
requires-expression at all, we want to use the
PACK_EXPANSION_EXTRA_ARGS/IF_STMT_EXTRA_ARGS mechanism to remember the
arguments for later satisfaction?


Like this?

-- >8 --

Subject: [PATCH] c++: requires-expressions and partial instantiation
[PR96410]

This patch makes tsubst_requires_expr avoid substituting into a
requires-expression when partially instantiating a generic lambda.  This
is necessary in general to ensure that we check its requirements in
lexical order (see the first testcase below).  Instead, template
arguments are saved in the requires-expression until all arguments can
be substituted into it at once, using a mechanism similar to
PACK_EXPANSION_EXTRA_ARGS.

This change incidentally also fixes the two mentioned PRs -- the problem
there is that tsubst_requires_expr was performing semantic checks on
templated trees, and some of the checks are not prepared to handle such
trees.  With this patch tsubst_requires_expr, no longer does any
semantic checking at all when processing_template_decl.

gcc/cp/ChangeLog:

PR c++/96409
PR c++/96410
* constraint.cc (tsubst_requires_expr): Use REQUIRES_EXPR_PARMS
and REQUIRES_EXPR_REQS.  Use REQUIRES_EXPR_EXTRA_ARGS and
add_extra_args to defer substitution until we have all the
template arguments.
(finish_requires_expr): Adjust the call to build_min so that
REQUIRES_EXPR_EXTRA_ARGS gets set to NULL_TREE.
* cp-tree.def (REQUIRES_EXPR): Give it a third operand.
* cp-tree.h (REQUIRES_EXPR_PARMS, REQUIRES_EXPR_REQS,
REQUIRES_EXPR_EXTRA_ARGS): Define.
(add_extra_args): Declare.

gcc/testsuite/ChangeLog:

PR c++/96409
PR c++/96410
* g++.dg/cpp2a/concepts-lambda13.C: New test.
* g++.dg/cpp2a/concepts-lambda14.C: New test.
---
   gcc/cp/constraint.cc  | 21 +++-
   gcc/cp/cp-tree.def|  8 +++---
   gcc/cp/cp-tree.h  | 16 
   .../g++.dg/cpp2a/concepts-lambda13.C  | 18 +
   .../g++.dg/cpp2a/concepts-lambda14.C  | 25 +++
   5 files changed, 79 insertions(+), 9 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda13.C
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda14.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index ad9d47070e3..9a06d763554 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2175,7 +2175,19 @@ tsubst_requires_expr (tree t, tree args,
 /* A requires-expression is an unevaluated context.  */
 cp_unevaluated u;
   -  tree parms = TREE_OPERAND (t, 0);
+  args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args);
+  if (processing_template_decl)
+{
+  /* We're partially instantiating a generic lambda.  Substituting into
+this requires-expression now may cause its requirements to get
+checked out of order, so instead just remember the template
+arguments and wait until we can substitute them all at once.  */
+  t = copy_node (t);
+  REQUIRES_EXPR_EXTRA_ARGS (t) = args;


Don't we need build_extra_args, in case the requires_expr is in a
local_specializations context like a function body?


Ah yes probably... though I haven't yet been able to come up with a
concrete testcase that demonstrates we need it.  It seems we get away
without it because a requires-expression is an unevaluated operand, and
many callers of retrieve_local_specialization have fallback code that
triggers only when inside an unevaluated operand, e.g.
tsubst_copy  does:

   r = retrieve_local_specialization (t);
   if (r == NULL_TREE)
 {
   ...

   /* This 

Re: libbacktrace: Add support for MiniDebugInfo

2020-09-16 Thread Ian Lance Taylor via Gcc-patches
On Wed, Sep 16, 2020 at 8:54 AM Alex Coplan  wrote:
>
> On 16/09/2020 07:26, Ian Lance Taylor wrote:
> > On Wed, Sep 16, 2020, 4:03 AM Alex Coplan  wrote:
> >
> > > Hi Ian,
> > >
> > > On 14/09/2020 14:12, Ian Lance Taylor via Gcc-patches wrote:
> > > > This patch to libbacktrace adds support for MiniDebugInfo, as
> > > > requested in PR 93608.
> > >
> > > This appears to introduce a failure in the libbacktrace testsuite
> > > (observed on both x86 and aarch64):
> > >
> > > ../gcc/libbacktrace/../test-driver: line 107:  7905 Segmentation fault
> > >   (core dumped) "$@" > $log_file 2>&1
> > > FAIL: mtest_minidebug
> >
> >
> >
> > I tested on x86 without seeing anything like this.  Can you give me any
> > more details?  Thanks.
>
> Sure. On an Ubuntu 18.04 / x86-64 system with current trunk, configuring
> with:
>
> ~/toolchain/src/gcc/configure \
>   --prefix=`pwd` \
>   --enable-languages=c,c++ \
>   --disable-multilib \
>   --disable-bootstrap
>
> running `make && make check-libbacktrace` gives the failure described
> above. Here is the contents of libbacktrace/test-suite.log:
>
> =
> package-unused version-unused: ./test-suite.log
> =
>
> # TOTAL: 33
> # PASS:  32
> # SKIP:  0
> # XFAIL: 0
> # FAIL:  1
> # XPASS: 0
> # ERROR: 0
>
> .. contents:: :depth: 2
>
> FAIL: mtest_minidebug
> =
>
> no debug info in ELF executable
> no debug info in ELF executable
> no debug info in ELF executable
> no debug info in ELF executable
> no debug info in ELF executable
> no debug info in ELF executable
> no debug info in ELF executable
> test1: not enough frames; got 0, expected at least 3
> test1: [0]: got  expected f3
> test1: [1]: got  expected f2
> FAIL mtest_minidebug (exit status: 139)
>
> ---
>
> Let me know if you need any more info.


Thanks.  I think the problem arises when libbacktrace is unable to
find debug info for any shared library involved in the link.

This patch should fix it.  Bootstrapped and ran libbacktrace tests on
x86_64-pc-linux-gnu and aarch64-linux-gnu.  Committed to mainline.

Ian

PR libbacktrace/97080
* fileline.c (backtrace_syminfo_to_full_callback): New function.
(backtrace_syminfo_to_full_error_callback): New function.
* elf.c (elf_nodebug): Call syminfo_fn if possible.
* internal.h (struct backtrace_call_full): Define.
(backtrace_syminfo_to_full_callback): Declare.
(backtrace_syminfo_to_full_error_callback): Declare.
* mtest.c (f3): Only check all[i] if data.index permits.
diff --git a/libbacktrace/elf.c b/libbacktrace/elf.c
index dd004708246..941f820d944 100644
--- a/libbacktrace/elf.c
+++ b/libbacktrace/elf.c
@@ -547,18 +547,6 @@ elf_crc32_file (struct backtrace_state *state, int 
descriptor,
   return ret;
 }
 
-/* A dummy callback function used when we can't find any debug info.  */
-
-static int
-elf_nodebug (struct backtrace_state *state ATTRIBUTE_UNUSED,
-uintptr_t pc ATTRIBUTE_UNUSED,
-backtrace_full_callback callback ATTRIBUTE_UNUSED,
-backtrace_error_callback error_callback, void *data)
-{
-  error_callback (data, "no debug info in ELF executable", -1);
-  return 0;
-}
-
 /* A dummy callback function used when we can't find a symbol
table.  */
 
@@ -571,6 +559,33 @@ elf_nosyms (struct backtrace_state *state ATTRIBUTE_UNUSED,
   error_callback (data, "no symbol table in ELF executable", -1);
 }
 
+/* A callback function used when we can't find any debug info.  */
+
+static int
+elf_nodebug (struct backtrace_state *state, uintptr_t pc,
+backtrace_full_callback callback,
+backtrace_error_callback error_callback, void *data)
+{
+  if (state->syminfo_fn != NULL && state->syminfo_fn != elf_nosyms)
+{
+  struct backtrace_call_full bdata;
+
+  /* Fetch symbol information so that we can least get the
+function name.  */
+
+  bdata.full_callback = callback;
+  bdata.full_error_callback = error_callback;
+  bdata.full_data = data;
+  bdata.ret = 0;
+  state->syminfo_fn (state, pc, backtrace_syminfo_to_full_callback,
+backtrace_syminfo_to_full_error_callback, );
+  return bdata.ret;
+}
+
+  error_callback (data, "no debug info in ELF executable", -1);
+  return 0;
+}
+
 /* Compare struct elf_symbol for qsort.  */
 
 static int
diff --git a/libbacktrace/fileline.c b/libbacktrace/fileline.c
index be62b9899c5..cd1e10dd58c 100644
--- a/libbacktrace/fileline.c
+++ b/libbacktrace/fileline.c
@@ -317,3 +317,30 @@ backtrace_syminfo (struct backtrace_state *state, 
uintptr_t pc,
   state->syminfo_fn (state, pc, callback, error_callback, data);
   return 1;
 }
+
+/* A backtrace_syminfo_callback that can call into a
+   backtrace_full_callback, used when we have a symbol table but no
+   debug info.  */
+
+void
+backtrace_syminfo_to_full_callback (void *data, uintptr_t pc,
+   const char *symname,
+  

Re: [RS6000] rs6000_rtx_costs cost IOR

2020-09-16 Thread Segher Boessenkool
Hi!

On Tue, Sep 15, 2020 at 10:49:44AM +0930, Alan Modra wrote:
>   * config/rs6000/rs6000.c (rs6000_rtx_costs): Cost IOR.

>  case IOR:
> -  /* FIXME */
>*total = COSTS_N_INSNS (1);
> -  return true;

Hey this was okay for over five years :-)

> +  left = XEXP (x, 0);
> +  if (GET_CODE (left) == AND
> +   && CONST_INT_P (XEXP (left, 1)))

Add a comment that this is the integer insert insns?

> +   // rotlsi3_insert_5

But use /* comments */.

> +   /* Test both regs even though the one in the mask is
> +  constrained to be equal to the output.  Increasing
> +  cost may well result in rejecting an invalid insn
> +  earlier.  */

Is that ever actually useful?

So this new block is pretty huge.  Can it easily be factored to a
separate function?  Just the insert insns part, not all IOR.

Okay for trunk with the comments changed to the correct syntax, and
factoring masked insert out to a separate function pre-approved if you
want to do that.  Thanks!


Segher


Re: [RS6000] rs6000_rtx_costs multi-insn constants

2020-09-16 Thread Segher Boessenkool
On Tue, Sep 15, 2020 at 10:49:43AM +0930, Alan Modra wrote:
> This small patch to rs6000_rtx_const considerably improves code

(_costs)

> generated for large constants in 64-bit code, teaching gcc that it is
> better to load a constant from memory than to generate a sequence of
> up to five dependent instructions.  Note that the rs6000 backend does
> generate large constants as loads from memory at expand time but
> optimisation passes replace them with SETs of the value due to not
> having correct costs.
> 
>   PR 94393
>   * config/rs6000/rs6000.c (rs6000_rtx_costs): Cost multi-insn
>   constants.

Okay for trunk.  Note that some p10 insns take a floating point
immediate, but those need to be handled specially anyway.

Thanks!


Segher


Re: [RS6000] rs6000_rtx_costs comment

2020-09-16 Thread Segher Boessenkool
Hi!

This took a while to digest, sorry.

On Tue, Sep 15, 2020 at 10:49:42AM +0930, Alan Modra wrote:
> +   1) Calls from places like optabs.c:avoid_expensive_constant will
> +   come here with OUTER_CODE set to an operation such as AND with X
> +   being a CONST_INT or other CONSTANT_P type.  This will be compared
> +   against set_src_cost, where we'll come here with OUTER_CODE as SET
> +   and X the same constant.

This (and similar) reasons are why I still haven't made set_src_cost
based on insn_cost -- it is in some places compared to some rtx_cost.

> +   2) Calls from places like combine:distribute_and_simplify_rtx are
> +   asking whether a possibly quite complex SET_SRC can be implemented
> +   more cheaply than some other logically equivalent SET_SRC.

It is comparing the set_src_cost of two equivalent formulations, yeah.
This is one place where set_src_cost can be pretty easily replaced by
insn_cost (combine uses that in most other places, already, and that
was a quite useful change).

> +   3) Calls from places like default_noce_conversion_profitable_p will
> +   come here via seq_cost and pass the pattern of a SET insn in X.

The pattern of the single SET in any instruction that is single_set,
yeah.

> +   Presuming the insn is valid and set_dest a reg, rs6000_rtx_costs
> +   will next see the SET_SRC.  The overall cost should be comparable
> +   to rs6000_insn_cost since the code is comparing one insn sequence
> +   (some of which may be costed by insn_cost) against another insn
> +   sequence.

Yes.  And our rtx_cost misses incredibly many cases, but most common
things are handled okay.

> +   4) Calls from places like cprop.c:try_replace_reg will come here
> +   with OUTER_CODE as INSN, and X either a valid pattern of a SET or
> +   one where some registers have been replaced with constants.  The
> +   replacements may make the SET invalid, for example if
> + (set (reg1) (and (reg2) (const_int 0xfff)))
> +   replaces reg2 as
> + (set (reg1) (and (symbol_ref) (const_int 0xfff)))
> +   then the replacement can't be implemented in one instruction and
> +   really the cost should be higher by one instruction.  However,
> +   the cost for invalid insns doesn't matter much except that a
> +   higher cost may lead to their rejection earlier.

Yup.  This uses set_rtx_cost, which also ideally will use insn_cost one
day.

> +   5) fwprop.c:should_replace_address puts yet another wrinkle on this
> +   function, where we prefer an address calculation that is more
> +   complex yet has the same address_cost.  In this case "more
> +   complex" is determined by having a higher set_src_cost.  So for
> +   example, if we want a plain (reg) address to be replaced with
> +   (plus (reg) (const)) when possible then PLUS needs to cost more
> +   than zero here.  */

Maybe it helps if you more prominenty mention set_rtx_cost and
set_src_cost?  Either way, okay for trunk.  Thanks!


Segher


Re: [Patch] OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)

2020-09-16 Thread Tobias Burnus

Hi Honza – some input would be really helpful!

Hi Jakub – updated version below.

On 9/16/20 12:36 PM, Jakub Jelinek wrote:

I think you want Honza on this primarily, I'm always lost
in the cgraph alias code.

(Likewise as this thread shows)

+  while (node->alias_target)
+node = symtab_node::get (node->alias_target);
+  node = node->ultimate_alias_target ();

I think the above is either you walk the aliases yourself, or use
ultimate_alias_target, but not both.


I think we need to distinguish between:
* aliases which end up with the same symbol name
  and are stored in the ref_list; example: cpp_implicit_alias.
* aliases like with the alias attribute, which is handled
  via alias_target and have different names.

Just experimentally:
* The 'while (node->alias_target)' properly resolves the
  attribute testcase (libgomp.c-c++-common/pr96390.c).
  Here, ultimate_alias_target () does not help as
  node->analyzed == 0.

* The 'node->ultimate_alias_target ()' works for the
  cpp_implicit_alias case (libgomp.c++/pr96390.C).
  Just looking at the alias target does not help as in this
  case, alias_target == NULL.


And the second thing is, I'm not sure how the aliases behave if the
ultimate alias target is properly marked as omp declare target, but some
of the aliases are not.  The offloaded code will still call the alias,
so do we somehow arrange for the aliases to be also emitted into the
offloading LTO IL?
[...] I wonder if the aliases that are needed shouldn't
be marked node->offloadable and have "omp declare target" attribute added
for them too.


Done now.

Okay – or do we find more issues?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)

gcc/ChangeLog:

	PR middle-end/96390
	* omp-offload.c (omp_discover_declare_target_tgt_fn_r): Handle
	alias nodes.

libgomp/ChangeLog:

	PR middle-end/96390
	* testsuite/libgomp.c++/pr96390.C: New test.
	* testsuite/libgomp.c-c++-common/pr96390.c: New test.

 gcc/omp-offload.c| 50 
 libgomp/testsuite/libgomp.c++/pr96390.C  | 49 +++
 libgomp/testsuite/libgomp.c-c++-common/pr96390.c | 26 
 3 files changed, 117 insertions(+), 8 deletions(-)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 32c2485abd4..02dec44b8ce 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -196,21 +196,55 @@ omp_declare_target_var_p (tree decl)
 static tree
 omp_discover_declare_target_tgt_fn_r (tree *tp, int *walk_subtrees, void *data)
 {
-  if (TREE_CODE (*tp) == FUNCTION_DECL
-  && !omp_declare_target_fn_p (*tp)
-  && !lookup_attribute ("omp declare target host", DECL_ATTRIBUTES (*tp)))
+  if (TREE_CODE (*tp) == FUNCTION_DECL)
 {
-  tree id = get_identifier ("omp declare target");
-  if (!DECL_EXTERNAL (*tp) && DECL_SAVED_TREE (*tp))
-	((vec *) data)->safe_push (*tp);
-  DECL_ATTRIBUTES (*tp) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (*tp));
+  tree decl = *tp;
   symtab_node *node = symtab_node::get (*tp);
   if (node != NULL)
 	{
-	  node->offloadable = 1;
+	  /* First, find final FUNCTION_DECL; find final alias target and there
+	 ensure alias like cpp_implicit_alias are resolved by calling
+	 ultimate_alias_target; the latter does not resolve alias_target as
+	 node->analyzed = 0.  */
+	  symtab_node *orig_node = node;
+	  while (node->alias_target)
+	node = symtab_node::get (node->alias_target);
+	  node = node->ultimate_alias_target ();
+	  decl = node->decl;
+
+	  if (omp_declare_target_fn_p (decl)
+	  || lookup_attribute ("omp declare target host",
+   DECL_ATTRIBUTES (decl)))
+	return NULL_TREE;
+
 	  if (ENABLE_OFFLOADING)
 	g->have_offload = true;
+
+	  /* Now mark original node and all alias targets for offloading.  */
+	  node->offloadable = 1;
+	  if (orig_node != node)
+	{
+	  tree id = get_identifier ("omp declare target");
+	  while (orig_node->alias_target)
+		{
+		  orig_node = orig_node->ultimate_alias_target ();
+		  DECL_ATTRIBUTES (orig_node->decl)
+		= tree_cons (id, NULL_TREE,
+ DECL_ATTRIBUTES (orig_node->decl));
+		  orig_node = symtab_node::get (orig_node->alias_target);
+		}
+	}
 	}
+  else if (omp_declare_target_fn_p (decl)
+	   || lookup_attribute ("omp declare target host",
+DECL_ATTRIBUTES (decl)))
+	return NULL_TREE;
+
+  tree id = get_identifier ("omp declare target");
+  if (!DECL_EXTERNAL (decl) && DECL_SAVED_TREE (decl))
+	((vec *) data)->safe_push (decl);
+  DECL_ATTRIBUTES (decl) = tree_cons (id, NULL_TREE,
+	  DECL_ATTRIBUTES (decl));
 }
   else if (TYPE_P (*tp))
 *walk_subtrees = 0;
diff --git a/libgomp/testsuite/libgomp.c++/pr96390.C b/libgomp/testsuite/libgomp.c++/pr96390.C
new file 

[committed] analyzer: fix state explosions due to SCC bug

2020-09-16 Thread David Malcolm via Gcc-patches
Debugging the state explosion of the very large switch statement in
gcc.dg/analyzer/pr96653.c showed that the worklist was failing to
order the exploded nodes correctly; the in-edges at the join point
after the switch were not getting processed together, but were instead
being rocessed in smaller batches, bloating the exploded graph until the
per-point limit was reached.

The root cause turned out to be a bug in creating the strongly-connected
components for the supergraph: the code was considering interprocedural
edges as well as intraprocedural edges, leading to unpredictable
misorderings of the SCC and worklist, leading to bloating of the
exploded graph.

This patch fixes the SCC creation so it only considers intraprocedural
edges within the supergraph.  It also tweaks worklist::key_t::cmp to
give higher precedence to call_string over differences within a
supernode, since enodes with different call_strings can't be merges.
In practise, none of my test cases were affected by this latter change,
though it seems to be the right thing to do.

With this patch, the very large switch statement in
gcc.dg/analyzer/pr96653.c is handled in a single call to
exploded_graph::maybe_process_run_of_before_supernode_enodes:
   merged 358 in-enodes into 2 out-enode(s) at SN: 402
and that testcase no longer hits the per-program-point limits.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Also lightly tested on powerpc64-linux-gnu, arm-unknown-eabi,
and aarch64-unknown-linux-gnu.
Pushed to master as r11-3247-gfd111c419d146ee47c7df9a36a535e8d843d4802.

gcc/analyzer/ChangeLog:
* engine.cc (strongly_connected_components::strong_connect): Only
consider intraprocedural edges when creating SCCs.
(worklist::key_t::cmp): Add comment.  Treat call_string
differences as more important than differences of program_point
within a supernode.

gcc/testsuite/ChangeLog:
PR analyzer/96653
* gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c: Update
expected number of exploded nodes.
* gcc.dg/analyzer/malloc-vs-local-1a.c: Update expected number
of exploded nodes.
* gcc.dg/analyzer/pr96653.c: Remove -Wno-analyzer-too-complex.
---
 gcc/analyzer/engine.cc| 22 ++-
 .../loop-0-up-to-n-by-1-with-iter-obj.c   |  3 ++-
 .../gcc.dg/analyzer/malloc-vs-local-1a.c  | 20 -
 gcc/testsuite/gcc.dg/analyzer/pr96653.c   |  3 +--
 4 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 637a990da4a..d03e23a9b6e 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -1615,6 +1615,9 @@ strongly_connected_components::strong_connect (unsigned 
index)
   superedge *sedge;
   FOR_EACH_VEC_ELT (v_snode->m_succs, i, sedge)
 {
+  if (sedge->get_kind () != SUPEREDGE_CFG_EDGE
+ && sedge->get_kind () != SUPEREDGE_INTRAPROCEDURAL_CALL)
+   continue;
   supernode *w_snode = sedge->m_dest;
   per_node_data *w = _per_node[w_snode->m_index];
   if (w->m_index == -1)
@@ -1690,7 +1693,14 @@ worklist::add_node (exploded_node *enode)
 /* Comparator for implementing worklist::key_t comparison operators.
Return negative if KA is before KB
Return positive if KA is after KB
-   Return 0 if they are equal.  */
+   Return 0 if they are equal.
+
+   The ordering of the worklist is critical for performance and for
+   avoiding node explosions.  Ideally we want all enodes at a CFG join-point
+   with the same callstring to be sorted next to each other in the worklist
+   so that a run of consecutive enodes can be merged and processed "in bulk"
+   rather than individually or pairwise, minimizing the number of new enodes
+   created.  */
 
 int
 worklist::key_t::cmp (const worklist::key_t , const worklist::key_t )
@@ -1742,6 +1752,11 @@ worklist::key_t::cmp (const worklist::key_t , const 
worklist::key_t )
 
   gcc_assert (snode_a == snode_b);
 
+  /* The points might vary by callstring; try sorting by callstring.  */
+  int cs_cmp = call_string::cmp (call_string_a, call_string_b);
+  if (cs_cmp)
+return cs_cmp;
+
   /* Order within supernode via program point.  */
   int within_snode_cmp
 = function_point::cmp_within_supernode (point_a.get_function_point (),
@@ -1749,11 +1764,6 @@ worklist::key_t::cmp (const worklist::key_t , const 
worklist::key_t )
   if (within_snode_cmp)
 return within_snode_cmp;
 
-  /* The points might vary by callstring; try sorting by callstring.  */
-  int cs_cmp = call_string::cmp (call_string_a, call_string_b);
-  if (cs_cmp)
-return cs_cmp;
-
   /* Otherwise, we ought to have the same program_point.  */
   gcc_assert (point_a == point_b);
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c 
b/gcc/testsuite/gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c
index 0172c9b324c..2b0352711ae 100644
--- 

[committed] analyzer: show SCC ids in .dot dumps

2020-09-16 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as d2c4d5199cf277becc1f377536973815d1c9519c.

gcc/analyzer/ChangeLog:
* engine.cc (supernode_cluster::dump_dot): Show the SCC id
in the per-supernode clusters in FILENAME.eg.dot output.
(exploded_graph_annotator::add_node_annotations):
Show the SCC of the supernode in FILENAME.supernode.eg.dot output.
* exploded-graph.h (worklist::scc_id): New.
(exploded_graph::get_scc_id): New.
---
 gcc/analyzer/engine.cc| 6 --
 gcc/analyzer/exploded-graph.h | 9 +
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 53fafb58633..637a990da4a 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -3288,8 +3288,9 @@ public:
 (const void *)this);
 gv->indent ();
 gv->println ("style=\"dashed\";");
-gv->println ("label=\"SN: %i (bb: %i)\";",
-m_supernode->m_index, m_supernode->m_bb->index);
+gv->println ("label=\"SN: %i (bb: %i; scc: %i)\";",
+m_supernode->m_index, m_supernode->m_bb->index,
+args.m_eg.get_scc_id (*m_supernode));
 
 int i;
 exploded_node *enode;
@@ -4040,6 +4041,7 @@ public:
 
 gv->begin_td ();
 pp_string (pp, "BEFORE");
+pp_printf (pp, " (scc: %i)", m_eg.get_scc_id (n));
 gv->end_td ();
 
 unsigned i;
diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
index 5d4c3190283..04e878fbdfc 100644
--- a/gcc/analyzer/exploded-graph.h
+++ b/gcc/analyzer/exploded-graph.h
@@ -652,6 +652,10 @@ public:
   exploded_node *take_next ();
   exploded_node *peek_next ();
   void add_node (exploded_node *enode);
+  int get_scc_id (const supernode ) const
+  {
+return m_scc.get_scc_id (snode.m_index);
+  }
 
 private:
   class key_t
@@ -783,6 +787,11 @@ public:
   const call_string_data_map_t *get_per_call_string_data () const
   { return _per_call_string_data; }
 
+  int get_scc_id (const supernode ) const
+  {
+return m_worklist.get_scc_id (node);
+  }
+
 private:
   void print_bar_charts (pretty_printer *pp) const;
 
-- 
2.26.2



[committed] analyzer: bulk merger/processing of runs of nodes at CFG join points

2020-09-16 Thread David Malcolm via Gcc-patches
Prior to this patch the analyzer worklist considered only one node or
two nodes at a time, processing and/or merging state individually or
pairwise.

This could lead to explosions of merger nodes at CFG join points,
especially after switch statements, which could have large numbers
of in-edges, and thus large numbers of merger exploded_nodes could
be created, exceeding the per-point limit and thus stopping analysis
with -Wanalyzer-too-complex.

This patch special-cases the handling for runs of consecutive
nodes in the worklist at a CFG join point, processing and merging
them all together.

The patch fixes a state explosion seen in bzip2.c seen when attempting
to reproduce PR analyzer/95188, in a switch statement in a loop for
argument parsing.  With this patch, the analyzer successfully
consolidates the state after the argument parsing to a single exploded
node.

In gcc.dg/analyzer/pr96653.c there is a switch statement with over 300
cases which leads to hitting the per-point limit.  With this patch
the consolidation code doesn't manage to merge all of them due to other
worklist-ordering bugs, and it still hits the per-point limits, but it
does manage some very long consolidations:
  merged 2 in-enodes into 2 out-enode(s) at SN: 403
  merged 2 in-enodes into 2 out-enode(s) at SN: 403
  merged 2 in-enodes into 1 out-enode(s) at SN: 11
  merged 29 in-enodes into 1 out-enode(s) at SN: 35
  merged 6 in-enodes into 1 out-enode(s) at SN: 41
  merged 31 in-enodes into 1 out-enode(s) at SN: 35
and with a followup patch to fix an SCC issue it manages:
  merged 358 in-enodes into 2 out-enode(s) at SN: 402

The patch appears to fix the failure on non-x86_64 of:
  gcc.dg/analyzer/pr93032-mztools.c (test for excess errors)
which is PR analyzer/96616.

Unfortunately, the patch introduces a memory leak false positive in
gcc.dg/analyzer/pr94851-1.c, but this appears to be a pre-existing bug
that was hidden by state-merging failures.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Also lightly tested on powerpc64-linux-gnu, arm-unknown-eabi,
and aarch64-unknown-linux-gnu to verify fix for PR analyzer/96616.
Pushed to master as r11-3245-gb28491dc2d79763ecbff4f0b9f1f3e48a443be1d.

gcc/analyzer/ChangeLog:
* engine.cc (exploded_node::dump_dot): Show STATUS_BULK_MERGED.
(exploded_graph::process_worklist): Call
maybe_process_run_of_before_supernode_enodes.
(exploded_graph::maybe_process_run_of_before_supernode_enodes):
New.
(exploded_graph_annotator::print_enode): Show STATUS_BULK_MERGED.
* exploded-graph.h (enum exploded_node::status): Add
STATUS_BULK_MERGED.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/bzip2-arg-parse-1.c: New test.
* gcc.dg/analyzer/loop-n-down-to-1-by-1.c: Remove xfail.
* gcc.dg/analyzer/pr94851-1.c: Add xfail.
---
 gcc/analyzer/engine.cc| 207 ++
 gcc/analyzer/exploded-graph.h |   4 +
 .../gcc.dg/analyzer/bzip2-arg-parse-1.c   |  95 
 .../gcc.dg/analyzer/loop-n-down-to-1-by-1.c   |   4 +-
 gcc/testsuite/gcc.dg/analyzer/pr94851-1.c |   3 +-
 5 files changed, 310 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/bzip2-arg-parse-1.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 5903b19b774..53fafb58633 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -848,6 +848,8 @@ exploded_node::dump_dot (graphviz_out *gv, const 
dump_args_t ) const
   pp_printf (pp, "EN: %i", m_index);
   if (m_status == STATUS_MERGER)
 pp_string (pp, " (merger)");
+  else if (m_status == STATUS_BULK_MERGED)
+pp_string (pp, " (bulk merged)");
   pp_newline (pp);
 
   if (args.show_enode_details_p (*this))
@@ -2211,6 +2213,12 @@ exploded_graph::process_worklist ()
   if (logger)
logger->log ("next to process: EN: %i", node->m_index);
 
+  /* If we have a run of nodes that are before-supernode, try merging and
+processing them together, rather than pairwise or individually.  */
+  if (flag_analyzer_state_merge && node != m_origin)
+   if (maybe_process_run_of_before_supernode_enodes (node))
+ goto handle_limit;
+
   /* Avoid exponential explosions of nodes by attempting to merge
 nodes that are at the same program point and which have
 sufficiently similar state.  */
@@ -2340,6 +2348,7 @@ exploded_graph::process_worklist ()
 
   process_node (node);
 
+handle_limit:
   /* Impose a hard limit on the number of exploded nodes, to ensure
 that the analysis terminates in the face of pathological state
 explosion (or bugs).
@@ -2367,6 +2376,201 @@ exploded_graph::process_worklist ()
 }
 }
 
+/* Attempt to process a consecutive run of sufficiently-similar nodes in
+   the worklist at a CFG join-point (having already popped ENODE from the
+   head of the worklist).
+
+   If ENODE's point is of the form 

[committed] analyzer: add program_point::get_next

2020-09-16 Thread David Malcolm via Gcc-patches
Avoid some future copy-and-paste by introducing a function.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as b9b5fc0c2175b34131d9fd0805b1b307f754f4f0.

gcc/analyzer/ChangeLog:
* engine.cc
(exploded_graph::process_node) :
Simplify by using program_point::get_next.
* program-point.cc (program_point::get_next): New.
* program-point.h (program_point::get_next): New decl.
---
 gcc/analyzer/engine.cc| 24 
 gcc/analyzer/program-point.cc | 29 +
 gcc/analyzer/program-point.h  |  2 ++
 3 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 8f5c5143ca5..5903b19b774 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -2458,26 +2458,10 @@ exploded_graph::process_node (exploded_node *node)
 );
  }
 
-   if (point.get_supernode ()->m_stmts.length () > 0)
- {
-   program_point next_point
- = program_point::before_stmt (point.get_supernode (), 0,
-   point.get_call_string ());
-   exploded_node *next
- = get_or_create_node (next_point, next_state, node);
-   if (next)
- add_edge (node, next, NULL);
- }
-   else
- {
-   program_point next_point
- = program_point::after_supernode (point.get_supernode (),
-   point.get_call_string ());
-   exploded_node *next = get_or_create_node (next_point, next_state,
- node);
-   if (next)
- add_edge (node, next, NULL);
- }
+   program_point next_point (point.get_next ());
+   exploded_node *next = get_or_create_node (next_point, next_state, node);
+   if (next)
+ add_edge (node, next, NULL);
   }
   break;
 case PK_BEFORE_STMT:
diff --git a/gcc/analyzer/program-point.cc b/gcc/analyzer/program-point.cc
index 2c0a4c42bf5..ef19e6e38e5 100644
--- a/gcc/analyzer/program-point.cc
+++ b/gcc/analyzer/program-point.cc
@@ -529,6 +529,35 @@ function_point::next_stmt ()
 }
 }
 
+/* For those program points for which there is a uniquely-defined
+   successor, return it.  */
+
+program_point
+program_point::get_next () const
+{
+  switch (m_function_point.get_kind ())
+{
+default:
+  gcc_unreachable ();
+case PK_ORIGIN:
+case PK_AFTER_SUPERNODE:
+  gcc_unreachable (); /* Not uniquely defined.  */
+case PK_BEFORE_SUPERNODE:
+  if (get_supernode ()->m_stmts.length () > 0)
+   return before_stmt (get_supernode (), 0, get_call_string ());
+  else
+   return after_supernode (get_supernode (), get_call_string ());
+case PK_BEFORE_STMT:
+  {
+   unsigned next_idx = get_stmt_idx ();
+   if (next_idx < get_supernode ()->m_stmts.length ())
+ return before_stmt (get_supernode (), next_idx, get_call_string ());
+   else
+ return after_supernode (get_supernode (), get_call_string ());
+  }
+}
+}
+
 #if CHECKING_P
 
 namespace selftest {
diff --git a/gcc/analyzer/program-point.h b/gcc/analyzer/program-point.h
index cb11478468d..97fd0a5e9de 100644
--- a/gcc/analyzer/program-point.h
+++ b/gcc/analyzer/program-point.h
@@ -294,6 +294,8 @@ public:
   /* For before_stmt, go to next stmt.  */
   void next_stmt () { m_function_point.next_stmt (); }
 
+  program_point get_next () const;
+
  private:
   function_point m_function_point;
   call_string m_call_string;
-- 
2.26.2



[committed] analyzer: show program point in -Wanalyzer-too-complex

2020-09-16 Thread David Malcolm via Gcc-patches
I found this useful when debugging.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as 6dd96e24ea3cb9919fedd4da35fbfd36ed98b0ea.

gcc/analyzer/ChangeLog:
* engine.cc (exploded_graph::get_or_create_node): Show the
program point when issuing -Wanalyzer-too-complex due to hitting
the per-program-point limit.
---
 gcc/analyzer/engine.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 49701b74fd4..8f5c5143ca5 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -1982,6 +1982,7 @@ exploded_graph::get_or_create_node (const program_point 
,
   > param_analyzer_max_enodes_per_program_point)
 {
   pretty_printer pp;
+  point.print (, format (false));
   print_enode_indices (, per_point_data->m_enodes);
   if (logger)
logger->log ("not creating enode; too many at program point: %s",
-- 
2.26.2



[committed] analyzer: getchar has no side-effects

2020-09-16 Thread David Malcolm via Gcc-patches
Seen whilst debugging another issue, where the analyzer was assuming
conservatively that a call to getchar could clobber a global.

This is handled for most of the other stdio functions by the list
in sm-file.cc

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as e097c9ab83192fc2f738ec6426a275282e9a51ea.

gcc/analyzer/ChangeLog:
* region-model.cc (region_model::on_call_pre): Treat getchar as
having no side-effects.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/getchar-1.c: New test.
---
 gcc/analyzer/region-model.cc  |  5 +
 gcc/testsuite/gcc.dg/analyzer/getchar-1.c | 19 +++
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/getchar-1.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index d53272e4332..1312391557d 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -732,6 +732,11 @@ region_model::on_call_pre (const gcall *call, 
region_model_context *ctxt)
return impl_call_calloc (cd);
   else if (is_named_call_p (callee_fndecl, "alloca", call, 1))
return impl_call_alloca (cd);
+  else if (is_named_call_p (callee_fndecl, "getchar", call, 0))
+   {
+ /* No side-effects (tracking stream state is out-of-scope
+for the analyzer).  */
+   }
   else if (is_named_call_p (callee_fndecl, "memset", call, 3))
{
  impl_call_memset (cd);
diff --git a/gcc/testsuite/gcc.dg/analyzer/getchar-1.c 
b/gcc/testsuite/gcc.dg/analyzer/getchar-1.c
new file mode 100644
index 000..25595e0786e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/getchar-1.c
@@ -0,0 +1,19 @@
+#include 
+#include "analyzer-decls.h"
+
+int test_1 (void)
+{
+  int c = getchar ();
+  return c;
+}
+
+int glob_2;
+int test_2 (void)
+{
+  int c;
+  glob_2 = 42;
+  __analyzer_eval (glob_2 == 42); /* { dg-warning "TRUE" } */
+  c = getchar ();
+  __analyzer_eval (glob_2 == 42); /* { dg-warning "TRUE" } */
+  return c;
+}
-- 
2.26.2



Re: [PATCH] c++: premature analysis of requires-expression [PR96410]

2020-09-16 Thread Patrick Palka via Gcc-patches
On Wed, 16 Sep 2020, Jason Merrill wrote:

> On 9/16/20 1:34 PM, Patrick Palka wrote:
> > On Thu, 13 Aug 2020, Jason Merrill wrote:
> > 
> > > On 8/13/20 11:21 AM, Patrick Palka wrote:
> > > > On Mon, 10 Aug 2020, Jason Merrill wrote:
> > > > 
> > > > > On 8/10/20 2:18 PM, Patrick Palka wrote:
> > > > > > On Mon, 10 Aug 2020, Patrick Palka wrote:
> > > > > > 
> > > > > > > In the below testcase, semantic analysis of the
> > > > > > > requires-expressions
> > > > > > > in
> > > > > > > the generic lambda must be delayed until instantiation of the
> > > > > > > lambda
> > > > > > > because the requirements depend on the lambda's template
> > > > > > > arguments.
> > > > > > > But
> > > > > > > tsubst_requires_expr does semantic analysis even during
> > > > > > > regeneration
> > > > > > > of
> > > > > > > the lambda, which leads to various bogus errors and ICEs since
> > > > > > > some
> > > > > > > subroutines aren't prepared to handle dependent/template trees.
> > > > > > > 
> > > > > > > This patch adjusts subroutines of tsubst_requires_expr to avoid
> > > > > > > doing
> > > > > > > some problematic semantic analyses when processing_template_decl.
> > > > > > > In particular, expr_noexcept_p generally can't be checked on a
> > > > > > > dependent
> > > > > > > expression.  Next, tsubst_nested_requirement should avoid checking
> > > > > > > satisfaction when processing_template_decl.  And similarly for
> > > > > > > convert_to_void (called from tsubst_valid_expression_requirement).
> > > > > 
> > > > > I wonder if, instead of trying to do a partial substitution into a
> > > > > requires-expression at all, we want to use the
> > > > > PACK_EXPANSION_EXTRA_ARGS/IF_STMT_EXTRA_ARGS mechanism to remember the
> > > > > arguments for later satisfaction?
> > 
> > Like this?
> > 
> > -- >8 --
> > 
> > Subject: [PATCH] c++: requires-expressions and partial instantiation
> > [PR96410]
> > 
> > This patch makes tsubst_requires_expr avoid substituting into a
> > requires-expression when partially instantiating a generic lambda.  This
> > is necessary in general to ensure that we check its requirements in
> > lexical order (see the first testcase below).  Instead, template
> > arguments are saved in the requires-expression until all arguments can
> > be substituted into it at once, using a mechanism similar to
> > PACK_EXPANSION_EXTRA_ARGS.
> > 
> > This change incidentally also fixes the two mentioned PRs -- the problem
> > there is that tsubst_requires_expr was performing semantic checks on
> > templated trees, and some of the checks are not prepared to handle such
> > trees.  With this patch tsubst_requires_expr, no longer does any
> > semantic checking at all when processing_template_decl.
> > 
> > gcc/cp/ChangeLog:
> > 
> > PR c++/96409
> > PR c++/96410
> > * constraint.cc (tsubst_requires_expr): Use REQUIRES_EXPR_PARMS
> > and REQUIRES_EXPR_REQS.  Use REQUIRES_EXPR_EXTRA_ARGS and
> > add_extra_args to defer substitution until we have all the
> > template arguments.
> > (finish_requires_expr): Adjust the call to build_min so that
> > REQUIRES_EXPR_EXTRA_ARGS gets set to NULL_TREE.
> > * cp-tree.def (REQUIRES_EXPR): Give it a third operand.
> > * cp-tree.h (REQUIRES_EXPR_PARMS, REQUIRES_EXPR_REQS,
> > REQUIRES_EXPR_EXTRA_ARGS): Define.
> > (add_extra_args): Declare.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR c++/96409
> > PR c++/96410
> > * g++.dg/cpp2a/concepts-lambda13.C: New test.
> > * g++.dg/cpp2a/concepts-lambda14.C: New test.
> > ---
> >   gcc/cp/constraint.cc  | 21 +++-
> >   gcc/cp/cp-tree.def|  8 +++---
> >   gcc/cp/cp-tree.h  | 16 
> >   .../g++.dg/cpp2a/concepts-lambda13.C  | 18 +
> >   .../g++.dg/cpp2a/concepts-lambda14.C  | 25 +++
> >   5 files changed, 79 insertions(+), 9 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda13.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda14.C
> > 
> > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > index ad9d47070e3..9a06d763554 100644
> > --- a/gcc/cp/constraint.cc
> > +++ b/gcc/cp/constraint.cc
> > @@ -2175,7 +2175,19 @@ tsubst_requires_expr (tree t, tree args,
> > /* A requires-expression is an unevaluated context.  */
> > cp_unevaluated u;
> >   -  tree parms = TREE_OPERAND (t, 0);
> > +  args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args);
> > +  if (processing_template_decl)
> > +{
> > +  /* We're partially instantiating a generic lambda.  Substituting into
> > +this requires-expression now may cause its requirements to get
> > +checked out of order, so instead just remember the template
> > +arguments and wait until we can substitute them all at once.  */
> > +  t = copy_node (t);
> > +  REQUIRES_EXPR_EXTRA_ARGS (t) = args;
> 
> Don't we need 

Re: [patch] Fix dangling references in thunks at -O0

2020-09-16 Thread Joseph Myers
This introduces an ICE building the glibc testsuite for alpha (bisected), 
s390 and sparc (symptoms appear the same, not bisected to confirm the 
exact revision).  See bug 97078.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] c++: premature analysis of requires-expression [PR96410]

2020-09-16 Thread Jason Merrill via Gcc-patches

On 9/16/20 1:34 PM, Patrick Palka wrote:

On Thu, 13 Aug 2020, Jason Merrill wrote:


On 8/13/20 11:21 AM, Patrick Palka wrote:

On Mon, 10 Aug 2020, Jason Merrill wrote:


On 8/10/20 2:18 PM, Patrick Palka wrote:

On Mon, 10 Aug 2020, Patrick Palka wrote:


In the below testcase, semantic analysis of the requires-expressions
in
the generic lambda must be delayed until instantiation of the lambda
because the requirements depend on the lambda's template arguments.
But
tsubst_requires_expr does semantic analysis even during regeneration
of
the lambda, which leads to various bogus errors and ICEs since some
subroutines aren't prepared to handle dependent/template trees.

This patch adjusts subroutines of tsubst_requires_expr to avoid doing
some problematic semantic analyses when processing_template_decl.
In particular, expr_noexcept_p generally can't be checked on a
dependent
expression.  Next, tsubst_nested_requirement should avoid checking
satisfaction when processing_template_decl.  And similarly for
convert_to_void (called from tsubst_valid_expression_requirement).


I wonder if, instead of trying to do a partial substitution into a
requires-expression at all, we want to use the
PACK_EXPANSION_EXTRA_ARGS/IF_STMT_EXTRA_ARGS mechanism to remember the
arguments for later satisfaction?


Like this?

-- >8 --

Subject: [PATCH] c++: requires-expressions and partial instantiation [PR96410]

This patch makes tsubst_requires_expr avoid substituting into a
requires-expression when partially instantiating a generic lambda.  This
is necessary in general to ensure that we check its requirements in
lexical order (see the first testcase below).  Instead, template
arguments are saved in the requires-expression until all arguments can
be substituted into it at once, using a mechanism similar to
PACK_EXPANSION_EXTRA_ARGS.

This change incidentally also fixes the two mentioned PRs -- the problem
there is that tsubst_requires_expr was performing semantic checks on
templated trees, and some of the checks are not prepared to handle such
trees.  With this patch tsubst_requires_expr, no longer does any
semantic checking at all when processing_template_decl.

gcc/cp/ChangeLog:

PR c++/96409
PR c++/96410
* constraint.cc (tsubst_requires_expr): Use REQUIRES_EXPR_PARMS
and REQUIRES_EXPR_REQS.  Use REQUIRES_EXPR_EXTRA_ARGS and
add_extra_args to defer substitution until we have all the
template arguments.
(finish_requires_expr): Adjust the call to build_min so that
REQUIRES_EXPR_EXTRA_ARGS gets set to NULL_TREE.
* cp-tree.def (REQUIRES_EXPR): Give it a third operand.
* cp-tree.h (REQUIRES_EXPR_PARMS, REQUIRES_EXPR_REQS,
REQUIRES_EXPR_EXTRA_ARGS): Define.
(add_extra_args): Declare.

gcc/testsuite/ChangeLog:

PR c++/96409
PR c++/96410
* g++.dg/cpp2a/concepts-lambda13.C: New test.
* g++.dg/cpp2a/concepts-lambda14.C: New test.
---
  gcc/cp/constraint.cc  | 21 +++-
  gcc/cp/cp-tree.def|  8 +++---
  gcc/cp/cp-tree.h  | 16 
  .../g++.dg/cpp2a/concepts-lambda13.C  | 18 +
  .../g++.dg/cpp2a/concepts-lambda14.C  | 25 +++
  5 files changed, 79 insertions(+), 9 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda13.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda14.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index ad9d47070e3..9a06d763554 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2175,7 +2175,19 @@ tsubst_requires_expr (tree t, tree args,
/* A requires-expression is an unevaluated context.  */
cp_unevaluated u;
  
-  tree parms = TREE_OPERAND (t, 0);

+  args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args);
+  if (processing_template_decl)
+{
+  /* We're partially instantiating a generic lambda.  Substituting into
+this requires-expression now may cause its requirements to get
+checked out of order, so instead just remember the template
+arguments and wait until we can substitute them all at once.  */
+  t = copy_node (t);
+  REQUIRES_EXPR_EXTRA_ARGS (t) = args;


Don't we need build_extra_args, in case the requires_expr is in a 
local_specializations context like a function body?



+  return t;
+}
+
+  tree parms = REQUIRES_EXPR_PARMS (t);
if (parms)
  {
parms = tsubst_constraint_variables (parms, args, info);
@@ -2183,14 +2195,11 @@ tsubst_requires_expr (tree t, tree args,
return boolean_false_node;
  }
  
-  tree reqs = TREE_OPERAND (t, 1);

+  tree reqs = REQUIRES_EXPR_REQS (t);
reqs = tsubst_requirement_body (reqs, args, info);
if (reqs == error_mark_node)
  return boolean_false_node;
  
-  if (processing_template_decl)

-return finish_requires_expr (cp_expr_location (t), parms, reqs);

Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-16 Thread Qing Zhao via Gcc-patches
Segher and Richard, 

Now there are two major concerns from the discussion so far:

1. (From Richard):  Inserting zero insns should be done after 
pass_thread_prologue_and_epilogue since later passes (for example, 
pass_regrename) might introduce new used caller-saved registers. 
 So, we should do this in the beginning of pass_late_compilation (some 
targets wouldn’t cope with doing it later). 

2. (From Segher): The inserted zero insns should stay together with the return, 
no other insns should move in-between zero insns and return insns. Otherwise, a 
valid gadget could be formed. 

I think that both of the above 2 concerns are important and should be addressed 
for the correct implementation. 

In order to support 1,  we cannot implementing it in “targetm.gen_return()” and 
“targetm.gen_simple_return()”  since “targetm.gen_return()” and 
“targetm.gen_simple_return()” are called during 
pass_thread_prologue_and_epilogue, at that time, the use information still not 
correct. 

In order to support 2, enhancing EPILOGUE_USES to include the zeroed registgers 
is NOT enough to prevent all the zero insns from moving around.  More 
restrictions need to be added to these new zero insns.  (I think that marking 
these new zeroed registers as “unspec_volatile” at RTL level is necessary to 
prevent them from deleting from moving around). 


So, based on the above, I propose the following approach that will resolve the 
above 2 concerns:

1. Add 2 new target hooks:
   A. targetm.pro_epilogue_use (reg)
   This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to
   prevent deleting register setting instructions in prologue and epilogue.

   B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
   This hook will gen a sequence of zeroing insns that zero the registers that 
specified in NEED_ZEROED_HARDREGS.

A default handler of “gen_zero_call_used_regs” could be defined in middle 
end, which use mov insns to zero registers, and then use 
“targetm.pro_epilogue_use(reg)” to mark each zeroed registers. 


2. Add  a new pass, pass_zero_call_used_regs,  in the beginning of 
pass_late_compilation. 

This pass will search all “return”s, and compute the hard register set for 
zeroing, “need_zeroed_hardregs”, based on data flow information, user request, 
and function abi. 
Then call targetm.gen_zero_call_used_regs(need_zeroed_hardregs).

3. X86 backend will implement a special version for “gen_zero_call_used_regs”, 
and “pro_epilogue_use”.


Let me know if you have any more comment on this approach.

thanks.

Qing




> On Sep 16, 2020, at 5:35 AM, Segher Boessenkool  
> wrote:
> 
> On Tue, Sep 15, 2020 at 08:51:57PM -0500, Qing Zhao wrote:
>>> On Sep 15, 2020, at 6:09 PM, Segher Boessenkool 
>>>  wrote:
>>> If you want the zeroing insns to stay with the return, you have to
>>> express that in RTL.  
>> 
>> What do you mean by “express that in RTL”?
>> Could you please explain this in more details?
> 
> Exactly as I say: you need to tell in the RTL that the insns should stay
> together.
> 
> Easiest is to just make it one RTL insn.  There are other ways, but
> those do not help anything here afaics.
> 
>> Do you mean to implement this in “targetm.gen_return” and 
>> “targetm.gen_simple_return”?
> 
> That is the easiest way, yes.
> 
>>> Anything else is extremely fragile.
> 
> 
> Segher



Re: [PATCH] aarch64: Add extend-as-extract-with-shift pattern [PR96998]

2020-09-16 Thread Segher Boessenkool
Hi!

On Thu, Sep 10, 2020 at 07:18:01PM +0100, Richard Sandiford wrote:



> It's all a bit unfortunate really.  Having different representations
> for shifts inside and outside MEMs is the Second Great RTL Mistake.

Yes...  All targets with something that computes shifted addresses (like
in a LEA style insn) needs to have that insn in two representations.

Can this still be fixed?

> (The first was modeless const_ints. :-))

I *like* those.  Almost always.  Stockholm syndrome?

These days a mode on them should not cost much (in storage size).  It
should simplify quite some code, too.

> If we do that, we should be able to remove the handling of
> extract-based addresses in aarch64_classify_index & co.

If we do what?  I don't follow, sorry.

(Patch to combine sounds fine fwiw; patches welcome, as always.)


Segher


Re: [PING][PATCH] improve validation of attribute arguments (PR c/78666)

2020-09-16 Thread Martin Sebor via Gcc-patches

On 9/15/20 5:15 PM, Joseph Myers wrote:

On Wed, 9 Sep 2020, Martin Sebor via Gcc-patches wrote:


Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552500.html

Aldy provided a bunch of comments on this patch but I'm still looking
for a formal approval.


This patch is OK.


Some testing revealed that the code has different semantics for
strings: it compares them including all nuls embedded in them,
while I think the right semantics for attributes is to only consider
strings up and including the first nul (i.e., apply the usual string
semantics).  A test case for this corner case is as follows:

    __attribute__ ((section ("foo\0bar"))) void f (void);
    __attribute__ ((section ("foo"))) void f (void) { }

Without operand_equal_p() GCC accepts this and puts f in section
foo.  With the operand_equal_p() change above, it complains:

ignoring attribute ‘section ("foo\x00bar")’ because it conflicts with
previous ‘section ("foor")’ [-Wattributes]

I would rather not change the behavior in this corner case just to
save a few lines of code.  If it's thought that string arguments
to attributes (some or all) should be interpreted differently than
in other contexts it seems that such a change should be made
separately and documented in the manual.


I think that for at least the section attribute, embedded NULs are
suspect.  In that case, so are wide strings, but for some attributes wide
strings should be accepted and handled sensibly (but aren't, see bug
91182).


I agree.  Clang has the same "bug" as GCC would if it used
operand_equal_p().

Besides embedded nuls, other characters are problematic and cause
(sometimes confusing) assembler errors.  Examples are the empty
string, embedded whitespace, or mismatched quotes.  It would be
nice to do some basic validation and print a nice warning for
the most common problems.

Martin


Re: Problem with static const objects and LTO

2020-09-16 Thread Jeff Law via Gcc-patches

On 9/16/20 11:52 AM, Joseph Myers wrote:
> On Wed, 16 Sep 2020, Jeff Law via Gcc-patches wrote:
>
>> ISTM this is a lot like the problem we have where we inline functions
>> with static data.   To fix those we use STB_GNU_UNIQUE.  But I don't see
>> any code in the C front-end which would utilize STB_GNU_UNIQUE.  It's
>> support seems limited to C++.
>>
>>
>> How is this supposed to work for C?
> C inline functions don't try to address this.  The standard has a rule "An 
> inline definition of a function with external linkage shall not contain a 
> definition of a modifiable object with static or thread storage duration, 
> and shall not contain a reference to an identifier with internal 
> linkage.", which avoids some cases of this, but you can still get multiple 
> copies of objects (with static storage duration, not modifiable, no 
> linkage, i.e. static const defined inside an inline function) as noted in 
> a footnote "Since an inline definition is distinct from the corresponding 
> external definition and from any other corresponding inline definitions in 
> other translation units, all corresponding objects with static storage 
> duration are also distinct in each of the definitions.".

I was kindof guessing C would end up with something like the above --
I'm just happy its as well documented as it is.  Thanks!



The more I ponder this problem the more worried I get.   Before LTO we
could consider the TU an indivisible unit and if the main program
referenced something in the TU, then a static link of that TU would
bring the entire TU into the main program and that would satisfy all
references to all symbols defined by the TU, including those in any DSOs
used by the main program.  So it's an indivisible unit at runtime too. 
I could change internal data structures, rebuild the static library &
main program (leaving the DSOs alone) and it'd just work.


In an LTO world the TU isn't indivisible anymore.  LTO will happily
discard things which don't appear to be used.   So parts of the TU may
be in the main program, other parts may be in DSOs used by the main
program.   This can mean that objects are unexpectedly being passed
across DSO boundaries and the details of those objects has, in effect,
become part of the ABI.  If I was to change an internal data structure,
build the static library and main program we could get bad behavior
because an instance of that data structure could be passed to a DSO
because the main executable is "incomplete" and we end up calling copies
of routines from the (not rebuilt) DSOs.


Jeff

 






pEpkey.asc
Description: application/pgp-keys


Re: [PATCH] preprocessor: Fix ICE with too long line in fmtwarn [PR96935]

2020-09-16 Thread Marek Polacek via Gcc-patches
On Wed, Sep 16, 2020 at 03:15:19PM -0400, David Malcolm via Gcc-patches wrote:
> On Wed, 2020-09-16 at 11:16 -0400, Marek Polacek wrote:
> > Here we ICE in char_span::subspan because the offset it gets is -1.
> > It's -1 because get_substring_ranges_for_loc gets a location whose
> > column was 0.  That only happens in testcases like the attached where
> > we're dealing with extremely long lines (at least 4065 chars it
> > seems).
> > This does happen in practice, though, so it's not just a theoretical
> > problem (e.g. when building the SU2 suite).
> > 
> > Fixed by checking that the column get_substring_ranges_for_loc gets
> > is
> > sane, akin to other checks in that function.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?
> 
> [...]
> 
> Thanks for the patch.
> 
> > index d573b90341a..ea3d4f34220 100644
> > --- a/gcc/input.c
> > +++ b/gcc/input.c
> > @@ -1461,6 +1461,8 @@ get_substring_ranges_for_loc (cpp_reader
> > *pfile,
> >size_t literal_length = finish.column - start.column + 1;
> >  
> >/* Ensure that we don't crash if we got the wrong
> > location.  */
> > +  if (start.column < 1)
> > +   return "line is too long";
> 
> Nit: I believe this could also happen for very large source files when
> locations exceed LINE_MAP_MAX_LOCATION_WITH_COLS.
> 
> So the error message should read "zero start column", or even just
> "start.column < 1" I think.  (if we have a negative column number then
> something has gone badly wrong; not sure if that's expressible via
> #line directives)
> 
> OK with that change.
> Dave

Thanks for the quick review, pushed with the "zero start column" change.

Marek



[OG10] Backporting + merge of GCC 10 into branch

2020-09-16 Thread Tobias Burnus

OG10 = devel/omp/gcc-10

e0e1d281472 Fortran: OpenMP - fix simd with (last)private (PR97061)
13ae89bc9c9 Merge remote-tracking branch 'origin/releases/gcc-10' into 
devel/omp/gcc-10
5a7ae4f073f [OpenMP] Fix mapping of artificial variables (PR94874)
3a6cb32b2c7 OpenMP/Fortran: Permit impure ELEMENTAL in omp directives
2dcc1721877 libgomp/target.c: Silence -Wuninitialized warning
76e74f04063 Merge remote-tracking branch 'origin/releases/gcc-10' into 
devel/omp/gcc-10

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: Commit 052204fac580 was never sent to gcc-patches

2020-09-16 Thread Segher Boessenkool
Hi!

On Wed, Sep 16, 2020 at 08:37:34PM +0200, Andrea Corallo wrote:
> Segher Boessenkool  writes:
> 
> > ... and it causes testsuite regressions on Power.  We haven't determined
> > et if it actually is worse code, but there are testcases that trip on
> > it.  Either way, all patches should be send to gcc-patches@, whether
> > pre-approved or not.
> >
> > Please correct this.  Thanks!
> 
> the patch has been discussed here
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553839.html

Ah, with a different title.  And I also searched for your name, but
for some reason that did not work.  Huh.

> If you think it is better I've no problem with reverting it otherwise
> I'll be happy to help solving the caused issue.

No, please keep it; as I said, we don't yet know if it actually is a
problem for us, it just makes testcases fail.

Thanks,


Segher


Re: [PATCH] preprocessor: Fix ICE with too long line in fmtwarn [PR96935]

2020-09-16 Thread David Malcolm via Gcc-patches
On Wed, 2020-09-16 at 11:16 -0400, Marek Polacek wrote:
> Here we ICE in char_span::subspan because the offset it gets is -1.
> It's -1 because get_substring_ranges_for_loc gets a location whose
> column was 0.  That only happens in testcases like the attached where
> we're dealing with extremely long lines (at least 4065 chars it
> seems).
> This does happen in practice, though, so it's not just a theoretical
> problem (e.g. when building the SU2 suite).
> 
> Fixed by checking that the column get_substring_ranges_for_loc gets
> is
> sane, akin to other checks in that function.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?

[...]

Thanks for the patch.

> index d573b90341a..ea3d4f34220 100644
> --- a/gcc/input.c
> +++ b/gcc/input.c
> @@ -1461,6 +1461,8 @@ get_substring_ranges_for_loc (cpp_reader
> *pfile,
>size_t literal_length = finish.column - start.column + 1;
>  
>/* Ensure that we don't crash if we got the wrong
> location.  */
> +  if (start.column < 1)
> + return "line is too long";

Nit: I believe this could also happen for very large source files when
locations exceed LINE_MAP_MAX_LOCATION_WITH_COLS.

So the error message should read "zero start column", or even just
"start.column < 1" I think.  (if we have a negative column number then
something has gone badly wrong; not sure if that's expressible via
#line directives)

OK with that change.
Dave

>if (line.length () < (start.column - 1 + literal_length))
>   return "line is not wide enough";
>  

[...]



Re: [PING 2][PATCH 2/5] C front end support to detect out-of-bounds accesses to array parameters

2020-09-16 Thread Martin Sebor via Gcc-patches

On 9/15/20 5:02 PM, Joseph Myers wrote:

On Wed, 9 Sep 2020, Martin Sebor via Gcc-patches wrote:


Joseph, do you have any concerns with or comments on the most
recent patch or is it okay as is?

https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552266.html


I'm not yet convinced by the logic for extracting an array bound from a
parameter declared using a typedef for an array type.

Say you have

typedef int A[3];

void f (A *x[*]);

so an argument that is an array, using [*], of pointers to arrays, where
those latter arrays are specified using the typedef.  As I read the logic,
first the pointer declarator is handled (ignored), then the array
declarator results in [*] being stored in spec, then the "if (pd->kind ==
cdk_id)" handling comes into play - and because spec is "*" and vbchain is
NULL_TREE, the upper bound of A gets extracted, but the upper bound of A
should be irrelevant here because it's a type that's the target of a
pointer.  The information from parm->specs->type logically comes before,
not after, the information from the declarator.


I see, you're right.  These arrays of pointers and pointers to arrays
are going to be the death of me...  Thanks for the test case.  I've
tweaked the function some more to handle it and added it to the test
suite.



As far as I can see, if one declaration gets part of the parameter type
(involving VLAs) from a typedef and another declaration gets that part of
the type directly in the declaration, the two spec strings constructed
might differ in the number of VLA bounds mentioned in the spec strings.
Is the code using those strings robust to handling the case where some of
the VLA bounds are missing because they came from a typedef?


Ugh.  I'd completely forgotten those are possible.  I've added code
to the function to handle those as well, and a new test to verify.
Hopefully I didn't open a can of worms by doing that.

Attached is an updated revision of the patch.  Besides the tweaks
above it also contains a cosmetic change to the warning issued
for mismatches in unspecified VLA bounds: it points at the decl
with more of them to guide the user to specify them rather than
make them all unspecified.

Martin
[2/5] - C front end support to detect out-of-bounds accesses to array parameters.

gcc/c-family/ChangeLog:

	PR c/50584
	* c-common.h (warn_parm_array_mismatch): Declare new function.
	(has_attribute): Move declaration of an existing function.
	(build_attr_access_from_parms): Declare new function.
	* c-warn.c (parm_array_as_string): Define new function.
	(plus_one):  Define new function.
	(warn_parm_ptrarray_mismatch): Define new function.
	(warn_parm_array_mismatch):  Define new function.
	(vla_bound_parm_decl): New function.
	* c.opt (-Warray-parameter, -Wvla-parameter): New options.
	* c-pretty-print.c (pp_c_type_qualifier_list): Don't print array type
	qualifiers here...
	(c_pretty_printer::direct_abstract_declarator): ...but instead print
	them in brackets here.  Also print [static].  Strip extraneous
	expressions from VLA bounds.

gcc/c/ChangeLog:

	PR c/50584
	* c-decl.c (lookup_last_decl): Define new function.
	(c_decl_attributes): Call it.
	(start_decl): Add argument and use it.
	(finish_decl): Call build_attr_access_from_parms and decl_attributes.
	(get_parm_array_spec): Define new function.
	(push_parm_decl): Call get_parm_array_spec.
	(start_function): Call warn_parm_array_mismatch.  Build attribute
	access and add it to current function.
	* c-parser.c (c_parser_declaration_or_fndef): Diagnose mismatches
	in forms of array parameters.
	* c-tree.h (start_decl): Add argument.

gcc/ChangeLog:

	PR c/50584
	* calls.c (initialize_argument_information): Remove assertion.
	* doc/invoke.texi (-Warray-parameter, -Wvla-parameter): Document.
	* tree-pretty-print.c (dump_generic_node): Correct handling of
	qualifiers.

gcc/testsuite/ChangeLog:

	PR c/50584
	* c-c++-common/Warray-bounds-6.c: Correct C++ declaration, adjust
	text of expected diagnostics.
	* gcc.dg/Wbuiltin-declaration-mismatch-9.c: Prune expected warning.
	* gcc.dg/Warray-parameter-2.c: New test.
	* gcc.dg/Warray-parameter-3.c: New test.
	* gcc.dg/Warray-parameter-4.c: New test.
	* gcc.dg/Warray-parameter-5.c: New test.
	* gcc.dg/Warray-parameter.c: New test.
	* gcc.dg/Wvla-parameter-2.c: New test.
	* gcc.dg/Wvla-parameter-3.c: New test.
	* gcc.dg/Wvla-parameter-4.c: New test.
	* gcc.dg/Wvla-parameter.c: New test.

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 4fc64bc4aa6..a61b25f95f1 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1321,6 +1321,7 @@ extern void c_do_switch_warnings (splay_tree, location_t, tree, tree, bool);
 extern void warn_for_omitted_condop (location_t, tree);
 extern bool warn_for_restrict (unsigned, tree *, unsigned);
 extern void warn_for_address_or_pointer_of_packed_member (tree, tree);
+extern void warn_parm_array_mismatch (location_t, tree, tree);
 
 /* Places where an lvalue, or modifiable lvalue, may be required.

Re: [PATCH] rs6000: Add rs6000_cfun_pcrel_p

2020-09-16 Thread Bill Schmidt via Gcc-patches

On 9/16/20 12:09 PM, Segher Boessenkool wrote:

On Wed, Sep 16, 2020 at 08:36:35AM -0500, Bill Schmidt wrote:

This is a cleanup requested by Segher in a previous review.  Most
uses of rs6000_pcrel_p are called for the current function.  A
specialized version for cfun is more efficient for these uses.

Then rename that one to rs6000_function_pcrel_p or similar?  People will
keep trying to use the shorter name otherwise ;-)


(Actually all uses are called for the current function, so now
rs6000_pcrel_p is an orphan.  However, I think it's worth keeping
around in case we have a late discovery where we need it.)

(Heck, you could even use that short name for the cfun one, if you
want.  Sorry i didn't think of that before.)


Okay for trunk with or without such further changes.  Thanks!



Great, I will make both of those changes and commit.

Thanks,
Bill




Segher


Re: [PATCH] Allow copying of symbolic ranges to an irange.

2020-09-16 Thread Andrew MacLeod via Gcc-patches

On 9/16/20 12:25 PM, Aldy Hernandez wrote:



>>  // Swap min/max if they are out of order.  Return TRUE if further
> these seems OK, but can't there be anti-ranges with symbolics  too? ie
> ~[a_12, a_12]
> The code for that just does:
>
>   else if (src.kind () == VR_ANTI_RANGE)
>  set (src.min (), src.max (), VR_ANTI_RANGE);
>
> That should just go to varying I guess?

Ah, you're right.  I've tweaked the patch and have added a 
corresponding test.


>
> The conversion to legacy anti-range code also seems a little suspect in
> some cases...
>
> Finally, we theoretically shouldn't be accessing 'min()' and 'max()'
> fields in a multirange, which also looks like might happen in the final
> else clause.
>
> I wonder if it might be less complex to simply have 2 routines, like
> copy_to_legacy()  and copy_from_legacy()  instead of trying to handle
> then together?  I do find it seems to require more thinking than it
> should to follow the cases :-)

Sigh, yes.  That code is too clever for its own good.  I'll work on it 
as a follow-up.


OK pending tests?


OK.  but do follow it up.

Andrew



[pushed] libbacktrace, Mach-O : Support PowerPC archs.

2020-09-16 Thread Iain Sandoe

Hi,

This adds the PPC architecture variants for Mach-O libbacktrace.

With this (as for X86 and Arm) when dsymutil is run on the binary
we get a basic usable backtrace.

Testsuite results on powerpc-apple-darwin9 are the same as for X86:
 * btest fails (TBC why)
 * dwarf5 tests fail because dsymutil does not handle that so far.

tested on x86_64-darwin16, powerpc-darwin9,
approved by Ian on a github pull request.
pushed to master,
thanks
Iain

libbacktrace/ChangeLog:

* macho.c (MACH_O_CPU_TYPE_PPC): New.
(MACH_O_CPU_TYPE_PPC64): New.
Add compile-tests for powerpc to the Mach-O variants.
---
 libbacktrace/macho.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libbacktrace/macho.c b/libbacktrace/macho.c
index 241d54b5e5e..1f8bc50238f 100644
--- a/libbacktrace/macho.c
+++ b/libbacktrace/macho.c
@@ -128,9 +128,11 @@ struct macho_fat_arch_64

 #define MACH_O_CPU_TYPE_X86 7
 #define MACH_O_CPU_TYPE_ARM 12
+#define MACH_O_CPU_TYPE_PPC 18

 #define MACH_O_CPU_TYPE_X86_64 (MACH_O_CPU_TYPE_X86 | MACH_O_CPU_ARCH_ABI64)
 #define MACH_O_CPU_TYPE_ARM64  (MACH_O_CPU_TYPE_ARM | MACH_O_CPU_ARCH_ABI64)
+#define MACH_O_CPU_TYPE_PPC64  (MACH_O_CPU_TYPE_PPC | MACH_O_CPU_ARCH_ABI64)

 /* The header of a load command.  */

@@ -776,6 +778,10 @@ macho_add_fat (struct backtrace_state *state, const  
char *filename,

   cputype = MACH_O_CPU_TYPE_ARM64;
 #elif defined (__arm__)
   cputype = MACH_O_CPU_TYPE_ARM;
+#elif defined (__ppc__)
+  cputype = MACH_O_CPU_TYPE_PPC;
+#elif defined (__ppc64__)
+  cputype = MACH_O_CPU_TYPE_PPC64;
 #else
   error_callback (data, "unknown Mach-O architecture", 0);
   goto fail;
--
2.24.1




Re: c++: local-scope OMP UDR reductions have no template head

2020-09-16 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 16, 2020 at 02:38:22PM -0400, Nathan Sidwell wrote:
> On 9/16/20 2:31 PM, Jakub Jelinek wrote:
> > On Wed, Sep 16, 2020 at 02:11:32PM -0400, Nathan Sidwell wrote:
> > > Jakub,
> > > are you ok with the bool return from cp_check_omp_declare_reduction? That
> > > seemed like the least invasive change.
> > > 
> > > This corrects the earlier problems with removing the template header
> > > from local omp reductions.  And it uncovered a latent bug.  When we
> > > tsubst such a decl, we immediately tsubst its body.
> > > cp_check_omp_declare_reduction gets a success return value to gate
> > > that instantiation.
> > > 
> > > udr-2.C   got a further error, as the omp checking machinery doesn't
> > > appear to turn the reduction into an error mark   when failing.  I
> > > didn't dig into that further.  udr-3.C appears to have been invalid
> > > and accidentally worked.
> > 
> > The attached patch doesn't match the ChangeLog...
> 
> here's the right one,  problem with ordering by time or name :(

LGTM, if both make check-c++-all RUNTESTFLAGS='gomp.exp goacc.exp'
in gcc/ and make check RUNTESTFLAGS=c++.exp
in libgomp/ build dirs shows any regressions, no objections.

Jakub



Re: c++: local-scope OMP UDR reductions have no template head

2020-09-16 Thread Nathan Sidwell

On 9/16/20 2:31 PM, Jakub Jelinek wrote:

On Wed, Sep 16, 2020 at 02:11:32PM -0400, Nathan Sidwell wrote:

Jakub,
are you ok with the bool return from cp_check_omp_declare_reduction? That
seemed like the least invasive change.

This corrects the earlier problems with removing the template header
from local omp reductions.  And it uncovered a latent bug.  When we
tsubst such a decl, we immediately tsubst its body.
cp_check_omp_declare_reduction gets a success return value to gate
that instantiation.

udr-2.C got a further error, as the omp checking machinery doesn't
appear to turn the reduction into an error mark when failing.  I
didn't dig into that further.  udr-3.C appears to have been invalid
and accidentally worked.


The attached patch doesn't match the ChangeLog...


here's the right one,  problem with ordering by time or name :(

nathan

--
Nathan Sidwell
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index 6e4de7d0c4b..5b727348e51 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -7228,7 +7228,7 @@ extern void simplify_aggr_init_expr		(tree *);
 extern void finalize_nrv			(tree *, tree, tree);
 extern tree omp_reduction_id			(enum tree_code, tree, tree);
 extern tree cp_remove_omp_priv_cleanup_stmt	(tree *, int *, void *);
-extern void cp_check_omp_declare_reduction	(tree);
+extern bool cp_check_omp_declare_reduction	(tree);
 extern void finish_omp_declare_simd_methods	(tree);
 extern tree finish_omp_clauses			(tree, enum c_omp_region_type);
 extern tree push_omp_privatization_clauses	(bool);
diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c
index 289452ab740..cfe5ff4a94f 100644
--- i/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -227,6 +227,7 @@ static tree canonicalize_expr_argument (tree, tsubst_flags_t);
 static tree make_argument_pack (tree);
 static void register_parameter_specializations (tree, tree);
 static tree enclosing_instantiation_of (tree tctx);
+static void instantiate_body (tree pattern, tree args, tree d, bool nested);
 
 /* Make the current scope suitable for access checking when we are
processing T.  T can be FUNCTION_DECL for instantiated function
@@ -6073,10 +6074,7 @@ push_template_decl_real (tree decl, bool is_friend)
 	retrofit_lang_decl (decl);
   if (DECL_LANG_SPECIFIC (decl)
 	  && !(VAR_OR_FUNCTION_DECL_P (decl)
-	   && DECL_LOCAL_DECL_P (decl)
-	   /* OMP reductions still need a template header.  */
-	   && !(TREE_CODE (decl) == FUNCTION_DECL
-		&& DECL_OMP_DECLARE_REDUCTION_P (decl
+	   && DECL_LOCAL_DECL_P (decl)))
 	DECL_TEMPLATE_INFO (decl) = info;
 }
 
@@ -13714,8 +13712,7 @@ tsubst_function_decl (tree t, tree args, tsubst_flags_t complain,
   gcc_assert (DECL_TEMPLATE_INFO (t) != NULL_TREE
 	  || DECL_LOCAL_DECL_P (t));
 
-  if (DECL_LOCAL_DECL_P (t)
-  && !DECL_OMP_DECLARE_REDUCTION_P (t))
+  if (DECL_LOCAL_DECL_P (t))
 {
   if (tree spec = retrieve_local_specialization (t))
 	return spec;
@@ -13970,8 +13967,7 @@ tsubst_function_decl (tree t, tree args, tsubst_flags_t complain,
 	  && !uses_template_parms (argvec))
 	tsubst_default_arguments (r, complain);
 }
-  else if (DECL_LOCAL_DECL_P (r)
-	   && !DECL_OMP_DECLARE_REDUCTION_P (r))
+  else if (DECL_LOCAL_DECL_P (r))
 {
   if (!cp_unevaluated_operand)
 	register_local_specialization (r, t);
@@ -18083,7 +18079,8 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
 		DECL_CONTEXT (decl) = global_namespace;
 		pushdecl (decl);
 		DECL_CONTEXT (decl) = current_function_decl;
-		cp_check_omp_declare_reduction (decl);
+		if (cp_check_omp_declare_reduction (decl))
+		  instantiate_body (pattern_decl, args, decl, true);
 		  }
 		else
 		  {
@@ -25448,15 +25445,24 @@ register_parameter_specializations (tree pattern, tree inst)
 }
 
 /* Instantiate the body of D using PATTERN with ARGS.  We have
-   already determined PATTERN is the correct template to use.  */
+   already determined PATTERN is the correct template to use.
+   NESTED_P is true if this is a nested function, in which case
+   PATTERN will be a FUNCTION_DECL not a TEMPLATE_DECL.  */
 
 static void
-instantiate_body (tree pattern, tree args, tree d)
+instantiate_body (tree pattern, tree args, tree d, bool nested_p)
 {
-  gcc_checking_assert (TREE_CODE (pattern) == TEMPLATE_DECL);
-  
-  tree td = pattern;
-  tree code_pattern = DECL_TEMPLATE_RESULT (td);
+  tree td = NULL_TREE;
+  tree code_pattern = pattern;
+
+  if (!nested_p)
+{
+  td = pattern;
+  code_pattern = DECL_TEMPLATE_RESULT (td);
+}
+  else
+/* Only OMP reductions are nested.  */
+gcc_checking_assert (DECL_OMP_DECLARE_REDUCTION_P (code_pattern));
 
   vec omp_privatization_save;
   if (current_function_decl)
@@ -25489,9 +25495,10 @@ instantiate_body (tree pattern, tree args, tree d)
  instantiate_decl do not try to instantiate it again.  */
   DECL_TEMPLATE_INSTANTIATED (d) = 1;
 
-  /* Regenerate the declaration in case the template has been modified
- by a 

Re: Commit 052204fac580 was never sent to gcc-patches

2020-09-16 Thread Andrea Corallo
Segher Boessenkool  writes:

> ... and it causes testsuite regressions on Power.  We haven't determined
> et if it actually is worse code, but there are testcases that trip on
> it.  Either way, all patches should be send to gcc-patches@, whether
> pre-approved or not.
>
> Please correct this.  Thanks!

Hi Segher,

the patch has been discussed here
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553839.html

If you think it is better I've no problem with reverting it otherwise
I'll be happy to help solving the caused issue.

Regards

  Andrea


Re: c++: local-scope OMP UDR reductions have no template head

2020-09-16 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 16, 2020 at 02:11:32PM -0400, Nathan Sidwell wrote:
> Jakub,
> are you ok with the bool return from cp_check_omp_declare_reduction? That
> seemed like the least invasive change.
> 
> This corrects the earlier problems with removing the template header
> from local omp reductions.  And it uncovered a latent bug.  When we
> tsubst such a decl, we immediately tsubst its body.
> cp_check_omp_declare_reduction gets a success return value to gate
> that instantiation.
> 
> udr-2.C   got a further error, as the omp checking machinery doesn't
> appear to turn the reduction into an error mark   when failing.  I
> didn't dig into that further.  udr-3.C appears to have been invalid
> and accidentally worked.

The attached patch doesn't match the ChangeLog...

> gcc/cp/
> * cp-tree.h (cp_check_omp_declare_reduction): Return bool.
> * semantics.c (cp_check_omp_declare_reduction): Return true on for
> success.
> * pt.c (push_template_decl_real): OMP reductions do not get a
> template header.
> (tsubst_function_decl): Remove special casing for local decl omp
> reductions.
>   (tsubst_expr): Call instantiate_body for a local omp reduction.
>   (instantiate_body): Add nested_p parm, and deal with such
>   instantiations.
>   (instantiate_decl): Reject FUNCTION_SCOPE entities, adjust
>   instantiate_body call.
>   gcc/testsuite/
>   * g++.dg/gomp/udr-2.C: Add additional expected error.
> libgomp/
> * testsuite/libgomp.c++/udr-3.C: Add missing ctor.
> 
> -- 
> Nathan Sidwell

> diff --git i/gcc/c-family/c-opts.c w/gcc/c-family/c-opts.c
> index 23934416f64..24e21ce0d0c 100644
> --- i/gcc/c-family/c-opts.c
> +++ w/gcc/c-family/c-opts.c
> @@ -1129,7 +1129,10 @@ c_common_post_options (const char **pfilename)
>input_location = UNKNOWN_LOCATION;
>  
>*pfilename = this_input_filename
> -= cpp_read_main_file (parse_in, in_fnames[0], !cpp_opts->preprocessed);
> += cpp_read_main_file (parse_in, in_fnames[0],
> +   /* We'll inject preamble pieces if this is
> +  not preprocessed.  */
> +   !cpp_opts->preprocessed);
>/* Don't do any compilation or preprocessing if there is no input file.  */
>if (this_input_filename == NULL)
>  {
> @@ -1453,6 +1456,7 @@ c_finish_options (void)
>   = linemap_check_ordinary (linemap_add (line_table, LC_RENAME, 0,
>  _(""), 0));
>cb_file_change (parse_in, bltin_map);
> +  linemap_line_start (line_table, 0, 1);
>  
>/* Make sure all of the builtins about to be declared have
>BUILTINS_LOCATION has their location_t.  */
> @@ -1476,6 +1480,7 @@ c_finish_options (void)
>   = linemap_check_ordinary (linemap_add (line_table, LC_RENAME, 0,
>  _(""), 0));
>cb_file_change (parse_in, cmd_map);
> +  linemap_line_start (line_table, 0, 1);
>  
>/* All command line defines must have the same location.  */
>cpp_force_token_locations (parse_in, cmd_map->start_location);


Jakub



Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB with SIMT LANE [PR95654]

2020-09-16 Thread Alexander Monakov via Gcc-patches



On Wed, 16 Sep 2020, Tom de Vries wrote:

> [ cc-ing author omp support for nvptx. ]

The issue looks familiar. I recognized it back in 2017 (and LLVM people
recognized it too for their GPU targets). In an attempt to get agreement
to fix the issue "properly" for GCC I found a similar issue that affects
all targets, not just offloading, and filed it as PR 80053.

(yes, there are no addressable labels involved in offloading, but nevertheless
the nature of the middle-end issue is related)

Alexander


Commit 052204fac580 was never sent to gcc-patches

2020-09-16 Thread Segher Boessenkool
... and it causes testsuite regressions on Power.  We haven't determined
et if it actually is worse code, but there are testcases that trip on
it.  Either way, all patches should be send to gcc-patches@, whether
pre-approved or not.

Please correct this.  Thanks!


Segher


c++: local-scope OMP UDR reductions have no template head

2020-09-16 Thread Nathan Sidwell

Jakub,
are you ok with the bool return from cp_check_omp_declare_reduction? 
That seemed like the least invasive change.


This corrects the earlier problems with removing the template header
from local omp reductions.  And it uncovered a latent bug.  When we
tsubst such a decl, we immediately tsubst its body.
cp_check_omp_declare_reduction gets a success return value to gate
that instantiation.

udr-2.C got a further error, as the omp checking machinery doesn't
appear to turn the reduction into an error mark when failing.  I
didn't dig into that further.  udr-3.C appears to have been invalid
and accidentally worked.

gcc/cp/
* cp-tree.h (cp_check_omp_declare_reduction): Return bool.
* semantics.c (cp_check_omp_declare_reduction): Return true on for
success.
* pt.c (push_template_decl_real): OMP reductions do not get a
template header.
(tsubst_function_decl): Remove special casing for local decl omp
reductions.
(tsubst_expr): Call instantiate_body for a local omp reduction.
(instantiate_body): Add nested_p parm, and deal with such
instantiations.
(instantiate_decl): Reject FUNCTION_SCOPE entities, adjust
instantiate_body call.
gcc/testsuite/
* g++.dg/gomp/udr-2.C: Add additional expected error.
libgomp/
* testsuite/libgomp.c++/udr-3.C: Add missing ctor.

--
Nathan Sidwell
diff --git i/gcc/c-family/c-opts.c w/gcc/c-family/c-opts.c
index 23934416f64..24e21ce0d0c 100644
--- i/gcc/c-family/c-opts.c
+++ w/gcc/c-family/c-opts.c
@@ -1129,7 +1129,10 @@ c_common_post_options (const char **pfilename)
   input_location = UNKNOWN_LOCATION;
 
   *pfilename = this_input_filename
-= cpp_read_main_file (parse_in, in_fnames[0], !cpp_opts->preprocessed);
+= cpp_read_main_file (parse_in, in_fnames[0],
+			  /* We'll inject preamble pieces if this is
+			 not preprocessed.  */
+			  !cpp_opts->preprocessed);
   /* Don't do any compilation or preprocessing if there is no input file.  */
   if (this_input_filename == NULL)
 {
@@ -1453,6 +1456,7 @@ c_finish_options (void)
 	= linemap_check_ordinary (linemap_add (line_table, LC_RENAME, 0,
 	   _(""), 0));
   cb_file_change (parse_in, bltin_map);
+  linemap_line_start (line_table, 0, 1);
 
   /* Make sure all of the builtins about to be declared have
 	 BUILTINS_LOCATION has their location_t.  */
@@ -1476,6 +1480,7 @@ c_finish_options (void)
 	= linemap_check_ordinary (linemap_add (line_table, LC_RENAME, 0,
 	   _(""), 0));
   cb_file_change (parse_in, cmd_map);
+  linemap_line_start (line_table, 0, 1);
 
   /* All command line defines must have the same location.  */
   cpp_force_token_locations (parse_in, cmd_map->start_location);


Re: Problem with static const objects and LTO

2020-09-16 Thread Joseph Myers
On Wed, 16 Sep 2020, Jeff Law via Gcc-patches wrote:

> ISTM this is a lot like the problem we have where we inline functions
> with static data.   To fix those we use STB_GNU_UNIQUE.  But I don't see
> any code in the C front-end which would utilize STB_GNU_UNIQUE.  It's
> support seems limited to C++.
> 
> 
> How is this supposed to work for C?

C inline functions don't try to address this.  The standard has a rule "An 
inline definition of a function with external linkage shall not contain a 
definition of a modifiable object with static or thread storage duration, 
and shall not contain a reference to an identifier with internal 
linkage.", which avoids some cases of this, but you can still get multiple 
copies of objects (with static storage duration, not modifiable, no 
linkage, i.e. static const defined inside an inline function) as noted in 
a footnote "Since an inline definition is distinct from the corresponding 
external definition and from any other corresponding inline definitions in 
other translation units, all corresponding objects with static storage 
duration are also distinct in each of the definitions.".

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Problem with static const objects and LTO

2020-09-16 Thread Jeff Law via Gcc-patches

On 9/16/20 11:32 AM, H.J. Lu wrote:
> On Wed, Sep 16, 2020 at 10:24 AM Jeff Law  wrote:
>>
>> On 9/16/20 11:13 AM, H.J. Lu wrote:
>>> On Wed, Sep 16, 2020 at 10:10 AM Jeff Law  wrote:
 On 9/16/20 11:05 AM, H.J. Lu wrote:
> On Wed, Sep 16, 2020 at 9:53 AM Jeff Law via Gcc-patches
>  wrote:
>> Consider a TU with file scoped "static const object utf8_sb_map".   A
>> routine within the TU will stuff _sb_map into an object, something
>> like:
>>
>> fu (...)
>>
>> {
>>
>>   if (cond)
>>
>> dfa->sb_char = utf8_sb_map;
>>
>>   else
>>
>> dfa->sb_char = malloc (...);
>>
>> }
>>
>>
>> There is another routine in the TU which looks like
>>
>> bar (...)
>>
>> {
>>
>>   if (dfa->sb_char != utf8_sb_map)
>>
>> free (dfa->sb_char);
>>
>> }
>>
>>
>> Now imagine that the TU is compiled (with LTO) into a static library,
>> libgl.a and there's a DSO (libdso.so) which gets linked against libgl.a
>> and references the first routine (fu).  We get a copy of fu in the DSO
>> along with a copy of utf8_sb_map.
>>
>>
>> Then imagine there's a main executable that dynamicly links against
>> libdso.so, then links statically against libgl.a.  Assume the  main
>> executable does not directly reference fu(), but does call a routine in
>> libdso.so which eventually calls fu().  Also assume the main executable
>> directly calls bar().  Again, remember we're compiling with LTO, so we
>> don't suck in the entire TU, just the routines/data we need.
>>
>>
>> In this scenario, both libdso.so and the main executable are going to a
>> copy of utf8_sb_map and they'll be at different addresses.  So when the
>> main executable calls into libdso.so which in turn calls libdso's copy
>> of fu() which stuffs the address of utf8_sb_map from the DSO into
>> dfa->sb_char.  Later the main executable calls bar() that's in the main
>> executable.  It does the comparison to see if dfa->sb_char is equal to
>> utf8_sb_map -- but it's using the main executable's copy of utf8_sb_map
>> and naturally free() blows us because it was passed a static object, not
>> a malloc'd object.
>>
>>
>> ISTM this is a lot like the problem we have where we inline functions
>> with static data.   To fix those we use STB_GNU_UNIQUE.  But I don't see
>> any code in the C front-end which would utilize STB_GNU_UNIQUE.  It's
>> support seems limited to C++.
>>
>>
>> How is this supposed to work for C?
>>
>>
>> Jeff
>>
>>
> Can you group utf8_sb_map, fu and bar together so that they are defined
> together?
 They're all defined within the same TU in gnulib.  It's the LTO
 dead/unreachable code elimination that results in just parts of the TU
 being copied into the DSO and a different set copied into the main
 executable.  In many ways LTO makes this look a lot like the static data
 member problems we've had to deal with in the C++ world.
>>> In this case, LTO should treat them as in a single group.   Removing
>>> one group member should remove the whole group.  Keep one member
>>> should keep the whole group.
>> Do you mean ensure they're all in a partition together?  I think that
>> might work in the immediate term, but is probably brittle in the long
>> term.  I'd tend to lean towards forcing these static data objects to be
>> STB_GNU_UNIQUE -- that seems more robust to me.
> Isn't STB_GNU_UNIQUE binding global?  How does it work with
>
> static const int foo;
>
> and
>
> static const double foo;
>
> in different files?

It's global in the sense that it uniqifies across TU boundaries, which
is its purpose.  Consider a function scoped static object in a template
or C++ inlined function.  We can end up with multiple copies in
different TUs and we have to collapse them to a single representative
object.  We'd need to do the same thing here.  Without thinking a whole
lot about it, but ISTM that if LTO pulls in a function that references a
static object, then that static object would need to be STB_GNU_UNIQUE.


For something like you've shown above, I'd *hope* we give an ODR error :-)


Jeff



pEpkey.asc
Description: application/pgp-keys


[PATCH] [OG10] Xfail libgomp.oacc-fortran/privatized-ref-2.f90 when offloading to nvptx

2020-09-16 Thread Kwok Cheung Yeung

Hello

I have committed this patch to xfail libgomp.oacc-fortran/privatized-ref-2.f90 
when the offload target is nvptx, as the generated code has some alloca calls 
which are currently not supported by nvptx (PR65181).


Kwok

commit 1245f6f615fa08d2ab4165598c9db72c4dad4467
Author: Kwok Cheung Yeung 
Date:   Wed Sep 16 10:19:35 2020 -0700

XFAIL libgomp.oacc-fortran/privatized-ref-2.f90 on nvptx

The testcase uses alloca, which is not currently supported on nvptx
(see PR65181).

2020-09-16  Kwok Cheung Yeung  

libgomp/
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: XFAIL on nvptx.

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 6e313aa..890a4e2 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,9 @@
 2020-09-16  Kwok Cheung Yeung  
 
+   * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: XFAIL on nvptx.
+
+2020-09-16  Kwok Cheung Yeung  
+
* testsuite/libgomp.oacc-c++/privatized-ref-2.C (workers, vectors):
Reduce number of workers to 16.
* testsuite/libgomp.oacc-c++/privatized-ref-3.C (workers, vectors):
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
index ca8fbe8..658ab9e 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
@@ -1,4 +1,5 @@
 ! { dg-do run }
+! { dg-xfail-if "no alloca support" { offload_target_nvptx } }
 
 program main
   implicit none (type, external)


Re: [PATCH] c++: premature analysis of requires-expression [PR96410]

2020-09-16 Thread Patrick Palka via Gcc-patches
On Thu, 13 Aug 2020, Jason Merrill wrote:

> On 8/13/20 11:21 AM, Patrick Palka wrote:
> > On Mon, 10 Aug 2020, Jason Merrill wrote:
> > 
> > > On 8/10/20 2:18 PM, Patrick Palka wrote:
> > > > On Mon, 10 Aug 2020, Patrick Palka wrote:
> > > > 
> > > > > In the below testcase, semantic analysis of the requires-expressions
> > > > > in
> > > > > the generic lambda must be delayed until instantiation of the lambda
> > > > > because the requirements depend on the lambda's template arguments.
> > > > > But
> > > > > tsubst_requires_expr does semantic analysis even during regeneration
> > > > > of
> > > > > the lambda, which leads to various bogus errors and ICEs since some
> > > > > subroutines aren't prepared to handle dependent/template trees.
> > > > > 
> > > > > This patch adjusts subroutines of tsubst_requires_expr to avoid doing
> > > > > some problematic semantic analyses when processing_template_decl.
> > > > > In particular, expr_noexcept_p generally can't be checked on a
> > > > > dependent
> > > > > expression.  Next, tsubst_nested_requirement should avoid checking
> > > > > satisfaction when processing_template_decl.  And similarly for
> > > > > convert_to_void (called from tsubst_valid_expression_requirement).
> > > 
> > > I wonder if, instead of trying to do a partial substitution into a
> > > requires-expression at all, we want to use the
> > > PACK_EXPANSION_EXTRA_ARGS/IF_STMT_EXTRA_ARGS mechanism to remember the
> > > arguments for later satisfaction?

Like this?

-- >8 --

Subject: [PATCH] c++: requires-expressions and partial instantiation [PR96410]

This patch makes tsubst_requires_expr avoid substituting into a
requires-expression when partially instantiating a generic lambda.  This
is necessary in general to ensure that we check its requirements in
lexical order (see the first testcase below).  Instead, template
arguments are saved in the requires-expression until all arguments can
be substituted into it at once, using a mechanism similar to
PACK_EXPANSION_EXTRA_ARGS.

This change incidentally also fixes the two mentioned PRs -- the problem
there is that tsubst_requires_expr was performing semantic checks on
templated trees, and some of the checks are not prepared to handle such
trees.  With this patch tsubst_requires_expr, no longer does any
semantic checking at all when processing_template_decl.

gcc/cp/ChangeLog:

PR c++/96409
PR c++/96410
* constraint.cc (tsubst_requires_expr): Use REQUIRES_EXPR_PARMS
and REQUIRES_EXPR_REQS.  Use REQUIRES_EXPR_EXTRA_ARGS and
add_extra_args to defer substitution until we have all the
template arguments.
(finish_requires_expr): Adjust the call to build_min so that
REQUIRES_EXPR_EXTRA_ARGS gets set to NULL_TREE.
* cp-tree.def (REQUIRES_EXPR): Give it a third operand.
* cp-tree.h (REQUIRES_EXPR_PARMS, REQUIRES_EXPR_REQS,
REQUIRES_EXPR_EXTRA_ARGS): Define.
(add_extra_args): Declare.

gcc/testsuite/ChangeLog:

PR c++/96409
PR c++/96410
* g++.dg/cpp2a/concepts-lambda13.C: New test.
* g++.dg/cpp2a/concepts-lambda14.C: New test.
---
 gcc/cp/constraint.cc  | 21 +++-
 gcc/cp/cp-tree.def|  8 +++---
 gcc/cp/cp-tree.h  | 16 
 .../g++.dg/cpp2a/concepts-lambda13.C  | 18 +
 .../g++.dg/cpp2a/concepts-lambda14.C  | 25 +++
 5 files changed, 79 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda13.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda14.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index ad9d47070e3..9a06d763554 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2175,7 +2175,19 @@ tsubst_requires_expr (tree t, tree args,
   /* A requires-expression is an unevaluated context.  */
   cp_unevaluated u;
 
-  tree parms = TREE_OPERAND (t, 0);
+  args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args);
+  if (processing_template_decl)
+{
+  /* We're partially instantiating a generic lambda.  Substituting into
+this requires-expression now may cause its requirements to get
+checked out of order, so instead just remember the template
+arguments and wait until we can substitute them all at once.  */
+  t = copy_node (t);
+  REQUIRES_EXPR_EXTRA_ARGS (t) = args;
+  return t;
+}
+
+  tree parms = REQUIRES_EXPR_PARMS (t);
   if (parms)
 {
   parms = tsubst_constraint_variables (parms, args, info);
@@ -2183,14 +2195,11 @@ tsubst_requires_expr (tree t, tree args,
return boolean_false_node;
 }
 
-  tree reqs = TREE_OPERAND (t, 1);
+  tree reqs = REQUIRES_EXPR_REQS (t);
   reqs = tsubst_requirement_body (reqs, args, info);
   if (reqs == error_mark_node)
 return boolean_false_node;
 
-  if (processing_template_decl)
-return finish_requires_expr 

Re: Problem with static const objects and LTO

2020-09-16 Thread H.J. Lu via Gcc-patches
On Wed, Sep 16, 2020 at 10:24 AM Jeff Law  wrote:
>
>
> On 9/16/20 11:13 AM, H.J. Lu wrote:
> > On Wed, Sep 16, 2020 at 10:10 AM Jeff Law  wrote:
> >>
> >> On 9/16/20 11:05 AM, H.J. Lu wrote:
> >>> On Wed, Sep 16, 2020 at 9:53 AM Jeff Law via Gcc-patches
> >>>  wrote:
>  Consider a TU with file scoped "static const object utf8_sb_map".   A
>  routine within the TU will stuff _sb_map into an object, something
>  like:
> 
>  fu (...)
> 
>  {
> 
>    if (cond)
> 
>  dfa->sb_char = utf8_sb_map;
> 
>    else
> 
>  dfa->sb_char = malloc (...);
> 
>  }
> 
> 
>  There is another routine in the TU which looks like
> 
>  bar (...)
> 
>  {
> 
>    if (dfa->sb_char != utf8_sb_map)
> 
>  free (dfa->sb_char);
> 
>  }
> 
> 
>  Now imagine that the TU is compiled (with LTO) into a static library,
>  libgl.a and there's a DSO (libdso.so) which gets linked against libgl.a
>  and references the first routine (fu).  We get a copy of fu in the DSO
>  along with a copy of utf8_sb_map.
> 
> 
>  Then imagine there's a main executable that dynamicly links against
>  libdso.so, then links statically against libgl.a.  Assume the  main
>  executable does not directly reference fu(), but does call a routine in
>  libdso.so which eventually calls fu().  Also assume the main executable
>  directly calls bar().  Again, remember we're compiling with LTO, so we
>  don't suck in the entire TU, just the routines/data we need.
> 
> 
>  In this scenario, both libdso.so and the main executable are going to a
>  copy of utf8_sb_map and they'll be at different addresses.  So when the
>  main executable calls into libdso.so which in turn calls libdso's copy
>  of fu() which stuffs the address of utf8_sb_map from the DSO into
>  dfa->sb_char.  Later the main executable calls bar() that's in the main
>  executable.  It does the comparison to see if dfa->sb_char is equal to
>  utf8_sb_map -- but it's using the main executable's copy of utf8_sb_map
>  and naturally free() blows us because it was passed a static object, not
>  a malloc'd object.
> 
> 
>  ISTM this is a lot like the problem we have where we inline functions
>  with static data.   To fix those we use STB_GNU_UNIQUE.  But I don't see
>  any code in the C front-end which would utilize STB_GNU_UNIQUE.  It's
>  support seems limited to C++.
> 
> 
>  How is this supposed to work for C?
> 
> 
>  Jeff
> 
> 
> >>> Can you group utf8_sb_map, fu and bar together so that they are defined
> >>> together?
> >> They're all defined within the same TU in gnulib.  It's the LTO
> >> dead/unreachable code elimination that results in just parts of the TU
> >> being copied into the DSO and a different set copied into the main
> >> executable.  In many ways LTO makes this look a lot like the static data
> >> member problems we've had to deal with in the C++ world.
> > In this case, LTO should treat them as in a single group.   Removing
> > one group member should remove the whole group.  Keep one member
> > should keep the whole group.
>
> Do you mean ensure they're all in a partition together?  I think that
> might work in the immediate term, but is probably brittle in the long
> term.  I'd tend to lean towards forcing these static data objects to be
> STB_GNU_UNIQUE -- that seems more robust to me.

Isn't STB_GNU_UNIQUE binding global?  How does it work with

static const int foo;

and

static const double foo;

in different files?

-- 
H.J.


Re: Problem with static const objects and LTO

2020-09-16 Thread Jeff Law via Gcc-patches

On 9/16/20 11:13 AM, H.J. Lu wrote:
> On Wed, Sep 16, 2020 at 10:10 AM Jeff Law  wrote:
>>
>> On 9/16/20 11:05 AM, H.J. Lu wrote:
>>> On Wed, Sep 16, 2020 at 9:53 AM Jeff Law via Gcc-patches
>>>  wrote:
 Consider a TU with file scoped "static const object utf8_sb_map".   A
 routine within the TU will stuff _sb_map into an object, something
 like:

 fu (...)

 {

   if (cond)

 dfa->sb_char = utf8_sb_map;

   else

 dfa->sb_char = malloc (...);

 }


 There is another routine in the TU which looks like

 bar (...)

 {

   if (dfa->sb_char != utf8_sb_map)

 free (dfa->sb_char);

 }


 Now imagine that the TU is compiled (with LTO) into a static library,
 libgl.a and there's a DSO (libdso.so) which gets linked against libgl.a
 and references the first routine (fu).  We get a copy of fu in the DSO
 along with a copy of utf8_sb_map.


 Then imagine there's a main executable that dynamicly links against
 libdso.so, then links statically against libgl.a.  Assume the  main
 executable does not directly reference fu(), but does call a routine in
 libdso.so which eventually calls fu().  Also assume the main executable
 directly calls bar().  Again, remember we're compiling with LTO, so we
 don't suck in the entire TU, just the routines/data we need.


 In this scenario, both libdso.so and the main executable are going to a
 copy of utf8_sb_map and they'll be at different addresses.  So when the
 main executable calls into libdso.so which in turn calls libdso's copy
 of fu() which stuffs the address of utf8_sb_map from the DSO into
 dfa->sb_char.  Later the main executable calls bar() that's in the main
 executable.  It does the comparison to see if dfa->sb_char is equal to
 utf8_sb_map -- but it's using the main executable's copy of utf8_sb_map
 and naturally free() blows us because it was passed a static object, not
 a malloc'd object.


 ISTM this is a lot like the problem we have where we inline functions
 with static data.   To fix those we use STB_GNU_UNIQUE.  But I don't see
 any code in the C front-end which would utilize STB_GNU_UNIQUE.  It's
 support seems limited to C++.


 How is this supposed to work for C?


 Jeff


>>> Can you group utf8_sb_map, fu and bar together so that they are defined
>>> together?
>> They're all defined within the same TU in gnulib.  It's the LTO
>> dead/unreachable code elimination that results in just parts of the TU
>> being copied into the DSO and a different set copied into the main
>> executable.  In many ways LTO makes this look a lot like the static data
>> member problems we've had to deal with in the C++ world.
> In this case, LTO should treat them as in a single group.   Removing
> one group member should remove the whole group.  Keep one member
> should keep the whole group.

Do you mean ensure they're all in a partition together?  I think that
might work in the immediate term, but is probably brittle in the long
term.  I'd tend to lean towards forcing these static data objects to be
STB_GNU_UNIQUE -- that seems more robust to me.


jeff


>


pEpkey.asc
Description: application/pgp-keys


Re: Problem with static const objects and LTO

2020-09-16 Thread H.J. Lu via Gcc-patches
On Wed, Sep 16, 2020 at 10:10 AM Jeff Law  wrote:
>
>
> On 9/16/20 11:05 AM, H.J. Lu wrote:
> > On Wed, Sep 16, 2020 at 9:53 AM Jeff Law via Gcc-patches
> >  wrote:
> >>
> >> Consider a TU with file scoped "static const object utf8_sb_map".   A
> >> routine within the TU will stuff _sb_map into an object, something
> >> like:
> >>
> >> fu (...)
> >>
> >> {
> >>
> >>   if (cond)
> >>
> >> dfa->sb_char = utf8_sb_map;
> >>
> >>   else
> >>
> >> dfa->sb_char = malloc (...);
> >>
> >> }
> >>
> >>
> >> There is another routine in the TU which looks like
> >>
> >> bar (...)
> >>
> >> {
> >>
> >>   if (dfa->sb_char != utf8_sb_map)
> >>
> >> free (dfa->sb_char);
> >>
> >> }
> >>
> >>
> >> Now imagine that the TU is compiled (with LTO) into a static library,
> >> libgl.a and there's a DSO (libdso.so) which gets linked against libgl.a
> >> and references the first routine (fu).  We get a copy of fu in the DSO
> >> along with a copy of utf8_sb_map.
> >>
> >>
> >> Then imagine there's a main executable that dynamicly links against
> >> libdso.so, then links statically against libgl.a.  Assume the  main
> >> executable does not directly reference fu(), but does call a routine in
> >> libdso.so which eventually calls fu().  Also assume the main executable
> >> directly calls bar().  Again, remember we're compiling with LTO, so we
> >> don't suck in the entire TU, just the routines/data we need.
> >>
> >>
> >> In this scenario, both libdso.so and the main executable are going to a
> >> copy of utf8_sb_map and they'll be at different addresses.  So when the
> >> main executable calls into libdso.so which in turn calls libdso's copy
> >> of fu() which stuffs the address of utf8_sb_map from the DSO into
> >> dfa->sb_char.  Later the main executable calls bar() that's in the main
> >> executable.  It does the comparison to see if dfa->sb_char is equal to
> >> utf8_sb_map -- but it's using the main executable's copy of utf8_sb_map
> >> and naturally free() blows us because it was passed a static object, not
> >> a malloc'd object.
> >>
> >>
> >> ISTM this is a lot like the problem we have where we inline functions
> >> with static data.   To fix those we use STB_GNU_UNIQUE.  But I don't see
> >> any code in the C front-end which would utilize STB_GNU_UNIQUE.  It's
> >> support seems limited to C++.
> >>
> >>
> >> How is this supposed to work for C?
> >>
> >>
> >> Jeff
> >>
> >>
> > Can you group utf8_sb_map, fu and bar together so that they are defined
> > together?
>
> They're all defined within the same TU in gnulib.  It's the LTO
> dead/unreachable code elimination that results in just parts of the TU
> being copied into the DSO and a different set copied into the main
> executable.  In many ways LTO makes this look a lot like the static data
> member problems we've had to deal with in the C++ world.

In this case, LTO should treat them as in a single group.   Removing
one group member should remove the whole group.  Keep one member
should keep the whole group.

-- 
H.J.


Re: [PATCH] rs6000: Add rs6000_cfun_pcrel_p

2020-09-16 Thread Segher Boessenkool
On Wed, Sep 16, 2020 at 08:36:35AM -0500, Bill Schmidt wrote:
> This is a cleanup requested by Segher in a previous review.  Most
> uses of rs6000_pcrel_p are called for the current function.  A
> specialized version for cfun is more efficient for these uses.

Then rename that one to rs6000_function_pcrel_p or similar?  People will
keep trying to use the shorter name otherwise ;-)

> (Actually all uses are called for the current function, so now
> rs6000_pcrel_p is an orphan.  However, I think it's worth keeping
> around in case we have a late discovery where we need it.)

(Heck, you could even use that short name for the cfun one, if you
want.  Sorry i didn't think of that before.)


Okay for trunk with or without such further changes.  Thanks!


Segher


Re: Problem with static const objects and LTO

2020-09-16 Thread Jeff Law via Gcc-patches

On 9/16/20 11:05 AM, H.J. Lu wrote:
> On Wed, Sep 16, 2020 at 9:53 AM Jeff Law via Gcc-patches
>  wrote:
>>
>> Consider a TU with file scoped "static const object utf8_sb_map".   A
>> routine within the TU will stuff _sb_map into an object, something
>> like:
>>
>> fu (...)
>>
>> {
>>
>>   if (cond)
>>
>> dfa->sb_char = utf8_sb_map;
>>
>>   else
>>
>> dfa->sb_char = malloc (...);
>>
>> }
>>
>>
>> There is another routine in the TU which looks like
>>
>> bar (...)
>>
>> {
>>
>>   if (dfa->sb_char != utf8_sb_map)
>>
>> free (dfa->sb_char);
>>
>> }
>>
>>
>> Now imagine that the TU is compiled (with LTO) into a static library,
>> libgl.a and there's a DSO (libdso.so) which gets linked against libgl.a
>> and references the first routine (fu).  We get a copy of fu in the DSO
>> along with a copy of utf8_sb_map.
>>
>>
>> Then imagine there's a main executable that dynamicly links against
>> libdso.so, then links statically against libgl.a.  Assume the  main
>> executable does not directly reference fu(), but does call a routine in
>> libdso.so which eventually calls fu().  Also assume the main executable
>> directly calls bar().  Again, remember we're compiling with LTO, so we
>> don't suck in the entire TU, just the routines/data we need.
>>
>>
>> In this scenario, both libdso.so and the main executable are going to a
>> copy of utf8_sb_map and they'll be at different addresses.  So when the
>> main executable calls into libdso.so which in turn calls libdso's copy
>> of fu() which stuffs the address of utf8_sb_map from the DSO into
>> dfa->sb_char.  Later the main executable calls bar() that's in the main
>> executable.  It does the comparison to see if dfa->sb_char is equal to
>> utf8_sb_map -- but it's using the main executable's copy of utf8_sb_map
>> and naturally free() blows us because it was passed a static object, not
>> a malloc'd object.
>>
>>
>> ISTM this is a lot like the problem we have where we inline functions
>> with static data.   To fix those we use STB_GNU_UNIQUE.  But I don't see
>> any code in the C front-end which would utilize STB_GNU_UNIQUE.  It's
>> support seems limited to C++.
>>
>>
>> How is this supposed to work for C?
>>
>>
>> Jeff
>>
>>
> Can you group utf8_sb_map, fu and bar together so that they are defined
> together?

They're all defined within the same TU in gnulib.  It's the LTO
dead/unreachable code elimination that results in just parts of the TU
being copied into the DSO and a different set copied into the main
executable.  In many ways LTO makes this look a lot like the static data
member problems we've had to deal with in the C++ world.


jeff



pEpkey.asc
Description: application/pgp-keys


Re: [PATCH] aarch64: Add extend-as-extract-with-shift pattern [PR96998]

2020-09-16 Thread Alex Coplan
Hi Richard,

On 10/09/2020 19:18, Richard Sandiford wrote:
> Alex Coplan  writes:
> > Hello,
> >
> > Since r11-2903-g6b3034eaba83935d9f6dfb20d2efbdb34b5b00bf introduced a
> > canonicalization from mult to shift on address reloads, a missing
> > pattern in the AArch64 backend was exposed.
> >
> > In particular, we were missing the ashift variant of
> > *add__multp2 (this mult variant is redundant and was
> > removed in r11-3033-g2f8ae301f6a125f50b0a758047fcddae7b68daa8).
> >
> > This patch adds that missing pattern (fixing PR96998), updates
> > aarch64_is_extend_from_extract() to work for the shift pattern (instead
> > of the redundant mult variant) and updates callers in the cost
> > calculations to apply the costing for the shift variant instead.
> 
> Hmm.  I think we should instead fix this in combine.

Ok, thanks for the review. Please find a revised patch attached which
fixes this in combine instead.

> If we do that, we should be able to remove the handling of
> extract-based addresses in aarch64_classify_index & co.

Nice. I've tried to remove all the redundant extract-based address
handling in the AArch64 backend as a result.

The revised patch also fixes up a comment in combine which appears to
have suffered from bitrot.

Testing:
 * Bootstrap and regtest on aarch64-none-linux-gnu,
   arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu.
 * Regtested an x64 -> aarch64-none-elf cross with -mabi=ilp32.
 * Checked that we no longer ICE when building linux-next on AArch64.

OK for trunk?

Thanks,
Alex

---

gcc/ChangeLog:

* combine.c (expand_compound_operation): Tweak variable name in
comment to match source.
(make_extraction): Handle mult by power of two in addition to
ashift.
* config/aarch64/aarch64.c (aarch64_is_extend_from_extract): Delete.
(aarch64_classify_index): Remove redundant extend-as-extract handling.
(aarch64_strip_extend): Likewise.
(aarch64_rtx_arith_op_extract_p): Likewise, remove now-unused
parameter. Update callers...
(aarch64_rtx_costs): ... here.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr96998.c: New test.
diff --git a/gcc/combine.c b/gcc/combine.c
index c88382e..cd8e544 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x)
 }
 
   /* If we reach here, we want to return a pair of shifts.  The inner
- shift is a left shift of BITSIZE - POS - LEN bits.  The outer
- shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
+ shift is a left shift of MODEWIDTH - POS - LEN bits.  The outer
+ shift is a right shift of MODEWIDTH - LEN bits.  It is arithmetic or
  logical depending on the value of UNSIGNEDP.
 
  If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
@@ -7650,20 +7650,29 @@ make_extraction (machine_mode mode, rtx inner, 
HOST_WIDE_INT pos,
is_mode = GET_MODE (SUBREG_REG (inner));
   inner = SUBREG_REG (inner);
 }
-  else if (GET_CODE (inner) == ASHIFT
+  else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)
   && CONST_INT_P (XEXP (inner, 1))
-  && pos_rtx == 0 && pos == 0
-  && len > UINTVAL (XEXP (inner, 1)))
-{
-  /* We're extracting the least significant bits of an rtx
-(ashift X (const_int C)), where LEN > C.  Extract the
-least significant (LEN - C) bits of X, giving an rtx
-whose mode is MODE, then shift it left C times.  */
-  new_rtx = make_extraction (mode, XEXP (inner, 0),
-0, 0, len - INTVAL (XEXP (inner, 1)),
-unsignedp, in_dest, in_compare);
-  if (new_rtx != 0)
-   return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
+  && pos_rtx == 0 && pos == 0)
+{
+  const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
+  const bool mult = GET_CODE (inner) == MULT;
+  const int shift_amt = mult ? exact_log2 (ci) : ci;
+
+  if (shift_amt > 0 && len > (unsigned)shift_amt)
+   {
+ /* We're extracting the least significant bits of an rtx
+(ashift X (const_int C)) or (mult X (const_int (2^C))),
+where LEN > C.  Extract the least significant (LEN - C) bits
+of X, giving an rtx whose mode is MODE, then shift it left
+C times.  */
+ new_rtx = make_extraction (mode, XEXP (inner, 0),
+0, 0, len - shift_amt,
+unsignedp, in_dest, in_compare);
+ if (new_rtx)
+   return mult
+   ? gen_rtx_MULT (mode, new_rtx, XEXP (inner, 1))
+   : gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
+   }
 }
   else if (GET_CODE (inner) == TRUNCATE
   /* If trying or potentionally trying to extract
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b251f39..56738ae 100644
--- a/gcc/config/aarch64/aarch64.c
+++ 

Re: Problem with static const objects and LTO

2020-09-16 Thread H.J. Lu via Gcc-patches
On Wed, Sep 16, 2020 at 9:53 AM Jeff Law via Gcc-patches
 wrote:
>
>
> Consider a TU with file scoped "static const object utf8_sb_map".   A
> routine within the TU will stuff _sb_map into an object, something
> like:
>
> fu (...)
>
> {
>
>   if (cond)
>
> dfa->sb_char = utf8_sb_map;
>
>   else
>
> dfa->sb_char = malloc (...);
>
> }
>
>
> There is another routine in the TU which looks like
>
> bar (...)
>
> {
>
>   if (dfa->sb_char != utf8_sb_map)
>
> free (dfa->sb_char);
>
> }
>
>
> Now imagine that the TU is compiled (with LTO) into a static library,
> libgl.a and there's a DSO (libdso.so) which gets linked against libgl.a
> and references the first routine (fu).  We get a copy of fu in the DSO
> along with a copy of utf8_sb_map.
>
>
> Then imagine there's a main executable that dynamicly links against
> libdso.so, then links statically against libgl.a.  Assume the  main
> executable does not directly reference fu(), but does call a routine in
> libdso.so which eventually calls fu().  Also assume the main executable
> directly calls bar().  Again, remember we're compiling with LTO, so we
> don't suck in the entire TU, just the routines/data we need.
>
>
> In this scenario, both libdso.so and the main executable are going to a
> copy of utf8_sb_map and they'll be at different addresses.  So when the
> main executable calls into libdso.so which in turn calls libdso's copy
> of fu() which stuffs the address of utf8_sb_map from the DSO into
> dfa->sb_char.  Later the main executable calls bar() that's in the main
> executable.  It does the comparison to see if dfa->sb_char is equal to
> utf8_sb_map -- but it's using the main executable's copy of utf8_sb_map
> and naturally free() blows us because it was passed a static object, not
> a malloc'd object.
>
>
> ISTM this is a lot like the problem we have where we inline functions
> with static data.   To fix those we use STB_GNU_UNIQUE.  But I don't see
> any code in the C front-end which would utilize STB_GNU_UNIQUE.  It's
> support seems limited to C++.
>
>
> How is this supposed to work for C?
>
>
> Jeff
>
>

Can you group utf8_sb_map, fu and bar together so that they are defined
together?

-- 
H.J.


Problem with static const objects and LTO

2020-09-16 Thread Jeff Law via Gcc-patches

Consider a TU with file scoped "static const object utf8_sb_map".   A
routine within the TU will stuff _sb_map into an object, something
like:

fu (...)

{

  if (cond)

    dfa->sb_char = utf8_sb_map;

  else

    dfa->sb_char = malloc (...);

}


There is another routine in the TU which looks like

bar (...)

{

  if (dfa->sb_char != utf8_sb_map)

    free (dfa->sb_char);

}


Now imagine that the TU is compiled (with LTO) into a static library,
libgl.a and there's a DSO (libdso.so) which gets linked against libgl.a
and references the first routine (fu).  We get a copy of fu in the DSO
along with a copy of utf8_sb_map.


Then imagine there's a main executable that dynamicly links against
libdso.so, then links statically against libgl.a.  Assume the  main
executable does not directly reference fu(), but does call a routine in
libdso.so which eventually calls fu().  Also assume the main executable
directly calls bar().  Again, remember we're compiling with LTO, so we
don't suck in the entire TU, just the routines/data we need.


In this scenario, both libdso.so and the main executable are going to a
copy of utf8_sb_map and they'll be at different addresses.  So when the
main executable calls into libdso.so which in turn calls libdso's copy
of fu() which stuffs the address of utf8_sb_map from the DSO into
dfa->sb_char.  Later the main executable calls bar() that's in the main
executable.  It does the comparison to see if dfa->sb_char is equal to
utf8_sb_map -- but it's using the main executable's copy of utf8_sb_map
and naturally free() blows us because it was passed a static object, not
a malloc'd object.


ISTM this is a lot like the problem we have where we inline functions
with static data.   To fix those we use STB_GNU_UNIQUE.  But I don't see
any code in the C front-end which would utilize STB_GNU_UNIQUE.  It's
support seems limited to C++.


How is this supposed to work for C?


Jeff




pEpkey.asc
Description: application/pgp-keys


Re: [PATCH] Allow copying of symbolic ranges to an irange.

2020-09-16 Thread Aldy Hernandez via Gcc-patches




>>  // Swap min/max if they are out of order.  Return TRUE if further
> these seems OK, but can't there be anti-ranges with symbolics  too? ie
> ~[a_12, a_12]
> The code for that just does:
>
>   else if (src.kind () == VR_ANTI_RANGE)
>  set (src.min (), src.max (), VR_ANTI_RANGE);
>
> That should just go to varying I guess?

Ah, you're right.  I've tweaked the patch and have added a corresponding 
test.


>
> The conversion to legacy anti-range code also seems a little suspect in
> some cases...
>
> Finally, we theoretically shouldn't be accessing 'min()' and 'max()'
> fields in a multirange, which also looks like might happen in the final
> else clause.
>
> I wonder if it might be less complex to simply have 2 routines, like
> copy_to_legacy()  and copy_from_legacy()  instead of trying to handle
> then together?  I do find it seems to require more thinking than it
> should to follow the cases :-)

Sigh, yes.  That code is too clever for its own good.  I'll work on it 
as a follow-up.


OK pending tests?

Aldy

gcc/ChangeLog:

	* range-op.cc (multi_precision_range_tests): Normalize symbolics when 
copying to a multi-range.

* value-range.cc (irange::copy_legacy_range): Add test.
---
 gcc/range-op.cc| 15 +++
 gcc/value-range.cc | 19 +--
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index fbf78be0a3c..3ab268f101e 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -3453,6 +3453,21 @@ multi_precision_range_tests ()
   small = big;
   ASSERT_TRUE (small == int_range<1> (INT (21), INT (21), VR_ANTI_RANGE));

+  // Copying a legacy symbolic to an int_range should normalize the
+  // symbolic at copy time.
+  {
+tree ssa = make_ssa_name (integer_type_node);
+value_range legacy_range (ssa, INT (25));
+int_range<2> copy = legacy_range;
+ASSERT_TRUE (copy == int_range<2>  (vrp_val_min (integer_type_node),
+   INT (25)));
+
+// Test that copying ~[abc_23, abc_23] to a multi-range yields varying.
+legacy_range = value_range (ssa, ssa, VR_ANTI_RANGE);
+copy = legacy_range;
+ASSERT_TRUE (copy.varying_p ());
+  }
+
   range3_tests ();
 }

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 20aa4f114c9..ed2c322ded9 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -92,7 +92,12 @@ irange::copy_legacy_range (const irange )
   else if (src.varying_p ())
 set_varying (src.type ());
   else if (src.kind () == VR_ANTI_RANGE)
-set (src.min (), src.max (), VR_ANTI_RANGE);
+{
+  if (src.legacy_mode_p () && !range_has_numeric_bounds_p ())
+   set_varying (src.type ());
+  else
+   set (src.min (), src.max (), VR_ANTI_RANGE);
+}
   else if (legacy_mode_p () && src.maybe_anti_range ())
 {
   int_range<3> tmp (src);
@@ -101,7 +106,17 @@ irange::copy_legacy_range (const irange )
   VR_ANTI_RANGE);
 }
   else
-set (src.min (), src.max (), VR_RANGE);
+{
+  // If copying legacy to int_range, normalize any symbolics.
+  if (src.legacy_mode_p () && !range_has_numeric_bounds_p ())
+   {
+ value_range cst (src);
+ cst.normalize_symbolics ();
+ set (cst.min (), cst.max ());
+ return;
+   }
+  set (src.min (), src.max ());
+}
 }

 // Swap min/max if they are out of order.  Return TRUE if further
--
2.26.2



[PATCH][Arm] Enable MVE SIMD modes for vectorization

2020-09-16 Thread Dennis Zhang
Hi all,

This patch enables SIMD modes for MVE auto-vectorization.
In this patch, the integer and float MVE SIMD modes are returned by 
arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when 
MVE or MVE_FLOAT is enabled.
Then the expanders for auto-vectorization can be used for generating MVE 
SIMD code.

This patch also fixes bugs in MVE vreiterpretq_*.c tests which are 
revealed by the enabled MVE SIMD modes.
The tests are for checking the MVE reinterpret intrinsics.
There are two functions in each of the tests. The two functions contain 
the pattern of identical code so that they are folded in icf pass.
Because of icf, the instruction count only checks one function which is 8.
However when the SIMD modes are enabled, the estimation of the code size 
becomes smaller so that inlining is applied after icf, then the 
instruction count becomes 16 which causes failure of the tests.
Because the icf is not the expected pattern to be tested but causes 
above issues, -fno-ipa-icf is applied to the tests to avoid unstable 
instruction count.

This patch is separated from 
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html 
because this part is not strongly connected to the aim of that one so 
that causing confusion.

Regtested and bootstraped.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-09-15  Dennis Zhang  

* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.

gcc/testsuite/ChangeLog:

2020-09-15  Dennis Zhang  

* gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
option -fno-ipa-icf and change the instruction count from 8 to 16.
* gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dd78141519e..c50d5aca6a9 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28964,6 +28964,30 @@ arm_preferred_simd_mode (scalar_mode mode)
   default:;
   }
 
+  if (TARGET_HAVE_MVE)
+switch (mode)
+  {
+  case QImode:
+	return V16QImode;
+  case HImode:
+	return V8HImode;
+  case SImode:
+	return V4SImode;
+
+  default:;
+  }
+
+  if (TARGET_HAVE_MVE_FLOAT)
+switch (mode)
+  {
+  case HFmode:
+	return V8HFmode;
+  case SFmode:
+	return V4SFmode;
+
+  default:;
+  }
+
   return word_mode;
 }
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
index f59f69734ed..2398d894861 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int8x16_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_f16 (r7, vreinterpretq_f16 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.f16" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.f16" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
index dac47c7e924..5a58dc6eb4c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_f32 (r7, vreinterpretq_f32 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.f32" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.f32" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
index edc2f2f3bc6..9ab05e95420 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 

Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB with SIMT LANE [PR95654]

2020-09-16 Thread Tom de Vries
On 9/16/20 1:24 PM, Alexander Monakov wrote:
> Can you supply a tar with tree dumps for me to look at please?
> 

Hi Alexander,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654#c6 .

> Also, if you can check if the problem can be triggered without a
> collapsed loop (e.g. try removing collapse(2), remove mentions of
> d2) and if so supply dumps from that instead, I'd appreciate that too.
> 

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654#c8

Thanks,
- Tom

> Alexander
> 
> On Wed, 16 Sep 2020, Tom de Vries wrote:
> 
>> [ cc-ing author omp support for nvptx. ]
>>
>> On 9/16/20 12:39 PM, Tobias Burnus wrote:
>>> Hi Tom, hi Richard, hello all,
>>>
>>> @Richard: does it look okay from the ME side?
>>> @Tom: Can you check which IFN_GOMP_SIMT should be
>>> excluded with -ftracer?
>>>
>>> Pre-remark, I do not know much about SIMT – except that they
>>> only appear with nvptx and somehow relate to lanes on the
>>> GPU.
>>>
>>> In any case, as the testcase libgomp.fortran/pr66199-5.f90 shows,
>>> if a basic block with  GOMP_SIMT_VOTE_ANY in it is duplicated,
>>> which happens via -ftracer for this testcase, the result is wrong.
>>>
>>> The testcase ignores all the loop work but via "lastprivate" takes
>>> the upper loop bound (as assigned to the loop indices); instead of
>>> the expected 32*32 = 1024, either some number (like 4 or very large
>>> or negative) is returned.
>>>
>>> While GOMP_SIMT_VOTE_ANY fixes the issue for this testcase, I
>>> have the feeling that at least GOMP_SIMT_LAST_LANE should be
>>> not copied - but I might be wrong.
>>>
>>> Tom: Do you think GOMP_SIMT_LAST_LANE should be removed from
>>> that list – or any of the following added as well?
>>> GOMP_USE_SIMT, GOMP_SIMT_ENTER, GOMP_SIMT_ENTER_ALLOC, GOMP_SIMT_EXIT,
>>> GOMP_SIMT_VF, GOMP_SIMT_ORDERED_PRED, GOMP_SIMT_XCHG_BFLY,
>>> GOMP_SIMT_XCHG_IDX
>>>
>>> OK for mainline?
>>>
>>> Tobias
>>>
>>> -
>>> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München /
>>> Germany
>>> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung,
>>> Alexander Walter


Re: libbacktrace: Add support for MiniDebugInfo

2020-09-16 Thread Alex Coplan
On 16/09/2020 07:26, Ian Lance Taylor wrote:
> On Wed, Sep 16, 2020, 4:03 AM Alex Coplan  wrote:
> 
> > Hi Ian,
> >
> > On 14/09/2020 14:12, Ian Lance Taylor via Gcc-patches wrote:
> > > This patch to libbacktrace adds support for MiniDebugInfo, as
> > > requested in PR 93608.
> >
> > This appears to introduce a failure in the libbacktrace testsuite
> > (observed on both x86 and aarch64):
> >
> > ../gcc/libbacktrace/../test-driver: line 107:  7905 Segmentation fault
> >   (core dumped) "$@" > $log_file 2>&1
> > FAIL: mtest_minidebug
> 
> 
> 
> I tested on x86 without seeing anything like this.  Can you give me any
> more details?  Thanks.

Sure. On an Ubuntu 18.04 / x86-64 system with current trunk, configuring
with:

~/toolchain/src/gcc/configure \
  --prefix=`pwd` \
  --enable-languages=c,c++ \
  --disable-multilib \
  --disable-bootstrap

running `make && make check-libbacktrace` gives the failure described
above. Here is the contents of libbacktrace/test-suite.log:

=
package-unused version-unused: ./test-suite.log
=

# TOTAL: 33
# PASS:  32
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: mtest_minidebug
=

no debug info in ELF executable
no debug info in ELF executable
no debug info in ELF executable
no debug info in ELF executable
no debug info in ELF executable
no debug info in ELF executable
no debug info in ELF executable
test1: not enough frames; got 0, expected at least 3
test1: [0]: got  expected f3
test1: [1]: got  expected f2
FAIL mtest_minidebug (exit status: 139)

---

Let me know if you need any more info.

Thanks,
Alex


Re: [aarch64] Backport missing NEON intrinsics to GCC8

2020-09-16 Thread Pop, Sebastian via Gcc-patches
Thanks Christophe for reporting the errors.

On 9/16/20, 7:45 AM, "Kyrylo Tkachov"  wrote:
> > The new tests vld1x3 and vld1x4 fail on arm, I am seeing new failures
> > on gcc-8 and gcc-9 branches
> > after r8-10451, r8-10452 and r9-8874.
> > Is that expected/fixed later in the backport series?
> > (on the release branches, my validations are running for every commit)
>
> Hmm, IIRC they're not implemented for arm IIRC so they should be xfailed or 
> skipped on arm.

When I look at the log for those files there are no additions.
Why does arm execute tests from gcc.target/aarch64/ on gcc9 and gcc8?
Why arm does not fail with those extra tests on gcc10 and on master?
I'm still trying to figure out how to properly fix this.

Thanks,
Sebastian



[PATCH] preprocessor: Fix ICE with too long line in fmtwarn [PR96935]

2020-09-16 Thread Marek Polacek via Gcc-patches
Here we ICE in char_span::subspan because the offset it gets is -1.
It's -1 because get_substring_ranges_for_loc gets a location whose
column was 0.  That only happens in testcases like the attached where
we're dealing with extremely long lines (at least 4065 chars it seems).
This does happen in practice, though, so it's not just a theoretical
problem (e.g. when building the SU2 suite).

Fixed by checking that the column get_substring_ranges_for_loc gets is
sane, akin to other checks in that function.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?

gcc/ChangeLog:

PR preprocessor/96935
* input.c (get_substring_ranges_for_loc): Return if start.column
is less than 1.

gcc/testsuite/ChangeLog:

PR preprocessor/96935
* gcc.dg/format/pr96935.c: New test.
---
 gcc/input.c   | 2 ++
 gcc/testsuite/gcc.dg/format/pr96935.c | 9 +
 2 files changed, 11 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/format/pr96935.c

diff --git a/gcc/input.c b/gcc/input.c
index d573b90341a..ea3d4f34220 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -1461,6 +1461,8 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
   size_t literal_length = finish.column - start.column + 1;
 
   /* Ensure that we don't crash if we got the wrong location.  */
+  if (start.column < 1)
+   return "line is too long";
   if (line.length () < (start.column - 1 + literal_length))
return "line is not wide enough";
 
diff --git a/gcc/testsuite/gcc.dg/format/pr96935.c 
b/gcc/testsuite/gcc.dg/format/pr96935.c
new file mode 100644
index 000..c31109704be
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/format/pr96935.c
@@ -0,0 +1,9 @@
+/* PR preprocessor/96935 */
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+void
+foo (char const *name)
+{
+  /* 

Re: [PATCH] IRA: Don't make a global register eliminable

2020-09-16 Thread H.J. Lu via Gcc-patches
On Wed, Sep 16, 2020 at 7:46 AM Richard Sandiford
 wrote:
>
> "H.J. Lu"  writes:
> > On Tue, Sep 15, 2020 at 7:44 AM Richard Sandiford
> >  wrote:
> >>
> >> Thanks for looking at this.
> >>
> >> "H.J. Lu"  writes:
> >> > commit 1bcb4c4faa4bd6b1c917c75b100d618faf9e628c
> >> > Author: Richard Sandiford 
> >> > Date:   Wed Oct 2 07:37:10 2019 +
> >> >
> >> > [LRA] Don't make eliminable registers live (PR91957)
> >> >
> >> > didn't make eliminable registers live which breaks
> >> >
> >> > register void *cur_pro asm("reg");
> >> >
> >> > where "reg" is an eliminable register.  Make fixed eliminable registers
> >> > live to fix it.
> >>
> >> I don't think fixedness itself is the issue here: it's usual for at
> >> least some registers involved in eliminations to be fixed registers.
> >>
> >> I think what makes this case different is instead that cur_pro/ebp
> >> is a global register.  But IMO things have already gone wrong if we
> >> think that a global register is eliminable.
> >>
> >> So I wonder if instead we should check global_regs at the beginning of:
> >>
> >>   for (i = 0; i < fp_reg_count; i++)
> >> if (!TEST_HARD_REG_BIT (crtl->asm_clobbers,
> >> HARD_FRAME_POINTER_REGNUM + i))
> >>   {
> >> SET_HARD_REG_BIT (eliminable_regset,
> >>   HARD_FRAME_POINTER_REGNUM + i);
> >> if (frame_pointer_needed)
> >>   SET_HARD_REG_BIT (ira_no_alloc_regs,
> >> HARD_FRAME_POINTER_REGNUM + i);
> >>   }
> >> else if (frame_pointer_needed)
> >>   error ("%s cannot be used in % here",
> >>  reg_names[HARD_FRAME_POINTER_REGNUM + i]);
> >> else
> >>   df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true);
> >>
> >> (ira_setup_eliminable_regset), and handle the global_regs[] case in
> >> the same way as the else case, i.e. short-circuiting both of the ifs.
> >>
> >
> > Like this?
>
> Sorry for the delay.  I was testing this in parallel.
>
> Bootstrapped & regression-tested on x86_64-linux-gnu.
>

Thanks.

-- 
H.J.


c++: Avoid confusing 'nested' name

2020-09-16 Thread Nathan Sidwell


instantiate_body has a local var call 'nested', which indicates that
this instantiation was caused during the body of some function -- not
necessarily its containing scope.  That's confusing, let's just use
'current_function_decl' directly.  Then we can also simplify the
push_to_top_level logic, which /does/ indicate whether this is an
actual nested function.  (C++ does not have nested functions, but OMP
ODRs fall into that category.  A follow up patch will use that more
usual meaning of 'nested' wrt to functions.)

gcc/cp/
* pt.c (instantiate_body): Remove 'nested' var, simplify
push_to_top logic.

pushing to trunk

--
Nathan Sidwell
diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c
index e8aa950b8fa..289452ab740 100644
--- i/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -25458,17 +25458,15 @@ instantiate_body (tree pattern, tree args, tree d)
   tree td = pattern;
   tree code_pattern = DECL_TEMPLATE_RESULT (td);
 
-  tree fn_context = decl_function_context (d);
-  if (LAMBDA_FUNCTION_P (d))
-/* tsubst_lambda_expr resolved any references to enclosing functions.  */
-fn_context = NULL_TREE;
-  bool nested = current_function_decl != NULL_TREE;
-  bool push_to_top = !(nested && fn_context == current_function_decl);
-
   vec omp_privatization_save;
-  if (nested)
+  if (current_function_decl)
 save_omp_privatization_clauses (omp_privatization_save);
 
+  bool push_to_top
+= !(current_function_decl
+	&& !LAMBDA_FUNCTION_P (d)
+	&& decl_function_context (d) == current_function_decl);
+
   if (push_to_top)
 push_to_top_level ();
   else
@@ -25595,7 +25593,7 @@ instantiate_body (tree pattern, tree args, tree d)
   else
 pop_function_context ();
 
-  if (nested)
+  if (current_function_decl)
 restore_omp_privatization_clauses (omp_privatization_save);
 }
 


Re: [PATCH] IRA: Don't make a global register eliminable

2020-09-16 Thread Richard Sandiford
"H.J. Lu"  writes:
> On Tue, Sep 15, 2020 at 7:44 AM Richard Sandiford
>  wrote:
>>
>> Thanks for looking at this.
>>
>> "H.J. Lu"  writes:
>> > commit 1bcb4c4faa4bd6b1c917c75b100d618faf9e628c
>> > Author: Richard Sandiford 
>> > Date:   Wed Oct 2 07:37:10 2019 +
>> >
>> > [LRA] Don't make eliminable registers live (PR91957)
>> >
>> > didn't make eliminable registers live which breaks
>> >
>> > register void *cur_pro asm("reg");
>> >
>> > where "reg" is an eliminable register.  Make fixed eliminable registers
>> > live to fix it.
>>
>> I don't think fixedness itself is the issue here: it's usual for at
>> least some registers involved in eliminations to be fixed registers.
>>
>> I think what makes this case different is instead that cur_pro/ebp
>> is a global register.  But IMO things have already gone wrong if we
>> think that a global register is eliminable.
>>
>> So I wonder if instead we should check global_regs at the beginning of:
>>
>>   for (i = 0; i < fp_reg_count; i++)
>> if (!TEST_HARD_REG_BIT (crtl->asm_clobbers,
>> HARD_FRAME_POINTER_REGNUM + i))
>>   {
>> SET_HARD_REG_BIT (eliminable_regset,
>>   HARD_FRAME_POINTER_REGNUM + i);
>> if (frame_pointer_needed)
>>   SET_HARD_REG_BIT (ira_no_alloc_regs,
>> HARD_FRAME_POINTER_REGNUM + i);
>>   }
>> else if (frame_pointer_needed)
>>   error ("%s cannot be used in % here",
>>  reg_names[HARD_FRAME_POINTER_REGNUM + i]);
>> else
>>   df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true);
>>
>> (ira_setup_eliminable_regset), and handle the global_regs[] case in
>> the same way as the else case, i.e. short-circuiting both of the ifs.
>>
>
> Like this?

Sorry for the delay.  I was testing this in parallel.

Bootstrapped & regression-tested on x86_64-linux-gnu.

Thanks,
Richard

>From af4499845d26fe65573b21197a79fd22fd38694e Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 15 Sep 2020 06:23:26 -0700
Subject: [PATCH] ira: Fix elimination for global hard FPs [PR91957]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

If the hard frame pointer is being used as a global register,
we should skip the usual handling for eliminations.  As the
comment says, the register cannot in that case be eliminated
(or eliminated to) and is already marked live where appropriate.

Doing this removes the duplicate error for gcc.target/i386/pr82673.c.
The “cannot be used in 'asm' here” message is meant to be for asm
statements rather than register asms, and the function that the
error is reported against doesn't use asm.

gcc/
2020-09-16  Richard Sandiford  

PR middle-end/91957
* ira.c (ira_setup_eliminable_regset): Skip the special elimination
handling of the hard frame pointer if the hard frame pointer is fixed.

gcc/testsuite/
2020-09-16  H.J. Lu  
Richard Sandiford  

PR middle-end/91957
* g++.target/i386/pr97054.C: New test.
* gcc.target/i386/pr82673.c: Remove redundant extra message.
---
 gcc/ira.c   |  8 ++-
 gcc/testsuite/g++.target/i386/pr97054.C | 96 +
 gcc/testsuite/gcc.target/i386/pr82673.c |  2 +-
 3 files changed, 103 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr97054.C

diff --git a/gcc/ira.c b/gcc/ira.c
index a759f3c2aa8..27d1b3c857d 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -2310,8 +2310,12 @@ ira_setup_eliminable_regset (void)
   if (!HARD_FRAME_POINTER_IS_FRAME_POINTER)
 {
   for (i = 0; i < fp_reg_count; i++)
-   if (!TEST_HARD_REG_BIT (crtl->asm_clobbers,
-   HARD_FRAME_POINTER_REGNUM + i))
+   if (global_regs[HARD_FRAME_POINTER_REGNUM + i])
+ /* Nothing to do: the register is already treated as live
+where appropriate, and cannot be eliminated.  */
+ ;
+   else if (!TEST_HARD_REG_BIT (crtl->asm_clobbers,
+HARD_FRAME_POINTER_REGNUM + i))
  {
SET_HARD_REG_BIT (eliminable_regset,
  HARD_FRAME_POINTER_REGNUM + i);
diff --git a/gcc/testsuite/g++.target/i386/pr97054.C 
b/gcc/testsuite/g++.target/i386/pr97054.C
new file mode 100644
index 000..d0693af2a42
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr97054.C
@@ -0,0 +1,96 @@
+// { dg-do run { target { ! ia32 } } }
+// { dg-require-effective-target fstack_protector }
+// { dg-options "-O2 -fno-strict-aliasing -msse4.2 -mfpmath=sse -fPIC 
-fstack-protector-strong -O2" }
+
+struct p2_icode *ipc;
+register int pars asm("r13");
+register struct processor *cur_pro asm("rbp");
+register int a asm("rbx");
+register int c asm("r14");
+typedef long lina_t;
+typedef long la_t;
+typedef processor processor_t;
+typedef p2_icode p2_icode_t;
+typedef enum 

Re: libbacktrace: Add support for MiniDebugInfo

2020-09-16 Thread Ian Lance Taylor via Gcc-patches
On Wed, Sep 16, 2020, 4:03 AM Alex Coplan  wrote:

> Hi Ian,
>
> On 14/09/2020 14:12, Ian Lance Taylor via Gcc-patches wrote:
> > This patch to libbacktrace adds support for MiniDebugInfo, as
> > requested in PR 93608.
>
> This appears to introduce a failure in the libbacktrace testsuite
> (observed on both x86 and aarch64):
>
> ../gcc/libbacktrace/../test-driver: line 107:  7905 Segmentation fault
>   (core dumped) "$@" > $log_file 2>&1
> FAIL: mtest_minidebug



I tested on x86 without seeing anything like this.  Can you give me any
more details?  Thanks.

Ian




> > MiniDebugInfo stores compressed symbol tables for an executable, where
> > the executable is otherwise stripped.  It is documented at
> > https://fedoraproject.org/wiki/Features/MiniDebugInfo and
> > https://sourceware.org/gdb/current/onlinedocs/gdb/MiniDebugInfo.html.
> >
> > Unfortunately, although debug info processors are already required to
> > handle data compressed using zlib aka gzip in order to handle zdebug
> > and SHT_COMPRESSED, the MiniDebugInfo implementation choose to instead
> > compress the debug information with xz which uses LZMA2.  This in
> > effect forces all debug info processors to add a second decompression
> > algorithm.  Note to future developers: do not do this.
> >
> > This libbacktrace implementation includes a new small LZMA
> > decompressor that just decompresses from one buffer to another.  It is
> > not heavily optimized, and on my laptop runs at about 78% of the speed
> > of using liblzma directly.  I think this is a decent tradeoff for
> > keeping the code reasonably small.
> >
> > Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> > to mainline.
> >
> > Ian
>
> Thanks,
> Alex
>


c++: Break out actual instantiation from instantiate_decl

2020-09-16 Thread Nathan Sidwell


This refactors instantiate_decl, breaking out the actual instantiation
work to instantiate_body.  That'll allow me to address the OMP UDR
issue, but it also means we have slightly neater code in
instantiate_decl anyway.

gcc/cp/
* pt.c (instantiate_body): New, broken out of ..
(instantiate_decl): ... here.  Call it.

pushing to trunk

--
Nathan Sidwell
diff --git c/gcc/cp/pt.c w/gcc/cp/pt.c
index 1aea105edd5..e8aa950b8fa 100644
--- c/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -25447,6 +25447,158 @@ register_parameter_specializations (tree pattern, tree inst)
   gcc_assert (!spec_parm);
 }
 
+/* Instantiate the body of D using PATTERN with ARGS.  We have
+   already determined PATTERN is the correct template to use.  */
+
+static void
+instantiate_body (tree pattern, tree args, tree d)
+{
+  gcc_checking_assert (TREE_CODE (pattern) == TEMPLATE_DECL);
+  
+  tree td = pattern;
+  tree code_pattern = DECL_TEMPLATE_RESULT (td);
+
+  tree fn_context = decl_function_context (d);
+  if (LAMBDA_FUNCTION_P (d))
+/* tsubst_lambda_expr resolved any references to enclosing functions.  */
+fn_context = NULL_TREE;
+  bool nested = current_function_decl != NULL_TREE;
+  bool push_to_top = !(nested && fn_context == current_function_decl);
+
+  vec omp_privatization_save;
+  if (nested)
+save_omp_privatization_clauses (omp_privatization_save);
+
+  if (push_to_top)
+push_to_top_level ();
+  else
+{
+  gcc_assert (!processing_template_decl);
+  push_function_context ();
+  cp_unevaluated_operand = 0;
+  c_inhibit_evaluation_warnings = 0;
+}
+
+  if (VAR_P (d))
+{
+  /* The variable might be a lambda's extra scope, and that
+	 lambda's visibility depends on D's.  */
+  maybe_commonize_var (d);
+  determine_visibility (d);
+}
+
+  /* Mark D as instantiated so that recursive calls to
+ instantiate_decl do not try to instantiate it again.  */
+  DECL_TEMPLATE_INSTANTIATED (d) = 1;
+
+  /* Regenerate the declaration in case the template has been modified
+ by a subsequent redeclaration.  */
+  regenerate_decl_from_template (d, td, args);
+
+  /* We already set the file and line above.  Reset them now in case
+ they changed as a result of calling regenerate_decl_from_template.  */
+  input_location = DECL_SOURCE_LOCATION (d);
+
+  if (VAR_P (d))
+{
+  tree init;
+  bool const_init = false;
+
+  /* Clear out DECL_RTL; whatever was there before may not be right
+	 since we've reset the type of the declaration.  */
+  SET_DECL_RTL (d, NULL);
+  DECL_IN_AGGR_P (d) = 0;
+
+  /* The initializer is placed in DECL_INITIAL by
+	 regenerate_decl_from_template so we don't need to
+	 push/pop_access_scope again here.  Pull it out so that
+	 cp_finish_decl can process it.  */
+  init = DECL_INITIAL (d);
+  DECL_INITIAL (d) = NULL_TREE;
+  DECL_INITIALIZED_P (d) = 0;
+
+  /* Clear DECL_EXTERNAL so that cp_finish_decl will process the
+	 initializer.  That function will defer actual emission until
+	 we have a chance to determine linkage.  */
+  DECL_EXTERNAL (d) = 0;
+
+  /* Enter the scope of D so that access-checking works correctly.  */
+  bool enter_context = DECL_CLASS_SCOPE_P (d);
+  if (enter_context)
+push_nested_class (DECL_CONTEXT (d));
+
+  const_init = DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (code_pattern);
+  cp_finish_decl (d, init, const_init, NULL_TREE, 0);
+
+  if (enter_context)
+pop_nested_class ();
+}
+  else if (TREE_CODE (d) == FUNCTION_DECL && DECL_DEFAULTED_FN (code_pattern))
+synthesize_method (d);
+  else if (TREE_CODE (d) == FUNCTION_DECL)
+{
+  /* Set up the list of local specializations.  */
+  local_specialization_stack lss (push_to_top ? lss_blank : lss_copy);
+  tree block = NULL_TREE;
+
+  /* Set up context.  */
+  if (DECL_OMP_DECLARE_REDUCTION_P (code_pattern)
+	  && TREE_CODE (DECL_CONTEXT (code_pattern)) == FUNCTION_DECL)
+	block = push_stmt_list ();
+  else
+	start_preparsed_function (d, NULL_TREE, SF_PRE_PARSED);
+
+  perform_instantiation_time_access_checks (code_pattern, args);
+
+  /* Create substitution entries for the parameters.  */
+  register_parameter_specializations (code_pattern, d);
+
+  /* Substitute into the body of the function.  */
+  if (DECL_OMP_DECLARE_REDUCTION_P (code_pattern))
+	tsubst_omp_udr (DECL_SAVED_TREE (code_pattern), args,
+			tf_warning_or_error, DECL_TI_TEMPLATE (d));
+  else
+	{
+	  tsubst_expr (DECL_SAVED_TREE (code_pattern), args,
+		   tf_warning_or_error, DECL_TI_TEMPLATE (d),
+		   /*integral_constant_expression_p=*/false);
+
+	  /* Set the current input_location to the end of the function
+	 so that finish_function knows where we are.  */
+	  input_location
+	= DECL_STRUCT_FUNCTION (code_pattern)->function_end_locus;
+
+	  /* Remember if we saw an infinite loop in the template.  */
+	  

Re: [PATCH, 1/3, OpenMP] Target mapping changes for OpenMP 5.0, front-end parts

2020-09-16 Thread Chung-Lin Tang

Ping this patch set.

Thanks,
Chung-Lin

On 2020/9/1 9:16 PM, Chung-Lin Tang wrote:

Hi Jakub,
this patch set implements parts of the target mapping changes introduced
in OpenMP 5.0, mainly the attachment requirements for pointer-based
list items, and the clause ordering.

The first patch here are the C/C++ front-end changes.

The entire set of changes has been tested for without regressions for
the compiler and libgomp. Hope this is ready to commit to master.

Thanks,
Chung-Lin

     gcc/c-family/
     * c-common.h (c_omp_adjust_clauses): New declaration.
     * c-omp.c (c_omp_adjust_clauses): New function.

     gcc/c/
     * c-parser.c (c_parser_omp_target_data): Add use of
     new c_omp_adjust_clauses function. Add GOMP_MAP_ATTACH_DETACH as
 handled map clause kind.
     (c_parser_omp_target_enter_data): Likewise.
     (c_parser_omp_target_exit_data): Likewise.
     (c_parser_omp_target): Likewise.
     * c-typeck.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
     use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type.
     (c_finish_omp_clauses): Adjust bitmap checks to allow struct decl and
     same struct field access to co-exist on OpenMP construct.

     gcc/cp/
     * parser.c (cp_parser_omp_target_data): Add use of
     new c_omp_adjust_clauses function. Add GOMP_MAP_ATTACH_DETACH as
     handled map clause kind.
     (cp_parser_omp_target_enter_data): Likewise.
 (cp_parser_omp_target_exit_data): Likewise.
 (cp_parser_omp_target): Likewise.
 * semantics.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
 use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. Fix
 interaction between reference case and attach/detach.
 (finish_omp_clauses): Adjust bitmap checks to allow struct decl and
 same struct field access to co-exist on OpenMP construct.


Re: [Patch] Fortran: OpenMP - fix simd with (last)private (PR97061

2020-09-16 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 16, 2020 at 04:08:16PM +0200, Tobias Burnus wrote:
> Using linear (added by the compiler) + explicit lastprivate gives an error.
> 
> To quote Jakub from the PR:
> "If it is the FE that adds the linear clause, then it
> shouldn't when there is explicit lastprivate clause.
> This is a new thing in OpenMP 5.0, in 4.5 it was invalid to
> make the iterate anything but linear in this case, now
> it can be made private or lastprivate."
> 
> Actually, the check which permitted this was committed
> early during GCC 11 development, but the some updates of
> trans-openmp.c were missing.
> 
> OK?

Yes, thanks.

> Fortran: OpenMP - fix simd with (last)private (PR97061)
> 
> gcc/fortran/ChangeLog:
> 
>   PR fortran/97061
>   * trans-openmp.c (gfc_trans_omp_do): Handle simd with (last)private.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR fortran/97061
>   * gfortran.dg/gomp/openmp-simd-6.f90: New test.

Jakub



[Patch] Fortran: OpenMP - fix simd with (last)private (PR97061

2020-09-16 Thread Tobias Burnus

Using linear (added by the compiler) + explicit lastprivate gives an error.

To quote Jakub from the PR:
"If it is the FE that adds the linear clause, then it
shouldn't when there is explicit lastprivate clause.
This is a new thing in OpenMP 5.0, in 4.5 it was invalid to
make the iterate anything but linear in this case, now
it can be made private or lastprivate."

Actually, the check which permitted this was committed
early during GCC 11 development, but the some updates of
trans-openmp.c were missing.

OK?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
Fortran: OpenMP - fix simd with (last)private (PR97061)

gcc/fortran/ChangeLog:

	PR fortran/97061
	* trans-openmp.c (gfc_trans_omp_do): Handle simd with (last)private.

gcc/testsuite/ChangeLog:

	PR fortran/97061
	* gfortran.dg/gomp/openmp-simd-6.f90: New test.

 gcc/fortran/trans-openmp.c   | 37 --
 gcc/testsuite/gfortran.dg/gomp/openmp-simd-6.f90 | 62 
 2 files changed, 85 insertions(+), 14 deletions(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 9ec0df204ac..378088a9d04 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -4401,20 +4401,29 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
   if (clauses)
 	{
 	  gfc_omp_namelist *n = NULL;
-	  if (op != EXEC_OMP_DISTRIBUTE)
-	for (n = clauses->lists[(op == EXEC_OMP_SIMD && collapse == 1)
-? OMP_LIST_LINEAR : OMP_LIST_LASTPRIVATE];
+	  if (op == EXEC_OMP_SIMD && collapse == 1)
+	for (n = clauses->lists[OMP_LIST_LINEAR];
 		 n != NULL; n = n->next)
 	  if (code->ext.iterator->var->symtree->n.sym == n->sym)
-		break;
-	  if (n != NULL)
-	dovar_found = 1;
-	  else if (n == NULL && op != EXEC_OMP_SIMD)
+		{
+		  dovar_found = 3;
+		  break;
+		}
+	  if (n == NULL && op != EXEC_OMP_DISTRIBUTE)
+	for (n = clauses->lists[OMP_LIST_LASTPRIVATE];
+		 n != NULL; n = n->next)
+	  if (code->ext.iterator->var->symtree->n.sym == n->sym)
+		{
+		  dovar_found = 2;
+		  break;
+		}
+	  if (n == NULL)
 	for (n = clauses->lists[OMP_LIST_PRIVATE]; n != NULL; n = n->next)
 	  if (code->ext.iterator->var->symtree->n.sym == n->sym)
-		break;
-	  if (n != NULL)
-	dovar_found++;
+		{
+		  dovar_found = 1;
+		  break;
+		}
 	}
 
   /* Evaluate all the expressions in the iterator.  */
@@ -4512,7 +4521,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
   if (orig_decls)
 	TREE_VEC_ELT (orig_decls, i) = dovar_decl;
 
-  if (dovar_found == 2
+  if (dovar_found == 3
 	  && op == EXEC_OMP_SIMD
 	  && collapse == 1
 	  && !simple)
@@ -4536,7 +4545,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
 	  omp_clauses = gfc_trans_add_clause (tmp, omp_clauses);
 	}
 	  if (!simple)
-	dovar_found = 2;
+	dovar_found = 3;
 	}
   else if (!dovar_found && !simple)
 	{
@@ -4544,7 +4553,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
 	  OMP_CLAUSE_DECL (tmp) = dovar_decl;
 	  omp_clauses = gfc_trans_add_clause (tmp, omp_clauses);
 	}
-  if (dovar_found == 2)
+  if (dovar_found > 1)
 	{
 	  tree c = NULL;
 
@@ -4612,7 +4621,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
 	}
   if (!simple)
 	{
-	  if (op != EXEC_OMP_SIMD)
+	  if (op != EXEC_OMP_SIMD || dovar_found == 1)
 	tmp = build_omp_clause (input_location, OMP_CLAUSE_PRIVATE);
 	  else if (collapse == 1)
 	{
diff --git a/gcc/testsuite/gfortran.dg/gomp/openmp-simd-6.f90 b/gcc/testsuite/gfortran.dg/gomp/openmp-simd-6.f90
new file mode 100644
index 000..361e0dad343
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/openmp-simd-6.f90
@@ -0,0 +1,62 @@
+! { dg-additional-options "-fdump-tree-original" }
+!
+! PR fortran/97061
+
+integer function f3 (a1, b1, u)
+  implicit none
+  integer :: a1, b1, d1
+  integer u(0:1023)
+  !$omp teams distribute parallel do simd default(none) firstprivate (a1, b1) shared(u) lastprivate(d1) 
+  do d1 = a1, b1-1
+  u(d1) = 5
+  end do
+end
+
+subroutine foo(n, m, u)
+  implicit none
+  integer :: hh, ii, jj, n, m
+  integer u(0:1023)
+  !$omp simd private(ii)
+  do ii = n, m
+u(ii) = 5
+  end do
+  !$omp simd linear(jj:1)
+  do jj = 2, m+n
+u(jj) = 6
+  end do
+  !$omp simd
+  do hh = 2, m+n
+u(hh) = 6
+  end do
+end
+
+subroutine bar(n, m, u)
+  implicit none
+  integer :: kkk, lll, ooo, ppp, n, m
+  integer u(:,:)
+  !$omp simd lastprivate(kkk) lastprivate(lll) collapse(2)
+  do kkk = n, m
+do lll = n, m
+  u(kkk, lll) = 5
+end do
+  end do
+  !$omp simd private(kkk) private(lll) collapse(2)
+  do ooo = n, m
+do ppp = n, m
+  u(ooo, ppp) = 5
+end do
+  end do
+end
+
+
+! { dg-final { scan-tree-dump-times "#pragma omp teams 

[PATCH, OpenMP 5, C++] Implement implicit mapping of this[:1] (PR92120)

2020-09-16 Thread Chung-Lin Tang

Hi Jakub,
this patch implements automatically adding map(tofrom: this[:1]) to omp target
regions inside non-static member functions, as specified in OpenMP 5.0.

This patch factors away some parts of cp_parser_omp_target, into a new
finish_omp_target function, and implements the new clause adding there.

For target regions in normal non-static member functions, the case is more
simple. For the inside lambda function case, this is implemented by copying the
entire __closure as a "to" map first (and yeah, this patch allows target regions
inside lambda functions to largely work, but since it's just a copying of 
__closure,
the capture by reference case still shouldn't work yet). __closure->this is
then implemented by an always_pointer map clause.

I've added two testcases, as both compiler scan testcases and libgomp executable
test. Testing of g++ and libgomp both are regression free with nvptx offloading.
Is this okay for trunk?

Thanks,
Chung-Lin

2020-09-16  Chung-Lin Tang  

PR middle-end/92120

gcc/cp/
* cp-tree.h (finish_omp_target): New declaration.
(set_omp_target_this_expr): Likewise.
* lambda.c (lambda_expr_this_capture): Add call to
set_omp_target_this_expr.
* parser.c (cp_parser_omp_target): Factor out code, change to call
finish_omp_target, add re-initing call to set_omp_target_this_expr.
* semantics.c (omp_target_this_expr): New static variable.
(finish_non_static_data_member): Add call to set_omp_target_this_expr.
(finish_this_expr): Likewise.
(set_omp_target_this_expr): New function to set omp_target_this_expr.
(finish_omp_target): New function with code merged from
cp_parser_omp_target, plus code to add this and __closure map clauses
for OpenMP.

gcc/testsuite/
* g++.dg/gomp/target-this-1.C: New testcase.
* g++.dg/gomp/target-this-2.C: New testcase.

libgomp/testsuite/
* libgomp.c++/target-this-1.C: New testcase.
* libgomp.c++/target-this-2.C: New testcase.
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 6e4de7d0c4b..81e72449856 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7241,6 +7241,11 @@ extern tree finish_omp_structured_block  (tree);
 extern tree finish_oacc_data   (tree, tree);
 extern tree finish_oacc_host_data  (tree, tree);
 extern tree finish_omp_construct   (enum tree_code, tree, tree);
+
+extern tree finish_omp_target  (location_t, tree, tree, bool);
+extern void set_omp_target_this_expr   (tree);
+
+
 extern tree begin_omp_parallel (void);
 extern tree finish_omp_parallel(tree, tree);
 extern tree begin_omp_task (void);
diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index c94fe8edb8e..aea5f5adc52 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -842,6 +842,9 @@ lambda_expr_this_capture (tree lambda, int add_capture_p)
 type cast (_expr.cast_ 5.4) to the type of 'this'. [ The cast
 ensures that the transformed expression is an rvalue. ] */
   result = rvalue (result);
+
+  /* Acknowledge to OpenMP target that 'this' was referenced.  */
+  set_omp_target_this_expr (result);
 }
 
   return result;
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index fba3fcc0c4c..46de8e6cb65 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -40742,8 +40742,6 @@ static bool
 cp_parser_omp_target (cp_parser *parser, cp_token *pragma_tok,
  enum pragma_context context, bool *if_p)
 {
-  tree *pc = NULL, stmt;
-
   if (flag_openmp)
 omp_requires_mask
   = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
@@ -40796,6 +40794,7 @@ cp_parser_omp_target (cp_parser *parser, cp_token 
*pragma_tok,
  keep_next_level (true);
  tree sb = begin_omp_structured_block (), ret;
  unsigned save = cp_parser_begin_omp_structured_block (parser);
+ set_omp_target_this_expr (NULL_TREE);
  switch (ccode)
{
case OMP_TEAMS:
@@ -40847,15 +40846,9 @@ cp_parser_omp_target (cp_parser *parser, cp_token 
*pragma_tok,
cclauses[C_OMP_CLAUSE_SPLIT_TARGET] = tc;
  }
}
- tree stmt = make_node (OMP_TARGET);
- TREE_TYPE (stmt) = void_type_node;
- OMP_TARGET_CLAUSES (stmt) = cclauses[C_OMP_CLAUSE_SPLIT_TARGET];
- OMP_TARGET_BODY (stmt) = body;
- OMP_TARGET_COMBINED (stmt) = 1;
- SET_EXPR_LOCATION (stmt, pragma_tok->location);
- add_stmt (stmt);
- pc = _TARGET_CLAUSES (stmt);
- goto check_clauses;
+ finish_omp_target (pragma_tok->location,
+cclauses[C_OMP_CLAUSE_SPLIT_TARGET], body, true);
+ return true;
}
   else if (!flag_openmp)  /* flag_openmp_simd  */
{
@@ -40892,46 +40885,13 @@ 

[PATCH] Canonicalize real X - CST to X + -CST

2020-09-16 Thread Richard Biener
This removes the canonicalization of X + -CST to X - CST to facilitate
SLP vectorization analysis where we currently treat the added testcase
as two-operation plus blend vectorization opportunity.  While there
may be the possibility to avoid this in the vectorizer itself the
canonicalization is somewhat premature - it's motivation is that
a FP constant of both signs is common and thus canonicalizing to
positive values optimizes constant pool.  But if you mix, say,
x * -7.1 and x - 7.1 this backfires - this instead looks like
a local optimization to be applied in RTL CSE or even LRA
load inheritance.  I filed PR97071 for this.

Bootstrapped and tested on x86_64-unknown-linux-gnu, any objections?

Thanks,
Richard.

2020-09-16  Richard Biener  

* fold-const.c (fold_binary_loc): Remove condition
on REAL_CST for A - B -> A + (-B) transform.
(negate_expr_p): Remove condition on REAL_CST.
* match.pd (negate_expr_p): Likewise.
(X + -C -> X - C): Remove.

* gcc.dg/vect/slp-49.c: New testcase.
* gcc.dg/tree-ssa/pr90356-1.c: Adjust.
* gcc.dg/tree-ssa/pr90356-3.c: Likewise.
* gcc.dg/tree-ssa/pr90356-4.c: Likewise.
* gcc.target/i386/pr54855-3.c: Likewise.
* gfortran.dg/reassoc_1.f90: Likewise.
* gfortran.dg/reassoc_2.f90: Likewise.
---
 gcc/fold-const.c  |  7 ++-
 gcc/match.pd  | 11 +--
 gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c |  4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/pr90356-3.c |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr90356-4.c |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-49.c| 15 +++
 gcc/testsuite/gcc.target/i386/pr54855-3.c |  2 +-
 gcc/testsuite/gfortran.dg/reassoc_1.f90   |  2 +-
 gcc/testsuite/gfortran.dg/reassoc_2.f90   |  2 +-
 9 files changed, 25 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-49.c

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 0cc80adf632..ec369169a4c 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -401,7 +401,7 @@ negate_expr_p (tree t)
 case REAL_CST:
   /* We want to canonicalize to positive real constants.  Pretend
  that only negative ones can be easily negated.  */
-  return REAL_VALUE_NEGATIVE (TREE_REAL_CST (t));
+  return true;
 
 case COMPLEX_CST:
   return negate_expr_p (TREE_REALPART (t))
@@ -10962,10 +10962,7 @@ fold_binary_loc (location_t loc, enum tree_code code, 
tree type,
   /* A - B -> A + (-B) if B is easily negatable.  */
   if (negate_expr_p (op1)
  && ! TYPE_OVERFLOW_SANITIZED (type)
- && ((FLOAT_TYPE_P (type)
-   /* Avoid this transformation if B is a positive REAL_CST.  */
-  && (TREE_CODE (op1) != REAL_CST
-  || REAL_VALUE_NEGATIVE (TREE_REAL_CST (op1
+ && (FLOAT_TYPE_P (type)
  || INTEGRAL_TYPE_P (type)))
return fold_build2_loc (loc, PLUS_EXPR, type,
fold_convert_loc (loc, type, arg0),
diff --git a/gcc/match.pd b/gcc/match.pd
index 7d63bb973cb..9f14d50ae2d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1307,8 +1307,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (negate @0)
  (if (!TYPE_OVERFLOW_SANITIZED (type
 (match negate_expr_p
- REAL_CST
- (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (t)
+ REAL_CST)
 /* VECTOR_CST handling of non-wrapping types would recurse in unsupported
ways.  */
 (match negate_expr_p
@@ -3326,14 +3325,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* Canonicalization of binary operations.  */
 
-/* Convert X + -C into X - C.  */
-(simplify
- (plus @0 REAL_CST@1)
- (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
-  (with { tree tem = const_unop (NEGATE_EXPR, type, @1); }
-   (if (!TREE_OVERFLOW (tem) || !flag_trapping_math)
-(minus @0 { tem; })
-
 /* Convert x+x into x*2.  */
 (simplify
  (plus @0 @0)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c
index c3a15ea21af..ac95ec56810 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c
@@ -2,8 +2,8 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fno-rounding-math -fsignaling-nans -fsigned-zeros 
-fdump-tree-optimized" } */
 /* { dg-final { scan-tree-dump-times "x_\[0-9]*.D. \\+ 0.0;" 12 "optimized" } 
} */
-/* { dg-final { scan-tree-dump-times "y_\[0-9]*.D. - 0.0;" 4 "optimized" } } */
-/* { dg-final { scan-tree-dump-times " \[+-] 0.0;" 16 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "y_\[0-9]*.D. \\+ -0.0;" 4 "optimized" } 
} */
+/* { dg-final { scan-tree-dump-times " \\+ -?0.0;" 16 "optimized" } } */
 
 double f1 (double x) { return (x + 0.0) + 0.0; }
 double f2 (double y) { return (y + (-0.0)) + (-0.0); }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr90356-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr90356-3.c
index e658130c69f..951971b95ee 100644
--- 

Re: [PATCH V2] vec: don't select partial vectors when looping on full vectors

2020-09-16 Thread Andrea Corallo
Richard Sandiford  writes:

> OK with two incredibly petty comments fixed:

[...]

Installed with the two suggestions as 052204fac58.

Thanks for reviewing!

  Andrea


[PATCH] IRA: Don't make a global register eliminable

2020-09-16 Thread H.J. Lu via Gcc-patches
On Tue, Sep 15, 2020 at 7:44 AM Richard Sandiford
 wrote:
>
> Thanks for looking at this.
>
> "H.J. Lu"  writes:
> > commit 1bcb4c4faa4bd6b1c917c75b100d618faf9e628c
> > Author: Richard Sandiford 
> > Date:   Wed Oct 2 07:37:10 2019 +
> >
> > [LRA] Don't make eliminable registers live (PR91957)
> >
> > didn't make eliminable registers live which breaks
> >
> > register void *cur_pro asm("reg");
> >
> > where "reg" is an eliminable register.  Make fixed eliminable registers
> > live to fix it.
>
> I don't think fixedness itself is the issue here: it's usual for at
> least some registers involved in eliminations to be fixed registers.
>
> I think what makes this case different is instead that cur_pro/ebp
> is a global register.  But IMO things have already gone wrong if we
> think that a global register is eliminable.
>
> So I wonder if instead we should check global_regs at the beginning of:
>
>   for (i = 0; i < fp_reg_count; i++)
> if (!TEST_HARD_REG_BIT (crtl->asm_clobbers,
> HARD_FRAME_POINTER_REGNUM + i))
>   {
> SET_HARD_REG_BIT (eliminable_regset,
>   HARD_FRAME_POINTER_REGNUM + i);
> if (frame_pointer_needed)
>   SET_HARD_REG_BIT (ira_no_alloc_regs,
> HARD_FRAME_POINTER_REGNUM + i);
>   }
> else if (frame_pointer_needed)
>   error ("%s cannot be used in % here",
>  reg_names[HARD_FRAME_POINTER_REGNUM + i]);
> else
>   df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true);
>
> (ira_setup_eliminable_regset), and handle the global_regs[] case in
> the same way as the else case, i.e. short-circuiting both of the ifs.
>

Like this?

Thanks.

-- 
H.J.
From d1c852691d245d90666c1b95fc7f8db2b241 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 15 Sep 2020 04:17:54 -0700
Subject: [PATCH] IRA: Don't make a global register eliminable

commit 1bcb4c4faa4bd6b1c917c75b100d618faf9e628c
Author: Richard Sandiford 
Date:   Wed Oct 2 07:37:10 2019 +

[LRA] Don't make eliminable registers live (PR91957)

didn't make eliminable registers live which breaks

register void *cur_pro asm("reg");

where "reg" is an eliminable register.  Update ira_setup_eliminable_regset
to avoid making global registers eliminable.

gcc/

	PR middle-end/97054
	* ira.c (ira_setup_eliminable_regset): Don't make a global
	register eliminable.

gcc/testsuite/

	PR middle-end/97054
	* g++.target/i386/pr97054.C: New test.
---
 gcc/ira.c   | 30 
 gcc/testsuite/g++.target/i386/pr97054.C | 96 +
 2 files changed, 114 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr97054.C

diff --git a/gcc/ira.c b/gcc/ira.c
index a759f3c2aa8..506c6e6783e 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -2309,21 +2309,27 @@ ira_setup_eliminable_regset (void)
 }
   if (!HARD_FRAME_POINTER_IS_FRAME_POINTER)
 {
+  /* NB: Don't make a global register eliminable.  */
   for (i = 0; i < fp_reg_count; i++)
-	if (!TEST_HARD_REG_BIT (crtl->asm_clobbers,
-HARD_FRAME_POINTER_REGNUM + i))
+	if (global_regs[HARD_FRAME_POINTER_REGNUM + i])
+	  df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true);
+	else
 	  {
-	SET_HARD_REG_BIT (eliminable_regset,
-			  HARD_FRAME_POINTER_REGNUM + i);
-	if (frame_pointer_needed)
-	  SET_HARD_REG_BIT (ira_no_alloc_regs,
-HARD_FRAME_POINTER_REGNUM + i);
+	if (!TEST_HARD_REG_BIT (crtl->asm_clobbers,
+HARD_FRAME_POINTER_REGNUM + i))
+	  {
+		SET_HARD_REG_BIT (eliminable_regset,
+  HARD_FRAME_POINTER_REGNUM + i);
+		if (frame_pointer_needed)
+		  SET_HARD_REG_BIT (ira_no_alloc_regs,
+HARD_FRAME_POINTER_REGNUM + i);
+	  }
+	else if (frame_pointer_needed)
+	  error ("%s cannot be used in % here",
+		 reg_names[HARD_FRAME_POINTER_REGNUM + i]);
+	else
+	  df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true);
 	  }
-	else if (frame_pointer_needed)
-	  error ("%s cannot be used in % here",
-		 reg_names[HARD_FRAME_POINTER_REGNUM + i]);
-	else
-	  df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true);
 }
 }
 
diff --git a/gcc/testsuite/g++.target/i386/pr97054.C b/gcc/testsuite/g++.target/i386/pr97054.C
new file mode 100644
index 000..d0693af2a42
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr97054.C
@@ -0,0 +1,96 @@
+// { dg-do run { target { ! ia32 } } }
+// { dg-require-effective-target fstack_protector }
+// { dg-options "-O2 -fno-strict-aliasing -msse4.2 -mfpmath=sse -fPIC -fstack-protector-strong -O2" }
+
+struct p2_icode *ipc;
+register int pars asm("r13");
+register struct processor *cur_pro asm("rbp");
+register int a asm("rbx");
+register int c asm("r14");
+typedef long lina_t;
+typedef long la_t;
+typedef processor processor_t;
+typedef p2_icode p2_icode_t;
+typedef enum {
+  P2_Return_Action_Next,
+} 

[PATCH] rs6000: Add rs6000_cfun_pcrel_p

2020-09-16 Thread Bill Schmidt via Gcc-patches
This is a cleanup requested by Segher in a previous review.  Most
uses of rs6000_pcrel_p are called for the current function.  A
specialized version for cfun is more efficient for these uses.

(Actually all uses are called for the current function, so now
rs6000_pcrel_p is an orphan.  However, I think it's worth keeping
around in case we have a late discovery where we need it.)

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this okay for trunk?

Thanks,
Bill


2020-09-16  Bill Schmidt  

gcc/
* config/rs6000/predicates.md (current_file_function_operand): Use
rs6000_cfun_pcrel_p.
* config/rs6000/rs6000-logue.c (rs6000_decl_ok_for_sibcall):
Likewise.
(rs6000_global_entry_point_prologue_needed_p): Likewise.
(rs6000_output_function_prologue): Likewise.
* config/rs6000/rs6000-protos.h (rs6000_cfun_pcrel_p): New
prototype.
* config/rs6000/rs6000.c (rs6000_legitimize_tls_address): Use
rs6000_cfun_pcrel_p.
(rs6000_indirect_call_template_1): Likewise.
(rs6000_longcall_ref): Likewise.
(rs6000_call_aix): Likewise.
(rs6000_sibcall_aix): Likewise.
(rs6000_cfun_pcrel_p): New function.
* config/rs6000/rs6000.md (*pltseq_plt_pcrel): Use
rs6000_cfun_pcrel_p.
(*call_local): Likewise.
(*call_value_local): Likewise.
(*call_nonlocal_aix): Likewise.
(*call_value_nonlocal_aix): Likewise.
(*call_indirect_pcrel): Likewise.
(*call_value_indirect_pcrel): Likewise.
---
 gcc/config/rs6000/predicates.md   |  2 +-
 gcc/config/rs6000/rs6000-logue.c  |  6 +++---
 gcc/config/rs6000/rs6000-protos.h |  1 +
 gcc/config/rs6000/rs6000.c| 32 ---
 gcc/config/rs6000/rs6000.md   | 14 +++---
 5 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 2709e46f7e5..974a045cce0 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1056,7 +1056,7 @@ (define_predicate "current_file_function_operand"
 && SYMBOL_REF_DECL (op) != NULL
 && TREE_CODE (SYMBOL_REF_DECL (op)) == FUNCTION_DECL
 && (rs6000_fndecl_pcrel_p (SYMBOL_REF_DECL (op))
-!= rs6000_pcrel_p (cfun)))")))
+!= rs6000_cfun_pcrel_p ()))")))
 
 ;; Return 1 if this operand is a valid input for a move insn.
 (define_predicate "input_operand"
diff --git a/gcc/config/rs6000/rs6000-logue.c b/gcc/config/rs6000/rs6000-logue.c
index 5a2cb7fdf2c..7534b7f17dd 100644
--- a/gcc/config/rs6000/rs6000-logue.c
+++ b/gcc/config/rs6000/rs6000-logue.c
@@ -1085,7 +1085,7 @@ rs6000_decl_ok_for_sibcall (tree decl)
 r2 for its caller's TOC.  Such a function may make sibcalls to any
 function, whether local or external, without restriction based on
 TOC-save/restore rules.  */
-  if (rs6000_pcrel_p (cfun))
+  if (rs6000_cfun_pcrel_p ())
return true;
 
   /* Otherwise, under the AIX or ELFv2 ABIs we can't allow sibcalls
@@ -2562,7 +2562,7 @@ rs6000_global_entry_point_prologue_needed_p (void)
 return false;
 
   /* PC-relative functions never generate a global entry point prologue.  */
-  if (rs6000_pcrel_p (cfun))
+  if (rs6000_cfun_pcrel_p ())
 return false;
 
   /* Ensure we have a global entry point for thunks.   ??? We could
@@ -3978,7 +3978,7 @@ rs6000_output_function_prologue (FILE *file)
   fputs ("\n", file);
 }
 
-  else if (rs6000_pcrel_p (cfun))
+  else if (rs6000_cfun_pcrel_p ())
 {
   const char *name = XSTR (XEXP (DECL_RTL (current_function_decl), 0), 0);
   /* All functions compiled to use PC-relative addressing will
diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index 02e4d71922f..e75f8944d56 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -153,6 +153,7 @@ extern rtx rs6000_allocate_stack_temp (machine_mode, bool, 
bool);
 extern align_flags rs6000_loop_align (rtx);
 extern void rs6000_split_logical (rtx [], enum rtx_code, bool, bool, bool);
 extern bool rs6000_pcrel_p (struct function *);
+extern bool rs6000_cfun_pcrel_p (void);
 extern bool rs6000_fndecl_pcrel_p (const_tree);
 
 /* Different PowerPC instruction formats that are used by GCC.  There are
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6d0c5509b0e..f4b3e417fdd 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -8780,7 +8780,7 @@ rs6000_legitimize_tls_address (rtx addr, enum tls_model 
model)
 
   dest = gen_reg_rtx (Pmode);
   if (model == TLS_MODEL_LOCAL_EXEC
-  && (rs6000_tls_size == 16 || rs6000_pcrel_p (cfun)))
+  && (rs6000_tls_size == 16 || rs6000_cfun_pcrel_p ()))
 {
   rtx tlsreg;
 
@@ -8827,7 +8827,7 @@ 

Re: [PATCH] debug: Pass --gdwarf-N to assembler if fixed gas is detected during configure

2020-09-16 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 16, 2020 at 02:33:03PM +0200, Mark Wielaard wrote:
> > 2020-09-15  Jakub Jelinek  
> > 
> > * configure.ac (HAVE_AS_GDWARF_5_DEBUG_FLAG,
> > HAVE_AS_WORKING_DWARF_4_FLAG): New tests.
> > * gcc.c (ASM_DEBUG_DWARF_OPTION): Define.
> > (ASM_DEBUG_SPEC): Use ASM_DEBUG_DWARF_OPTION instead of
> > "--gdwarf2".  Use %{cond:opt1;:opt2} style.
> > (ASM_DEBUG_OPTION_DWARF_OPT): Define.
> > (ASM_DEBUG_OPTION_SPEC): Define.
> > (asm_debug_option): New variable.
> > (asm_options): Add "%(asm_debug_option)".
> > (static_specs): Add asm_debug_option entry.
> > (static_spec_functions): Add dwarf-version-gt.
> > (debug_level_greater_than_spec_func): New function.
> > * config/darwin.h (ASM_DEBUG_OPTION_SPEC): Define.
> > * config/darwin9.h (ASM_DEBUG_OPTION_SPEC): Redefine.
> > * config.in: Regenerated.
> > * configure: Regenerated.
> 
> I tested against the binutils-2_35-branch which will become 2.35.1 next
> week. The configure tests succeed there. So I think you can change the
> [elf,2,36,0] to [elf,2,35,1].

No problem with such a change, although it isn't that important, because
most of the people don't use in-tree builds and for out of tree builds
the actual test rather than version comparison is used instead.

Jakub



Re: [PATCH] rs6000: Fix misnamed built-in

2020-09-16 Thread Bill Schmidt via Gcc-patches

On 9/16/20 5:32 AM, Segher Boessenkool wrote:

On Tue, Sep 15, 2020 at 08:38:42PM -0500, Bill Schmidt wrote:

The description in rs6000-builtin.def provides for a builtin named
__builtin_altivec_xst_len_r.  However, it is hand-defined in
altivec_init_builtins as __builtin_xst_len_r, against the usual naming
practice.  Fix that.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions; committed as obvious.

Does no one use this?  Should we provide an alias with the old (wrong)
name?  What does the documentation say it should be?

It is obvious of course, except for that "might break users' code"
thing ;-)



Sorry, I should have mentioned this.  It's unlikely that people are 
using the incorrect name, as it is undocumented, and hidden by the 
supported overloaded interfaces.  The only documented interface is 
"vec_xl_len_r", which remains unchanged (and the internal hookup from 
that to this is unchanged by the patch).


If anyone's using the undocumented incorrect interface, I'd rather that 
get discovered and fixed.


Thanks,
Bill



Segher


[PATCH] libstdc++: use lt_host_flags for libstdc++.la

2020-09-16 Thread JonY via Gcc-patches
For platforms like Mingw and Cygwin, cygwin refuses to generate the
shared library without using -no-undefined.

Attached patch makes sure the right flags are used, since libtool is
already used to link libstdc++.

Patch OK?
From 4ba039687182fccd67e1170f89e259e1c4a58eeb Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Sun, 22 Mar 2020 09:56:58 +0800
Subject: [PATCH 1/1] libstdc++: use lt_host_flags for libstdc++.la

Signed-off-by: Jonathan Yong <10wa...@gmail.com>
---
 libstdc++-v3/src/Makefile.am | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/src/Makefile.am b/libstdc++-v3/src/Makefile.am
index a139adc81b3..498f533f3d3 100644
--- a/libstdc++-v3/src/Makefile.am
+++ b/libstdc++-v3/src/Makefile.am
@@ -107,7 +107,7 @@ libstdc___la_DEPENDENCIES = \
 libstdc___la_LDFLAGS = \
 	-version-info $(libtool_VERSION) ${version_arg} -lm
 
-libstdc___la_LINK = $(CXXLINK) $(libstdc___la_LDFLAGS)
+libstdc___la_LINK = $(CXXLINK) $(libstdc___la_LDFLAGS) $(lt_host_flags)
 
 # Use special rules for compatibility-ldbl.cc compilation, as we need to
 # pass -mlong-double-64.
-- 
2.11.4.GIT



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] -mno-xsave should imply -mno-avx since -mavx implies -mxsave

2020-09-16 Thread Uros Bizjak via Gcc-patches
> gcc/ChangeLog
>
> * common/config/i386/i386-common.c
> (OPTION_MASK_ISA_AVX_UNSET): Remove OPTION_MASK_ISA_XSAVE_UNSET.
> (OPTION_MASK_ISA_XSAVE_UNSET): Add OPTION_MASK_ISA_AVX_UNSET.
>
> gcc/testsuite/ChangeLog
>
> * gcc.target/i386/xsave-avx-1.c: New test.

OK for mainline and backports.

Thanks,
Uros.


Re: [PATCH] Increase rtx_cost of sse_to_integer in skylake_cost.

2020-09-16 Thread Uros Bizjak via Gcc-patches
> gcc/ChangeLog
>
> PR target/96861
> * config/i386/x86-tune-costs.h (skylake_cost): increase rtx
> cost of sse_to_integer from 2 to 6.
>
> gcc/testsuite
>
> * gcc.target/i386/pr95021-3.c: Add -mtune=generic.

OK.

Thanks,
Uros.


RE: [aarch64] Backport missing NEON intrinsics to GCC8

2020-09-16 Thread Kyrylo Tkachov


> -Original Message-
> From: Christophe Lyon 
> Sent: 16 September 2020 13:43
> To: Pop, Sebastian 
> Cc: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> Subject: Re: [aarch64] Backport missing NEON intrinsics to GCC8
> 
> On Wed, 16 Sep 2020 at 07:21, Pop, Sebastian via Gcc-patches
>  wrote:
> >
> > Thanks Kyrill for your review.
> >
> > I committed the patches to the gcc-8 branch:
> >
> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=2c55e6caa9432b2c1f081
> cb3aeddd36abec03233
> >
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a4004f62d60ada3a20dbf301
> 46ca461047a575cc
> >
> > and to the gcc-9 branch:
> >
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=c5aca0333b723d5e2036659
> 3cd01047d105f54e4
> >
> > Sebastian
> >
> 
> Hi Sebastian,
> 
> The new tests vld1x3 and vld1x4 fail on arm, I am seeing new failures
> on gcc-8 and gcc-9 branches
> after r8-10451, r8-10452 and r9-8874.
> Is that expected/fixed later in the backport series?
> (on the release branches, my validations are running for every commit)

Hmm, IIRC they're not implemented for arm IIRC so they should be xfailed or 
skipped on arm.

Kyrill

> 
> Thanks,
> 
> Christophe
> 
> 
> > On 9/15/20, 7:46 AM, "Kyrylo Tkachov"  wrote:
> >
> > Hi Sebastian,
> >
> > This patch implements missing intrinsics.
> > I'm okay with this being applied to the GCC 8 branch as these intrinsics
> have been defined in ACLE for a long time.
> > It is arguably a bug that they've been missing from GCC8.
> > Their implementation is fairly self-contained we haven't had any bugs
> reported against these in my recollection.
> >
> > So ok on the grounds that it's a bug-fix.
> > Thanks,
> > Kyrill
> >
> > From: Pop, Sebastian 
> > Sent: 11 September 2020 20:54
> > To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov
> 
> > Subject: [aarch64] Backport missing NEON intrinsics to GCC8
> >
> > Hi,
> >
> > gcc-8 branch is missing NEON intrinsics for loads and stores.
> > Attached patches pass bootstrap and regression testing on Graviton2
> aarch64-linux.
> >
> > Ok to commit to gcc-8 branch?
> >
> > Thanks,
> > Sebastian
> >


Re: [aarch64] Backport missing NEON intrinsics to GCC8

2020-09-16 Thread Christophe Lyon via Gcc-patches
On Wed, 16 Sep 2020 at 07:21, Pop, Sebastian via Gcc-patches
 wrote:
>
> Thanks Kyrill for your review.
>
> I committed the patches to the gcc-8 branch:
> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=2c55e6caa9432b2c1f081cb3aeddd36abec03233
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a4004f62d60ada3a20dbf30146ca461047a575cc
>
> and to the gcc-9 branch:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=c5aca0333b723d5e20366593cd01047d105f54e4
>
> Sebastian
>

Hi Sebastian,

The new tests vld1x3 and vld1x4 fail on arm, I am seeing new failures
on gcc-8 and gcc-9 branches
after r8-10451, r8-10452 and r9-8874.
Is that expected/fixed later in the backport series?
(on the release branches, my validations are running for every commit)

Thanks,

Christophe


> On 9/15/20, 7:46 AM, "Kyrylo Tkachov"  wrote:
>
> Hi Sebastian,
>
> This patch implements missing intrinsics.
> I'm okay with this being applied to the GCC 8 branch as these intrinsics 
> have been defined in ACLE for a long time.
> It is arguably a bug that they've been missing from GCC8.
> Their implementation is fairly self-contained we haven't had any bugs 
> reported against these in my recollection.
>
> So ok on the grounds that it's a bug-fix.
> Thanks,
> Kyrill
>
> From: Pop, Sebastian 
> Sent: 11 September 2020 20:54
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov 
> Subject: [aarch64] Backport missing NEON intrinsics to GCC8
>
> Hi,
>
> gcc-8 branch is missing NEON intrinsics for loads and stores.
> Attached patches pass bootstrap and regression testing on Graviton2 
> aarch64-linux.
>
> Ok to commit to gcc-8 branch?
>
> Thanks,
> Sebastian
>


Re: [PATCH] debug: Pass --gdwarf-N to assembler if fixed gas is detected during configure

2020-09-16 Thread Mark Wielaard
Hi,

On Tue, 2020-09-15 at 20:40 +0200, Jakub Jelinek wrote:
> Ok, here it is in patch form.
> I've briefly tested it, with the older binutils I have around (no --gdwarf-N
> support), with latest gas (--gdwarf-N that can be passed to as even when
> compiling C/C++ etc. code and emitting .debug_line) and latest gas with 
> Mark's fix
> reverted (--gdwarf-N support, but can only pass it to as when assembling
> user .s/.S files, not when compiling C/C++ etc.).
> Will bootstrap/regtest (with the older binutils) later tonight.
> 
> 2020-09-15  Jakub Jelinek  
> 
>   * configure.ac (HAVE_AS_GDWARF_5_DEBUG_FLAG,
>   HAVE_AS_WORKING_DWARF_4_FLAG): New tests.
>   * gcc.c (ASM_DEBUG_DWARF_OPTION): Define.
>   (ASM_DEBUG_SPEC): Use ASM_DEBUG_DWARF_OPTION instead of
>   "--gdwarf2".  Use %{cond:opt1;:opt2} style.
>   (ASM_DEBUG_OPTION_DWARF_OPT): Define.
>   (ASM_DEBUG_OPTION_SPEC): Define.
>   (asm_debug_option): New variable.
>   (asm_options): Add "%(asm_debug_option)".
>   (static_specs): Add asm_debug_option entry.
>   (static_spec_functions): Add dwarf-version-gt.
>   (debug_level_greater_than_spec_func): New function.
>   * config/darwin.h (ASM_DEBUG_OPTION_SPEC): Define.
>   * config/darwin9.h (ASM_DEBUG_OPTION_SPEC): Redefine.
>   * config.in: Regenerated.
>   * configure: Regenerated.

I tested against the binutils-2_35-branch which will become 2.35.1 next
week. The configure tests succeed there. So I think you can change the
[elf,2,36,0] to [elf,2,35,1].

Also tested that they fail with an older binutils (2.32).

Cheers,

Mark


Re: [PATCH PR93334][RFC]Skip output dep if values stored are byte wise the same

2020-09-16 Thread Richard Biener via Gcc-patches
On Tue, Sep 15, 2020 at 2:28 PM bin.cheng via Gcc-patches
 wrote:
>
> Hi,
> As suggested by PR93334 comments, this patch adds an interface identifying
> output dependence which can be skipped in terms of reordering and skip it in
> loop distribution.  It also adds a new test case.  Any comment?

Looks good to me.

Thanks,
Richard.

> Thanks,
> bin


Re: [PATCH] Fix testcases to avoid plusminus-with-convert pattern (PR 97066)

2020-09-16 Thread Richard Biener via Gcc-patches
On Wed, Sep 16, 2020 at 10:54 AM Feng Xue OS via Gcc-patches
 wrote:
>
> With the new pattern rule (T)(A) +- (T)(B) -> (T)(A +- B),
> some testcases are simplified and could not keep expected
> code pattern as test-check. Minor changes are made to those
> cases to avoid simplification effect of the rule.
>
> Tested on x86_64-linux and aarch64-linux.

OK.

> Feng
> ---
> 2020-09-16  Feng Xue  
>
> gcc/testsuite/
> PR testsuite/97066
> * gcc.dg/ifcvt-3.c: Modified to suppress simplification.
> * gcc.dg/tree-ssa/20030807-10.c: Likewise.


Re: [PATCH] libstdc++: use a link test to test for -Wl,-z,relro

2020-09-16 Thread JonY via Gcc-patches
On 9/13/20 3:37 PM, JonY wrote:
> On 9/10/20 2:23 PM, JonY wrote:
>> Do a link test instead of just a grep. The linker can
>> support multiple targets, but not all targets can use it.
>>
>> Cygwin/MinGW ld can support ELF but the PE format for Windows itself
>> does not support such a feature. Attached patch OK?
>>
>> I'm not confident with regenerating configure due to some unrelated
>> changes added, can someone else help with that?
>>
> 
> Ping?
> 

Ping 2?



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] IBM Z: Fix *vec_tf_to_v1tf constraints

2020-09-16 Thread Andreas Krebbel via Gcc-patches
On 03.09.20 08:39, Ilya Leoshkevich wrote:
> Bootstrapped (with BOOT_CFLAGS='-g -O2 -Wno-error=maybe-uninitialized')
> and regtested on s390x-redhat-linux. Ok for master?
> 
> 
> 
> Certain alternatives of *vec_tf_to_v1tf use "v" constraint for its
> TFmode source operand. Therefore it is assigned to VEC_REGS class, and
> when it is reloaded using *movtf_64, whose relevant alternatives need
> FP_REGS, LRA loops and ICE happens. The reason is that register class
> mismatch causes LRA to emit another reload, which triggers this issue
> again.
> 
> Fix by using "f" constraint, which is more appropriate for FP register
> pairs anyway.
> 
> gcc/ChangeLog:
> 
> 2020-09-02  Ilya Leoshkevich  
> 
>   * config/s390/vector.md(*vec_tf_to_v1tf): Use "f" instead of "v"
> for the source operand.

Ok. Thanks!

Andreas

> ---
>  gcc/config/s390/vector.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
> index 131bbda09bc..2573b7d980a 100644
> --- a/gcc/config/s390/vector.md
> +++ b/gcc/config/s390/vector.md
> @@ -567,7 +567,7 @@ (define_insn "*vec_splats_bswap_elem"
>  ; single vector register.
>  (define_insn "*vec_tf_to_v1tf"
>[(set (match_operand:V1TF   0 "nonimmediate_operand" 
> "=v,v,R,v,v")
> - (vec_duplicate:V1TF (match_operand:TF 1 "general_operand"   
> "v,R,v,G,d")))]
> + (vec_duplicate:V1TF (match_operand:TF 1 "general_operand"   
> "f,R,f,G,d")))]
>"TARGET_VX"
>"@
> vmrhg\t%v0,%1,%N1
> 



Re: openmp question

2020-09-16 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 16, 2020 at 08:00:01AM -0400, Nathan Sidwell wrote:
> is libgomp/testsuite/libgomp.c++/udr-3.C well formed?
> 
> Specifically V::baz?
> 
> If you remove the templatedness and plug in typedefs for T & D on its only
> instantiation (as the attached does), you get the following errors:
> 
> devvm293:414>g++ -std=c++11 -c -fopenmp udr-3.C
> udr-3.C: In member function 'void V::baz()':
> udr-3.C:112:27: error: no matching function for call to 'W::W(int)'
>   112 |  initializer (omp_priv (12))
>   |   ^

I guess you're right that W::W(int) should be defined.
And the question is why it isn't diagnosed during instantiation, sure.

The initialized (omp_priv (12))
etc. clause means that for the type(s) for which the declare reduction
is defined, when omp-low.c privatizes the variable, it should be done
as if the source contained W var (12);
This is done by attaching code to call the right ctors/dtors to the clauses
and omp-low.c emitting those sequences.

Jakub



openmp question

2020-09-16 Thread Nathan Sidwell

Jakub,
is libgomp/testsuite/libgomp.c++/udr-3.C well formed?

Specifically V::baz?

If you remove the templatedness and plug in typedefs for T & D on its 
only instantiation (as the attached does), you get the following errors:


devvm293:414>g++ -std=c++11 -c -fopenmp udr-3.C
udr-3.C: In member function 'void V::baz()':
udr-3.C:112:27: error: no matching function for call to 'W::W(int)'
  112 |  initializer (omp_priv (12))
  |   ^
udr-3.C:88:3: note: candidate: 'W::W()'
   88 |   W () : v (6) {}
  |   ^
udr-3.C:88:3: note:   candidate expects 0 arguments, 1 provided
udr-3.C:85:8: note: candidate: 'constexpr W::W(const W&)'
   85 | struct W
  |^
udr-3.C:85:8: note:   no known conversion for argument 1 from 'int' to 
'const W&'

udr-3.C:115:26: error: no matching function for call to 'W::W(int)'
  115 |  initializer (omp_priv (9))
  |  ^
udr-3.C:88:3: note: candidate: 'W::W()'
   88 |   W () : v (6) {}
  |   ^
udr-3.C:88:3: note:   candidate expects 0 arguments, 1 provided
udr-3.C:85:8: note: candidate: 'constexpr W::W(const W&)'
   85 | struct W
  |^
udr-3.C:85:8: note:   no known conversion for argument 1 from 'int' to 
'const W&'


It would seem to me you should get the same missing ctor errors when 
it's an instantiated template.


And such errors are what I observe with my WIP dealing with my recent 
breakage of that test. (adding the missing ctor makes the test pass)


Also, it seems cp_check_omp_declare_reduction (d) doesn;t set the 
reduction to a suitably formed error_mark_node on failure-- so 
downstream consumers can see the rejected cases.  Is there a reason for 
that?


nathan
--
Nathan Sidwell
// { dg-do run }

extern "C" void abort ();

void
dblinit (double *p)
{
  *p = 2.0;
}

namespace NS
{
  template 
  struct U
  {
void foo (U &, bool);
U ();
  };
  template 
  struct S
  {
int s;
#pragma omp declare reduction (foo : U<0>, S : omp_out.foo (omp_in, false))
#pragma omp declare reduction (foo : int : omp_out += omp_in) \
	initializer (omp_priv = N + 2)
#pragma omp declare reduction (foo : double : omp_out += omp_in) \
	initializer (dblinit (_priv))
void baz (int v)
{
  S s;
  int q = 0;
  if (s.s != 6 || v != 0) abort ();
  s.s = 20;
  double d = 4.0;
  #pragma omp parallel num_threads (4) reduction (foo : s, v, d) \
	reduction (::NS::U::operator + : q)
  {
	if (s.s != 6 || q != 0 || v != N + 2 || d != 2.0) abort ();
	asm volatile ("" : "+m" (s.s), "+r" (q), "+r" (v));
	s.s++; q++; v++;
  }
  if (s.s != 20 + q * 7 || (N + 3) * q != v || d != 4.0 + 2.0 * q)
	abort ();
}
void foo (S ) { s += x.s; }
void foo (S , bool y) { s += x.s; if (y) abort (); }
S (const S ) { s = x.s + 1; }
S (const S , bool y) { s = x.s + 2; if (y) abort (); }
S () { s = 6; }
S (int x) { s = x; }
~S ();
  };
  #pragma omp declare reduction (bar : S<1> : omp_out.foo (omp_in)) \
	initializer (omp_priv (8))
}

template 
NS::S::~S ()
{
  if (s < 6) abort ();
  s = -1;
  /* Ensure the above store is not DSEd.  */
  asm volatile ("" : : "r" () : "memory");
}

template 
struct T : public NS::S
{
  void baz ()
  {
NS::S s;
int q = 0;
if (s.s != 6) abort ();
#pragma omp parallel num_threads (4) reduction (foo:s) \
	reduction (+: q)
{
  if (s.s != 6 || q != 0) abort ();
  asm volatile ("" : "+m" (s.s), "+r" (q));
  s.s += 2; q++;
}
if (s.s != 6 + q * 8) abort ();
  }
};

struct W
{
  int v;
  W () : v (6) {}
  // ADD to fix:  W (int i) : v (i) {}
  ~W () {}
};

// Remove templatedness
// template 
struct V
{
  using T =NS::S<0>; // Types instantiated oer
  using D= double; // Likewise
  #pragma omp declare reduction (baz: T: omp_out.s += omp_in.s) \
	initializer (omp_priv (11))
  #pragma omp declare reduction (baz: D: omp_out += omp_in) \
	initializer (dblinit (_priv))
  static void dblinit (D *x) { *x = 3.0; }
  void baz ()
  {
T t;
V v;
int q = 0;
D d = 4.0;
if (t.s != 6 || v.v != 4) abort ();
#pragma omp declare reduction (+ : V, W : omp_out.v -= omp_in.v) \
	initializer (omp_priv (12))
{
  #pragma omp declare reduction (+ : W, V : omp_out.v += omp_in.v) \
	initializer (omp_priv (9))
  #pragma omp parallel num_threads (4) reduction (+: v, q) \
	reduction (baz: t, d)
  {
	if (t.s != 11 || v.v != 9 || q != 0 || d != 3.0) abort ();
	asm volatile ("" : "+m" (t.s), "+m" (v.v), "+r" (q));
	t.s += 2; v.v += 3; q++;
  }
  if (t.s != 6 + 13 * q || v.v != 4 + 12 * q || d != 4.0 + 3.0 * q)
	abort ();
}
  }
  int v;
  V () : v (4) {}
  V (int x) : v (x) {}
  ~V () {}
};

int
main ()
{
  NS::S<0> u;
  u.baz (0);
  T<2> t;
  t.baz ();
  NS::S<1> s;
  int q = 0;
  if (s.s != 6) abort ();
  // Test ADL
  #pragma omp parallel num_threads (4) reduction (bar:s) reduction (+:q)
  {
if (s.s != 8 || q != 0) abort ();
asm volatile ("" : "+m" (s.s), "+r" (q));

Re: [PATCH] Ignore the clobbered stack pointer in asm statment

2020-09-16 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 16, 2020 at 12:34:50PM +0100, Richard Sandiford wrote:
> Jakub Jelinek via Gcc-patches  writes:
> > On Mon, Sep 14, 2020 at 08:57:18AM -0700, H.J. Lu via Gcc-patches wrote:
> >> Something like this for GCC 8 and 9.
> >
> > Guess my preference would be to do this everywhere and then let's discuss if
> > we change the warning into error there or keep it being deprecated.
> 
> Agreed FWIW.  On turning it into an error: I think it might be better
> to wait a bit longer if we can.

Ok.  The patch is ok for trunk and affected release branches after a week.

> > Though, let's see what others want to say about this.
> >
> >> Add sp_is_clobbered_by_asm to rtl_data to inform backends that the stack
> >> pointer is clobbered by asm statement.
> >> 
> >> gcc/
> >> 
> >>PR target/97032
> >>* cfgexpand.c (asm_clobber_reg_kind): Set sp_is_clobbered_by_asm
> >>to true if the stack pointer is clobbered by asm statement.
> >>* emit-rtl.h (rtl_data): Add sp_is_clobbered_by_asm.
> >>* config/i386/i386.c (ix86_get_drap_rtx): Set need_drap to true
> >>if the stack pointer is clobbered by asm statement.
> >> 
> >> gcc/testsuite/
> >> 
> >>PR target/97032
> >>* gcc.target/i386/pr97032.c: New test.

Jakub



[PATCH] C-SKY: Add -msim option

2020-09-16 Thread Jojo R
gcc/ChangeLog:

* config/csky/csky.opt (msim): New.
* doc/invoke.texi (C-SKY Options): Document -msim.
* config/csky/csky-elf.h (LIB_SPEC): Add simulator runtime.

---
 gcc/config/csky/csky-elf.h | 10 --
 gcc/config/csky/csky.opt   |  4 
 gcc/doc/invoke.texi|  7 ++-
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/gcc/config/csky/csky-elf.h b/gcc/config/csky/csky-elf.h
index 15a0e73..a79d757 100644
--- a/gcc/config/csky/csky-elf.h
+++ b/gcc/config/csky/csky-elf.h
@@ -70,8 +70,14 @@
  %{EL:-EL} -X"
 
 #undef LIB_SPEC
-#define LIB_SPEC \
-  "%{pthread:-lpthread} -lc %{mccrt:-lcc-rt}"
+#define LIB_SPEC "\
+%{pthread:-lpthread} \
+--start-group \
+-lc \
+%{msim:-lsemi}%{!msim:-lnosys} \
+--end-group \
+%{mccrt:-lcc-rt} \
+"
 /* FIXME add this to LIB_SPEC when need */
 /*   %{!shared:%{profile:-lc_p}%{!profile:-lc}}" */
 
diff --git a/gcc/config/csky/csky.opt b/gcc/config/csky/csky.opt
index 60b51e5..505a764 100644
--- a/gcc/config/csky/csky.opt
+++ b/gcc/config/csky/csky.opt
@@ -192,3 +192,7 @@ Set the branch costs to roughly the specified number of 
instructions.
 msched-prolog
 Target Report Var(flag_sched_prolog) Init(0)
 Permit scheduling of function prologue and epilogue sequences.
+
+msim
+Target
+Use the simulator runtime.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6d9ff2c..9176c83 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -829,7 +829,7 @@ Objective-C and Objective-C++ Dialects}.
 -mdsp  -medsp  -mvdsp @gol
 -mdiv  -msmart  -mhigh-registers  -manchor @gol
 -mpushpop  -mmultiple-stld  -mconstpool  -mstack-size  -mccrt @gol
--mbranch-cost=@var{n}  -mcse-cc  -msched-prolog}
+-mbranch-cost=@var{n}  -mcse-cc  -msched-prolog -msim}
 
 @emph{Darwin Options}
 @gccoptlist{-all_load  -allowable_client  -arch  -arch_errors_fatal @gol
@@ -20835,6 +20835,11 @@ this option can result in code that is not compliant 
with the C-SKY V2 ABI
 prologue requirements and that cannot be debugged or backtraced.
 It is disabled by default.
 
+@item -msim
+@opindex msim
+Links the library libsemi.a which is in compatible with simulator. Applicable
+to ELF compiler only.
+
 @end table
 
 @node Darwin Options
-- 
1.9.1



Re: [PATCH] aarch64: Fix ICE on fpsr fpcr getters [PR96968]

2020-09-16 Thread Richard Sandiford
Andrea Corallo  writes:
> @@ -2034,6 +2034,16 @@ aarch64_expand_fpsr_fpcr_setter (int unspec, 
> machine_mode mode, tree exp)
>emit_insn (gen_aarch64_set (unspec, mode, op));
>  }
>  
> +/* Expand a fpsr or fpcr getter (depending on UNSPEC) using MODE.
> +   Return the target.  */
> +static rtx
> +aarch64_expand_fpsr_fpcr_getter (int unspec, machine_mode mode)
> +{
> +  rtx target = gen_reg_rtx (mode);
> +  emit_insn (gen_aarch64_get (unspec, mode, target));
> +  return target;
> +}

I agree this is functionally correct, but if a valid target has
been given to the caller, it's generally better to use it.
So IMO it would be better to use the expand_insn machinery,
passing the original target to create_output_operand.

Thanks,
Richard

> +
>  /* Expand an expression EXP that calls built-in function FCODE,
> with result going to TARGET if that's convenient.  IGNORE is true
> if the result of the builtin is ignored.  */
> @@ -2048,26 +2058,22 @@ aarch64_general_expand_builtin (unsigned int fcode, 
> tree exp, rtx target,
>switch (fcode)
>  {
>  case AARCH64_BUILTIN_GET_FPCR:
> -  emit_insn (gen_aarch64_get (UNSPECV_GET_FPCR, SImode, target));
> -  return target;
> +  return aarch64_expand_fpsr_fpcr_getter (UNSPECV_GET_FPCR, SImode);
>  case AARCH64_BUILTIN_SET_FPCR:
>aarch64_expand_fpsr_fpcr_setter (UNSPECV_SET_FPCR, SImode, exp);
>return target;
>  case AARCH64_BUILTIN_GET_FPSR:
> -  emit_insn (gen_aarch64_get (UNSPECV_GET_FPSR, SImode, target));
> -  return target;
> +  return aarch64_expand_fpsr_fpcr_getter (UNSPECV_GET_FPSR, SImode);
>  case AARCH64_BUILTIN_SET_FPSR:
>aarch64_expand_fpsr_fpcr_setter (UNSPECV_SET_FPSR, SImode, exp);
>return target;
>  case AARCH64_BUILTIN_GET_FPCR64:
> -  emit_insn (gen_aarch64_get (UNSPECV_GET_FPCR, DImode, target));
> -  return target;
> +  return aarch64_expand_fpsr_fpcr_getter (UNSPECV_GET_FPCR, DImode);
>  case AARCH64_BUILTIN_SET_FPCR64:
>aarch64_expand_fpsr_fpcr_setter (UNSPECV_SET_FPCR, DImode, exp);
>return target;
>  case AARCH64_BUILTIN_GET_FPSR64:
> -  emit_insn (gen_aarch64_get (UNSPECV_GET_FPSR, DImode, target));
> -  return target;
> +  return aarch64_expand_fpsr_fpcr_getter (UNSPECV_GET_FPSR, DImode);
>  case AARCH64_BUILTIN_SET_FPSR64:
>aarch64_expand_fpsr_fpcr_setter (UNSPECV_SET_FPSR, DImode, exp);
>return target;
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr96968.c 
> b/gcc/testsuite/gcc.target/aarch64/pr96968.c
> new file mode 100644
> index 000..21ffd955153
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr96968.c
> @@ -0,0 +1,28 @@
> +/* { dg-options "-O1" } */
> +
> +void
> +fpsr_getter (void)
> +{
> +  unsigned int fpsr = __builtin_aarch64_get_fpsr ();
> +}
> +
> +void
> +fpsr64_getter (void)
> +{
> +  unsigned long fpsr = __builtin_aarch64_get_fpsr64 ();
> +}
> +
> +void
> +fpcr_getter (void)
> +{
> +  unsigned int fpcr = __builtin_aarch64_get_fpcr ();
> +}
> +
> +void
> +fpcr64_getter (void)
> +{
> +  unsigned long fpcr = __builtin_aarch64_get_fpcr64 ();
> +}
> +
> +/* { dg-final { scan-assembler-times {\tmrs\tx0, fpsr\n} 2 } } */
> +/* { dg-final { scan-assembler-times {\tmrs\tx0, fpcr\n} 2 } } */


Re: [PATCH 3/3] C-SKY: Refine target name for elf target test

2020-09-16 Thread Cooper Qu via Gcc-patches

All of the three pathes have been pushed to trunk.


Thanks,

Cooper


On 9/16/20 6:34 PM, Jojo R wrote:

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_profiling_available): Refine name of 
elf target.

---
  gcc/testsuite/lib/target-supports.exp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 6881b66..60f76db 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -698,7 +698,7 @@ proc check_profiling_available { test_what } {
 || [istarget avr-*-*]
 || [istarget bfin-*-*]
 || [istarget cris-*-*]
-|| [istarget csky-*-elf]
+|| [istarget csky-*-elf*]
 || [istarget fido-*-elf]
 || [istarget h8300-*-*]
 || [istarget lm32-*-*]


Re: [PATCH] Ignore the clobbered stack pointer in asm statment

2020-09-16 Thread Richard Sandiford
Jakub Jelinek via Gcc-patches  writes:
> On Mon, Sep 14, 2020 at 08:57:18AM -0700, H.J. Lu via Gcc-patches wrote:
>> Something like this for GCC 8 and 9.
>
> Guess my preference would be to do this everywhere and then let's discuss if
> we change the warning into error there or keep it being deprecated.

Agreed FWIW.  On turning it into an error: I think it might be better
to wait a bit longer if we can.

Richard


>
> Though, let's see what others want to say about this.
>
>> Add sp_is_clobbered_by_asm to rtl_data to inform backends that the stack
>> pointer is clobbered by asm statement.
>> 
>> gcc/
>> 
>>  PR target/97032
>>  * cfgexpand.c (asm_clobber_reg_kind): Set sp_is_clobbered_by_asm
>>  to true if the stack pointer is clobbered by asm statement.
>>  * emit-rtl.h (rtl_data): Add sp_is_clobbered_by_asm.
>>  * config/i386/i386.c (ix86_get_drap_rtx): Set need_drap to true
>>  if the stack pointer is clobbered by asm statement.
>> 
>> gcc/testsuite/
>> 
>>  PR target/97032
>>  * gcc.target/i386/pr97032.c: New test.
>
>   Jakub


Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB with SIMT LANE [PR95654]

2020-09-16 Thread Alexander Monakov via Gcc-patches
Can you supply a tar with tree dumps for me to look at please?

Also, if you can check if the problem can be triggered without a
collapsed loop (e.g. try removing collapse(2), remove mentions of
d2) and if so supply dumps from that instead, I'd appreciate that too.

Alexander

On Wed, 16 Sep 2020, Tom de Vries wrote:

> [ cc-ing author omp support for nvptx. ]
> 
> On 9/16/20 12:39 PM, Tobias Burnus wrote:
> > Hi Tom, hi Richard, hello all,
> > 
> > @Richard: does it look okay from the ME side?
> > @Tom: Can you check which IFN_GOMP_SIMT should be
> > excluded with -ftracer?
> > 
> > Pre-remark, I do not know much about SIMT – except that they
> > only appear with nvptx and somehow relate to lanes on the
> > GPU.
> > 
> > In any case, as the testcase libgomp.fortran/pr66199-5.f90 shows,
> > if a basic block with  GOMP_SIMT_VOTE_ANY in it is duplicated,
> > which happens via -ftracer for this testcase, the result is wrong.
> > 
> > The testcase ignores all the loop work but via "lastprivate" takes
> > the upper loop bound (as assigned to the loop indices); instead of
> > the expected 32*32 = 1024, either some number (like 4 or very large
> > or negative) is returned.
> > 
> > While GOMP_SIMT_VOTE_ANY fixes the issue for this testcase, I
> > have the feeling that at least GOMP_SIMT_LAST_LANE should be
> > not copied - but I might be wrong.
> > 
> > Tom: Do you think GOMP_SIMT_LAST_LANE should be removed from
> > that list – or any of the following added as well?
> > GOMP_USE_SIMT, GOMP_SIMT_ENTER, GOMP_SIMT_ENTER_ALLOC, GOMP_SIMT_EXIT,
> > GOMP_SIMT_VF, GOMP_SIMT_ORDERED_PRED, GOMP_SIMT_XCHG_BFLY,
> > GOMP_SIMT_XCHG_IDX
> > 
> > OK for mainline?
> > 
> > Tobias
> > 
> > -
> > Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München /
> > Germany
> > Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung,
> > Alexander Walter
> 


Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-16 Thread Segher Boessenkool
On Wed, Sep 16, 2020 at 10:31:54AM +0200, Richard Biener wrote:
> On Tue, Sep 15, 2020 at 6:18 PM Segher Boessenkool
>  wrote:
> >
> > On Tue, Sep 15, 2020 at 08:51:09AM +0200, Richard Biener wrote:
> > > On Tue, Sep 15, 2020 at 5:56 AM luoxhu  wrote:
> > > > > u[n % 4] = i;
> > > > >
> > > > > I guess.  Is the % 4 mandated by the vec_insert semantics btw?
> >
> > (As an aside -- please use "& 3" instead: that works fine if n is signed
> > as well, but modulo doesn't.  Maybe that is in the patch already, I
> > didn't check, sorry.)
> >
> > > note this is why I asked about the actual CPU instruction - as I read
> > > Seghers mail
> > > the instruction modifies a vector register, not memory.
> >
> > But note that the builtin is not the same as the machine instruction --
> > here there shouldn't be a difference if compiling for a new enough ISA,
> > but the builtin is available on anything with at least AltiVec.
> 
> Well, given we're trying to improve instruction selection that very much
> should depend on the ISA.  Thus if the target says it cannot vec_set
> to a register in a variable position then we don't want to pretend it can.

Of course :-)  We shouldn't look at what the machine insn can do, but
also not at what the source level constructs can; we should just ask the
target code itself.


Segher


Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB with SIMT LANE [PR95654]

2020-09-16 Thread Tom de Vries
[ cc-ing author omp support for nvptx. ]

On 9/16/20 12:39 PM, Tobias Burnus wrote:
> Hi Tom, hi Richard, hello all,
> 
> @Richard: does it look okay from the ME side?
> @Tom: Can you check which IFN_GOMP_SIMT should be
> excluded with -ftracer?
> 
> Pre-remark, I do not know much about SIMT – except that they
> only appear with nvptx and somehow relate to lanes on the
> GPU.
> 
> In any case, as the testcase libgomp.fortran/pr66199-5.f90 shows,
> if a basic block with  GOMP_SIMT_VOTE_ANY in it is duplicated,
> which happens via -ftracer for this testcase, the result is wrong.
> 
> The testcase ignores all the loop work but via "lastprivate" takes
> the upper loop bound (as assigned to the loop indices); instead of
> the expected 32*32 = 1024, either some number (like 4 or very large
> or negative) is returned.
> 
> While GOMP_SIMT_VOTE_ANY fixes the issue for this testcase, I
> have the feeling that at least GOMP_SIMT_LAST_LANE should be
> not copied - but I might be wrong.
> 
> Tom: Do you think GOMP_SIMT_LAST_LANE should be removed from
> that list – or any of the following added as well?
> GOMP_USE_SIMT, GOMP_SIMT_ENTER, GOMP_SIMT_ENTER_ALLOC, GOMP_SIMT_EXIT,
> GOMP_SIMT_VF, GOMP_SIMT_ORDERED_PRED, GOMP_SIMT_XCHG_BFLY,
> GOMP_SIMT_XCHG_IDX
> 
> OK for mainline?
> 
> Tobias
> 
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München /
> Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung,
> Alexander Walter


  1   2   >