Re: [PATCH 00/14] rs6000: Begin replacing built-in support

2020-02-04 Thread Richard Biener
On Tue, Feb 4, 2020 at 6:40 PM Segher Boessenkool
 wrote:
>
> Hi!
>
> On Mon, Feb 03, 2020 at 08:26:01PM -0600, Bill Schmidt wrote:
> > The current built-in support in the rs6000 back end requires at least
> > a master's degree in spelunking to comprehend.  It's full of cruft,
> > redundancy, and unused bits of code, and long overdue for a
> > replacement.  This is the first part of my project to do that.
>
> Woohoo :-)

Indeed.

> > My intent is to make adding new built-in functions as simple as adding
> > a few lines to a couple of files, and automatically generating as much
> > of the initialization, overload resolution, and expansion logic as
> > possible.  This patch series establishes the format of the input files
> > and creates a new program (rs6000-genbif) to:
>
> Let's call it rs6000-gen-builtins or similar.  Not as cryptic.

I believe we talked about this a few years ago.  Any reason this is powerpc
specific?  If sufficiently generic, most targets would benefit, and maybe even
frontends and the middle-end could make use of this.  The generator
program, that is.  (Disclaimer: I didn't look into the patches at all.)

I always wondered if we can make our C frontend spit out things from
C declarations (with maybe extra #pragmas for some of the more obscure
details) and how to fit that into the bootstrap.

> > Note that none of the code in this patch set affects GCC's operation
> > at all, with the exception of patch #14.  Patch 14 causes the program
> > rs6000-genbif to be built and executed, producing the output files,
> > and linking rs6000-bif.o into the executable.  However, none of the
> > code in rs6000-bif.o is called, so the only effect is to make the gcc
> > executable larger.
>
> If it builds at all ;-)
>
> > I'd like to consider at least patches 1-13 as stage 4 material for the
> > current release.  I'd prefer to also include patch 14 for convenience,
> > but I understand if that's not deemed acceptable.
> >
> > I've attempted to break this up into logical pieces for easy
> > consumption, but some of the patches may still be a bit large.  Please
> > let me know if you'd like me to break any of them up.
>
> I will.  "Large" isn't a problem if it is a lot of the same thing.  If
> it is two or more things, having them in separate patches is easier to
> review; if there is just one case that is different, put it in a separate
> patch if that can be done; otherwise, please point it out in the patch
> commit message.
>
> >   Initial create of rs6000-genbif.c.
>
> Subjects do not end in a dot (but do start with a capital).
>
> >   Add stubs for input files.  These will grow much larger.
>
> The second half of this does not belong in the title, but in the body.
>
>
> Segher


Re: [PATCH][DOCUMENTATION] Document ASLR for Precompiled Headers.

2020-02-04 Thread Richard Biener
On Tue, Feb 4, 2020 at 4:09 PM Martin Liška  wrote:
>
> On 2/4/20 2:59 PM, Jakub Jelinek wrote:
> > On Tue, Feb 04, 2020 at 02:55:31PM +0100, Richard Biener wrote:
> >> On Tue, Feb 4, 2020 at 2:45 PM Martin Liška  wrote:
> >>>
> >>> Hi.
> >>>
> >>> The patch is about enhancement of the documentation, where
> >>> we do not support ASLR for Precompiled Headers.
> >>>
> >>> Ready for trunk?
> >>
> >> I think we support ASLR just fine?
> >
> > We do support ASLR just fine, just the PCH from it will not be bitwise
> > reproducible.  So the doc patch is misplaced (putting it between the
> > sentence talking about options and the list of those options is weird) and
> > should make it clear that ASLR may result in non-reproducible PCH files.
>
> You are right. I fixed the placing of the hunk and relaxed the wording.

I think the wording is still too defensive - "non-reproducible" results suggest
the binaries produced are somehow not reproducible (or worse).  But it's
just the PCH file itself that can differ between builds.  So something like

"Address space layout randomization (ASLR) can lead to PCH files that are
not binary-identical.  If you rely on stable PCH file contents, disable
ASLR when generating PCH files."

?

>
> Martin
>
> >
> >>> gcc/ChangeLog:
> >>>
> >>> 2020-02-04  Martin Liska  
> >>>
> >>>  PR c++/92717
> >>>  * doc/invoke.texi: Document that one should
> >>>  not combine ASLR and -fpch.
> >>> ---
> >>>gcc/doc/invoke.texi | 2 ++
> >>>1 file changed, 2 insertions(+)
> >>>
> >>>
> >
> >   Jakub
> >
>


Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse

2020-02-04 Thread Jakub Jelinek
On Tue, Feb 04, 2020 at 06:04:09PM -0700, Jeff Law wrote:
> --- a/gcc/cse.c
> +++ b/gcc/cse.c
> @@ -5572,6 +5572,16 @@ cse_insn (rtx_insn *insn)
> sets[i].rtl = 0;
>   }
>  
> +  /* Similarly for no-op moves.  */
> +  if (n_sets == 1
> +   && GET_CODE (src) == REG

Just nits:
REG_P (src) ?

> +   && src == dest)

Is pointer comparison ok?  I mean, shouldn't we instead do
rtx_equal_p (src, dest), set_noop_p (sets[i].rtl) or noop_move_p (insn)?

> + * gcc.c-torture/compile/pr90275.c: New test

Missing full stop.

> +++ b/gcc/testsuite/gcc.c-torture/compile/pr90275.c
> @@ -0,0 +1,27 @@
> +a, b, c;

int 

> +
> +long long d;
> +
> +e() {

void

(unless the omission of those makes it not reproduce anymore, which I
doubt).

> +
> +  char f;
> +
> +  for (;;) {
> +
> +c = a = c ? 5 : 0;
> +
> +if (f) {
> +
> +  b = a;
> +
> +  f = d;
> +
> +}
> +
> +(d || b) < (a > e) ?: (b ? 0 : f) || (d -= f);
> +
> +  }
> +
> +}


Jakub



[PATCH coroutines] Build co_await/yield_expr with unknown_type in processing_template_decl phase

2020-02-04 Thread JunMa

Hi
This patch builds co_await/yield_expr with unknown_type when we cannot
know the promise type in the processing_template_decl phase.  This avoids
confusing the compiler when handling type deduction and conversion.

Bootstrap and test on X86_64, is it OK?

Regards
JunMa

gcc/cp
2020-02-05  Jun Ma 

    * coroutines.cc (finish_co_await_expr): Build co_await_expr
    with unknown_type_node.
    (finish_co_yield_expr): Ditto.
    * pt.c (type_dependent_expression_p): Set co_await/yield_expr
    with unknown type as dependent.

gcc/testsuite
2020-02-05  Jun Ma 

    * g++.dg/coroutines/torture/co-await-14-template-traits.C: New 
test.
---
 gcc/cp/coroutines.cc  |  8 +++
 gcc/cp/pt.c   |  5 
 .../torture/co-await-14-template-traits.C | 24 +++
 3 files changed, 32 insertions(+), 5 deletions(-)
 create mode 100644 
gcc/testsuite/g++.dg/coroutines/torture/co-await-14-template-traits.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 7525d7c035a..e380ee24c5f 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -834,9 +834,8 @@ finish_co_await_expr (location_t kw, tree expr)
   /* If we don't know the promise type, we can't proceed.  */
   tree functype = TREE_TYPE (current_function_decl);
   if (dependent_type_p (functype) || type_dependent_expression_p (expr))
-   return build5_loc (kw, CO_AWAIT_EXPR, TREE_TYPE (expr), expr, NULL_TREE,
-  NULL_TREE, NULL_TREE, integer_zero_node);
-}
+   return build5_loc (kw, CO_AWAIT_EXPR, unknown_type_node, expr,
+  NULL_TREE, NULL_TREE, NULL_TREE, integer_zero_node);
 
   /* We must be able to look up the "await_transform" method in the scope of
  the promise type, and obtain its return type.  */
@@ -912,9 +911,8 @@ finish_co_yield_expr (location_t kw, tree expr)
   tree functype = TREE_TYPE (current_function_decl);
   /* If we don't know the promise type, we can't proceed.  */
   if (dependent_type_p (functype) || type_dependent_expression_p (expr))
-   return build2_loc (kw, CO_YIELD_EXPR, TREE_TYPE (expr), expr,
+   return build2_loc (kw, CO_YIELD_EXPR, unknown_type_node, expr,
   NULL_TREE);
-}
 
   if (!coro_promise_type_found_p (current_function_decl, kw))
 /* We must be able to look up the "yield_value" method in the scope of
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 40ff3c3a089..cfc3393991e 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -26743,6 +26743,11 @@ type_dependent_expression_p (tree expression)
   if (TREE_CODE (expression) == SCOPE_REF)
return false;
 
+  /* CO_AWAIT/YIELD_EXPR with unknown type is always dependent.  */
+  if (TREE_CODE (expression) == CO_AWAIT_EXPR
+ || TREE_CODE (expression) == CO_YIELD_EXPR)
+   return true;
+
   if (BASELINK_P (expression))
{
  if (BASELINK_OPTYPE (expression)
diff --git 
a/gcc/testsuite/g++.dg/coroutines/torture/co-await-14-template-traits.C 
b/gcc/testsuite/g++.dg/coroutines/torture/co-await-14-template-traits.C
new file mode 100644
index 000..4e670b1c308
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/torture/co-await-14-template-traits.C
@@ -0,0 +1,24 @@
+//  { dg-do compile }
+//  Test we create co_await_expr with dependent type rather than type of 
awaitable class
+
+#include "../coro.h"
+#include "../coro1-ret-int-yield-int.h"
+#include <chrono>
+
+struct TestAwaiter {
+int recent_test;
+TestAwaiter(int test) : recent_test{test} {}
+bool await_ready() { return true; }
+void await_suspend(coro::coroutine_handle<>) {}
+int await_resume() { return recent_test;}
+void return_value(int x) { recent_test = x;}
+};
+
+template <typename Rep, typename Period>
+coro1 test_temparg (std::chrono::duration<Rep, Period> dur)
+{
+   auto sum = co_await TestAwaiter(1);
+   if (!sum)
+dur.count();
+   co_return 0;
+}
-- 
2.19.1.3.ge56e4f7



Re: libgo patch committed: Update to Go1.14beta1

2020-02-04 Thread Andrew Pinski
Something like attached.
I will clean it up next week and submit it then.
It should also fix some arm64be related issues too.

Thanks,
Andrew Pinski

On Mon, Feb 3, 2020 at 6:17 PM Ian Lance Taylor  wrote:
>
> On Sun, Feb 2, 2020 at 2:27 AM Andreas Schwab  wrote:
> >
> > I'm getting these errors on aarch64 with -mabi=ilp32:
> >
> > ../../../../libgo/go/runtime/mpagealloc.go:226:38: error: shift count 
> > overflow
> >   226 |  chunks [1 << pallocChunksL1Bits]*[1 << 
> > pallocChunksL2Bits]pallocData
> >   |  ^
> > ../../../../libgo/go/runtime/mgcscavenge.go:487:15: error: shift count 
> > overflow
> >   487 |l2 := (*[1 << 
> > pallocChunksL2Bits]pallocData)(atomic.Loadp(unsafe.Pointer(&s.chunks[i.l1()])))
> >   |   ^
> > ../../../../libgo/go/runtime/mpagealloc.go:138:22: error: shift count 
> > overflow
> >   138 |   return uint(i) & (1<<pallocChunksL1Bits - 1)
> >   |  ^
> > ../../../../libgo/go/runtime/mpagealloc.go:129:21: error: integer constant 
> > overflow
> >   129 |   return uint(i) >> pallocChunksL1Shift
> >   | ^
> > ../../../../libgo/go/runtime/mpagealloc_64bit.go:34:2: error: integer 
> > constant overflow
> >34 |  summaryL0Bits,
> >   |  ^
>
> I'm not sure that gccgo ever fully worked with aarch64 -mabi=ilp32.
> In Go I think that will have to be represented with a new GOARCH
> value, arm64p32.
>
> Ian
From 14de07bd862051df38160da375fd286ce956785f Mon Sep 17 00:00:00 2001
From: Andrew Pinski 
Date: Wed, 5 Feb 2020 04:36:13 +
Subject: [PATCH] Add ilp32 ARM64 support to gccgo.

Change-Id: Ide52be45dd9fd5d2a5dfc7d138fc56d963d06632
Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/go.test/go-test.exp |  9 ++-
 libgo/configure   | 27 ++-
 libgo/configure.ac| 20 +-
 libgo/go/cmd/cgo/main.go  |  6 +
 libgo/go/cmd/go/go_test.go|  1 +
 libgo/go/cmd/go/internal/imports/build.go |  2 ++
 libgo/go/cmd/internal/sys/arch.go | 10 +++
 libgo/go/cmd/internal/sys/supported.go| 10 ---
 libgo/go/crypto/aes/aes_gcm.go|  2 +-
 libgo/go/crypto/aes/cipher_asm.go |  2 +-
 libgo/go/crypto/aes/cipher_generic.go |  2 +-
 libgo/go/golang.org/x/sys/cpu/byteorder.go|  2 ++
 ...cpu_linux_arm64.go => cpu_linux_arm64x.go} |  2 ++
 .../golang.org/x/sys/cpu/cpu_linux_other.go   |  2 +-
 .../cpu/{cpu_arm64.go => cpu_arm64x.go}   |  2 ++
 libgo/go/internal/cpu/cpu_no_init.go  |  3 +++
 .../syscall/unix/getrandom_linux_generic.go   |  2 +-
 libgo/go/runtime/cputicks.go  |  3 +++
 libgo/go/runtime/hash32.go|  2 +-
 libgo/go/runtime/lfstack_32bit.go |  2 +-
 libgo/go/runtime/mpagealloc_32bit.go  |  2 +-
 .../{os_linux_arm64.go => os_linux_arm64x.go} |  2 +-
 libgo/go/runtime/os_linux_noauxv.go   |  2 +-
 libgo/go/syscall/endian_big.go|  2 +-
 libgo/go/syscall/endian_little.go |  2 +-
 libgo/goarch.sh   |  7 -
 libgo/match.sh|  4 +--
 libgo/testsuite/gotest|  4 +--
 28 files changed, 103 insertions(+), 33 deletions(-)
 rename libgo/go/golang.org/x/sys/cpu/{cpu_linux_arm64.go => 
cpu_linux_arm64x.go} (97%)
 rename libgo/go/internal/cpu/{cpu_arm64.go => cpu_arm64x.go} (98%)
 rename libgo/go/runtime/{os_linux_arm64.go => os_linux_arm64x.go} (94%)

diff --git a/gcc/testsuite/go.test/go-test.exp 
b/gcc/testsuite/go.test/go-test.exp
index 51f9b381d67..7afcba14b64 100644
--- a/gcc/testsuite/go.test/go-test.exp
+++ b/gcc/testsuite/go.test/go-test.exp
@@ -188,7 +188,14 @@ proc go-set-goarch { } {
 
 switch -glob $target_triplet {
"aarch64*-*-*" {
-   set goarch "arm64"
+   if [check_effective_target_lp64] {
+   set goarch "arm64"
+   } else {
+   set goarch "arm64p32"
+   }
+   if [check_effective_target_aarch64_big_endian] {
+   append goarch "be"
+   }
}
"alpha*-*-*" {
set goarch "alpha"
diff --git a/libgo/configure b/libgo/configure
index 2f787392abd..8eca900889f 100755
--- a/libgo/configure
+++ b/libgo/configure
@@ -14070,7 +14070,7 @@ esac
 #   - libgo/go/syscall/endian_XX.go
 #   - possibly others
 # - possibly update files in libgo/go/internal/syscall/unix
-ALLGOARCH="386 alpha amd64 amd64p32 arm armbe arm64 arm64be ia64 m68k mips 
mipsle mips64 mips64le mips64p32 mips64p32le nios2 ppc ppc64 ppc64le riscv 
riscv64 s390 s390x sh shbe sparc sparc64 wasm"
+ALLGOARCH="386 alpha amd64 amd64p32 arm armbe arm64 arm64be arm64p32 
arm64p32be ia64 m68k mips mipsle mips64 mips64le mips64p32 mips64p32le nios2 
ppc ppc64 ppc64le riscv riscv64 s390 s390x sh shbe sparc sparc64 wasm"
 
 # All known GOARCH family values.
 ALLGOARCHFAMILY="I386 

[PATCH Coroutines]Pickup more CO_AWAIT_EXPR expanding cases

2020-02-04 Thread bin.cheng
Hi,
This simple patch actually is a missing part of previous one at
https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01451.html
It picks up more CO_AWAIT_EXPR expanding cases.  Test is also added.

Bootstrap and test on x86_64. Is it OK?

Thanks,
bin

gcc/cp
2020-02-05  Bin Cheng  

* coroutines.cc (co_await_expander): Handle more CO_AWAIT_EXPR
expanding cases.

gcc/testsuite
2020-02-05  Bin Cheng  

* g++.dg/coroutines/co-await-syntax-10.C: New test.

0001-Pickup-more-CO_AWAIT_EXPR-expanding-cases.patch
Description: Binary data


[PATCH] document that alias and target must have the same type

2020-02-04 Thread Martin Sebor

GCC diagnoses declarations of function aliases whose type doesn't
match that of the target (ditto for attribute weakref).  It doesn't
yet diagnose such incompatbilities for variable aliases but that's
just an oversight that I will try to remember to correct in GCC 11.
The attached patch updates the manual to make it clear that
aliases must have the same type as their targets, or the behavior
is undefined (and may be diagnosed).

Martin
gcc/ChangeLog:

	* doc/extend.texi (attribute alias, weak, weakref): Mention that
	the target of an alias must have the same type as the alias itself.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ec99c38a607..6f602140bfa 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -2558,7 +2558,9 @@ __attribute__ ((access (write_only, 1, 2), access (read_write, 3))) int fgets (c
 @item alias ("@var{target}")
 @cindex @code{alias} function attribute
 The @code{alias} attribute causes the declaration to be emitted as an
-alias for another symbol, which must be specified.  For instance,
+alias for another symbol, which must have been previously declared with
+the same type.  Declaring an alias with a different type than the target
+is undefined and may be diagnosed.  For instance,
 
 @smallexample
 void __f () @{ /* @r{Do something.} */; @}
@@ -3922,9 +3924,10 @@ results in warning on line 5.
 The @code{weak} attribute causes the declaration to be emitted as a weak
 symbol rather than a global.  This is primarily useful in defining
 library functions that can be overridden in user code, though it can
-also be used with non-function declarations.  Weak symbols are supported
-for ELF targets, and also for a.out targets when using the GNU assembler
-and linker.
+also be used with non-function declarations.  The overriding symbol must
+have the same type as the declaration of the weak symbol.  Weak symbols
+are supported for ELF targets, and also for a.out targets when using
+the GNU assembler and linker.
 
 @item weakref
 @itemx weakref ("@var{target}")
@@ -3932,7 +3935,8 @@ and linker.
 The @code{weakref} attribute marks a declaration as a weak reference.
 Without arguments, it should be accompanied by an @code{alias} attribute
 naming the target symbol.  Optionally, the @var{target} may be given as
-an argument to @code{weakref} itself.  In either case, @code{weakref}
an argument to @code{weakref} itself.  The @var{target} must have
+the same type as the declaration.  In either case, @code{weakref}
 implicitly marks the declaration as @code{weak}.  Without a
 @var{target}, given as an argument to @code{weakref} or to @code{alias},
 @code{weakref} is equivalent to @code{weak}.
@@ -3951,7 +3955,9 @@ definition to be given for the target symbol.  If the target symbol is
 only referenced through weak references, then it becomes a @code{weak}
 undefined symbol.  If it is directly referenced, however, then such
 strong references prevail, and a definition is required for the
-symbol, not necessarily in the same translation unit.
+symbol, not necessarily in the same translation unit.  The definition
+must have the same type as the alias, otherwise the behavior is
+undefined.
 
 The effect is equivalent to moving all references to the alias to a
 separate translation unit, renaming the alias to the aliased symbol,


[RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse

2020-02-04 Thread Jeff Law

Richard & Segher, if y'all could check my analysis here, it'd be
appreciated.

pr90275 is a P2 regression that is only triggering on ARM.  David's
testcase in c#1 is the best for this problem as it doesn't require
magic flags like -fno-dce to trigger.

The block in question:

> (code_label 89 88 90 24 15 (nil) [0 uses])
> (note 90 89 97 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
> (insn 97 90 98 24 (parallel [
> (set (reg:CC 100 cc)
> (compare:CC (reg:SI 131 [ d_lsm.21 ])
> (const_int 0 [0])))
> (set (reg:SI 135 [ d_lsm.21 ])
> (reg:SI 131 [ d_lsm.21 ]))
> ]) "pr90275.c":21:45 248 {*movsi_compare0}
>  (expr_list:REG_DEAD (reg:SI 131 [ d_lsm.21 ])
> (nil)))
> (insn 98 97 151 24 (set (reg:SI 136 [+4 ])
> (reg:SI 132 [ d_lsm.21+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
>  (expr_list:REG_DEAD (reg:SI 132 [ d_lsm.21+4 ])
> (expr_list:REG_DEAD (reg:CC 100 cc)
> (nil
> (insn 151 98 152 24 (set (reg:SI 131 [ d_lsm.21 ])
> (reg:SI 131 [ d_lsm.21 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
>  (expr_list:REG_DEAD (reg:SI 135 [ d_lsm.21 ])
> (nil)))
> (insn 152 151 103 24 (set (reg:SI 132 [ d_lsm.21+4 ])
> (reg:SI 136 [+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
>  (expr_list:REG_DEAD (reg:SI 136 [+4 ])
> (nil)))
> 
insns 97 and 151 are the most important.

We process insn 97, which creates an equivalency between r135 and r131.
This is expressed by putting both on the "same_value" chain
(table_elt->{next,prev}_same_value).

When we put the REGs on the chain we'll set REG_QTY to a positive
number which indicates their values are valid.

We continue processing insns forward and run into insn 151 which is a
self-copy.

First CSE will invalidate r131 (because it's set).  Invalidation is
accomplished by setting REG_QTY for r131 to a negative value.  It does
not remove r131 from the same value chains.

Then CSE will call insert_regs for r131.  The qty is not valid, so we
get into this code:

>  if (modified || ! qty_valid)
> {
>   if (classp)
> for (classp = classp->first_same_value;
>  classp != 0;
>  classp = classp->next_same_value)
>   if (REG_P (classp->exp)
>   && GET_MODE (classp->exp) == GET_MODE (x))
> {
>   unsigned c_regno = REGNO (classp->exp);
> 
>   gcc_assert (REGNO_QTY_VALID_P (c_regno));
> [ ... ]

So we walk the chain of same values for r131.  When walking we run into
r131 itself.  Since r131 has been invalidated we trip the assert.


The fix is pretty simple.  We just arrange to stop processing insns
that are nop reg->reg copies, much like we already do for mem->mem
copies and (set (pc) (pc)).

This has bootstrapped and regression tested on x86_64.  I've also
verified it fixes the testcase in c#1 of pr90275, the test in pr93125
and pr92388.  Interestingly enough I couldn't trigger the original
testcase in 90275, but I'm confident this is ultimately all the same
problem.

OK for the trunk?

Thanks,
Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d6b5ded32a4..90d9f9d92d3 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2020-02-04  Jeff Law 
+
+	PR rtl-optimization/90275
+	* cse.c (cse_insn): Stop processing reg->reg nop moves early.
+
 2020-02-04  Richard Biener  
 
 	PR tree-optimization/93538
diff --git a/gcc/cse.c b/gcc/cse.c
index 79ee0ce80e3..6e18bdae85f 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -5572,6 +5572,16 @@ cse_insn (rtx_insn *insn)
 	  sets[i].rtl = 0;
 	}
 
+  /* Similarly for no-op moves.  */
+  if (n_sets == 1
+	  && GET_CODE (src) == REG
+	  && src == dest)
+	{
+	  cse_cfg_altered |= delete_insn_and_edges (insn);
+	  /* No more processing for this set.  */
+	  sets[i].rtl = 0;
+	}
+
   /* If this SET is now setting PC to a label, we know it used to
 	 be a conditional or computed branch.  */
   else if (dest == pc_rtx && GET_CODE (src) == LABEL_REF
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index d2dc6648bc4..7be52bd6d2a 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2020-02-04  Jeff Law  
+
+	PR rtl-optimization/90275
+	* gcc.c-torture/compile/pr90275.c: New test
+
 2020-02-04  David Malcolm  
 
 	* gcc.dg/analyzer/data-model-1.c (struct coord): Convert fields
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr90275.c b/gcc/testsuite/gcc.c-torture/compile/pr90275.c
new file mode 100644
index 000..83e0df77226
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr90275.c
@@ -0,0 +1,27 @@
+a, b, c;
+
+long long d;
+
+e() {
+
+  char f;
+
+  for (;;) {
+
+c = a = c ? 5 : 0;
+
+if (f) {
+
+  b = a;
+
+  f = d;
+
+}
+
+(d || b) < (a > e) ?: (b ? 0 : f) || (d -= f);
+
+  }
+
+}
+
+


[PATCH][AARCH64] Fix for PR86901

2020-02-04 Thread Modi Mo via gcc-patches
Hi,

Adding support for extv and extzv on aarch64 as described in
PR86901.  I also changed extract_bit_field_using_extv to use
gen_lowpart_if_possible instead of gen_lowpart directly.  Using gen_lowpart
directly will fail with an ICE when building libgcc if the compiler cannot
perform the conversion, whereas gen_lowpart_if_possible bails out of
matching this pattern gracefully.

I'm looking through past mails and https://gcc.gnu.org/contribute.html which 
details testing bootstrap. I'm building a cross-compiler (x64_aarch64) and the 
instructions don't address that scenario. The GCC cross-build is green and 
there's no regressions on the C/C++ tests (The go/fortran etc. look like they 
need additional infrastructure built on my side to work). Is there a workflow 
for cross-builds or should I aim to get an aarch64 native machine for full 
validation?


ChangeLog:
2020-02-03  Di Mo  

gcc/
* config/aarch64/aarch64.md: Add define_expand for extv and 
extzv.
* expmed.c (extract_bit_field_using_extv): Change gen_lowpart to 
gen_lowpart_if_possible to avoid compiler assert building libgcc.
testsuite/
* gcc.target/aarch64/pr86901.c: Add new test.


Best,
Modi


pr86901.patch
Description: pr86901.patch


Re: [PATCH 01/14] Initial create of rs6000-genbif.c.

2020-02-04 Thread Segher Boessenkool
On Tue, Feb 04, 2020 at 04:44:04PM -0600, Bill Schmidt wrote:
> >"ldv" certainly is shorter and nicer in principle, but it is a bit
> >cryptic.  As I said, it's probably not too hard to get used to it; and
> >maybe a better name will present itself?
> Maybe ldvec and stvec would serve without introducing specific builtin 
> confusion.

Let's go with that, if nothing better shows up.

> >That's not what I meant...  Can you say
> >   [TARGET_ALTIVEC && TARGET_64BIT]
> >here?  Or even just
> >   [!TARGET_ALTIVEC]
> >or
> >   [1]
> >for always, or
> >   [0]
> >for never ("commented out").
> Ah!  Sorry for misunderstanding.  Right now just an identifier is 
> allowed, but we could certainly grab the whole string between the [] and 
> drop it in with no concerns.  Hopefully we both remember when we get to 
> the patch that reads the stanzas...

:-)


Segher


[COMMITTED] c++: Fix error-recovery with concepts.

2020-02-04 Thread Jason Merrill
Here, push_tinst_level refused to push into the scope of Foo::Foo
because it was triggered from the ill-formed function fun.  But we didn't
check the return value and tried to pop the un-pushed level.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/93551
* constraint.cc (satisfy_declaration_constraints): Check return
value of push_tinst_level.
---
 gcc/cp/constraint.cc   |  3 +-
 gcc/testsuite/g++.dg/cpp2a/concepts-err1.C | 33 ++
 2 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-err1.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index cda644eabe2..58044cd0f9d 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2692,7 +2692,8 @@ satisfy_declaration_constraints (tree t, subst_info info)
   tree result = boolean_true_node;
   if (norm)
 {
-  push_tinst_level (t);
+  if (!push_tinst_level (t))
+   return result;
   push_access_scope (t);
   result = satisfy_associated_constraints (norm, args, info);
   pop_access_scope (t);
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-err1.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-err1.C
new file mode 100644
index 000..e482ba05b46
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-err1.C
@@ -0,0 +1,33 @@
+// PR c++/93551
+// { dg-do compile { target concepts } }
+
+namespace std {
+  template<typename _Tp, _Tp __v>
+  struct integral_constant
+  {
+    static constexpr _Tp  value = __v;
+    typedef _Tp   value_type;
+    typedef integral_constant<_Tp, __v>   type;
+    constexpr operator value_type() const noexcept { return value; }
+  };
+  template<typename _Base, typename _Derived>
+  struct is_base_of
+    : public integral_constant<bool, __is_base_of(_Base, _Derived)>
+  { };
+  template <typename _Base, typename _Derived>
+  inline constexpr bool is_base_of_v = is_base_of<_Base, _Derived>::value;
+}
+class Bar { };
+struct Foo {
+  template <typename P> requires std::is_base_of_v<Bar, P>
+  Foo(P const&);
+};
+template <typename P>
+Foo fun(P const& arg) {
+  (bool)arg;   // { dg-error "" }
+  return Foo {arg};
+}
+int main() {
+  fun(Bar{});
+  return 0;
+}

base-commit: 0712ea6313bc44f9ae8feb235c1b02c92cdd0527
-- 
2.18.1



Re: [PATCH 01/14] Initial create of rs6000-genbif.c.

2020-02-04 Thread Bill Schmidt

On 2/4/20 4:36 PM, Segher Boessenkool wrote:

On Tue, Feb 04, 2020 at 03:10:32PM -0600, Bill Schmidt wrote:

I really don't think using the new acronym "bif" helps; built-in
functions already are often called "builtins" (or "intrinsics", which is
problematic itself).

Until we manage to replace the old methods, we already have
rs6000-builtin.def, so I am a bit constrained in my choices. Given that
restriction, what name would you prefer?  I can use rs6000-builtins.def
(the plural) if you like.

As we discussed (offline), maybe rs6000-builtin-new.def is best (and at
the end of this conversion, just move it).

+1



+ ldv Needs special handling for vec_ld semantics
+ stv Needs special handling for vec_st semantics

Call those "vec_ld" and "vec_st", then?  Or should I get used to it, the
names aren't obvious, but cut-and-paste always is ;-)

Hm.  Well, vec_ld is a specific built-in, but this applies to a few more
than just that one.  But sure, if you want.

"ldv" certainly is shorter and nicer in principle, but it is a bit
cryptic.  As I said, it's probably not too hard to get used to it; and
maybe a better name will present itself?
Maybe ldvec and stvec would serve without introducing specific builtin 
confusion.



+[TARGET_ALTIVEC]

Can this be a C expression?  Most gen* programs just copy similar things
to the generated C code, which can be interesting to debug, but works
perfectly well otherwise.

I rather prefer the way it is.  I do generate C code from this in the
subsequent patches.  But I like table-driven code to use things that
look like tables for input. :-)

That's not what I meant...  Can you say
   [TARGET_ALTIVEC && TARGET_64BIT]
here?  Or even just
   [!TARGET_ALTIVEC]
or
   [1]
for always, or
   [0]
for never ("commented out").
Ah!  Sorry for misunderstanding.  Right now just an identifier is 
allowed, but we could certainly grab the whole string between the [] and 
drop it in with no concerns.  Hopefully we both remember when we get to 
the patch that reads the stanzas...



+  Blank lines may be used as desired in these files.

Between stanzas and stuff only?  There are places where newlines are
significant and not just whitespace, right?

I don't believe so, although there may be places where I forgot to allow
a line to be advanced -- that would be a bug, though, so let me know if
you see any.  Blank lines don't have any inherent meaning in the input
files.

Not blank lines, I'm asking about newlines :-)  But those are not allowed
to be inserted just anywhere, a line has to be one line, iiuc?


Yes.  Additional newlines can follow a newline, but the individual lines 
must contain everything that's expected in them.


Bill




Segher


Re: [PATCH 01/14] Initial create of rs6000-genbif.c.

2020-02-04 Thread Segher Boessenkool
On Tue, Feb 04, 2020 at 03:10:32PM -0600, Bill Schmidt wrote:
> >I really don't think using the new acronym "bif" helps; built-in
> >functions already are often called "builtins" (or "intrinsics", which is
> >problematic itself).
> 
> Until we manage to replace the old methods, we already have 
> rs6000-builtin.def, so I am a bit constrained in my choices. Given that 
> restriction, what name would you prefer?  I can use rs6000-builtins.def 
> (the plural) if you like.

As we discussed (offline), maybe rs6000-builtin-new.def is best (and at
the end of this conversion, just move it).

> >>+ ldv Needs special handling for vec_ld semantics
> >>+ stv Needs special handling for vec_st semantics
> >Call those "vec_ld" and "vec_st", then?  Or should I get used to it, the
> >names aren't obvious, but cut-and-paste always is ;-)
> 
> Hm.  Well, vec_ld is a specific built-in, but this applies to a few more 
> than just that one.  But sure, if you want.

"ldv" certainly is shorter and nicer in principle, but it is a bit
cryptic.  As I said, it's probably not too hard to get used to it; and
maybe a better name will present itself?

> >>+[TARGET_ALTIVEC]
> >Can this be a C expression?  Most gen* programs just copy similar things
> >to the generated C code, which can be interesting to debug, but works
> >perfectly well otherwise.
> 
> I rather prefer the way it is.  I do generate C code from this in the 
> subsequent patches.  But I like table-driven code to use things that 
> look like tables for input. :-)

That's not what I meant...  Can you say
  [TARGET_ALTIVEC && TARGET_64BIT]
here?  Or even just
  [!TARGET_ALTIVEC]
or
  [1]
for always, or
  [0]
for never ("commented out").

> >>+  Blank lines may be used as desired in these files.
> >Between stanzas and stuff only?  There are places where newlines are
> >significant and not just whitespace, right?
> 
> I don't believe so, although there may be places where I forgot to allow 
> a line to be advanced -- that would be a bug, though, so let me know if 
> you see any.  Blank lines don't have any inherent meaning in the input 
> files.

Not blank lines, I'm asking about newlines :-)  But those are not allowed
to be inserted just anywhere, a line has to be one line, iiuc?


Segher


[COMMITTED] c++: Fix constexpr vs. omitted aggregate init.

2020-02-04 Thread Jason Merrill
Value-initialization is importantly different from {}-initialization for
this testcase, where the former calls the deleted S constructor and the
latter initializes S happily.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/90951
* constexpr.c (cxx_eval_array_reference): {}-initialize missing
elements instead of value-initializing them.
---
 gcc/cp/constexpr.c | 12 ++++++++++--
 gcc/testsuite/g++.dg/cpp0x/constexpr-array24.C | 10 ++++++++++
 2 files changed, 20 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-array24.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index c35ec5acc97..8a02c6e0713 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3324,8 +3324,16 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree 
t,
 }
 
   /* If it's within the array bounds but doesn't have an explicit
- initializer, it's value-initialized.  */
-  tree val = build_value_init (elem_type, tf_warning_or_error);
+ initializer, it's initialized from {}.  But use build_value_init
+ directly for non-aggregates to avoid creating a garbage CONSTRUCTOR.  */
+  tree val;
+  if (CP_AGGREGATE_TYPE_P (elem_type))
+{
+  tree empty_ctor = build_constructor (init_list_type_node, NULL);
+  val = digest_init (elem_type, empty_ctor, tf_warning_or_error);
+}
+  else
+val = build_value_init (elem_type, tf_warning_or_error);
   return cxx_eval_constant_expression (ctx, val, lval, non_constant_p,
   overflow_p);
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-array24.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-array24.C
new file mode 100644
index 000..538969830ba
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-array24.C
@@ -0,0 +1,10 @@
+// PR c++/90951
+// { dg-do compile { target c++11 } }
+
+#define assert(expr) static_assert (expr, #expr)
+
+struct S { const char a[2]; };
+
+constexpr struct S a[1][1][1] = { };
+
+assert ('\0' == *a[0][0][0].a);

base-commit: a1c9c9ff06ab15e697d5bac6ea6e5da2df840cf5
-- 
2.18.1



Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-04 Thread Martin Sebor

On 2/4/20 2:31 PM, Jeff Law wrote:

On Tue, 2020-02-04 at 13:08 -0700, Martin Sebor wrote:

On 2/4/20 12:15 PM, Richard Biener wrote:

On February 4, 2020 5:30:42 PM GMT+01:00, Jeff Law  wrote:

On Tue, 2020-02-04 at 10:34 +0100, Richard Biener wrote:

On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor  wrote:

PR 93519 reports a false positive -Wrestrict issued for an inlined call
to strcpy that carefully guards against self-copying.  This is caused
by the caller's arguments substituted into the call during inlining and
before dead code elimination.

The attached patch avoids this by removing -Wrestrict from the folder
and deferring folding perfectly overlapping (and so undefined) calls
to strcpy (and mempcpy, but not memcpy) until much later.  Calls to
perfectly overlapping calls to memcpy are still folded early.


Why do we bother to warn at all for this case?  Just DWIM here.
Warnings like this can be emitted from the analyzer?

They potentially can, but the analyzer is and will almost always
certainly be considerably slower.  I would not expect it to be used
nearly as much as the core compiler.

Whether or not a particular warning makes sense in the core compiler or
analyzer would seem to me to depend on whether or not we can reasonably
issue warnings without interprocedural analysis.  double-free
realistically requires interprocedural analysis to be effective.  I'm
not sure Wrestrict really does.



That is, I suggest to simply remove the bogus warning code from folding
(and _not_ fail the folding).

I haven't looked at the patch, but if we can get the warning out of the
folder that's certainly preferable.  And we could investigate deferring
self-copy removal.


I think the issue is as usual, warning for code we'll later remove as dead. 
Warning at folding is almost always premature.


In this instance the code is reachable (or isn't obviously unreachable).
GCC doesn't remove it, but provides benign (and reasonable) semantics
for it(*).  To me, that's one aspect of quality.  Letting the user know
that the code is buggy is another.  I view that as at least as important
as folding the ill-effects away because it makes it possible to fix
the problem so the code works correctly even with compilers that don't
provide these benign semantics.

If you look at the guts of what happens at the point where we issue the
warning from within gimple_fold_builtin_strcpy we have:


DCH_to_char (char * in, char * out, int collid)
{
   int type;
   char * D.2148;
   char * dest;
   char * num;
   long unsigned int _4;
   char * _5;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
;;succ:   4

;;   basic block 4, loop depth 0
;;pred:   2
;;succ:   5

;;   basic block 5, loop depth 0
;;pred:   4
;;succ:   6

;;   basic block 6, loop depth 0
;;pred:   5
   if (0 != 0)
 goto <bb 7>; [53.47%]
   else
 goto <bb 8>; [46.53%]
;;succ:   7
;;8

;;   basic block 7, loop depth 0
;;pred:   6
   strcpy (out_1(D), out_1(D));
;;succ:   8

;;   basic block 8, loop depth 0
;;pred:   6
;;7
   _4 = __builtin_strlen (out_1(D));
   _5 = out_1(D) + _4;
   __builtin_memcpy (_5, "foo", 4);
;;succ:   3

;;   basic block 3, loop depth 0
;;pred:   8
   return;
;;succ:   EXIT

}



Which shows the code is obviously unreachable in the case we're warning
about.  You can't see this in the dumps because it's exposed by
inlining, then cleaned up before writing the dump file.


In the specific case of the bug the code is of course eliminated
because it's guarded by the if (s != d).  I was referring to
the general (unguarded) case of:

  char *s = "", *p;

  int main (void)
  {
p = strcpy (s, s);
puts (p);
  }

where GCC folds the assignment 'p = strcpy(s, s);' to effectively
p = s;  That's perfectly reasonable but it could equally as well
leave the call alone, as it does when s is null, for instance.

I think folding it away is not only reasonable but preferable to
making the invalid call, but it's done only rarely.  Most of
the time GCC does emit the undefined access (it does that with
calls to library functions as well as with direct stores and
reads).  (I am hoping we can change that in the future so that
these kinds of problems are handled with some consistency.)



ISTM this would be a case we could handle with the __builtin_warning
stuff.

I think the question is do we want to do anything about it this cycle?
  


If so, I think Martin's approach is quite reasonable.  It disables
folding away the self-copies from gimple-fold and moves the warning
into the expander.  So if there's such a call in the IL at expansion
time we get a warning (-O0).

I'd hazard a guess that the diagnostic was added to the strlen pass to
capture the missed warning when we're optimizing and the self-copy has
survived until that point. There's a couple issues that raises though.

First, it's insufficient.  DSE (for 

Re: [PATCH 1/3] libstdc++: Apply the move_iterator changes described in P1207R4

2020-02-04 Thread Jonathan Wakely

On 04/02/20 16:23 -0500, Patrick Palka wrote:

On Tue, 4 Feb 2020, Jonathan Wakely wrote:


On 03/02/20 21:07 -0500, Patrick Palka wrote:
> These changes are needed for some of the tests in the constrained algorithm
> patch, because they use move_iterator with an uncopyable output_iterator.
> The
> other changes described in the paper are already applied, it seems.
>
> libstdc++-v3/ChangeLog:
>
>* include/bits/stl_iterator.h (move_iterator::move_iterator): Move the
>iterator when initializing _M_current.
>(move_iterator::base): Split into two overloads differing in
>ref-qualifiers as in P1207R4.
> ---
> libstdc++-v3/include/bits/stl_iterator.h | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/stl_iterator.h
> b/libstdc++-v3/include/bits/stl_iterator.h
> index 784d200d22f..1a288a5c785 100644
> --- a/libstdc++-v3/include/bits/stl_iterator.h
> +++ b/libstdc++-v3/include/bits/stl_iterator.h
> @@ -1166,7 +1166,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>   explicit _GLIBCXX17_CONSTEXPR
>   move_iterator(iterator_type __i)
> -  : _M_current(__i) { }
> +  : _M_current(std::move(__i)) { }
>
> >   template<typename _Iter>
>_GLIBCXX17_CONSTEXPR
> @@ -1174,9 +1174,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>: _M_current(__i.base()) { }
>
>   _GLIBCXX17_CONSTEXPR iterator_type
> -  base() const
> +  base() const &
> +#if __cplusplus > 201703L && __cpp_lib_concepts
> > +  requires copy_constructible<_Iterator>
> +#endif
>   { return _M_current; }
>
> +  _GLIBCXX17_CONSTEXPR iterator_type
> +  base() &&
> +  { return std::move(_M_current); }
> +

I think this change has to be restricted to C++20 and later, otherwise
a move_iterator in C++17 can change state so that its _M_current
becomes moved-from, which should not be possible before C++20.

So something like:

#if __cplusplus <= 201703L
  _GLIBCXX17_CONSTEXPR iterator_type
  base() const
  { return _M_current; }
#else
  constexpr iterator_type
  base() const &
#if __cpp_lib_concepts
    requires copy_constructible<_Iterator>
#endif
  { return _M_current; }

  constexpr iterator_type
  base() &&
  { return std::move(_M_current); }
#endif




Thanks for the review, here is the updated patch.


Thanks, this is OK for master (it's safe enough to change now, and it's
needed for some C++20-only changes that will also be safe to do now as
they don't touch existing code).



libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (move_iterator::move_iterator): Move __i
when initializing _M_current.
(move_iterator::base): Split into two overloads differing in
ref-qualifiers as in P1207R4 for C++20.
---
libstdc++-v3/include/bits/stl_iterator.h | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index 784d200d22f..c200f7a9d14 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1166,16 +1166,29 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

  explicit _GLIBCXX17_CONSTEXPR
  move_iterator(iterator_type __i)
-  : _M_current(__i) { }
+  : _M_current(std::move(__i)) { }

  template<typename _Iter>
_GLIBCXX17_CONSTEXPR
move_iterator(const move_iterator<_Iter>& __i)
: _M_current(__i.base()) { }

+#if __cplusplus <= 201703L
  _GLIBCXX17_CONSTEXPR iterator_type
  base() const
  { return _M_current; }
+#else
+  constexpr iterator_type
+  base() const &
+#if __cpp_lib_concepts
+   requires copy_constructible<_Iterator>
+#endif
+  { return _M_current; }
+
+  constexpr iterator_type
+  base() &&
+  { return std::move(_M_current); }
+#endif

  _GLIBCXX17_CONSTEXPR reference
  operator*() const
--
2.25.0.114.g5b0ca878e0





Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-04 Thread Jeff Law
On Tue, 2020-02-04 at 13:08 -0700, Martin Sebor wrote:
> On 2/4/20 12:15 PM, Richard Biener wrote:
> > On February 4, 2020 5:30:42 PM GMT+01:00, Jeff Law  wrote:
> > > On Tue, 2020-02-04 at 10:34 +0100, Richard Biener wrote:
> > > > On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor  wrote:
> > > > > PR 93519 reports a false positive -Wrestrict issued for an inlined
> > > call
> > > > > to strcpy that carefully guards against self-copying.  This is
> > > caused
> > > > > by the caller's arguments substituted into the call during inlining
> > > and
> > > > > before dead code elimination.
> > > > > 
> > > > > The attached patch avoids this by removing -Wrestrict from the
> > > folder
> > > > > and deferring folding perfectly overlapping (and so undefined)
> > > calls
> > > > > to strcpy (and mempcpy, but not memcpy) until much later.  Calls to
> > > > > perfectly overlapping calls to memcpy are still folded early.
> > > > 
> > > > Why do we bother to warn at all for this case?  Just DWIM here.
> > > Warnings like
> > > > this can be emitted from the analyzer?
> > > They potentially can, but the analyzer is and will almost always
> > > certainly be considerably slower.  I would not expect it to be used
> > > nearly as much as the core compiler.
> > > 
> > > Whether or not a particular warning makes sense in the core compiler or
> > > analyzer would seem to me to depend on whether or not we can reasonably
> > > issue warnings without interprocedural analysis.  double-free
> > > realistically requires interprocedural analysis to be effective.  I'm
> > > not sure Wrestrict really does.
> > > 
> > > 
> > > > That is, I suggest to simply remove the bogus warning code from
> > > folding
> > > > (and _not_ fail the folding).
> > > I haven't looked at the patch, but if we can get the warning out of the
> > > folder that's certainly preferable.  And we could investigate deferring
> > > self-copy removal.
> > 
> > I think the issue is as usual, warning for code we'll later remove as dead. 
> > Warning at folding is almost always premature.
> 
> In this instance the code is reachable (or isn't obviously unreachable).
> GCC doesn't remove it, but provides benign (and reasonable) semantics
> for it(*).  To me, that's one aspect of quality.  Letting the user know
> that the code is buggy is another.  I view that as at least as important
> as folding the ill-effects away because it makes it possible to fix
> the problem so the code works correctly even with compilers that don't
> provide these benign semantics.
If you look at the guts of what happens at the point where we issue the
warning from within gimple_fold_builtin_strcpy we have:

> DCH_to_char (char * in, char * out, int collid)
> {
>   int type;
>   char * D.2148;
>   char * dest;
>   char * num;
>   long unsigned int _4;
>   char * _5;
> 
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
> ;;succ:   4
> 
> ;;   basic block 4, loop depth 0
> ;;pred:   2
> ;;succ:   5
> 
> ;;   basic block 5, loop depth 0
> ;;pred:   4
> ;;succ:   6
> 
> ;;   basic block 6, loop depth 0
> ;;pred:   5
>   if (0 != 0)
> goto <bb 7>; [53.47%]
>   else
> goto <bb 8>; [46.53%]
> ;;succ:   7
> ;;8
> 
> ;;   basic block 7, loop depth 0
> ;;pred:   6
>   strcpy (out_1(D), out_1(D));
> ;;succ:   8
> 
> ;;   basic block 8, loop depth 0
> ;;pred:   6
> ;;7
>   _4 = __builtin_strlen (out_1(D));
>   _5 = out_1(D) + _4;
>   __builtin_memcpy (_5, "foo", 4);
> ;;succ:   3
> 
> ;;   basic block 3, loop depth 0
> ;;pred:   8
>   return;
> ;;succ:   EXIT
> 
> }
> 

Which shows the code is obviously unreachable in the case we're warning
about.  You can't see this in the dumps because it's exposed by
inlining, then cleaned up before writing the dump file.

ISTM this would be a case we could handle with the __builtin_warning
stuff. 

I think the question is do we want to do anything about it this cycle?
 

If so, I think Martin's approach is quite reasonable.  It disables
folding away the self-copies from gimple-fold and moves the warning
into the expander.  So if there's such a call in the IL at expansion
time we get a warning (-O0).

I'd hazard a guess that the diagnostic was added to the strlen pass to
capture the missed warning when we're optimizing and the self-copy has
survived until that point. There's a couple issues that raises though.

First, it's insufficient.  DSE (for example) can do self-copy removal,
so it needs the same handling.  There may be others places too.

Second, if the code becomes unreachable after strlen, then we've got
new false positive issues.

It's the classic problems we have with all middle end based warnings.

But I could live with those if we can show that using __builtin_warning
to handle this stuff in gcc-11 works...  ISTM we emit the
__builtin_warning call before any self-copy like that, whenever we
happen to spot them.  They'll 

Re: [PATCH 1/3] libstdc++: Apply the move_iterator changes described in P1207R4

2020-02-04 Thread Patrick Palka
On Tue, 4 Feb 2020, Jonathan Wakely wrote:

> On 03/02/20 21:07 -0500, Patrick Palka wrote:
> > These changes are needed for some of the tests in the constrained algorithm
> > patch, because they use move_iterator with an uncopyable output_iterator.
> > The
> > other changes described in the paper are already applied, it seems.
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > * include/bits/stl_iterator.h (move_iterator::move_iterator): Move the
> > iterator when initializing _M_current.
> > (move_iterator::base): Split into two overloads differing in
> > ref-qualifiers as in P1207R4.
> > ---
> > libstdc++-v3/include/bits/stl_iterator.h | 11 +++++++++--
> > 1 file changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/libstdc++-v3/include/bits/stl_iterator.h
> > b/libstdc++-v3/include/bits/stl_iterator.h
> > index 784d200d22f..1a288a5c785 100644
> > --- a/libstdc++-v3/include/bits/stl_iterator.h
> > +++ b/libstdc++-v3/include/bits/stl_iterator.h
> > @@ -1166,7 +1166,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > 
> >   explicit _GLIBCXX17_CONSTEXPR
> >   move_iterator(iterator_type __i)
> > -  : _M_current(__i) { }
> > +  : _M_current(std::move(__i)) { }
> > 
> >   template<typename _Iter>
> > _GLIBCXX17_CONSTEXPR
> > @@ -1174,9 +1174,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > : _M_current(__i.base()) { }
> > 
> >   _GLIBCXX17_CONSTEXPR iterator_type
> > -  base() const
> > +  base() const &
> > +#if __cplusplus > 201703L && __cpp_lib_concepts
> > +   requires copy_constructible<_Iterator>
> > +#endif
> >   { return _M_current; }
> > 
> > +  _GLIBCXX17_CONSTEXPR iterator_type
> > +  base() &&
> > +  { return std::move(_M_current); }
> > +
> 
> I think this change has to be restricted to C++20 and later, otherwise
> a move_iterator in C++17 can change state so that its _M_current
> becomes moved-from, which should not be possible before C++20.
> 
> So something like:
> 
> #if __cplusplus <= 201703L
>   _GLIBCXX17_CONSTEXPR iterator_type
>   base() const
>   { return _M_current; }
> #else
>   constexpr iterator_type
>   base() const &
> #if __cpp_lib_concepts
>   requires copy_constructible<_Iterator>
> #endif
>   { return _M_current; }
> 
>   constexpr iterator_type
>   base() &&
>   { return std::move(_M_current); }
> #endif
> 
> 

Thanks for the review, here is the updated patch.

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (move_iterator::move_iterator): Move __i
when initializing _M_current.
(move_iterator::base): Split into two overloads differing in
ref-qualifiers as in P1207R4 for C++20.
---
 libstdc++-v3/include/bits/stl_iterator.h | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index 784d200d22f..c200f7a9d14 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1166,16 +1166,29 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   explicit _GLIBCXX17_CONSTEXPR
   move_iterator(iterator_type __i)
-  : _M_current(__i) { }
+  : _M_current(std::move(__i)) { }
 
   template<typename _Iter>
_GLIBCXX17_CONSTEXPR
move_iterator(const move_iterator<_Iter>& __i)
: _M_current(__i.base()) { }
 
+#if __cplusplus <= 201703L
   _GLIBCXX17_CONSTEXPR iterator_type
   base() const
   { return _M_current; }
+#else
+  constexpr iterator_type
+  base() const &
+#if __cpp_lib_concepts
+   requires copy_constructible<_Iterator>
+#endif
+  { return _M_current; }
+
+  constexpr iterator_type
+  base() &&
+  { return std::move(_M_current); }
+#endif
 
   _GLIBCXX17_CONSTEXPR reference
   operator*() const
-- 
2.25.0.114.g5b0ca878e0



Re: [PATCH 01/14] Initial create of rs6000-genbif.c.

2020-02-04 Thread Bill Schmidt

On 2/4/20 12:27 PM, Segher Boessenkool wrote:

Hi!

On Mon, Feb 03, 2020 at 08:26:02PM -0600, Bill Schmidt wrote:

Includes header documentation and initial set of include directives.

Please use full sentences in commit messages.



OK.




+/* This program generates built-in function initialization and
+   recognition code for Power targets, based on text files that
+   describe the built-in functions and vector overloads:
+
+ rs6000-bif.def   Table of built-in functions
+ rs6000-overload.def  Table of overload functions

I really don't think using the new acronym "bif" helps; built-in
functions already are often called "builtins" (or "intrinsics", which is
problematic itself).



Until we manage to replace the old methods, we already have 
rs6000-builtin.def, so I am a bit constrained in my choices. Given that 
restriction, what name would you prefer?  I can use rs6000-builtins.def 
(the plural) if you like.


I didn't think I was inventing "bif" as shorthand, but maybe that was an 
LLVM thing...





+ ext Process as a vec_extract function

Please spell out "extract"?  There are too many other words starting with
"ext", some of which you could expect here ("extend", "extension", maybe
even "extra");



OK.




+ ldv Needs special handling for vec_ld semantics
+ stv Needs special handling for vec_st semantics

Call those "vec_ld" and "vec_st", then?  Or should I get used to it, the
names aren't obvious, but cut-and-paste always is ;-)



Hm.  Well, vec_ld is a specific built-in, but this applies to a few more 
than just that one.  But sure, if you want.





+[TARGET_ALTIVEC]

Can this be a C expression?  Most gen* programs just copy similar things
to the generated C code, which can be interesting to debug, but works
perfectly well otherwise.



I rather prefer the way it is.  I do generate C code from this in the 
subsequent patches.  But I like table-driven code to use things that 
look like tables for input. :-)





+  const vector signed char __builtin_altivec_abs_v16qi (vector signed char);
+ABS_V16QI absv16qi2 {abs}
+  const vector signed short __builtin_altivec_abs_v8hi (vector signed short);
+ABS_V8HI absv8hi2 {abs}
+
+   Note the use of indentation, which is recommended but not required.

It does require a single newline at the end of each such line, right?
Does that work out almost always, or do you get very long lines?



Yes, for now I am requiring the newline at the end of each line. I found 
that it does indeed get very long (unreadably long) lines for vector 
signatures.  I forgot to update this documentation when I changed my 
format.  I am now using abbreviations for vector types that match those 
we use often in our test cases ("vuc" for "vector unsigned char", "vsll" 
for "vector signed long long", etc.).  This makes for very nicely 
readable inputs (see patch #2).


The above now becomes

  const vsc __builtin_altivec_abs_v16qi (vsc);
    ABS_V16QI absv16qi2 {abs}
  const vss __builtin_altivec_abs_v8hi (vss);
    ABS_V8HI absv8hi2 {abs}

I will fix the documentation!




+ [<overload-id>, <external-name>, <internal-name>]

Hrm, "internal" suggests "name within the GCC code", but that is not what
it means.  Maybe something like abi-name and builtin-name?



OK, that's reasonable.




+  Blank lines may be used as desired in these files.

Between stanzas and stuff only?  There are places where newlines are
significant and not just whitespace, right?



I don't believe so, although there may be places where I forgot to allow 
a line to be advanced -- that would be a bug, though, so let me know if 
you see any.  Blank lines don't have any inherent meaning in the input 
files.




Great docs, thanks!



Thanks for the review!
Bill




Segher


[PATCH] git: Fix typo in url

2020-02-04 Thread Segher Boessenkool
Committed.


Segher


---
 htdocs/git.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/git.html b/htdocs/git.html
index 66d68eb..7fd22a9 100644
--- a/htdocs/git.html
+++ b/htdocs/git.html
@@ -44,7 +44,7 @@ through, you can replace git:// with 
https://.
 If there is another local repository accessible you can avoid
 re-downloading everything by using --reference, e.g.
 
-git clone --reference original-gcc --dissociate 
ssh://gcc.gnu.org/git.gcc.git new-gcc
+git clone --reference original-gcc --dissociate 
ssh://gcc.gnu.org/git/gcc.git new-gcc
 
 But if you own this other copy, you probably want to use
 separate worktrees instead of multiple clones.
-- 
1.8.3.1



Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-04 Thread Martin Sebor

On 2/4/20 12:15 PM, Richard Biener wrote:

On February 4, 2020 5:30:42 PM GMT+01:00, Jeff Law  wrote:

On Tue, 2020-02-04 at 10:34 +0100, Richard Biener wrote:

On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor  wrote:

PR 93519 reports a false positive -Wrestrict issued for an inlined call
to strcpy that carefully guards against self-copying.  This is caused
by the caller's arguments substituted into the call during inlining and
before dead code elimination.

The attached patch avoids this by removing -Wrestrict from the folder
and deferring folding perfectly overlapping (and so undefined) calls
to strcpy (and mempcpy, but not memcpy) until much later.  Calls to
perfectly overlapping calls to memcpy are still folded early.


Why do we bother to warn at all for this case?  Just DWIM here.
Warnings like this can be emitted from the analyzer?

They potentially can, but the analyzer is and will almost always
certainly be considerably slower.  I would not expect it to be used
nearly as much as the core compiler.

Whether or not a particular warning makes sense in the core compiler or
analyzer would seem to me to depend on whether or not we can reasonably
issue warnings without interprocedural analysis.  double-free
realistically requires interprocedural analysis to be effective.  I'm
not sure Wrestrict really does.




That is, I suggest to simply remove the bogus warning code from folding
(and _not_ fail the folding).

I haven't looked at the patch, but if we can get the warning out of the
folder that's certainly preferable.  And we could investigate deferring
self-copy removal.


I think the issue is as usual, warning for code we'll later remove as dead. 
Warning at folding is almost always premature.


In this instance the code is reachable (or isn't obviously unreachable).
GCC doesn't remove it, but provides benign (and reasonable) semantics
for it(*).  To me, that's one aspect of quality.  Letting the user know
that the code is buggy is another.  I view that as at least as important
as folding the ill-effects away because it makes it possible to fix
the problem so the code works correctly even with compilers that don't
provide these benign semantics.

Martin

[*] This is in contrast to the usual (though arguably inferior) strategy
GCC employs when dealing with undefined calls to built-in functions:
call the library function even when it's certain the call will crash.


[COMMITTED] c++: Fix ({ ... }) array mem-initializer.

2020-02-04 Thread Jason Merrill
Here, we were going down the wrong path in perform_member_init because of
the incorrect parens around the mem-initializer for the array.  And then
cxx_eval_vec_init_1 didn't know what to do with a CONSTRUCTOR as the
initializer.  The latter issue was a straightforward fix, but I also wanted
to fix us silently accepting the parens, which led to factoring out handling
of TREE_LIST and flexarrays.  The latter led to adjusting the expected
behavior on flexary29.C: we should complain about the initializer, but not
complain about a missing initializer.

As I commented on PR 92812, in this process I noticed that we weren't
handling C++20 parenthesized aggregate initialization as a mem-initializer.
So my TREE_LIST handling includes a commented out section that should
probably be part of a future fix for that issue; with it uncommented we
continue to crash on the testcase in C++20 mode, but should instead complain
about the braced-init-list not being a valid initializer for an A.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/86917
* init.c (perform_member_init): Simplify.
* constexpr.c (cx_check_missing_mem_inits): Allow uninitialized
flexarray.
(cxx_eval_vec_init_1): Handle CONSTRUCTOR.
---
 gcc/cp/constexpr.c| 11 -
 gcc/cp/init.c | 48 ++-
 .../g++.dg/cpp0x/constexpr-array23.C  | 24 ++
 gcc/testsuite/g++.dg/cpp0x/desig2.C   |  4 +-
 gcc/testsuite/g++.dg/cpp0x/desig3.C   |  4 +-
 gcc/testsuite/g++.dg/cpp0x/desig4.C   |  4 +-
 gcc/testsuite/g++.dg/ext/array1.C |  2 +-
 gcc/testsuite/g++.dg/ext/flexary29.C  |  2 +-
 gcc/testsuite/g++.dg/init/array28.C   |  2 +-
 9 files changed, 57 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-array23.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 3962763fb21..c35ec5acc97 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -826,7 +826,12 @@ cx_check_missing_mem_inits (tree ctype, tree body, bool 
complain)
return true;
  continue;
}
- ftype = strip_array_types (TREE_TYPE (field));
+ ftype = TREE_TYPE (field);
+ if (!ftype || !TYPE_P (ftype) || !COMPLETE_TYPE_P (ftype))
+   /* A flexible array can't be initialized here, so don't complain
+  that it isn't.  */
+   continue;
+ ftype = strip_array_types (ftype);
  if (type_has_constexpr_default_constructor (ftype))
{
  /* It's OK to skip a member with a trivial constexpr ctor.
@@ -3784,6 +3789,10 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree 
atype, tree init,
   unsigned HOST_WIDE_INT i;
   tsubst_flags_t complain = ctx->quiet ? tf_none : tf_warning_or_error;
 
+  if (init && TREE_CODE (init) == CONSTRUCTOR)
+return cxx_eval_bare_aggregate (ctx, init, lval,
+   non_constant_p, overflow_p);
+
   /* For the default constructor, build up a call to the default
  constructor of the element type.  We only need to handle class types
  here, as for a constructor to be constexpr, all members must be
diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 543d127abcd..625062b60ad 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -801,6 +801,17 @@ perform_member_init (tree member, tree init)
member);
 }
 
+  if (maybe_reject_flexarray_init (member, init))
+return;
+
+  if (init && TREE_CODE (init) == TREE_LIST
+  && (DIRECT_LIST_INIT_P (TREE_VALUE (init))
+ /* FIXME C++20 parenthesized aggregate init (PR 92812).  */
+ || !(/* cxx_dialect >= cxx2a ? CP_AGGREGATE_TYPE_P (type) */
+  /* :  */CLASS_TYPE_P (type
+init = build_x_compound_expr_from_list (init, ELK_MEM_INIT,
+   tf_warning_or_error);
+
   if (init == void_type_node)
 {
   /* mem() means value-initialization.  */
@@ -832,12 +843,7 @@ perform_member_init (tree member, tree init)
 }
   else if (init
   && (TYPE_REF_P (type)
-  /* Pre-digested NSDMI.  */
-  || (((TREE_CODE (init) == CONSTRUCTOR
-&& TREE_TYPE (init) == type)
-   /* { } mem-initializer.  */
-   || (TREE_CODE (init) == TREE_LIST
-   && DIRECT_LIST_INIT_P (TREE_VALUE (init
+  || (TREE_CODE (init) == CONSTRUCTOR
   && (CP_AGGREGATE_TYPE_P (type)
   || is_std_init_list (type)
 {
@@ -847,10 +853,7 @@ perform_member_init (tree member, tree init)
 persists until the constructor exits."  */
   unsigned i; tree t;
   releasing_vec cleanups;
-  if (TREE_CODE (init) == TREE_LIST)
-   init = build_x_compound_expr_from_list (init, ELK_MEM_INIT,
-   

Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-04 Thread Richard Biener
On February 4, 2020 5:30:42 PM GMT+01:00, Jeff Law  wrote:
>On Tue, 2020-02-04 at 10:34 +0100, Richard Biener wrote:
>> On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor  wrote:
>> > PR 93519 reports a false positive -Wrestrict issued for an inlined
>call
>> > to strcpy that carefully guards against self-copying.  This is
>caused
>> > by the caller's arguments substituted into the call during inlining
>and
>> > before dead code elimination.
>> > 
>> > The attached patch avoids this by removing -Wrestrict from the
>folder
>> > and deferring folding perfectly overlapping (and so undefined)
>calls
>> > to strcpy (and mempcpy, but not memcpy) until much later.  Calls to
>> > perfectly overlapping calls to memcpy are still folded early.
>> 
>> Why do we bother to warn at all for this case?  Just DWIM here. 
>Warnings like
>> this can be emitted from the analyzer?
>They potentially can, but the analyzer is and will almost always
>certainly be considerably slower.  I would not expect it to be used
>nearly as much as the core compiler.
>
>Whether or not a particular warning makes sense in the core compiler or
>analyzer would seem to me to depend on whether or not we can reasonably
>issue warnings without interprocedural analysis.  double-free
>realistically requires interprocedural analysis to be effective.  I'm
>not sure Wrestrict really does.
>
>
>> 
>> That is, I suggest to simply remove the bogus warning code from
>folding
>> (and _not_ fail the folding).
>I haven't looked at the patch, but if we can get the warning out of the
>folder that's certainly preferable.  And we could investigate deferring
>self-copy removal.

I think the issue is, as usual, warning for code we'll later remove as dead.
Warning at folding is almost always premature.

Richard. 

>Jeff



Re: [PATCH 01/14] Initial create of rs6000-genbif.c.

2020-02-04 Thread Segher Boessenkool
Hi!

On Mon, Feb 03, 2020 at 08:26:02PM -0600, Bill Schmidt wrote:
> Includes header documentation and initial set of include directives.

Please use full sentences in commit messages.

> +/* This program generates built-in function initialization and
> +   recognition code for Power targets, based on text files that
> +   describe the built-in functions and vector overloads:
> +
> + rs6000-bif.def   Table of built-in functions
> + rs6000-overload.def  Table of overload functions

I really don't think using the new acronym "bif" helps; built-in
functions already are often called "builtins" (or "intrinsics", which is
problematic itself).

> + ext Process as a vec_extract function

Please spell out "extract"?  There are too many other words starting with
"ext", some of which you could expect here ("extend", "extension", maybe
even "extra");

> + ldv Needs special handling for vec_ld semantics
> + stv Needs special handling for vec_st semantics

Call those "vec_ld" and "vec_st", then?  Or should I get used to it, the
names aren't obvious, but cut-and-paste always is ;-)

> +[TARGET_ALTIVEC]

Can this be a C expression?  Most gen* programs just copy similar things
to the generated C code, which can be interesting to debug, but works
perfectly well otherwise.

> +  const vector signed char __builtin_altivec_abs_v16qi (vector signed char);
> +ABS_V16QI absv16qi2 {abs}
> +  const vector signed short __builtin_altivec_abs_v8hi (vector signed short);
> +ABS_V8HI absv8hi2 {abs}
> +
> +   Note the use of indentation, which is recommended but not required.

It does require a single newline at the end of each such line, right?
Does that work out almost always, or do you get very long lines?

> + [, , ]

Hrm, "internal" suggests "name within the GCC code", but that is not what
it means.  Maybe something like abi-name and builtin-name?

> +  Blank lines may be used as desired in these files.

Between stanzas and stuff only?  There are places where newlines are
significant and not just whitespace, right?

Great docs, thanks!


Segher


[PATCH] alias: Fix offset checks involving section anchors [PR92294]

2020-02-04 Thread Richard Sandiford
Richard Sandiford  writes:
> [...]
>> I'm not sure given the issues you've introduced if I could actually
>> fill out the matrix of answers without more underlying information. 
>> ie, when can we get symbols without source level decls, 
>> anchors+interposition issues, etc.
>
> OK.  In that case, I wonder whether it would be safer to have a
> fourth state on top of the three above:
>
>   - known distance apart
>   - independent
>   - known distance apart or independent
>   - don't know
>
> with "don't know" being anything that involves bare symbols?
>
> Richard

How does this look?  Tested on aarch64-linux-gnu and
x86_64-linux-gnu.

Full description from scratch:

memrefs_conflict_p has a slightly odd structure.  It first checks
whether two addresses based on SYMBOL_REFs refer to the same object,
with a tristate result:

  int cmp = compare_base_symbol_refs (x,y);

If the addresses do refer to the same object, we can use offset-based checks:

  /* If both decls are the same, decide by offsets.  */
  if (cmp == 1)
return offset_overlap_p (c, xsize, ysize);

But then, apart from the special case of forced address alignment,
we use an offset-based check even if we don't know whether the
addresses refer to the same object:

  /* Assume a potential overlap for symbolic addresses that went
 through alignment adjustments (i.e., that have negative
 sizes), because we can't know how far they are from each
 other.  */
  if (maybe_lt (xsize, 0) || maybe_lt (ysize, 0))
return -1;
  /* If decls are different or we know by offsets that there is no overlap,
 we win.  */
  if (!cmp || !offset_overlap_p (c, xsize, ysize))
return 0;

This somewhat contradicts:

  /* In general we assume that memory locations pointed to by different labels
 may overlap in undefined ways.  */

at the end of compare_base_symbol_refs.  In other words, we're taking -1
to mean that either (a) the symbols are equal (via aliasing) or (b) the
references access non-overlapping objects.

But even assuming that's true for normal symbols, it doesn't cope
correctly with section anchors.  If a symbol X at ANCHOR+OFFSET
is preemptible, either (a) X = ANCHOR+OFFSET or (b) X and ANCHOR
reference non-overlapping objects.

And an offset-based comparison makes no sense for an anchor symbol
vs. a bare symbol with no decl.  If the bare symbol is allowed to
alias other symbols then it can surely alias any symbol in the
anchor's block, so there are multiple anchor offsets that might
induce an alias.

This patch therefore replaces the current tristate:

  - known equal
  - known independent (two accesses can't alias)
  - equal or independent

with:

  - known distance apart
  - known independent (two accesses can't alias)
  - known distance apart or independent
  - don't know

For safety, the patch puts all bare symbols in the "don't know"
category.  If that turns out to be too conservative, we at least
need that behaviour for combinations involving a bare symbol
and a section anchor.

2020-02-04  Richard Sandiford  

gcc/
PR rtl-optimization/92294
* alias.c (compare_base_symbol_refs): Take an extra parameter
and add the distance between two symbols to it.  Enshrine in
comments that -1 means "either 0 or 1, but we can't tell
which at compile time".  Return -2 for symbols whose
relationship is unknown.
(memrefs_conflict_p): Update call accordingly.
(rtx_equal_for_memref_p): Likewise.  Punt for a return value of -2,
without even checking the offset.  Take the distance between symbols
into account.
---
 gcc/alias.c | 53 ++---
 1 file changed, 38 insertions(+), 15 deletions(-)

diff --git a/gcc/alias.c b/gcc/alias.c
index 3794f9b6a9e..c8b53df0b48 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -159,7 +159,8 @@ static tree decl_for_component_ref (tree);
 static int write_dependence_p (const_rtx,
   const_rtx, machine_mode, rtx,
   bool, bool, bool);
-static int compare_base_symbol_refs (const_rtx, const_rtx);
+static int compare_base_symbol_refs (const_rtx, const_rtx,
+HOST_WIDE_INT * = NULL);
 
 static void memory_modified_1 (rtx, const_rtx, void *);
 
@@ -1806,7 +1807,11 @@ rtx_equal_for_memref_p (const_rtx x, const_rtx y)
   return label_ref_label (x) == label_ref_label (y);
 
 case SYMBOL_REF:
-  return compare_base_symbol_refs (x, y) == 1;
+  {
+   HOST_WIDE_INT distance = 0;
+   return (compare_base_symbol_refs (x, y, &distance) == 1
+   && distance == 0);
+  }
 
 case ENTRY_VALUE:
   /* This is magic, don't go through canonicalization et al.  */
@@ -2138,10 +2143,24 @@ compare_base_decls (tree base1, tree base2)
   return ret;
 }
 
-/* Same as compare_base_decls but for SYMBOL_REF.  */
+/* Compare SYMBOL_REFs X_BASE and 

Re: [PATCH 00/14] rs6000: Begin replacing built-in support

2020-02-04 Thread Segher Boessenkool
Hi!

On Mon, Feb 03, 2020 at 08:26:01PM -0600, Bill Schmidt wrote:
> The current built-in support in the rs6000 back end requires at least
> a master's degree in spelunking to comprehend.  It's full of cruft,
> redundancy, and unused bits of code, and long overdue for a
> replacement.  This is the first part of my project to do that.

Woohoo :-)

> My intent is to make adding new built-in functions as simple as adding
> a few lines to a couple of files, and automatically generating as much
> of the initialization, overload resolution, and expansion logic as
> possible.  This patch series establishes the format of the input files
> and creates a new program (rs6000-genbif) to:

Let's call it rs6000-gen-builtins or similar.  Not as cryptic.

> Note that none of the code in this patch set affects GCC's operation
> at all, with the exception of patch #14.  Patch 14 causes the program
> rs6000-genbif to be built and executed, producing the output files,
> and linking rs6000-bif.o into the executable.  However, none of the
> code in rs6000-bif.o is called, so the only effect is to make the gcc
> executable larger.

If it builds at all ;-)

> I'd like to consider at least patches 1-13 as stage 4 material for the
> current release.  I'd prefer to also include patch 14 for convenience,
> but I understand if that's not deemed acceptable.
> 
> I've attempted to break this up into logical pieces for easy
> consumption, but some of the patches may still be a bit large.  Please
> let me know if you'd like me to break any of them up.

I will.  "Large" isn't a problem if it is a lot of the same thing.  If
it is two or more things, having them in separate patches is easier to
review; if there is just one case that is different, put it in a separate
patch if that can be done; otherwise, please point it out in the patch
commit message.

>   Initial create of rs6000-genbif.c.

Subjects do not end in a dot (but do start with a capital).

>   Add stubs for input files.  These will grow much larger.

The second half of this does not belong in the title, but in the body.


Segher


Re: [PATCH 2/4] [ARC] Use TARGET_INSN_COST.

2020-02-04 Thread Claudiu Zissulescu Ianculescu
> My only worry would be asking for the length early in the RTL pipeline
> may not be as accurate, but it's supposed to work, so if you're
> comfortable with the end results, then OK.
>
Indeed, the length is not accurate, but the results seem slightly
better than using COST_RTX.  Using INSN_COSTS seems to me a more
manageable way of controlling what the combiner does.
Anyhow, for ARC the instruction size is only accurate quite late in the
compilation process, as it needs register and immediate value info :(

Thank you for your review,
Claudiu


Re: [GCC][BUG][Aarch64][ARM] (PR93300) Fix ICE due to BFmode placement in GET_MODES_WIDER chain.

2020-02-04 Thread Stam Markianos-Wright



On 2/4/20 12:02 PM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

On 1/31/20 1:45 PM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

On 1/30/20 10:01 AM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

On 1/29/20 12:42 PM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

Hi all,

This fixes:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300

Genmodes.c was generating the "wider_mode" chain as follows:

HF -> BF -> SF -> DF -> TF -> VOID

This caused issues in some rare cases where conversion between modes
was needed, such as the above PR93300 where BFmode was being picked up
as a valid mode for:

optabs.c:prepare_float_lib_cmp

which then led to the ICE at expr.c:convert_mode_scalar.


Hi Richard,



Can you go into more details about why this chain was a problem?
Naively, it's the one I'd have expected: HF should certainly have
priority over BF,


Is that because functionally it looks like genmodes puts things in reverse
alphabetical order if all else is equal? (If I'm reading the comment about
MODE_RANDOM, MODE_CC correctly)


but BF coming before SF doesn't seem unusual in itself.

I'm not saying the patch is wrong.  It just wasn't clear why it was
right either.


Yes, I see what you mean. I'll go through my thought process here:

In investigating the ICE PR93300 I found that the diversion from pre-bf16
behaviour was specifically at `optabs.c:prepare_float_lib_cmp`, where a
`FOR_EACH_MODE_FROM (mode, orig_mode)` is used to then go off and generate
library calls for conversions.

This was then being caught further down by the gcc_assert at expr.c:325 where
GET_MODE_PRECISION (from_mode) was equal to GET_MODE_PRECISION (to_mode) because
it was trying to emit a HF->BF conversion libcall as `bl __extendhfbf2` (which
is what happened if I removed the gcc_assert at expr.c:325)

With BFmode being a target-defined mode, I didn't want to add something like `if
(mode != BFmode)` to specifically exclude BFmode from being selected for this.
(and there's nothing different between HFmode and BFmode here to allow me to
make this distinction?)

Also I couldn't find anywhere where the target back-end is not consulted for a
"is this supported: yes/no" between the `FOR_EACH_MODE_FROM` loop and the
libcall being created later on as __extendhfbf2.


Yeah, prepare_float_lib_cmp just checks for libfuncs rather than
calling target hooks directly.  The libfuncs themselves are under
the control of the target though.

By default we assume all float modes have associated libfuncs.
It's then up to the target to remove functions that don't exist
(or redirect them to other functions).  So I think we need to remove
BFmode libfuncs in arm_init_libfuncs in the same way as we currently
do for HFmode.

I guess we should also nullify the conversion libfuncs for BFmode,
not just the arithmetic and comparison ones.


Ahhh now this works, thank you for the suggestion!

I was aware of arm_init_libfuncs, but I had not realised that returning NULL
would have the desired effect for us, in this case. So I have essentially rolled
back the whole previous version of the patch and done this in the new diff.
It seems to have fixed the ICE and I am currently in the process of regression
testing!


LGTM behaviourally, just a couple of requests about how it's written:



Thank you!
Stam



Thanks,
Richard


Finally, because we really don't want __bf16 to be in the same "conversion rank"
as standard float modes for things like automatic promotion, this seemed like a
reasonable solution to that problem :)

Let me know of your thoughts!

Cheers,
Stam


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c47fc232f39..18055d4a75e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2643,6 +2643,30 @@ arm_init_libfuncs (void)
   default:
 break;
   }
+
+  /* For all possible libcalls in BFmode, return NULL.  */
+  /* Conversions.  */
+  set_conv_libfunc (trunc_optab, BFmode, HFmode, (NULL));
+  set_conv_libfunc (sext_optab, HFmode, BFmode, (NULL));
+  set_conv_libfunc (trunc_optab, BFmode, SFmode, (NULL));
+  set_conv_libfunc (sext_optab, SFmode, BFmode, (NULL));
+  set_conv_libfunc (trunc_optab, BFmode, DFmode, (NULL));
+  set_conv_libfunc (sext_optab, DFmode, BFmode, (NULL));


It might be slightly safer to do:

FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_FLOAT)

to iterate over all float modes on the non-BF side.


Done :)



+  /* Arithmetic.  */
+  set_optab_libfunc (add_optab, BFmode, NULL);
+  set_optab_libfunc (sdiv_optab, BFmode, NULL);
+  set_optab_libfunc (smul_optab, BFmode, NULL);
+  set_optab_libfunc (neg_optab, BFmode, NULL);
+  set_optab_libfunc (sub_optab, BFmode, NULL);
+
+  /* Comparisons.  */
+  set_optab_libfunc (eq_optab, BFmode, NULL);
+  set_optab_libfunc (ne_optab, BFmode, NULL);
+  set_optab_libfunc (lt_optab, BFmode, NULL);
+  set_optab_libfunc (le_optab, BFmode, NULL);
+  set_optab_libfunc (ge_optab, BFmode, NULL);
+  set_optab_libfunc (gt_optab, 

Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-04 Thread Jeff Law
On Tue, 2020-02-04 at 10:34 +0100, Richard Biener wrote:
> On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor  wrote:
> > PR 93519 reports a false positive -Wrestrict issued for an inlined call
> > to strcpy that carefully guards against self-copying.  This is caused
> > by the caller's arguments substituted into the call during inlining and
> > before dead code elimination.
> > 
> > The attached patch avoids this by removing -Wrestrict from the folder
> > and deferring folding perfectly overlapping (and so undefined) calls
> > to strcpy (and mempcpy, but not memcpy) until much later.  Calls to
> > perfectly overlapping calls to memcpy are still folded early.
> 
> Why do we bother to warn at all for this case?  Just DWIM here.  Warnings like
> this can be emitted from the analyzer?
They potentially can, but the analyzer is, and will almost certainly
remain, considerably slower.  I would not expect it to be used
nearly as much as the core compiler.

Whether or not a particular warning makes sense in the core compiler or
analyzer would seem to me to depend on whether or not we can reasonably
issue warnings without interprocedural analysis.  double-free
realistically requires interprocedural analysis to be effective.  I'm
not sure Wrestrict really does.


> 
> That is, I suggest to simply remove the bogus warning code from folding
> (and _not_ fail the folding).
I haven't looked at the patch, but if we can get the warning out of the
folder that's certainly preferable.  And we could investigate deferring
self-copy removal.

Jeff



[committed] analyzer: fix testsuite assumption that sizeof(int) > 2

2020-02-04 Thread David Malcolm
Fix some failures on xstormy16-elf:
  gcc.dg/analyzer/data-model-1.c  (test for warnings, line 595)
  gcc.dg/analyzer/data-model-1.c  (test for warnings, line 642)
  gcc.dg/analyzer/data-model-1.c  (test for warnings, line 690)
  gcc.dg/analyzer/data-model-1.c  (test for warnings, line 738)

due to:

warning: overflow in conversion from ‘long int’ to ‘int’ changes
  value from ‘100024’ to ‘-31048’ [-Woverflow]
20 |   p[0].x = 100024;
   |^~


Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

Manually verified on a stage1 --target=xstormy16-elf without asm
support; appears to now be correct there.

Pushed to master as c422cec54a5495f6f42b80f35a11c5508fe8eec3.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/data-model-1.c (struct coord): Convert fields
from int to long.
---
 gcc/testsuite/gcc.dg/analyzer/data-model-1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/analyzer/data-model-1.c b/gcc/testsuite/gcc.dg/analyzer/data-model-1.c
index 3f925941f87..d75b9fa1db3 100644
--- a/gcc/testsuite/gcc.dg/analyzer/data-model-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/data-model-1.c
@@ -171,8 +171,8 @@ int test_12c (void)
 
 struct coord
 {
-  int x;
-  int y;
+  long x;
+  long y;
 };
 
 int test_12d (struct coord c)
-- 
2.21.0



[Patch][Testsuite][libgomp] – Fix check_effective_target_offload_target_nvptx for remote execution

2020-02-04 Thread Tobias Burnus

DejaGNU's lib/target.exp proc
   proc default_target_compile {source destfile type options}
does the following.

When running locally, $source is simply used
as an argument.  However, when run remotely, each element of $source is
uploaded to the remote host, and otherwise skipped.  That's fine if the
argument is an actual file, but not if it is a flag.

Hence, flags should be passed as $options, not as $source.
Quoting from DejaGNU's default_target_compile:

set sources ""
if {[isremote host]} {
foreach x $source {
set file [remote_download host $x]
if { $file eq "" } {
warning "Unable to download $x to host."
return "Unable to download $x to host."
} else {
append sources " $file"
}
}
} else {
set sources $source
}

 * * *

FIRST, I think this affects the following calls, but I have not
checked; I assume that moving the flags from the first to the last
argument should work and fix the "isremote" issue.

gcc/testsuite/gcc.target/arm/multilib.exp:set gcc_output [${tool}_target_compile "--print-multi-directory $gcc_opts" "" "none" ""]
gcc/testsuite/lib/target-supports.exp:set gcc_output [${tool}_target_compile "-v" "" "none" ""]
gcc/testsuite/lib/target-supports.exp:  set gcc_ld [lindex [${tool}_target_compile "-print-prog-name=ld" "" "none" ""] 0]
gcc/testsuite/lib/target-supports.exp:  set gcc_as [lindex [${tool}_target_compile "-print-prog-name=as" "" "none" ""] 0]
gcc/testsuite/lib/target-supports.exp:  set gcc_ld [lindex [${tool}_target_compile "-print-prog-name=ld" "" "none" ""] 0]

TODO: One should probably change this.


SECONDLY: I hit a very similar issue in libgomp, which I actually did debug.

In check_effective_target_offload_target_nvptx, one has something similar, 
namely:
  set gcc_output [libgomp_target_compile "-v $options" "" "none" ""]
This currently tries (without success) to upload the flags to the remote host
and then fails because the required "-v" flag is lost (i.e. there is no input).

However, using "-v" as the options argument does not simply work either, as
seemingly only additional_flags= arguments are passed on.  Additionally, if one
wants to pass on more than just the first argument, curly braces are needed.

PATCH: The attached patch does this, and I have successfully tested it with
both local and remote runs.  (RUNTESTFLAGS="-v -v -v" helped a lot in studying
the behavior in both cases.)

OK for the trunk?

Cheers,

Tobias

	libgomp/
	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_target_nvptx): Pass flags as 'options'
	and not as 'source' argument to libgomp_target_compile.

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 7e94527c7ca..cb7757b6a91 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -346,11 +346,11 @@ proc check_effective_target_offload_target_nvptx { } {
 # files; in particular, '-foffload', 'libgomp.oacc-*/*.exp'), which don't
 # get passed on to 'check_effective_target_*' functions.  (Not caching the
 # result due to that.)
-set options [current_compiler_flags]
+set options [concat "{additional_flags=-v" [current_compiler_flags] "}"]
 # Instead of inspecting command-line options, look what the compiler driver
 # decides.  This is somewhat modelled after
 # 'gcc/testsuite/lib/target-supports.exp:check_configured_with'.
-set gcc_output [libgomp_target_compile "-v $options" "" "none" ""]
+set gcc_output [libgomp_target_compile "" "" "none" $options]
 if [regexp "(?n)^OFFLOAD_TARGET_NAMES=(.*)" $gcc_output dummy offload_targets] {
 	verbose "compiling for offload targets: $offload_targets"
 	return [string match "*:nvptx*:*" ":$offload_targets:"]


Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-04 Thread Martin Sebor

On 2/4/20 2:34 AM, Richard Biener wrote:

On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor  wrote:


PR 93519 reports a false positive -Wrestrict issued for an inlined call
to strcpy that carefully guards against self-copying.  This is caused
by the caller's arguments substituted into the call during inlining and
before dead code elimination.

The attached patch avoids this by removing -Wrestrict from the folder
and deferring folding perfectly overlapping (and so undefined) calls
to strcpy (and mempcpy, but not memcpy) until much later.  Calls to
perfectly overlapping calls to memcpy are still folded early.


Why do we bother to warn at all for this case?  Just DWIM here.  Warnings like
this can be emitted from the analyzer?

That is, I suggest to simply remove the bogus warning code from folding
(and _not_ fail the folding).


The overlapping copy is ultimately folded into a no-op but the warning
points out that code that relies on it is undefined.  The code should
be fixed.  This is in line with one of the strategies we discussed and
(at least those of us in the room) agreed on for undefined behavior
back in Manchester: try to do the least harm but warn.

Martin


Re: [PATCH] Do not load body for alias symbols.

2020-02-04 Thread Jan Hubicka
> Hi.
> 
> The patch is about not loading LTO bodies for symbols
> that are only aliases.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/lto/ChangeLog:
> 
> 2020-02-04  Martin Liska  
> 
>   PR lto/93489
>   * lto-dump.c (dump_list_functions): Do not
>   load body for aliases.
>   (dump_body): Likewise here.
OK,
thanks!
Honza


[PATCH] Do not load body for alias symbols.

2020-02-04 Thread Martin Liška

Hi.

The patch is about not loading LTO bodies for symbols
that are only aliases.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/lto/ChangeLog:

2020-02-04  Martin Liska  

PR lto/93489
* lto-dump.c (dump_list_functions): Do not
load body for aliases.
(dump_body): Likewise here.
---
 gcc/lto/lto-dump.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)


diff --git a/gcc/lto/lto-dump.c b/gcc/lto/lto-dump.c
index 2983c22fdd5..96e26d9e81c 100644
--- a/gcc/lto/lto-dump.c
+++ b/gcc/lto/lto-dump.c
@@ -122,7 +122,7 @@ public:
 cgraph_node *cnode = dyn_cast (node);
 gcc_assert (cnode);
 
-return (cnode->definition)
+return (cnode->definition && !cnode->alias)
  ? n_basic_blocks_for_fn (DECL_STRUCT_FUNCTION (cnode->decl))
  : 0;
   }
@@ -157,10 +157,10 @@ void dump_list_functions (void)
   cgraph_node *cnode;
   FOR_EACH_FUNCTION (cnode)
   {
-if (cnode->definition)
+if (cnode->definition && !cnode->alias)
   cnode->get_untransformed_body ();
 symbol_entry *e = new function_entry (cnode);
-if (!flag_lto_dump_defined || cnode->definition)
+if (!flag_lto_dump_defined || (cnode->definition && !cnode->alias))
   v.safe_push (e);
   }
 
@@ -260,13 +260,15 @@ void dump_body ()
   }
   cgraph_node *cnode;
   FOR_EACH_FUNCTION (cnode)
-  if (cnode->definition && !strcmp (cnode->name (), flag_dump_body))
-  {
-printf ("Gimple Body of Function: %s\n", cnode->name ());
-cnode->get_untransformed_body ();
-debug_function (cnode->decl, flags);
-flag = 1;
-  }
+if (cnode->definition
+	&& !cnode->alias
+	&& !strcmp (cnode->name (), flag_dump_body))
+  {
+	printf ("Gimple Body of Function: %s\n", cnode->name ());
+	cnode->get_untransformed_body ();
+	debug_function (cnode->decl, flags);
+	flag = 1;
+  }
   if (!flag)
 error_at (input_location, "Function not found.");
   return;



Re: [PATCH][DOCUMENTATION] Document ASLR for Precompiled Headers.

2020-02-04 Thread Martin Liška

On 2/4/20 2:59 PM, Jakub Jelinek wrote:

On Tue, Feb 04, 2020 at 02:55:31PM +0100, Richard Biener wrote:

On Tue, Feb 4, 2020 at 2:45 PM Martin Liška  wrote:


Hi.

The patch is about enhancement of the documentation, where
we do not support ASLR for Precompiled Headers.

Ready for trunk?


I think we support ASLR just fine?


We do support ASLR just fine, just the PCH from it will not be bitwise
reproducible.  So the doc patch is misplaced (putting it between the sentence
talking about options and the list of those options is weird) and should make
it clear that ASLR may result in non-reproducible PCH files.


You are right. I fixed the placing of the hunk and relaxed the wording.

Martin




gcc/ChangeLog:

2020-02-04  Martin Liska  

 PR c++/92717
 * doc/invoke.texi: Document that one should
 not combine ASLR and -fpch.
---
   gcc/doc/invoke.texi | 2 ++
   1 file changed, 2 insertions(+)




Jakub



>From f0b049bb709f3f7a1755abff0193dc40159c1d95 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 4 Feb 2020 14:43:35 +0100
Subject: [PATCH] Document ASLR for Precompiled Headers.

gcc/ChangeLog:

2020-02-04  Martin Liska  

	PR c++/92717
	* doc/invoke.texi: Document that one should
	not combine ASLR and -fpch.
---
 gcc/doc/invoke.texi | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4dec0c8326b..350d193ad22 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -31106,6 +31106,9 @@ precompiled header.  The following are known to be safe:
 -fsched-verbose=@var{number}  -fschedule-insns  -fvisibility= @gol
 -pedantic-errors}
 
+@item Address space layout randomization (ASLR) can lead
+to non-reproducible results.  We recommend disabling it.
+
 @end itemize
 
 For all of these except the last, the compiler automatically
-- 
2.25.0



[PATCH] x86: Add UNSPECV_PATCHABLE_AREA

2020-02-04 Thread H.J. Lu
On Mon, Feb 03, 2020 at 06:10:49PM -0800, H.J. Lu wrote:
> On Mon, Feb 3, 2020 at 4:02 PM H.J. Lu  wrote:
> >
> > On Mon, Feb 3, 2020 at 10:35 AM H.J. Lu  wrote:
> > >
> > > Define TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY to make sure that the
> > > ENDBR are emitted before the patch area.  When -mfentry -pg is also used
> > > together, there should be no ENDBR before "call __fentry__".
> > >
> > > OK for master if there is no regression?
> > >
> > > Thanks.
> > >
> > > H.J.
> > > --
> > > gcc/
> > >
> > > PR target/93492
> > > * config/i386/i386.c (ix86_asm_output_function_label): Set
> > > function_label_emitted to true.
> > > (ix86_print_patchable_function_entry): New function.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/93492
> > > * gcc.target/i386/pr93492-1.c: New test.
> > > * gcc.target/i386/pr93492-2.c: Likewise.
> > > * gcc.target/i386/pr93492-3.c: Likewise.
> > >
> >
> > This version works with both .cfi_startproc and DWARF debug info.
> >
> 
> -g -fpatchable-function-entry=1  doesn't work together:
> 

Here is a different approach with UNSPECV_PATCHABLE_AREA.


H.J.
---
Currently the patchable area is placed in the wrong spot: immediately
after the function label, before both .cfi_startproc and ENDBR.  This patch
adds UNSPECV_PATCHABLE_AREA as a pseudo patchable-area instruction and
changes the ENDBR insertion pass to also insert a dummy patchable area.
TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY is defined to provide the
actual size of the patchable area.  It places the patchable area immediately
after .cfi_startproc and ENDBR.
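
For illustration, the intended change in the emitted prologue layout looks roughly like this (a hand-written sketch for a single-NOP patchable area on x86-64, not actual compiler output):

```
Before this patch (wrong):        After this patch:
foo:                              foo:
        nop     # patch area              .cfi_startproc
        .cfi_startproc                    endbr64
        endbr64                           nop     # patch area
        ...                               ...
```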

gcc/

PR target/93492
* config/i386/i386-features.c (rest_of_insert_endbranch):
Renamed to ...
(rest_of_insert_endbr_and_patchable_area): Change return type
to void.  Don't call timevar_push nor timevar_pop.  Replace
endbr_queued_at_entrance with insn_queued_at_entrance.  Insert
UNSPECV_PATCHABLE_AREA for patchable area.
(pass_data_insert_endbranch): Renamed to ...
(pass_data_insert_endbr_and_patchable_area): This.  Change
pass name to endbr_and_patchable_area.
(pass_insert_endbranch): Renamed to ...
(pass_insert_endbr_and_patchable_area): This.  Add need_endbr
and need_patchable_area.
(pass_insert_endbr_and_patchable_area::gate): Set and check
need_endbr/need_patchable_area.
(pass_insert_endbr_and_patchable_area::execute): Call
timevar_push and timevar_pop.  Pass need_endbr and
need_patchable_area to rest_of_insert_endbr_and_patchable_area.
(make_pass_insert_endbranch): Renamed to ...
(make_pass_insert_endbr_and_patchable_area): This.
* config/i386/i386-passes.def: Replace pass_insert_endbranch
with pass_insert_endbr_and_patchable_area.
* config/i386/i386-protos.h (ix86_output_patchable_area): New.
(make_pass_insert_endbranch): Renamed to ...
(make_pass_insert_endbr_and_patchable_area): This.
* config/i386/i386.c (ix86_asm_output_function_label): Set
function_label_emitted to true.
(ix86_print_patchable_function_entry): New function.
(ix86_output_patchable_area): Likewise.
(x86_function_profiler): Replace endbr_queued_at_entrance with
insn_queued_at_entrance.  Generate ENDBR only for TYPE_ENDBR.
Call ix86_output_patchable_area to generate patchable area.
(TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY): New.
* i386.h (queued_insn_type): New.
(machine_function): Add patch_area_size, record_patch_area and
function_label_emitted.  Replace endbr_queued_at_entrance with
insn_queued_at_entrance.
* config/i386/i386.md (UNSPECV_PATCHABLE_AREA): New.
(patchable_area): New.

gcc/testsuite/

PR target/93492
* gcc.target/i386/pr93492-1.c: New test.
* gcc.target/i386/pr93492-2.c: Likewise.
* gcc.target/i386/pr93492-3.c: Likewise.
* gcc.target/i386/pr93492-4.c: Likewise.
* gcc.target/i386/pr93492-5.c: Likewise.
---
 gcc/config/i386/i386-features.c   | 139 ++
 gcc/config/i386/i386-passes.def   |   2 +-
 gcc/config/i386/i386-protos.h |   5 +-
 gcc/config/i386/i386.c|  90 +-
 gcc/config/i386/i386.h|  20 +++-
 gcc/config/i386/i386.md   |  14 +++
 gcc/testsuite/gcc.target/i386/pr93492-1.c |  73 
 gcc/testsuite/gcc.target/i386/pr93492-2.c |  12 ++
 gcc/testsuite/gcc.target/i386/pr93492-3.c |  13 ++
 gcc/testsuite/gcc.target/i386/pr93492-4.c |  11 ++
 gcc/testsuite/gcc.target/i386/pr93492-5.c |  12 ++
 11 files changed, 337 insertions(+), 54 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr93492-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr93492-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr93492-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr93492-4.c
 create mode 100644 

Re: [PATCH] adjust object size computation for union accesses and PHIs (PR 92765)

2020-02-04 Thread Richard Biener
On Mon, Feb 3, 2020 at 7:45 PM Jeff Law  wrote:
>
> On Fri, 2020-01-31 at 12:04 -0700, Martin Sebor wrote:
> > Attached is a reworked patch since the first one didn't go far
> > enough to solve the major problems.  The new solution relies on
> > get_range_strlen_dynamic the same way as the sprintf optimization,
> > and does away with the determine_min_objsize function and calling
> > compute_builtin_object_size.
> >
> > To minimize testsuite fallout I extended get_range_strlen to handle
> > a couple more kinds of expressions (*), but I still had to xfail and
> > disable a few tests that were relying on being able to use the type
> > of the destination object as the upper bound on the string length.
> >
> > Tested on x86_64-linux.
> >
> > Martin
> >
> > [*] With all the issues around MEM_REFs and types this change needs
> > extra scrutiny.  I'm still not sure I fully understand what can and
> > what cannot be safely relied on at this level.
> >
> > On 1/15/20 6:18 AM, Martin Sebor wrote:
> > > The strcmp optimization newly introduced in GCC 10 relies on
> > > the size of the smallest referenced array object to determine
> > > whether the function can return zero.  When the size of
> > > the object is smaller than the length of the other string
> > > argument the optimization folds the equality to false.
> > >
> > > The bug report has identified a couple of problems here:
> > > 1) when the access to the array object is via a pointer to
> > > a (possibly indirect) member of a union, in GIMPLE the pointer
> > > may actually point to a different member than the one in
> > > the original source code.  Thus the size of the array may
> > > appear to be smaller than in the source code which can then
> > > result in the optimization being invalid.
> > > 2) when the pointer in the access may point to two or more
> > > arrays of different size (i.e., it's the result of a PHI),
> > > assuming it points to the smallest of them can also lead
> > > to an incorrect result when the optimization is applied.
> > >
> > > The attached patch adjusts the optimization to 1) avoid making
> > > any assumptions about the sizes of objects accessed via union
> > > types, and 2) use the size of the largest object in PHI nodes.
> > >
> > > Tested on x86_64-linux.
> > >
> > > Martin
> >
> >
> > PR tree-optimization/92765 - wrong code for strcmp of a union member
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/92765
> > * gimple-fold.c (get_range_strlen_tree): Handle MEM_REF and 
> > PARM_DECL.
> > * tree-ssa-strlen.c (compute_string_length): Remove.
> > (determine_min_objsize): Remove.
> > (get_len_or_size): Add an argument.  Call get_range_strlen_dynamic.
> > Avoid using type size as the upper bound on string length.
> > (handle_builtin_string_cmp): Add an argument.  Adjust.
> > (strlen_check_and_optimize_call): Pass additional argument to
> > handle_builtin_string_cmp.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/92765
> > * g++.dg/tree-ssa/strlenopt-1.C: New test.
> > * g++.dg/tree-ssa/strlenopt-2.C: New test.
> > * gcc.dg/Warray-bounds-58.c: New test.
> > * gcc.dg/Wrestrict-20.c: Avoid a valid -Wformat-overflow.
> > * gcc.dg/Wstring-compare.c: Xfail a test.
> > * gcc.dg/strcmpopt_2.c: Disable tests.
> > * gcc.dg/strcmpopt_4.c: Adjust tests.
> > * gcc.dg/strcmpopt_10.c: New test.
> > * gcc.dg/strlenopt-69.c: Disable tests.
> > * gcc.dg/strlenopt-92.c: New test.
> > * gcc.dg/strlenopt-93.c: New test.
> > * gcc.dg/strlenopt.h: Declare calloc.
> > * gcc.dg/tree-ssa/pr92056.c: Xfail tests until pr93518 is resolved.
> > * gcc.dg/tree-ssa/builtin-sprintf-warn-23.c: Correct test (pr93517).
> >
> > diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> > index ed225922269..d70ac67e1ca 100644
> > --- a/gcc/gimple-fold.c
> > +++ b/gcc/gimple-fold.c
> > @@ -1280,7 +1280,7 @@ get_range_strlen_tree (tree arg, bitmap *visited, 
> > strlen_range_kind rkind,
> >c_strlen_data *pdata, unsigned eltsize)
> >  {
> >gcc_assert (TREE_CODE (arg) != SSA_NAME);
> > -
> > +
> >/* The length computed by this invocation of the function.  */
> >tree val = NULL_TREE;
> >
> > @@ -1422,7 +1422,42 @@ get_range_strlen_tree (tree arg, bitmap *visited, 
> > strlen_range_kind rkind,
> >  type about the length here.  */
> >   tight_bound = true;
> > }
> > -  else if (VAR_P (arg))
> > +  else if (TREE_CODE (arg) == MEM_REF
> > +  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE
> > +  && TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) == INTEGER_TYPE
> > +  && TREE_CODE (TREE_OPERAND (arg, 0)) == ADDR_EXPR)
> > +   {
> > + /* Handle a MEM_REF into a DECL accessing an array of integers,
> > +being conservative about references to extern structures with
> 

Re: [RFC PATCH] __builtin_escape/__builtin_bless

2020-02-04 Thread Uecker, Martin


Am Dienstag, den 04.02.2020, 15:01 +0100 schrieb Richard Biener:
> On Sat, Feb 1, 2020 at 1:04 AM Uecker, Martin
>  wrote:
> > 
> > 
> > 
> > Hi Richard and Joseph,
> > 
> > for discussion: here is my simple patch
> > for __builtin_escape/__builtin_bless.
> > 
> > Maybe it does something stupid.
> 
> I'd use an internal function once the frontend inserts the calls,
> we shouldn't expose those to the user.

Ok. I thought it might be useful for testing...

> I expect the builtins to show as quite invasive with their
> data dependence blocking many optimizations. 

Yes, I am wondering about the impact of this on
optimization and whether it could be made
more lightweight somehow.

Do you think it is worth investigating
the use of some other kind of marker?

> Any reason that __builtin_escape is not marked as const? 

Conceptually, letting something escape (or marking
it "exposed" according to the formal semantics
in the PVNI proposal) is a side-effect.

Practically, when I marked it const, the problem
was that it was getting optimized away completely.

> Do you try to prevent code motion of it so flow-sensitive 
> "escape" analysis can be used? 

Unless I am missing something important, there is
no flow-sensitive analysis done in GCC.

> But nothing prevents hoisting of the bless call
> across the escape one.

With flow-sensitive analysis, this would be
incorrect.

Martin


> > 
> > diff --git a/gcc/builtins.c b/gcc/builtins.c
> > index e4a8694054e..d0046135213 100644
> > --- a/gcc/builtins.c
> > +++ b/gcc/builtins.c
> > @@ -6014,6 +6014,31 @@ expand_builtin_unreachable (void)
> >    emit_barrier ();
> >  }
> > 
> > +
> > +static rtx
> > +expand_builtin_escape (tree exp, rtx target)
> > +{
> > +  if (call_expr_nargs (exp) < 1)
> > +return const0_rtx;
> > +
> > +  target = expand_expr (CALL_EXPR_ARG (exp, 0), target, Pmode,
> > +   EXPAND_NORMAL);
> > +
> > +  return target;
> > +}
> > +
> > +static rtx
> > +expand_builtin_bless (tree exp, rtx target)
> > +{
> > +  if (call_expr_nargs (exp) < 1)
> > +return const0_rtx;
> > +
> > +  target = expand_expr (CALL_EXPR_ARG (exp, 0), target, Pmode,
> > +   EXPAND_NORMAL);
> > +
> > +  return target;
> > +}
> > +
> >  /* Expand EXP, a call to fabs, fabsf or fabsl.
> > Return NULL_RTX if a normal call should be emitted rather than
> > expanding
> > the function inline.  If convenient, the result should be
> > placed
> > @@ -8304,6 +8329,12 @@ expand_builtin (tree exp, rtx target, rtx
> > subtarget, machine_mode mode,
> >    expand_builtin_unreachable ();
> >    return const0_rtx;
> > 
> > +case BUILT_IN_ESCAPE:
> > +  return expand_builtin_escape (exp, target);
> > +
> > +case BUILT_IN_BLESS:
> > +  return expand_builtin_bless (exp, target);
> > +
> >  CASE_FLT_FN (BUILT_IN_SIGNBIT):
> >  case BUILT_IN_SIGNBITD32:
> >  case BUILT_IN_SIGNBITD64:
> > diff --git a/gcc/builtins.def b/gcc/builtins.def
> > index fa8b0641ab1..9264a0fdaab 100644
> > --- a/gcc/builtins.def
> > +++ b/gcc/builtins.def
> > @@ -865,6 +865,8 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_EXECVP,
> > "execvp", BT_FN_INT_CONST_STRING_PT
> >  DEF_EXT_LIB_BUILTIN(BUILT_IN_EXECVE, "execve",
> > BT_FN_INT_CONST_STRING_PTR_CONST_STRING_PTR_CONST_STRING,
> > ATTR_NOTHROW_LIST)
> >  DEF_LIB_BUILTIN(BUILT_IN_EXIT, "exit", BT_FN_VOID_INT,
> > ATTR_NORETURN_NOTHROW_LIST)
> >  DEF_GCC_BUILTIN(BUILT_IN_EXPECT, "expect",
> > BT_FN_LONG_LONG_LONG, ATTR_CONST_NOTHROW_LEAF_LIST)
> > +DEF_GCC_BUILTIN(BUILT_IN_ESCAPE, "escape", BT_FN_PTR_PTR,
> > ATTR_NOTHROW_LEAF_LIST)
> > +DEF_GCC_BUILTIN(BUILT_IN_BLESS, "bless", BT_FN_PTR_PTR,
> > ATTR_CONST_NOTHROW_LEAF_LIST)
> >  DEF_GCC_BUILTIN(BUILT_IN_EXPECT_WITH_PROBABILITY,
> > "expect_with_probability", BT_FN_LONG_LONG_LONG_DOUBLE,
> > ATTR_CONST_NOTHROW_LEAF_LIST)
> >  DEF_GCC_BUILTIN(BUILT_IN_ASSUME_ALIGNED, "assume_aligned",
> > BT_FN_PTR_CONST_PTR_SIZE_VAR, ATTR_CONST_NOTHROW_LEAF_LIST)
> >  DEF_GCC_BUILTIN(BUILT_IN_EXTEND_POINTER, "extend_pointer",
> > BT_FN_UNWINDWORD_PTR, ATTR_CONST_NOTHROW_LEAF_LIST)
> > diff --git a/gcc/testsuite/gcc.dg/alias-17.c
> > b/gcc/testsuite/gcc.dg/alias-17.c
> > new file mode 100644
> > index 000..c375e4027ca
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/alias-17.c
> > @@ -0,0 +1,29 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O1" } */
> > +
> > +
> > +static
> > +int test(int a, int b, int c)
> > +{
> > +   int x, y;
> > +   int d = ( < ) ? 1 : -1;
> > +   if (a) __builtin_escape();
> > +   y = 0; //!
> > +   int *q = 
> > +   q += d;
> > +   int *r = q;
> > +   if (b) r = __builtin_bless(q);
> > +   *(c ? r : q) = 1;
> > +   return y;
> > +}
> > +
> > +int main()
> > +{
> > +   if (0 != test(0, 0, 1)) __builtin_abort();
> > +   if (0 != test(0, 1, 0)) __builtin_abort();
> > +   if (0 != test(0, 1, 1)) 

Re: [PATCH][AArch64] Improve clz patterns

2020-02-04 Thread Wilco Dijkstra
Hi Richard,

> Could you go into more detail about what the before and after code
> looks like, and what combine is doing?  Like you say, this sounds
> like a target-independent thing on face value.

It is indeed, but it seems specific to instructions where we have range
information which allows it to remove a redundant sign-extend.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565 for the full details.

> Either way, something like this needs a testcase.

Sure, I've added the testcase from pr93565, see below.

Cheers,
Wilco


Although GCC should understand the limited range of clz/ctz/cls results,
combine sometimes behaves oddly and duplicates ctz to remove an unnecessary
sign extension.  Avoid this by adding an explicit AND with 127 in the
patterns. Deepsjeng performance improves by ~0.6%.

Bootstrap OK.

ChangeLog:
2020-02-04  Wilco Dijkstra  

PR rtl-optimization/93565
* config/aarch64/aarch64.md (clz2): Mask the clz result.
(clrsb2): Likewise.
(ctz2): Likewise.

* gcc.target/aarch64/pr93565.c: New test.
--
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5edc76ee14b55b2b4323530e10bd22b3ffca483e..7ff0536aac42957dbb7a15be766d35cc6725ac40 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4794,7 +4794,8 @@ (define_insn 
"*and_one_cmpl_3_compare0_no_reuse"
 
 (define_insn "clz2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-   (clz:GPI (match_operand:GPI 1 "register_operand" "r")))]
+   (and:GPI (clz:GPI (match_operand:GPI 1 "register_operand" "r"))
+(const_int 127)))]
   ""
   "clz\\t%0, %1"
   [(set_attr "type" "clz")]
@@ -4848,7 +4849,8 @@ (define_expand "popcount2"
 
 (define_insn "clrsb2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-(clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
+   (and:GPI (clrsb:GPI (match_operand:GPI 1 "register_operand" "r"))
+(const_int 127)))]
   ""
   "cls\\t%0, %1"
   [(set_attr "type" "clz")]
@@ -4869,7 +4871,8 @@ (define_insn "rbit2"
 
 (define_insn_and_split "ctz2"
  [(set (match_operand:GPI   0 "register_operand" "=r")
-   (ctz:GPI (match_operand:GPI  1 "register_operand" "r")))]
+   (and:GPI (ctz:GPI (match_operand:GPI  1 "register_operand" "r"))
+   (const_int 127)))]
   ""
   "#"
   "reload_completed"
diff --git a/gcc/testsuite/gcc.target/aarch64/pr93565.c b/gcc/testsuite/gcc.target/aarch64/pr93565.c
new file mode 100644
index 
..7200f80d1bb161f6a058cc6591f61b6b75cf1749
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr93565.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+static const unsigned long long magic = 0x03f08c5392f756cdULL;
+
+static const char table[64] = {
+ 0,  1, 12,  2, 13, 22, 17,  3,
+14, 33, 23, 36, 18, 58, 28,  4,
+62, 15, 34, 26, 24, 48, 50, 37,
+19, 55, 59, 52, 29, 44, 39,  5,
+63, 11, 21, 16, 32, 35, 57, 27,
+61, 25, 47, 49, 54, 51, 43, 38,
+10, 20, 31, 56, 60, 46, 53, 42,
+ 9, 30, 45, 41,  8, 40,  7,  6,
+};
+
+static inline int ctz1 (unsigned long  b)
+{
+  unsigned long lsb = b & -b;
+  return table[(lsb * magic) >> 58];
+}
+
+void f (unsigned long x, int *p)
+{
+  if (x != 0)
+{
+  int a = ctz1 (x);
+  *p = a | p[a];
+}
+}
+
+/* { dg-final { scan-assembler-times "rbit\t" 1 } } */
+/* { dg-final { scan-assembler-times "clz\t" 1 } } */
+



[PATCH] tree-optimization/93538 - add missing comparison folding case

2020-02-04 Thread Richard Biener
This adds back a folding that last worked in GCC 4.5 by amending
the pattern that handles other cases of address vs. SSA name
comparisons.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

2020-02-04  Richard Biener  

PR tree-optimization/93538
* match.pd (addr EQ/NE ptr): Amend to handle >x EQ/NE ptr.

* gcc.dg/tree-ssa/forwprop-38.c: New testcase.
---
 gcc/match.pd| 33 +++--
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-38.c | 13 
 2 files changed, 35 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-38.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 780eb6a8959..a9215f971b9 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4267,19 +4267,30 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
{ constant_boolean_node (above ? false : true, type); }
 
 (for cmp (eq ne)
- /* A local variable can never be pointed to by
-the default SSA name of an incoming parameter.
-SSA names are canonicalized to 2nd place.  */
  (simplify
+  /* SSA names are canonicalized to 2nd place.  */
   (cmp addr@0 SSA_NAME@1)
-  (if (SSA_NAME_IS_DEFAULT_DEF (@1)
-   && TREE_CODE (SSA_NAME_VAR (@1)) == PARM_DECL)
-   (with { tree base = get_base_address (TREE_OPERAND (@0, 0)); }
-(if (TREE_CODE (base) == VAR_DECL
- && auto_var_in_fn_p (base, current_function_decl))
- (if (cmp == NE_EXPR)
-  { constant_boolean_node (true, type); }
-  { constant_boolean_node (false, type); }))
+  (with
+   { poly_int64 off; tree base; }
+   /* A local variable can never be pointed to by
+  the default SSA name of an incoming parameter.  */
+   (if (SSA_NAME_IS_DEFAULT_DEF (@1)
+   && TREE_CODE (SSA_NAME_VAR (@1)) == PARM_DECL
+   && (base = get_base_address (TREE_OPERAND (@0, 0)))
+   && TREE_CODE (base) == VAR_DECL
+   && auto_var_in_fn_p (base, current_function_decl))
+(if (cmp == NE_EXPR)
+ { constant_boolean_node (true, type); }
+ { constant_boolean_node (false, type); })
+/* If the address is based on @1 decide using the offset.  */
+(if ((base = get_addr_base_and_unit_offset (TREE_OPERAND (@0, 0), ))
+&& TREE_CODE (base) == MEM_REF
+&& TREE_OPERAND (base, 0) == @1)
+ (with { off += mem_ref_offset (base).force_shwi (); }
+  (if (known_ne (off, 0))
+   { constant_boolean_node (cmp == NE_EXPR, type); }
+   (if (known_eq (off, 0))
+{ constant_boolean_node (cmp == EQ_EXPR, type); }
 
 /* Equality compare simplifications from fold_binary  */
 (for cmp (eq ne)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-38.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-38.c
new file mode 100644
index 000..e016c825b3c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-38.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-forwprop1" } */
+
+struct A { int a[1]; };
+
+void f (struct A *p)
+{
+  void *q = p->a;
+  if (p != q)
+__builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump-not "abort" "forwprop1" } } */
-- 
2.16.4


Re: [PR47785] COLLECT_AS_OPTIONS

2020-02-04 Thread Richard Biener
On Mon, Feb 3, 2020 at 12:37 PM Prathamesh Kulkarni
 wrote:
>
> On Thu, 30 Jan 2020 at 19:10, Richard Biener  
> wrote:
> >
> > On Thu, Jan 30, 2020 at 5:31 AM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Tue, 28 Jan 2020 at 17:17, Richard Biener  
> > > wrote:
> > > >
> > > > On Fri, Jan 24, 2020 at 7:04 AM Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Mon, 20 Jan 2020 at 15:44, Richard Biener 
> > > > >  wrote:
> > > > > >
> > > > > > On Wed, Jan 8, 2020 at 11:20 AM Prathamesh Kulkarni
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, 5 Nov 2019 at 17:38, Richard Biener 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Nov 5, 2019 at 12:17 AM Kugan Vivekanandarajah
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > > Thanks for the review.
> > > > > > > > >
> > > > > > > > > On Tue, 5 Nov 2019 at 03:57, H.J. Lu  
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > On Sun, Nov 3, 2019 at 6:45 PM Kugan Vivekanandarajah
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the reviews.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Sat, 2 Nov 2019 at 02:49, H.J. Lu 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:33 PM Kugan Vivekanandarajah
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, 30 Oct 2019 at 03:11, H.J. Lu 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sun, Oct 27, 2019 at 6:33 PM Kugan 
> > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, 23 Oct 2019 at 23:07, Richard Biener 
> > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Oct 21, 2019 at 10:04 AM Kugan 
> > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the pointers.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, 11 Oct 2019 at 22:33, Richard Biener 
> > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Fri, Oct 11, 2019 at 6:15 AM Kugan 
> > > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, 2 Oct 2019 at 20:41, Richard 
> > > > > > > > > > > > > > > > > > > Biener  wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Wed, Oct 2, 2019 at 10:39 AM Kugan 
> > > > > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > As mentioned in the PR, attached 
> > > > > > > > > > > > > > > > > > > > > patch adds COLLECT_AS_OPTIONS for
> > > > > > > > > > > > > > > > > > > > > passing assembler options specified 
> > > > > > > > > > > > > > > > > > > > > with -Wa, to the link-time driver.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > The proposed solution only works for 
> > > > > > > > > > > > > > > > > > > > > uniform -Wa options across all
> > > > > > > > > > > > > > > > > > > > > TUs. As mentioned by Richard Biener, 
> > > > > > > > > > > > > > > > > > > > > supporting non-uniform -Wa flags
> > > > > > > > > > > > > > > > > > > > > would require either adjusting 
> > > > > > > > > > > > > > > > > > > > > partitioning according to flags or
> > > > > > > > > > > > > > > > > > > > > emitting multiple object files  from 
> > > > > > > > > > > > > > > > > > > > > a single LTRANS CU. We could
> > > > > > > > > > > > > > > > > > > > > consider this as a follow up.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Bootstrapped and regression tests on  
> > > > > > > > > > > > > > > > > > > > > arm-linux-gcc. Is this OK for trunk?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > While it works for your simple cases it 
> > > > > > > > > > > > > > > > > > > > is unlikely to work in practice since
> > > > > > > > > > > > > > > > > > > > your implementation needs the assembler 
> 

[committed] libstdc++: Fix name of macro in #undef directive

2020-02-04 Thread Jonathan Wakely
The macro that is defined is _GLIBCXX_NOT_FN_CALL_OP but the macro that
was named in the #undef directive was _GLIBCXX_NOT_FN_CALL. This fixes
the #undef.

* include/std/functional (_GLIBCXX_NOT_FN_CALL_OP): Un-define after
use.

Tested powerpc64le-linux. Committed to master.

commit 9bc5bea1f3f0ce3927fd86ce728641d087f3d3b5
Author: Jonathan Wakely 
Date:   Tue Feb 4 13:30:57 2020 +

libstdc++: Fix name of macro in #undef directive

The macro that is defined is _GLIBCXX_NOT_FN_CALL_OP but the macro that
was named in the #undef directive was _GLIBCXX_NOT_FN_CALL. This fixes
the #undef.

* include/std/functional (_GLIBCXX_NOT_FN_CALL_OP): Un-define after
use.

diff --git a/libstdc++-v3/include/std/functional b/libstdc++-v3/include/std/functional
index 88cffd5a9ce..faa7e85c114 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -953,7 +953,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _GLIBCXX_NOT_FN_CALL_OP( const & )
   _GLIBCXX_NOT_FN_CALL_OP( && )
   _GLIBCXX_NOT_FN_CALL_OP( const && )
-#undef _GLIBCXX_NOT_FN_CALL
+#undef _GLIBCXX_NOT_FN_CALL_OP
 
 private:
   _Fn _M_fn;


Re: [RFC PATCH] __builtin_escape/__builtin_bless

2020-02-04 Thread Richard Biener
On Sat, Feb 1, 2020 at 1:04 AM Uecker, Martin
 wrote:
>
>
>
> Hi Richard and Joseph,
>
> for discussion: here is my simple patch
> for __builtin_escape/__builtin_bless.
>
> Maybe it does something stupid.

I'd use an internal function once the frontend inserts the calls,
we shouldn't expose those to the user.

I expect the builtins to show as quite invasive with their
data dependence blocking many optimizations.  Any reason
that __builtin_escape is not marked as const?  Do you try
to prevent code motion of it so flow-sensitive "escape" analysis
can be used?  But nothing prevents hoisting of the bless call
across the escape one.

Richard.

>
> Best,
> Martin
>
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index e4a8694054e..d0046135213 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -6014,6 +6014,31 @@ expand_builtin_unreachable (void)
>emit_barrier ();
>  }
>
> +
> +static rtx
> +expand_builtin_escape (tree exp, rtx target)
> +{
> +  if (call_expr_nargs (exp) < 1)
> +return const0_rtx;
> +
> +  target = expand_expr (CALL_EXPR_ARG (exp, 0), target, Pmode,
> +   EXPAND_NORMAL);
> +
> +  return target;
> +}
> +
> +static rtx
> +expand_builtin_bless (tree exp, rtx target)
> +{
> +  if (call_expr_nargs (exp) < 1)
> +return const0_rtx;
> +
> +  target = expand_expr (CALL_EXPR_ARG (exp, 0), target, Pmode,
> +   EXPAND_NORMAL);
> +
> +  return target;
> +}
> +
>  /* Expand EXP, a call to fabs, fabsf or fabsl.
> Return NULL_RTX if a normal call should be emitted rather than expanding
> the function inline.  If convenient, the result should be placed
> @@ -8304,6 +8329,12 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
> machine_mode mode,
>expand_builtin_unreachable ();
>return const0_rtx;
>
> +case BUILT_IN_ESCAPE:
> +  return expand_builtin_escape (exp, target);
> +
> +case BUILT_IN_BLESS:
> +  return expand_builtin_bless (exp, target);
> +
>  CASE_FLT_FN (BUILT_IN_SIGNBIT):
>  case BUILT_IN_SIGNBITD32:
>  case BUILT_IN_SIGNBITD64:
> diff --git a/gcc/builtins.def b/gcc/builtins.def
> index fa8b0641ab1..9264a0fdaab 100644
> --- a/gcc/builtins.def
> +++ b/gcc/builtins.def
> @@ -865,6 +865,8 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_EXECVP, "execvp", 
> BT_FN_INT_CONST_STRING_PT
>  DEF_EXT_LIB_BUILTIN(BUILT_IN_EXECVE, "execve", 
> BT_FN_INT_CONST_STRING_PTR_CONST_STRING_PTR_CONST_STRING, ATTR_NOTHROW_LIST)
>  DEF_LIB_BUILTIN(BUILT_IN_EXIT, "exit", BT_FN_VOID_INT, 
> ATTR_NORETURN_NOTHROW_LIST)
>  DEF_GCC_BUILTIN(BUILT_IN_EXPECT, "expect", BT_FN_LONG_LONG_LONG, 
> ATTR_CONST_NOTHROW_LEAF_LIST)
> +DEF_GCC_BUILTIN(BUILT_IN_ESCAPE, "escape", BT_FN_PTR_PTR, 
> ATTR_NOTHROW_LEAF_LIST)
> +DEF_GCC_BUILTIN(BUILT_IN_BLESS, "bless", BT_FN_PTR_PTR, 
> ATTR_CONST_NOTHROW_LEAF_LIST)
>  DEF_GCC_BUILTIN(BUILT_IN_EXPECT_WITH_PROBABILITY, 
> "expect_with_probability", BT_FN_LONG_LONG_LONG_DOUBLE, 
> ATTR_CONST_NOTHROW_LEAF_LIST)
>  DEF_GCC_BUILTIN(BUILT_IN_ASSUME_ALIGNED, "assume_aligned", 
> BT_FN_PTR_CONST_PTR_SIZE_VAR, ATTR_CONST_NOTHROW_LEAF_LIST)
>  DEF_GCC_BUILTIN(BUILT_IN_EXTEND_POINTER, "extend_pointer", 
> BT_FN_UNWINDWORD_PTR, ATTR_CONST_NOTHROW_LEAF_LIST)
> diff --git a/gcc/testsuite/gcc.dg/alias-17.c b/gcc/testsuite/gcc.dg/alias-17.c
> new file mode 100644
> index 000..c375e4027ca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/alias-17.c
> @@ -0,0 +1,29 @@
> +/* { dg-do run } */
> +/* { dg-options "-O1" } */
> +
> +
> +static
> +int test(int a, int b, int c)
> +{
> +   int x, y;
> +   int d = ( < ) ? 1 : -1;
> +   if (a) __builtin_escape();
> +   y = 0; //!
> +   int *q = 
> +   q += d;
> +   int *r = q;
> +   if (b) r = __builtin_bless(q);
> +   *(c ? r : q) = 1;
> +   return y;
> +}
> +
> +int main()
> +{
> +   if (0 != test(0, 0, 1)) __builtin_abort();
> +   if (0 != test(0, 1, 0)) __builtin_abort();
> +   if (0 != test(0, 1, 1)) __builtin_abort();
> +   if (0 != test(1, 0, 1)) __builtin_abort();
> +// if (0 != test(1, 1, 0)) __builtin_abort();
> +   if (0 == test(1, 1, 1)) __builtin_abort();
> +   return 0;
> +}
> diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
> index 670676f20c3..d5befff71a6 100644
> --- a/gcc/tree-ssa-alias.c
> +++ b/gcc/tree-ssa-alias.c
> @@ -2462,6 +2462,8 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *ref, 
> bool tbaa_p)
>   }
>
> /* The following builtins do not read from memory.  */
> +   case BUILT_IN_ESCAPE:
> +   case BUILT_IN_BLESS:
> case BUILT_IN_FREE:
> case BUILT_IN_MALLOC:
> case BUILT_IN_POSIX_MEMALIGN:
> diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
> index be6647db894..53a38a04ebb 100644
> --- a/gcc/tree-ssa-ccp.c
> +++ b/gcc/tree-ssa-ccp.c
> @@ -1963,6 +1963,8 @@ evaluate_stmt (gimple *stmt)
>   break;
>
> /* These builtins 

Re: [PATCH][DOCUMENTATION] Document ASLR for Precompiled Headers.

2020-02-04 Thread Jakub Jelinek
On Tue, Feb 04, 2020 at 02:55:31PM +0100, Richard Biener wrote:
> On Tue, Feb 4, 2020 at 2:45 PM Martin Liška  wrote:
> >
> > Hi.
> >
> > The patch is about enhancement of the documentation, where
> > we do not support ASLR for Precompiled Headers.
> >
> > Ready for trunk?
> 
> I think we support ASLR just fine?

We do support ASLR just fine; it is just that a PCH built under it will not be
bitwise reproducible.  So the doc patch is misplaced (putting it between the
sentence talking about options and the list of those options is weird) and
should make it clear that ASLR may result in non-reproducible PCH files.

> > gcc/ChangeLog:
> >
> > 2020-02-04  Martin Liska  
> >
> > PR c++/92717
> > * doc/invoke.texi: Document that one should
> > not combine ASLR and -fpch.
> > ---
> >   gcc/doc/invoke.texi | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> >

Jakub



Re: [PATCH][DOCUMENTATION] Document ASLR for Precompiled Headers.

2020-02-04 Thread Martin Liška

On 2/4/20 2:55 PM, Richard Biener wrote:

On Tue, Feb 4, 2020 at 2:45 PM Martin Liška  wrote:


Hi.

The patch is about enhancement of the documentation, where
we do not support ASLR for Precompiled Headers.

Ready for trunk?


I think we support ASLR just fine?


There's an issue where somebody confirms the opposite:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92717#c8
Martin




Thanks,
Martin

gcc/ChangeLog:

2020-02-04  Martin Liska  

 PR c++/92717
 * doc/invoke.texi: Document that one should
 not combine ASLR and -fpch.
---
   gcc/doc/invoke.texi | 2 ++
   1 file changed, 2 insertions(+)






Re: [PATCH][DOCUMENTATION] Document ASLR for Precompiled Headers.

2020-02-04 Thread Richard Biener
On Tue, Feb 4, 2020 at 2:45 PM Martin Liška  wrote:
>
> Hi.
>
> The patch is about enhancement of the documentation, where
> we do not support ASLR for Precompiled Headers.
>
> Ready for trunk?

I think we support ASLR just fine?

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2020-02-04  Martin Liska  
>
> PR c++/92717
> * doc/invoke.texi: Document that one should
> not combine ASLR and -fpch.
> ---
>   gcc/doc/invoke.texi | 2 ++
>   1 file changed, 2 insertions(+)
>
>


Re: [GCC][PATCH][ARM] Set profile to M for Armv8.1-M

2020-02-04 Thread Christophe Lyon
On Mon, 3 Feb 2020 at 18:20, Mihail Ionescu  wrote:
>
> Hi,
>
> We noticed that the profile for armv8.1-m.main was not set in arm-cpus.in,
> which led to TARGET_ARM_ARCH_PROFILE and __ARM_ARCH_PROFILE not being
> defined properly.
>
>
>
> gcc/ChangeLog:
>
> 2020-02-03  Mihail Ionescu  
>
> * config/arm/arm-cpus.in: Set profile M
> for armv8.1-m.main.
>
>
> Ok for trunk?
>
> Regards,
> Mihail
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
> index 
> 1805b2b1cd8d6f65a967b4e3945257854a7e0fc1..96f584da325172bd1460251e2de0ad679589d312
>  100644
> --- a/gcc/config/arm/arm-cpus.in
> +++ b/gcc/config/arm/arm-cpus.in
> @@ -692,6 +692,7 @@ begin arch armv8.1-m.main
>   tune for cortex-m7
>   tune flags CO_PROC
>   base 8M_MAIN
> + profile M
>   isa ARMv8_1m_main
>  # fp => FPv5-sp-d16; fp.dp => FPv5-d16
>   option dsp add armv7em
>

I'm wondering whether this is obvious?
OTOH, what's the impact of missing this (or why didn't we notice the
problem via a failing testcase?)

Christophe


[PATCH][DOCUMENTATION] Document ASLR for Precompiled Headers.

2020-02-04 Thread Martin Liška

Hi.

The patch is about enhancement of the documentation, where
we do not support ASLR for Precompiled Headers.

Ready for trunk?
Thanks,
Martin

gcc/ChangeLog:

2020-02-04  Martin Liska  

PR c++/92717
* doc/invoke.texi: Document that one should
not combine ASLR and -fpch.
---
 gcc/doc/invoke.texi | 2 ++
 1 file changed, 2 insertions(+)


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4dec0c8326b..eb4b786b405 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -31091,6 +31091,8 @@ for any cases where this rule is relaxed.
 @item Each of the following options must be the same when building and using
 the precompiled header:
 
+@item Address space layout randomization (ASLR) is disabled.
+
 @gccoptlist{-fexceptions}
 
 @item



[committed] libstdc++: Fix regressions in unique_ptr::swap (PR 93562)

2020-02-04 Thread Jonathan Wakely
The requirements for this function are only that the deleter is
swappable, but we incorrectly require that the element type is complete
and that the deleter can be swapped using std::swap (which requires it
to be move constructible and move assignable).

The fix is to add __uniq_ptr_impl::swap which swaps the pointer and
deleter individually, instead of using the generic std::swap on the
tuple containing them.

PR libstdc++/93562
* include/bits/unique_ptr.h (__uniq_ptr_impl::swap): Define.
(unique_ptr::swap, unique_ptr::swap): Call it.
* testsuite/20_util/unique_ptr/modifiers/93562.cc: New test.

Tested powerpc64le-linux, committed to master.

Backports to follow.


commit 9962493ca2f71d3f3dd06b0e9cd19fcf849e3e4b
Author: Jonathan Wakely 
Date:   Tue Feb 4 12:59:14 2020 +

libstdc++: Fix regressions in unique_ptr::swap (PR 93562)

The requirements for this function are only that the deleter is
swappable, but we incorrectly require that the element type is complete
and that the deleter can be swapped using std::swap (which requires it
to be move constructible and move assignable).

The fix is to add __uniq_ptr_impl::swap which swaps the pointer and
deleter individually, instead of using the generic std::swap on the
tuple containing them.

PR libstdc++/93562
* include/bits/unique_ptr.h (__uniq_ptr_impl::swap): Define.
(unique_ptr::swap, unique_ptr::swap): Call it.
* testsuite/20_util/unique_ptr/modifiers/93562.cc: New test.

diff --git a/libstdc++-v3/include/bits/unique_ptr.h 
b/libstdc++-v3/include/bits/unique_ptr.h
index 131937a8e53..d03266c1878 100644
--- a/libstdc++-v3/include/bits/unique_ptr.h
+++ b/libstdc++-v3/include/bits/unique_ptr.h
@@ -185,6 +185,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return __p;
   }
 
+  void
+  swap(__uniq_ptr_impl& __rhs) noexcept
+  {
+   using std::swap;
+   swap(this->_M_ptr(), __rhs._M_ptr());
+   swap(this->_M_deleter(), __rhs._M_deleter());
+  }
+
 private:
  tuple<pointer, _Dp> _M_t;
 };
@@ -448,8 +456,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
   swap(unique_ptr& __u) noexcept
   {
-   using std::swap;
-   swap(_M_t, __u._M_t);
+   static_assert(__is_swappable<_Dp>::value, "deleter must be swappable");
+   _M_t.swap(__u._M_t);
   }
 
   // Disable copy from lvalue.
@@ -703,8 +711,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
   swap(unique_ptr& __u) noexcept
   {
-   using std::swap;
-   swap(_M_t, __u._M_t);
+   static_assert(__is_swappable<_Dp>::value, "deleter must be swappable");
+   _M_t.swap(__u._M_t);
   }
 
   // Disable copy from lvalue.
diff --git a/libstdc++-v3/testsuite/20_util/unique_ptr/modifiers/93562.cc 
b/libstdc++-v3/testsuite/20_util/unique_ptr/modifiers/93562.cc
new file mode 100644
index 000..8ed236333ac
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/unique_ptr/modifiers/93562.cc
@@ -0,0 +1,98 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do run { target c++11 } }
+
+#include 
+#include 
+
+struct incomplete;
+
+// This function isn't called, we just need to check it compiles.
+void
+test01(std::unique_ptr<incomplete>& p1, std::unique_ptr<incomplete>& p2)
+{
+  // PR libstdc++/93562
+  p1.swap(p2);
+  swap(p1, p2);
+}
+
+// This function isn't called, we just need to check it compiles.
+void
+test02(std::unique_ptr<incomplete[]>& p1, std::unique_ptr<incomplete[]>& p2)
+{
+  // PR libstdc++/93562
+  p1.swap(p2);
+  swap(p1, p2);
+}
+
+namespace A
+{
+  struct Deleter
+  {
+Deleter& operator=(const Deleter&) = delete;
+
+void operator()(int* p) const noexcept { delete p; }
+
+// found by ADL
+friend void swap(Deleter& lhs, Deleter& rhs) noexcept
+{ std::swap(lhs.id, rhs.id); }
+
+int id;
+  };
+
+  static_assert(!std::is_move_assignable<Deleter>::value, "not assignable");
+#if __cplusplus >= 201703L
+  static_assert(std::is_swappable_v<Deleter>, "but swappable");
+#endif
+} // namespace A
+
+void
+test03()
+{
+  std::unique_ptr<int, A::Deleter> p1(new int(1), { -1 });
+  std::unique_ptr<int, A::Deleter> p2(new int(2), { -2 });
+  int* const pi1 = p1.get();
+  int* const pi2 = p2.get();
+  // This type must be swappable even though the deleter is not 

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Uros Bizjak
On Tue, Feb 4, 2020 at 2:13 PM Jakub Jelinek  wrote:
>
> On Tue, Feb 04, 2020 at 01:38:51PM +0100, Uros Bizjak wrote:
> > As Richard advised, let's put this safety stuff back. Usually, in
> > i386.md, these kinds of splitters are implemented as two patterns, one
> > (define_insn_and_split) having "#", and the other (define_insn) with a
> > real insn. My opinion is that this separation avoids confusion as
> > much as possible.
>
> Okay.  So like this if it passes bootstrap/regtest then?

Yes.

> 2020-02-04  Jakub Jelinek  
>
> PR target/92190
> * config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper): Only
> include sets and not clobbers in the vzeroupper pattern.
> * config/i386/sse.md (*avx_vzeroupper): Require in insn condition that
> the parallel has 17 (64-bit) or 9 (32-bit) elts.
> (*avx_vzeroupper_1): New define_insn_and_split.
>
> * gcc.target/i386/pr92190.c: New test.

OK.

Thanks,
Uros.

> --- gcc/config/i386/i386-features.c.jj  2020-02-04 13:33:32.713885386 +0100
> +++ gcc/config/i386/i386-features.c 2020-02-04 13:55:44.358058104 +0100
> @@ -1764,29 +1764,32 @@ convert_scalars_to_vector (bool timode_p
>
>   (set (reg:V2DF R) (reg:V2DF R))
>
> -   which preserves the low 128 bits but clobbers the upper bits.
> -   For a dead register we just use:
> -
> - (clobber (reg:V2DF R))
> -
> -   which invalidates any previous contents of R and stops R from becoming
> -   live across the vzeroupper in future.  */
> +   which preserves the low 128 bits but clobbers the upper bits.  */
>
>  static void
>  ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
>  {
>rtx pattern = PATTERN (insn);
>unsigned int nregs = TARGET_64BIT ? 16 : 8;
> -  rtvec vec = rtvec_alloc (nregs + 1);
> -  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> +  unsigned int npats = nregs;
>for (unsigned int i = 0; i < nregs; ++i)
>  {
>unsigned int regno = GET_SSE_REGNO (i);
> +  if (!bitmap_bit_p (live_regs, regno))
> +   npats--;
> +}
> +  if (npats == 0)
> +return;
> +  rtvec vec = rtvec_alloc (npats + 1);
> +  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> +  for (unsigned int i = 0, j = 0; i < nregs; ++i)
> +{
> +  unsigned int regno = GET_SSE_REGNO (i);
> +  if (!bitmap_bit_p (live_regs, regno))
> +   continue;
>rtx reg = gen_rtx_REG (V2DImode, regno);
> -  if (bitmap_bit_p (live_regs, regno))
> -   RTVEC_ELT (vec, i + 1) = gen_rtx_SET (reg, reg);
> -  else
> -   RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
> +  ++j;
> +  RTVEC_ELT (vec, j) = gen_rtx_SET (reg, reg);
>  }
>XVEC (pattern, 0) = vec;
>df_insn_rescan (insn);
> --- gcc/config/i386/sse.md.jj   2020-02-04 13:33:32.733885088 +0100
> +++ gcc/config/i386/sse.md  2020-02-04 13:57:38.995349722 +0100
> @@ -19818,11 +19818,49 @@ (define_expand "avx_vzeroupper"
>  (define_insn "*avx_vzeroupper"
>[(match_parallel 0 "vzeroupper_pattern"
>   [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
> -  "TARGET_AVX"
> +  "TARGET_AVX && XVECLEN (operands[0], 0) == (TARGET_64BIT ? 16 : 8) + 1"
>"vzeroupper"
>[(set_attr "type" "sse")
> (set_attr "modrm" "0")
> (set_attr "memory" "none")
> +   (set_attr "prefix" "vex")
> +   (set_attr "btver2_decode" "vector")
> +   (set_attr "mode" "OI")])
> +
> +(define_insn_and_split "*avx_vzeroupper_1"
> +  [(match_parallel 0 "vzeroupper_pattern"
> + [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
> +  "TARGET_AVX && XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1"
> +  "#"
> +  "&& epilogue_completed"
> +  [(match_dup 0)]
> +{
> +  /* For IPA-RA purposes, make it clear the instruction clobbers
> + even XMM registers not mentioned explicitly in the pattern.  */
> +  unsigned int nregs = TARGET_64BIT ? 16 : 8;
> +  unsigned int npats = XVECLEN (operands[0], 0);
> +  rtvec vec = rtvec_alloc (nregs + 1);
> +  RTVEC_ELT (vec, 0) = XVECEXP (operands[0], 0, 0);
> +  for (unsigned int i = 0, j = 1; i < nregs; ++i)
> +{
> +  unsigned int regno = GET_SSE_REGNO (i);
> +  if (j < npats
> + && REGNO (SET_DEST (XVECEXP (operands[0], 0, j))) == regno)
> +   {
> + RTVEC_ELT (vec, i + 1) = XVECEXP (operands[0], 0, j);
> + j++;
> +   }
> +  else
> +   {
> + rtx reg = gen_rtx_REG (V2DImode, regno);
> + RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
> +   }
> +}
> +  operands[0] = gen_rtx_PARALLEL (VOIDmode, vec);
> +}
> +  [(set_attr "type" "sse")
> +   (set_attr "modrm" "0")
> +   (set_attr "memory" "none")
> (set_attr "prefix" "vex")
> (set_attr "btver2_decode" "vector")
> (set_attr "mode" "OI")])
> --- gcc/testsuite/gcc.target/i386/pr92190.c.jj  2020-02-04 13:55:44.364058015 
> +0100
> +++ gcc/testsuite/gcc.target/i386/pr92190.c 2020-02-04 13:55:44.364058015 
> +0100
> @@ -0,0 +1,19 @@
> +/* PR target/92190 */

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Jakub Jelinek
On Tue, Feb 04, 2020 at 01:38:51PM +0100, Uros Bizjak wrote:
> As Richard advised, let's put this safety stuff back. Usually, in
> i386.md, these kinds of splitters are implemented as two patterns, one
> (define_insn_and_split) having "#", and the other (define_insn) with a
> real insn. My opinion is that this separation avoids confusion as
> much as possible.

Okay.  So like this if it passes bootstrap/regtest then?

2020-02-04  Jakub Jelinek  

PR target/92190
* config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper): Only
include sets and not clobbers in the vzeroupper pattern.
* config/i386/sse.md (*avx_vzeroupper): Require in insn condition that
the parallel has 17 (64-bit) or 9 (32-bit) elts.
(*avx_vzeroupper_1): New define_insn_and_split.

* gcc.target/i386/pr92190.c: New test.

--- gcc/config/i386/i386-features.c.jj  2020-02-04 13:33:32.713885386 +0100
+++ gcc/config/i386/i386-features.c 2020-02-04 13:55:44.358058104 +0100
@@ -1764,29 +1764,32 @@ convert_scalars_to_vector (bool timode_p
 
  (set (reg:V2DF R) (reg:V2DF R))
 
-   which preserves the low 128 bits but clobbers the upper bits.
-   For a dead register we just use:
-
- (clobber (reg:V2DF R))
-
-   which invalidates any previous contents of R and stops R from becoming
-   live across the vzeroupper in future.  */
+   which preserves the low 128 bits but clobbers the upper bits.  */
 
 static void
 ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
 {
   rtx pattern = PATTERN (insn);
   unsigned int nregs = TARGET_64BIT ? 16 : 8;
-  rtvec vec = rtvec_alloc (nregs + 1);
-  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
+  unsigned int npats = nregs;
   for (unsigned int i = 0; i < nregs; ++i)
 {
   unsigned int regno = GET_SSE_REGNO (i);
+  if (!bitmap_bit_p (live_regs, regno))
+   npats--;
+}
+  if (npats == 0)
+return;
+  rtvec vec = rtvec_alloc (npats + 1);
+  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
+  for (unsigned int i = 0, j = 0; i < nregs; ++i)
+{
+  unsigned int regno = GET_SSE_REGNO (i);
+  if (!bitmap_bit_p (live_regs, regno))
+   continue;
   rtx reg = gen_rtx_REG (V2DImode, regno);
-  if (bitmap_bit_p (live_regs, regno))
-   RTVEC_ELT (vec, i + 1) = gen_rtx_SET (reg, reg);
-  else
-   RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
+  ++j;
+  RTVEC_ELT (vec, j) = gen_rtx_SET (reg, reg);
 }
   XVEC (pattern, 0) = vec;
   df_insn_rescan (insn);
--- gcc/config/i386/sse.md.jj   2020-02-04 13:33:32.733885088 +0100
+++ gcc/config/i386/sse.md  2020-02-04 13:57:38.995349722 +0100
@@ -19818,11 +19818,49 @@ (define_expand "avx_vzeroupper"
 (define_insn "*avx_vzeroupper"
   [(match_parallel 0 "vzeroupper_pattern"
  [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
-  "TARGET_AVX"
+  "TARGET_AVX && XVECLEN (operands[0], 0) == (TARGET_64BIT ? 16 : 8) + 1"
   "vzeroupper"
   [(set_attr "type" "sse")
(set_attr "modrm" "0")
(set_attr "memory" "none")
+   (set_attr "prefix" "vex")
+   (set_attr "btver2_decode" "vector")
+   (set_attr "mode" "OI")])
+
+(define_insn_and_split "*avx_vzeroupper_1"
+  [(match_parallel 0 "vzeroupper_pattern"
+ [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
+  "TARGET_AVX && XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1"
+  "#"
+  "&& epilogue_completed"
+  [(match_dup 0)]
+{
+  /* For IPA-RA purposes, make it clear the instruction clobbers
+ even XMM registers not mentioned explicitly in the pattern.  */
+  unsigned int nregs = TARGET_64BIT ? 16 : 8;
+  unsigned int npats = XVECLEN (operands[0], 0);
+  rtvec vec = rtvec_alloc (nregs + 1);
+  RTVEC_ELT (vec, 0) = XVECEXP (operands[0], 0, 0);
+  for (unsigned int i = 0, j = 1; i < nregs; ++i)
+{
+  unsigned int regno = GET_SSE_REGNO (i);
+  if (j < npats
+ && REGNO (SET_DEST (XVECEXP (operands[0], 0, j))) == regno)
+   {
+ RTVEC_ELT (vec, i + 1) = XVECEXP (operands[0], 0, j);
+ j++;
+   }
+  else
+   {
+ rtx reg = gen_rtx_REG (V2DImode, regno);
+ RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
+   }
+}
+  operands[0] = gen_rtx_PARALLEL (VOIDmode, vec);
+}
+  [(set_attr "type" "sse")
+   (set_attr "modrm" "0")
+   (set_attr "memory" "none")
(set_attr "prefix" "vex")
(set_attr "btver2_decode" "vector")
(set_attr "mode" "OI")])
--- gcc/testsuite/gcc.target/i386/pr92190.c.jj  2020-02-04 13:55:44.364058015 
+0100
+++ gcc/testsuite/gcc.target/i386/pr92190.c 2020-02-04 13:55:44.364058015 
+0100
@@ -0,0 +1,19 @@
+/* PR target/92190 */
+/* { dg-do compile { target { *-*-linux* && lp64 } } } */
+/* { dg-options "-mabi=ms -O2 -mavx512f" } */
+
+typedef char VC __attribute__((vector_size (16)));
+typedef int VI __attribute__((vector_size (16 * sizeof 0)));
+VC a;
+VI b;
+void bar (VI);
+void baz (VC);
+
+void
+foo (void)
+{
+  VC k = a;
+  VI n = b;
+  

[PATCH] Rework GCOV_PREFIX_STRIP env variable.

2020-02-04 Thread Martin Liška

Hi.

The patch addresses https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93401#c9,
where we currently support a mechanism for cross-compilation (GCC manual,
10.5 Data File Relocation to Support Cross-Profiling).

That's very much the same as what Honza needs for Firefox, where the
profile-generate run happens in a different location than the profile-use
run.  However, I find the current meaning of GCOV_PREFIX_STRIP a bit
difficult to use.

That's why I suggest renaming it to GCOV_PREFIX_STRIP_COMPONENTS and
using GCOV_PREFIX_STRIP to strip a real path prefix.

Thoughts?
Martin

gcc/ChangeLog:

2020-02-04  Martin Liska  

PR gcov-profile/93401
* doc/gcov.texi: Document new behaviour of GCOV_PREFIX_STRIP
and introduce GCOV_PREFIX_STRIP_COMPONENTS.

libgcc/ChangeLog:

2020-02-04  Martin Liska  

PR gcov-profile/93401
* libgcov-driver-system.c (allocate_filename_struct):
Parse also GCOV_PREFIX_STRIP_COMPONENTS.
(gcov_exit_open_gcda_file): Handle new meaning
of GCOV_PREFIX_STRIP and GCOV_PREFIX_STRIP_COMPONENTS.
* libgcov-driver.c (struct gcov_filename): New field.
---
 gcc/doc/gcov.texi  | 17 +++--
 libgcc/libgcov-driver-system.c |  7 +--
 libgcc/libgcov-driver.c|  1 +
 3 files changed, 17 insertions(+), 8 deletions(-)


diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
index 61250c9407e..154c7eb965a 100644
--- a/gcc/doc/gcov.texi
+++ b/gcc/doc/gcov.texi
@@ -951,11 +951,16 @@ in the object file. Prefix can be absolute, or relative.  The
 default is no prefix.
 
 @item
-GCOV_PREFIX_STRIP indicates the how many initial directory names to strip off
-the hardwired absolute paths. Default value is 0.
+GCOV_PREFIX_STRIP_COMPONENTS indicates how many initial directory names
+to strip off the hardwired absolute paths.  Default value is 0.
 
-@emph{Note:} If GCOV_PREFIX_STRIP is set without GCOV_PREFIX is undefined,
- then a relative path is made out of the hardwired absolute paths.
+@item
+GCOV_PREFIX_STRIP indicates path prefix to strip off
+the hardwired absolute paths.
+
+@emph{Note:} If GCOV_PREFIX_STRIP_COMPONENTS (or GCOV_PREFIX_STRIP)
+is set without GCOV_PREFIX, then a relative path is made
+out of the hardwired absolute paths.
 @end itemize
 
 For example, if the object file @file{/user/build/foo.o} was built with
@@ -963,8 +968,8 @@ For example, if the object file @file{/user/build/foo.o} was built with
 @file{/user/build/foo.gcda} when running on the target system.  This will
 fail if the corresponding directory does not exist and it is unable to create
 it.  This can be overcome by, for example, setting the environment as
-@samp{GCOV_PREFIX=/target/run} and @samp{GCOV_PREFIX_STRIP=1}.  Such a
-setting will name the data file @file{/target/run/build/foo.gcda}.
+@samp{GCOV_PREFIX=/target/run} and @samp{GCOV_PREFIX_STRIP_COMPONENTS=1}.
+Such a setting will name the data file @file{/target/run/build/foo.gcda}.
 
 You must move the data files to the expected directory tree in order to
 use them for profile directed optimizations (@option{-fprofile-use}), or to
diff --git a/libgcc/libgcov-driver-system.c b/libgcc/libgcov-driver-system.c
index 031f057e318..c17ce72b4b6 100644
--- a/libgcc/libgcov-driver-system.c
+++ b/libgcc/libgcov-driver-system.c
@@ -218,7 +218,7 @@ allocate_filename_struct (struct gcov_filename *gf)
 
   {
 /* Check if the level of dirs to strip off specified. */
-char *tmp = getenv("GCOV_PREFIX_STRIP");
+char *tmp = getenv ("GCOV_PREFIX_STRIP_COMPONENTS");
 if (tmp)
   {
 strip = atoi (tmp);
@@ -228,6 +228,7 @@ allocate_filename_struct (struct gcov_filename *gf)
   }
   }
   gf->strip = strip;
+  gf->strip_prefix = getenv ("GCOV_PREFIX_STRIP");
 
   /* Get file name relocation prefix.  Non-absolute values are ignored. */
   gcov_prefix = getenv("GCOV_PREFIX");
@@ -239,7 +240,7 @@ allocate_filename_struct (struct gcov_filename *gf)
 
   /* If no prefix was specified and a prefix stip, then we assume
  relative.  */
-  if (!prefix_length && gf->strip)
+  if (!prefix_length && (gf->strip || gf->strip_prefix))
 {
   gcov_prefix = ".";
   prefix_length = 1;
@@ -286,6 +287,8 @@ gcov_exit_open_gcda_file (struct gcov_info *gi_ptr,
 level--;
   }
 }
+  else if (gf->strip_prefix && strstr (fname, gf->strip_prefix) == fname)
+fname += strlen (gf->strip_prefix);
 
   /* Update complete filename with stripped original. */
   if (gf->prefix)
diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
index fb320738e1e..15c2009f8fc 100644
--- a/libgcc/libgcov-driver.c
+++ b/libgcc/libgcov-driver.c
@@ -76,6 +76,7 @@ struct gcov_filename
 {
   char *filename;  /* filename buffer */
   int strip; /* leading chars to strip from filename */
+  char *strip_prefix; /* leading path prefix that should be stripped */
   char *prefix; /* prefix string */
 };
 



Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Uros Bizjak
On Tue, Feb 4, 2020 at 1:30 PM Jakub Jelinek  wrote:
>
> On Tue, Feb 04, 2020 at 12:24:10PM +0100, Uros Bizjak wrote:
> > > A && is missing in the split condition to inherit TARGET_AVX.
> >
> > Also, you don't need to emit "#" in output template. This is just for
> > extra safety, we can live without. Please see e.g. "*tzcnt_1".
>
> I was worried that -fipa-ra then would give wrong answers.
> -O2 -mabi=ms -mavx512f -fno-schedule-insns2 I was worried about isn't a
> problem though, because while the split4 pass is disabled, the split3 pass
> is scheduled instead before regstack.
> I can do
> -O2 -mabi=ms -mavx512f -fdisable-rtl-split4
> or
> -O2 -mabi=ms -mavx512f -fno-schedule-insns2 -fdisable-rtl-split3
> though and then we ICE, with the patch as is with just && epilogue_completed 
> ...
> the ICE is
> cc1: note: disable pass rtl-split4 for functions in the range of [0, 
> 4294967295]
> pr92190.c: In function ‘foo’:
> pr92190.c:19:1: error: could not split insn
>19 | }
>   | ^
> (insn:TI 23 18 10 2 (parallel [
> (unspec_volatile [
> (const_int 0 [0])
> ] UNSPECV_VZEROUPPER)
> (clobber (reg:V2DI 20 xmm0))
> (clobber (reg:V2DI 21 xmm1))
> (clobber (reg:V2DI 22 xmm2))
> (clobber (reg:V2DI 23 xmm3))
> (clobber (reg:V2DI 24 xmm4))
> (clobber (reg:V2DI 25 xmm5))
> (set (reg:V2DI 26 xmm6)
> (reg:V2DI 26 xmm6))
> (clobber (reg:V2DI 27 xmm7))
> (clobber (reg:V2DI 44 xmm8))
> (clobber (reg:V2DI 45 xmm9))
> (clobber (reg:V2DI 46 xmm10))
> (clobber (reg:V2DI 47 xmm11))
> (clobber (reg:V2DI 48 xmm12))
> (clobber (reg:V2DI 49 xmm13))
> (clobber (reg:V2DI 50 xmm14))
> (clobber (reg:V2DI 51 xmm15))
> ]) "pr92190.c":17:3 4895 {*avx_vzeroupper}
>  (nil))
> because as the splitter is written, it modifies the insn in place and so
> later in try_split
>   if (INSN_P (insn_last)
>   && rtx_equal_p (PATTERN (insn_last), pat))
> return trial;
> rtx_equal_p is true because of the modification in place.
> Now, if I slightly modify the patch, so that it does
>   operands[0] = gen_rtx_PARALLEL (VOIDmode, vec);
> instead of
>   XVEC (operands[0], 0) = vec;
> and thus doesn't modify in place, it will ICE in a different spot:
> pr92190.c: In function ‘foo’:
> pr92190.c:19:1: internal compiler error: in final_scan_insn_1, at final.c:3078
>19 | }
>   | ^
> i.e. on
> /* If we have a length attribute, this instruction should have
>been split in shorten_branches, to ensure that we would have
>valid length info for the splitees.  */
> gcc_assert (!HAVE_ATTR_length);
> Though, I guess that is not really specific to this patch, by
> -fdisable-rtl-split{3,4,5} we effectively disable all possibilities of
> splitting before final, so anything that isn't split before will ICE that
> way.
> So, if we wanted to remove that class of ICEs, we should just fatal_error if
> somebody disables the last splitter pass before expansion and
> HAVE_ATTR_length is defined.
>
> Here is the patch with && before epilogue_completed, no "#" and that avoids
> modification of the insn in place.  We don't ICE, just could generate wrong
> code with -fipa-ra, but user asked for that with -fdisable-rtl-split{3,4,5},
> so guess that is ok.
>
> 2020-02-04  Jakub Jelinek  
>
> PR target/92190
> * config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper): Only
> include sets and not clobbers in the vzeroupper pattern.
> * config/i386/sse.md (*avx_vzeroupper): Change from define_insn to
> define_insn_and_split, split if epilogue_completed and not all 
> xmm0-xmm15
> registers are mentioned in the pattern and add clobbers for the 
> missing
> registers at that point.
>
> * gcc.target/i386/pr92190.c: New test.

As Richard advised, let's put this safety stuff back. Usually, in
i386.md, these kinds of splitters are implemented as two patterns, one
(define_insn_and_split) having "#", and the other (define_insn) with a
real insn. My opinion is that this separation avoids confusion as
much as possible.

Uros.

> --- gcc/config/i386/i386-features.c.jj  2020-02-04 11:40:58.755611428 +0100
> +++ gcc/config/i386/i386-features.c 2020-02-04 11:51:33.602148491 +0100
> @@ -1764,29 +1764,32 @@ convert_scalars_to_vector (bool timode_p
>
>   (set (reg:V2DF R) (reg:V2DF R))
>
> -   which preserves the low 128 bits but clobbers the upper bits.
> -   For a dead register we just use:
> -
> - (clobber (reg:V2DF R))
> -
> -   which invalidates any previous contents of R and stops R from becoming
> -   live across the vzeroupper in future.  */
> +   which preserves the low 128 bits but clobbers the upper bits.  */
>
>  static void
>  

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Jakub Jelinek
On Tue, Feb 04, 2020 at 12:24:10PM +0100, Uros Bizjak wrote:
> > A && is missing in the split condition to inherit TARGET_AVX.
> 
> Also, you don't need to emit "#" in output template. This is just for
> extra safety, we can live without. Please see e.g. "*tzcnt_1".

I was worried that -fipa-ra then would give wrong answers.
-O2 -mabi=ms -mavx512f -fno-schedule-insns2 I was worried about isn't a
problem though, because while the split4 pass is disabled, the split3 pass
is scheduled instead before regstack.
I can do
-O2 -mabi=ms -mavx512f -fdisable-rtl-split4
or
-O2 -mabi=ms -mavx512f -fno-schedule-insns2 -fdisable-rtl-split3
though and then we ICE, with the patch as is with just && epilogue_completed ...
the ICE is
cc1: note: disable pass rtl-split4 for functions in the range of [0, 4294967295]
pr92190.c: In function ‘foo’:
pr92190.c:19:1: error: could not split insn
   19 | }
  | ^
(insn:TI 23 18 10 2 (parallel [
(unspec_volatile [
(const_int 0 [0])
] UNSPECV_VZEROUPPER)
(clobber (reg:V2DI 20 xmm0))
(clobber (reg:V2DI 21 xmm1))
(clobber (reg:V2DI 22 xmm2))
(clobber (reg:V2DI 23 xmm3))
(clobber (reg:V2DI 24 xmm4))
(clobber (reg:V2DI 25 xmm5))
(set (reg:V2DI 26 xmm6)
(reg:V2DI 26 xmm6))
(clobber (reg:V2DI 27 xmm7))
(clobber (reg:V2DI 44 xmm8))
(clobber (reg:V2DI 45 xmm9))
(clobber (reg:V2DI 46 xmm10))
(clobber (reg:V2DI 47 xmm11))
(clobber (reg:V2DI 48 xmm12))
(clobber (reg:V2DI 49 xmm13))
(clobber (reg:V2DI 50 xmm14))
(clobber (reg:V2DI 51 xmm15))
]) "pr92190.c":17:3 4895 {*avx_vzeroupper}
 (nil))
because as the splitter is written, it modifies the insn in place and so
later in try_split
  if (INSN_P (insn_last)
  && rtx_equal_p (PATTERN (insn_last), pat))
return trial;
rtx_equal_p is true because of the modification in place.
Now, if I slightly modify the patch, so that it does
  operands[0] = gen_rtx_PARALLEL (VOIDmode, vec);
instead of
  XVEC (operands[0], 0) = vec;
and thus doesn't modify in place, it will ICE in a different spot:
pr92190.c: In function ‘foo’:
pr92190.c:19:1: internal compiler error: in final_scan_insn_1, at final.c:3078
   19 | }
  | ^
i.e. on
/* If we have a length attribute, this instruction should have
   been split in shorten_branches, to ensure that we would have
   valid length info for the splitees.  */
gcc_assert (!HAVE_ATTR_length);
Though, I guess that is not really specific to this patch, by
-fdisable-rtl-split{3,4,5} we effectively disable all possibilities of
splitting before final, so anything that isn't split before will ICE that
way.
So, if we wanted to remove that class of ICEs, we should just fatal_error if
somebody disables the last splitter pass before expansion and
HAVE_ATTR_length is defined.

Here is the patch with && before epilogue_completed, no "#" and that avoids
modification of the insn in place.  We don't ICE, just could generate wrong
code with -fipa-ra, but user asked for that with -fdisable-rtl-split{3,4,5},
so guess that is ok.

2020-02-04  Jakub Jelinek  

PR target/92190
* config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper): Only
include sets and not clobbers in the vzeroupper pattern.
* config/i386/sse.md (*avx_vzeroupper): Change from define_insn to
define_insn_and_split, split if epilogue_completed and not all 
xmm0-xmm15
registers are mentioned in the pattern and add clobbers for the missing
registers at that point.

* gcc.target/i386/pr92190.c: New test.

--- gcc/config/i386/i386-features.c.jj  2020-02-04 11:40:58.755611428 +0100
+++ gcc/config/i386/i386-features.c 2020-02-04 11:51:33.602148491 +0100
@@ -1764,29 +1764,32 @@ convert_scalars_to_vector (bool timode_p
 
  (set (reg:V2DF R) (reg:V2DF R))
 
-   which preserves the low 128 bits but clobbers the upper bits.
-   For a dead register we just use:
-
- (clobber (reg:V2DF R))
-
-   which invalidates any previous contents of R and stops R from becoming
-   live across the vzeroupper in future.  */
+   which preserves the low 128 bits but clobbers the upper bits.  */
 
 static void
 ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
 {
   rtx pattern = PATTERN (insn);
   unsigned int nregs = TARGET_64BIT ? 16 : 8;
-  rtvec vec = rtvec_alloc (nregs + 1);
-  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
+  unsigned int npats = nregs;
   for (unsigned int i = 0; i < nregs; ++i)
 {
   unsigned int regno = GET_SSE_REGNO (i);
+  if (!bitmap_bit_p (live_regs, regno))
+   npats--;
+}
+  if (npats == 0)
+return;
+  rtvec vec = rtvec_alloc (npats + 1);
+  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
+  for (unsigned int i = 0, j = 0; i < 

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Uros Bizjak
On Tue, Feb 4, 2020 at 1:06 PM Richard Sandiford
 wrote:
>
> Uros Bizjak  writes:
> > On Tue, Feb 4, 2020 at 12:13 PM Uros Bizjak  wrote:
> >>
> >> On Tue, Feb 4, 2020 at 12:05 PM Jakub Jelinek  wrote:
> >> >
> >> > On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
> >> > > If it works OK, I'd rather see this functionality implemented as an
> >> > > epilogue_completed guarded splitter. In the .md files, there are
> >> > > already cases where we split at this point, and where it is assumed
> >> > > that allocated registers won't change anymore. Also, please don't make
> >> > > the functionality conditional on flag_split_ra. This way, we would
> >> > > always get new patterns in the debug dumps, so in case something goes
> >> > > wrong, one could at least clearly see the full pattern.
> >> >
> >> > The following seems to work on the testcase, will bootstrap/regtest it 
> >> > soon.
> >> >
> >> > 2020-02-04  Jakub Jelinek  
> >> >
> >> > PR target/92190
> >> > * config/i386/i386-features.c 
> >> > (ix86_add_reg_usage_to_vzeroupper): Only
> >> > include sets and not clobbers in the vzeroupper pattern.
> >> > * config/i386/sse.md (*avx_vzeroupper): Change from define_insn 
> >> > to
> >> > define_insn_and_split, split if epilogue_completed and not all 
> >> > xmm0-xmm15
> >> > registers are mentioned in the pattern and add clobbers for the 
> >> > missing
> >> > registers at that point.
> >> >
> >> > * gcc.target/i386/pr92190.c: New test.
> >>
> >> A && is missing in the split condition to inherit TARGET_AVX.
> >
> > Also, you don't need to emit "#" in output template. This is just for
> > extra safety, we can live without. Please see e.g. "*tzcnt_1".
>
> IMO it's better to keep it here, where we're relying on the split
> happening for correctness.  There's no theoretical guarantee that
> a split ever happens otherwise.

In this case, I think it would be better to have two patterns, one
(define_insn_and_split) with "#" and the other "real" (define_insn).

Uros.

> Thanks,
> Richard


> >> OK with this change.
> >>
> >> Thanks,
> >> Uros.
> >>
> >> >
> >> > --- gcc/config/i386/i386-features.c.jj  2020-02-04 11:40:58.755611428 
> >> > +0100
> >> > +++ gcc/config/i386/i386-features.c 2020-02-04 11:51:33.602148491 
> >> > +0100
> >> > @@ -1764,29 +1764,32 @@ convert_scalars_to_vector (bool timode_p
> >> >
> >> >   (set (reg:V2DF R) (reg:V2DF R))
> >> >
> >> > -   which preserves the low 128 bits but clobbers the upper bits.
> >> > -   For a dead register we just use:
> >> > -
> >> > - (clobber (reg:V2DF R))
> >> > -
> >> > -   which invalidates any previous contents of R and stops R from 
> >> > becoming
> >> > -   live across the vzeroupper in future.  */
> >> > +   which preserves the low 128 bits but clobbers the upper bits.  */
> >> >
> >> >  static void
> >> >  ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
> >> >  {
> >> >rtx pattern = PATTERN (insn);
> >> >unsigned int nregs = TARGET_64BIT ? 16 : 8;
> >> > -  rtvec vec = rtvec_alloc (nregs + 1);
> >> > -  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> >> > +  unsigned int npats = nregs;
> >> >for (unsigned int i = 0; i < nregs; ++i)
> >> >  {
> >> >unsigned int regno = GET_SSE_REGNO (i);
> >> > +  if (!bitmap_bit_p (live_regs, regno))
> >> > +   npats--;
> >> > +}
> >> > +  if (npats == 0)
> >> > +return;
> >> > +  rtvec vec = rtvec_alloc (npats + 1);
> >> > +  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> >> > +  for (unsigned int i = 0, j = 0; i < nregs; ++i)
> >> > +{
> >> > +  unsigned int regno = GET_SSE_REGNO (i);
> >> > +  if (!bitmap_bit_p (live_regs, regno))
> >> > +   continue;
> >> >rtx reg = gen_rtx_REG (V2DImode, regno);
> >> > -  if (bitmap_bit_p (live_regs, regno))
> >> > -   RTVEC_ELT (vec, i + 1) = gen_rtx_SET (reg, reg);
> >> > -  else
> >> > -   RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
> >> > +  ++j;
> >> > +  RTVEC_ELT (vec, j) = gen_rtx_SET (reg, reg);
> >> >  }
> >> >XVEC (pattern, 0) = vec;
> >> >df_insn_rescan (insn);
> >> > --- gcc/config/i386/sse.md.jj   2020-02-04 11:40:58.813610563 +0100
> >> > +++ gcc/config/i386/sse.md  2020-02-04 11:58:31.544909659 +0100
> >> > @@ -19815,11 +19815,43 @@ (define_expand "avx_vzeroupper"
> >> >[(parallel [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
> >> >"TARGET_AVX")
> >> >
> >> > -(define_insn "*avx_vzeroupper"
> >> > +(define_insn_and_split "*avx_vzeroupper"
> >> >[(match_parallel 0 "vzeroupper_pattern"
> >> >   [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
> >> >"TARGET_AVX"
> >> > -  "vzeroupper"
> >> > +{
> >> > +  if (XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1)
> >> > +return "#";
> >> > +  else
> >> > +return "vzeroupper";
> >> > +}
> >> > +  "epilogue_completed
> >>
> >> && epilogue_completed
> >>

[PATCH coroutines] Change lowering behavior of alias variable from copy to substitute

2020-02-04 Thread JunMa

Hi
When testing coroutines with lambda functions, I found that we copy each
captured variable into the coroutine frame.  However, according to the
gimplify pass, for each declaration that is an alias for another
expression (DECL_VALUE_EXPR), we can substitute the expression directly.

Since lambda captured variables are of this kind, it is better to
substitute them rather than copy them.  This reduces the frame size (all
of the captured variables are fields of the closure class) and avoids
the extra copies as well.

This patch removes all of the code related to copying captured
variables.  Instead, we first rewrite DECL_VALUE_EXPR with the frame
field, then check whether each variable has a DECL_VALUE_EXPR and
substitute it; the patch does not create frame fields for such
variables.

Bootstrapped and tested on x86_64; is it OK?

Regards
JunMa

gcc/cp
2020-02-04  Jun Ma 

    * coroutines.cc (morph_fn_to_coro): Remove code related to
    copying captured variables.
    (register_local_var_uses): Ditto.
    (register_param_uses): Collect uses of parameters inside
    DECL_VALUE_EXPR.
    (transform_local_var_uses): Substitute the alias variable
    with DECL_VALUE_EXPR if it has one.


gcc/testsuite
2020-02-04  Jun Ma 

    * g++.dg/coroutines/lambda-07-multi-capture.C: New test.
---
 gcc/cp/coroutines.cc  | 117 +-
 .../torture/lambda-07-multi-capture.C |  55 
 2 files changed, 85 insertions(+), 87 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/lambda-07-multi-capture.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 0c2ec32d7db..1bc2ed7f89c 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1790,6 +1790,11 @@ transform_local_var_uses (tree *stmt, int *do_subtree, void *d)
   cp_walk_tree (&DECL_SIZE (lvar), transform_local_var_uses, d, NULL);
   cp_walk_tree (&DECL_SIZE_UNIT (lvar), transform_local_var_uses, d,
NULL);
+ if (DECL_HAS_VALUE_EXPR_P (lvar))
+   {
+ tree t = DECL_VALUE_EXPR (lvar);
+ cp_walk_tree (&t, transform_local_var_uses, d, NULL);
+   }
 
  /* TODO: implement selective generation of fields when vars are
 known not-used.  */
@@ -1815,9 +1820,9 @@ transform_local_var_uses (tree *stmt, int *do_subtree, void *d)
   gcc_checking_assert (existed);
 
   if (local_var.field_id == NULL_TREE)
-   pvar = &DECL_CHAIN (*pvar); /* Wasn't used.  */
-
- *pvar = &DECL_CHAIN (*pvar); /* discard this one, we replaced it.  */
+   pvar = &DECL_CHAIN (*pvar); /* Wasn't used or was an alias.  */
+ else
+   *pvar = DECL_CHAIN (*pvar); /* discard this one, we replaced it.  */
}
 
   *do_subtree = 0; /* We've done the body already.  */
@@ -1847,8 +1852,16 @@ transform_local_var_uses (tree *stmt, int *do_subtree, void *d)
   if (local_var_i == NULL)
 return NULL_TREE;
 
-  /* This is our revised 'local' i.e. a frame slot.  */
-  tree revised = local_var_i->field_idx;
+  /* This is our revised 'local' i.e. a frame slot or an alias.  */
+  tree revised  = NULL_TREE;
+  if (local_var_i->field_id == NULL_TREE)
+{
+  gcc_checking_assert (DECL_HAS_VALUE_EXPR_P (var_decl));
+  /* If the decl is an alias for another expression, substitute it.  */
+  revised = DECL_VALUE_EXPR (var_decl);
+}
+  else
+revised = local_var_i->field_idx;
   gcc_checking_assert (DECL_CONTEXT (var_decl) == lvd->context);
 
   if (decl_expr_p && DECL_INITIAL (var_decl))
@@ -2796,6 +2809,12 @@ register_param_uses (tree *stmt, int *do_subtree ATTRIBUTE_UNUSED, void *d)
 {
   param_frame_data *data = (param_frame_data *) d;
 
+  if (VAR_P (*stmt) && DECL_HAS_VALUE_EXPR_P (*stmt))
+{
+  tree x = DECL_VALUE_EXPR (*stmt);
+  cp_walk_tree (&x, register_param_uses, d, NULL);
+}
+
   if (TREE_CODE (*stmt) != PARM_DECL)
 return NULL_TREE;
 
@@ -2840,7 +2859,6 @@ struct local_vars_frame_data
 {
   tree *field_list;
   hash_map<tree, local_var_info> *local_var_uses;
-  vec *captures;
   unsigned int nest_depth, bind_indx;
   location_t loc;
   bool saw_capture;
@@ -2869,26 +2887,27 @@ register_local_var_uses (tree *stmt, int *do_subtree, void *d)
  local_var_info _var
= lvd->local_var_uses->get_or_insert (lvar, &existed);
  gcc_checking_assert (!existed);
+ /* For var as an alias, just leave it.  */
+ if (DECL_HAS_VALUE_EXPR_P (lvar))
+   continue;
  tree lvtype = TREE_TYPE (lvar);
  tree lvname = DECL_NAME (lvar);
- bool captured = is_normal_capture_proxy (lvar);
  /* Make names depth+index unique, so that we can support nested
 scopes with identically named locals.  */
  char *buf;
  size_t namsize = sizeof ("__lv...") + 18;
- const char *nm = (captured ? "cp" : "lv");
  if (lvname != NULL_TREE)
{
  namsize += IDENTIFIER_LENGTH (lvname);

Re: [PATCH] libcpp: Fix ICEs on __has_include syntax errors [PR93545]

2020-02-04 Thread Nathan Sidwell

On 2/4/20 3:46 AM, Jakub Jelinek wrote:

Hi!

Some of the following testcases ICE, because one of the cpp_get_token
calls in builtin_has_include reads the CPP_EOF token but the caller isn't
aware that CPP_EOF has been reached and will do another cpp_get_token.
get_token_no_padding is something that is used by the
has_attribute/has_builtin callbacks, which will first peek and will not
consume CPP_EOF (but will consume other tokens).  The !SEEN_EOL ()
check on the other side doesn't work anymore and isn't really needed,
as we don't consume the EOF.  The change adds one further error to the
pr88974.c testcase, if we wanted just one error per __has_include,
we could add some boolean whether we've emitted errors already and
only emit the first one we encounter (not implemented).


thanks for fixing.

nathan
--
Nathan Sidwell


Re: [PATCH] libcpp: Diagnose __has_include outside of preprocessor directives [PR93545]

2020-02-04 Thread Nathan Sidwell

On 2/4/20 3:52 AM, Jakub Jelinek wrote:

Hi!

The standard says http://eel.is/c++draft/cpp.cond#7.sentence-2 that
__has_include can't appear at arbitrary places in the source.  As we have
not recognized __has_include* outside of preprocessing directives in the
past, accepting it there now would be a regression.  The patch does still
allow it in #define if it is then used in preprocessing directives, I guess
that use isn't strictly valid either, but clang seems to accept it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


yes, thanks!


--
Nathan Sidwell


Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Richard Sandiford
Uros Bizjak  writes:
> On Tue, Feb 4, 2020 at 12:13 PM Uros Bizjak  wrote:
>>
>> On Tue, Feb 4, 2020 at 12:05 PM Jakub Jelinek  wrote:
>> >
>> > On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
>> > > If it works OK, I'd rather see this functionality implemented as an
>> > > epilogue_completed guarded splitter. In the .md files, there are
>> > > already cases where we split at this point, and where it is assumed
>> > > that allocated registers won't change anymore. Also, please don't make
>> > > the functionality conditional on flag_split_ra. This way, we would
>> > > always get new patterns in the debug dumps, so in case something goes
>> > > wrong, one could at least clearly see the full pattern.
>> >
> >> > The following seems to work on the testcase, will bootstrap/regtest it soon.
>> >
>> > 2020-02-04  Jakub Jelinek  
>> >
>> > PR target/92190
> >> > * config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper): Only
> >> > include sets and not clobbers in the vzeroupper pattern.
> >> > * config/i386/sse.md (*avx_vzeroupper): Change from define_insn to
> >> > define_insn_and_split, split if epilogue_completed and not all xmm0-xmm15
> >> > registers are mentioned in the pattern and add clobbers for the missing
> >> > registers at that point.
>> >
>> > * gcc.target/i386/pr92190.c: New test.
>>
>> A && is missing in the split condition to inherit TARGET_AVX.
>
> Also, you don't need to emit "#" in output template. This is just for
> extra safety, we can live without. Please see e.g. "*tzcnt<mode>_1".

IMO it's better to keep it here, where we're relying on the split
happening for correctness.  There's no theoretical guarantee that
a split ever happens otherwise.

Thanks,
Richard

>> OK with this change.
>>
>> Thanks,
>> Uros.
>>
>> >
>> > --- gcc/config/i386/i386-features.c.jj  2020-02-04 11:40:58.755611428 +0100
>> > +++ gcc/config/i386/i386-features.c 2020-02-04 11:51:33.602148491 +0100
>> > @@ -1764,29 +1764,32 @@ convert_scalars_to_vector (bool timode_p
>> >
>> >   (set (reg:V2DF R) (reg:V2DF R))
>> >
>> > -   which preserves the low 128 bits but clobbers the upper bits.
>> > -   For a dead register we just use:
>> > -
>> > - (clobber (reg:V2DF R))
>> > -
>> > -   which invalidates any previous contents of R and stops R from becoming
>> > -   live across the vzeroupper in future.  */
>> > +   which preserves the low 128 bits but clobbers the upper bits.  */
>> >
>> >  static void
>> >  ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
>> >  {
>> >rtx pattern = PATTERN (insn);
>> >unsigned int nregs = TARGET_64BIT ? 16 : 8;
>> > -  rtvec vec = rtvec_alloc (nregs + 1);
>> > -  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
>> > +  unsigned int npats = nregs;
>> >for (unsigned int i = 0; i < nregs; ++i)
>> >  {
>> >unsigned int regno = GET_SSE_REGNO (i);
>> > +  if (!bitmap_bit_p (live_regs, regno))
>> > +   npats--;
>> > +}
>> > +  if (npats == 0)
>> > +return;
>> > +  rtvec vec = rtvec_alloc (npats + 1);
>> > +  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
>> > +  for (unsigned int i = 0, j = 0; i < nregs; ++i)
>> > +{
>> > +  unsigned int regno = GET_SSE_REGNO (i);
>> > +  if (!bitmap_bit_p (live_regs, regno))
>> > +   continue;
>> >rtx reg = gen_rtx_REG (V2DImode, regno);
>> > -  if (bitmap_bit_p (live_regs, regno))
>> > -   RTVEC_ELT (vec, i + 1) = gen_rtx_SET (reg, reg);
>> > -  else
>> > -   RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
>> > +  ++j;
>> > +  RTVEC_ELT (vec, j) = gen_rtx_SET (reg, reg);
>> >  }
>> >XVEC (pattern, 0) = vec;
>> >df_insn_rescan (insn);
>> > --- gcc/config/i386/sse.md.jj   2020-02-04 11:40:58.813610563 +0100
>> > +++ gcc/config/i386/sse.md  2020-02-04 11:58:31.544909659 +0100
>> > @@ -19815,11 +19815,43 @@ (define_expand "avx_vzeroupper"
>> >[(parallel [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
>> >"TARGET_AVX")
>> >
>> > -(define_insn "*avx_vzeroupper"
>> > +(define_insn_and_split "*avx_vzeroupper"
>> >[(match_parallel 0 "vzeroupper_pattern"
>> >   [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
>> >"TARGET_AVX"
>> > -  "vzeroupper"
>> > +{
>> > +  if (XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1)
>> > +return "#";
>> > +  else
>> > +return "vzeroupper";
>> > +}
>> > +  "epilogue_completed
>>
>> && epilogue_completed
>>
>> > +   && XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1"
>> > +  [(match_dup 0)]
>> > +{
>> > +  /* For IPA-RA purposes, make it clear the instruction clobbers
>> > + even XMM registers not mentioned explicitly in the pattern.  */
>> > +  unsigned int nregs = TARGET_64BIT ? 16 : 8;
>> > +  unsigned int npats = XVECLEN (operands[0], 0);
>> > +  rtvec vec = rtvec_alloc (nregs + 1);
>> > +  RTVEC_ELT (vec, 0) = XVECEXP (operands[0], 0, 0);
>> > +  for 

Re: [PATCH Coroutines v2] Handle type deduction of auto and decltype(auto) with indirect_ref expression

2020-02-04 Thread Nathan Sidwell

On 2/3/20 8:47 PM, JunMa wrote:

On 2020/2/3 8:53 PM, Nathan Sidwell wrote:


sorry, I should have realized you still needed this bit.  This too is 
stripping a reference access, right?  I.e. TYPE_REF_P (TREE_TYPE 
(resume_call)) is true.  You should use that as the condition too.



you mean REFERENCE_REF_P (resume_call) ?  here is the update.


ah, missed that predicate -- well found!

yes, this is all good, thanks.

nathan

--
Nathan Sidwell


Re: [GCC][BUG][Aarch64][ARM] (PR93300) Fix ICE due to BFmode placement in GET_MODES_WIDER chain.

2020-02-04 Thread Richard Sandiford
Stam Markianos-Wright  writes:
> On 1/31/20 1:45 PM, Richard Sandiford wrote:
>> Stam Markianos-Wright  writes:
>>> On 1/30/20 10:01 AM, Richard Sandiford wrote:
 Stam Markianos-Wright  writes:
> On 1/29/20 12:42 PM, Richard Sandiford wrote:
>> Stam Markianos-Wright  writes:
>>> Hi all,
>>>
>>> This fixes:
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300
>>>
>>> Genmodes.c was generating the "wider_mode" chain as follows:
>>>
>>> HF -> BF -> SF - > DF -> TF -> VOID
>>>
>>> This caused issues in some rare cases where conversion between modes
>>> was needed, such as the above PR93300 where BFmode was being picked up
>>> as a valid mode for:
>>>
>>> optabs.c:prepare_float_lib_cmp
>>>
>>> which then led to the ICE at expr.c:convert_mode_scalar.
>
> Hi Richard,
>
>>
>> Can you go into more details about why this chain was a problem?
>> Naively, it's the one I'd have expected: HF should certainly have
>> priority over BF,
>
> Is that because functionally it looks like genmodes puts things in reverse
> alphabetical order if all else is equal? (If I'm reading the comment about
> MODE_RANDOM, MODE_CC correctly)
>
>> but BF coming before SF doesn't seem unusual in itself.
>>
>> I'm not saying the patch is wrong.  It just wasn't clear why it was
>> right either.
>>
> Yes, I see what you mean. I'll go through my thought process here:
>
> In investigating the ICE PR93300 I found that the diversion from pre-bf16
> behaviour was specifically at `optabs.c:prepare_float_lib_cmp`, where a
> `FOR_EACH_MODE_FROM (mode, orig_mode)` is used to then go off and generate
> library calls for conversions.
>
> This was then being caught further down by the gcc_assert at expr.c:325 where
> GET_MODE_PRECISION (from_mode) was equal to GET_MODE_PRECISION (to_mode) because
> it was trying to emit a HF->BF conversion libcall as `bl __extendhfbf2` (which
> is what happened if I removed the gcc_assert at expr.c:325)
>
> With BFmode being a target-defined mode, I didn't want to add something like `if
> (mode != BFmode)` to specifically exclude BFmode from being selected for this.
> (and there's nothing different between HFmode and BFmode here to allow me to
> make this distinction?)
>
> Also I couldn't find anywhere where the target back-end is not consulted for a
> "is this supported: yes/no" between the `FOR_EACH_MODE_FROM` loop and the
> libcall being created later on as __extendhfbf2.

 Yeah, prepare_float_lib_cmp just checks for libfuncs rather than
 calling target hooks directly.  The libfuncs themselves are under
 the control of the target though.

 By default we assume all float modes have associated libfuncs.
 It's then up to the target to remove functions that don't exist
 (or redirect them to other functions).  So I think we need to remove
 BFmode libfuncs in arm_init_libfuncs in the same way as we currently
 do for HFmode.

 I guess we should also nullify the conversion libfuncs for BFmode,
 not just the arithmetic and comparison ones.
>>>
>>> Ahhh now this works, thank you for the suggestion!
>>>
>>> I was aware of arm_init_libfuncs, but I had not realised that returning NULL
>>> would have the desired effect for us, in this case. So I have essentially rolled
>>> back the whole previous version of the patch and done this in the new diff.
>>> It seems to have fixed the ICE and I am currently in the process of regression
>>> testing!
>> 
>> LGTM behaviourally, just a couple of requests about how it's written:
>> 
>>>
>>> Thank you!
>>> Stam
>>>

 Thanks,
 Richard

> Finally, because we really don't want __bf16 to be in the same "conversion rank"
> as standard float modes for things like automatic promotion, this seemed like a
> reasonable solution to that problem :)
>
> Let me know of your thoughts!
>
> Cheers,
> Stam
>>>
>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>>> index c47fc232f39..18055d4a75e 100644
>>> --- a/gcc/config/arm/arm.c
>>> +++ b/gcc/config/arm/arm.c
>>> @@ -2643,6 +2643,30 @@ arm_init_libfuncs (void)
>>>   default:
>>> break;
>>>   }
>>> +
>>> +  /* For all possible libcalls in BFmode, return NULL.  */
>>> +  /* Conversions.  */
>>> +  set_conv_libfunc (trunc_optab, BFmode, HFmode, (NULL));
>>> +  set_conv_libfunc (sext_optab, HFmode, BFmode, (NULL));
>>> +  set_conv_libfunc (trunc_optab, BFmode, SFmode, (NULL));
>>> +  set_conv_libfunc (sext_optab, SFmode, BFmode, (NULL));
>>> +  set_conv_libfunc (trunc_optab, BFmode, DFmode, (NULL));
>>> +  set_conv_libfunc (sext_optab, DFmode, BFmode, (NULL));
>> 
>> It might be slightly safer to do:
>> 
>>FOR_EACH_MODE_IN_CLASS 

[PATCH] i386: Make xmm16-xmm31 call used even in ms ABI

2020-02-04 Thread Jakub Jelinek
Hi!

On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
> I guess that Comment #9 patch form the PR should be trivially correct,
> but althouhg it looks obvious, I don't want to propose the patch since
> I have no means of testing it.

I don't have means of testing it either.
https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low
128-bits only) are call preserved.

Jonathan, could you please test this if it is sufficient to just change
CALL_USED_REGISTERS or if e.g. something in the pro/epilogue needs tweaking
too?  Thanks.

We are talking e.g. about
/* { dg-options "-O2 -mabi=ms -mavx512vl" } */

typedef double V __attribute__((vector_size (16)));
void foo (void);
V bar (void);
void baz (V);
void
qux (void)
{
  V c;
  {
register V a __asm ("xmm18");
V b = bar ();
asm ("" : "=x" (a) : "0" (b));
c = a;
  }
  foo ();
  {
register V d __asm ("xmm18");
V e;
d = c;
asm ("" : "=x" (e) : "0" (d));
baz (e);
  }
}
where according to the MSDN doc gcc incorrectly holds the c value
in xmm18 register across the foo call; if foo is compiled by some Microsoft
compiler (or LLVM), then it could clobber %xmm18.
If all xmm18 occurrences are changed to say xmm15, then it is valid to hold
the 128-bit value across the foo call (though, surprisingly, LLVM saves it
into stack anyway).

2020-02-04  Uroš Bizjak  

* config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make
xmm16-xmm31 call-used even in 64-bit ms-abi.

--- gcc/config/i386/config/i386/i386.h.jj   2020-01-22 10:19:24.199221986 +0100
+++ gcc/config/i386/config/i386/i386.h  2020-02-04 12:09:12.338341003 +0100
@@ -1128,9 +1128,9 @@ extern const char *host_detect_local_cpu
 /*xmm8,xmm9,xmm10,xmm11,xmm12,xmm13,xmm14,xmm15*/		\
      6,   6,    6,    6,    6,    6,    6,    6,		\
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
-     6,    6,    6,    6,    6,    6,    6,    6,		\
+     1,    1,    1,    1,    1,    1,    1,    1,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     6,    6,    6,    6,    6,    6,    6,    6,		\
+     1,    1,    1,    1,    1,    1,    1,    1,		\
  /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/			\
      1,   1,   1,   1,   1,   1,   1,   1 }
 


Jakub



Re: [PATCH][AArch64] Improve clz patterns

2020-02-04 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Although GCC should understand the limited range of clz/ctz/cls results,
> Combine sometimes behaves oddly and duplicates ctz to remove a
> sign extension.  Avoid this by adding an explicit AND with 127 in the
> patterns. Deepsjeng performance improves by ~0.6%.

Could you go into more detail about what the before and after code
looks like, and what combine is doing?  Like you say, this sounds
like a target-independent thing on face value.

Either way, something like this needs a testcase.

Thanks,
Richard

>
> Bootstrap OK.
>
> ChangeLog:
> 2020-02-03  Wilco Dijkstra  
>
> * config/aarch64/aarch64.md (clz<mode>2): Mask the clz result.
> (clrsb<mode>2): Likewise.
> (ctz<mode>2): Likewise.
> --
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 5edc76ee14b55b2b4323530e10bd22b3ffca483e..7ff0536aac42957dbb7a15be766d35cc6725ac40
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4794,7 +4794,8 @@ (define_insn "*and_one_cmpl_3_compare0_no_reuse"
>
>  (define_insn "clz<mode>2"
>[(set (match_operand:GPI 0 "register_operand" "=r")
> -(clz:GPI (match_operand:GPI 1 "register_operand" "r")))]
> +(and:GPI (clz:GPI (match_operand:GPI 1 "register_operand" "r"))
> + (const_int 127)))]
>""
>"clz\\t%0, %1"
>[(set_attr "type" "clz")]
>  (define_expand "popcount<mode>2"
>
>  (define_insn "clrsb<mode>2"
>[(set (match_operand:GPI 0 "register_operand" "=r")
> -(clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
> +(and:GPI (clrsb:GPI (match_operand:GPI 1 "register_operand" "r"))
> + (const_int 127)))]
>""
>"cls\\t%0, %1"
>[(set_attr "type" "clz")]
>  (define_insn "rbit<mode>2"
>
>  (define_insn_and_split "ctz<mode>2"
>   [(set (match_operand:GPI   0 "register_operand" "=r")
> -   (ctz:GPI (match_operand:GPI  1 "register_operand" "r")))]
> +   (and:GPI (ctz:GPI (match_operand:GPI  1 "register_operand" "r"))
> +(const_int 127)))]
>""
>"#"
>"reload_completed"


Re: [GCC][BUG][Aarch64][ARM] (PR93300) Fix ICE due to BFmode placement in GET_MODES_WIDER chain.

2020-02-04 Thread Stam Markianos-Wright



On 1/31/20 1:45 PM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

On 1/30/20 10:01 AM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

On 1/29/20 12:42 PM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

Hi all,

This fixes:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300

Genmodes.c was generating the "wider_mode" chain as follows:

HF -> BF -> SF - > DF -> TF -> VOID

This caused issues in some rare cases where conversion between modes
was needed, such as the above PR93300 where BFmode was being picked up
as a valid mode for:

optabs.c:prepare_float_lib_cmp

which then led to the ICE at expr.c:convert_mode_scalar.


Hi Richard,



Can you go into more details about why this chain was a problem?
Naively, it's the one I'd have expected: HF should certainly have
priority over BF,


Is that because functionally it looks like genmodes puts things in reverse
alphabetical order if all else is equal? (If I'm reading the comment about
MODE_RANDOM, MODE_CC correctly)


but BF coming before SF doesn't seem unusual in itself.

I'm not saying the patch is wrong.  It just wasn't clear why it was
right either.


Yes, I see what you mean. I'll go through my thought process here:

In investigating the ICE PR93300 I found that the diversion from pre-bf16
behaviour was specifically at `optabs.c:prepare_float_lib_cmp`, where a
`FOR_EACH_MODE_FROM (mode, orig_mode)` is used to then go off and generate
library calls for conversions.

This was then being caught further down by the gcc_assert at expr.c:325 where
GET_MODE_PRECISION (from_mode) was equal to GET_MODE_PRECISION (to_mode) because
it was trying to emit a HF->BF conversion libcall as `bl __extendhfbf2` (which
is what happened if i removed the gcc_assert at expr.c:325)

With BFmode being a target-defined mode, I didn't want to add something like `if
(mode != BFmode)` to specifically exclude BFmode from being selected for this.
(and there's nothing different between HFmode and BFmode here to allow me to
make this distinction?)

Also I couldn't find anywhere where the target back-end is not consulted for a
"is this supported: yes/no" between the `FOR_EACH_MODE_FROM` loop and the
libcall being created later on as __extendhfbf2.


Yeah, prepare_float_lib_cmp just checks for libfuncs rather than
calling target hooks directly.  The libfuncs themselves are under
the control of the target though.

By default we assume all float modes have associated libfuncs.
It's then up to the target to remove functions that don't exist
(or redirect them to other functions).  So I think we need to remove
BFmode libfuncs in arm_init_libfuncs in the same way as we currently
do for HFmode.

I guess we should also nullify the conversion libfuncs for BFmode,
not just the arithmetic and comparison ones.


Ahhh now this works, thank you for the suggestion!

I was aware of arm_init_libfuncs, but I had not realised that returning NULL
would have the desired effect for us, in this case. So I have essentially rolled
back the whole previous version of the patch and done this in the new diff.
It seems to have fixed the ICE and I am currently in the process of regression
testing!


LGTM behaviourally, just a couple of requests about how it's written:



Thank you!
Stam



Thanks,
Richard


Finally, because we really don't want __bf16 to be in the same "conversion rank"
as standard float modes for things like automatic promotion, this seemed like a
reasonable solution to that problem :)

Let me know of your thoughts!

Cheers,
Stam


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c47fc232f39..18055d4a75e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2643,6 +2643,30 @@ arm_init_libfuncs (void)
  default:
break;
  }
+
+  /* For all possible libcalls in BFmode, return NULL.  */
+  /* Conversions.  */
+  set_conv_libfunc (trunc_optab, BFmode, HFmode, (NULL));
+  set_conv_libfunc (sext_optab, HFmode, BFmode, (NULL));
+  set_conv_libfunc (trunc_optab, BFmode, SFmode, (NULL));
+  set_conv_libfunc (sext_optab, SFmode, BFmode, (NULL));
+  set_conv_libfunc (trunc_optab, BFmode, DFmode, (NULL));
+  set_conv_libfunc (sext_optab, DFmode, BFmode, (NULL));


It might be slightly safer to do:

   FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_FLOAT)

to iterate over all float modes on the non-BF side.


Done :)



+  /* Arithmetic.  */
+  set_optab_libfunc (add_optab, BFmode, NULL);
+  set_optab_libfunc (sdiv_optab, BFmode, NULL);
+  set_optab_libfunc (smul_optab, BFmode, NULL);
+  set_optab_libfunc (neg_optab, BFmode, NULL);
+  set_optab_libfunc (sub_optab, BFmode, NULL);
+
+  /* Comparisons.  */
+  set_optab_libfunc (eq_optab, BFmode, NULL);
+  set_optab_libfunc (ne_optab, BFmode, NULL);
+  set_optab_libfunc (lt_optab, BFmode, NULL);
+  set_optab_libfunc (le_optab, BFmode, NULL);
+  set_optab_libfunc (ge_optab, BFmode, NULL);
+  set_optab_libfunc (gt_optab, BFmode, NULL);
+  set_optab_libfunc (unord_optab, BFmode, NULL);


Could you split 

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Uros Bizjak
On Tue, Feb 4, 2020 at 12:13 PM Uros Bizjak  wrote:
>
> On Tue, Feb 4, 2020 at 12:05 PM Jakub Jelinek  wrote:
> >
> > On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
> > > If it works OK, I'd rather see this functionality implemented as an
> > > epilogue_completed guarded splitter. In the .md files, there are
> > > already cases where we split at this point, and where it is assumed
> > > that allocated registers won't change anymore. Also, please don't make
> > > the functionality conditional on flag_split_ra. This way, we would
> > > always get new patterns in the debug dumps, so in case something goes
> > > wrong, one could at least clearly see the full pattern.
> >
> > The following seems to work on the testcase, will bootstrap/regtest it soon.
> >
> > 2020-02-04  Jakub Jelinek  
> >
> > PR target/92190
> > * config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper): Only
> > include sets and not clobbers in the vzeroupper pattern.
> > * config/i386/sse.md (*avx_vzeroupper): Change from define_insn to
> > define_insn_and_split, split if epilogue_completed and not all xmm0-xmm15
> > registers are mentioned in the pattern and add clobbers for the missing
> > registers at that point.
> >
> > * gcc.target/i386/pr92190.c: New test.
>
> A && is missing in the split condition to inherit TARGET_AVX.

Also, you don't need to emit "#" in output template. This is just for
extra safety, we can live without. Please see e.g. "*tzcnt<mode>_1".

> OK with this change.
>
> Thanks,
> Uros.
>
> >
> > --- gcc/config/i386/i386-features.c.jj  2020-02-04 11:40:58.755611428 +0100
> > +++ gcc/config/i386/i386-features.c 2020-02-04 11:51:33.602148491 +0100
> > @@ -1764,29 +1764,32 @@ convert_scalars_to_vector (bool timode_p
> >
> >   (set (reg:V2DF R) (reg:V2DF R))
> >
> > -   which preserves the low 128 bits but clobbers the upper bits.
> > -   For a dead register we just use:
> > -
> > - (clobber (reg:V2DF R))
> > -
> > -   which invalidates any previous contents of R and stops R from becoming
> > -   live across the vzeroupper in future.  */
> > +   which preserves the low 128 bits but clobbers the upper bits.  */
> >
> >  static void
> >  ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
> >  {
> >rtx pattern = PATTERN (insn);
> >unsigned int nregs = TARGET_64BIT ? 16 : 8;
> > -  rtvec vec = rtvec_alloc (nregs + 1);
> > -  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> > +  unsigned int npats = nregs;
> >for (unsigned int i = 0; i < nregs; ++i)
> >  {
> >unsigned int regno = GET_SSE_REGNO (i);
> > +  if (!bitmap_bit_p (live_regs, regno))
> > +   npats--;
> > +}
> > +  if (npats == 0)
> > +return;
> > +  rtvec vec = rtvec_alloc (npats + 1);
> > +  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> > +  for (unsigned int i = 0, j = 0; i < nregs; ++i)
> > +{
> > +  unsigned int regno = GET_SSE_REGNO (i);
> > +  if (!bitmap_bit_p (live_regs, regno))
> > +   continue;
> >rtx reg = gen_rtx_REG (V2DImode, regno);
> > -  if (bitmap_bit_p (live_regs, regno))
> > -   RTVEC_ELT (vec, i + 1) = gen_rtx_SET (reg, reg);
> > -  else
> > -   RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
> > +  ++j;
> > +  RTVEC_ELT (vec, j) = gen_rtx_SET (reg, reg);
> >  }
> >XVEC (pattern, 0) = vec;
> >df_insn_rescan (insn);
> > --- gcc/config/i386/sse.md.jj   2020-02-04 11:40:58.813610563 +0100
> > +++ gcc/config/i386/sse.md  2020-02-04 11:58:31.544909659 +0100
> > @@ -19815,11 +19815,43 @@ (define_expand "avx_vzeroupper"
> >[(parallel [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
> >"TARGET_AVX")
> >
> > -(define_insn "*avx_vzeroupper"
> > +(define_insn_and_split "*avx_vzeroupper"
> >[(match_parallel 0 "vzeroupper_pattern"
> >   [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
> >"TARGET_AVX"
> > -  "vzeroupper"
> > +{
> > +  if (XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1)
> > +return "#";
> > +  else
> > +return "vzeroupper";
> > +}
> > +  "epilogue_completed
>
> && epilogue_completed
>
> > +   && XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1"
> > +  [(match_dup 0)]
> > +{
> > +  /* For IPA-RA purposes, make it clear the instruction clobbers
> > + even XMM registers not mentioned explicitly in the pattern.  */
> > +  unsigned int nregs = TARGET_64BIT ? 16 : 8;
> > +  unsigned int npats = XVECLEN (operands[0], 0);
> > +  rtvec vec = rtvec_alloc (nregs + 1);
> > +  RTVEC_ELT (vec, 0) = XVECEXP (operands[0], 0, 0);
> > +  for (unsigned int i = 0, j = 1; i < nregs; ++i)
> > +{
> > +  unsigned int regno = GET_SSE_REGNO (i);
> > +  if (j < npats
> > + && REGNO (SET_DEST (XVECEXP (operands[0], 0, j))) == regno)
> > +   {
> > + RTVEC_ELT (vec, i + 1) = XVECEXP (operands[0], 0, j);
> > + j++;
> > +   }
> > +  

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Uros Bizjak
On Tue, Feb 4, 2020 at 12:05 PM Jakub Jelinek  wrote:
>
> On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
> > If it works OK, I'd rather see this functionality implemented as an
> > epilogue_completed guarded splitter. In the .md files, there are
> > already cases where we split at this point, and where it is assumed
> > that allocated registers won't change anymore. Also, please don't make
> > the functionality conditional on flag_split_ra. This way, we would
> > always get new patterns in the debug dumps, so in case something goes
> > wrong, one could at least clearly see the full pattern.
>
> The following seems to work on the testcase, will bootstrap/regtest it soon.
>
> 2020-02-04  Jakub Jelinek  
>
> PR target/92190
> * config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper): Only
> include sets and not clobbers in the vzeroupper pattern.
> * config/i386/sse.md (*avx_vzeroupper): Change from define_insn to
> define_insn_and_split, split if epilogue_completed and not all xmm0-xmm15
> registers are mentioned in the pattern and add clobbers for the missing
> registers at that point.
>
> * gcc.target/i386/pr92190.c: New test.

A && is missing in the split condition to inherit TARGET_AVX.

OK with this change.

Thanks,
Uros.

>
> --- gcc/config/i386/i386-features.c.jj  2020-02-04 11:40:58.755611428 +0100
> +++ gcc/config/i386/i386-features.c 2020-02-04 11:51:33.602148491 +0100
> @@ -1764,29 +1764,32 @@ convert_scalars_to_vector (bool timode_p
>
>   (set (reg:V2DF R) (reg:V2DF R))
>
> -   which preserves the low 128 bits but clobbers the upper bits.
> -   For a dead register we just use:
> -
> - (clobber (reg:V2DF R))
> -
> -   which invalidates any previous contents of R and stops R from becoming
> -   live across the vzeroupper in future.  */
> +   which preserves the low 128 bits but clobbers the upper bits.  */
>
>  static void
>  ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
>  {
>rtx pattern = PATTERN (insn);
>unsigned int nregs = TARGET_64BIT ? 16 : 8;
> -  rtvec vec = rtvec_alloc (nregs + 1);
> -  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> +  unsigned int npats = nregs;
>for (unsigned int i = 0; i < nregs; ++i)
>  {
>unsigned int regno = GET_SSE_REGNO (i);
> +  if (!bitmap_bit_p (live_regs, regno))
> +   npats--;
> +}
> +  if (npats == 0)
> +return;
> +  rtvec vec = rtvec_alloc (npats + 1);
> +  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> +  for (unsigned int i = 0, j = 0; i < nregs; ++i)
> +{
> +  unsigned int regno = GET_SSE_REGNO (i);
> +  if (!bitmap_bit_p (live_regs, regno))
> +   continue;
>rtx reg = gen_rtx_REG (V2DImode, regno);
> -  if (bitmap_bit_p (live_regs, regno))
> -   RTVEC_ELT (vec, i + 1) = gen_rtx_SET (reg, reg);
> -  else
> -   RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
> +  ++j;
> +  RTVEC_ELT (vec, j) = gen_rtx_SET (reg, reg);
>  }
>XVEC (pattern, 0) = vec;
>df_insn_rescan (insn);
> --- gcc/config/i386/sse.md.jj   2020-02-04 11:40:58.813610563 +0100
> +++ gcc/config/i386/sse.md  2020-02-04 11:58:31.544909659 +0100
> @@ -19815,11 +19815,43 @@ (define_expand "avx_vzeroupper"
>[(parallel [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
>"TARGET_AVX")
>
> -(define_insn "*avx_vzeroupper"
> +(define_insn_and_split "*avx_vzeroupper"
>[(match_parallel 0 "vzeroupper_pattern"
>   [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
>"TARGET_AVX"
> -  "vzeroupper"
> +{
> +  if (XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1)
> +return "#";
> +  else
> +return "vzeroupper";
> +}
> +  "epilogue_completed

&& epilogue_completed

> +   && XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1"
> +  [(match_dup 0)]
> +{
> +  /* For IPA-RA purposes, make it clear the instruction clobbers
> + even XMM registers not mentioned explicitly in the pattern.  */
> +  unsigned int nregs = TARGET_64BIT ? 16 : 8;
> +  unsigned int npats = XVECLEN (operands[0], 0);
> +  rtvec vec = rtvec_alloc (nregs + 1);
> +  RTVEC_ELT (vec, 0) = XVECEXP (operands[0], 0, 0);
> +  for (unsigned int i = 0, j = 1; i < nregs; ++i)
> +{
> +  unsigned int regno = GET_SSE_REGNO (i);
> +  if (j < npats
> + && REGNO (SET_DEST (XVECEXP (operands[0], 0, j))) == regno)
> +   {
> + RTVEC_ELT (vec, i + 1) = XVECEXP (operands[0], 0, j);
> + j++;
> +   }
> +  else
> +   {
> + rtx reg = gen_rtx_REG (V2DImode, regno);
> + RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
> +   }
> +}
> +  XVEC (operands[0], 0) = vec;
> +}
>[(set_attr "type" "sse")
> (set_attr "modrm" "0")
> (set_attr "memory" "none")
> --- gcc/testsuite/gcc.target/i386/pr92190.c.jj  2020-02-04 11:51:33.608148402 +0100
> +++ gcc/testsuite/gcc.target/i386/pr92190.c 

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Jakub Jelinek
On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
> If it works OK, I'd rather see this functionality implemented as an
> epilogue_completed guarded splitter. In the .md files, there are
> already cases where we split at this point, and where it is assumed
> that allocated registers won't change anymore. Also, please don't make
> the functionality conditional on flag_split_ra. This way, we would
> always get new patterns in the debug dumps, so in case something goes
> wrong, one could at least clearly see the full pattern.

The following seems to work on the testcase, will bootstrap/regtest it soon.

2020-02-04  Jakub Jelinek  

PR target/92190
* config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper): Only
include sets and not clobbers in the vzeroupper pattern.
* config/i386/sse.md (*avx_vzeroupper): Change from define_insn to
define_insn_and_split, split if epilogue_completed and not all xmm0-xmm15
registers are mentioned in the pattern and add clobbers for the missing
registers at that point.

* gcc.target/i386/pr92190.c: New test.

--- gcc/config/i386/i386-features.c.jj  2020-02-04 11:40:58.755611428 +0100
+++ gcc/config/i386/i386-features.c 2020-02-04 11:51:33.602148491 +0100
@@ -1764,29 +1764,32 @@ convert_scalars_to_vector (bool timode_p
 
  (set (reg:V2DF R) (reg:V2DF R))
 
-   which preserves the low 128 bits but clobbers the upper bits.
-   For a dead register we just use:
-
- (clobber (reg:V2DF R))
-
-   which invalidates any previous contents of R and stops R from becoming
-   live across the vzeroupper in future.  */
+   which preserves the low 128 bits but clobbers the upper bits.  */
 
 static void
 ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
 {
   rtx pattern = PATTERN (insn);
   unsigned int nregs = TARGET_64BIT ? 16 : 8;
-  rtvec vec = rtvec_alloc (nregs + 1);
-  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
+  unsigned int npats = nregs;
   for (unsigned int i = 0; i < nregs; ++i)
 {
   unsigned int regno = GET_SSE_REGNO (i);
+  if (!bitmap_bit_p (live_regs, regno))
+   npats--;
+}
+  if (npats == 0)
+return;
+  rtvec vec = rtvec_alloc (npats + 1);
+  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
+  for (unsigned int i = 0, j = 0; i < nregs; ++i)
+{
+  unsigned int regno = GET_SSE_REGNO (i);
+  if (!bitmap_bit_p (live_regs, regno))
+   continue;
   rtx reg = gen_rtx_REG (V2DImode, regno);
-  if (bitmap_bit_p (live_regs, regno))
-   RTVEC_ELT (vec, i + 1) = gen_rtx_SET (reg, reg);
-  else
-   RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
+  ++j;
+  RTVEC_ELT (vec, j) = gen_rtx_SET (reg, reg);
 }
   XVEC (pattern, 0) = vec;
   df_insn_rescan (insn);
--- gcc/config/i386/sse.md.jj   2020-02-04 11:40:58.813610563 +0100
+++ gcc/config/i386/sse.md  2020-02-04 11:58:31.544909659 +0100
@@ -19815,11 +19815,43 @@ (define_expand "avx_vzeroupper"
   [(parallel [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
   "TARGET_AVX")
 
-(define_insn "*avx_vzeroupper"
+(define_insn_and_split "*avx_vzeroupper"
   [(match_parallel 0 "vzeroupper_pattern"
  [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
   "TARGET_AVX"
-  "vzeroupper"
+{
+  if (XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1)
+return "#";
+  else
+return "vzeroupper";
+}
+  "epilogue_completed
+   && XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1"
+  [(match_dup 0)]
+{
+  /* For IPA-RA purposes, make it clear the instruction clobbers
+ even XMM registers not mentioned explicitly in the pattern.  */
+  unsigned int nregs = TARGET_64BIT ? 16 : 8;
+  unsigned int npats = XVECLEN (operands[0], 0);
+  rtvec vec = rtvec_alloc (nregs + 1);
+  RTVEC_ELT (vec, 0) = XVECEXP (operands[0], 0, 0);
+  for (unsigned int i = 0, j = 1; i < nregs; ++i)
+{
+  unsigned int regno = GET_SSE_REGNO (i);
+  if (j < npats
+ && REGNO (SET_DEST (XVECEXP (operands[0], 0, j))) == regno)
+   {
+ RTVEC_ELT (vec, i + 1) = XVECEXP (operands[0], 0, j);
+ j++;
+   }
+  else
+   {
+ rtx reg = gen_rtx_REG (V2DImode, regno);
+ RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
+   }
+}
+  XVEC (operands[0], 0) = vec;
+}
   [(set_attr "type" "sse")
(set_attr "modrm" "0")
(set_attr "memory" "none")
--- gcc/testsuite/gcc.target/i386/pr92190.c.jj  2020-02-04 11:51:33.608148402 +0100
+++ gcc/testsuite/gcc.target/i386/pr92190.c 2020-02-04 11:51:33.608148402 +0100
@@ -0,0 +1,19 @@
+/* PR target/92190 */
+/* { dg-do compile { target { *-*-linux* && lp64 } } } */
+/* { dg-options "-mabi=ms -O2 -mavx512f" } */
+
+typedef char VC __attribute__((vector_size (16)));
+typedef int VI __attribute__((vector_size (16 * sizeof 0)));
+VC a;
+VI b;
+void bar (VI);
+void baz (VC);
+
+void
+foo (void)
+{
+  VC k = a;
+  VI n = b;
+  bar (n);
+  baz (k);
+}



Re: [PATCH][ARM] Remove support for MULS

2020-02-04 Thread Wilco Dijkstra
Any further comments? Note GCC doesn't support S/UMULLS either since it is
equally useless. It's no surprise that Thumb-2 removed support for
flag-setting 64-bit multiplies, while AArch64 didn't add flag-setting
multiplies. So there is no argument that these instructions are in any way
useful to compilers.

Wilco

Hi Richard, Kyrill,

>> I disagree. If they still trigger and generate better code than without
>> we should keep them.
>>
>> What kind of code is *common* varies greatly from user to user.

Not really - doing a multiply and checking whether the result is zero is
exceedingly rare. I found only 3 cases out of 7300 mul/mla in all of
SPEC2006... Overall codesize effect with -Os: 28 bytes or 0.00045%.

So we really should not even consider wasting any more time on
maintaining such useless patterns.

> Also, the main reason for restricting their use was that in the 'olden 
> days', when we had multi-cycle implementations of the multiply 
> instructions with short-circuit fast termination when the result was 
> completed, the flag setting variants would never short-circuit.

That only applied to conditional multiplies IIRC, some implementations
would not early-terminate if the condition failed. Today there are serious
penalties for conditional multiplies - but that's something to address in a
different patch.

> These days we have fixed cycle counts for multiply instructions, so this 
> is no-longer a penalty.  

No, there is a large overhead on modern cores when you set the flags,
and there are other penalties due to the extra micro-ops.

> In the thumb2 case in particular we can often 
> reduce mul-cmp (6 bytes) to muls (2 bytes), that's a 66% saving on this 
> sequence and definitely worth exploiting when we can, even if it's not 
> all that common.

Using muls+cbz is equally small. With my patch we generate this with -Os:

void g(void);
int f(int x)
{
  if (x * x != 0)
g();
}

f:
        muls    r0, r0, r0
        push    {r3, lr}
        cbz     r0, .L9
        bl      g
.L9:
        pop     {r3, pc}

Wilco

Re: [PATCH][ARM] Improve max_cond_insns setting for Cortex cores

2020-02-04 Thread Wilco Dijkstra
Hi Kyrill,

> Hmm, I'm not too confident on that. I'd support such a change for the 
> generic arm_cortex_tune, definitely, and the Armv8-a based ones, but I 
> don't think the argument is as strong for Cortex-A7, Cortex-A8, Cortex-A9.
>
> So let's make the change for the Armv8-A-based cores now. If you get 
> benchmarking data for the older ones (such systems may or may not be 
> easy to get a hold of) we can update those separately.

I ran some experiments on Cortex-A53 and this shows the difference between
2, 3 and 4 is less than for out-of-order cores (which clearly prefer 2).
So it seems alright to set it to 4 for the older in-order cores - see
updated patch below.

>>   Set max_cond_insns
>> to 4 on Thumb-2 architectures given it's already limited to that by
>> MAX_INSN_PER_IT_BLOCK.  Also use the CPU tuning setting when a CPU/tune
>> is selected if -mrestrict-it is not explicitly set.
>
> This can go in as a separate patch from the rest, thanks.

Sure, I'll split that part off into a separate patch.

Cheers,
Wilco

[PATCH v2][ARM] Improve max_cond_insns setting for Cortex cores

Various CPUs have max_cond_insns set to 5 due to historical reasons.
Benchmarking shows that max_cond_insns=2 is fastest on modern Cortex-A
cores, so change it to 2. Set it to 4 on older in-order cores as that is
the MAX_INSN_PER_IT_BLOCK limit for Thumb-2.
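For illustration (hypothetical example, not part of the patch), this is the kind of short conditional sequence that if-conversion targets; max_cond_insns caps how many instructions may be conditionalised per branch before a real branch is kept instead:

```c
#include <assert.h>

/* Hypothetical example: each "if" body below is a one-instruction
   candidate for conditional execution (an IT block on Thumb-2).
   max_cond_insns bounds how many such instructions if-conversion
   may predicate before keeping a real branch.  */
static int clamp (int x, int lo, int hi)
{
  if (x < lo)
    x = lo;   /* candidate for a predicated move */
  if (x > hi)
    x = hi;   /* likewise */
  return x;
}
```

On Thumb-2 a single IT block covers at most four instructions, which is why 4 matches the MAX_INSN_PER_IT_BLOCK limit mentioned above.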

Bootstrapped on armhf. OK for commit?

ChangeLog:

2019-12-03  Wilco Dijkstra  

* config/arm/arm.c (arm_v6t2_tune): Set max_cond_insns to 4.
(arm_cortex_tune): Set max_cond_insns to 2.
(arm_cortex_a8_tune): Set max_cond_insns to 4.
(arm_cortex_a7_tune): Likewise.
(arm_cortex_a35_tune): Set max_cond_insns to 2.
(arm_cortex_a53_tune): Likewise.
(arm_cortex_a5_tune): Set max_cond_insns to 4.
(arm_cortex_a9_tune): Likewise.
(arm_v6m_tune): Likewise.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a6b401b7f2e3738ff68316bd83d6e5a2bcf0e7d7..daebe76352d62ad94556762b4e3bc3d0532ad411 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1947,7 +1947,7 @@ const struct tune_params arm_v6t2_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  4,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   1,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1971,7 +1971,7 @@ const struct tune_params arm_cortex_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1993,7 +1993,7 @@ const struct tune_params arm_cortex_a8_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  4,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2015,7 +2015,7 @@ const struct tune_params arm_cortex_a7_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  4,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2059,7 +2059,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   1,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2081,7 +2081,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* 

Re: [PATCH][Arm] Only enable fsched-pressure with Ofast

2020-02-04 Thread Wilco Dijkstra
ping

The current pressure scheduler doesn't appear to correctly track register
pressure and avoid creating unnecessary spills when register pressure is high.
As a result disabling the early scheduler improves integer performance
considerably and reduces codesize as a bonus. Since scheduling floating point
code is generally beneficial (more registers and higher latencies), only enable
the pressure scheduler with -Ofast.
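As a hypothetical illustration (not from the patch), integer code with many simultaneously live values is where mis-tracked pressure hurts: if the early scheduler lengthens the live ranges of these accumulators to hide latency, the allocator is forced to spill on targets with few integer registers:

```c
/* Hypothetical example: eight accumulators are live across the loop
   body.  An early scheduler that hoists loads without tracking
   register pressure stretches their live ranges, and the register
   allocator then has to insert spills.  */
static int dot8 (const int *a, const int *b, int n)
{
  int s0 = 0, s1 = 0, s2 = 0, s3 = 0, s4 = 0, s5 = 0, s6 = 0, s7 = 0;
  for (int i = 0; i + 8 <= n; i += 8)
    {
      s0 += a[i] * b[i];
      s1 += a[i + 1] * b[i + 1];
      s2 += a[i + 2] * b[i + 2];
      s3 += a[i + 3] * b[i + 3];
      s4 += a[i + 4] * b[i + 4];
      s5 += a[i + 5] * b[i + 5];
      s6 += a[i + 6] * b[i + 6];
      s7 += a[i + 7] * b[i + 7];
    }
  return s0 + s1 + s2 + s3 + s4 + s5 + s6 + s7;
}
```

Floating-point versions of the same loop, by contrast, tend to benefit from early scheduling (more registers, longer latencies), which is the rationale for keeping -fsched-pressure at -Ofast.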

On Cortex-A57 this gives a 0.7% performance gain on SPECINT2006 as well
as a 0.2% codesize reduction.

Bootstrapped on armhf. OK for commit?

ChangeLog:

2019-11-06  Wilco Dijkstra  

* gcc/common/config/arm/arm-common.c (arm_option_optimization_table):
Enable fsched_pressure with Ofast only.

--
diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c
index 41a920f6dc96833e778faa8dbcc19beac483734c..b761d3abd670a144a593c4b410b1e7fbdcb52f56 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -38,7 +38,7 @@ static const struct default_options arm_option_optimization_table[] =
   {
 /* Enable section anchors by default at -O1 or higher.  */
 { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
-{ OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
+{ OPT_LEVELS_FAST, OPT_fsched_pressure, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };


Re: [PATCH][ARM] Correctly set SLOW_BYTE_ACCESS

2020-02-04 Thread Wilco Dijkstra
ping (updated comment to use the same wording as the AArch64 version on trunk)

Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing
bitfields by their declared type, which results in better code generation
on practically any target.  So set it correctly to 1 on Arm.

As a result we generate much better code for bitfields:

typedef struct
{
  int x : 2, y : 8, z : 2;
} X;

int bitfield (X *p)
{
  return p->x + p->y + p->z;
}


Before:
        ldrb    r3, [r0]        @ zero_extendqisi2
        ldrh    r2, [r0]
        ldrb    r0, [r0, #1]    @ zero_extendqisi2
        sbfx    r3, r3, #0, #2
        sbfx    r2, r2, #2, #8
        sbfx    r0, r0, #2, #2
        sxtab   r3, r2, r3
        sxtab   r0, r3, r0
        bx      lr

After:
        ldr     r0, [r0]
        sbfx    r3, r0, #0, #2
        sbfx    r2, r0, #2, #8
        sbfx    r0, r0, #10, #2
        sxtab   r3, r2, r3
        add     r0, r0, r3
        bx      lr

Bootstrap OK, OK for commit?

ChangeLog:
2019-09-11  Wilco Dijkstra  

* config/arm/arm.h (SLOW_BYTE_ACCESS): Set to 1.

--
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index e07cf03538c5bb23e3285859b9e44a627b6e9ced..998139ce759d5829b7f868367d4263df9d0e12d9 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1956,8 +1956,8 @@ enum arm_auto_incmodes
((arm_arch4 || (MODE) == QImode) ? ZERO_EXTEND  \
 : ((BYTES_BIG_ENDIAN && (MODE) == HImode) ? SIGN_EXTEND : UNKNOWN)))
 
-/* Nonzero if access to memory by bytes is slow and undesirable.  */
-#define SLOW_BYTE_ACCESS 0
+/* Enable wide bitfield accesses for more efficient bitfield code.  */
+#define SLOW_BYTE_ACCESS 1
 
 /* Immediate shift counts are truncated by the output routines (or was it
the assembler?).  Shift counts in a register are truncated by ARM.  Note


Re: [PATCH 1/3] libstdc++: Apply the move_iterator changes described in P1207R4

2020-02-04 Thread Jonathan Wakely

On 03/02/20 21:07 -0500, Patrick Palka wrote:

These changes are needed for some of the tests in the constrained algorithm
patch, because they use move_iterator with an uncopyable output_iterator.  The
other changes described in the paper are already applied, it seems.

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (move_iterator::move_iterator): Move the
iterator when initializing _M_current.
(move_iterator::base): Split into two overloads differing in
ref-qualifiers as in P1207R4.
---
libstdc++-v3/include/bits/stl_iterator.h | 11 +--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index 784d200d22f..1a288a5c785 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1166,7 +1166,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

  explicit _GLIBCXX17_CONSTEXPR
  move_iterator(iterator_type __i)
-  : _M_current(__i) { }
+  : _M_current(std::move(__i)) { }

  template<typename _Iter>
_GLIBCXX17_CONSTEXPR
@@ -1174,9 +1174,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: _M_current(__i.base()) { }

  _GLIBCXX17_CONSTEXPR iterator_type
-  base() const
+  base() const &
+#if __cplusplus > 201703L && __cpp_lib_concepts
+   requires copy_constructible<_Iterator>
+#endif
  { return _M_current; }

+  _GLIBCXX17_CONSTEXPR iterator_type
+  base() &&
+  { return std::move(_M_current); }
+


I think this change has to be restricted to C++20 and later, otherwise
a move_iterator in C++17 can change state so that its _M_current
becomes moved-from, which should not be possible before C++20.

So something like:

#if __cplusplus <= 201703L
  _GLIBCXX17_CONSTEXPR iterator_type
  base() const
  { return _M_current; }
#else
  constexpr iterator_type
  base() const &
#if __cpp_lib_concepts
    requires copy_constructible<_Iterator>
#endif
  { return _M_current; }

  constexpr iterator_type
  base() &&
  { return std::move(_M_current); }
#endif



Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-04 Thread Andrew Stubbs

On 03/02/2020 18:09, Michael Matz wrote:

But suggesting that using the subject line for tagging is recommended can
lead to subjects like

  [PATCH][GCC][Foo][component] Fix foo component bootstrap failure

in an e-mail directed to gcc-patches@gcc.gnu.org (from somewhen last year,
where Foo/foo was an architecture; I'm really not trying to single out the
author).  That is, _none_ of the four tags carried any informational
content.


I partially disagree with this. Certainly there's pointless redundancy
in this example, but I'd rather have the tags with a meaningful subject
than a generic subject with no tags.


gcc-patches is a high-volume list in which most of the content is 
outside my range of interest and/or understanding. If I stay on top of 
it then I can read all the subject lines, at least, and probably select 
a few threads to learn about something new, but if I let the list get 
away from me for even a short while then it's too much to handle.


I do have filters set up to highlight subjects for which I should pay 
attention and if people are in the habit of tagging subjects then that 
becomes much more reliable.


Conversely, the tags help me quickly decide what not to read.

I see that some people are using a "[email tag] git tag: msg" format, 
and I quite like that.


Andrew


Re: [PATCH][AArch64] Improve popcount expansion

2020-02-04 Thread Wilco Dijkstra
Hi Andrew,

> You might want to add a testcase that exercises the autovectorizers too.
>
> Currently we get also:
>
>    ldr     q0, [x0]
>    addv    b0, v0.16b
>    umov    w0, v0.b[0]
>    ret

My patch doesn't change this case on purpose - there are also many intrinsics 
which generate redundant umovs. That's for a separate patch.

Wilco


[PATCH][AArch64] Improve clz patterns

2020-02-04 Thread Wilco Dijkstra
Although GCC should understand the limited range of clz/ctz/cls results,
Combine sometimes behaves oddly and duplicates ctz to remove a
sign extension.  Avoid this by adding an explicit AND with 127 in the
patterns. Deepsjeng performance improves by ~0.6%.
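A hypothetical C example of the affected shape (not from the patch): the ctz result fits in [0, 63], so a widened use only avoids a sign extension if the compiler can see that range - which the explicit AND in the pattern makes visible:

```c
#include <stdint.h>

/* Hypothetical example: __builtin_ctzll returns an int in [0, 63] for a
   nonzero argument.  When that result is widened to 64 bits, the
   compiler can drop the sign extension only if it knows the value is
   small and non-negative - the range the masked pattern expresses.  */
static int64_t ctz_widened (uint64_t x)
{
  return x ? (int64_t) __builtin_ctzll (x) : 64;
}
```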

Bootstrap OK.

ChangeLog:
2020-02-03  Wilco Dijkstra  

* config/aarch64/aarch64.md (clz<mode>2): Mask the clz result.
(clrsb<mode>2): Likewise.
(ctz<mode>2): Likewise.
--

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5edc76ee14b55b2b4323530e10bd22b3ffca483e..7ff0536aac42957dbb7a15be766d35cc6725ac40 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4794,7 +4794,8 @@ (define_insn 
"*and_one_cmpl_3_compare0_no_reuse"
 
 (define_insn "clz<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-   (clz:GPI (match_operand:GPI 1 "register_operand" "r")))]
+   (and:GPI (clz:GPI (match_operand:GPI 1 "register_operand" "r"))
+(const_int 127)))]
   ""
   "clz\\t%0, %1"
   [(set_attr "type" "clz")]
@@ -4848,7 +4849,8 @@ (define_expand "popcount<mode>2"
 
 (define_insn "clrsb<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-(clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
+   (and:GPI (clrsb:GPI (match_operand:GPI 1 "register_operand" "r"))
+(const_int 127)))]
   ""
   "cls\\t%0, %1"
   [(set_attr "type" "clz")]
@@ -4869,7 +4871,8 @@ (define_insn "rbit<mode>2"
 
 (define_insn_and_split "ctz<mode>2"
  [(set (match_operand:GPI   0 "register_operand" "=r")
-   (ctz:GPI (match_operand:GPI  1 "register_operand" "r")))]
+   (and:GPI (ctz:GPI (match_operand:GPI  1 "register_operand" "r"))
+   (const_int 127)))]
   ""
   "#"
   "reload_completed"




Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Uros Bizjak
On Tue, Feb 4, 2020 at 10:39 AM Jakub Jelinek  wrote:
>
> Hi!
>
> As mentioned in the PR, the CLOBBERs in vzeroupper are added there even for
> registers that aren't ever live in the function before and break the
> prologue/epilogue expansion with ms ABI (normal ABIs are fine, as they
> consider all [xyz]mm registers call clobbered, but the ms ABI considers
> xmm0-15 call used but the bits above low 128 ones call clobbered).
> The following patch fixes it by not adding the clobbers during vzeroupper
> pass (before pro_and_epilogue), but adding them for -fipa-ra purposes only
> during the final output.  Perhaps we could add some CLOBBERs early (say for
> df_regs_ever_live_p regs that aren't live in the live_regs bitmap, or
> depending on the ABI either add all of them immediately, or for ms ABI add
> CLOBBERs for xmm0-xmm5 if they don't have a SET) and add the rest later.
> And the addition could be perhaps done at other spots, e.g. in an
> epilogue_completed guarded splitter.

If it works OK, I'd rather see this functionality implemented as an
epilogue_completed guarded splitter. In the .md files, there are
already cases where we split at this point, and where it is assumed
that allocated registers won't change anymore. Also, please don't make
the functionality conditional on flag_split_ra. This way, we would
always get new patterns in the debug dumps, so in case something goes
wrong, one could at least clearly see the full pattern.

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> Or any of the above mentioned variants?
>
> A separate issue is CALL_USED_REGISTERS for msabi on xmm16-xmm31.

I guess that the Comment #9 patch from the PR should be trivially correct,
but although it looks obvious, I don't want to propose the patch since
I have no means of testing it.

Uros.

> And yet another issue is that the presence of a vzeroupper instruction does
> (and the patch doesn't change) prevent callers from trying to preserve
> something in the xmm0-xmm15 registers if the callee doesn't use them.
> Would be nice to be able to say that the vzeroupper will preserve the low
> 128 bits, so it is ok to hold a 128-bit value across the call, but not
> 256-bit or larger.  Not sure if the RA has a way to express that (and IPA-RA
> certainly ATM doesn't).
>
> 2020-02-04  Jakub Jelinek  
>
> PR target/92190
> * config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper): Only
> include sets and not clobbers in the vzeroupper pattern.
> * config/i386/sse.md (*avx_vzeroupper): Add the clobbers during
> pattern output.
>
> * gcc.target/i386/pr92190.c: New test.
>
> --- gcc/config/i386/i386-features.c.jj  2020-02-03 13:17:15.919723150 +0100
> +++ gcc/config/i386/i386-features.c 2020-02-03 18:30:08.274865983 +0100
> @@ -1764,29 +1764,32 @@ convert_scalars_to_vector (bool timode_p
>
>   (set (reg:V2DF R) (reg:V2DF R))
>
> -   which preserves the low 128 bits but clobbers the upper bits.
> -   For a dead register we just use:
> -
> - (clobber (reg:V2DF R))
> -
> -   which invalidates any previous contents of R and stops R from becoming
> -   live across the vzeroupper in future.  */
> +   which preserves the low 128 bits but clobbers the upper bits.  */
>
>  static void
>  ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
>  {
>rtx pattern = PATTERN (insn);
>unsigned int nregs = TARGET_64BIT ? 16 : 8;
> -  rtvec vec = rtvec_alloc (nregs + 1);
> -  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> +  unsigned int npats = nregs;
>for (unsigned int i = 0; i < nregs; ++i)
>  {
>unsigned int regno = GET_SSE_REGNO (i);
> +  if (!bitmap_bit_p (live_regs, regno))
> +   npats--;
> +}
> +  if (npats == 0)
> +return;
> +  rtvec vec = rtvec_alloc (npats + 1);
> +  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> +  for (unsigned int i = 0, j = 0; i < nregs; ++i)
> +{
> +  unsigned int regno = GET_SSE_REGNO (i);
> +  if (!bitmap_bit_p (live_regs, regno))
> +   continue;
>rtx reg = gen_rtx_REG (V2DImode, regno);
> -  if (bitmap_bit_p (live_regs, regno))
> -   RTVEC_ELT (vec, i + 1) = gen_rtx_SET (reg, reg);
> -  else
> -   RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
> +  ++j;
> +  RTVEC_ELT (vec, j) = gen_rtx_SET (reg, reg);
>  }
>XVEC (pattern, 0) = vec;
>df_insn_rescan (insn);
> --- gcc/config/i386/sse.md.jj   2020-02-03 13:17:15.957722589 +0100
> +++ gcc/config/i386/sse.md  2020-02-03 18:30:08.280865894 +0100
> @@ -19819,7 +19819,34 @@ (define_insn "*avx_vzeroupper"
>[(match_parallel 0 "vzeroupper_pattern"
>   [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
>"TARGET_AVX"
> -  "vzeroupper"
> +{
> +  if (flag_ipa_ra && XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1)
> +{
> +  /* For IPA-RA purposes, make it clear the instruction clobbers
> +even XMM registers not mentioned explicitly 

[PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Jakub Jelinek
Hi!

As mentioned in the PR, the CLOBBERs in vzeroupper are added there even for
registers that aren't ever live in the function before and break the
prologue/epilogue expansion with ms ABI (normal ABIs are fine, as they
consider all [xyz]mm registers call clobbered, but the ms ABI considers
xmm0-15 call used but the bits above low 128 ones call clobbered).
The following patch fixes it by not adding the clobbers during vzeroupper
pass (before pro_and_epilogue), but adding them for -fipa-ra purposes only
during the final output.  Perhaps we could add some CLOBBERs early (say for
df_regs_ever_live_p regs that aren't live in the live_regs bitmap, or
depending on the ABI either add all of them immediately, or for ms ABI add
CLOBBERs for xmm0-xmm5 if they don't have a SET) and add the rest later.
And the addition could be perhaps done at other spots, e.g. in an
epilogue_completed guarded splitter.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Or any of the above mentioned variants?

A separate issue is CALL_USED_REGISTERS for msabi on xmm16-xmm31.

And yet another issue is that the presence of a vzeroupper instruction does
(and the patch doesn't change) prevent callers from trying to preserve
something in the xmm0-xmm15 registers if the callee doesn't use them.
Would be nice to be able to say that the vzeroupper will preserve the low
128 bits, so it is ok to hold a 128-bit value across the call, but not
256-bit or larger.  Not sure if the RA has a way to express that (and IPA-RA
certainly ATM doesn't).
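A hypothetical sketch (using GCC's generic vector extension, not part of the patch) of the situation described above: a 128-bit value that a caller would like to keep in an XMM register across a call whose body may contain vzeroupper:

```c
/* Hypothetical example: v is a 128-bit value live across a call.  Since
   vzeroupper preserves the low 128 bits of each XMM register, keeping v
   in a register across use() would be safe - but the RA and IPA-RA
   currently have no way to express "clobbers only the upper bits", so
   the caller conservatively saves it.  */
typedef float v4sf __attribute__ ((vector_size (16)));

static void use (void) { }  /* stand-in for a callee that may run vzeroupper */

static float sum_low128 (v4sf v)
{
  use ();                   /* v must survive the call */
  return v[0] + v[1] + v[2] + v[3];
}
```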

2020-02-04  Jakub Jelinek  

PR target/92190
* config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper): Only
include sets and not clobbers in the vzeroupper pattern.
* config/i386/sse.md (*avx_vzeroupper): Add the clobbers during
pattern output.

* gcc.target/i386/pr92190.c: New test.

--- gcc/config/i386/i386-features.c.jj  2020-02-03 13:17:15.919723150 +0100
+++ gcc/config/i386/i386-features.c 2020-02-03 18:30:08.274865983 +0100
@@ -1764,29 +1764,32 @@ convert_scalars_to_vector (bool timode_p
 
  (set (reg:V2DF R) (reg:V2DF R))
 
-   which preserves the low 128 bits but clobbers the upper bits.
-   For a dead register we just use:
-
- (clobber (reg:V2DF R))
-
-   which invalidates any previous contents of R and stops R from becoming
-   live across the vzeroupper in future.  */
+   which preserves the low 128 bits but clobbers the upper bits.  */
 
 static void
 ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
 {
   rtx pattern = PATTERN (insn);
   unsigned int nregs = TARGET_64BIT ? 16 : 8;
-  rtvec vec = rtvec_alloc (nregs + 1);
-  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
+  unsigned int npats = nregs;
   for (unsigned int i = 0; i < nregs; ++i)
 {
   unsigned int regno = GET_SSE_REGNO (i);
+  if (!bitmap_bit_p (live_regs, regno))
+   npats--;
+}
+  if (npats == 0)
+return;
+  rtvec vec = rtvec_alloc (npats + 1);
+  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
+  for (unsigned int i = 0, j = 0; i < nregs; ++i)
+{
+  unsigned int regno = GET_SSE_REGNO (i);
+  if (!bitmap_bit_p (live_regs, regno))
+   continue;
   rtx reg = gen_rtx_REG (V2DImode, regno);
-  if (bitmap_bit_p (live_regs, regno))
-   RTVEC_ELT (vec, i + 1) = gen_rtx_SET (reg, reg);
-  else
-   RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
+  ++j;
+  RTVEC_ELT (vec, j) = gen_rtx_SET (reg, reg);
 }
   XVEC (pattern, 0) = vec;
   df_insn_rescan (insn);
--- gcc/config/i386/sse.md.jj   2020-02-03 13:17:15.957722589 +0100
+++ gcc/config/i386/sse.md  2020-02-03 18:30:08.280865894 +0100
@@ -19819,7 +19819,34 @@ (define_insn "*avx_vzeroupper"
   [(match_parallel 0 "vzeroupper_pattern"
  [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
   "TARGET_AVX"
-  "vzeroupper"
+{
+  if (flag_ipa_ra && XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1)
+{
+  /* For IPA-RA purposes, make it clear the instruction clobbers
+even XMM registers not mentioned explicitly in the pattern.  */
+  unsigned int nregs = TARGET_64BIT ? 16 : 8;
+  unsigned int npats = XVECLEN (operands[0], 0);
+  rtvec vec = rtvec_alloc (nregs + 1);
+  RTVEC_ELT (vec, 0) = XVECEXP (operands[0], 0, 0);
+  for (unsigned int i = 0, j = 1; i < nregs; ++i)
+   {
+ unsigned int regno = GET_SSE_REGNO (i);
+ if (j < npats
+ && REGNO (SET_DEST (XVECEXP (operands[0], 0, j))) == regno)
+   {
+ RTVEC_ELT (vec, i + 1) = XVECEXP (operands[0], 0, j);
+ j++;
+   }
+ else
+   {
+ rtx reg = gen_rtx_REG (V2DImode, regno);
+ RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
+   }
+   }
+  XVEC (operands[0], 0) = vec;
+}
+  return "vzeroupper";
+}
   [(set_attr "type" "sse")
(set_attr "modrm" "0")
(set_attr "memory" 

Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-04 Thread Richard Biener
On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor  wrote:
>
> PR 93519 reports a false positive -Wrestrict issued for an inlined call
> to strcpy that carefully guards against self-copying.  This is caused
> by the caller's arguments substituted into the call during inlining and
> before dead code elimination.
>
> The attached patch avoids this by removing -Wrestrict from the folder
> and deferring folding perfectly overlapping (and so undefined) calls
> to strcpy (and mempcpy, but not memcpy) until much later.  Perfectly
> overlapping calls to memcpy are still folded early.

Why do we bother to warn at all for this case?  Just DWIM here.  Warnings like
this can be emitted from the analyzer?

That is, I suggest to simply remove the bogus warning code from folding
(and _not_ fail the folding).

Richard.

> Tested on x86_64-linux.
>
> Martin


Re: [RFC] [c-family] PR92867 - Add returns_arg attribute

2020-02-04 Thread Prathamesh Kulkarni
On Mon, 3 Feb 2020 at 14:56, Prathamesh Kulkarni
 wrote:
>
> On Mon, 3 Feb 2020 at 14:41, Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 30 Jan 2020 at 19:17, Richard Biener  
> > wrote:
> > >
> > > On Thu, Jan 30, 2020 at 11:49 AM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Wed, 29 Jan 2020 at 14:38, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Tue, Jan 28, 2020 at 1:02 PM Jakub Jelinek  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Jan 28, 2020 at 05:09:36PM +0530, Prathamesh Kulkarni wrote:
> > > > > > > On Tue, 28 Jan 2020 at 17:00, Jakub Jelinek  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Tue, Jan 28, 2020 at 04:56:59PM +0530, Prathamesh Kulkarni 
> > > > > > > > wrote:
> > > > > > > > > Thanks for the suggestions. In the attached patch I bumped up 
> > > > > > > > > value of
> > > > > > > > > ERF_RETURNS_ARG_MASK
> > > > > > > > > to UINT_MAX >> 2, and use highest two bits for ERF_NOALIAS 
> > > > > > > > > and ERF_RETURNS_ARG.
> > > > > > > > > And use fn spec "Z" to store the argument number in 
> > > > > > > > > fn-spec format.
> > > > > > > > > Does that look OK ?
> > > > > > > >
> > > > > > > > No.
> > > > > > > >
> > > > > > > > +#define ERF_RETURN_ARG_MASK(UINT_MAX >> 2)
> > > > > > > >
> > > > > > > >  /* Nonzero if the return value is equal to the argument number
> > > > > > > > flags & ERF_RETURN_ARG_MASK.  */
> > > > > > > > -#define ERF_RETURNS_ARG(1 << 2)
> > > > > > > > +#define ERF_RETURNS_ARG(1 << (BITS_PER_WORD - 
> > > > > > > > 2))
> > > > > > > >
> > > > > > > > How is size of host int related to BITS_PER_WORD?  Not to 
> > > > > > > > mention that
> > > > > > > > if BITS_PER_WORD is 64 and host int is 32-bit, 1 << (64 - 2) is 
> > > > > > > > UB.
> > > > > > > Oops sorry. I should have used HOST_BITS_PER_INT.
> > > > > > > I assume that'd be correct ?
> > > > > >
> > > > > > It still wouldn't, 1 << (HOST_BITS_PER_INT - 1) is negative number, 
> > > > > > you'd
> > > > > > need either 1U and verify all ERF_* flags uses, or avoid using the 
> > > > > > sign bit.
> > > > > > The patch has other issues, you don't verify that the argnum fits 
> > > > > > into
> > > > > > the bits (doesn't overflow into the other ERF_* bits), in
> > > > > > +  char *s = (char *) xmalloc (sizeof (char) * BITS_PER_WORD);
> > > > > > +  s[0] = 'Z';
> > > > > > +  sprintf (s + 1, "%lu", argnum);
> > > > > > 1) sizeof (char) is 1 by definition
> > > > > > 2) it is pointless to allocate it and then deallocate (just use 
> > > > > > automatic
> > > > > > array)
> > > > > > 3) it is unclear how is BITS_PER_WORD related to the length of 
> > > > > > decimal
> > > > > > encoded string + Z char + terminating '\0'.  The usual way is for 
> > > > > > unsigned
> > > > > > numbers to estimate number of digits by counting 3 digits per each 
> > > > > > 8 bits,
> > > > > > in your case of course + 2.
> > > > > >
> > > > > > More importantly, the "fn spec" attribute isn't used just in
> > > > > > gimple_call_return_flags, but also in e.g. gimple_call_arg_flags 
> > > > > > which
> > > > > > assumes that the return stuff in there is a single char and the 
> > > > > > reaming
> > > > > > chars are for argument descriptions, or in decl_return_flags which 
> > > > > > you
> > > > > > haven't modified.
> > > > > >
> > > > > > I must say I fail to see the point in trying to glue this together 
> > > > > > into the
> > > > > > "fn spec" argument so incompatibly, why can't we handle the fn spec 
> > > > > > with its
> > > > > > 1-4 returns_arg and returns_arg attribute with arbitrary position
> > > > > > side-by-side?
> > > > >
> > > > > Yeah, I wouldn't have added "fn spec" for "returns_arg" but kept the
> > > > > query interface thus access it via gimple_call_return_flags and use
> > > > > ERF_*.  For the flags adjustment just up the maximum argument
> > > > > to 1<<15 then the argument number is also nicely aligned, no need
> > > > > to do fancy limiting that depends on the host.  For too large
> > > > > argument numbers just warn the attribute is ignored.  I'd say even
> > > > > a max of 255 is sane just the existing limit is a bit too low.
> > > > Hi,
> > > > Thanks for the suggestions. In the attached patch, I use TREE_VALUE
> > > > (attr) to store/retrieve
> > > > arbitrary argument position, and have bumped ERF_RETURNS_ARG_MASK to 
> > > > 0x3fff.
> > > > Does it look OK ?
> > >
> > > +  warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wattributes,
> > > + "%qE attribute ignored on a function returning %qT.",
> > > + name, rettype);
> > >
> > > no punctuation in diagnostics (trailing '.'), also elsewhere in the patch.
> > >
> > > +  tree fndecl = gimple_call_fndecl (stmt);
> > > +  attr = lookup_attribute ("returns_arg", DECL_ATTRIBUTES (fndecl));
> > > +  if (attr)
> > > +{
> > > +  unsigned argnum = tree_to_uhwi (TREE_VALUE (TREE_VALUE (attr)));
> > > +  return ERF_RETURNS_ARG | argnum;
> > >
> > > 

[PATCH] c++: Handle CONSTRUCTORs without indexes in find_array_ctor_elt [PR93549]

2020-02-04 Thread Jakub Jelinek
Hi!

My change
* typeck2.c (store_init_value): Don't call cp_fully_fold_init on
initializers of automatic non-constexpr variables in constexpr
functions.
-  value = cp_fully_fold_init (value);
+  /* Don't fold initializers of automatic variables in constexpr functions,
+ that might fold away something that needs to be diagnosed at constexpr
+ evaluation time.  */
+  if (!current_function_decl
+  || !DECL_DECLARED_CONSTEXPR_P (current_function_decl)
+  || TREE_STATIC (decl))
+value = cp_fully_fold_init (value);
from the constexpr new change apparently broke the following testcase.
When handling COND_EXPR, we build_vector_from_val, however as the argument we
pass to it is not an INTEGER_CST/REAL_CST, but that wrapped in a
NON_LVALUE_EXPR location wrapper, we end up with a CONSTRUCTOR and as it is
middle-end that builds it, it doesn't bother with indexes.  The
cp_fully_fold_init call used to fold it into VECTOR_CST in the past, but as
we intentionally don't invoke it anymore as it might fold away something
that needs to be diagnosed during constexpr evaluation, we end up evaluating
ARRAY_REF into the index-less CONSTRUCTOR.  The following patch fixes the
ICE by teaching find_array_ctor_elt to handle CONSTRUCTORs without indexes
(that itself could be still very efficient) and CONSTRUCTORs with some
indexes present and others missing (the rules are that if the index on the
first element is missing, then it is the array's lowest index (in C/C++ 0)
and if other indexes are missing, they are the index of the previous element
+ 1).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Slightly OT, for large arrays it might be efficient to leave indexes out of
the CONSTRUCTORs too, but as the patch shows, we can't then access the
elements using binary search.  Could we (for GCC11+) use either some flag on
the CONSTRUCTOR to denote CONSTRUCTORs with no indexes at all, or some new
tree as index on the first element that would be similar to RANGE_EXPR, but
would say this element has index xyz and the following n elements have
linearly increasing indexes omitted?  If we have such an assurance, we could
do direct access to access the array elts in such CONSTRUCTORs and be even
more efficient than binary search.

2020-02-04  Jakub Jelinek  

PR c++/93549
* constexpr.c (find_array_ctor_elt): Deal with some or all indexes in
CONSTRUCTOR missing.

* g++.dg/ext/constexpr-pr93549.C: New test.

--- gcc/cp/constexpr.c.jj   2020-01-26 00:20:26.532367552 +0100
+++ gcc/cp/constexpr.c  2020-02-03 17:14:21.520780109 +0100
@@ -3023,7 +3023,8 @@ find_array_ctor_elt (tree ary, tree dind
   if (end > 0)
 {
   tree cindex = (*elts)[end - 1].index;
-  if (TREE_CODE (cindex) == INTEGER_CST
+  if (cindex
+ && TREE_CODE (cindex) == INTEGER_CST
  && compare_tree_int (cindex, end - 1) == 0)
{
  if (i < end)
@@ -3037,8 +3038,32 @@ find_array_ctor_elt (tree ary, tree dind
   while (begin != end)
 {
   unsigned HOST_WIDE_INT middle = (begin + end) / 2;
-  constructor_elt &elt = (*elts)[middle];
-  tree idx = elt.index;
+  constructor_elt *elt = &(*elts)[middle];
+  tree idx = elt->index;
+
+  if (idx == NULL_TREE)
+   {
+ /* If some or all indexes are missing, we can't use binary search.  */
+ unsigned HOST_WIDE_INT j
+   = (begin > 0 ? tree_to_uhwi ((*elts)[begin - 1].index) + 1 : 0);
+ for (middle = begin; middle < end; middle++)
+   if ((*elts)[middle].index)
+ {
+   elt = &(*elts)[middle];
+   idx = elt->index;
+   break;
+ }
+   else if (i == j + (middle - begin))
+ {
+   (*elts)[middle].index = dindex;
+   return middle;
+ }
+ if (middle == end)
+   {
+ begin = end;
+ continue;
+   }
+   }
 
   int cmp = array_index_cmp (dindex, idx);
   if (cmp < 0)
@@ -3053,7 +3078,7 @@ find_array_ctor_elt (tree ary, tree dind
  constructor_elt e;
  tree lo = TREE_OPERAND (idx, 0);
  tree hi = TREE_OPERAND (idx, 1);
- tree value = elt.value;
+ tree value = elt->value;
  dindex = fold_convert (sizetype, dindex);
  if (tree_int_cst_lt (lo, dindex))
{
@@ -3062,7 +3087,7 @@ find_array_ctor_elt (tree ary, tree dind
 size_one_node);
  if (tree_int_cst_equal (lo, new_hi))
/* Only one element left, no longer a range.  */
-   elt.index = lo;
+   elt->index = lo;
  else
TREE_OPERAND (idx, 1) = new_hi;
  /* Append the element we want to insert.  */
@@ -3073,7 +3098,7 @@ find_array_ctor_elt (tree ary, tree dind
}
  

[PATCH] libcpp: Diagnose __has_include outside of preprocessor directives [PR93545]

2020-02-04 Thread Jakub Jelinek
Hi!

The standard says http://eel.is/c++draft/cpp.cond#7.sentence-2 that
__has_include can't appear at arbitrary places in the source.  As we have
not recognized __has_include* outside of preprocessing directives in the
past, accepting it there now would be a regression.  The patch does still
allow it in #define if it is then used in preprocessing directives, I guess
that use isn't strictly valid either, but clang seems to accept it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-02-04  Jakub Jelinek  

* macro.c (builtin_has_include): Diagnose __has_include* use outside
of preprocessing directives.

* c-c++-common/cpp/has-include-1.c: New test.
* c-c++-common/cpp/has-include-next-1.c: New test.
* c-c++-common/gomp/has-include-1.c: New test.

--- libcpp/macro.c.jj   2020-02-03 14:25:52.511004543 +0100
+++ libcpp/macro.c  2020-02-03 14:29:13.505038048 +0100
@@ -359,6 +359,11 @@ builtin_has_include (cpp_reader *pfile,
 {
   int result = 0;
 
+  if (!pfile->state.in_directive)
+cpp_error (pfile, CPP_DL_ERROR,
+  "\"%s\" used outside of preprocessing directive",
+  NODE_NAME (op));
+
   pfile->state.angled_headers = true;
   const cpp_token *token = cpp_get_token_no_padding (pfile);
   bool paren = token->type == CPP_OPEN_PAREN;
--- gcc/testsuite/c-c++-common/cpp/has-include-1.c.jj   2020-02-03 15:03:17.970839759 +0100
+++ gcc/testsuite/c-c++-common/cpp/has-include-1.c  2020-02-03 15:14:11.598180968 +0100
@@ -0,0 +1,104 @@
+/* { dg-do preprocess } */
+
+#if __has_include ("stdlib.h")
+#else
+#error error 1
+#endif
+#if __has_include ()
+#else
+#error error 2
+#endif
+#if !__has_include ("stdlib.h")
+#error error 3
+#elif !__has_include ()
+#error error 4
+#endif
+#if __has_include ("stdlib.h") && __has_include ()
+#else
+#error error 5
+#endif
+#if !defined(__has_include)
+#error error 6
+#endif
+#ifndef __has_include
+#error error 7
+#endif
+#ifdef __has_include
+#else
+#error error 8
+#endif
+#define m1 __has_include("stdlib.h")
+#define m2 ("stdlib.h")
+#define m3 ("has-include-1-nonexistent.h")
+#define m4 has-include-1-nonexistent-2.h>)
+#define m5 
+#if !m1
+#error error 9
+#endif
+#if !__has_include m2
+#error error 10
+#endif
+#if __has_include m3
+#error error 11
+#endif
#if __has_include () /* { dg-error "used outside of preprocessing directive" } */
m1 /* { dg-error "used outside of preprocessing directive" } */
#if 1
m1 /* { dg-error "used outside of preprocessing directive" } */
#endif
#if 0
#elif 1
m1 /* { dg-error "used outside of preprocessing directive" } */
+#endif
+#if 0
+m1
+#endif
+#if 0
+#elif 0
+m1
+#endif
+#if __has_include "stdlib.h")  /* { dg-error "missing" } */
+#endif
+#if __has_include (stdlib.h)   /* { dg-error "operator|missing" } */
+#endif
+#if __has_include ()   /* { dg-error "operator|missing" } */
+#endif
+#if __has_include )/* { dg-error "operator|missing" } */
+#endif
+#if __has_include ("stdlib.h)
+#endif
+/* { dg-error "operator|missing\[^\n\r]*after" "" { target *-*-* } .-2 } */
+/* { dg-warning "missing terminating" "" { target *-*-* } .-3 } */
+#if __has_include (stdlib.h>)  /* { dg-error "operator|missing" } */
+#endif
+#if __has_include ("stdlib.h"  /* { dg-error "missing" } */
+#endif
+#if __has_include (/* { dg-error "operator|missing" } */
+#endif
+#if __has_include  /* { dg-error "operator|missing" } */
+#endif
+#if __has_include"stdlib.h"/* { dg-error "missing" } */
+#endif
+#if __has_include'h'   /* { dg-error "operator|missing" } */
+#endif
+#if __has_include('h'  /* { dg-error "operator|missing" } */
+#endif
+#if __has_include('h') /* { dg-error "operator" } */
+#endif
+#define H(h) __has_include(h)
+#if H()
+#else
+#error error 14
+#endif
+void
+foo ()
+{
+#pragma omp parallel if (__has_include (""))
+  ;
+}
--- gcc/testsuite/c-c++-common/cpp/has-include-next-1.c.jj  2020-02-03 15:19:05.068844085 +0100
+++ gcc/testsuite/c-c++-common/cpp/has-include-next-1.c 2020-02-03 15:19:39.993327980 +0100
@@ -0,0 +1,104 @@
+/* { dg-do preprocess } */
+
+#if __has_include_next ("stdlib.h")
+#else
+#error error 1
+#endif
+#if __has_include_next ()
+#else
+#error error 2
+#endif
+#if !__has_include_next ("stdlib.h")
+#error error 3
+#elif !__has_include_next ()
+#error error 4
+#endif
+#if __has_include_next ("stdlib.h") && __has_include_next ()
+#else
+#error error 5
+#endif
+#if !defined(__has_include_next)
+#error error 6
+#endif
+#ifndef __has_include_next
+#error error 7
+#endif
+#ifdef __has_include_next
+#else
+#error error 8
+#endif
+#define m1 __has_include_next("stdlib.h")
+#define m2 ("stdlib.h")
+#define m3 ("has-include-1-nonexistent.h")
+#define m4 

[PATCH] libcpp: Fix ICEs on __has_include syntax errors [PR93545]

2020-02-04 Thread Jakub Jelinek
Hi!

Some of the following testcases ICE, because one of the cpp_get_token
calls in builtin_has_include reads the CPP_EOF token but the caller isn't
aware that CPP_EOF has been reached and will do another cpp_get_token.
get_token_no_padding is something that is used by the
has_attribute/has_builtin callbacks, which will first peek and will not
consume CPP_EOF (but will consume other tokens).  The !SEEN_EOL ()
check on the other side doesn't work anymore and isn't really needed,
as we don't consume the EOF.  The change adds one further error to the
pr88974.c testcase; if we wanted just one error per __has_include,
we could add a boolean tracking whether we've emitted an error already and
emit only the first one we encounter (not implemented).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-02-04  Jakub Jelinek  

PR preprocessor/93545
* macro.c (cpp_get_token_no_padding): New function.
(builtin_has_include): Use it instead of cpp_get_token.  Don't check
SEEN_EOL.

* c-c++-common/cpp/pr88974.c: Expect another diagnostics during error
recovery.
* c-c++-common/cpp/pr93545-1.c: New test.
* c-c++-common/cpp/pr93545-2.c: New test.
* c-c++-common/cpp/pr93545-3.c: New test.
* c-c++-common/cpp/pr93545-4.c: New test.

--- libcpp/macro.c.jj   2020-01-31 19:17:47.377126121 +0100
+++ libcpp/macro.c  2020-02-03 10:14:42.447621094 +0100
@@ -336,6 +336,22 @@ unsigned num_expanded_macros_counter = 0
from macro expansion.  */
 unsigned num_macro_tokens_counter = 0;
 
+/* Wrapper around cpp_get_token to skip CPP_PADDING tokens
+   and not consume CPP_EOF.  */
+static const cpp_token *
+cpp_get_token_no_padding (cpp_reader *pfile)
+{
+  for (;;)
+{
+  const cpp_token *ret = cpp_peek_token (pfile, 0);
+  if (ret->type == CPP_EOF)
+   return ret;
+  ret = cpp_get_token (pfile);
+  if (ret->type != CPP_PADDING)
+   return ret;
+}
+}
+
 /* Handle meeting "__has_include" builtin macro.  */
 
 static int
@@ -344,10 +360,10 @@ builtin_has_include (cpp_reader *pfile,
   int result = 0;
 
   pfile->state.angled_headers = true;
-  const cpp_token *token = cpp_get_token (pfile);
+  const cpp_token *token = cpp_get_token_no_padding (pfile);
   bool paren = token->type == CPP_OPEN_PAREN;
   if (paren)
-token = cpp_get_token (pfile);
+token = cpp_get_token_no_padding (pfile);
   else
 cpp_error (pfile, CPP_DL_ERROR,
   "missing '(' before \"%s\" operand", NODE_NAME (op));
@@ -379,7 +395,8 @@ builtin_has_include (cpp_reader *pfile,
   XDELETEVEC (fname);
 }
 
-  if (paren && !SEEN_EOL () && cpp_get_token (pfile)->type != CPP_CLOSE_PAREN)
+  if (paren
+  && cpp_get_token_no_padding (pfile)->type != CPP_CLOSE_PAREN)
 cpp_error (pfile, CPP_DL_ERROR,
   "missing ')' after \"%s\" operand", NODE_NAME (op));
 
--- gcc/testsuite/c-c++-common/cpp/pr88974.c.jj 2020-01-12 11:54:37.002404522 +0100
+++ gcc/testsuite/c-c++-common/cpp/pr88974.c 2020-02-03 10:54:09.700688768 +0100
@@ -3,4 +3,5 @@
 
 #if __has_include ( character" "" { target *-*-* } .-1 } */
+/* { dg-error "missing '\\\)' after .__has_include. operand" "" { target *-*-* } .-2 } */
 #endif
--- gcc/testsuite/c-c++-common/cpp/pr93545-1.c.jj   2020-02-03 10:41:59.164432679 +0100
+++ gcc/testsuite/c-c++-common/cpp/pr93545-1.c  2020-02-03 10:41:53.540515723 +0100
@@ -0,0 +1,4 @@
+/* PR preprocessor/93545 */
+/* { dg-do preprocess } */
+
+__has_include  /* { dg-error "" } */
--- gcc/testsuite/c-c++-common/cpp/pr93545-2.c.jj   2020-02-03 10:42:06.315327128 +0100
+++ gcc/testsuite/c-c++-common/cpp/pr93545-2.c  2020-02-03 10:42:14.138211640 +0100
@@ -0,0 +1,4 @@
+/* PR preprocessor/93545 */
+/* { dg-do preprocess } */
+
+__has_include (/* { dg-error "" } */
--- gcc/testsuite/c-c++-common/cpp/pr93545-3.c.jj   2020-02-03 10:42:26.758025328 +0100
+++ gcc/testsuite/c-c++-common/cpp/pr93545-3.c  2020-02-03 10:42:35.528895850 +0100
@@ -0,0 +1,4 @@
+/* PR preprocessor/93545 */
+/* { dg-do preprocess } */
+
+__has_include ("foobarbaz" /* { dg-error "" } */
--- gcc/testsuite/c-c++-common/cpp/pr93545-4.c.jj   2020-02-03 10:42:44.563762465 +0100
+++ gcc/testsuite/c-c++-common/cpp/pr93545-4.c  2020-02-03 10:42:59.947535354 +0100
@@ -0,0 +1,4 @@
+/* PR preprocessor/93545 */
+/* { dg-do preprocess } */
+
+__has_include ()   /* { dg-error "" } */

Jakub



[PATCH][OBVIOUS] Fix release checking build of ARM.

2020-02-04 Thread Martin Liška

Hi.

This patch moves code outside of the selftests region, which is not compiled
with --enable-checking=release.

@Stam: Can you please take a look at the following warnings related to your
code:

/home/marxin/Programming/gcc2/gcc/config/arm/arm.c: In function ‘const char* arm_gen_far_branch(rtx_def**, int, const char*, const char*)’:
/home/marxin/Programming/gcc2/gcc/config/arm/arm.c:33089:38: warning: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size 128 [-Wformat-truncation=]
33089 |   snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr);
      |                                      ^~
/home/marxin/Programming/gcc2/gcc/config/arm/arm.c:33089:12: note: ‘snprintf’ output 1 or more bytes (assuming 256) into a destination of size 128
33089 |   snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr);
      |   ~^~~~
/home/marxin/Programming/gcc2/gcc/config/arm/arm.c:33092:38: warning: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size between 111 and 121 [-Wformat-truncation=]
33092 |   snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, label_ptr);
      |                                      ^~~~
/home/marxin/Programming/gcc2/gcc/config/arm/arm.c:33092:12: note: ‘snprintf’ output between 9 and 274 bytes into a destination of size 128
33092 |   snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, label_ptr);
      |   ~^

Martin

gcc/ChangeLog:

2020-02-04  Martin Liska  

* config/arm/arm.c (arm_gen_far_branch): Move the function
outside of selftests.
---
 gcc/config/arm/arm.c | 41 -
 1 file changed, 20 insertions(+), 21 deletions(-)


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b5ae7e3e9ce..fe3bc675b42 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -33041,6 +33041,26 @@ arm_run_selftests (void)
 }
 } /* Namespace selftest.  */
 
+#undef TARGET_RUN_TARGET_SELFTESTS
+#define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
+#endif /* CHECKING_P */
+
+/* Worker function for TARGET_MD_ASM_ADJUST, while in thumb1 mode.
+   Unlike the arm version, we do NOT implement asm flag outputs.  */
+
+rtx_insn *
+thumb1_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/,
+		  vec<const char *> &constraints,
+		  vec<rtx> &/*clobbers*/, HARD_REG_SET &/*clobbered_regs*/)
+{
+  for (unsigned i = 0, n = outputs.length (); i < n; ++i)
+if (strncmp (constraints[i], "=@cc", 4) == 0)
+  {
+	sorry ("asm flags not supported in thumb1 mode");
+	break;
+  }
+  return NULL;
+}
 
 /* Generate code to enable conditional branches in functions over 1 MiB.
Parameters are:
@@ -33075,27 +33095,6 @@ arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
   return "";
 }
 
-#undef TARGET_RUN_TARGET_SELFTESTS
-#define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
-#endif /* CHECKING_P */
-
-/* Worker function for TARGET_MD_ASM_ADJUST, while in thumb1 mode.
-   Unlike the arm version, we do NOT implement asm flag outputs.  */
-
-rtx_insn *
-thumb1_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/,
-		  vec<const char *> &constraints,
-		  vec<rtx> &/*clobbers*/, HARD_REG_SET &/*clobbered_regs*/)
-{
-  for (unsigned i = 0, n = outputs.length (); i < n; ++i)
-if (strncmp (constraints[i], "=@cc", 4) == 0)
-  {
-	sorry ("asm flags not supported in thumb1 mode");
-	break;
-  }
-  return NULL;
-}
-
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-arm.h"