[Bug middle-end/90549] missing -Wreturn-local-addr maybe returning an address of a local array plus offset

2019-07-08 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90549

Martin Sebor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |10.0

--- Comment #6 from Martin Sebor  ---
Fixed via r273261.  Both functions in the test case are now diagnosed:

pr90549.c: In function ‘f’:
pr90549.c:7:10: warning: function may return address of local variable [-Wreturn-local-addr]
    7 |   return p;   // -Wreturn-local-addr (good)
      |          ^
pr90549.c:5:7: note: declared here
    5 |   int b[2];
      |       ^
pr90549.c: In function ‘g’:
pr90549.c:15:12: warning: function may return address of local variable [-Wreturn-local-addr]
   15 |   return p + 1;   // missing -Wreturn-local-addr
      |          ~~^~~
pr90549.c:12:7: note: declared here
   12 |   int b[2];
      |       ^
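
For readers without the attachment, here is a minimal sketch of the pattern the two warnings cover. It is not the actual pr90549.c: only `int b[2];`, `return p;` and `return p + 1;` come from the diagnostics above, while the parameter `a` and the conditional are assumptions made to get a "may return" path.

```c
/* Hypothetical reconstruction: only the declaration of b and the two
   return statements are taken from the diagnostics; the parameter and
   the conditional are assumed for illustration.  */

/* May return the address of the local array b (when a is null).  */
int *f (int *a)
{
  int b[2] = { 0, 0 };
  int *p = a ? a : b;
  return p;        /* -Wreturn-local-addr expected */
}

/* Same, but with an offset applied to the possibly-local pointer.  */
int *g (int *a)
{
  int b[2] = { 0, 0 };
  int *p = a ? a : b;
  return p + 1;    /* the case reported as missing in this PR */
}
```

Only the path where `a` is null returns a dangling pointer; calling either function with a valid pointer is well defined.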

[Bug other/90556] [meta-bug] bogus/missing -Wreturn-local-addr

2019-07-08 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90556
Bug 90556 depends on bug 90549, which changed state.

Bug 90549 Summary: missing -Wreturn-local-addr maybe returning an address of a 
local array plus offset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90549

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

[Bug other/90556] [meta-bug] bogus/missing -Wreturn-local-addr

2019-07-08 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90556
Bug 90556 depends on bug 71924, which changed state.

Bug 71924 Summary: missing -Wreturn-local-addr returning alloca result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71924

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

[Bug c/71924] missing -Wreturn-local-addr returning alloca result

2019-07-08 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71924

Martin Sebor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |10.0

--- Comment #7 from Martin Sebor  ---
Patch committed in r273261.
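
A minimal sketch of the alloca case named in the summary, for reference. The function names are hypothetical, and `__builtin_alloca` is used directly (a GCC/Clang builtin) to avoid depending on `<alloca.h>`. The heap variant is just one conventional alternative, not something proposed in this PR.

```c
#include <stdlib.h>
#include <string.h>

/* Problematic: memory from alloca is released when the function
   returns, so the caller receives a dangling pointer.  Shown only to
   illustrate the diagnostic; do not call it.  */
char *dup_on_stack (const char *s)
{
  size_t n = strlen (s) + 1;
  char *p = __builtin_alloca (n);
  memcpy (p, s, n);
  return p;        /* -Wreturn-local-addr expected */
}

/* One conventional alternative: allocate on the heap and let the
   caller free the result.  */
char *dup_on_heap (const char *s)
{
  size_t n = strlen (s) + 1;
  char *p = malloc (n);
  if (p)
    memcpy (p, s, n);
  return p;
}
```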

[Bug c++/64867] split warning for passing non-POD to varargs function from -Wconditionally-supported into new warning flag, -Wnon-pod-varargs

2019-07-08 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64867

Eric Gallager  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |msebor at gcc dot gnu.org

--- Comment #26 from Eric Gallager  ---
Martin Sebor has been doing stuff related to warnings about POD-ness lately;
cc-ing him

[Bug middle-end/90549] missing -Wreturn-local-addr maybe returning an address of a local array plus offset

2019-07-08 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90549

--- Comment #5 from Martin Sebor  ---
Author: msebor
Date: Tue Jul  9 04:15:42 2019
New Revision: 273261

URL: https://gcc.gnu.org/viewcvs?rev=273261&root=gcc&view=rev
Log:
PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result
PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address of
a local array plus offset

gcc/ChangeLog:

PR middle-end/71924
PR middle-end/90549
* gimple-ssa-isolate-paths.c (isolate_path): Add attribute.  Update
comment.
(args_loc_t): New type.
(args_loc_t, locmap_t): same.
(diag_returned_locals): New function.
(is_addr_local): Same.
(handle_return_addr_local_phi_arg, warn_return_addr_local): Same.
(find_implicit_erroneous_behavior): Call
warn_return_addr_local_phi_arg.
(find_explicit_erroneous_behavior): Call warn_return_addr_local.

gcc/testsuite/ChangeLog:

PR middle-end/71924
PR middle-end/90549
* gcc.c-torture/execute/return-addr.c: New test.
* gcc.dg/Wreturn-local-addr-2.c: New test.
* gcc.dg/Wreturn-local-addr-4.c: New test.
* gcc.dg/Wreturn-local-addr-5.c: New test.
* gcc.dg/Wreturn-local-addr-6.c: New test.
* gcc.dg/Wreturn-local-addr-7.c: New test.
* gcc.dg/Wreturn-local-addr-8.c: New test.
* gcc.dg/Wreturn-local-addr-9.c: New test.
* gcc.dg/Wreturn-local-addr-10.c: New test.
* gcc.dg/Walloca-4.c: Handle expected warnings.
* gcc.dg/pr41551.c: Same.
* gcc.dg/pr59523.c: Same.
* gcc.dg/tree-ssa/pr88775-2.c: Same.
* gcc.dg/tree-ssa/alias-37.c: Same.
* gcc.dg/winline-7.c: Same.


Added:
trunk/gcc/testsuite/gcc.c-torture/execute/return-addr.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-10.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-2.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-3.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-4.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-5.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-6.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-7.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-8.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-9.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimple-ssa-isolate-paths.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/Walloca-4.c
trunk/gcc/testsuite/gcc.dg/pr41551.c
trunk/gcc/testsuite/gcc.dg/pr59523.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/alias-37.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr88775-2.c
trunk/gcc/testsuite/gcc.dg/winline-7.c
trunk/libgcc/generic-morestack.c

[Bug c/71924] missing -Wreturn-local-addr returning alloca result

2019-07-08 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71924

--- Comment #6 from Martin Sebor  ---
Author: msebor
Date: Tue Jul  9 04:15:42 2019
New Revision: 273261

URL: https://gcc.gnu.org/viewcvs?rev=273261&root=gcc&view=rev
Log:
PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result
PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address of
a local array plus offset

gcc/ChangeLog:

PR middle-end/71924
PR middle-end/90549
* gimple-ssa-isolate-paths.c (isolate_path): Add attribute.  Update
comment.
(args_loc_t): New type.
(args_loc_t, locmap_t): same.
(diag_returned_locals): New function.
(is_addr_local): Same.
(handle_return_addr_local_phi_arg, warn_return_addr_local): Same.
(find_implicit_erroneous_behavior): Call
warn_return_addr_local_phi_arg.
(find_explicit_erroneous_behavior): Call warn_return_addr_local.

gcc/testsuite/ChangeLog:

PR middle-end/71924
PR middle-end/90549
* gcc.c-torture/execute/return-addr.c: New test.
* gcc.dg/Wreturn-local-addr-2.c: New test.
* gcc.dg/Wreturn-local-addr-4.c: New test.
* gcc.dg/Wreturn-local-addr-5.c: New test.
* gcc.dg/Wreturn-local-addr-6.c: New test.
* gcc.dg/Wreturn-local-addr-7.c: New test.
* gcc.dg/Wreturn-local-addr-8.c: New test.
* gcc.dg/Wreturn-local-addr-9.c: New test.
* gcc.dg/Wreturn-local-addr-10.c: New test.
* gcc.dg/Walloca-4.c: Handle expected warnings.
* gcc.dg/pr41551.c: Same.
* gcc.dg/pr59523.c: Same.
* gcc.dg/tree-ssa/pr88775-2.c: Same.
* gcc.dg/tree-ssa/alias-37.c: Same.
* gcc.dg/winline-7.c: Same.


Added:
trunk/gcc/testsuite/gcc.c-torture/execute/return-addr.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-10.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-2.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-3.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-4.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-5.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-6.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-7.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-8.c
trunk/gcc/testsuite/gcc.dg/Wreturn-local-addr-9.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimple-ssa-isolate-paths.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/Walloca-4.c
trunk/gcc/testsuite/gcc.dg/pr41551.c
trunk/gcc/testsuite/gcc.dg/pr59523.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/alias-37.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr88775-2.c
trunk/gcc/testsuite/gcc.dg/winline-7.c
trunk/libgcc/generic-morestack.c

[Bug target/91103] AVX512 vector element extract uses more than 1 shuffle instruction; VALIGND can grab any element

2019-07-08 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91103

--- Comment #4 from Peter Cordes  ---
We should not put any stock in what ICC does for GNU C native vector indexing. 
I think it doesn't know how to optimize that because it *always* spills/reloads
even for `vec[0]` which could be a no-op.  And it's always a full-width spill
(ZMM), not just the low XMM/YMM part that contains the desired element.  I
mainly mentioned ICC in my initial post to suggest the store/reload strategy in
general as an *option*.
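
For context, "GNU C native vector indexing" means subscripting a `vector_size` type directly. A minimal sketch follows, using a 16-byte vector so it builds without AVX512; the type and function names are made up for illustration. The element extract in this PR is the same operation on a 64-byte vector.

```c
/* GNU C vector extension (GCC and Clang); not ISO C.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Extract one element with a run-time index.  The index is masked so
   it always stays within the four elements.  */
int extract (v4si v, unsigned i)
{
  return v[i & 3];
}

/* vec[0]: ideally needs no shuffle and no spill at all.  */
int first (v4si v)
{
  return v[0];
}
```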

ICC also doesn't optimize intrinsics: it pretty much always faithfully
transliterates them to asm.  e.g. v = _mm_add_epi32(v, _mm_set1_epi32(1)); 
twice compiles to two separate paddd instructions, instead of one with a
constant of set1(2).

If we want to see ICC's strided-store strategy, we'd need to write some pure C
that auto-vectorizes.



That said, store/reload is certainly a valid option when we want all the
elements, and gets *more* attractive with wider vectors, where the one extra
store amortizes over more elements.
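
A rough source-level sketch of the store/reload idea, assuming a 32-byte GNU C vector of floats. The names `v8sf` and `strided_store` are hypothetical, and a compiler is free to pick a different instruction sequence for this code. The vector is passed by pointer only to avoid ABI warnings when AVX is not enabled.

```c
#include <stddef.h>
#include <string.h>

/* GNU C vector extension; 8 floats, 32 bytes.  */
typedef float v8sf __attribute__ ((vector_size (32)));

/* Store each element of *v to dst[i * stride].  */
void strided_store (float *dst, size_t stride, const v8sf *v)
{
  float tmp[8];
  memcpy (tmp, v, sizeof tmp);      /* the one extra full-width store */
  for (size_t i = 0; i < 8; ++i)
    dst[i * stride] = tmp[i];       /* scalar reload + store per element */
}
```

The single full-width store is paid once per vector, which is why the cost per element drops as the vector gets wider.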

Strided stores will typically bottleneck on cache/memory bandwidth unless the
destination lines are already hot in L1d.  But if there's other work in the
loop, we care about OoO exec of that work with the stores, so uop throughput
could be a factor.


If we're tuning for Intel Haswell/Skylake with 1 per clock shuffles but 2 loads
+ 1 store per clock throughput (if we avoid indexed addressing modes for
stores), then it's very attractive and unlikely to be a bottleneck.

There's typically spare load execution-unit cycles in a loop that's also doing
stores + other work.  You need every other uop to be (or include) a load to
bottleneck on that at 4 uops per clock, unless you have indexed stores (which
can't run on the simple store-AGU on port 7 and need to run on port 2/3, taking
a cycle from a load).   Cache-split loads do get replayed to grab the 2nd half,
so it costs extra execution-unit pressure as well as extra cache-read cycles.

Intel says Ice Lake will have 2 load + 2 store pipes, and a 2nd shuffle unit.  A
mixed strategy there might be interesting: extract the high 256 bits to memory
with vextractf32x8 and reload it, but shuffle the low 128/256 bits.  That
strategy might be good on earlier CPUs, too.  At least with movss + extractps
stores from the low XMM where we can do that directly.

AMD before Ryzen 2 has only 2 AGUs, so only 2 memory ops per clock, up to one
of which can be a store.  It's definitely worth considering extracting the high
128-bit half of a YMM and using movss then shuffles like vextractps: 2 uops on
Ryzen or AMD.


-

If the stride is small enough (so more than 1 element fits in a vector), we
should consider  shuffle + vmaskmovps  masked stores, or with AVX512 then
AVX512 masked stores.

But for larger strides, AVX512 scatter may get better in the future.  It's
currently (SKX) 43 uops for VSCATTERDPS or ...DD ZMM, so not very friendly to
surrounding code.  It sustains one per 17 clock throughput, slightly worse than
1 element stored per clock cycle.  Same throughput on KNL, but only 4 uops so
it can overlap much better with surrounding code.




For qword elements, we have efficient stores of the high or low half of an XMM.
 A MOVHPS store doesn't need a shuffle uop on most Intel CPUs.  So we only need
1 (YMM) or 3 (ZMM) shuffles to get each of the high 128-bit lanes down to an
XMM register.

Unfortunately on Ryzen, MOVHPS [mem], xmm costs a shuffle+store.  But Ryzen has
shuffle EUs on multiple ports.

[Bug c++/91118] New: ubsan does not work with openmp default (none) directive

2019-07-08 Thread alan.avbs at rocketmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91118

Bug ID: 91118
   Summary: ubsan does not work with openmp default (none)
directive
   Product: gcc
   Version: 9.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alan.avbs at rocketmail dot com
  Target Milestone: ---

Program fails to compile when using -fsanitize=undefined and using
the directive default(none) in a parallel region.

For instance:

#include <iostream>

int main()
{
#pragma omp parallel default(none) shared(std::cerr)
{
std::cerr<<"hello"

[Bug target/91117] New: _mm_movpi64_epi64/_mm_movepi64_pi64 generating store+load instead of using MOVQ2DQ/MOVDQ2Q

2019-07-08 Thread wolfwings+gcc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91117

Bug ID: 91117
   Summary: _mm_movpi64_epi64/_mm_movepi64_pi64 generating
store+load instead of using MOVQ2DQ/MOVDQ2Q
   Product: gcc
   Version: 9.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wolfwings+gcc at gmail dot com
  Target Milestone: ---

_mm_movpi64_epi64 is never using MOVQ2DQ (and _mm_movepi64_pi64 never using
MOVDQ2Q) despite documentation it should when used in mixed MMX -> SSE
situations, and that these are in fact the intrinsics to use when desiring the
Q2DQ/DQ2Q opcodes.

This appears to be due to the header defining them causing fallback memory
write then read except in (technically invalid) SSE -> SSE cases where a MOVD
is used.

Tested on GCC 7.4 + 9.1 locally, with additional testing on Godbolt all showing
identical code being generated all the way back to 4.x series.

Compiled with -O1:

#include <emmintrin.h>

__m128i test( __m128i input ) {
__m64 x = _mm_movepi64_pi64( input );
return _mm_movpi64_epi64( _mm_mullo_pi16( x, x ) );
}

Generated assembly on GCC 9.1:

movq    %xmm0, -16(%rsp)
movq    -16(%rsp), %mm0
movq    %mm0, %mm1
pmullw  %mm0, %mm1
movq    %mm1, -16(%rsp)
movq    -16(%rsp), %xmm0
ret

A version that makes explicit calls to movq2dq/movdq2q works and outputs the
expected assembly sequence:

#include <emmintrin.h>

static inline __m64 _my_movepi64_pi64( __m128i input ) {
__m64 result;
asm( "movdq2q %1, %0" : "=y" (result) : "x" (input) : );
return result;
}

static inline __m128i _my_movpi64_epi64( __m64 input ) {
__m128i result;
asm( "movq2dq %1, %0" : "=x" (result) : "y" (input) : );
return result;
}

__m128i test( __m128i input ) {
__m64 x = _my_movepi64_pi64( input );
return _my_movpi64_epi64( _mm_mullo_pi16( x, x ) );
}

Generated assembly on GCC 7.4, 9.1, and others via Godbolt, again with -O1 (-O2
and -O3 make no difference):

movdq2q %xmm0, %mm0
pmullw  %mm0, %mm0
movq2dq %mm0, %xmm0
ret

For completeness, ICC generates the 'short' code form on all available versions
without needing the inline assembly workaround.

[Bug c++/91110] [10 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in cp_omp_mappable_type_1, at cp/decl2.c:1421

2019-07-08 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91110

--- Comment #3 from Jakub Jelinek  ---
Author: jakub
Date: Mon Jul  8 22:08:27 2019
New Revision: 273248

URL: https://gcc.gnu.org/viewcvs?rev=273248&root=gcc&view=rev
Log:
PR c++/91110
* decl2.c (cp_omp_mappable_type_1): Don't emit any note for
error_mark_node type.

* g++.dg/gomp/pr91110.C: New test.

Added:
trunk/gcc/testsuite/g++.dg/gomp/pr91110.C
Modified:
trunk/gcc/cp/ChangeLog
trunk/gcc/cp/decl2.c
trunk/gcc/testsuite/ChangeLog

[Bug c++/61339] add mismatch between struct and class [-Wmismatched-tags] to non-bugs

2019-07-08 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61339

Martin Sebor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch

--- Comment #11 from Martin Sebor  ---
Patch: https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00621.html

[Bug target/91116] New: bad register choices for rs6000 -m32

2019-07-08 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91116

Bug ID: 91116
   Summary: bad register choices for rs6000 -m32
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

In the new testcase pr88233.c, which is

typedef struct { double a[2]; } A;
A
foo (const A *a)
{
  return *a;
}

we currently get as generated code for -m32

addi 10,4,4
lfiwzx 10,0,4
addi 9,3,12
lfiwzx 11,0,10
addi 10,4,8
lfiwzx 12,0,10
addi 10,4,12
stfiwx 10,0,3
lfiwzx 0,0,10
addi 10,3,4
stfiwx 11,0,10
addi 10,3,8
stfiwx 12,0,10
stfiwx 0,0,9
blr


Expand decides to do this as four SImode copies, which isn't such a great
idea, of course; but RA thinks it is cost 0 to put a SImode in an FP or
altivec register.  That won't fly.

[Bug rtl-optimization/88233] combine fails to merge insns leaving unneeded reg copies

2019-07-08 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88233

--- Comment #4 from Segher Boessenkool  ---
Author: segher
Date: Mon Jul  8 20:38:46 2019
New Revision: 273245

URL: https://gcc.gnu.org/viewcvs?rev=273245&root=gcc&view=rev
Log:
rs6000: Add testcase for PR88233

This testcase tests that with -mcpu=power8 we do not generate any
mtvsr* instructions, and we do the copy with {l,st}xvd2x.


gcc/testsuite/
PR rtl-optimization/88233
* gcc.target/powerpc/pr88233.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.target/powerpc/pr88233.c
Modified:
trunk/gcc/testsuite/ChangeLog

[Bug c++/91073] [9/10 Regression] if constexpr no longer works directly with Concepts

2019-07-08 Thread paolo.carlini at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91073

--- Comment #2 from Paolo Carlini  ---
In principle the issue is rather simple. The
cp_parser_maybe_commit_to_declaration at the beginning of cp_parser_condition
since r260482 thinks erroneously that the just parsed HasInit must be a declaration. In practice, I'm still not sure which is the best
way to solve this... well, I'm not even sure we are supposed to actively work
now on relatively minor concept-related issues.

[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135

2019-07-08 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

--- Comment #2 from Christophe Lyon  ---
Removing the test*() calls from the end, the first failing one is testX().
However, if I remove all the preceding ones, the test passes.

Using -fwhole-program instead of -flto has no effect: the test still fails.

Adding a printf() call in check() also makes the test pass.

[Bug target/61577] [4.9.0] can't compile on hp-ux v3 ia64

2019-07-08 Thread dave.anglin at bell dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61577

--- Comment #64 from dave.anglin at bell dot net ---
On 2019-07-08 2:51 p.m., elowe at elowe dot com wrote:
> I made a very simple change:
>
> --- ia64.c.orig 2019-07-08 14:43:33 +
> +++ ia64.c  2019-07-05 16:46:24 +
> @@ -1137,7 +1137,7 @@
>  emit_insn (gen_load_fptr (dest, src));
>else if (sdata_symbolic_operand (src, VOIDmode))
>  emit_insn (gen_load_gprel (dest, src));
> -  else if (local_symbolic_operand64 (src, VOIDmode))
> +  else if (local_symbolic_operand64 (src, VOIDmode) && !TARGET_HPUX)
>  {
>/* We want to use @gprel rather than @ltoff relocations for local
>  symbols:
>
> Which I think has the same effect as disabling it in predicate. I'm happy with
> either approach.
Okay, I assume we are now at the problem in comment #58.  Would you upload the
final RTL dump for "IsLower.c" (the "-da" option will generate it)?  It would
also be useful to find the change which introduced the regression for
"IsLower.c".

You could post the above patch with a ChangeLog to gcc-patches.  It's small
enough that an FSF assignment shouldn't be needed.

[Bug sanitizer/91115] stack-buffer-overflow on memset local variable when creating thread on ARM Linux

2019-07-08 Thread fhsueh at roku dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91115

--- Comment #1 from Fred Hsueh  ---
Created attachment 46580
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46580&action=edit
Fixup memory location of shadow

This shadow location works better than the 32-bit default.

[Bug sanitizer/91115] New: stack-buffer-overflow on memset local variable when creating thread on ARM Linux

2019-07-08 Thread fhsueh at roku dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91115

Bug ID: 91115
   Summary: stack-buffer-overflow on memset local variable when
creating thread on ARM Linux
   Product: gcc
   Version: 8.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: fhsueh at roku dot com
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at 
gcc dot gnu.org
  Target Milestone: ---

I'm getting an ASAN stack-buffer-overflow when a thread is starting on ARM
Linux, with gcc-8.3 and glibc-2.22. Here's the output, cleaned up a bit:

>
==1541==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x9bffebf8 at
pc 0xa3585e98 bp 0x9bffebc4 sp 0x9bffe790
WRITE of size 36 at 0x9bffebf8 thread T10
#0 0xa3585e97 in __interceptor_memset
gcc-8.3.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:709
#1 0x9f6d378b in __pthread_attr_init_2_1
glibc-2.22/nptl/pthread_attr_init.c:41
#2 0xa3619053 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned
long*, unsigned long*)
gcc-8.3.0/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc:105
#3 0xa361940b in __sanitizer::GetThreadStackAndTls(bool, unsigned long*,
unsigned long*, unsigned long*, unsigned long*)
gcc-8.3.0/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc:415
#4 0xa360f147 in
__asan::AsanThread::SetThreadStackAndTls(__asan::AsanThread::InitOptions
const*) gcc-8.3.0/libsanitizer/asan/asan_thread.cc:287
#5 0xa360f237 in __asan::AsanThread::Init(__asan::AsanThread::InitOptions
const*) gcc-8.3.0/libsanitizer/asan/asan_thread.cc:224
#6 0xa360f367 in __asan::AsanThread::ThreadStart(unsigned long,
__sanitizer::atomic_uintptr_t*) gcc-8.3.0/libsanitizer/asan/asan_thread.cc:241
#7 0x9f6d1d63 in start_thread glibc-2.22/nptl/pthread_create.c:336

Address 0x9bffebf8 is located in stack of thread T9 at offset 664 in frame
#0 0x25b6e3f in _M_run arm-roku-linux-gnueabi/include/c++/8.3.0/thread:196

  This frame has 13 object(s):
[32, 36) 'bt'
[96, 100) 'bt'
[160, 168) ''
[224, 232) ''
[288, 296) ''
[352, 360) ''
[416, 424) ''
[480, 488) ''
[544, 552) 'lock'
[608, 620) 'cd'
[672, 684) 'cd' <== Memory access at offset 664 partially underflows this
variable
[736, 748) ''
[800, 812) ''
HINT: this may be a false positive if your program uses some custom stack
unwind mechanism or swapcontext
  (longjmp and C++ exceptions *are* supported)
Thread T9 created by T0 here:
#0 0xa35cdc1f in __interceptor_pthread_create
gcc-8.3.0/libsanitizer/asan/asan_interceptors.cc:202
#1 0x9f83d543 in
std::thread::_M_start_thread(std::unique_ptr >, void (*)())
(/usr/lib/libstdc++.so.6+0x9c543)

SUMMARY: AddressSanitizer: stack-buffer-overflow
gcc-8.3.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:709
in __interceptor_memset
Shadow bytes around the buggy address:
  0x437ffd20: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
  0x437ffd30: 04 f2 f2 f2 f2 f2 f2 f2 04 f2 f2 f2 f2 f2 f2 f2
  0x437ffd40: 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2 f2 f2
  0x437ffd50: 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2 f2 f2
  0x437ffd60: 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2 f2 f2
=>0x437ffd70: 00 f2 f2 f2 f2 f2 f2 f2 00 04 f2 f2 f2 f2 f2[f2]
  0x437ffd80: 00 04 f2 f2 f2 f2 f2 f2 00 04 f2 f2 f2 f2 f2 f2
  0x437ffd90: 00 04 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
  0x437ffda0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x437ffdb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x437ffdc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  363.983356] grsec: bruteforce prevention initiated for the next 30 minutes
or until service restarted, stalling each fork 30 seconds.  Please investigate
the crash report for /bin/Application[Application:1541] uid/euid:0/0
gid/egid:0/0, parent /bin/Application[Application:1480] uid/euid:501/501
gid/egid:501/501
[  364.068455] ltcore_dump: starting dump
grsec: From 10.14.24.38: denied resource overstep by requesting 52 for
RLIMIT_CORE against limit 0 for /bin/Application[Application:1542]
uid/euid:501/501 gid/egid:501/501, parent /bin/busybox[sh:1394] uid/euid:0/0
gid/egid:0/0
grsec: From 10.14.24.38: denied resource overstep by requesting 84 for
RLIMIT_CORE against limit 0 for /bin/Application[Application:1542]
uid/euid:501/501 gid/egid:501/501, parent /bin/busybox[sh:1394] uid/euid:0/0
gid/egid:0/0
[  364.128767] grsec: From 10.14.24.38: denied resource overstep by
requesting 116 for RLIMIT_CORE against limit 0 for
/bin/Application[Application:1542] uid/euid:501/501 gid/egid:501/501, parent
/bin/busybox[sh:1394] uid/euid:0/0 gid/egid:0/0
[  364.152847] grsec: From 10.14.24.38: denied resource overstep by
requesting 148 for RLIMIT_CORE against limit 0 for
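
As a cross-check on the numbers in this report: ASan maps every 8 application bytes to one shadow byte, shadow = (addr >> 3) + offset. The sketch below applies that to the faulting address 0x9bffebf8. The offset 0x30000000 is not stated anywhere in the report; it is inferred here from the bracketed shadow byte at 0x437ffd7f (last byte of the row marked `=>`), and 0x20000000 is the usual 32-bit default. The function name is made up for illustration.

```c
#include <stdint.h>

/* ASan shadow mapping with the standard scale of 3:
   one shadow byte describes 8 bytes of application memory.  */
uint32_t shadow_addr (uint32_t addr, uint32_t offset)
{
  return (addr >> 3) + offset;
}
```

With offset 0x30000000 the result for 0x9bffebf8 is 0x437ffd7f, which matches the dump; with the default 0x20000000 it would be 0x337ffd7f.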

[Bug target/61577] [4.9.0] can't compile on hp-ux v3 ia64

2019-07-08 Thread elowe at elowe dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61577

--- Comment #63 from EML  ---
Sorry, I didn't undo the patch completely.

I made a very simple change:

--- ia64.c.orig 2019-07-08 14:43:33 +
+++ ia64.c  2019-07-05 16:46:24 +
@@ -1137,7 +1137,7 @@
 emit_insn (gen_load_fptr (dest, src));
   else if (sdata_symbolic_operand (src, VOIDmode))
 emit_insn (gen_load_gprel (dest, src));
-  else if (local_symbolic_operand64 (src, VOIDmode))
+  else if (local_symbolic_operand64 (src, VOIDmode) && !TARGET_HPUX)
 {
   /* We want to use @gprel rather than @ltoff relocations for local
 symbols:

Which I think has the same effect as disabling it in predicate. I'm happy with
either approach.

[Bug tree-optimization/91114] New: [10 Regression] ICE in vect_analyze_loop, at tree-vect-loop.c:2415

2019-07-08 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91114

Bug ID: 91114
   Summary: [10 Regression] ICE in vect_analyze_loop, at
tree-vect-loop.c:2415
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: ice-checking, ice-on-valid-code, openmp
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: asolokha at gmx dot com
  Target Milestone: ---
Target: x86_64-unknown-linux-gnu

gcc-10.0.0-alpha20190707 snapshot (r273184) ICEs when compiling the following
testcase w/ -O1 -fopenmp-simd:

void
ne (double *zu)
{
  int h3;

#pragma omp simd simdlen (4)
  for (h3 = 0; h3 < 4; ++h3)
zu[h3] = 0;
}

% x86_64-unknown-linux-gnu-gcc-10.0.0-alpha20190707 -O1 -fopenmp-simd -c
hnkztevu.c
during GIMPLE pass: vect
hnkztevu.c: In function 'ne':
hnkztevu.c:2:1: internal compiler error: in vect_analyze_loop, at
tree-vect-loop.c:2415
2 | ne (double *zu)
  | ^~
0x6fe6b2 vect_analyze_loop(loop*, _loop_vec_info*, vec_info_shared*)
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/tree-vect-loop.c:2415
0xfc5495 try_vectorize_loop_1
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/tree-vectorizer.c:886
0xfc613f vectorize_loops()
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/tree-vectorizer.c:1114

[Bug tree-optimization/91010] ICE: Segmentation fault (in location_wrapper_p)

2019-07-08 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91010

--- Comment #4 from Arseny Solokha  ---
Can this PR be closed now?

[Bug rtl-optimization/88233] combine fails to merge insns leaving unneeded reg copies

2019-07-08 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88233

--- Comment #3 from Segher Boessenkool  ---
Author: segher
Date: Mon Jul  8 17:35:12 2019
New Revision: 273240

URL: https://gcc.gnu.org/viewcvs?rev=273240&root=gcc&view=rev
Log:
subreg: Add -fsplit-wide-types-early (PR88233)

Currently the second lower-subreg pass is run right before RA.  This
is much too late to be very useful.  At least for targets that do not
have RTL patterns for operations on multi-register modes it is a lot
better to split patterns earlier, before combine and all related
passes.

This adds an option -fsplit-wide-types-early that does that, and
enables it by default for rs6000.


PR rtl-optimization/88233
* common.opt (fsplit-wide-types-early): New option.
* common/config/rs6000/rs6000-common.c
(rs6000_option_optimization_table): Add OPT_fsplit_wide_types_early for
OPT_LEVELS_ALL.
* doc/invoke.texi (Optimization Options): Add -fsplit-wide-types-early.
* lower-subreg.c (pass_lower_subreg2::gate): Add test for
flag_split_wide_types_early.
(pass_data_lower_subreg3): New.
(pass_lower_subreg3): New.
(make_pass_lower_subreg3): New.
* passes.def (pass_lower_subreg2): Move after the loop passes.
(pass_lower_subreg3): New, inserted where pass_lower_subreg2 was.
* tree-pass.h (make_pass_lower_subreg2): Move up, to its new place in
the pass pipeline; its previous place is taken by ...
(make_pass_lower_subreg3): ... this.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/common.opt
trunk/gcc/common/config/rs6000/rs6000-common.c
trunk/gcc/doc/invoke.texi
trunk/gcc/lower-subreg.c
trunk/gcc/passes.def
trunk/gcc/tree-pass.h
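
To illustrate what "splitting wide types" means at the source level: on a target whose registers are 32 bits wide, a 64-bit bitwise operation can be carried out as two independent 32-bit operations. The sketch below writes both forms by hand; the function names are hypothetical and it says nothing about what the lower-subreg pass actually emits.

```c
#include <stdint.h>

/* A 64-bit operation as written in the source.  */
uint64_t and_or64 (uint64_t a, uint64_t b, uint64_t m)
{
  return (a & m) | b;
}

/* The same operation on two independent 32-bit halves.  Once split,
   each half can be optimized and register-allocated on its own.  */
uint64_t and_or64_split (uint64_t a, uint64_t b, uint64_t m)
{
  uint32_t lo = ((uint32_t) a & (uint32_t) m) | (uint32_t) b;
  uint32_t hi = ((uint32_t) (a >> 32) & (uint32_t) (m >> 32))
                | (uint32_t) (b >> 32);
  return ((uint64_t) hi << 32) | lo;
}
```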

[Bug testsuite/78529] gcc.c-torture/execute/builtins/strcat-chk.c failed with lto/O2

2019-07-08 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78529

Wilco  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |wilco at gcc dot gnu.org
         Resolution|---                         |FIXED
           Assignee|unassigned at gcc dot gnu.org  |wilco at gcc dot gnu.org

--- Comment #40 from Wilco  ---
Fixed

[Bug testsuite/78529] gcc.c-torture/execute/builtins/strcat-chk.c failed with lto/O2

2019-07-08 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78529

--- Comment #39 from Wilco  ---
Author: wilco
Date: Mon Jul  8 17:02:35 2019
New Revision: 273238

URL: https://gcc.gnu.org/viewcvs?rev=273238&root=gcc&view=rev
Log:
Turn off ipa-ra in builtins test (PR91059)

The gcc.c-torture/execute/builtins/lib directory contains a reimplementation
of many C library string functions, which causes non-trivial register
allocation bugs with LTO and static linked libraries.  To fix this
long-standing test issue, turn off ipa-ra which avoids the register corruption
across calls.  All builtin torture tests now pass on aarch64-none-elf.
Committed as obvious.

testsuite/
PR testsuite/91059
PR testsuite/78529
* gcc.c-torture/execute/builtins/builtins.exp: Add -fno-ipa-ra.

Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp

[Bug c/91092] Error on implicit function declarations by default

2019-07-08 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91092

Segher Boessenkool  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-07-08
     Ever confirmed|0                           |1

--- Comment #12 from Segher Boessenkool  ---
Given the above, I don't think it can ever be ready in time for GCC 10.
But, confirmed.

[Bug testsuite/91059] [10 regression] gcc.c-torture/execute/builtins/snprintf-chk.c fails on aarch64-elf since r272843

2019-07-08 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91059

--- Comment #5 from Wilco  ---
Author: wilco
Date: Mon Jul  8 17:02:35 2019
New Revision: 273238

URL: https://gcc.gnu.org/viewcvs?rev=273238&root=gcc&view=rev
Log:
Turn off ipa-ra in builtins test (PR91059)

The gcc.c-torture/execute/builtins/lib directory contains a reimplementation
of many C library string functions, which causes non-trivial register
allocation bugs with LTO and static linked libraries.  To fix this
long-standing test issue, turn off ipa-ra which avoids the register corruption
across calls.  All builtin torture tests now pass on aarch64-none-elf.
Committed as obvious.

testsuite/
PR testsuite/91059
PR testsuite/78529
* gcc.c-torture/execute/builtins/builtins.exp: Add -fno-ipa-ra.

Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp

[Bug testsuite/91059] [10 regression] gcc.c-torture/execute/builtins/snprintf-chk.c fails on aarch64-elf since r272843

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91059

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #4 from Richard Biener  ---
duplicate then

*** This bug has been marked as a duplicate of bug 78529 ***

[Bug testsuite/78529] gcc.c-torture/execute/builtins/strcat-chk.c failed with lto/O2

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78529

Richard Biener  changed:

   What|Removed |Added

 CC||clyon at gcc dot gnu.org

--- Comment #38 from Richard Biener  ---
*** Bug 91059 has been marked as a duplicate of this bug. ***

[Bug target/61577] [4.9.0] can't compile on hp-ux v3 ia64

2019-07-08 Thread dave.anglin at bell dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61577

--- Comment #62 from dave.anglin at bell dot net ---
On 2019-07-08 12:22 p.m., elowe at elowe dot com wrote:
> When I remove that gprel patch - the 64bit stage 1 compiler is able to compile
> hello world, islower, as well as all the other "conftest" programs
> successfully. It can compile libstdc++ as well (some duplicate symbols
> however).
I doubt removing the gprel patch is an acceptable solution as it fixed a bug on
Linux.  A better solution is to disable the local_symbolic_operand64 predicate
on hpux.

That should fix hello world.  Then, we can move to other issues.

[Bug testsuite/91059] [10 regression] gcc.c-torture/execute/builtins/snprintf-chk.c fails on aarch64-elf since r272843

2019-07-08 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91059

--- Comment #3 from Wilco  ---
Confirmed it's the same memset register corruption issue. The fix is trivial:
add -fno-ipa-ra.

[Bug c/91092] Error on implicit function declarations by default

2019-07-08 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91092

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #11 from Rich Felker  ---
I'm strongly in favor of fixing this, but the configure situation is a mess.
Doing this needs both an active project to fix configure scripts (starting with
upstream autoconf/gnulib ones, and at least in the past, the maintainers'
misguided opinions that testing for symbol presence with missing or invalid
declarations was a valid configure test! see: https://ewontfix.com/13/) and
probably one of the workarounds described above (detecting configure use and
not erroring out in that case?).

[Bug target/61577] [4.9.0] can't compile on hp-ux v3 ia64

2019-07-08 Thread elowe at elowe dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61577

--- Comment #61 from EML  ---
Sorry, perhaps I have confused the situation.

I have already patched my compiler to remove the gprel in both 32 and 64.

That gprel patch breaks things in both 32 and 64. I'm reasonably convinced the
patch is wrong for HP-UX, so I'm moving forward with that assumption.

When I remove that gprel patch - the 64bit stage 1 compiler is able to compile
hello world, islower, as well as all the other "conftest" programs
successfully. It can compile libstdc++ as well (some duplicate symbols
however).

However, the 32-bit compiler does not work which I believe to be a pointer
swizzle issue.

I've confirmed the binary is 32bit as follows:

-bash-5.0$ file islower
islower:        ELF-32 executable object file - IA64

-bash-5.0$ elfdump -f islower

islower:

*** ELF Header ***

Class:                   ELF-32
Data:                    Big-endian
OS:                      HP-UX
ABI Version:             1
Type:                    EXEC
Machine:                 IPF
Version:                 1
Entry Addr:              0x40008b0
Program Hdr Offset:      0x34
Section Hdr Offset:      0x1104c
Flags:                   trapnil
Flags:                   big-endian PSR
Flags:                   IA-64
Elf Hdr Size:            0x34
Program Hdr Size:        0x20
Program Hdr Number:      12
Section Hdr Size:        0x28
Section Hdr Number:      43
Section Hdr String Idx:  42

[Bug c/89072] -Wall -Werror should be defaults

2019-07-08 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89072

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #2 from Rich Felker  ---
Just here to second that -Werror should never be the default and that it's
pretty much entirely wrong. -Werror is useful for imposing development policy
in a development environment you control. It's not at all okay for shipping
source that the user will compile in an environment you don't control.

[Bug target/61577] [4.9.0] can't compile on hp-ux v3 ia64

2019-07-08 Thread dave.anglin at bell dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61577

--- Comment #60 from dave.anglin at bell dot net ---
On 2019-07-08 12:07 a.m., elowe at elowe dot com wrote:
> If you insert the addp4 r14 = 0,r14 before that command (like gcc 4.9.3 does),
> the program compiles and runs correctly
It would be useful to do a regression search to determine the revision that
changed the above behavior.
>
> I'll upload the .s for "IsLower.c" - it's definitely a 32 bit executable, so
> the correct options are being passed around.
I'm not sure why you say the .s for "IsLower.c" is a 32-bit executable and that
the correct options are being passed around.  You haven't shown the assembler or
linker commands used to create the executable.  For applications like the hello
world program, there is very little difference between the 32 and 64-bit
assembler output generated by gcc (cc1).

I'm still trying to understand the problem with the gprel relocation.  It seems
to work in 64-bit but not in 32-bit.  While there might be issues with the
assembler or linker, you are probably correct that we need to swizzle pointers
with ILP32.

You could try adding something like the following to this hunk after the
emit_insn() line:

  else if (local_symbolic_operand64 (src, VOIDmode))
    {
      /* We want to use @gprel rather than @ltoff relocations for local
         symbols:
          - @gprel does not require dynamic linker
          - and does not use .sdata section
         https://gcc.gnu.org/bugzilla/60465 */
      emit_insn (gen_load_gprel64 (dest, src));
    }

if (TARGET_ILP32)
  {
    rtx tmp;
    tmp = gen_rtx_REG_offset (dest, ptr_mode, REGNO (dest),
                              byte_lowpart_offset (ptr_mode, GET_MODE (dest)));
    REG_POINTER (tmp) = 1;
    emit_insn (gen_ptr_extend (dest, tmp));
  }

Alternatively, you could try disabling the local_symbolic_operand64 predicate
in predicates.md:

(define_predicate "local_symbolic_operand64"
  (match_code "symbol_ref,const")
{
  switch (GET_CODE (op))
    {

Just add

  if (TARGET_ILP32)
    return false;

before the switch statement.
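
For readers unfamiliar with the term, the pointer swizzling discussed above can
be modelled in plain C.  This is only a sketch, assuming that "addp4 r = 0, r"
zero-extends the 32-bit pointer and copies its bits 31:30 into bits 62:61 of
the 64-bit result.  The helper names are hypothetical and are not part of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "addp4 r = 0, r" under the assumption stated above:
   zero-extend the ILP32 pointer and move its two region bits (31:30) into
   bits 62:61 of the 64-bit address.  */
static uint64_t
swizzle_ilp32_pointer (uint32_t p32)
{
  uint64_t region = (uint64_t) (p32 >> 30);	/* bits 31:30 of the pointer */
  return (region << 61) | (uint64_t) p32;
}

/* Small self-check; the entry address is the one from the elfdump output
   quoted earlier in this thread.  */
static void
swizzle_selfcheck (void)
{
  /* Region bits are zero, so the value is unchanged.  */
  assert (swizzle_ilp32_pointer (0x040008b0u) == 0x040008b0ull);
}
```

Under this model, an address whose top two bits are clear is unaffected, which
may be why a missing swizzle only shows up for some symbols.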

[Bug c/91113] New: add declare_simd_variant attribute support

2019-07-08 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91113

Bug ID: 91113
   Summary: add declare_simd_variant attribute support
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

to declare vector functions on aarch64 for one simd architecture only,
support for the openmp 5.0 declare variant syntax is required, but
full support for the omp declare variant pragma is excessive.
(for the aarch64 use-case, see user defined vector functions in
https://developer.arm.com/docs/101129/latest )

I suggest introducing an attribute in gcc that can handle a subset
of omp declare variant pragma and works in c and fortran declarations
for declare simd functions.

I think the syntax and semantics for the attribute should follow
the proposal for clang (without the clang_ prefix):
http://lists.llvm.org/pipermail/llvm-dev/2019-June/132987.html

```
declare_simd_variant
  (, {, })

:= The name of a function variant that is a base
language identifier, or, for C++, a template-id.

 := , {, }

 := simdlen() | simdlen("scalable")

:= inbranch | notinbranch

 :=  
 | 
 |   | {,}

  := linear_ref(,)
  | linear_var(, )
  | linear_uval(, )
  | linear(, )

 :=  | 

 := uniform()

   := align(, )

 := Name of a parameter in the scalar function declaration/definition

 := ... | -2 | -1 | 1 | 2 | ...

 := 1 | 2 | 3 | ...

 := {}{,} {}

 := isa(target-specific-value)

 := arch(target-specific-value)
```

example usage:
```
__attribute__((declare_simd_variant("vfoo", simdlen(2), notinbranch,
isa("simd"))))
double foo(double x);

float64x2_t vfoo(float64x2_t vx);
```

should be equivalent to the openmp 5.0 code
```
#pragma omp declare variant(vfoo) \
  match(construct={simd(simdlen(2), notinbranch)}, device={isa("simd")})
double foo(double x);

float64x2_t vfoo(float64x2_t vx);
```

[Bug sanitizer/91101] Possible performance regression in libasan with detect_stack_use_after_return=1

2019-07-08 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91101

--- Comment #16 from Jakub Jelinek  ---
(In reply to Jakub Jelinek from comment #15)
> Seems systemd abuses compound literals even in cases where they make no
> sense, perhaps one of those in a short function like that is no longer
> optimized away completely and that is why it triggers all the
> __asan_malloc_0 calls in there where formerly it got away without that.
> E.g.
> #define assert_cc(expr) \
> struct CONCATENATE(_assert_struct_, __COUNTER__) {  \
> char x[(expr) ? 0 : -1];\
> };
> doesn't make any sense to me, why not say
> do { extern char CONCATENATE(_assert_var_, __COUNTER__) [(expr) ? 0 : -1]; }
> while (0)
> instead?
> The IN_SET macro has another compound literal:
> assert_cc((sizeof((long double[]){__VA_ARGS__})/sizeof(long double)) <= 20);
> It would surprise me if you can't do such counting without resorting to
> compound literals.

As IN_SET is turning the __VA_ARGS__ arguments into case N:, those have to be
constant expressions, so you could say replace IN_SET's
assert_cc((sizeof((long double[]){__VA_ARGS__})/sizeof(long double)) <= 20);
with static long double __assert_in_set __attribute__((__unused__)) [] = {
__VA_ARGS__ };
assert_cc((sizeof (__assert_in_set)/sizeof(long double)) <= 20);
or similar, this is in its own scope, so doesn't need to use any __COUNTER__
etc.
With -O1 and above it would be surely optimized away, and with -O0 it would be
much less costly for asan.
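
A minimal sketch of that idea, for illustration only: the macro and function
names below (COUNT_CASES, count_three_cases) are hypothetical and are not
systemd's, and static_assert from <assert.h> stands in for assert_cc.  The
point is that a static array initialized from __VA_ARGS__ gives the same
compile-time count as the compound literal without creating an automatic
object that asan has to track.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the counting part of IN_SET: the arguments must
   be constant expressions, so they can initialize a static array.  The array
   lives in its own scope, so no __COUNTER__ is needed.  */
#define COUNT_CASES(out, ...)                                            \
  do {                                                                   \
    static long double in_set_args_[] __attribute__((__unused__)) =      \
      { __VA_ARGS__ };                                                   \
    static_assert (sizeof (in_set_args_) / sizeof (long double) <= 20,   \
                   "at most 20 values are supported");                   \
    (out) = sizeof (in_set_args_) / sizeof (long double);                \
  } while (0)

/* Example use: counts the three constants passed to the macro.  */
static size_t
count_three_cases (void)
{
  size_t n;
  COUNT_CASES (n, 1, 2, 3);
  return n;
}
```

Because the array has static storage duration, nothing is placed in the
function's stack frame for it, even at -O0.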

[Bug target/91059] [10 regression] gcc.c-torture/execute/builtins/snprintf-chk.c fails on aarch64-elf since r272843

2019-07-08 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91059

Wilco  changed:

   What|Removed |Added

 CC||wilco at gcc dot gnu.org

--- Comment #2 from Wilco  ---
(In reply to Richard Biener from comment #1)
> Likely target issue - please aarch64 folks investigate first.

I'll have a look, but I bet it's PR78529 again since failures only happen with
LTO and static linking with newlib.

[Bug sanitizer/91101] Possible performance regression in libasan with detect_stack_use_after_return=1

2019-07-08 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91101

--- Comment #15 from Jakub Jelinek  ---
Seems systemd abuses compound literals even in cases where they make no sense,
perhaps one of those in a short function like that is no longer optimized away
completely and that is why it triggers all the __asan_malloc_0 calls in there
where formerly it got away without that.
E.g.
#define assert_cc(expr) \
struct CONCATENATE(_assert_struct_, __COUNTER__) {  \
char x[(expr) ? 0 : -1];\
};
doesn't make any sense to me, why not say
do { extern char CONCATENATE(_assert_var_, __COUNTER__) [(expr) ? 0 : -1]; }
while (0)
instead?
The IN_SET macro has another compound literal:
assert_cc((sizeof((long double[]){__VA_ARGS__})/sizeof(long double)) <= 20);
It would surprise me if you can't do such counting without resorting to
compound literals.

[Bug sanitizer/91101] Possible performance regression in libasan with detect_stack_use_after_return=1

2019-07-08 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91101

Martin Liška  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |INVALID

--- Comment #14 from Martin Liška  ---
Ahh, I've got it. systemd is built in a configuration without any
optimization level! Please use -O2, that should speed it up significantly.

[Bug sanitizer/91101] Possible performance regression in libasan with detect_stack_use_after_return=1

2019-07-08 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91101

--- Comment #13 from Martin Liška  ---
And the stack difference is:

Before:

;; Function categorize_eol (categorize_eol, funcdef_no=127, decl_uid=8513,
cgraph_uid=127, symbol_order=127)

categorize_eol (char c, ReadLineFlags flags)
{
  _Bool _found;
  EndOfLineMarker D.9001;
  _Bool D.8520;
  _Bool _1;
  EndOfLineMarker _3;
  _Bool _7;
  EndOfLineMarker _9;
  EndOfLineMarker _10;
  EndOfLineMarker _11;
  EndOfLineMarker _12;

   :
  _found_4 = 0;
  if (flags_5(D) == 1)
goto ; [INV]
  else
goto ; [INV]
...

After:

;; Function categorize_eol (categorize_eol, funcdef_no=127, decl_uid=8513,
cgraph_uid=127, symbol_order=127)

categorize_eol (char c, ReadLineFlags flags)
{
  long double D.8516[1] = {1.0e+0}; <--- This stack variable.
  _Bool _found;
  EndOfLineMarker D.9001;
  _Bool D.8520;
  _Bool _1;
  EndOfLineMarker _3;
  _Bool _7;
  EndOfLineMarker _9;
  EndOfLineMarker _10;
  EndOfLineMarker _11;
  EndOfLineMarker _12;
...

[Bug sanitizer/91101] Possible performance regression in libasan with detect_stack_use_after_return=1

2019-07-08 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91101

--- Comment #12 from Martin Liška  ---
So the suspected allocation that happens is:

#0  0x7723abb2 in __asan::FakeStack::Allocate
(real_stack=140737488344072, class_id=0, stack_size_log=20,
this=0x725f7000) at ../../../../libsanitizer/asan/asan_fake_stack.cc:103
#1  __asan::OnMalloc (size=<optimized out>, class_id=0) at
../../../../libsanitizer/asan/asan_fake_stack.cc:208
#2  __asan_stack_malloc_0 (size=<optimized out>) at
../../../../libsanitizer/asan/asan_fake_stack.cc:234
#3  0x76b5e6c5 in categorize_eol (c=120 'x', flags=(unknown: 0)) at
../src/basic/fileio.c:759
#4  0x76b5eb01 in read_line_full (f=0x61603f80, limit=1048576,
flags=(unknown: 0), ret=0x7290b760) at ../src/basic/fileio.c:833
#5  0x76a25be6 in read_line (f=0x61603f80, limit=1048576,
ret=0x7290b760) at ../src/basic/fileio.h:90
#6  0x76a2818f in config_parse (unit=0x0, filename=0x7290b320
"/tmp/test-conf-parser.gVYMCp", f=0x61603f80, sections=0x60b300 "Section",
lookup=0x4020a0 , table=0x7290b2a0,
flags=CONFIG_PARSE_WARN, userdata=0x0)
at ../src/shared/conf-parser.c:309
#7  0x00404967 in test_config_parse (i=15, s=0x409d20
"[Section]\nsetting1=", 'x' ...) at
../src/test/test-conf-parser.c:334
#8  0x00404ef0 in main (argc=1, argv=0x7fffdc58) at
../src/test/test-conf-parser.c:392

[Bug sanitizer/91101] Possible performance regression in libasan with detect_stack_use_after_return=1

2019-07-08 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91101

--- Comment #11 from Martin Liška  ---
If I apply the following patch:

diff --git a/libsanitizer/asan/asan_fake_stack.cc
b/libsanitizer/asan/asan_fake_stack.cc
index 3140f9a2aeb..2034769161e 100644
--- a/libsanitizer/asan/asan_fake_stack.cc
+++ b/libsanitizer/asan/asan_fake_stack.cc
@@ -198,6 +198,9 @@ static FakeStack *GetFakeStackFast() {
 }

 ALWAYS_INLINE uptr OnMalloc(uptr class_id, uptr size) {
+  VReport(1, "T%d: OnMalloc called for size: %d\n",
+  GetCurrentTidOrInvalid(), size);
+
   FakeStack *fs = GetFakeStackFast();
   if (!fs) return 0;
   uptr local_stack;

I see the number of calls to the function jump from 15381 to 2127789, and the
additional calls are 64-byte allocations:

OnMalloc called for size: 64

[Bug c++/91112] [8 Regression] Bad error message for virtual function of a template class. Wrong "required from here" line number

2019-07-08 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91112

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2019-07-08
 Ever confirmed|0   |1

--- Comment #1 from Jonathan Wakely  ---
Please provide the code, not URLs (as https://gcc.gnu.org/bugs/ requests).

[Bug sanitizer/91101] Possible performance regression in libasan with detect_stack_use_after_return=1

2019-07-08 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91101

--- Comment #10 from Martin Liška  ---
The issue is that the __asan_stack_malloc_0 function is very high in the perf
profile:

# Overhead  Command          Shared Object             Symbol
# ........  ...............  ........................  .........................
#
    91.97%  test-conf-parse  libasan.so.5.0.0          [.] __asan_stack_malloc_0
     7.00%  test-conf-parse  libsystemd-shared-242.so  [.] config_parse
     0.12%  test-conf-parse  libsystemd-shared-242.so  [.] read_line_full
     0.11%  test-conf-parse  libc-2.29.so              [.] _IO_getc
     0.10%  test-conf-parse  libc-2.29.so              [.] __strlen_avx2
     0.08%  test-conf-parse  libsystemd-shared-242.so  [.] safe_fgetc
     0.07%  test-conf-parse  libsystemd-shared-242.so  [.] categorize_eol

perf annotate says:
 :   Disassembly of section .text:
 :
 :   00034a00 <__asan_stack_malloc_0>:
 :   __asan_stack_malloc_0():
         : extern "C" SANITIZER_INTERFACE_ATTRIBUTE void __asan_stack_free_##class_id(  \
         : uptr ptr, uptr size) {                                                       \
         :   OnFree(ptr, class_id, size);                                               \
         : }
...

         :   _ZN6__asan9FakeStack8AllocateEmmm():
         :   GC(real_stack);
    0.00 :   34b30:   mov    %rbp,%rsi
    0.00 :   34b33:   mov    %rax,%rdi
    0.00 :   34b36:   callq  34800 <__asan::FakeStack::GC(unsigned long)>
    0.00 :   34b3b:   jmpq   34a2c <__asan_stack_malloc_0+0x2c>
         : for (int i = 0; i < num_iter; i++) {
    0.00 :   34b40:   xor    %ecx,%ecx
    0.12 :   34b42:   add    $0x1,%ecx
   17.35 :   34b45:   cmp    %ecx,%r8d
   17.95 :   34b48:   je     34b70 <__asan_stack_malloc_0+0x170>
         :   uptr pos = ModuloNumberOfFrames(stack_size_log, class_id, hint_position++);
    0.00 :   34b4a:   mov    %rdx,%rax
    0.01 :   34b4d:   add    $0x1,%rdx
         :   _ZN6__asan9FakeStack20ModuloNumberOfFramesEmmm():
         :   return n & (NumberOfFrames(stack_size_log, class_id) - 1);
    0.01 :   34b51:   and    %rsi,%rax
         :   _ZN6__asan9FakeStack8AllocateEmmm():
    0.03 :   34b54:   mov    %rdx,(%rbx)
         :   if (flags[pos]) continue;
    0.10 :   34b57:   lea    0x1000(%rbx,%rax,1),%rdi
   31.71 :   34b5f:   cmpb   $0x0,(%rdi)
   32.65 :   34b62:   je     34a72 <__asan_stack_malloc_0+0x72>
    0.00 :   34b68:   jmp    34b42 <__asan_stack_malloc_0+0x142>
    0.00 :   34b6a:   nopw   0x0(%rax,%rax,1)
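
The hot loop in the annotation above scans a ring of per-frame flags for a
free slot.  A much simplified model of that scan, written as a sketch rather
than as libsanitizer's actual code (struct fake_stack_model, NUM_FRAMES and
model_allocate are hypothetical names), shows why the cost grows with the
number of frames still marked as in use:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FRAMES 8		/* must be a power of two for the mask below */

/* Simplified stand-in for the fake stack of one size class.  */
struct fake_stack_model
{
  unsigned char flags[NUM_FRAMES];	/* non-zero: frame is in use */
  size_t hint_position;			/* where the next scan starts */
};

/* Scan at most NUM_FRAMES slots starting at the hint.  Returns the index of
   the frame that was claimed, or -1 if every frame is in use.  *PROBES
   receives the number of flags that had to be inspected.  */
static int
model_allocate (struct fake_stack_model *fs, size_t *probes)
{
  assert (fs != NULL && probes != NULL);
  *probes = 0;
  for (size_t i = 0; i < NUM_FRAMES; i++)
    {
      size_t pos = fs->hint_position++ & (NUM_FRAMES - 1);
      (*probes)++;
      if (fs->flags[pos])
	continue;
      fs->flags[pos] = 1;
      return (int) pos;
    }
  return -1;
}
```

With an empty ring the first probe succeeds, but when most frames are still
marked as used each allocation has to walk past all of them, which matches the
time spent in the cmpb/je pair in the profile.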

[Bug sanitizer/91101] Possible performance regression in libasan with detect_stack_use_after_return=1

2019-07-08 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91101

--- Comment #9 from Martin Liška  ---
Started with r259641.

[Bug target/90712] [10 regression] gcc.dg/rtl/aarch64/subs_adds_sp.c fails with ICE

2019-07-08 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90712

Wilco  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||wilco at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #2 from Wilco  ---
Fixed

[Bug libfortran/91030] Poor performance of I/O -fconvert=big-endian

2019-07-08 Thread jb at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91030

--- Comment #39 from Janne Blomqvist  ---
Now, with the fixed benchmark in the previous comment, on a Lustre (version 2.5)
system I get:

Test using 25000 bytes
Block size of file system: 4096
bs =       1024, 53.27 MiB/s
bs =       2048, 73.99 MiB/s
bs =       4096, 222.41 MiB/s
bs =       8192, 351.38 MiB/s
bs =      16384, 483.86 MiB/s
bs =      32768, 583.76 MiB/s
bs =      65536, 677.11 MiB/s
bs =     131072, 748.60 MiB/s
bs =     262144, 700.69 MiB/s
bs =     524288, 811.76 MiB/s
bs =    1048576, 1032.99 MiB/s
bs =    2097152, 1034.03 MiB/s
bs =    4194304, 1063.74 MiB/s
bs =    8388608, 1030.15 MiB/s
bs =   16777216, 1084.82 MiB/s
bs =   33554432, 1067.05 MiB/s
bs =   67108864, 1063.79 MiB/s


On the same system, on an NFS filesystem connected with Infiniband I get:

Test using 25000 bytes
Block size of file system: 1048576
bs =       1024, 301.41 MiB/s
bs =       2048, 351.51 MiB/s
bs =       4096, 471.39 MiB/s
bs =       8192, 444.61 MiB/s
bs =      16384, 510.88 MiB/s
bs =      32768, 527.99 MiB/s
bs =      65536, 516.57 MiB/s
bs =     131072, 481.38 MiB/s
bs =     262144, 514.29 MiB/s
bs =     524288, 462.06 MiB/s
bs =    1048576, 528.30 MiB/s
bs =    2097152, 526.76 MiB/s
bs =    4194304, 501.09 MiB/s
bs =    8388608, 493.61 MiB/s
bs =   16777216, 550.24 MiB/s
bs =   33554432, 532.20 MiB/s
bs =   67108864, 532.82 MiB/s


So for Lustre, a buffer size bigger than the current 8 kB at least seems
justified.  While Lustre sees improvements all the way to a 1 MB buffer size,
such large buffers by default seem a bit excessive.

[Bug libfortran/91030] Poor performance of I/O -fconvert=big-endian

2019-07-08 Thread jb at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91030

--- Comment #38 from Janne Blomqvist  ---
First, I think there's a bug in the benchmark in comment #c20. It writes
blocksize * sizeof(double), but then advances only blocksize for each iteration
of the loop. Fixed version writing just bytes below:

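#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/statvfs.h>

/* Wall-clock time in seconds.  */
double walltime (void)
{
  struct timeval TV;
  double elapsed;
  gettimeofday(&TV, NULL);
  elapsed = (double) TV.tv_sec + 1.0e-6*((double) TV.tv_usec);
  return elapsed;
}

#define NAME "out.dat"
#define N 25000

int main()
{
  int fd;
  unsigned char *p, *w;
  long i, size, blocksize, left, to_write;
  int bits;
  double t1, t2;
  struct statvfs buf;

  printf ("Test using %ld bytes\n", (long) N);
  statvfs (".", &buf);
  printf ("Block size of file system: %ld\n", (long) buf.f_bsize);

  p = malloc(N * sizeof (*p));
  for (i=0; i<N; i++)
    p[i] = i;

  /* The text between the loop above and "while (left > 0)" was lost in the
     archive.  The loop header and setup below are an assumption, chosen to
     match the printed results (block sizes 1024 .. 67108864, doubling).  */
  for (bits = 10; bits <= 26; bits++)
    {
      blocksize = 1L << bits;
      printf ("bs = %10ld, ", blocksize);
      fd = open (NAME, O_WRONLY|O_TRUNC|O_CREAT, 0666);
      left = N;
      w = p;
      t1 = walltime ();
      while (left > 0)
	{
	  if (left >= blocksize)
	    to_write = blocksize;
	  else
	    to_write = left;

	  /* Write only what is left, so the final block does not read
	     past the end of the buffer.  */
	  write (fd, w, to_write);
	  w += to_write;
	  left -= to_write;
	}
      close (fd);
      t2 = walltime ();
      printf ("%.2f MiB/s\n", N / (t2-t1) / 1048576);
    }
  free (p);
  unlink (NAME);

  return 0;
}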

[Bug c++/91112] New: Bad error message for virtual function of a template class. Wrong "required from here" line number

2019-07-08 Thread ivan.kharpalev at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91112

Bug ID: 91112
   Summary: Bad error message for virtual function of a template
class. Wrong "required from here" line number
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ivan.kharpalev at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/orxMIj
I expect to see the number of the line that triggers the instantiation.
gcc-7 and clang show it.


P.S.
It does not even show the line of the bad method invocation if the method was
called via a base class pointer.
https://godbolt.org/z/-SN5n5

It only shows the call line for the class itself
https://godbolt.org/z/kmaUQt

[Bug lto/90990] [10 Regression] ICE: error: ‘component_ref’ LHS in clobber statement

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90990

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Richard Biener  ---
Honza installed a patch.

2019-07-02  Jan Hubicka  

* tree-inline.c (remap_gimple_stmt): Do not substitute handled components
to clobber of return value.

[Bug target/91059] [10 regression] gcc.c-torture/execute/builtins/snprintf-chk.c fails on aarch64-elf since r272843

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91059

Richard Biener  changed:

   What|Removed |Added

  Component|tree-optimization   |target

--- Comment #1 from Richard Biener  ---
Likely target issue - please aarch64 folks investigate first.

[Bug tree-optimization/83518] [8/9 Regression] Missing optimization: useless instructions should be dropped

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to work||10.0
 Resolution|--- |FIXED
Summary|[8/9/10 Regression] Missing |[8/9 Regression] Missing
   |optimization: useless   |optimization: useless
   |instructions should be  |instructions should be
   |dropped |dropped
  Known to fail||8.3.0, 9.1.0

--- Comment #11 from Richard Biener  ---
Fixed on trunk, not something for backporting.

[Bug tree-optimization/91108] [8/9/10 Regression] Fails to pun through unions

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91108

--- Comment #3 from Richard Biener  ---
Author: rguenth
Date: Mon Jul  8 11:48:48 2019
New Revision: 273233

URL: https://gcc.gnu.org/viewcvs?rev=273233&root=gcc&view=rev
Log:
2019-07-08  Richard Biener  

PR tree-optimization/91108
* tree-ssa-sccvn.c: Include builtins.h.
(vn_reference_lookup_3): Use only alignment constraints to
verify same-valued store disambiguation.

* gcc.dg/tree-ssa/pr91091-1.c: New testcase.
* gcc.dg/tree-ssa/ssa-fre-78.c: Likewise.

Added:
branches/gcc-9-branch/gcc/testsuite/gcc.dg/tree-ssa/pr91091-1.c
branches/gcc-9-branch/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-78.c
Modified:
branches/gcc-9-branch/gcc/ChangeLog
branches/gcc-9-branch/gcc/testsuite/ChangeLog
branches/gcc-9-branch/gcc/tree-ssa-sccvn.c

[Bug tree-optimization/91108] [8/9/10 Regression] Fails to pun through unions

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91108

--- Comment #2 from Richard Biener  ---
Author: rguenth
Date: Mon Jul  8 11:46:26 2019
New Revision: 273232

URL: https://gcc.gnu.org/viewcvs?rev=273232&root=gcc&view=rev
Log:
2019-07-08  Richard Biener  

PR tree-optimization/91108
* tree-ssa-sccvn.c: Include builtins.h.
(vn_reference_lookup_3): Use only alignment constraints to
verify same-valued store disambiguation.

* gcc.dg/tree-ssa/ssa-fre-61.c: Adjust back.
* gcc.dg/tree-ssa/ssa-fre-78.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-78.c
Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-61.c
trunk/gcc/tree-ssa-sccvn.c

[Bug c/91107] __attribute__((pure)) to function with non-const pointers

2019-07-08 Thread colomar.6.4.3 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91107

--- Comment #2 from Alejandro Colomar  ---
Technically it can modify globals as long as that doesn't affect the state of
the program, but in this case it is affecting the state of the program, so it
isn't a pure function.

Fair enough, then the bug claim is that GCC shouldn't allow __attribute__((pure))
on functions accepting non-const pointers.

--- Comment #3 from Alejandro Colomar  ---
Technically it can modify globals as long as that doesn't affect the state of
the program, but in this case it is affecting the state of the program, so it
isn't a pure function.

Fair enough, then the bug claim is that GCC shouldn't allow __attribute__((pure))
on functions accepting non-const pointers.
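
To make the distinction concrete, here is a small sketch with hypothetical
function names (count_nonzero, clear_first).  It relies on GCC's documented
meaning of the attribute: a pure function has no observable effect on program
state other than its return value, although it may read memory.

```c
#include <assert.h>
#include <stddef.h>

/* Fine as pure: only reads through the pointer, so repeated calls with
   unchanged memory may be merged by the compiler.  */
__attribute__((pure))
static size_t
count_nonzero (const int *a, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (a[i] != 0)
      count++;
  return count;
}

/* Not pure: writes through its non-const pointer, so it must not carry the
   attribute; a caller's later read of a[0] depends on this call.  */
static void
clear_first (int *a, size_t n)
{
  assert (n > 0);
  a[0] = 0;
}
```

A non-const pointer parameter does not by itself show that a function writes
through it, which is presumably why the attribute is left to the programmer's
judgment.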

[Bug middle-end/84877] Local stack copy of BLKmode parameter on the stack is not aligned when the requested alignment exceeds MAX_SUPPORTED_STACK_ALIGNMENT

2019-07-08 Thread dave.anglin at bell dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877

--- Comment #19 from dave.anglin at bell dot net ---
On 2019-07-07 8:39 p.m., amylaar at gcc dot gnu.org wrote:
> It seems suspicious that PREFERRED_STACK_BOUNDARY is smaller for TARGET_64BIT?
That's the way HP defined things.  The preferred stack boundary for 32-bit code
was larger than it needed to be.  Possibly, someone thought that making it cache
aligned would be good.
>
> Be this as it may, the problem for the 84877 testcase is not that the stack
> has insufficient alignment, but that the stack slot doesn't have an aligned
> offset.
>
> The alignment gets pruned in function.c:get_stack_local_alignment :
>
>   if (mode == BLKmode)
> alignment = BIGGEST_ALIGNMENT;
>
> I have attached a patch to preserve the alignment of the passed type for the
> case that the stack is already sufficiently aligned.
>
> To test the case where the stack is insufficiently aligned, for hppa we should
> use a different testcase with > 512 bit alignment of the type.

[Bug c++/91110] [10 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in cp_omp_mappable_type_1, at cp/decl2.c:1421

2019-07-08 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91110

--- Comment #2 from Jakub Jelinek  ---
Created attachment 46579
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46579&action=edit
gcc10-pr91110.patch

error_mark_node type doesn't have TYPE_MAIN_DECL, but more importantly,
error_mark_node on a type doesn't mean the type is incomplete, it means the
type is invalid, and some diagnostics should have been emitted already why it
is invalid.  So, IMNSHO we shouldn't emit any clarification messages in that
case.

[Bug inline-asm/91111] arm64 Linux kernel panics at boot due to unexpected register assignment in inline asm

2019-07-08 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Target||aarch64
 Status|UNCONFIRMED |NEW
  Known to work||10.0, 9.1.0
   Keywords||wrong-code
   Last reconfirmed||2019-07-08
 CC||ktkachov at gcc dot gnu.org
 Ever confirmed|0   |1
  Known to fail||6.5.0, 7.4.1, 8.3.1

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Hmm, I see this using x0 properly on GCC 9.1 and trunk but GCC 8 and earlier
use x1

[Bug c++/91110] [10 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in cp_omp_mappable_type_1, at cp/decl2.c:1421

2019-07-08 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91110

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2019-07-08
 CC||jakub at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
   Target Milestone|--- |10.0
 Ever confirmed|0   |1

--- Comment #1 from Jakub Jelinek  ---
Most likely caused by r273078.

[Bug target/91102] [9/10 Regression] aarch64 ICE on Linux kernel with -Os starting with r270266

2019-07-08 Thread stefan.kneifel at bluewin dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91102

--- Comment #6 from Stefan Kneifel  ---
It seems to fix the bug - at least the original problem (ICE during compiling
Linux kernel for aarch64 with -Os) is solved by this patch.

[Bug inline-asm/91111] New: arm64 Linux kernel panics at boot due to unexpected register assignment in inline asm

2019-07-08 Thread will.deacon at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111

Bug ID: 91111
   Summary: arm64 Linux kernel panics at boot due to unexpected
register assignment in inline asm
   Product: gcc
   Version: 8.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: inline-asm
  Assignee: unassigned at gcc dot gnu.org
  Reporter: will.deacon at arm dot com
  Target Milestone: ---

Created attachment 46578
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46578&action=edit
Output of -save-temps

When compiling the Linux kernel for arm64 with CONFIG_OPTIMIZE_INLINING=y
(which effectively removes the use of __attribute__((__always_inline__)) for
functions marked as inline), the atomic64 selftest fails due to a local
register variable being assigned to a different register from the one specified
when used in an inline asm block.

While I appreciate that we're treading on thin ice here, my reading of the
docs at:

 
https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables

suggests that this should work.

To be more precise, this kernel code:


static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
{
register long x0 asm ("x0") = (long)v;

asm volatile(ARM64_LSE_ATOMIC_INSN(
/* LL/SC */
__LL_SC_ATOMIC64(dec_if_positive)
__nops(6),
/* LSE atomics */
"1: ldr x30, %[v]\n"
"   subs    %[ret], x30, #1\n"
"   b.lt    2f\n"
"   casal   x30, %[ret], %[v]\n"
"   sub x30, x30, #1\n"
"   sub x30, x30, %[ret]\n"
"   cbnz    x30, 1b\n"
"2:")
: [ret] "+&r" (x0), [v] "+Q" (v->counter)
:
: __LL_SC_CLOBBERS, "cc", "memory");

return x0;
}


requires that %[ret] expands to register x0, whereas it is instead expanding to 
register x1. You can see this in the assembly code for the function:


.align  2
.type   arch_atomic64_dec_if_positive, %function
arch_atomic64_dec_if_positive:
.LVL0:
.LFB244:
.file 1 "./arch/arm64/include/asm/atomic_lse.h"
.loc 1 411 1 view -0
.cfi_startproc
.loc 1 412 2 view .LVU1
.loc 1 414 2 view .LVU2
.loc 1 411 1 is_stmt 0 view .LVU3
stp x29, x30, [sp, -16]!
.cfi_def_cfa_offset 16
.cfi_offset 29, -16
.cfi_offset 30, -8
.LVL1:
.loc 1 414 2 view .LVU4
mov x1, x0
.loc 1 411 1 view .LVU5
mov x29, sp
.loc 1 414 2 view .LVU6
#APP
// 414 "./arch/arm64/include/asm/atomic_lse.h" 1
.if 1 == 1
661:
bl  __ll_sc_arch_atomic64_dec_if_positive
.rept   6
nop
.endr

662:
.pushsection .altinstructions,"a"
 .word 661b - .
 .if 0 == 0
 .word 663f - .
 .else
 .word 0- .
 .endif
 .hword 5
 .byte 662b-661b
 .byte 664f-663f
.popsection
 .if 0 == 0
.pushsection .altinstr_replacement, "a"
663:
1:  ldr x30, [x0]
subs    x1, x30, #1
b.lt    2f
casal   x30, x1, [x0]
sub x30, x30, #1
sub x30, x30, x1
cbnz    x30, 1b
2:
664:
.popsection
.org    . - (664b-663b) + (662b-661b)
.org    . - (662b-661b) + (664b-663b)
.else
663:
664:
.endif
.endif

// 0 "" 2
.LVL2:
.loc 1 414 2 view .LVU7
#NO_APP
mov x0, x1
.LVL3:
.loc 1 431 2 is_stmt 1 view .LVU8
.loc 1 432 1 is_stmt 0 view .LVU9
ldp x29, x30, [sp], 16
.cfi_restore 30
.cfi_restore 29
.cfi_def_cfa_offset 0
ret
.cfi_endproc
.LFE244:
.size   arch_atomic64_dec_if_positive, .-arch_atomic64_dec_if_positive


I've attached the .i/.s files output by:

aarch64-linux-gnu-gcc -save-temps -Wp,-MD,lib/.atomic64_test.o.d  -nostdinc
-isystem
/home/will/system/aarch64/gcc-arm-8.3-2019.03-x86_64-aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/8.3.0/include
-I./arch/arm64/include -I./arch/arm64/include/generated  -I./include
-I./arch/arm64/include/uapi -I./arch/arm64/include/generated/uapi
-I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h
-include ./include/linux/compiler_types.h -D__KERNEL__ -mlittle-endian
-DKASAN_SHADOW_SCALE_SHIFT=3 -Wall -Wundef -Werror=strict-prototypes
-Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE
-Werror=implicit-function-declaration -Werror=implicit-int -Wno-format-security
-std=gnu89 -mgeneral-regs-only -DCONFIG_AS_LSE=1
-fno-asynchronous-unwind-tables -Wno-psabi -mabi=lp64
-DKASAN_SHADOW_SCALE_SHIFT=3 -fno-delete-null-pointer-checks -Wno-frame-address
-Wno-format-truncation -Wno-format-overflow -O2
--param=allow-store-data-races=0 -Wframe-larger-than=2048
-fstack-protector-strong -Wno-unused-but-set-variable
-Wno-unused-const-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls
-fno-var-tracking-assignments -g 
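
For readers without the kernel tree at hand, the pattern at issue can be reduced to a small sketch. This is not the kernel code: `add_one_in_fixed_reg` is a hypothetical helper, and the register names are only examples. The GCC manual states that the supported use of a local register variable is to pin an operand of an extended asm statement to a specific register, which is what the report relies on.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (not from the kernel): increments v inside an
   inline asm statement while the operand is pinned to a named register. */
long add_one_in_fixed_reg(long v)
{
    assert(v < LONG_MAX);   /* avoid signed overflow in the sketch */
#if defined(__aarch64__)
    /* The asm operand %0 is expected to expand to x0 here. */
    register long r __asm__("x0") = v;
    __asm__ __volatile__("add %0, %0, #1" : "+r"(r));
#elif defined(__x86_64__)
    /* Same idea on x86-64 (AT&T syntax); %0 is expected to be %rax. */
    register long r __asm__("rax") = v;
    __asm__ __volatile__("add $1, %0" : "+r"(r) : : "cc");
#else
    /* Fallback for other targets: plain C, no register pinning. */
    long r = v;
    r += 1;
#endif
    return r;
}
```

In the report above, the equivalent of `%0` expanded to x1 instead of x0 once the function was no longer forced inline, which is the behaviour being questioned.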

[Bug target/91103] AVX512 vector element extract uses more than 1 shuffle instruction; VALIGND can grab any element

2019-07-08 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91103

Jakub Jelinek  changed:

   What|Removed |Added

 CC||hjl.tools at gmail dot com,
   ||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
For the constant vector element extraction, it can be done say with:
--- gcc/config/i386/sse.md.jj   2019-07-06 23:55:51.617641994 +0200
+++ gcc/config/i386/sse.md  2019-07-08 12:23:13.315509840 +0200
@@ -9351,7 +9351,7 @@ (define_insn "avx512f_sgetexp")])

-(define_insn "_align"
+(define_insn "_align"
   [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
 (unspec:VI48_AVX512VL [(match_operand:VI48_AVX512VL 1
"register_operand" "v")
   (match_operand:VI48_AVX512VL 2
"nonimmediate_operand" "vm")
--- gcc/config/i386/i386-expand.c.jj    2019-07-04 00:18:37.067010375 +0200
+++ gcc/config/i386/i386-expand.c   2019-07-08 12:37:24.687562956 +0200
@@ -14827,6 +14827,14 @@ ix86_expand_vector_extract (bool mmx_ok,
   break;

 case E_V16SFmode:
+  if (elt > 12)
+   {
+ tmp = gen_reg_rtx (V16SImode);
+ vec = gen_lowpart (V16SImode, vec);
+ emit_insn (gen_avx512f_alignv16si (tmp, vec, vec, GEN_INT (elt)));
+ vec = gen_lowpart (V16SFmode, tmp);
+ elt = 0;
+   }
   tmp = gen_reg_rtx (V8SFmode);
   if (elt < 8)
emit_insn (gen_vec_extract_lo_v16sf (tmp, vec));
@@ -14836,6 +14844,14 @@ ix86_expand_vector_extract (bool mmx_ok,
   return;

 case E_V8DFmode:
+  if (elt >= 6)
+   {
+ tmp = gen_reg_rtx (V8DImode);
+ vec = gen_lowpart (V8DImode, vec);
+ emit_insn (gen_avx512f_alignv8di (tmp, vec, vec, GEN_INT (elt)));
+ vec = gen_lowpart (V8DFmode, tmp);
+ elt = 0;
+   }
   tmp = gen_reg_rtx (V4DFmode);
   if (elt < 4)
emit_insn (gen_vec_extract_lo_v8df (tmp, vec));
@@ -14845,6 +14861,13 @@ ix86_expand_vector_extract (bool mmx_ok,
   return;

 case E_V16SImode:
+  if (elt > 12)
+   {
+ tmp = gen_reg_rtx (V16SImode);
+ emit_insn (gen_avx512f_alignv16si (tmp, vec, vec, GEN_INT (elt)));
+ vec = tmp;
+ elt = 0;
+   }
   tmp = gen_reg_rtx (V8SImode);
   if (elt < 8)
emit_insn (gen_vec_extract_lo_v16si (tmp, vec));
@@ -14854,6 +14877,13 @@ ix86_expand_vector_extract (bool mmx_ok,
   return;

 case E_V8DImode:
+  if (elt >= 6)
+   {
+ tmp = gen_reg_rtx (V8DImode);
+ emit_insn (gen_avx512f_alignv8di (tmp, vec, vec, GEN_INT (elt)));
+ vec = tmp;
+ elt = 0;
+   }
   tmp = gen_reg_rtx (V4DImode);
   if (elt < 4)
emit_insn (gen_vec_extract_lo_v8di (tmp, vec));

The question is in which cases this is beneficial.  From a pure -Os POV,
valignd/valignq is one instruction, and integer extractions need a vmovd
afterwards.  So for 64-bit extraction it might also be useful for double [3]
and [5] (for long long it is two insns in both cases), and for 32-bit
extraction it is likely also shorter for float [5], [6], [7], [9], [10], [11],
[12], but not for int.
But I admit I have no idea how fast each variant is.
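
To make the idea behind the patch concrete, here is a scalar model of what valignd does when both source operands are the same register, as in the `gen_avx512f_alignv16si (tmp, vec, vec, GEN_INT (elt))` calls above. It is only a sketch for illustration; `valignd_same_src` and `extract_lane` are hypothetical names, and the model ignores masking.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 16 };   /* 16 x 32-bit lanes in a 512-bit vector */

/* Scalar model of valignd with src1 == src2: the concatenation of the two
   (identical) sources is shifted right by imm lanes, which amounts to
   rotating the vector so that lane imm ends up in lane 0. */
void valignd_same_src(uint32_t out[LANES], const uint32_t in[LANES],
                      unsigned imm)
{
    assert(imm < LANES);
    for (unsigned i = 0; i < LANES; ++i)
        out[i] = in[(i + imm) % LANES];
}

/* After the rotate, the requested element can be read from lane 0,
   which is why the patch sets elt = 0 afterwards. */
uint32_t extract_lane(const uint32_t vec[LANES], unsigned elt)
{
    uint32_t tmp[LANES];
    valignd_same_src(tmp, vec, elt);
    return tmp[0];
}
```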

[Bug c++/91110] New: [10 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in cp_omp_mappable_type_1, at cp/decl2.c:1421

2019-07-08 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91110

Bug ID: 91110
   Summary: [10 Regression] ICE: tree check: expected class
'type', have 'exceptional' (error_mark) in
cp_omp_mappable_type_1, at cp/decl2.c:1421
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: error-recovery, ice-on-invalid-code, openmp
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: asolokha at gmx dot com
  Target Milestone: ---

g++-10.0.0-alpha20190707 snapshot (r273184) ICEs when compiling the following
testcase derived from gcc/testsuite/gcc.dg/gomp/_Atomic-5.c w/ -fopenmp:

void
f1 (void)
{
  X int b[2];
  b[0] = 1;
  #pragma omp target map(to: b)
  ;
}

% g++-10.0.0-alpha20190707 -fopenmp -c e8gxtyxe.c
e8gxtyxe.c: In function 'void f1()':
e8gxtyxe.c:4:3: error: 'X' was not declared in this scope
4 |   X int b[2];
  |   ^
e8gxtyxe.c:5:3: error: 'b' was not declared in this scope
5 |   b[0] = 1;
  |   ^
e8gxtyxe.c:6:30: error: 'b' does not have a mappable type in 'map' clause
6 |   #pragma omp target map(to: b)
  |  ^
e8gxtyxe.c:6:32: internal compiler error: tree check: expected class 'type',
have 'exceptional' (error_mark) in cp_omp_mappable_type_1, at cp/decl2.c:1421
6 |   #pragma omp target map(to: b)
  |^
0x7d125e tree_class_check_failed(tree_node const*, tree_code_class, char
const*, int, char const*)
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/tree.c:9950
0x5f39fa tree_class_check(tree_node*, tree_code_class, char const*, int, char
const*)
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/tree.h:3340
0x5f39fa cp_omp_mappable_type_1
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/decl2.c:1421
0xa43c69 finish_omp_clauses(tree_node*, c_omp_region_type)
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/semantics.c:7241
0x9b3267 cp_parser_omp_all_clauses
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/parser.c:35735
0x9c4146 cp_parser_omp_target
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/parser.c:38918
0x99f583 cp_parser_pragma
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/parser.c:41352
0x9a76fd cp_parser_statement
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/parser.c:11279
0x9a8665 cp_parser_statement_seq_opt
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/parser.c:11667
0x9a8735 cp_parser_compound_statement
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/parser.c:11621
0x9c0cbc cp_parser_function_body
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/parser.c:22651
0x9c0cbc cp_parser_ctor_initializer_opt_and_function_body
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/parser.c:22702
0x9c15ad cp_parser_function_definition_after_declarator
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/parser.c:28016
0x9c23a3 cp_parser_function_definition_from_specifiers_and_declarator
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/parser.c:27932
0x9c23a3 cp_parser_init_declarator
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/parser.c:20288
0x9a4e7d cp_parser_simple_declaration
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/parser.c:13546
0x9c8822 cp_parser_declaration
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/parser.c:13243
0x9c8eb8 cp_parser_translation_unit
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/parser.c:4699
0x9c8eb8 c_parse_file()
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/cp/parser.c:41495
0xad25ec c_common_parse_file()
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190707/work/gcc-10-20190707/gcc/c-family/c-opts.c:1160

[Bug target/91106] internal compiler error: output_operand: invalid use of register 'frame'

2019-07-08 Thread gsocshubham at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91106

--- Comment #2 from Shubham Narlawar  ---
(In reply to Richard Biener from comment #1)
> Did you paste the correct reduced testcase?

Here is the original reduced test case obtained from Creduce - 

#pragma pack(1)
struct a {
  int b;
  char c
};
union {
  struct a b
} __attribute__((aligned(32), transparent_union)) d;
e() { f(d); }

I tried to fix the warnings by adding semicolons, data types and function
declarations wherever required.

[Bug middle-end/91105] internal compiler error: maximum number of generated reload insns per insn achieved (90)

2019-07-08 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91105

Uroš Bizjak  changed:

   What|Removed |Added

  Component|target  |middle-end
 Depends on||91001

--- Comment #2 from Uroš Bizjak  ---
Not a target problem.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91001
[Bug 91001] internal compiler error: in extract_insn, at recog.c:2310

[Bug middle-end/84877] Local stack copy of BLKmode parameter on the stack is not aligned when the requested alignment exceeds MAX_SUPPORTED_STACK_ALIGNMENT

2019-07-08 Thread amylaar at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877

Jorn Wolfgang Rennecke  changed:

   What|Removed |Added

  Attachment #46574|0   |1
is obsolete||

--- Comment #18 from Jorn Wolfgang Rennecke  ---
Created attachment 46577
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46577&action=edit
patch for aligned stack - but clamping max alignment at
MAX_SUPPORTED_STACK_ALIGNMENT

(In reply to r...@cebitec.uni-bielefeld.de from comment #17)
> > --- Comment #15 from Jorn Wolfgang Rennecke  ---
> > Created attachment 46574 [details]
> >   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46574&action=edit
> > patch for the case that the stack is sufficiently aligned
> [...]
> > I have attached a patch to preserve the alignment of the passed type for the
> > case that the stack is already sufficiently aligned.
> 
> This patch breaks sparc-sun-solaris2.11 bootstrap with an ICE while
> compiling stage2 function.c:
> 
> during RTL pass: expand
> /vol/gcc/src/hg/trunk/local/gcc/function.c: In function 'void
> assign_parm_find_data_types(assign_parm_data_all*, tree,
> assign_parm_data_one*)':
> /vol/gcc/src/hg/trunk/local/gcc/function.c:2426:49: internal compiler error:

This location doesn't make much sense to me.  Maybe some artefact from
optimized compilation and register windows?

> in assign_stack_temp_for_type, at function.c:880
>  2426 |   else if (targetm.calls.strict_argument_naming (all->args_so_far))
>   |~^~
> 0x11bc22f assign_stack_temp_for_type(machine_mode, poly_int<1u, long long>,
> tree_node*)
>   /vol/gcc/src/hg/trunk/local/gcc/function.c:878
> 0x11bc963 assign_temp(tree_node*, int, int)

This looks like the modified assert there has triggered.  It'd be interesting
to know why - i.e. what variable does want more alignment than
MAX_SUPPORTED_STACK_ALIGNMENT - during bootstrap?  Or is this a BLKmode
variable with less alignment than BIGGEST_ALIGNMENT?
User code could specify silly alignments which we couldn't provide with
ordinary allocation (using a fixed offset from sp/fp) and which could also
blow up the frame size too much if we tried, so it makes sense to clamp the
alignment to MAX_SUPPORTED_STACK_ALIGNMENT in get_stack_local_alignment.
The other side is that the code in assign_stack_temp_for_type seems to require
BIGGEST_ALIGNMENT for BLKmode; I'm not sure about assign_stack_local_1
slots.  It seems a bit wasteful, but trying to reduce waste of space in the
stack frame is really a different issue, so I also modified the patch to use
at least BIGGEST_ALIGNMENT for BLKmode so that it's (bug-?)compatible in that
aspect with the previous code - see attached modified patch.
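
The policy described in this comment can be summarised in a few lines. The sketch below is not GCC code and not the attached patch; the function name and the two constants are hypothetical stand-ins (real values are target-specific), chosen only to show the order of the two adjustments.

```c
#include <assert.h>

/* Hypothetical stand-ins, in bits; the real macros are per-target. */
enum {
    MAX_SUPPORTED_STACK_ALIGNMENT_BITS = 128,
    BIGGEST_ALIGNMENT_BITS = 64
};

/* Sketch of the described policy: BLKmode slots get at least
   BIGGEST_ALIGNMENT, and nothing exceeds MAX_SUPPORTED_STACK_ALIGNMENT. */
unsigned clamp_stack_alignment(unsigned requested_bits, int is_blkmode)
{
    /* Alignments are assumed to be non-zero powers of two. */
    assert(requested_bits != 0
           && (requested_bits & (requested_bits - 1)) == 0);

    unsigned align = requested_bits;
    if (is_blkmode && align < BIGGEST_ALIGNMENT_BITS)
        align = BIGGEST_ALIGNMENT_BITS;
    if (align > MAX_SUPPORTED_STACK_ALIGNMENT_BITS)
        align = MAX_SUPPORTED_STACK_ALIGNMENT_BITS;
    return align;
}
```

With these made-up values, a request for 4096-bit alignment is clamped to 128, while an 8-bit BLKmode request is raised to 64.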

[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

Richard Biener  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org
   Target Milestone|--- |10.0

--- Comment #1 from Richard Biener  ---
Can you help and check which test* () call fails?  Also check whether
-fwhole-program instead of -flto makes it fail.  Does it still fail when you
comment out all but the failing test* () call?

[Bug c/91107] __attribute__((pure)) to function with non-const pointers

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91107

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Richard Biener  ---
This function isn't pure.  GCC would optimize

 dest[0] = 0.;
 array_division (n, dest, src1, src2);
 return dest[0];

to return 0.0 since pure functions are assumed to not write to (global) memory.
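
A short sketch of the distinction, under assumptions: the report only shows the call `array_division (n, dest, src1, src2)`, so the parameter types below are guessed, and `array_sum` is a hypothetical example of a function for which the attribute would be appropriate.

```c
#include <assert.h>
#include <stddef.h>

/* Only reads memory and returns a value, so marking it pure is fine. */
__attribute__((pure))
double array_sum(size_t n, const double *src)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += src[i];
    return sum;
}

/* Writes through dest, so it must NOT be declared pure: the compiler
   would be allowed to assume dest[] is unchanged across the call. */
void array_division(size_t n, double *dest,
                    const double *src1, const double *src2)
{
    for (size_t i = 0; i < n; ++i) {
        assert(src2[i] != 0.0);   /* keep the sketch well-defined */
        dest[i] = src1[i] / src2[i];
    }
}
```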

[Bug target/91106] internal compiler error: output_operand: invalid use of register 'frame'

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91106

--- Comment #1 from Richard Biener  ---
Did you paste the correct reduced testcase?

[Bug c++/66999] Missing comma in lambda capture causes internal compiler error

2019-07-08 Thread paolo.carlini at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66999

Paolo Carlini  changed:

   What|Removed |Added

 CC||paolo.carlini at oracle dot com

--- Comment #6 from Paolo Carlini  ---
Unfortunately we still issue two errors for the original testcase.

[Bug target/91105] internal compiler error: maximum number of generated reload insns per insn achieved (90)

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91105

Richard Biener  changed:

   What|Removed |Added

   Keywords||ice-on-valid-code, ra
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-07-08
  Component|middle-end  |target
 Ever confirmed|0   |1
  Known to fail||4.8.5, 7.4.0

--- Comment #1 from Richard Biener  ---
Never worked it seems.

[Bug c++/65143] [C++11] missing devirtualization for virtual base in "final" classes

2019-07-08 Thread paolo.carlini at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65143

Paolo Carlini  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |10.0

--- Comment #11 from Paolo Carlini  ---
Should be completely fixed.

[Bug c++/65143] [C++11] missing devirtualization for virtual base in "final" classes

2019-07-08 Thread paolo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65143

--- Comment #10 from paolo at gcc dot gnu.org  ---
Author: paolo
Date: Mon Jul  8 09:51:07 2019
New Revision: 273228

URL: https://gcc.gnu.org/viewcvs?rev=273228&root=gcc&view=rev
Log:
2019-07-08  Paolo Carlini  

PR c++/65143
* g++.dg/tree-ssa/final2.C: New.
* g++.dg/tree-ssa/final3.C: Likewise.

Added:
trunk/gcc/testsuite/g++.dg/tree-ssa/final2.C
trunk/gcc/testsuite/g++.dg/tree-ssa/final3.C
Modified:
trunk/gcc/testsuite/ChangeLog

[Bug tree-optimization/91109] New: [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135

2019-07-08 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

Bug ID: 91109
   Summary: [10 regression][arm]
gcc.c-torture/execute/20040709-1.c fails since r273135
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: clyon at gcc dot gnu.org
  Target Milestone: ---

Hi,

I've noticed that since r273135 (fix for PR91091), there is a regression on
arm-none-linux-gnueabi
--with-mode arm
--with-cpu cortex-a9

FAIL: gcc.c-torture/execute/20040709-1.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test


There's no such regression on arm-none-linux-gnueabihf, or when using
--with-mode thumb.

[Bug target/91103] AVX512 vector element extract uses more than 1 shuffle instruction; VALIGND can grab any element

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91103

--- Comment #2 from Richard Biener  ---
(In reply to Richard Biener from comment #1)
> So when the vectorizer has the need to use strided stores it would be
> cheapest to spill the vector and do N element loads and stores?  I guess
> we can easily get bottle-necked by the load/store op bandwidth here?
> That is, the vectorizer needs
> 
>   for (lane)
> dest[stride * lane] = vector[lane];
> 
> thus store a specific (constant) lane of a vector to memory, for each
> vector lane.  (we could use a scatter store here but only AVX512 has that
> and building the index vector could be tricky and not supported for all
> element types)

Indeed ICC seems to spill for AVX and AVX512 for

typedef int vsi __attribute__((vector_size(SIZE)));
void foo (vsi v, int *p, int *o)
{
  for (int i = 0; i < sizeof(vsi)/4; ++i)
p[o[i]] = v[i];
}
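
For comparison, the strided-store loop quoted from comment #1 can be written out with GCC's generic vector extension as follows. This is only an illustrative sketch: `store_strided` and the fixed 16-byte vector type are assumptions, and how the lane extracts are lowered (shuffles versus a spill plus scalar loads) is up to the compiler and target.

```c
#include <assert.h>
#include <stddef.h>

/* Four 32-bit lanes; a fixed size is assumed here instead of SIZE. */
typedef int v4si __attribute__((vector_size(16)));

/* Store each lane of v to dest at a constant stride, i.e.
   dest[stride * lane] = vector[lane] for every lane. */
void store_strided(v4si v, int *dest, size_t stride)
{
    assert(stride > 0);
    for (size_t lane = 0; lane < sizeof(v4si) / sizeof(int); ++lane)
        dest[stride * lane] = v[lane];
}
```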

[Bug target/91103] AVX512 vector element extract uses more than 1 shuffle instruction; VALIGND can grab any element

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91103

Richard Biener  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
So when the vectorizer has the need to use strided stores it would be cheapest
to spill the vector and do N element loads and stores?  I guess we can easily
get bottle-necked by the load/store op bandwidth here?  That is, the
vectorizer needs

  for (lane)
dest[stride * lane] = vector[lane];

thus store a specific (constant) lane of a vector to memory, for each
vector lane.  (we could use a scatter store here but only AVX512 has that
and building the index vector could be tricky and not supported for all
element types)

[Bug c++/80518] -Wsuggest-override does not warn about missing override on destructor

2019-07-08 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80518

--- Comment #7 from Jonathan Wakely  ---
The guideline might be changing:
https://github.com/isocpp/CppCoreGuidelines/pull/1448
If that pull request is merged we might want to change -Wsuggest-override too,
without needing a separate option.

[Bug target/91060] [10 regression] gcc.c-torture/execute/scal-to-vec1.c fails on armeb-none-linux-gnueabihf since r272843

2019-07-08 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91060

--- Comment #14 from rsandifo at gcc dot gnu.org ---
Created attachment 46576
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46576&action=edit
Candidate patch

I'll test the attached overnight

[Bug sanitizer/91101] Possible performance regression in libasan with detect_stack_use_after_return=1

2019-07-08 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91101

Martin Liška  changed:

   What|Removed |Added

 Status|WAITING |ASSIGNED
  Known to work||8.3.1
  Known to fail||9.1.0

--- Comment #8 from Martin Liška  ---
(In reply to Frantisek Sumsal from comment #7)
> (In reply to Martin Liška from comment #6)
> 
> > Do you know how to tell meson to use CC=gcc-8?
> > 
> 
> $ export CC=gcc-8 CXX=g++-8
> $ meson build ...
> 
> should suffice

Great, now I can confirm that!

[Bug target/91060] [10 regression] gcc.c-torture/execute/scal-to-vec1.c fails on armeb-none-linux-gnueabihf since r272843

2019-07-08 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91060

--- Comment #13 from Christophe Lyon  ---
Indeed, this seems to work:

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 820502a..4f69122 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12471,7 +12471,7 @@ neon_expand_vector_init (rtx target, rtx vals)
   if (n_var == 1)
 {
   rtx copy = copy_rtx (vals);
-  rtx index = GEN_INT (one_var);
+  rtx index = GEN_INT (1 << one_var);

   /* Load constant part of vector, substitute neighboring value for
 varying element.  */
@@ -12483,31 +12483,40 @@ neon_expand_vector_init (rtx target, rtx vals)
   switch (mode)
{
case E_V8QImode:
- emit_insn (gen_neon_vset_lanev8qi (target, x, target, index));
+ emit_insn (gen_vec_setv8qi_internal (target, x, index, target));
  break;
case E_V16QImode:
- emit_insn (gen_neon_vset_lanev16qi (target, x, target, index));
+ emit_insn (gen_vec_setv16qi_internal (target, x, index, target));
  break;
case E_V4HImode:
- emit_insn (gen_neon_vset_lanev4hi (target, x, target, index));
+ emit_insn (gen_vec_setv4hi_internal (target, x, index, target));
  break;
case E_V8HImode:
- emit_insn (gen_neon_vset_lanev8hi (target, x, target, index));
+ emit_insn (gen_vec_setv8hi_internal (target, x, index, target));
  break;
case E_V2SImode:
- emit_insn (gen_neon_vset_lanev2si (target, x, target, index));
+ emit_insn (gen_vec_setv2si_internal (target, x, index, target));
  break;
case E_V4SImode:
- emit_insn (gen_neon_vset_lanev4si (target, x, target, index));
+ emit_insn (gen_vec_setv4si_internal (target, x, index, target));
  break;
case E_V2SFmode:
- emit_insn (gen_neon_vset_lanev2sf (target, x, target, index));
+ emit_insn (gen_vec_setv2sf_internal (target, x, index, target));
  break;
case E_V4SFmode:
- emit_insn (gen_neon_vset_lanev4sf (target, x, target, index));
+ emit_insn (gen_vec_setv4sf_internal (target, x, index, target));
  break;
case E_V2DImode:
- emit_insn (gen_neon_vset_lanev2di (target, x, target, index));
+ emit_insn (gen_vec_setv2di_internal (target, x, index, target));
  break;

[Bug sanitizer/91101] Possible performance regression in libasan with detect_stack_use_after_return=1

2019-07-08 Thread frantisek at sumsal dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91101

--- Comment #7 from Frantisek Sumsal  ---
(In reply to Martin Liška from comment #6)

> Do you know how to tell meson to use CC=gcc-8?
> 

$ export CC=gcc-8 CXX=g++-8
$ meson build ...

should suffice

[Bug tree-optimization/91108] [8/9/10 Regression] Fails to pun through unions

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91108

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
 Status|UNCONFIRMED |ASSIGNED
  Known to work||7.4.0
   Keywords||alias, wrong-code
   Last reconfirmed||2019-07-08
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Ever confirmed|0   |1
   Target Milestone|--- |8.4

--- Comment #1 from Richard Biener  ---
Mine.

[Bug tree-optimization/91108] New: [8/9/10 Regression] Fails to pun through unions

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91108

Bug ID: 91108
   Summary: [8/9/10 Regression] Fails to pun through unions
   Product: gcc
   Version: 9.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

The following testcase fails to support our promise for punning through
union members if the access happens through the union.

/* { dg-do run } */
/* { dg-options "-O3 -fstrict-aliasing" } */

union U {
  struct A { int : 2; int x : 8; } a;
  struct B { int : 6; int x : 8; } b;
};

int __attribute__((noipa))
foo (union U *p, union U *q)
{
  p->a.x = 1;
  q->b.x = 1;
  return p->a.x;
}

int
main()
{
  union U x;
  if (foo (&x, &x) != x.a.x)
__builtin_abort ();
  return 0;
}
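
As background for the "promise" mentioned above: GCC documents that reading a union member other than the one most recently written is supported, even with -fstrict-aliasing, as long as the access goes through the union type. A minimal sketch of that idiom follows; `float_to_bits` is a hypothetical helper and the expected bit patterns assume IEEE 754 single precision.

```c
#include <assert.h>
#include <stdint.h>

union float_bits {
    float f;
    uint32_t u;
};

/* Reinterpret the bytes of a float as a 32-bit integer by writing one
   union member and reading the other through the union itself. */
uint32_t float_to_bits(float f)
{
    union float_bits pun;
    assert(sizeof pun.f == sizeof pun.u);
    pun.f = f;
    return pun.u;
}
```

The testcase in the report exercises the same rule with two bit-field layouts and two pointers to the same union object, which is where the optimizers go wrong.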

[Bug sanitizer/91101] Possible performance regression in libasan with detect_stack_use_after_return=1

2019-07-08 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91101

--- Comment #6 from Martin Liška  ---
(In reply to Frantisek Sumsal from comment #5)
> (In reply to Martin Liška from comment #4)
> > Ok, I was able to make the build:
> > 
> > $ meson build -Db_sanitize=address,undefined -Dxkbcommon=false
> > 
> > with GCC 9.1.1:
> > 
> > real  0m2.176s
> > user  0m2.013s
> > sys   0m0.160s
> > 
> > which is probably fast enough. And I can't run the second test-case:
> > 
> 
> Yes, without any ASAN_OPTIONS the built binary behaves as "expected":
> 
> ---
> 
> $ unset ASAN_OPTIONS
> $ time build-gcc-9.1.0-sanitizers/test-conf-parser
> <...snip...>
> = test_config_parse[16] ==
> /tmp/test-conf-parser.cvqFVQ:1: Continuation line too long
> 
> real  0m2.972s
> user  0m2.680s
> sys   0m0.280s
> 
> ---
> 
> The real issue arises with ASAN_OPTIONS=detect_stack_use_after_return=1

Ahh, got it. Now it's really much slower.

Do you know how to tell meson to use CC=gcc-8?

> 
> ---
> 
> $ export ASAN_OPTIONS=detect_stack_use_after_return=1
> $ time build-gcc-9.1.0-sanitizers/test-conf-parser
> <...snip...>
> == test_config_parse[16] ==
> /tmp/test-conf-parser.WhLgS1:1: Continuation line too long
> 
> real  0m29.637s
> user  0m29.321s
> sys   0m0.298s
> 
> ---
> 
> 
> > $ ./test/hwdb-test.sh
> > ./systemd-hwdb does not exist, please build first
> 
> For this particular case you have to cd into the build directory first (cd
> build && ../test/hwdb-test.sh)

Good.

[Bug target/91060] [10 regression] gcc.c-torture/execute/scal-to-vec1.c fails on armeb-none-linux-gnueabihf since r272843

2019-07-08 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91060

--- Comment #12 from rguenther at suse dot de  ---
On Mon, 8 Jul 2019, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91060
> 
> rsandifo at gcc dot gnu.org  changed:
> 
>What|Removed |Added
> 
>   Component|middle-end  |target
> >Assignee|rguenth at gcc dot gnu.org |rsandifo at gcc dot gnu.org
> 
> --- Comment #11 from rsandifo at gcc dot gnu.org ---
> (In reply to rguent...@suse.de from comment #10)
> > On Mon, 8 Jul 2019, clyon at gcc dot gnu.org wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91060
> > > 
> > > --- Comment #8 from Christophe Lyon  ---
> > > (In reply to Richard Biener from comment #5)
> > > > Hmm, using a cross configured as
> > > > 
> > > > trunk/configure --target=armeb-none-linux-gnueabihf --with-cpu=cortex-a9
> > > > --with-fpu=neon-fp16 --enable-languages=c
> > > > 
> > > > and trimming the testcase to the first line I cannot reproduce the 
> > > > reported
> > > > assembly.  I get at -O3
> > > > 
> > > > .arm
> > > > .fpu softvfp
> > > 
> > > For some reason, you are not targeting the right FPU, I have:
> > > .arm
> > > .fpu neon-fp16
> > 
> > I noticed that - but it doesn't change even when supplying
> >  -mpfu=neon-fp16 -mcpu=cortex-a9
> > 
> > I suppose some configure-time checking disables this feature somehow
> > without notifying me :/  (don't have a armeb assembler installed,
> > trying a pure cc1 cross)
> > 
> > Anyway, I can't reproduce even after spending 1+ hours on this.
> 
> Yeah, I see the same thing building it that way.  I needed to restate
> the abi using -mfloat-abi=hard.

Even when adding -mfloat-abi=hard I see .fpu softvfp ...

> I'm pretty sure it's a target bug though.  If a vector constructor
> has a single nonconstant element, neon_expand_vector_init uses the
> neon_vset_lane* patterns to set that index.  But neon_vset_lane*
> use the architecture lane numbering while neon_expand_vector_init
> uses GCC lane numbering.  Using the vec_set(_internal) patterns
> should fix that.

I'm out of here then ;)

[Bug middle-end/84877] Local stack copy of BLKmode parameter on the stack is not aligned when the requested alignment exceeds MAX_SUPPORTED_STACK_ALIGNMENT

2019-07-08 Thread ro at CeBiTec dot Uni-Bielefeld.DE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877

--- Comment #17 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #15 from Jorn Wolfgang Rennecke  ---
> Created attachment 46574
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46574&action=edit
> patch for the case that the stack is sufficiently aligned
[...]
> I have attached a patch to preserve the alignment of the passed type for the
> case that the stack is already sufficiently aligned.

This patch breaks sparc-sun-solaris2.11 bootstrap with an ICE while
compiling stage2 function.c:

during RTL pass: expand
/vol/gcc/src/hg/trunk/local/gcc/function.c: In function 'void
assign_parm_find_data_types(assign_parm_data_all*, tree,
assign_parm_data_one*)':
/vol/gcc/src/hg/trunk/local/gcc/function.c:2426:49: internal compiler error: in
assign_stack_temp_for_type, at function.c:880
 2426 |   else if (targetm.calls.strict_argument_naming (all->args_so_far))
  |~^~
0x11bc22f assign_stack_temp_for_type(machine_mode, poly_int<1u, long long>,
tree_node*)
/vol/gcc/src/hg/trunk/local/gcc/function.c:878
0x11bc963 assign_temp(tree_node*, int, int)
/vol/gcc/src/hg/trunk/local/gcc/function.c:1016
0xeab99b initialize_argument_information
/vol/gcc/src/hg/trunk/local/gcc/calls.c:2087
0xeb1957 expand_call(tree_node*, rtx_def*, int)
/vol/gcc/src/hg/trunk/local/gcc/calls.c:3605
0x112a247 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
/vol/gcc/src/hg/trunk/local/gcc/expr.c:11044
0x111919b expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier,
rtx_def**, bool)
/vol/gcc/src/hg/trunk/local/gcc/expr.c:8286
0x110a06b store_expr(tree_node*, rtx_def*, int, bool, bool)
/vol/gcc/src/hg/trunk/local/gcc/expr.c:5685
0x11085bf expand_assignment(tree_node*, tree_node*, bool)
/vol/gcc/src/hg/trunk/local/gcc/expr.c:5447
0xed8cb3 expand_call_stmt
/vol/gcc/src/hg/trunk/local/gcc/cfgexpand.c:2727
0xedd453 expand_gimple_stmt_1
/vol/gcc/src/hg/trunk/local/gcc/cfgexpand.c:3708
0xedde3b expand_gimple_stmt
/vol/gcc/src/hg/trunk/local/gcc/cfgexpand.c:3867
0xee937f expand_gimple_basic_block
/vol/gcc/src/hg/trunk/local/gcc/cfgexpand.c:5907
0xeeb7c3 execute
/vol/gcc/src/hg/trunk/local/gcc/cfgexpand.c:6530
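
For readers without the bug's history, the situation named in the bug title can be pictured with a small C sketch: an aggregate whose requested alignment is larger than the stack is normally aligned to, passed by value so that the callee works on its own copy. The names `struct big`, `misalignment` and `callee_copy_misalignment` are hypothetical, and the 64-byte alignment is an assumed value chosen for illustration; nothing here is taken from the bug's testcase or from the attached patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical over-aligned aggregate; 64 is an assumed value that exceeds
   the default stack alignment on many targets. */
struct big {
    _Alignas(64) char bytes[64];
};

/* Distance of p above the previous multiple of align (0 means aligned). */
static size_t misalignment(const void *p, size_t align)
{
    assert(align != 0);
    return (size_t)((uintptr_t)p % align);
}

/* The parameter is passed by value, so &s is the address of the callee's
   own copy.  A non-zero result would mean that copy is under-aligned. */
static size_t callee_copy_misalignment(struct big s)
{
    return misalignment(&s, _Alignof(struct big));
}
```

Calling `callee_copy_misalignment` on a zero-initialised `struct big` and comparing the result with 0 is one way to observe the behaviour on a given target.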

[Bug sanitizer/91101] Possible performance regression in libasan with detect_stack_use_after_return=1

2019-07-08 Thread frantisek at sumsal dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91101

--- Comment #5 from Frantisek Sumsal  ---
(In reply to Martin Liška from comment #4)
> Ok, I was able to make the build:
> 
> $ meson build -Db_sanitize=address,undefined -Dxkbcommon=false
> 
> with GCC 9.1.1:
> 
> real  0m2.176s
> user  0m2.013s
> sys   0m0.160s
> 
> which is probably fast enough. And I can't run the second test-case:
> 

Yes, without any ASAN_OPTIONS the built binary behaves as expected:

---

$ unset ASAN_OPTIONS
$ time build-gcc-9.1.0-sanitizers/test-conf-parser
<...snip...>
= test_config_parse[16] ==
/tmp/test-conf-parser.cvqFVQ:1: Continuation line too long

real    0m2.972s
user    0m2.680s
sys     0m0.280s

---

The real issue arises with ASAN_OPTIONS=detect_stack_use_after_return=1

---

$ export ASAN_OPTIONS=detect_stack_use_after_return=1
$ time build-gcc-9.1.0-sanitizers/test-conf-parser
<...snip...>
== test_config_parse[16] ==
/tmp/test-conf-parser.WhLgS1:1: Continuation line too long

real    0m29.637s
user    0m29.321s
sys     0m0.298s

---


> $ ./test/hwdb-test.sh
> ./systemd-hwdb does not exist, please build first

For this particular case you have to cd into the build directory first (cd
build && ../test/hwdb-test.sh)

[Bug sanitizer/91101] Possible performance regression in libasan with detect_stack_use_after_return=1

2019-07-08 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91101

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2019-07-08
 Ever confirmed|0   |1

--- Comment #4 from Martin Liška  ---
Ok, I was able to make the build:

$ meson build -Db_sanitize=address,undefined -Dxkbcommon=false

with GCC 9.1.1:

$ time ./build/test-conf-parser
filename:1: lvalue= path is not absolute, ignoring: not_absolute/path
filename:1: String is not UTF-8 clean, ignoring assignment: /path/�
filename:1: Failed to parse log level, ignoring: garbage
filename:1: Failed to parse log facility, ignoring: garbage
filename:1: Failed to parse size value '-982', ignoring: Numerical result out
of range
filename:1: Failed to parse size value '498719873987300G', ignoring:
Numerical result out of range
filename:1: Failed to parse size value 'garbage', ignoring: Invalid argument
filename:1: Failed to parse size value '-982', ignoring: Numerical result out
of range
filename:1: Failed to parse size value '498719873987300G', ignoring:
Numerical result out of range
filename:1: Failed to parse size value 'garbage', ignoring: Invalid argument
filename:1: Failed to parse int value, ignoring:

filename:1: Failed to parse int value, ignoring:
-
filename:1: Failed to parse int value, ignoring: 1G
filename:1: Failed to parse int value, ignoring: garbage
filename:1: Failed to parse unsigned value, ignoring:

filename:1: Failed to parse unsigned value, ignoring: 1G
filename:1: Failed to parse unsigned value, ignoring: garbage
filename:1: Failed to parse unsigned value, ignoring: 1000garbage
filename:1: Failed to parse mode value, ignoring: -777
filename:1: Failed to parse mode value, ignoring: 999
filename:1: Failed to parse mode value, ignoring: garbage
filename:1: Failed to parse mode value, ignoring: 777garbage
filename:1: Failed to parse mode value, ignoring: 777 garbage
filename:1: Failed to parse sec value, ignoring: -1
filename:1: Failed to parse sec value, ignoring: 10foo
filename:1: Failed to parse sec value, ignoring: garbage
filename:1: Failed to parse nsec value, ignoring: -1
filename:1: Failed to parse nsec value, ignoring: 10foo
filename:1: Failed to parse nsec value, ignoring: garbage
== test_config_parse[0] ==
== test_config_parse[1] ==
== test_config_parse[2] ==
== test_config_parse[3] ==
== test_config_parse[4] ==
== test_config_parse[5] ==
== test_config_parse[6] ==
== test_config_parse[7] ==
== test_config_parse[8] ==
== test_config_parse[9] ==
== test_config_parse[10] ==
== test_config_parse[11] ==
== test_config_parse[12] ==
== test_config_parse[13] ==
== test_config_parse[14] ==
== test_config_parse[15] ==
/tmp/test-conf-parser.l7EgI7:1: Line too long
== test_config_parse[16] ==
/tmp/test-conf-parser.2Fj9TE:1: Continuation line too long

real    0m2.176s
user    0m2.013s
sys     0m0.160s

which is probably fast enough. And I can't run the second test-case:

$ ./test/hwdb-test.sh
./systemd-hwdb does not exist, please build first

[Bug target/91060] [10 regression] gcc.c-torture/execute/scal-to-vec1.c fails on armeb-none-linux-gnueabihf since r272843

2019-07-08 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91060

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

  Component|middle-end  |target
   Assignee|rguenth at gcc dot gnu.org |rsandifo at gcc dot gnu.org

--- Comment #11 from rsandifo at gcc dot gnu.org ---
(In reply to rguent...@suse.de from comment #10)
> On Mon, 8 Jul 2019, clyon at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91060
> > 
> > --- Comment #8 from Christophe Lyon  ---
> > (In reply to Richard Biener from comment #5)
> > > Hmm, using a cross configured as
> > > 
> > > trunk/configure --target=armeb-none-linux-gnueabihf --with-cpu=cortex-a9
> > > --with-fpu=neon-fp16 --enable-languages=c
> > > 
> > > and trimming the testcase to the first line I cannot reproduce the reported
> > > assembly.  I get at -O3
> > > 
> > > .arm
> > > .fpu softvfp
> > 
> > For some reason, you are not targeting the right FPU, I have:
> > .arm
> > .fpu neon-fp16
> 
> I noticed that - but it doesn't change even when supplying
>  -mfpu=neon-fp16 -mcpu=cortex-a9
> 
> I suppose some configure-time checking disables this feature somehow
> without notifying me :/  (don't have an armeb assembler installed,
> trying a pure cc1 cross)
> 
> Anyway, I can't reproduce even after spending 1+ hours on this.

Yeah, I see the same thing building it that way.  I needed to restate
the abi using -mfloat-abi=hard.

I'm pretty sure it's a target bug though.  If a vector constructor
has a single nonconstant element, neon_expand_vector_init uses the
neon_vset_lane* patterns to set that index.  But neon_vset_lane*
use the architecture lane numbering while neon_expand_vector_init
uses GCC lane numbering.  Using the vec_set(_internal) patterns
should fix that.
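
The mismatch described above can be sketched in C. The mapping below (reverse the lane order on big-endian, then swap the two 64-bit halves of a 128-bit vector) is an assumption made for illustration and is not quoted from the thread or from the GCC sources; `arch_lane` and its parameters are hypothetical names. It is, however, consistent with the assembly quoted in comment #4, where the element is inserted at d16[0] but extracted from d0[3].

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: translate a GCC lane number into an architecture
   lane number.  nelems is the number of elements in the vector and q_reg
   says whether the vector is 128 bits wide (two D registers). */
static unsigned arch_lane(unsigned gcc_lane, unsigned nelems,
                          bool q_reg, bool big_endian)
{
    assert(gcc_lane < nelems);
    if (!big_endian)
        return gcc_lane;                 /* the two numberings coincide */

    unsigned lane = nelems - 1 - gcc_lane;   /* reverse the lane order */
    if (q_reg)
        lane ^= nelems / 2;              /* swap the two D-register halves */
    return lane;
}
```

Under this model, GCC lane 0 of an 8 x 16-bit vector on big-endian corresponds to architecture lane 3, so a pattern that is handed the GCC lane number 0 and uses it directly as an architecture lane writes the wrong element.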

[Bug tree-optimization/83518] [8/9/10 Regression] Missing optimization: useless instructions should be dropped

2019-07-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518

--- Comment #10 from Richard Biener  ---
Author: rguenth
Date: Mon Jul  8 07:09:24 2019
New Revision: 273194

URL: https://gcc.gnu.org/viewcvs?rev=273194&root=gcc&view=rev
Log:
2019-07-08  Richard Biener  

PR tree-optimization/83518
* tree-ssa-sccvn.c: Include splay-tree.h.
(struct pd_range, struct pd_data): New.
(struct vn_walk_cb_data): Add data to track partial definitions.
(vn_walk_cb_data::~vn_walk_cb_data): New.
(vn_walk_cb_data::push_partial_def): New.
(pd_tree_alloc, pd_tree_dealloc, pd_range_compare): New.
(vn_reference_lookup_2): When partial defs are registered give up.
(vn_reference_lookup_3): Track partial defs for memset and
constructor zeroing and for defs from constants.

* gcc.dg/tree-ssa/ssa-fre-73.c: New testcase.
* gcc.dg/tree-ssa/ssa-fre-74.c: Likewise.
* gcc.dg/tree-ssa/ssa-fre-75.c: Likewise.
* gcc.dg/tree-ssa/ssa-fre-76.c: Likewise.
* g++.dg/tree-ssa/pr83518.C: Likewise.

Added:
trunk/gcc/testsuite/g++.dg/tree-ssa/pr83518.C
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-73.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-74.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-75.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-76.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-sccvn.c
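
As an illustration of what "partial definitions" means in the log above, here is a minimal C sketch: a read that no single earlier store covers, but that a zeroing memset plus a store of a constant cover between them. The function and type names are hypothetical and the code is not one of the testcases added by this revision; whether GCC folds this exact function to a constant has not been checked here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct pair { int32_t lo; int32_t hi; };

/* Every byte read by the final memcpy is known at compile time, but only
   by combining two earlier definitions of different parts of p. */
static int64_t load_after_partial_defs(void)
{
    struct pair p;
    int64_t whole;

    assert(sizeof p == sizeof whole);
    memset(&p, 0, sizeof p);            /* zeroing definition of all 8 bytes */
    p.lo = 1;                           /* partial definition from a constant */
    memcpy(&whole, &p, sizeof whole);   /* read covering both definitions */
    return whole;
}
```

The returned value depends on byte order (1 on little-endian, 1 shifted left by 32 on big-endian), but in either case it is a constant that a value-numbering pass could in principle compute.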

[Bug middle-end/91060] [10 regression] gcc.c-torture/execute/scal-to-vec1.c fails on armeb-none-linux-gnueabihf since r272843

2019-07-08 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91060

--- Comment #10 from rguenther at suse dot de  ---
On Mon, 8 Jul 2019, clyon at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91060
> 
> --- Comment #8 from Christophe Lyon  ---
> (In reply to Richard Biener from comment #5)
> > Hmm, using a cross configured as
> > 
> > trunk/configure --target=armeb-none-linux-gnueabihf --with-cpu=cortex-a9
> > --with-fpu=neon-fp16 --enable-languages=c
> > 
> > and trimming the testcase to the first line I cannot reproduce the reported
> > assembly.  I get at -O3
> > 
> > .arm
> > .fpu softvfp
> 
> For some reason, you are not targeting the right FPU, I have:
> .arm
> .fpu neon-fp16

I noticed that - but it doesn't change even when supplying
 -mfpu=neon-fp16 -mcpu=cortex-a9

I suppose some configure-time checking disables this feature somehow
without notifying me :/  (don't have an armeb assembler installed,
trying a pure cc1 cross)

Anyway, I can't reproduce even after spending 1+ hours on this.

[Bug middle-end/91060] [10 regression] gcc.c-torture/execute/scal-to-vec1.c fails on armeb-none-linux-gnueabihf since r272843

2019-07-08 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91060

--- Comment #9 from rsandifo at gcc dot gnu.org ---
(In reply to rsand...@gcc.gnu.org from comment #7)
> (In reply to Christophe Lyon from comment #4)
> > Unfortunately, it's still failing as of r273133.
> > 
> > It fails at the very first check:
> > v1 = 2 + v0;   check (short, 8, v0, v1, 2, +, l);
> > 
> > The generated code for main is:
> > main:
> ...
> > vmov.16 d16[0], r0
> > sxth r1, r0
> > vadd.i16 q0, q8, q9
> > add ip, r1, #2
> > vmov.s16 r2, d0[3]
> 
> Yeah, this looks wrong.  We should be adding 2 to a single element
> here, but we're extracting from one index and inserting into another.
> The first quoted instruction should be using [3] as well.
> 
> I'd be unsurprised if this was a target bug.

Er, pretend that message never happened, first thing Monday morning :-)

[Bug middle-end/91060] [10 regression] gcc.c-torture/execute/scal-to-vec1.c fails on armeb-none-linux-gnueabihf since r272843

2019-07-08 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91060

--- Comment #8 from Christophe Lyon  ---
(In reply to Richard Biener from comment #5)
> Hmm, using a cross configured as
> 
> trunk/configure --target=armeb-none-linux-gnueabihf --with-cpu=cortex-a9
> --with-fpu=neon-fp16 --enable-languages=c
> 
> and trimming the testcase to the first line I cannot reproduce the reported
> assembly.  I get at -O3
> 
> .arm
> .fpu softvfp

For some reason, you are not targeting the right FPU, I have:
.arm
.fpu neon-fp16

[Bug middle-end/91060] [10 regression] gcc.c-torture/execute/scal-to-vec1.c fails on armeb-none-linux-gnueabihf since r272843

2019-07-08 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91060

--- Comment #7 from rsandifo at gcc dot gnu.org ---
(In reply to Christophe Lyon from comment #4)
> Unfortunately, it's still failing as of r273133.
> 
> It fails at the very first check:
> v1 = 2 + v0;   check (short, 8, v0, v1, 2, +, l);
> 
> The generated code for main is:
> main:
...
> vmov.16 d16[0], r0
> sxth r1, r0
> vadd.i16 q0, q8, q9
> add ip, r1, #2
> vmov.s16 r2, d0[3]

Yeah, this looks wrong.  We should be adding 2 to a single element
here, but we're extracting from one index and inserting into another.
The first quoted instruction should be using [3] as well.

I'd be unsurprised if this was a target bug.

[Bug c++/85746] Premature evaluation of __builtin_constant_p?

2019-07-08 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85746

--- Comment #8 from rsandifo at gcc dot gnu.org ---
(In reply to Marc Glisse from comment #7)
> (In reply to Marc Glisse from comment #6)
> >  && xi.val[0] <= (HOST_WIDE_INT) ((unsigned HOST_WIDE_INT)
> >   HOST_WIDE_INT_MAX >> shift))
> 
> The issue occurs with xi.val[0] == -9223372036854775808 (lshift_large
> returns a result of length 2 for that). I don't know if the code mishandles
> this case, or if such a number is not supposed to exist in the first place,
> but that does seem like a bug.

Yeah, looks like this should have been an unsigned HOST_WIDE_INT comparison
instead, i.e. casting xi.val[0] rather than the shift result.
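
The difference between the two comparisons can be shown with a small C sketch, using `int64_t` and `uint64_t` as stand-ins for `HOST_WIDE_INT` and `unsigned HOST_WIDE_INT` (an assumption about the host; the function names are hypothetical and this is not the wide-int code itself).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed comparison, shaped like the quoted condition: the shifted limit
   is cast back to a signed type, so any negative val passes. */
static bool fits_signed_cmp(int64_t val, unsigned shift)
{
    assert(shift < 64);
    return val <= (int64_t)((uint64_t)INT64_MAX >> shift);
}

/* Unsigned comparison, as suggested in the reply: val is cast instead, so
   a negative val becomes a large unsigned number and is rejected. */
static bool fits_unsigned_cmp(int64_t val, unsigned shift)
{
    assert(shift < 64);
    return (uint64_t)val <= ((uint64_t)INT64_MAX >> shift);
}
```

With val equal to -9223372036854775808 and a shift of 1, the signed form accepts the value while the unsigned form rejects it; small non-negative values are accepted by both.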