[Bug c++/104084] [12 regression] Internal compiler error: tree check: expected target_expr, have compound_expr in build_new_1

2022-01-18 Thread linux at carewolf dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104084

--- Comment #4 from Allan Jensen  ---
Created attachment 52217
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52217&action=edit
-E output

[Bug c++/104084] [12 regression] Internal compiler error: tree check: expected target_expr, have compound_expr in build_new_1

2022-01-18 Thread linux at carewolf dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104084

--- Comment #3 from Allan Jensen  ---
-v output:

Using built-in specs.
COLLECT_GCC=/opt/gcc/bin/g++-12
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --prefix=/opt/gcc
--program-suffix=-12
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.0.1 20220117 (experimental) (GCC) 
COLLECT_GCC_OPTIONS='-MMD' '-MF'
'obj/third_party/libgav1/libgav1/loop_restoration_info.o.d' '-D' 'USE_UDEV'
'-D' 'USE_AURA=1' '-D' 'USE_NSS_CERTS=1' '-D' 'USE_OZONE=1' '-D'
'OFFICIAL_BUILD' '-D' 'TOOLKIT_QT' '-D' '_FILE_OFFSET_BITS=64' '-D'
'_LARGEFILE_SOURCE' '-D' '_LARGEFILE64_SOURCE' '-D' 'NO_UNWIND_TABLES' '-D'
'NDEBUG' '-D' 'NVALGRIND' '-D' 'DYNAMIC_ANNOTATIONS_ENABLED=0' '-D'
'LIBGAV1_MAX_BITDEPTH=10' '-D' 'LIBGAV1_THREADPOOL_USE_STD_MUTEX' '-D'
'LIBGAV1_ENABLE_LOGGING=0' '-D' 'LIBGAV1_PUBLIC=' '-I' 'gen' '-I'
'../../../../../../qtwebengine/src/3rdparty/chromium' '-I'
'../../../../../../qtwebengine/src/3rdparty/chromium/third_party/libgav1/src'
'-I'
'../../../../../../qtwebengine/src/3rdparty/chromium/third_party/libgav1/src/src'
'-fno-ident' '-fno-strict-aliasing' '--param=ssp-buffer-size=4'
'-fstack-protector' '-Wno-unknown-pragmas' '-Wno-parentheses'
'-Wno-sign-compare' '-Wstringop-overflow=0' '-Wno-stringop-overread'
'-Wno-psabi' '-Wno-multichar' '-Wno-format-zero-length' '-fno-unwind-tables'
'-fno-asynchronous-unwind-tables' '-fPIC' '-pipe' '-pthread' '-m64' '-O2'
'-fno-omit-frame-pointer' '-g1' '-fvisibility=hidden'
'-Wno-unused-local-typedefs' '-Wno-maybe-uninitialized'
'-Wno-deprecated-declarations' '-fno-delete-null-pointer-checks' '-Wno-comment'
'-Wno-packed-not-aligned' '-Wno-dangling-else'
'-Wno-missing-field-initializers' '-Wno-unused-parameter' '-O2'
'-fdata-sections' '-ffunction-sections' '-std=gnu++14'
'-fvisibility-inlines-hidden' '-Wno-narrowing' '-Wno-attributes'
'-Wno-class-memaccess' '-Wno-subobject-linkage' '-Wno-invalid-offsetof'
'-Wno-return-type' '-Wno-deprecated-copy' '-c' '-o'
'obj/third_party/libgav1/libgav1/loop_restoration_info.o' '-v' '-shared-libgcc'
'-mtune=generic' '-march=x86-64' '-dumpdir' 'obj/third_party/libgav1/libgav1/'
 /opt/gcc/libexec/gcc/x86_64-pc-linux-gnu/12.0.1/cc1plus -quiet -v -I gen -I
../../../../../../qtwebengine/src/3rdparty/chromium -I
../../../../../../qtwebengine/src/3rdparty/chromium/third_party/libgav1/src -I
../../../../../../qtwebengine/src/3rdparty/chromium/third_party/libgav1/src/src
-imultiarch x86_64-linux-gnu -MMD
obj/third_party/libgav1/libgav1/loop_restoration_info.d -MF
obj/third_party/libgav1/libgav1/loop_restoration_info.o.d -MQ
obj/third_party/libgav1/libgav1/loop_restoration_info.o -D_GNU_SOURCE
-D_REENTRANT -D USE_UDEV -D USE_AURA=1 -D USE_NSS_CERTS=1 -D USE_OZONE=1 -D
OFFICIAL_BUILD -D TOOLKIT_QT -D _FILE_OFFSET_BITS=64 -D _LARGEFILE_SOURCE -D
_LARGEFILE64_SOURCE -D NO_UNWIND_TABLES -D NDEBUG -D NVALGRIND -D
DYNAMIC_ANNOTATIONS_ENABLED=0 -D LIBGAV1_MAX_BITDEPTH=10 -D
LIBGAV1_THREADPOOL_USE_STD_MUTEX -D LIBGAV1_ENABLE_LOGGING=0 -D LIBGAV1_PUBLIC=
../../../../../../qtwebengine/src/3rdparty/chromium/third_party/libgav1/src/src/loop_restoration_info.cc
-quiet -dumpdir obj/third_party/libgav1/libgav1/ -dumpbase
loop_restoration_info.cc -dumpbase-ext .cc -m64 -mtune=generic -march=x86-64
-g1 -O2 -O2 -Wno-unknown-pragmas -Wno-parentheses -Wno-sign-compare
-Wstringop-overflow=0 -Wno-stringop-overread -Wno-psabi -Wno-multichar
-Wno-format-zero-length -Wno-unused-local-typedefs -Wno-maybe-uninitialized
-Wno-deprecated-declarations -Wno-comment -Wno-packed-not-aligned
-Wno-dangling-else -Wno-missing-field-initializers -Wno-unused-parameter
-Wno-narrowing -Wno-attributes -Wno-class-memaccess -Wno-subobject-linkage
-Wno-invalid-offsetof -Wno-return-type -Wno-deprecated-copy -std=gnu++14
-version -fno-ident -fno-strict-aliasing -fstack-protector -fno-unwind-tables
-fno-asynchronous-unwind-tables -fPIC -fno-omit-frame-pointer
-fvisibility=hidden -fno-delete-null-pointer-checks -fdata-sections
-ffunction-sections -fvisibility-inlines-hidden --param=ssp-buffer-size=4 -o -
|
 as -v -I gen -I ../../../../../../qtwebengine/src/3rdparty/chromium -I
../../../../../../qtwebengine/src/3rdparty/chromium/third_party/libgav1/src -I
../../../../../../qtwebengine/src/3rdparty/chromium/third_party/libgav1/src/src
--gdwarf-5 --64 -o obj/third_party/libgav1/libgav1/loop_restoration_info.o
GNU assembler version 2.36.1 (x86_64-linux-gnu) using BFD version (GNU Binutils
for Ubuntu) 2.36.1
GNU C++14 (GCC) version 12.0.1 20220117 (experimental) (x86_64-pc-linux-gnu)
compiled by GNU C version 12.0.1 20220117 (experimental), GMP version
6.2.1, MPFR version 4.1.0, MPC version 1.2.0, isl version isl-0.23-GMP

GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory
"/opt/gcc/lib/gcc/x86_64-pc-linux-gnu/12.0.1/../../../../x86_64-pc-linux-gnu/include"

[Bug c++/104084] [12 regression] Internal compiler error: tree check: expected target_expr, have compound_expr in build_new_1

2022-01-18 Thread linux at carewolf dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104084

--- Comment #2 from Allan Jensen  ---
Removing the (std::nothrow), and declaring the untagged new operator (instead
of declaring them deleted), seems to work around the issue.
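For readers without the attachment, the pattern described here can be sketched roughly as follows. This is a minimal illustration, not a verified reproducer of the ICE: the names `NothrowOnly` and `DynamicBufferSketch` are hypothetical, and only the `buffer_.reset(new (std::nothrow) T[size])` line is taken from the diagnostic in the original report.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical element type: the untagged array new is deleted, so only the
// nothrow form can be used (the situation described in the comment above).
struct NothrowOnly {
  static void* operator new[](std::size_t) = delete;
  static void* operator new[](std::size_t size, const std::nothrow_t&) noexcept {
    return ::operator new[](size, std::nothrow);
  }
  static void operator delete[](void* p) noexcept { ::operator delete[](p); }
  int v = 0;
};

// Hypothetical stand-in for the buffer class in dynamic_buffer.h.
template <typename T>
class DynamicBufferSketch {
 public:
  // Allocation failure yields nullptr instead of throwing std::bad_alloc.
  bool Resize(std::size_t size) {
    buffer_.reset(new (std::nothrow) T[size]);
    if (!buffer_) return false;
    size_ = size;
    return true;
  }
  T* get() { return buffer_.get(); }
  std::size_t size() const { return size_; }

 private:
  std::unique_ptr<T[]> buffer_;
  std::size_t size_ = 0;
};
```

The workaround mentioned above would correspond to writing `new T[size]` and declaring, rather than deleting, the untagged `operator new[]`.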

[Bug c++/104084] New: [12 regression] Internal compiler error: tree check: expected target_expr, have compound_expr in build_new_1

2022-01-18 Thread linux at carewolf dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104084

Bug ID: 104084
   Summary: [12 regression] Internal compiler error: tree check:
expected target_expr, have compound_expr in
build_new_1
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

Another error encountered while compiling Qt with GCC 12, this time in libgav1
(used by Chromium).

../../../../../../qtwebengine/src/3rdparty/chromium/third_party/libgav1/src/src/utils/dynamic_buffer.h:40:19:
internal compiler error: tree check: expected target_expr, have compound_expr
in build_new_1, at cp/init.c:3792
   40 | buffer_.reset(new (std::nothrow) T[size]);
  |   ^~
0x8a12eb tree_check_failed(tree_node const*, char const*, int, char const*,
...)
../../gcc/tree.c:8702
0x6f06b9 tree_operand_check_code(tree_node*, tree_code, int, char const*, int,
char const*)
../../gcc/tree.h:3950
0x6f06b9 build_new_1
../../gcc/cp/init.c:3792
0xa5c8f1 build_new(unsigned int, vec**,
tree_node*, tree_node*, vec**, int, int)
../../gcc/cp/init.c:4002
0xb4a6ad tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool,
bool)
../../gcc/cp/pt.c:20387
0xb750ca tsubst_copy_and_build_call_args
../../gcc/cp/pt.c:19761
0xb48c88 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool,
bool)
../../gcc/cp/pt.c:20508
0xb5d33f tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
../../gcc/cp/pt.c:19316
0xb5ed6b tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
../../gcc/cp/pt.c:18329
0xb5e484 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
../../gcc/cp/pt.c:18301
0xb5e4ec tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
../../gcc/cp/pt.c:18658
0xb5c048 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
../../gcc/cp/pt.c:18287
0xb5c048 instantiate_body
../../gcc/cp/pt.c:26239
0xb5d080 instantiate_decl(tree_node*, bool, bool)
../../gcc/cp/pt.c:26532
0xb813d3 instantiate_pending_templates(int)
../../gcc/cp/pt.c:26611
0xa3ba28 c_parse_final_cleanups()
../../gcc/cp/decl2.c:5097


Disabling optimizations, using different C++ standards, or fuzzing other
compiler flags didn't seem to help.

Let me know if you need the intermediate code.

[Bug c++/104078] New: Some type determination weirdness

2022-01-17 Thread linux at carewolf dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104078

Bug ID: 104078
   Summary: Some type determination weirdness
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

In an attempt to compile Qt, and specifically Qt WebEngine, with the latest GCC 12
from git today, I got the following weird error from Skia inside Chromium inside
Qt WebEngine:

./../../../../../qtwebengine/src/3rdparty/chromium/third_party/skia/src/gpu/GrRefCnt.h:173:73:
error: ‘‘dependent_operator_type’ not supported by dump_type’ is
not a valid type for a template non-type parameter
  173 | gr_sp;
  |
^
../../../../../../qtwebengine/src/3rdparty/chromium/third_party/skia/src/gpu/GrRefCnt.h:173:73:
error: ‘‘dependent_operator_type’ not supported by dump_type’ is
not a valid type for a template non-type parameter

The error triggers in C++17 mode only, and the file compiles fine in C++20 mode
(and in c++17 mode on older gccs, and clang and msvc, etc).

[Bug target/31667] Integer extensions vectorization could be improved

2021-08-21 Thread linux at carewolf dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31667

--- Comment #6 from Allan Jensen  ---
(In reply to Andrew Pinski from comment #5)
> We produce this now:
> 
> movdqa  x(%rip), %xmm1
> pxor    %xmm0, %xmm0
> movdqa  %xmm1, %xmm2
> punpckhbw   %xmm0, %xmm1
> movaps  %xmm1, y+16(%rip)
> movdqa  x+16(%rip), %xmm1
> punpcklbw   %xmm0, %xmm2
> movaps  %xmm2, y(%rip)
> movdqa  %xmm1, %xmm2
> punpckhbw   %xmm0, %xmm1
> movaps  %xmm1, y+48(%rip)
> movdqa  x+32(%rip), %xmm1
> punpcklbw   %xmm0, %xmm2
> movaps  %xmm2, y+32(%rip)
> movdqa  %xmm1, %xmm2
> punpckhbw   %xmm0, %xmm1
> movaps  %xmm1, y+80(%rip)
> movdqa  x+48(%rip), %xmm1
> punpcklbw   %xmm0, %xmm2
> movaps  %xmm2, y+64(%rip)
> movdqa  %xmm1, %xmm2
> punpckhbw   %xmm0, %xmm1
> punpcklbw   %xmm0, %xmm2
> movaps  %xmm1, y+112(%rip)
> movaps  %xmm2, y+96(%rip)
> 
> And even ICC produce a similar thing except scheduled differently.

I hope that is because you forgot -msse4.1?
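The test case itself is not quoted in this message. Judging only from the quoted assembly (64 bytes read from `x`, 128 bytes written to `y`, bytes unpacked against zero), the loop is presumably of roughly this shape; the array types and sizes here are assumptions:

```cpp
// Assumed shapes: 64 bytes in, 64 zero-extended 16-bit values out.
unsigned char x[64];
unsigned short y[64];

// Widening copy. With only SSE2 this is vectorized using punpck{l,h}bw
// against a zero register; the question above is whether SSE4.1's pmovzxbw
// would be used once -msse4.1 is given.
void widen()
{
    for (int i = 0; i < 64; ++i)
        y[i] = x[i];
}
```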

[Bug tree-optimization/78394] False positives of maybe-uninitialized with -Og

2021-04-02 Thread linux at carewolf dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78394

--- Comment #17 from Allan Jensen  ---
Yes, if you can figure out exactly what optimization passes it needs, then we
could disable the warning when those passes are disabled.

[Bug c/97083] New: __builtin_lround and _builtin_llround not replaced with fcvtas on aarch64

2020-09-17 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97083

Bug ID: 97083
   Summary: __builtin_lround and _builtin_llround not replaced
with fcvtas on aarch64
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

On aarch64, calling __builtin_round and casting the result to int or long long
uses a single fcvtas instruction, but using __builtin_lround or
__builtin_llround instead results in a function call.

Seems like they are missing the same optimization.
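A minimal sketch of the two forms being compared (the function names are made up for illustration). Both should round to nearest with ties away from zero, so for in-range values they return the same result; according to the report only the first is turned into fcvtas.

```cpp
// Form 1: round in floating point, then convert.
long long round_then_cast(double x)
{
    return static_cast<long long>(__builtin_round(x));
}

// Form 2: the dedicated builtin, reported above to end up as a function call.
long long llround_builtin(double x)
{
    return __builtin_llround(x);
}
```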

[Bug c/66970] Add __has_builtin() macro

2019-07-14 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66970

--- Comment #19 from Allan Jensen  ---
(In reply to felix from comment #18)
> So even if this feature is adopted as-is, it will necessitate some changes
> in the documentation. And while I can sympathise with claims that this
> behaviour is surprising, what are the alternatives? If keywords should count
> as built-ins, should __has_builtin(sizeof) expand to 1? Should
> __has_builtin(volatile)?

No, just keywords that begin with __builtin_.
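For reference, typical use of the macro under discussion looks something like the sketch below. The fallback definition is the usual idiom for compilers that lack `__has_builtin`; the function name `round_to_ll` and the hand-written fallback are illustrative assumptions.

```cpp
// Compilers without __has_builtin treat every query as "not available".
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

long long round_to_ll(double x)
{
#if __has_builtin(__builtin_llround)
    return __builtin_llround(x);
#else
    // Fallback: round half away from zero by hand (no range checking).
    return static_cast<long long>(x < 0 ? x - 0.5 : x + 0.5);
#endif
}
```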

[Bug rtl-optimization/43147] SSE shuffle merge

2019-05-19 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

Allan Jensen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linux at carewolf dot com

--- Comment #9 from Allan Jensen  ---
(In reply to Marc Glisse from comment #6)
> Created attachment 45303 [details]
> example patch (untested)
> 
> Making the meaning of shuffles visible in GIMPLE could help a bit (although
> it wouldn't solve the problem completely because IIRC we don't dare combine
> shuffles, since it is hard to find an optimal expansion for a shuffle and we
> might pessimize some cases).

In some other cases there are checks to see whether a combined new tree can be
generated as a single instruction, and the trees are only combined in that case.
And as soon as the compiler has SSSE3 available, we can shuffle anything as a
single instruction, so combining them is always safe and fast.

[Bug c++/88475] -E -fdirectives-only clashes with raw strings

2019-03-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88475

--- Comment #5 from Allan Jensen  ---
Note, you can fix the conflict with icecc by setting ICEC_REMOTE_CPP=0

Icecc will only do this to enable the remote cpp feature.

[Bug debug/68836] GCC can't properly emit debug info for function arguments in a back-trace when using -Og

2019-03-09 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68836

Allan Jensen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linux at carewolf dot com

--- Comment #8 from Allan Jensen  ---
Duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78685

[Bug debug/86582] [debug] vla size reported as 0 at Og

2019-03-09 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86582

Allan Jensen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linux at carewolf dot com

--- Comment #3 from Allan Jensen  ---
Wouldn't this be solved by disabling -ftree-dse for -Og, whereas bug 78685 is
more complicated?

[Bug target/89057] [8/9 Regression] AArch64 ld3 st4 less optimized

2019-01-30 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057

--- Comment #4 from Allan Jensen  ---
That change might have made things worse, but the real problem is probably
that the registers for those instructions are loaded and stored using
intrinsics, so proper register allocation and combining can't be performed.

For ARMv7, for instance, the same code can be optimized to have no moves, just
a single vswp instruction between ld3 and st4. MSVC and clang can do that, but
GCC cannot.

[Bug target/89058] GCC 7->8 regression: ARM(64) ld3 st4 less optimized

2019-01-25 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89058

--- Comment #2 from Allan Jensen  ---
Oops, sorry.

[Bug target/89058] New: GCC 7->8 regression: ARM(64) ld3 st4 less optimized

2019-01-25 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89058

Bug ID: 89058
   Summary: GCC 7->8 regression: ARM(64) ld3 st4 less optimized
   Product: gcc
   Version: 8.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

When using the vld3_u8 and vst4_u8 intrinsics, the code generated with GCC 8 is
less efficient than the code generated with GCC 7. The GCC 7 loop has 3 moves,
the GCC 8 loop has 9.

The code in question is:

#include <arm_neon.h>
#include <stdint.h>

void qt_convert_rgb888_to_rgb32_neon(unsigned *dst, const unsigned char *src,
                                     int len)
{
    if (!len)
        return;

    const unsigned *const end = dst + len;

    // align dst on 64 bits
    const int offsetToAlignOn8Bytes = (reinterpret_cast<uintptr_t>(dst) >> 2) & 0x1;
    for (int i = 0; i < offsetToAlignOn8Bytes; ++i) {
        *dst++ = 0xff000000 | (src[0] << 16) | (src[1] << 8) | src[2];
        src += 3;
    }

    if ((len - offsetToAlignOn8Bytes) >= 8) {
        const unsigned *const simdEnd = end - 7;
        // non-inline asm version (uses more moves)
        uint8x8x4_t dstVector;
        dstVector.val[3] = vdup_n_u8(0xff);
        do {
            uint8x8x3_t srcVector = vld3_u8(src);
            src += 3 * 8;
            dstVector.val[0] = srcVector.val[2];
            dstVector.val[1] = srcVector.val[1];
            dstVector.val[2] = srcVector.val[0];
            vst4_u8((uint8_t*)dst, dstVector);
            dst += 8;
        } while (dst < simdEnd);
    }

    while (dst != end) {
        *dst++ = 0xff000000 | (src[0] << 16) | (src[1] << 8) | src[2];
        src += 3;
    }
}


With gcc 7.3 the inner loop is:
.L5:
ld3 {v4.8b - v6.8b}, [x1]
add x1, x1, 24
orr v3.16b, v7.16b, v7.16b
mov v0.8b, v6.8b
mov v1.8b, v5.8b
mov v2.8b, v4.8b
st4 {v0.8b - v3.8b}, [x0]
add x0, x0, 32
cmp x3, x0
bhi .L5

With gcc 8.2 the inner loop is:
.L5:
ld3 {v4.8b - v6.8b}, [x1]
adrp x3, .LC1
add x1, x1, 24
ldr q3, [x3, #:lo12:.LC1]
mov v16.8b, v6.8b
mov v7.8b, v5.8b
mov v4.8b, v4.8b
ins v16.d[1], v17.d[0]
ins v7.d[1], v17.d[0]
ins v4.d[1], v17.d[0]
mov v0.16b, v16.16b
mov v1.16b, v7.16b
mov v2.16b, v4.16b
st4 {v0.8b - v3.8b}, [x0]
add x0, x0, 32
cmp x2, x0
bhi .L5

[Bug target/89057] New: GCC 7->8 regression: ARM(64) ld3 st4 less optimized

2019-01-25 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057

Bug ID: 89057
   Summary: GCC 7->8 regression: ARM(64) ld3 st4 less optimized
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

When using the vld3_u8 and vst4_u8 intrinsics, the code generated with GCC 8 is
less efficient than the code generated with GCC 7. The GCC 7 loop has 3 moves,
the GCC 8 loop has 9.

The code in question is:

#include <arm_neon.h>
#include <stdint.h>

void qt_convert_rgb888_to_rgb32_neon(unsigned *dst, const unsigned char *src,
                                     int len)
{
    if (!len)
        return;

    const unsigned *const end = dst + len;

    // align dst on 64 bits
    const int offsetToAlignOn8Bytes = (reinterpret_cast<uintptr_t>(dst) >> 2) & 0x1;
    for (int i = 0; i < offsetToAlignOn8Bytes; ++i) {
        *dst++ = 0xff000000 | (src[0] << 16) | (src[1] << 8) | src[2];
        src += 3;
    }

    if ((len - offsetToAlignOn8Bytes) >= 8) {
        const unsigned *const simdEnd = end - 7;
        // non-inline asm version (uses more moves)
        uint8x8x4_t dstVector;
        dstVector.val[3] = vdup_n_u8(0xff);
        do {
            uint8x8x3_t srcVector = vld3_u8(src);
            src += 3 * 8;
            dstVector.val[0] = srcVector.val[2];
            dstVector.val[1] = srcVector.val[1];
            dstVector.val[2] = srcVector.val[0];
            vst4_u8((uint8_t*)dst, dstVector);
            dst += 8;
        } while (dst < simdEnd);
    }

    while (dst != end) {
        *dst++ = 0xff000000 | (src[0] << 16) | (src[1] << 8) | src[2];
        src += 3;
    }
}


With gcc 7.3 the inner loop is:
.L5:
ld3 {v4.8b - v6.8b}, [x1]
add x1, x1, 24
orr v3.16b, v7.16b, v7.16b
mov v0.8b, v6.8b
mov v1.8b, v5.8b
mov v2.8b, v4.8b
st4 {v0.8b - v3.8b}, [x0]
add x0, x0, 32
cmp x3, x0
bhi .L5

With gcc 8.2 the inner loop is:
.L5:
ld3 {v4.8b - v6.8b}, [x1]
adrp x3, .LC1
add x1, x1, 24
ldr q3, [x3, #:lo12:.LC1]
mov v16.8b, v6.8b
mov v7.8b, v5.8b
mov v4.8b, v4.8b
ins v16.d[1], v17.d[0]
ins v7.d[1], v17.d[0]
ins v4.d[1], v17.d[0]
mov v0.16b, v16.16b
mov v1.16b, v7.16b
mov v2.16b, v4.16b
st4 {v0.8b - v3.8b}, [x0]
add x0, x0, 32
cmp x2, x0
bhi .L5

[Bug c++/88475] -E -fdirectives-only clashes with raw strings

2019-01-21 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88475

--- Comment #3 from Allan Jensen  ---
No, it has to be a raw-string to be valid.
https://wandbox.org/permlink/I0yF3U3OXoH6LbIM

[Bug c++/88475] -E -fdirectives-only clashes with raw strings

2019-01-17 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88475

Allan Jensen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linux at carewolf dot com

--- Comment #1 from Allan Jensen  ---
I also see this with Debian's gcc 8.2.0 (gcc version 8.2.0 (Debian 8.2.0-14))

[Bug tree-optimization/78394] False positives of maybe-uninitialized with -Og

2018-12-17 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78394

--- Comment #9 from Allan Jensen  ---
I see two other low-effort ways to possibly fix the issue: disable the
warning as for -O0, since it is buggy, or, if we believe it still has some value
at -Og even with the false positives, just remove it from -Wall and -Wextra,
so it at least doesn't get enabled unless explicitly asked for.

[Bug c++/58407] [C++11] Should warn about deprecated implicit generation of copy constructor/assignment

2018-10-02 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58407

Allan Jensen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linux at carewolf dot com

--- Comment #24 from Allan Jensen  ---
So with this the rule-of-three is now the rule-of-four or six?

[Bug target/85950] Unsafe-math-optimizations regresses optimization using SSE4.1 roundss

2018-05-29 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85950

--- Comment #6 from Allan Jensen  ---
Btw, I have tested and the patch works for my cases.

[Bug target/85950] Unsafe-math-optimizations regresses optimization using SSE4.1 roundss

2018-05-29 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85950

--- Comment #4 from Allan Jensen  ---
Btw, I found this while trying to figure out why std::round() wasn't also
optimized to a single roundss instruction, is that just a missing optimization
or is there a quirk about that that makes them not fit?

I noticed the definition of the ROUND enum in i386.md is even missing the entry
for normal rounding (0 AFAIK).

[Bug rtl-optimization/85950] Unsafe-math-optimizations regresses optimization using SSE4.1 roundss

2018-05-28 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85950

--- Comment #2 from Allan Jensen  ---
Created attachment 44196
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44196&action=edit
Example

To trigger it, you need both a rounding operation and a conversion to integer.

[Bug rtl-optimization/85950] Unsafe-math-optimizations regresses optimization using SSE4.1 roundss

2018-05-28 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85950

--- Comment #1 from Allan Jensen  ---
Sorry, forget the example above. I will attach the real code that triggers it.

Note it does not trigger with -fno-signed-zeros, -fno-trapping-math,
-fassociative-math and -freciprocal-math, so it is something specific to
unsafe-math-optimizations itself.

[Bug rtl-optimization/85950] New: Unsafe-math-optimizations regresses optimization using SSE4.1 roundss

2018-05-28 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85950

Bug ID: 85950
   Summary: Unsafe-math-optimizations regresses optimization using
SSE4.1 roundss
   Product: gcc
   Version: 8.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

When SSE4.1 is available, std::floor, std::ceil and their C counterparts are
inlined to being a single roundss instruction.

However, if compiled with -Ofast, -ffast-math or -funsafe-math-optimizations
specifically, then you instead get a slightly improved version of the much
slower SSE2 implementation of the same functions.

For instance compiling this with -msse4.1:

#include <cmath>
double stdfloor(double a)
{
    return std::floor(a);
}

double stdceil(double a)
{
    return std::ceil(a);
}

[Bug tree-optimization/85692] Two source permute not used for vector initialization

2018-05-08 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85692

--- Comment #5 from Allan Jensen  ---
Created attachment 44088
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44088&action=edit
suggested patch

[Bug tree-optimization/85692] Two source permute not used for vector initialization

2018-05-08 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85692

--- Comment #4 from Allan Jensen  ---
Note that I already posted a patch on gcc-patches myself. It is very similar to
yours.

[Bug tree-optimization/85692] Two source permute not used for vector initialization

2018-05-08 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85692

--- Comment #1 from Allan Jensen  ---
Created attachment 44084
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44084&action=edit
construct.cc

Motivating examples. Compile with -msse4.1 for the second case.

[Bug tree-optimization/85692] New: Two source permute not used for vector initialization

2018-05-08 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85692

Bug ID: 85692
   Summary: Two source permute not used for vector initialization
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

If a vector initialization is using elements from only a single vector source,
it will be optimized as a shuffle, but if it is using elements from two, it
will not be attempted.

This appears to be a missing case in
tree-ssa-forwprop.c:simplify_vector_constructor
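As an illustration of the missing case (not the attached construct.cc, which is not reproduced here), the following sketch uses GCC's vector extensions. The function names are hypothetical; `__builtin_shuffle` is GCC-specific.

```cpp
// GCC vector extension: four 32-bit ints in one 16-byte vector.
typedef int v4si __attribute__((vector_size(16)));

// Element-wise construction drawing on two source vectors; this is the kind
// of constructor the report says is not turned into a permute.
v4si construct_from_two(v4si a, v4si b)
{
    return v4si{a[0], a[1], b[0], b[1]};
}

// The same result written as an explicit two-source permute.
// Mask indices 0..3 select from a, 4..7 select from b.
v4si shuffle_from_two(v4si a, v4si b)
{
    const v4si mask = {0, 1, 4, 5};
    return __builtin_shuffle(a, b, mask);
}
```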

[Bug rtl-optimization/85551] No strength reduction of modulo and integer vision

2018-04-27 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85551

--- Comment #2 from Allan Jensen  ---
Hmm.. I appear to have made unsafe assumptions in the mod_opt cases.

The first safe optimization version would then be:
void mod_opt(int *a, int count, int stride, unsigned width)
{
    int pos_opt = 0;
    for (int i = 0; i < count; ++i) {
        if (pos_opt < 0 || pos_opt >= width)
            pos_opt = pos_opt % width;
        a[i] = pos_opt;
        pos_opt += stride;
    }
}

To be able to completely get rid of modulo, you need to know or check for the
size of stride compared to width.
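A sketch of what that could look like once the relationship is known. The preconditions here are assumptions chosen for the illustration (0 <= stride < width, and width <= INT_MAX so the values fit in the int output); `mod_no_div` is a hypothetical name, not code from the attachment.

```cpp
// Assumes stride < width and width <= INT_MAX; the caller must check this.
void mod_no_div(int *a, int count, unsigned stride, unsigned width)
{
    unsigned pos = 0;
    for (int i = 0; i < count; ++i) {
        a[i] = static_cast<int>(pos);
        pos += stride;
        // Because stride < width, pos can exceed width by less than width,
        // so a single conditional subtraction replaces the modulo.
        if (pos >= width)
            pos -= width;
    }
}
```

Without the stride < width guarantee a single subtraction is not enough, which is why the safe version above still falls back to a real modulo.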

[Bug rtl-optimization/85551] No strength reduction of modulo and integer vision

2018-04-27 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85551

--- Comment #1 from Allan Jensen  ---
I also stumbled on this old motivating article when I tried googling the
concept: http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TM-600.pdf

[Bug rtl-optimization/85551] New: No strength reduction of modulo and integer vision

2018-04-27 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85551

Bug ID: 85551
   Summary: No strength reduction of modulo and integer vision
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

Created attachment 44030
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44030&action=edit
strmod.cpp

Many simple loops that use modulo naively can be optimized to not perform the
expensive modulo/division on every iteration, but GCC does not perform this
strength reduction.

I have attached a motivating example including two iterations of optimizations:
an easy, safe one (though it might interfere with vectorization if the arch has
vectorized integer divisions), and a more aggressive one that is much faster in
some cases but not always.

[Bug tree-optimization/85406] Unnecessary blend when vectorizing short-cutted calculations

2018-04-20 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85406

--- Comment #6 from Allan Jensen  ---
Yeah, the a==255 case was actually not one I would expect the compiler to solve,
which is why I changed the example to the a==0 case, which should be solvable
using existing constant propagation.

Note you can put both short-cuts in, though as it stands only GCC 7 and 8
can vectorize it with two conditions, so we can't use that in general code, as we
need it to be fast elsewhere too.

[Bug tree-optimization/85406] Unnecessary blend when vectorizing short-cutted calculations

2018-04-20 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85406

--- Comment #4 from Allan Jensen  ---
Created attachment 43995
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43995&action=edit
gccbug85406.cpp

This version compiles with a pcmpeqd and pandn instead of a blend, but the
principle is the same.

Though the lack of a ptest in the beginning is worse, as that risks a
performance regression compared to non-vectorized.

[Bug tree-optimization/85406] Unnecessary blend when vectorizing short-cutted calculations

2018-04-20 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85406

--- Comment #3 from Allan Jensen  ---
You need to add the loop around it

void test(unsigned *buffer, int count)
{
    for (int i = 0; i < count; ++i)
        buffer[i] = qPremultiply(buffer[i]);
}

[Bug tree-optimization/85406] Unnecessary blend when vectorizing short-cutted calculations

2018-04-15 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85406

--- Comment #1 from Allan Jensen  ---
Note it might be hard to figure out for the compiler that the result for a==255
will leave the input unchanged, but you can observe the same if you instead
test for a == 0 (and return 0). In that case the compiler should have enough
math deduction to be able to tell that the result of a==0 is always 0.

[Bug tree-optimization/85406] New: Unnecessary blend when vectorizing short-cutted calculations

2018-04-15 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85406

Bug ID: 85406
   Summary: Unnecessary blend when vectorizing short-cutted
calculations
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

If you have something like this:

inline unsigned qPremultiply(unsigned x)
{
    const unsigned a = x >> 24;
    if (a == 255)
        return x;

    unsigned t = (x & 0xff00ff) * a;
    t = (t + ((t >> 8) & 0xff00ff) + 0x800080) >> 8;
    t &= 0xff00ff;

    x = ((x >> 8) & 0xff) * a;
    x = (x + ((x >> 8) & 0xff) + 0x80);
    x &= 0xff00;
    return x | t | (a << 24);
}

Gcc will vectorize it so that the longer calculation is always performed and
with an added blend in the end to merge the two different return values. This
is however unnecessary as the calculation will give the same result, and thus
the blend can be saved.

Also in any case it is actually a bit unsafe to vectorize as the performance
difference between the two branches is substantial, and it happens that in this
case the short-cut is likely to be valid most of the time, so a nonvectorized
loop might be faster than a vectorized one by doing a lot less.

The latter could be fixed if the short-cut were also vectorized, for instance
by making the test for 4 values at a time and skipping the long route if none of
them need it.
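Written out by hand in scalar code, the idea of testing four values at a time could look roughly like this. It is only a sketch of the control flow a vectorizer might produce (where the group test would be a single ptest-style check); `premultiply_blocks` is a hypothetical name and the block size of 4 is taken from the text above.

```cpp
#include <cstddef>

inline unsigned qPremultiply(unsigned x)
{
    const unsigned a = x >> 24;
    if (a == 255)
        return x;

    unsigned t = (x & 0xff00ff) * a;
    t = (t + ((t >> 8) & 0xff00ff) + 0x800080) >> 8;
    t &= 0xff00ff;

    x = ((x >> 8) & 0xff) * a;
    x = (x + ((x >> 8) & 0xff) + 0x80);
    x &= 0xff00;
    return x | t | (a << 24);
}

void premultiply_blocks(unsigned *buffer, std::size_t count)
{
    std::size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        // AND the four pixels together: the top byte of the result is 0xff
        // only if every pixel in the group is fully opaque.
        const unsigned all = buffer[i] & buffer[i + 1] & buffer[i + 2] & buffer[i + 3];
        if ((all >> 24) == 255)
            continue;  // short-cut: nothing in this group needs the long route
        for (std::size_t j = 0; j < 4; ++j)
            buffer[i + j] = qPremultiply(buffer[i + j]);
    }
    // Remaining 0..3 pixels.
    for (; i < count; ++i)
        buffer[i] = qPremultiply(buffer[i]);
}
```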

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-09 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #8 from Allan Jensen  ---
Yes, the ones I called missing are relative to -O2. I was investigating this in
relation to Qt. We build these files either with -O3, or with -Os for customers
that are sensitive to binary size. Since some of the image handling routines are
quite heavy and have been written for auto-vectorization, I was just checking if
I could get it to work, and the results with your patch are quite good:

Normal sizes of qdrawhelper.o with -O3/-O2/-Os: 
277704 / 198984 / 168440

With -O2 -ftree-vectorize: 242224
With -O2 -fopenmp: 219536
With -Os -ftree-loop-vectorize: 168440 (no change)
With -Os -fopenmp: 177144 (with your patch)

So most of the -Os benefit and still many of the central draw loops
auto-vectorized.

Haven't benchmarked it yet though.

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-09 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #6 from Allan Jensen  ---
Great. Your patch worked with 90% of the marked loops!

The remaining loops report things like this with -fopt-info-vec-missed:

note: not vectorized: relevant stmt not supported: idisty.872_437 = (unsigned
int) idisty_386;
note: bad operation or unsupported loop bound.

But the result is already pretty good for -fopenmp with manually marked loops.

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-09 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #4 from Allan Jensen  ---
I will try the patch. I just tried -fopt-info-vec-missed and the message
reported for every loop was:

note: not vectorized: latch block not empty.
note: bad loop form.

[Bug tree-optimization/84777] New: -Os inhibits all vectorization

2018-03-09 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

Bug ID: 84777
   Summary: -Os inhibits all vectorization
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

Neither the command-line flag -ftree-loop-vectorize nor -fopenmp combined with
"#pragma omp simd" works when -Os is active.

It seems that vectorization should work even in -Os mode when it is requested
manually. I can almost see why -ftree-loop-vectorize wouldn't work, which is
why I tried the manual marking of loops to vectorize, but the latter didn't
work either.

I would suggest documenting this behavior and at least fixing the vectorization
of manually marked loops.
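
For readers unfamiliar with the manual marking, a minimal sketch of such a loop follows. The function add_saturate is a made-up example, not code from Qt. The pragma only takes effect when the file is compiled with -fopenmp or -fopenmp-simd; otherwise the compiler ignores it and the loop stays an ordinary scalar loop.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example loop, marked for vectorization by hand.
inline void add_saturate(std::uint8_t *dst, const std::uint8_t *src, std::size_t n)
{
#pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned sum = static_cast<unsigned>(dst[i]) + src[i];
        dst[i] = static_cast<std::uint8_t>(sum > 255 ? 255 : sum);
    }
}
```

Compiling this with something like -Os -fopenmp-simd -fopt-info-vec-missed should show whether the marked loop was vectorized or why it was not.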

[Bug tree-optimization/84670] [8 Regression] ICE: in compute_antic_aux, at tree-ssa-pre.c:2148 with -O2 -fno-tree-dominator-opts

2018-03-05 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84670

Allan Jensen  changed:

   What|Removed |Added

 CC||linux at carewolf dot com

--- Comment #13 from Allan Jensen  ---
*** Bug 84718 has been marked as a duplicate of this bug. ***

[Bug middle-end/84718] [8 regression] ICE when compiling chromium

2018-03-05 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84718

Allan Jensen  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #5 from Allan Jensen  ---
Yes an updated build which includes the fix from 84670 works.

*** This bug has been marked as a duplicate of bug 84670 ***

[Bug middle-end/84718] [8 regression] ICE when compiling chromium

2018-03-05 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84718

--- Comment #4 from Allan Jensen  ---
I will update my gcc build and check

[Bug middle-end/84718] [8 regression] ICE when compiling chromium

2018-03-05 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84718

--- Comment #2 from Allan Jensen  ---
Created attachment 43568
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43568=edit
spdy_alt_svc_wire_format.ii.gz

[Bug middle-end/84718] [8 regression] ICE when compiling chromium

2018-03-05 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84718

--- Comment #1 from Allan Jensen  ---
Created attachment 43567
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43567=edit
spdy_alt_svc_wire_format.s

[Bug middle-end/84718] New: [8 regression] ICE when compiling chromium

2018-03-05 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84718

Bug ID: 84718
   Summary: [8 regression] ICE when compiling chromium
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

Created attachment 43566
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43566=edit
gcc log

Using latest gcc 8 updated today I hit an internal compiler error in the
Chromium part of qtwebengine in the file
net/spdy/core/spdy_alt_svc_wire_format.cc

[Bug middle-end/84019] [7/8 regression] ICE in fold-const of std::complex division

2018-03-05 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84019

Allan Jensen  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #9 from Allan Jensen  ---
I now have trouble reproducing it. Let's assume for now that my configuration
was wrong at the time this was still reproducible for me.

[Bug middle-end/84019] [7/8 regression] ICE in fold-const of std::complex division

2018-02-23 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84019

--- Comment #8 from Allan Jensen  ---
Yes, I will take a look again and produce the intermediate results

[Bug lto/63688] all_symbols_read_handler: Assertion `lto_wrapper_argv' failed.

2018-02-05 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63688

Allan Jensen  changed:

   What|Removed |Added

 CC||linux at carewolf dot com

--- Comment #2 from Allan Jensen  ---
Yeah, that assert is kind of useless, and the -plugin argument is basically
pointless without the required but undocumented -plugin-opt options.
Though maybe that is a binutils bug?

[Bug middle-end/84019] [7/8 regression] ICE under fold-const

2018-01-24 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84019

--- Comment #2 from Allan Jensen  ---
I can provide the intermediate code, but I haven't created a reduced test-case,
so it would be big.

[Bug middle-end/84019] [7/8 regression] ICE under fold-const

2018-01-24 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84019

--- Comment #1 from Allan Jensen  ---
First line of the ICE (the only line reported by system gcc)

../../src/init2.c:52: MPFR assertion failed: p >= 2 && p <=
((mpfr_prec_t)((mpfr_uprec_t)(~(mpfr_uprec_t)0)>>1))

[Bug middle-end/84019] New: [7/8 regression] ICE under fold-const

2018-01-24 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84019

Bug ID: 84019
   Summary: [7/8 regression] ICE under fold-const
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

ICE when compiling Chromium in QtWebEngine under certain conditions.

With gcc 8:
during GIMPLE pass: fre
../../../../../qtwebengine/src/3rdparty/chromium/third_party/WebKit/Source/platform/audio/IIRFilter.cpp:
In member function ‘void blink::IIRFilter::GetFrequencyResponse(int, const
float*, float*, float*)’:
../../../../../qtwebengine/src/3rdparty/chromium/third_party/WebKit/Source/platform/audio/IIRFilter.cpp:221:1:
internal compiler error: Aborted
 }  // namespace blink
 ^
0xe6e39f crash_signal
../../gcc/toplev.c:325
0xa49596 do_mpc_arg2(tree_node*, tree_node*, tree_node*, int, int
(*)(__mpc_struct*, __mpc_struct const*, __mpc_struct const*, int))
../../gcc/builtins.c:10478
0xbb51f4 const_binop
../../gcc/fold-const.c:1405
0xbb5ee7 const_binop(tree_code, tree_node*, tree_node*, tree_node*)
../../gcc/fold-const.c:1705
0x11daa14 gimple_resimplify2(gimple**, code_helper*, tree_node*, tree_node**,
tree_node* (*)(tree_node*))
../../gcc/gimple-match-head.c:133
0x12ad258 gimple_simplify(gimple*, code_helper*, tree_node**, gimple**,
tree_node* (*)(tree_node*), tree_node* (*)(tree_node*))
../../gcc/gimple-match-head.c:643
0xbf904a gimple_fold_stmt_to_constant_1(gimple*, tree_node* (*)(tree_node*),
tree_node* (*)(tree_node*))
../../gcc/gimple-fold.c:6117
0x101a604 try_to_simplify
../../gcc/tree-ssa-sccvn.c:3982
0x101a604 visit_use
../../gcc/tree-ssa-sccvn.c:4033
0x101c736 process_scc
../../gcc/tree-ssa-sccvn.c:4363
0x101c736 extract_and_process_scc_for_name
../../gcc/tree-ssa-sccvn.c:4434
0x101c736 DFS
../../gcc/tree-ssa-sccvn.c:4484
0x101cbd3 sccvn_dom_walker::before_dom_children(basic_block_def*)
../../gcc/tree-ssa-sccvn.c:4917
0x15ad907 dom_walker::walk(basic_block_def*)
../../gcc/domwalk.c:308
0x101d6ca run_scc_vn(vn_lookup_kind)
../../gcc/tree-ssa-sccvn.c:5033
0x101deea execute
../../gcc/tree-ssa-sccvn.c:6015


The same happens with system gcc (7.2 from Debian), but not with system gcc-6

[Bug tree-optimization/83847] [8 Regression] ICE in vectorizable_load, at tree-vect-stmts.c:7365

2018-01-16 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83847

--- Comment #4 from Allan Jensen  ---
Full output from the ICE:

during GIMPLE pass: vect
/src/qt5/qtbase/src/corelib/kernel/qmetaobjectbuilder.cpp: In function ‘int
buildMetaObject(QMetaObjectBuilderPrivate*, char*, int, bool)’:
/src/qt5/qtbase/src/corelib/kernel/qmetaobjectbuilder.cpp:1174:12: internal
compiler error: in vectorizable_load, at tree-vect-stmts.c:7365
 static int buildMetaObject(QMetaObjectBuilderPrivate *d, char *buf,
^~~
0x74c949 vectorizable_load
../../gcc/tree-vect-stmts.c:7365
0x10a40b4 vect_analyze_stmt(gimple*, bool*, _slp_tree*, _slp_instance*)
../../gcc/tree-vect-stmts.c:9355
0x10bddee vect_analyze_loop_operations
../../gcc/tree-vect-loop.c:1875
0x10bddee vect_analyze_loop_2
../../gcc/tree-vect-loop.c:2254
0x10bddee vect_analyze_loop(loop*, _loop_vec_info*)
../../gcc/tree-vect-loop.c:2546
0x10d6b2d vectorize_loops()
../../gcc/tree-vectorizer.c:664
Please submit a full bug report,

[Bug tree-optimization/83847] [8 Regression] ICE in vectorizable_load, at tree-vect-stmts.c:7365

2018-01-16 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83847

Allan Jensen  changed:

   What|Removed |Added

 CC||linux at carewolf dot com

--- Comment #3 from Allan Jensen  ---
Affects building Qt 5.10 QtCore, but only if optimizing for certain
architectures.

I triggered it with 
 /opt/gcc/bin/g++-8 -c -pipe -march=skylake -g -O3 -std=c++1z
-fvisibility=hidden -fvisibility-inlines-hidden -Wall -W -Wvla -Wdate-time
-Wshift-overflow=2 -Wduplicated-cond -Wno-stringop-overflow -D_REENTRANT -fPIC
-DQT_NO_USING_NAMESPACE -DQT_NO_FOREACH
-DELF_INTERPRETER=\"/lib64/ld-linux-x86-64.so.2\"
-DQT_NO_NARROWING_CONVERSIONS_IN_CONNECT -DQT_BUILD_CORE_LIB -DQT_BUILDING_QT
-DQT_NO_CAST_TO_ASCII -DQT_ASCII_CAST_WARNINGS -DQT_MOC_COMPAT
-DQT_USE_QSTRINGBUILDER -DQT_DEPRECATED_WARNINGS
-DQT_DISABLE_DEPRECATED_BEFORE=0x05 -D_LARGEFILE64_SOURCE
-D_LARGEFILE_SOURCE -DQT_NO_DEBUG -DPCRE2_CODE_UNIT_WIDTH=16
-I/src/qt5/qtbase/src/corelib -I. -Iglobal
-I/src/qt5/qtbase/src/3rdparty/harfbuzz/src -I/src/qt5/qtbase/src/3rdparty/md5
-I/src/qt5/qtbase/src/3rdparty/md4 -I/src/qt5/qtbase/src/3rdparty/sha3
-I/src/qt5/qtbase/src/3rdparty/forkfd -I../../include -I../../include/QtCore
-I../../include/QtCore/5.10.1 -I../../include/QtCore/5.10.1/QtCore -I.moc
-I/src/qt5/qtbase/src/3rdparty/pcre2/src -isystem /usr/include/glib-2.0
-I/usr/lib/x86_64-linux-gnu/glib-2.0/include
-I/src/qt5/qtbase/mkspecs/linux-g++ -o .obj/qmetaobjectbuilder.o
/src/qt5/qtbase/src/corelib/kernel/qmetaobjectbuilder.cpp

Removing -march=skylake worked.

[Bug tree-optimization/82426] Missed tree-slp-vectorization on -O2 and -O3

2017-10-04 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82426

--- Comment #3 from Allan Jensen  ---
Note that the ability to do this at all with -Os appears to be new in gcc 7.

[Bug tree-optimization/82426] Missed tree-slp-vectorization on -O2 and -O3

2017-10-04 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82426

--- Comment #2 from Allan Jensen  ---
Created attachment 42301
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42301=edit
Assembler output with -Os -ftree-slp-vectorize

[Bug tree-optimization/82426] Missed tree-slp-vectorization on -O2 and -O3

2017-10-04 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82426

--- Comment #1 from Allan Jensen  ---
Created attachment 42300
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42300=edit
Assembler output with -O3

[Bug tree-optimization/82426] New: Missed tree-slp-vectorization on -O2 and -O3

2017-10-04 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82426

Bug ID: 82426
   Summary: Missed tree-slp-vectorization on -O2 and -O3
   Product: gcc
   Version: 7.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

Created attachment 42299
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42299=edit
vectslp.cpp

The attached example is a simple matrix multiplication. With -O3 or -O2
-ftree-slp-vectorize the basic-block is not vectorized.

Oddly, with -Os -ftree-slp-vectorize it is.

[Bug rtl-optimization/81174] bswap not recognized in |= statement

2017-06-22 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81174

Allan Jensen  changed:

   What|Removed |Added

Version|6.1.1   |7.1.0

--- Comment #1 from Allan Jensen  ---
Also reproduced with gcc 4.8, 4.9, 5 and 7. Works in clang. With gcc 6+ it
would sometimes work if bswap was called as part of a constructor.

[Bug rtl-optimization/81174] New: bswap not recognized in |= statement

2017-06-22 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81174

Bug ID: 81174
   Summary: bswap not recognized in |= statement
   Product: gcc
   Version: 6.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

Created attachment 41610
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41610=edit
bswap-issue.cc

While writing a big-endian bitfield accessor I noticed that bswap was not always
recognized.

The problem appears to trigger together with |= statements; at least, replacing
the |= statement with += solves the issue.

I have attached a test case. The faulty one is the first; the other two
work.
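
The attachment is not reproduced here, so the following is only a guess at the shape of the pattern, with hypothetical function names. Both functions assemble a big-endian 32-bit value from four bytes; since the shifted bytes never overlap, |= and += give the same result, which is why a byte-swap recognizer could in principle treat them alike.

```cpp
#include <cstdint>

// Big-endian load built up with |=, the form reported as not always
// recognized as a byte swap.
inline std::uint32_t load_be_or(const unsigned char *p)
{
    std::uint32_t v = 0;
    v |= std::uint32_t(p[0]) << 24;
    v |= std::uint32_t(p[1]) << 16;
    v |= std::uint32_t(p[2]) << 8;
    v |= std::uint32_t(p[3]);
    return v;
}

// The same load with +=; the byte ranges do not overlap, so no carries occur.
inline std::uint32_t load_be_add(const unsigned char *p)
{
    std::uint32_t v = 0;
    v += std::uint32_t(p[0]) << 24;
    v += std::uint32_t(p[1]) << 16;
    v += std::uint32_t(p[2]) << 8;
    v += std::uint32_t(p[3]);
    return v;
}
```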

[Bug ipa/80277] New: ipa-icf missing overlooking functions

2017-04-01 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80277

Bug ID: 80277
   Summary: ipa-icf missing overlooking functions
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ipa
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

Created attachment 41100
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41100=edit
icf.cc

Several functions that produce identical assembler are not merged by ipa-icf. I
have attached an example in which only the two functions foo0 and foo1, which
are identical in every detail, are merged, though all the foo* functions produce
identical assembler.

I theorize it is because the function signature is compared before the content,
and the templates and different types might cause that early comparison to fail
when it shouldn't.

I added a second test that just changed the return value but kept everything
else identical and it also wasn't merged.
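
As an illustration of the kind of candidates meant here (this is not the attached icf.cc, and scale_add is a hypothetical name), the two instantiations below differ only in signedness. On common targets they will most likely compile to the same instructions, yet their signatures differ.

```cpp
// Same body, different parameter and return types.
template <typename T>
T scale_add(const T *p, T k)
{
    return static_cast<T>(p[0] * k + p[1]);
}

// Explicit instantiations so that both bodies are emitted.
template int scale_add<int>(const int *, int);
template unsigned scale_add<unsigned>(const unsigned *, unsigned);
```

Comparing the output of -S for the two instantiations, with and without -fipa-icf, is one way to check whether they were folded.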


A little unrelated: I noted the ipa-icf optimization is undone by -O3 as it
re-inlines, though that is kind of pointless unless it is needed for second
level inlining.

[Bug target/80040] New: SSE4.1 ptest not always merged

2017-03-14 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80040

Bug ID: 80040
   Summary: SSE4.1 ptest not always merged
   Product: gcc
   Version: 6.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

Created attachment 40971
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40971=edit
Example

The intrinsics _mm_testz_si128 and _mm_testc_si128 both map to the exact same
instruction and parameters. They are sometimes merged to just one instruction
call, but not always. 

I have attached an example where the two intrinsics are merged in the first
function but not in the second.

[Bug target/80040] SSE4.1 ptest not always merged

2017-03-14 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80040

--- Comment #2 from Allan Jensen  ---
Created attachment 40973
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40973=edit
Assembler output from gcc 6

Easier to compare

[Bug target/80040] SSE4.1 ptest not always merged

2017-03-14 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80040

--- Comment #1 from Allan Jensen  ---
Created attachment 40972
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40972=edit
Assembler output

[Bug target/78921] New: SSE/AVX shuffle intrinsics uses builtins instead of __builtin_shuffle

2016-12-24 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78921

Bug ID: 78921
   Summary: SSE/AVX shuffle intrinsics uses builtins instead of
__builtin_shuffle
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

The intrinsics for x86 SIMD shuffle instructions could be redeclared using
__builtin_shuffle. This would help folding and better instruction selection. 

This has already been suggested on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756 and is also a necessary
component of solving one part of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78563 .

[Bug target/78762] Regression: Splitting unaligned AVX loads also when AVX2 is enabled

2016-12-21 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762

--- Comment #13 from Allan Jensen  ---
The question is whether the unaligned store is still slow on Excavator and
Ryzen, which support AVX2. As far as I understand, the Bulldozer architectures
just prefer split AVX because they were basically emulating 256-bit operations
with 128-bit micro-ops anyway.

[Bug target/78762] Regression: Splitting unaligned AVX loads also when AVX2 is enabled

2016-12-21 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762

--- Comment #11 from Allan Jensen  ---
Btw, did you benchmark store splitting on AMD? It is also enabled for BDVER and
ZNVER1.

[Bug target/78762] Regression: Splitting unaligned AVX loads also when AVX2 is enabled

2016-12-21 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762

--- Comment #10 from Allan Jensen  ---
That would solve the problem, but also leave the behavior as Sandybridge only
(nehalem didn't have AVX).

[Bug target/59874] Missing builtin (__builtin_clzs) when compiling with g++

2016-12-15 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59874

--- Comment #15 from Allan Jensen  ---
Yes, the patch works and it also evaluates at compile time.

[Bug target/59874] Missing builtin (__builtin_clzs) when compiling with g++

2016-12-13 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59874

--- Comment #8 from Allan Jensen  ---
Thanks that looks good. I will test it when I have a chance. I am changing the
Qt sources to not assume the presence of __builtin_clzs when __BMI__ is
defined. It can use __builtin_clz()-16U and __builtin_ctz() instead, but for
general compatibility it is nice that GCC also keeps it around.
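
A minimal sketch of that replacement, assuming a 32-bit unsigned int; the names clz16 and ctz16 are made up for illustration and are not the Qt functions. Because the generic builtins can be evaluated at compile time, the wrappers can be constexpr.

```cpp
#include <cstdint>

// 16-bit leading-zero count via the generic 32-bit builtin. Widening the
// argument to 32 bits adds 16 leading zeros, which are subtracted again.
// Undefined for v == 0, like __builtin_clz itself.
constexpr unsigned clz16(std::uint16_t v)
{
    return static_cast<unsigned>(__builtin_clz(v)) - 16U;
}

// Trailing zeros are unaffected by the widening. Undefined for v == 0.
constexpr unsigned ctz16(std::uint16_t v)
{
    return static_cast<unsigned>(__builtin_ctz(v));
}

static_assert(clz16(1) == 15, "clz16 is evaluated at compile time");
static_assert(clz16(0x8000) == 0, "top bit set means no leading zeros");
static_assert(ctz16(0x8000) == 15, "ctz16 is evaluated at compile time");
```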

Note, it would be even better though if GCC could support the short forms as
generic builtins. That changes the semantics slightly, but they are named so
similarly to the clz, clzl and clzll it would be easy to assume they also are
generics, with similar semantics, and can work across all targets.

Btw. I assume __builtin_clzs being a target specific builtin, that GCC never
had the capability of resolving it at compile-time? If that is the case, it
might actually be a bug that GCC allowed it at all in a constexpr function.

[Bug c/66970] Add __has_builtin() macro

2016-12-12 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66970

Allan Jensen  changed:

   What|Removed |Added

 CC||linux at carewolf dot com

--- Comment #5 from Allan Jensen  ---
This just hit us again, when a patch release removed __builtin_clzs or renamed
it to __builtin_lzcnt_u16. We need to be able to detect at compile time which
ones exist; otherwise we can't ship headers that won't break when gcc
updates.
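
For context, this is roughly how such a check looks with __has_builtin, which Clang already offered and which, to my knowledge, GCC provides from version 10 on. The wrapper count_leading_zeros16 is a hypothetical name, and the fallback loop is only there to keep the sketch self-contained on compilers without the macro.

```cpp
#include <cstdint>

// Compilers without __has_builtin see every query as "not available".
#ifndef __has_builtin
#  define __has_builtin(x) 0
#endif

// Leading zeros of a non-zero 16-bit value.
inline unsigned count_leading_zeros16(std::uint16_t v)
{
#if __has_builtin(__builtin_clz)
    // Generic builtin on the widened value, minus the 16 extra zeros
    // (assumes a 32-bit unsigned int).
    return static_cast<unsigned>(__builtin_clz(v)) - 16U;
#else
    // Portable fallback: scan from the top bit downwards.
    unsigned n = 0;
    for (unsigned bit = 0x8000u; bit != 0 && (v & bit) == 0; bit >>= 1)
        ++n;
    return n;
#endif
}
```

The same pattern would let a header pick between differently named target builtins without guessing from compiler version numbers.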

[Bug target/59874] Missing builtin (__builtin_clzs) when compiling with g++

2016-12-12 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59874

Allan Jensen  changed:

   What|Removed |Added

 CC||linux at carewolf dot com

--- Comment #5 from Allan Jensen  ---
This is more problematic to fix in Qt itself. How can we determine if we
should/can use __builtin_clzs or __lzcnt16?

Note the former is practically standard being supported by both older gcc and
clang. There is also the problem that we need to call a builtin, because the
C-intrinsics don't work as constexpr.

[Bug target/70118] UBSan claims misaligned access in SSE instrinsics

2016-12-12 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70118

Allan Jensen  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Allan Jensen  ---
Fixed in trunk

[Bug target/47754] [missed optimization] AVX allows unaligned memory operands but GCC uses unaligned load and register operand

2016-12-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47754

--- Comment #11 from Allan Jensen  ---
I think the issue I noted is completely separate from this one, so I opened
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762 to deal with it.

I think this one could probably be closed though.

[Bug target/78762] Regression: Splitting unaligned AVX loads also when AVX2 is enabled

2016-12-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762

--- Comment #3 from Allan Jensen  ---
Created attachment 40298
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40298=edit
Test compiled with gcc 6

[Bug target/78762] Regression: Splitting unaligned AVX loads also when AVX2 is enabled

2016-12-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762

--- Comment #2 from Allan Jensen  ---
Created attachment 40297
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40297=edit
Test compiled with -march=haswell

[Bug target/78762] Regression: Splitting unaligned AVX loads also when AVX2 is enabled

2016-12-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762

--- Comment #1 from Allan Jensen  ---
Created attachment 40296
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40296=edit
Test compiled with -mavx2

[Bug target/78762] New: Regression: Splitting unaligned AVX loads also when AVX2 is enabled

2016-12-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762

Bug ID: 78762
   Summary: Regression: Splitting unaligned AVX loads also when
AVX2 is enabled
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

Created attachment 40295
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40295=edit
Test

In gcc 7 when not optimizing for speed or newer Intel architectures unaligned
AVX loads are now split.

It appears this is on purpose, and the code related to it is quite old, but I
haven't been able to trigger it with older versions of gcc (tried 4.9, 5 and 6).

However this is a special tuning intended for Sandybridge and possibly AMD
cpus. It does not trigger on any AVX2 processor. Therefore it now causes a
universal performance degradation in code optimized for generic AVX2.

I suggest this tuning is disabled when avx2 is enabled.

[Bug target/47754] [missed optimization] AVX allows unaligned memory operands but GCC uses unaligned load and register operand

2016-12-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47754

--- Comment #10 from Allan Jensen  ---
No I mean it triggers when you compile with -mavx2, it is solved with
-march=haswell. It appears the issue is the tune flag
X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL is set for all processors that support
avx2, but if you use generic+avx2, it still pessimistically optimizes for
pre-avx2 processors setting MASK_AVX256_SPLIT_UNALIGNED_LOAD.

Though since there are two controlling flags and the second
X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL is still set for some avx2 processors
(btver and znver) besides generic, it is harder to argue what generic+avx2
should do there.

[Bug target/47754] [missed optimization] AVX allows unaligned memory operands but GCC uses unaligned load and register operand

2016-12-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47754

--- Comment #8 from Allan Jensen  ---
Note this happens with -mavx2, but not with -march=haswell. It appears the
tuning is a bit too pessimistic when avx2 is enabled on generic x64.

[Bug target/47754] [missed optimization] AVX allows unaligned memory operands but GCC uses unaligned load and register operand

2016-12-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47754

Allan Jensen  changed:

   What|Removed |Added

 CC||linux at carewolf dot com

--- Comment #7 from Allan Jensen  ---
This is significantly worse with integer operands.

_mm256_storeu_si256((__m256i *)[3],
_mm256_add_epi32(_mm256_loadu_si256((const __m256i *)[0]),
 _mm256_loadu_si256((const __m256i *)[1]))
);

compiles to:

vmovdqu 0x20(%rax),%xmm0
vinserti128 $0x1,0x30(%rax),%ymm0,%ymm0
vmovdqu (%rax),%xmm1
vinserti128 $0x1,0x10(%rax),%ymm1,%ymm1
vpaddd %ymm1,%ymm0,%ymm0
vmovups %xmm0,0x60(%rax)
vextracti128 $0x1,%ymm0,0x70(%rax)

[Bug target/78563] SSE4.1 pmovzx shuffle pattern not recognized

2016-11-28 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78563

--- Comment #1 from Allan Jensen  ---
Created attachment 40177
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40177=edit
Test

[Bug target/78563] New: SSE4.1 pmovzx shuffle pattern not recognized

2016-11-28 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78563

Bug ID: 78563
   Summary: SSE4.1 pmovzx shuffle pattern not recognized
   Product: gcc
   Version: 6.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

An unpack pattern with a 0 constant is neither folded nor recognized as a pmovzx
instruction.

SSE2 code:
_mm_unpacklo_epi32(X, _mm_setzero_si128())

GCC code:
__builtin_shuffle((__v4si)X, (__v4si)_mm_setzero_si128(), (__v4si){0, 4, 1,
5});

Both produce the same result, an xor to set 0 followed by an unpack
instruction, while with SSE4.1 a pmovzx instruction could be emitted instead.

Note that epi32 is just an example, used here because it is the most compact;
this also affects the 8 and 16 bit equivalents.

Looking in config/i386/i386.c it seems like there is no code in the
expand_vec_perm_* methods for detecting pmovzx patterns.
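
To make the equivalence concrete: interleaving the low elements with zeros is the same as zero-extending them to twice the width, at least on a little-endian target. The sketch below uses GCC's generic vector extensions rather than the SSE intrinsics; the function names are hypothetical, and the Clang branch is only there because Clang spells the builtin differently.

```cpp
#include <cstdint>
#include <cstring>

typedef int v4si __attribute__((vector_size(16)));

// Interleave the two low elements of x with zeros: {x0, 0, x1, 0}.
inline v4si unpacklo_zero(v4si x)
{
    const v4si zero = {0, 0, 0, 0};
#if defined(__clang__)
    return __builtin_shufflevector(x, zero, 0, 4, 1, 5); // Clang spelling
#else
    const v4si mask = {0, 4, 1, 5};
    return __builtin_shuffle(x, zero, mask);
#endif
}

// Read the shuffled vector back as two 64-bit lanes. On a little-endian
// target each lane holds the zero-extended 32-bit input element.
inline void zero_extend_low2(v4si x, std::uint64_t out[2])
{
    const v4si r = unpacklo_zero(x);
    std::memcpy(out, &r, sizeof r);
}
```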

[Bug target/31667] Integer extensions vectorization could be improved

2016-11-28 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31667

--- Comment #4 from Allan Jensen  ---
(In reply to Allan Jensen from comment #3)
> Gcc 5 and 6 produces code with pmovzx when compiling the example with -O3
> -msse4.1
> 
> I assume this can be closed.

Note that, as comment 1 says, it will not use a memory load, though instead it
does half as many memory reads.

movdqa 0x0(%rip),%xmm0# 8 
pmovzxbw %xmm0,%xmm1
psrldq $0x8,%xmm0
pmovzxbw %xmm0,%xmm0
movaps %xmm1,0x0(%rip)# 1e 
movaps %xmm0,0x0(%rip)# 25 

[Bug target/31667] Integer extensions vectorization could be improved

2016-11-28 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31667

Allan Jensen  changed:

   What|Removed |Added

 CC||linux at carewolf dot com

--- Comment #3 from Allan Jensen  ---
Gcc 5 and 6 produces code with pmovzx when compiling the example with -O3
-msse4.1

I assume this can be closed.

[Bug target/70118] UBSan claims misaligned access in SSE instrinsics

2016-11-24 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70118

Allan Jensen  changed:

   What|Removed |Added

  Attachment #40130|0   |1
is obsolete||

--- Comment #5 from Allan Jensen  ---
Created attachment 40140
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40140=edit
Patch

Updated patch confirmed to work

[Bug target/70118] UBSan claims misaligned access in SSE instrinsics

2016-11-23 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70118

--- Comment #4 from Allan Jensen  ---
Created attachment 40130
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40130=edit
Proposed patch

On closer inspection, we are only almost there, two minor changes are still
needed. (testing patch).

[Bug target/70118] UBSan claims misaligned access in SSE instrinsics

2016-11-23 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70118

--- Comment #3 from Allan Jensen  ---
Or r217608

[Bug target/70118] UBSan claims misaligned access in SSE instrinsics

2016-11-23 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70118

--- Comment #2 from Allan Jensen  ---
I believe this to be fixed by r239889

[Bug tree-optimization/78394] False positives of maybe-uninitialized with -Og

2016-11-17 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78394

Allan Jensen  changed:

   What|Removed |Added

  Attachment #40064|0   |1
is obsolete||

--- Comment #1 from Allan Jensen  ---
Created attachment 40065
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40065=edit
maybe_uninitialized.cpp

Added another example

[Bug tree-optimization/78394] New: False positives of maybe-uninitialized with -Og

2016-11-17 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78394

Bug ID: 78394
   Summary: False positives of maybe-uninitialized with -Og
   Product: gcc
   Version: 6.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

Created attachment 40064
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40064=edit
maybe_uninitialized.cpp

Compiling with -Og produces a number of unique false positives for the
maybe-uninitialized warnings. The warnings are only emitted for -Og and not for
-O0, -O1, -O2 or -O3.

[Bug pch/63319] [5 Regression] ICE: Segmentation fault building qt5 with pch

2016-11-03 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63319

Allan Jensen  changed:

   What|Removed |Added

 CC||linux at carewolf dot com

--- Comment #12 from Allan Jensen  ---
There is a chance this has already been fixed. We recently ran into the issue
again, see https://bugreports.qt.io/browse/QTBUG-56817 but it only affects GCC
5.3.1. On Debian's gcc 5.4.1 version it works.

[Bug tree-optimization/77902] Auto-vectorizes epilogue loops of manually vectorized functions

2016-10-18 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77902

Allan Jensen  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Allan Jensen  ---
Since it appears to be optimized better in gcc 7, let's say this is resolved.

[Bug tree-optimization/77902] Auto-vectorizes epilogue loops of manually vectorized functions

2016-10-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77902

--- Comment #2 from Allan Jensen  ---
While this has been the case in both GCC 5 and GCC 6, it appears that both
failing cases previously mentioned already produce the best-case result with a
fairly recent GCC 7:
gcc version 7.0.0 20160923 (experimental) (GCC)

[Bug tree-optimization/77902] Auto-vectorizes epilogue loops of manually vectorized functions

2016-10-10 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77902

--- Comment #1 from Allan Jensen  ---
Further experimentation shows that GCC can sometimes reason about the remaining
range, but does so inconsistently.

For instance, this example also fails:
int result = 0;
for (; count >= 4; count -= 4) {
    // Manually vectorized or batched code
    foobar_4x(result, vector);
    vector += 4;
}
for (; count > 0; --count) {   // Still autovectorized
    result += *vector++;
}

But when the epilogue is replaced with a loop that counts up, GCC appears to
figure out that it is pointless to vectorize:

for (int i = 0; i < count; ++i) { // correctly not vectorized

[Bug tree-optimization/77902] New: Auto-vectorizes epilogue loops or manually vectorized functions

2016-10-08 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77902

Bug ID: 77902
   Summary: Auto-vectorizes epilogue loops or manually vectorized
functions
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

Created attachment 39774
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39774=edit
Example that triggers the pointless auto-vectorization

A common pattern when manually vectorizing an inner function is to have a small
epilogue that handles the remainder of the input vector that cannot be handled
by the vectorized stepping.

For instance:
int i = 0;
for (; i < (count - 3); i +=4)
   // do 4 at a time
for (; i < count; ++i)
   // do 1 at a time


When compiled with -O3 or -ftree-loop-vectorize, that last epilogue may be
auto-vectorized by GCC even though it can run at most 3 times, so the
auto-vectorized code path will never be taken.

Rewriting it as 
int i = 0;
for (; i < (count - 3); i +=4)
   // do 4 at a time
for (int _i = 0; _i < 3 && i < count; ++_i, ++i)
   // do 1 at a time

fixes the issue.

I am guessing GCC would do well to learn a range from the main loop so that it
can figure out on its own that the epilogue cannot be run more than 3 times.
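
For reference, here is a minimal self-contained sketch of the bounded-epilogue
pattern described above. It is not the attached test case: the names sum4,
sum_bounded_epilogue and tail are made up for illustration, and sum4 only
stands in for whatever manually vectorized kernel is actually used.

```cpp
// Hypothetical stand-in for a manually vectorized kernel: handles 4 elements.
static inline int sum4(const int *v)
{
    return v[0] + v[1] + v[2] + v[3];
}

// Sums `count` ints. The main loop steps 4 at a time; the epilogue handles
// the remaining 0..3 elements.
int sum_bounded_epilogue(const int *vector, int count)
{
    int result = 0;
    int i = 0;
    for (; i < count - 3; i += 4)
        result += sum4(vector + i);
    // The extra counter `tail` spells out the "at most 3 iterations" bound
    // that the compiler would otherwise have to derive from the main loop.
    for (int tail = 0; tail < 3 && i < count; ++tail, ++i)
        result += vector[i];
    return result;
}
```

The second loop condition is redundant for correctness (i < count alone gives
the same result); it is only there to make the trip-count bound explicit.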

[Bug c++/77796] New: tautological compare warning emitted for inherited static method comparisons

2016-09-29 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77796

Bug ID: 77796
   Summary: tautological compare warning emitted for inherited
static method comparisons
   Product: gcc
   Version: 6.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linux at carewolf dot com
  Target Milestone: ---

We have been running into several issues with the tautological compare warning
in qtdeclarative. First there was https://bugreports.qt.io/browse/QTBUG-53373
(warning about comparing a typedef with its definition), and recently
https://bugreports.qt.io/browse/QTBUG-56266 (warning about a method that is
resolved to what it is compared to).

In both cases the comparisons are not tautological, but merely resolvable at
compile time, and they are specifically used in places where they need to be
resolvable at compile time. It makes no sense to warn about a comparison being
resolvable at compile time in a place that demands a constexpr.

The latest example can be reproduced with this simple code:
class A {
public:
    static void destroy() { }
};

class B : public A
{
};

const int tbl[1] = {
    B::destroy == A::destroy ? 0 : 1
};

The code specifically checks whether the method has been redefined in a derived
class, and since the names are looked up in two different scopes, the
comparison shouldn't trigger the tautological warning. Only comparing
(A::destroy == A::destroy) should do that.
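
To illustrate why the comparison is meaningful, here is a small sketch of the
same check written as a reusable compile-time test. This is not the
qtdeclarative code: the class C and the helper inherits_destroy are made up
for the example, and nothing is claimed about whether a given GCC version
warns on it.

```cpp
class A {
public:
    static void destroy() { }
};

// B does not declare destroy, so B::destroy names A::destroy.
class B : public A
{
};

// C declares its own destroy, which hides A::destroy.
class C : public A
{
public:
    static void destroy() { }
};

// Hypothetical helper: true when T::destroy is the function inherited from A.
template <typename T>
constexpr bool inherits_destroy()
{
    return &T::destroy == &A::destroy;
}

// The answer depends on the class, so the comparison is not a tautology.
static_assert(inherits_destroy<B>(), "B reuses A::destroy");
```

For C the same helper yields false, because the lookup in C's scope finds a
different function.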

[Bug lto/65274] Internal compiler error: should die in combat

2016-08-29 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65274

--- Comment #4 from Allan Jensen  ---
It works now.
