[Bug target/58790] [missed optimization] reduction of masks of builtin vectors not transformed to ptest or movemask instructions

2021-06-08 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58790 --- Comment #4 from Matthias Kretz (Vir) --- I'm still not familiar with this part of GCC, but isn't `_2 == { -1, -1, -1, -1 }` equivalent to _1, i.e. it reverses VEC_COND_EXPR? However, if the `==` is supposed to return a scalar boolean instead
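A source-level analogue of the comparison in question, as a minimal sketch: it assumes the GNU vector extension and a mask whose lanes are each either 0 or -1. The names `i4` and `equals_all_ones` are made up for illustration, and this is not the GIMPLE discussed in the comment. Under that assumption, comparing the mask lane-wise against all-ones reproduces the mask itself.

```cpp
// Four 32-bit int lanes (GNU vector extension).
using i4 [[gnu::vector_size(16)]] = int;

// Lane-wise comparison: each result lane is -1 where the operands are
// equal and 0 otherwise. For a mask whose lanes are already 0 or -1,
// the result therefore equals the input mask.
inline i4 equals_all_ones(i4 mask) {
  const i4 all_ones = {-1, -1, -1, -1};
  return mask == all_ones;
}
```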

[Bug c++/100716] member function template parameter should never be printed in candidate list and "T = T" should never be shown in substitutions

2021-05-26 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100716 --- Comment #3 from Matthias Kretz (Vir) --- Created attachment 50877 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50877&action=edit proposed patch Ensure dump_template_decl for function templates never prints template parameters after

[Bug c++/100763] Diagnostics of type alias is missing scope

2021-05-26 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100763 --- Comment #1 from Matthias Kretz (Vir) --- Created attachment 50876 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50876&action=edit proposed patch dump_type on 'const std::string' should not print 'const string' unless TFF_UNQUALIFIED_

[Bug c++/100763] New: Diagnostics of type alias is missing scope

2021-05-26 Thread kretz at kde dot org via Gcc-bugs
Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- namespace A { struct B {}; using C = B; } void f(A::B&); void f(A::C&); void g(const A::B& b, const A::C& c) { f(b);

[Bug c++/100716] member function template parameter not printed in candidate list

2021-05-25 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100716 --- Comment #2 from Matthias Kretz (Vir) --- I'd like to revise my opinion above. dump_template_decl should never print the template parameter list of functions. I.e. it should be 'template f()' not 'template f()'. Because it's also declared wit

[Bug c++/100716] member function template parameter not printed in candidate list

2021-05-21 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100716 --- Comment #1 from Matthias Kretz (Vir) --- With -fno-pretty-templates both test cases do print the template_parms. That's because in dump_function_decl, without flag_pretty_templates, t isn't generalized and thus is not considered a primary t

[Bug c++/100716] New: member function template parameter not printed in candidate list

2021-05-21 Thread kretz at kde dot org via Gcc-bugs
Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org CC: paolo.carlini at oracle dot com Target Milestone: --- template struct A { template void f

[Bug tree-optimization/99728] code pessimization when using wrapper classes around SIMD types

2021-03-24 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728 --- Comment #10 from Matthias Kretz (Vir) --- Is this the same issue: struct A { double v; }; struct B { double v; B& operator=(const B& rhs) { v = rhs.v; return *this; } }; // 10 loads & stores void f(A& a, const A& b) {

[Bug c++/99728] code pessimization when using wrapper classes around SIMD types

2021-03-23 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728 --- Comment #6 from Matthias Kretz (Vir) --- > I guess I need it for unaligned loads/stores, correct? Otherwise __v4df > should work everywhere. 1. You can freely reinterpret_cast by value between all the different [[gnu::vector_size(N)]] types
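A minimal sketch of the by-value cast mentioned in point 1, assuming two vector types of the same total size (16 bytes here). The aliases `f4` and `i4` and the helper `bits_of` are hypothetical names, not taken from the bug report.

```cpp
// Two GNU vector types of equal size: four floats and four 32-bit ints.
using f4 [[gnu::vector_size(16)]] = float;
using i4 [[gnu::vector_size(16)]] = int;

// Reinterprets the bits of a float vector as an int vector, by value.
// No pointer or reference cast is involved, so no aliasing question arises.
inline i4 bits_of(f4 v) { return reinterpret_cast<i4>(v); }
```

With IEEE 754 single precision, lane 0 of `bits_of(f4{1.0f, 0.0f, 0.0f, 0.0f})` holds the bit pattern `0x3f800000`.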

[Bug c++/99728] code pessimization when using wrapper classes around SIMD types

2021-03-23 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728 --- Comment #4 from Matthias Kretz (Vir) --- FWIW, using std::experimental::native_simd also does not hoist the stores out of the loop. However, if you pass d by value and return d, the issue goes away. So I guess this is an aliasing pessimizatio

[Bug c++/99201] [8/9/10/11 Regression] ICE in tsubst_copy, at cp/pt.c:16581

2021-02-23 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99201 --- Comment #5 from Matthias Kretz (Vir) --- I reduced it some more: template auto make_tester(const RefF& reffun) { return [=](auto in) { auto&& expected = [&](const auto&... vs) { if constexpr (sizeof(in) > 0)

[Bug c++/99201] [8/9/10/11 Regression] ICE in tsubst_copy, at cp/pt.c:16581

2021-02-22 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99201 --- Comment #4 from Matthias Kretz (Vir) --- Manual reduction which fails with 8-11 and compiles ok with 7: template void test_values_2arg(F&&... fun_pack) { (fun_pack(V(), V()), ...); } template auto make_tester(const TestF&

[Bug c++/99201] [8/9/10/11 Regression] ICE in tsubst_copy, at cp/pt.c:16581

2021-02-22 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99201 --- Comment #3 from Matthias Kretz (Vir) --- I'll try to find a better reduction.

[Bug c++/99201] New: ICE in tsubst_copy, at cp/pt.c:16581

2021-02-22 Thread kretz at kde dot org via Gcc-bugs
++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Testcase (reduced with C-Vise from valid code): template void test_values_2arg(int, int, F... fun_pack) { [] {}(fun_pack()...); } template auto make_tester(TestF, RefF) { return [](auto

[Bug target/98894] New test case experimental/simd/standard_abi_usable.cc in r11-6935 fails on power 7

2021-01-29 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98894 --- Comment #1 from Matthias Kretz (Vir) --- I already posted a fix on the gcc-patches and libstdc++ lists: libstdc++-v3/ChangeLog: * include/experimental/bits/simd.h: Remove unnecessary static assertion. Allow sizeof(8) integer

[Bug ipa/98834] [10/11 Regression] Code path incorrectly determined to be unreachable

2021-01-26 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98834 --- Comment #3 from Matthias Kretz (Vir) --- Created attachment 50055 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50055&action=edit unreduced test case This is the test case I gave to C-Vise. It's already reduced from a more confusing t

[Bug ipa/98834] [10/11 Regression] Code path incorrectly determined to be unreachable

2021-01-26 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98834 --- Comment #2 from Matthias Kretz (Vir) --- This is reduced from a larger (4MB) testcase which doesn't have any unused arguments.

[Bug tree-optimization/98834] New: Code path incorrectly determined to be unreachable

2021-01-26 Thread kretz at kde dot org via Gcc-bugs
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-pc-linux-gnu Created attachment 50054 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50054&action=edit test case The a

[Bug libstdc++/84949] -ffast-math bugged with respect to NaNs

2020-09-18 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949 --- Comment #8 from Matthias Kretz (Vir) --- I've been doing a lot of research into the numeric_limits intent/meaning recently. I also implemented and used alternative interpretations of "has NaN" and "is IEC559". My conclusion: std::numeric_limi

[Bug target/96600] __builtin_modfl returns incorrect value

2020-08-13 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96600 --- Comment #3 from Matthias Kretz (Vir) --- I should be more precise. Take this test case: int e = 69; int main() { __ibm128 a = -__builtin_ldexpl( 1.9446689187403240306919491832695730985733566864714824565497322973045558e+00l, e); __i

[Bug target/96600] __builtin_modfl returns incorrect value

2020-08-13 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96600 --- Comment #2 from Matthias Kretz (Vir) --- The runtime modf actually returns a large number. This is not about precision but about completely bogus values. You can adjust the testcase to: int e = 69;

[Bug target/96600] New: __builtin_modfl returns incorrect value

2020-08-13 Thread kretz at kde dot org
Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: powerpc64le-*-* Test case: int e = 69; int main

[Bug rtl-optimization/95493] [10 Regression] test for vector members apparently reordered with assignment to vector members since r10-7523-gb90061c6ec090c6b

2020-06-19 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95493 --- Comment #10 from Matthias Kretz (Vir) --- (In reply to Richard Biener from comment #7) > Fixed on trunk sofar. Is there anything I can help to get this backported to 10? I applied your patch on my GCC 10 checkout since you committed it to ma

[Bug target/95713] [10/11 Regression] ICE in emit_move_insn when converting int2 vector to short2 vector for -march=skylake-avx512 since r10-5031-g78307657cf9675bc

2020-06-18 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95713 --- Comment #6 from Matthias Kretz (Vir) --- Thank you! I applied the patch (with the necessary context) to the GCC 10 branch and was able to verify that it also fixes my unreduced test cases.

[Bug rtl-optimization/95713] New: [10/11 Regression] ICE in emit_move_insn when converting int2 vector to short2 vector for -march=skylake-avx512 -m32

2020-06-16 Thread kretz at kde dot org
: 10.1.0 Status: UNCONFIRMED Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target

[Bug tree-optimization/95493] New: [10 Regression] test for vector members apparently reordered with assignment to vector members

2020-06-03 Thread kretz at kde dot org
Keywords: wrong-code Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Test case (https://godbolt.org/z/egnkd7), compile with `-O2 -std=c

[Bug c/38470] value range propagation (VRP) would improve -Wsign-compare

2020-05-06 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38470 --- Comment #22 from Matthias Kretz (Vir) --- (In reply to Matthias Kretz (Vir) from comment #21) > However, -O2 would still show the warning. I meant -O0 of course.

[Bug c/38470] value range propagation (VRP) would improve -Wsign-compare

2020-05-06 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38470 Matthias Kretz (Vir) changed: What|Removed |Added CC||kretz at kde dot org --- Comment

[Bug target/94413] New: auto-vectorization of isfinite raises FP exception

2020-03-30 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case (`-O3`, cf. https://godbolt.org/z/jdfv3r): #include #include #include using f4 [[gnu

[Bug target/94343] [10 Regression] invalid AVX512VL vpternlogd instruction emitted for -march=knl

2020-03-26 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94343 --- Comment #9 from Matthias Kretz (Vir) --- (In reply to Jakub Jelinek from comment #8) > Created attachment 48128 [details] > gcc10-pr94343.patch The avx512vl-pr94343.c test should ideally fail because `_mm_andnot_si128 ((__m128i) (~v ^ a), (_

[Bug target/94343] New: [10 Regression] invalid AVX512VL vpternlogd instruction emitted for -march=knl

2020-03-26 Thread kretz at kde dot org
: missed-optimization, wrong-code Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: i386,x86-64 Test case (`-O1 -march=knl`, cf. https

[Bug tree-optimization/94300] New: [10 Regression] memcpy vector load miscompiled during const-prop

2020-03-24 Thread kretz at kde dot org
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case `-O1 -march=skylake-avx512`: int main

[Bug target/90993] simd integer division not optimized

2020-02-27 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90993 --- Comment #3 from Matthias Kretz (Vir) --- IIUC, AVX512 only allows overriding the rounding-mode from div instructions. So that wouldn't help. What standard requires that "integer division is not permitted to raise the "inexact" exception flag

[Bug middle-end/93919] [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

2020-02-27 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919 Matthias Kretz (Vir) changed: What|Removed |Added Resolution|FIXED |DUPLICATE --- Comment #6 from Mat

[Bug tree-optimization/93843] [10 Regression] wrong code at -O3 on x86_64-linux-gnu

2020-02-27 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93843 --- Comment #11 from Matthias Kretz (Vir) --- *** Bug 93919 has been marked as a duplicate of this bug. ***

[Bug middle-end/93919] [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

2020-02-27 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919 Matthias Kretz (Vir) changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug tree-optimization/93843] [10 Regression] wrong code at -O3 on x86_64-linux-gnu

2020-02-25 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93843 --- Comment #7 from Matthias Kretz (Vir) --- This one exhibits the issue without -ftree-vectorize (`-O1` suffices) (cf. https://godbolt.org/z/Swx-jW): using M [[gnu::vector_size(2)]] = char; using MM [[gnu::vector_size(4)]] = short; MM cvt(M x)

[Bug tree-optimization/93843] [10 Regression] wrong code at -O3 on x86_64-linux-gnu

2020-02-25 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93843 Matthias Kretz (Vir) changed: What|Removed |Added CC||kretz at kde dot org --- Comment

[Bug middle-end/93919] [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

2020-02-25 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919 --- Comment #4 from Matthias Kretz (Vir) --- Yes, this is the same issue. FWIW, a vectorization with SSE4.1 could do: pxor xmm0, xmm0 pinsrw xmm0, WORD PTR in[rip], 0 pmovsxbw xmm0, xmm0 movd DWORD PTR out[rip], xmm0 Whether that's fast

[Bug tree-optimization/93919] New: [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

2020-02-25 Thread kretz at kde dot org
: wrong-code Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case (https://godbolt.org/z/8QYarZ

[Bug target/93828] New: [10 Regression] incorrect shufps instruction emitted for -march=k8

2020-02-19 Thread kretz at kde dot org
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-* Test case (https://godbolt.org/z/ramAe3): using float2 [[gnu::vector_size(8

[Bug tree-optimization/93780] New: [10 Regression] ICE in SET_TYPE_VECTOR_SUBPARTS

2020-02-16 Thread kretz at kde dot org
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case (https://godbolt.org/z/ic8eXp): #include using V [[gnu::vector_size(32

[Bug target/45414] _mm_prefetch parameter is "char const *" in ICC

2020-02-14 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45414 Matthias Kretz (Vir) changed: What|Removed |Added Keywords||rejects-valid --- Comment #2 from

[Bug c++/93729] New: [concepts] binding bit-field to lvalue reference in requires expression should be SFINAE

2020-02-13 Thread kretz at kde dot org
Keywords: rejects-valid Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Test case (https://godbolt.org/z/VaSCCA): template concept foo = requires(T&am

[Bug c++/93698] New: ICE on concept using generic lambda

2020-02-12 Thread kretz at kde dot org
: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Test case (-std=c++2a): template concept foo = [](auto) constexpr -> bool { return true; }(N); bool a = foo<2>; Extended test case (use

[Bug c++/93549] New: [10 Regression] ICE / Segfault in constexpr expansion involving vector_size(16) short COND_EXPR

2020-02-03 Thread kretz at kde dot org
Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Test case (https://godbolt.org/z/_ErsXE): struct simd { using _Short8

[Bug c++/93530] New: [10 Regression] ICE on invalid alignas

2020-01-31 Thread kretz at kde dot org
Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Test case (no flags required): template struct a { using b = a; void c() alignas(b::d); }; This test case fell out of creduce while trying to

[Bug c++/89357] [8 regression][C++11] alignas for automatic variables with alignment greater than 16 fails

2020-01-30 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89357 --- Comment #10 from Matthias Kretz (Vir) --- (In reply to Jason Merrill from comment #9) > Fixed for GCC 9.3/10. The patch doesn't apply cleanly to the GCC 8 branch, > is it important to fix there? Not important for me. Thank you for resolvin

[Bug target/91838] [8/9 Regression] incorrect use of shr and shrx to shift by 64, missed optimization of vector shift

2020-01-23 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838 --- Comment #6 from Matthias Kretz (Vir) --- FWIW, I'd prefer gnu::vector_size(N) to not introduce any additional UB over the scalar arithmetic types. I.e. behave like if promotion would happen, just with final assignment back to T (truncation).
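The scalar behaviour referred to here can be sketched as follows. This shows only what the language does for a scalar `unsigned char`, not what GCC does for vector types; the helper names are hypothetical, and the shift count is assumed to be non-negative and smaller than the width of `int`.

```cpp
// Scalar reference behaviour: the left operand is promoted to int, the
// shift happens at int width, and the result is truncated on conversion
// back to unsigned char.
constexpr unsigned char shift_right(unsigned char x, int n) {
  return static_cast<unsigned char>(x >> n);  // x is promoted to int here
}

constexpr unsigned char shift_left(unsigned char x, int n) {
  return static_cast<unsigned char>(x << n);  // high bits are dropped
}

// Shifting an 8-bit value by 8 is well defined for scalars and yields 0.
static_assert(shift_right(0xFF, 8) == 0, "shift by the full narrow width");
static_assert(shift_left(0xFF, 4) == 0xF0, "truncation back to 8 bits");
```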

[Bug target/91838] [8/9 Regression] incorrect use of shr and shrx to shift by 64, missed optimization of vector shift

2020-01-23 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838 --- Comment #4 from Matthias Kretz (Vir) --- Good point. Since gnu::vector_size(N) types are defined by you, you might be able to say that for char and short this is also UB. After all the left operand isn't actually promoted to int. Consequently

[Bug target/93172] New: with AVX512 masked mov assigning zero can use {z}

2020-01-06 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Testcase (cf. https://godbolt.org/z/DMQf9-): #include // missed optimization: __m512 f

[Bug target/93133] New: __builtin_isgreater emits trapping compare instruction

2020-01-02 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: aarch64-*-* Compile the following test case for aarch64 with -O2: #include int f(float x, float y) { std

[Bug target/91861] New: invalid vectorization of isless, islessequal, etc.

2019-09-23 Thread kretz at kde dot org
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case (cf. https://godbolt.org/z/z3TH9F): #include using V [[gnu

[Bug target/91841] vector_size(8) passes MMX register without emms cleanup

2019-09-20 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91841 --- Comment #4 from Matthias Kretz --- (In reply to Uroš Bizjak from comment #3) > [f]emms should be emitted by an intrinsic (_mm_empty), inserted by the > programmer. The programmer can mix FP and MMX instructions in the same > function, so ther

[Bug target/91841] vector_size(8) passes MMX register without emms cleanup

2019-09-20 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91841 --- Comment #2 from Matthias Kretz --- Ah, because of: typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__)); ? To be pedantic, only `int [[gnu::vector_size(8)]]` equals __m64. But I see your point. I guess clang interprets th

[Bug target/91841] New: vector_size(8) passes MMX register without emms cleanup

2019-09-20 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: i?86-*-* Test case `g++ -O2 -m32` (cf. https://godbolt.org/z/RDUZo9): #include using T = unsigned short; using V [[gnu

[Bug target/91838] incorrect use of shr and shrx to shift by 64, missed optimization of vector shift

2019-09-20 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838 --- Comment #1 from Matthias Kretz --- https://godbolt.org/z/zxmCTz

[Bug target/91838] New: incorrect use of shr and shrx to shift by 64, missed optimization of vector shift

2019-09-20 Thread kretz at kde dot org
Keywords: missed-optimization, wrong-code Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-* Test case: using T = unsigned char

[Bug target/85482] unnecessary vmovaps/vmovapd/vmovdqa emitted

2019-09-11 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85482 --- Comment #3 from Matthias Kretz --- Seems like trunk (10.0.0 20190910) resolves the issue.

[Bug target/85538] kortest for 32 and 64 bit masks incorrectly uses k0

2019-09-11 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85538 Matthias Kretz changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/87767] Missing AVX512 memory broadcast for constant vector

2019-09-04 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767 --- Comment #5 from Matthias Kretz --- > So for #c3 you are essentially asking for a .rodata size optimization. Comment #1 also does so, no? But yes, this is a .rodata optimization and thus potentially a visible reduction on cache pressure. Cons

[Bug target/87767] Missing AVX512 memory broadcast for constant vector

2019-09-03 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767 Matthias Kretz changed: What|Removed |Added CC||kretz at kde dot org --- Comment #3

[Bug target/91533] New: abs pattern generates MMX instructions but fails to call EMMS

2019-08-23 Thread kretz at kde dot org
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case (cf. https://godbolt.org/z/IfL1mF): using V [[gnu

[Bug target/91142] New: Incorrect aligned vector load instruction emitted because of vinserti32x4 elision

2019-07-11 Thread kretz at kde dot org
Keywords: wrong-code Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Testcase (cf. https://godbolt.org/z/xBEtqT

[Bug target/90993] New: simd integer division not optimized

2019-06-25 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case (https://godbolt.org/z/CYipz7): template using V [[gnu::vector_size(16)]] = T; V f(V a, V b

[Bug target/88918] [meta-bug] x86 intrinsic issues

2019-05-22 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88918 Bug 88918 depends on bug 56253, which changed state. Bug 56253 Summary: fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253 What|Removed |A

[Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)

2019-05-22 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253 Matthias Kretz changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug c++/88752] [8 Regression] ICE in enclosing_instantiation_of, at cp/pt.c:13328

2019-05-17 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88752 Matthias Kretz changed: What|Removed |Added Known to work||7.4.0, 9.1.0 Known to fail|

[Bug target/58790] [missed optimization] reduction of masks of builtin vectors not transformed to ptest or movemask instructions

2019-05-16 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58790 Matthias Kretz changed: What|Removed |Added Version|4.9.0 |10.0 --- Comment #2 from Matthias Kretz

[Bug target/90483] input to ptest not optimized

2019-05-15 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90483 --- Comment #1 from Matthias Kretz --- https://godbolt.org/z/7BFMdG (for quick verification)

[Bug target/90487] New: optimize SSE & AVX char compares with subsequent movmskb [negation]

2019-05-15 Thread kretz at kde dot org
ssed-optimization Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Testcase (cf. https://godbolt.org/z/7NiU7O): #inc

[Bug target/90483] New: input to ptest not optimized

2019-05-15 Thread kretz at kde dot org
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* The (V)PTEST instruction of SSE4.1/AVX produces ZF = `(a & b) == 0` and CF = `(~a & b) == 0`. Generic usage
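The two flag definitions above can be modelled with plain integers. This is a sketch only: 64-bit scalars stand in for the 128/256-bit vector registers, and `PtestFlags`, `ptest_model`, `all_zero`, and `all_ones` are hypothetical names for illustration.

```cpp
#include <cstdint>

// Scalar model of the flags set by (V)PTEST, with 64-bit integers standing
// in for vector registers (an illustrative simplification).
struct PtestFlags {
  bool zf;  // ZF = (a & b) == 0
  bool cf;  // CF = (~a & b) == 0
};

constexpr PtestFlags ptest_model(std::uint64_t a, std::uint64_t b) {
  return {(a & b) == 0, (~a & b) == 0};
}

// Testing a value against itself: ZF is set exactly when it is all zeros.
constexpr bool all_zero(std::uint64_t k) { return ptest_model(k, k).zf; }

// Testing against all-ones: CF is set exactly when every bit of k is set.
constexpr bool all_ones(std::uint64_t k) {
  return ptest_model(k, ~std::uint64_t{0}).cf;
}
```

The two helpers show why one instruction can answer both "none of" and "all of" questions about a mask, depending on which flag is read.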

[Bug tree-optimization/90460] Inefficient vector construction from pieces

2019-05-14 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90460 --- Comment #1 from Matthias Kretz --- PR85048 and PR77399 are related

[Bug target/90424] memcpy into vector builtin not optimized

2019-05-13 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90424 --- Comment #2 from Matthias Kretz --- FWIW, I agree that "bit-inserting into a default-def" isn't a good idea. My code, in the meantime, looks more like this (https://godbolt.org/z/D-yfZJ): template using V [[gnu::vector_size(16)]] = T; templ

[Bug target/90424] New: memcpy into vector builtin not optimized

2019-05-10 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Testcase (cf. https://godbolt.org/z/LsKcii): template using V [[gnu::vector_size(16)]] = T; template

[Bug target/88152] optimize SSE & AVX char compares with subsequent movmskb

2019-05-09 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88152 Matthias Kretz changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug c++/90243] New: diagnostic notes that belong to a suppressed error about an uninitialized variable in a constexpr function are still shown

2019-04-25 Thread kretz at kde dot org
Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Test case (https://godbolt.org/z/34KB20): struct Z { int y

[Bug libstdc++/88066] [7 Regression] Relative includes in bits/locale_conv.h should be prefixed

2019-03-28 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88066 Matthias Kretz changed: What|Removed |Added CC||kretz at kde dot org --- Comment #9

[Bug c++/89357] alignas for automatic variables with alignment greater than 16 fails

2019-02-19 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89357 --- Comment #2 from Matthias Kretz --- I agree. The corresponding C test case produces equivalent f0 and f1: void g(int*); void f0() { __attribute__((aligned(128))) int x; g(&x); } void f1() { _Alignas(128) int x; g(&x); } And I agree
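A C++ counterpart of the two spellings, sketched as a runtime check rather than by inspecting the generated code. The helper names are hypothetical. Both functions request 128-byte alignment for an automatic variable and report whether the address actually honours it; on a compiler without the reported problem both would be expected to return true.

```cpp
#include <cstdint>

// Returns true if p is aligned to `alignment` bytes.
inline bool is_aligned(const void* p, std::uintptr_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// GNU attribute spelling of an over-aligned automatic variable.
inline bool attribute_spelling_is_aligned() {
  __attribute__((aligned(128))) int x = 0;
  return is_aligned(&x, 128);
}

// Standard alignas spelling of the same request.
inline bool alignas_spelling_is_aligned() {
  alignas(128) int x = 0;
  return is_aligned(&x, 128);
}
```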

[Bug target/89357] New: alignas for automatic variables with alignment greater than 16 fails

2019-02-14 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: aarch64-*-*, arm-*-* Test case (cf. https://godbolt.org/z/ubJge4): void g(int &); auto f0() { __attribu

[Bug target/89224] New: subscript of NEON intrinsic discards const

2019-02-06 Thread kretz at kde dot org
: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Test case (cf. https://godbolt.org/z/RFrftn): #include template void g(T &&x) { x = 1; } auto f(const __Int8x8_t &x) { g(x[0]); //x[0] = 1; // ill-formed }

[Bug target/89189] New: missed optimization for 16/8-bit vector shuffle

2019-02-04 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Testcase `-O2 -msse2`, further missed optimization with SSSE3 / SSE4.1 (cf. https://godbolt.org/z

[Bug target/24073] (vector float){a, b, 0, 0} code gen is not good

2019-01-17 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24073 Matthias Kretz changed: What|Removed |Added CC||kretz at kde dot org --- Comment #8

[Bug tree-optimization/88854] redundant store after load that would makes aliasing UB

2019-01-15 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88854 --- Comment #7 from Matthias Kretz --- (In reply to rguent...@suse.de from comment #5) > Yeah, we do not perform this kind of "flow-sensitive" TBAA. So > when trying to DSE *a = x; we only look at > > int x = *a; > *b = 1; > *a =

[Bug tree-optimization/88854] redundant store after load that would makes aliasing UB

2019-01-15 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88854 --- Comment #6 from Matthias Kretz --- Regarding gcc.dg/tree-ssa/ssa-pre-30.c I'd argue that for `bar`, GCC may assume b == 0, because otherwise f would be read both via int and float pointer, which is UB. So bar can be optimized to `foo` shows

[Bug tree-optimization/88854] redundant store after load that would makes aliasing UB

2019-01-15 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88854 --- Comment #4 from Matthias Kretz --- Another test case, which the patch doesn't optimize: short f(int *a, short *b) { short y = *b; // 1 int x = *a; // 2 *b = 1; *a = x; return y; } The loads in 1+2 are either UB or a an

[Bug tree-optimization/88854] New: redundant store after load that would makes aliasing UB

2019-01-15 Thread kretz at kde dot org
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org CC: rguenth at gcc dot gnu.org Target Milestone: --- Test cases: This is optimized at -O1 and with GCC 5 at -O2. -fdisable-tree-fre1 and -fno

[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2019-01-14 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #10 from Matthias Kretz --- Experience from testing my simd implementation: I had failures (2 ULP deviation from long double result) when using auto __xx = abs(__x); auto __yy = abs(__y); auto __zz = abs(__z

[Bug target/88808] New: bitwise operators on AVX512 masks fail to use the new mask instructions

2019-01-11 Thread kretz at kde dot org
-optimization Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case (https://godbolt.org/z/gyCN12): #include

[Bug target/80517] [missed optimization] constant propagation through Intel intrinsics

2019-01-11 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517 --- Comment #4 from Matthias Kretz --- A similar test case showing that something is still missing (https://gcc.godbolt.org/z/t1DT7E): #include inline __m128i cmp(__m128i x, __m128i y) { return _mm_cmpeq_epi16(x, y); } inline unsigned to_b

[Bug target/80517] [missed optimization] constant propagation through Intel intrinsics

2019-01-11 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517 Matthias Kretz changed: What|Removed |Added Version|8.0 |9.0 --- Comment #3 from Matthias Kretz

[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2019-01-10 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #9 from Matthias Kretz --- (In reply to emsr from comment #7) > What does this do? > > auto __hi_exp = > __hi & simd<_T, _Abi>(std::numeric_limits<_T>::infinity()); // no error component-wise bitwise and of __hi and +inf. Or i

[Bug target/88794] New: fixupimm intrinsics are unusable [9.0 regression]

2019-01-10 Thread kretz at kde dot org
: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case: ``` #include __m128 f(__m128 x, __m128 &y) { y = _mm_fixupimm_ps(x, _mm_set1_epi32(0x), 0x00); retu

[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2019-01-10 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #6 from Matthias Kretz --- (In reply to Marc Glisse from comment #4) > Your "reference" number seems strange. Why not do the computation with > double (or long double or mpfr) or use __builtin_hypotf? Note that it > changes the value.

[Bug rtl-optimization/88785] New: ICE in as_a, at machmode.h:353

2019-01-09 Thread kretz at kde dot org
Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Created attachment 45398 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45398&action=edit reduced test case Compile the attached test case with `-g -O2 -std=gnu++17 -march=

[Bug c++/88752] ICE in enclosing_instantiation_of, at cp/pt.c:13328

2019-01-09 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88752 Matthias Kretz changed: What|Removed |Added Attachment #45376|0 |1 is obsolete|

[Bug c++/88752] ICE in enclosing_instantiation_of, at cp/pt.c:13328

2019-01-08 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88752 Matthias Kretz changed: What|Removed |Added Attachment #45375|0 |1 is obsolete|

[Bug c++/88752] New: ICE in enclosing_instantiation_of, at cp/pt.c:13328

2019-01-08 Thread kretz at kde dot org
: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Created attachment 45375 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45375&action=edit not-reduced test case Compile attached test case with `-std=gnu++17

[Bug c++/85052] Implement support for clang's __builtin_convertvector

2019-01-07 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052 --- Comment #12 from Matthias Kretz --- (In reply to Jakub Jelinek from comment #11) > [...] though for 8x conversions we > are e.g. on x86 already outside of the realm of natively supported vectors > (we don't really want MMX and for 1024 bit an

[Bug c++/85052] Implement support for clang's __builtin_convertvector

2019-01-05 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052 --- Comment #9 from Matthias Kretz --- (In reply to Devin Hussey from comment #7) > Wait, silly me, this isn't about optimizations, this is about patterns. Regarding optimizations, PR85048 is a first step (it lists all x86 single-instruction SIM
