[Bug target/58790] [missed optimization] reduction of masks of builtin vectors not transformed to ptest or movemask instructions

2021-06-08 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58790 --- Comment #4 from Matthias Kretz (Vir) --- I'm still not familiar with this part of GCC, but isn't `_2 == { -1, -1, -1, -1 }` equivalent to _1, i.e. it reverses VEC_COND_EXPR? However, if the `==` is supposed to return a scalar boolean instead
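
The equivalence asked about in comment #4 can be checked with GCC's generic vector extension. This is only a sketch: it assumes the operand is already a mask whose lanes are all-zeros or all-ones, and the names `v4si`, `compare_to_all_ones` and `same_lanes` are hypothetical helpers, not anything from GCC's internals.

```cpp
// Four 32-bit lanes, declared with the GNU vector extension.
typedef int v4si __attribute__((vector_size(16)));

// Lane-wise comparison against all-ones; each result lane is -1 (true) or 0 (false).
inline v4si compare_to_all_ones(v4si mask) {
    const v4si all_ones = {-1, -1, -1, -1};
    return mask == all_ones;
}

// Hypothetical helper: true if every lane of a equals the matching lane of b.
inline bool same_lanes(v4si a, v4si b) {
    for (int i = 0; i < 4; ++i) {
        if (a[i] != b[i]) return false;
    }
    return true;
}
```

For an input that is already a mask, the comparison returns its input unchanged, so the vector-valued `==` is redundant in that case. A scalar-boolean reading of `==` would be a different operation.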

[Bug c++/100716] member function template parameter should never be printed in candidate list and "T = T" should never be shown in substitutions

2021-05-27 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100716 --- Comment #3 from Matthias Kretz (Vir) --- Created attachment 50877 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50877&action=edit proposed patch Ensure dump_template_decl for function templates never prints template parameters after the

[Bug c++/100763] Diagnostics of type alias is missing scope

2021-05-27 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100763 --- Comment #1 from Matthias Kretz (Vir) --- Created attachment 50876 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50876&action=edit proposed patch dump_type on 'const std::string' should not print 'const string' unless TFF_UNQUALIFIED_NAME

[Bug c++/100763] New: Diagnostics of type alias is missing scope

2021-05-26 Thread kretz at kde dot org via Gcc-bugs
Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- namespace A { struct B {}; using C = B; } void f(A::B&); void f(A::C&); void g(const A::B& b, const A::C& c) { f(b);

[Bug c++/100716] member function template parameter not printed in candidate list

2021-05-25 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100716 --- Comment #2 from Matthias Kretz (Vir) --- I'd like to revise my opinion above. dump_template_decl should never print the template parameter list of functions. I.e. it should be 'template f()' not 'template f()'. Because it's also declared

[Bug c++/100716] member function template parameter not printed in candidate list

2021-05-21 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100716 --- Comment #1 from Matthias Kretz (Vir) --- With -fno-pretty-templates both test cases do print the template_parms. That's because in dump_function_decl, without flag_pretty_templates, t isn't generalized and thus is not considered a primary

[Bug c++/100716] New: member function template parameter not printed in candidate list

2021-05-21 Thread kretz at kde dot org via Gcc-bugs
Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org CC: paolo.carlini at oracle dot com Target Milestone: --- template struct A { template void f

[Bug tree-optimization/99728] code pessimization when using wrapper classes around SIMD types

2021-03-24 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728 --- Comment #10 from Matthias Kretz (Vir) --- Is this the same issue: struct A { double v; }; struct B { double v; B& operator=(const B& rhs) { v = rhs.v; return *this; } }; // 10 loads & stores void f(A& a, const A& b) {

[Bug c++/99728] code pessimization when using wrapper classes around SIMD types

2021-03-23 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728 --- Comment #6 from Matthias Kretz (Vir) --- > I guess I need it for unaligned loads/stores, correct? Otherwise __v4df > should work everywhere. 1. You can freely reinterpret_cast by value between all the different [[gnu::vector_size(N)]]

[Bug c++/99728] code pessimization when using wrapper classes around SIMD types

2021-03-23 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728 --- Comment #4 from Matthias Kretz (Vir) --- FWIW, using std::experimental::native_simd also does not hoist the stores out of the loop. However, if you pass d by value and return d, the issue goes away. So I guess this is an aliasing

[Bug c++/99201] [8/9/10/11 Regression] ICE in tsubst_copy, at cp/pt.c:16581

2021-02-23 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99201 --- Comment #5 from Matthias Kretz (Vir) --- I reduced it some more: template auto make_tester(const RefF& reffun) { return [=](auto in) { auto&& expected = [&](const auto&... vs) { if constexpr (sizeof(in) > 0)

[Bug c++/99201] [8/9/10/11 Regression] ICE in tsubst_copy, at cp/pt.c:16581

2021-02-22 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99201 --- Comment #4 from Matthias Kretz (Vir) --- Manual reduction which fails with 8-11 and compiles ok with 7: template void test_values_2arg(F&&... fun_pack) { (fun_pack(V(), V()), ...); } template auto make_tester(const TestF&

[Bug c++/99201] [8/9/10/11 Regression] ICE in tsubst_copy, at cp/pt.c:16581

2021-02-22 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99201 --- Comment #3 from Matthias Kretz (Vir) --- I'll try to find a better reduction.

[Bug c++/99201] New: ICE in tsubst_copy, at cp/pt.c:16581

2021-02-22 Thread kretz at kde dot org via Gcc-bugs
++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Testcase (reduced with C-Vise from valid code): template void test_values_2arg(int, int, F... fun_pack) { [] {}(fun_pack()...); } template auto make_tester(TestF, RefF) { return [](auto

[Bug target/98894] New test case experimental/simd/standard_abi_usable.cc in r11-6935 fails on power 7

2021-01-29 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98894 --- Comment #1 from Matthias Kretz (Vir) --- I already posted a fix on the gcc-patches and libstdc++ lists: libstdc++-v3/ChangeLog: * include/experimental/bits/simd.h: Remove unnecessary static assertion. Allow sizeof(8) integer

[Bug ipa/98834] [10/11 Regression] Code path incorrectly determined to be unreachable

2021-01-26 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98834 --- Comment #3 from Matthias Kretz (Vir) --- Created attachment 50055 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50055&action=edit unreduced test case This is the test case I gave to C-Vise. It's already reduced from a more confusing test,

[Bug ipa/98834] [10/11 Regression] Code path incorrectly determined to be unreachable

2021-01-26 Thread kretz at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98834 --- Comment #2 from Matthias Kretz (Vir) --- This is reduced from a larger (4MB) testcase which doesn't have any unused arguments.

[Bug tree-optimization/98834] New: Code path incorrectly determined to be unreachable

2021-01-26 Thread kretz at kde dot org via Gcc-bugs
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-pc-linux-gnu Created attachment 50054 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50054&action=edit test case The attached testc

[Bug libstdc++/84949] -ffast-math bugged with respect to NaNs

2020-09-18 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949 --- Comment #8 from Matthias Kretz (Vir) --- I've been doing a lot of research into the numeric_limits intent/meaning recently. I also implemented and used alternative interpretations of "has NaN" and "is IEC559". My conclusion:

[Bug target/96600] __builtin_modfl returns incorrect value

2020-08-13 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96600 --- Comment #3 from Matthias Kretz (Vir) --- I should be more precise. Take this test case: int e = 69; int main() { __ibm128 a = -__builtin_ldexpl( 1.9446689187403240306919491832695730985733566864714824565497322973045558e+00l, e);

[Bug target/96600] __builtin_modfl returns incorrect value

2020-08-13 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96600 --- Comment #2 from Matthias Kretz (Vir) --- The runtime modf actually returns a large number. This is not about precision but about completely bogus values. You can adjust the testcase to: int e = 69;

[Bug target/96600] New: __builtin_modfl returns incorrect value

2020-08-13 Thread kretz at kde dot org
Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: powerpc64le-*-* Test case: int e = 69; int main

[Bug rtl-optimization/95493] [10 Regression] test for vector members apparently reordered with assignment to vector members since r10-7523-gb90061c6ec090c6b

2020-06-19 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95493 --- Comment #10 from Matthias Kretz (Vir) --- (In reply to Richard Biener from comment #7) > Fixed on trunk sofar. Is there anything I can help to get this backported to 10? I applied your patch on my GCC 10 checkout since you committed it to

[Bug target/95713] [10/11 Regression] ICE in emit_move_insn when converting int2 vector to short2 vector for -march=skylake-avx512 since r10-5031-g78307657cf9675bc

2020-06-18 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95713 --- Comment #6 from Matthias Kretz (Vir) --- Thank you! I applied the patch (with the necessary context) to the GCC 10 branch and was able to verify that it also fixes my unreduced test cases.

[Bug rtl-optimization/95713] New: [10/11 Regression] ICE in emit_move_insn when converting int2 vector to short2 vector for -march=skylake-avx512 -m32

2020-06-16 Thread kretz at kde dot org
: 10.1.0 Status: UNCONFIRMED Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target

[Bug tree-optimization/95493] New: [10 Regression] test for vector members apparently reordered with assignment to vector members

2020-06-03 Thread kretz at kde dot org
Keywords: wrong-code Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Test case (https://godbolt.org/z/egnkd7), compile with `-O2 -std=c

[Bug c/38470] value range propagation (VRP) would improve -Wsign-compare

2020-05-06 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38470 --- Comment #22 from Matthias Kretz (Vir) --- (In reply to Matthias Kretz (Vir) from comment #21) > However, -O2 would still show the warning. I meant -O0 of course.

[Bug c/38470] value range propagation (VRP) would improve -Wsign-compare

2020-05-06 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38470 Matthias Kretz (Vir) changed: What|Removed |Added CC||kretz at kde dot org --- Comment

[Bug target/94413] New: auto-vectorization of isfinite raises FP exception

2020-03-30 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case (`-O3`, cf. https://godbolt.org/z/jdfv3r): #include #include #include using f4 [[gnu

[Bug target/94343] [10 Regression] invalid AVX512VL vpternlogd instruction emitted for -march=knl

2020-03-26 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94343 --- Comment #9 from Matthias Kretz (Vir) --- (In reply to Jakub Jelinek from comment #8) > Created attachment 48128 [details] > gcc10-pr94343.patch The avx512vl-pr94343.c test should ideally fail because `_mm_andnot_si128 ((__m128i) (~v ^ a),

[Bug target/94343] New: [10 Regression] invalid AVX512VL vpternlogd instruction emitted for -march=knl

2020-03-26 Thread kretz at kde dot org
: missed-optimization, wrong-code Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: i386,x86-64 Test case (`-O1 -march=knl`, cf. https

[Bug tree-optimization/94300] New: [10 Regression] memcpy vector load miscompiled during const-prop

2020-03-24 Thread kretz at kde dot org
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case `-O1 -march=skylake-avx512`: int main

[Bug target/90993] simd integer division not optimized

2020-02-27 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90993 --- Comment #3 from Matthias Kretz (Vir) --- IIUC, AVX512 only allows overriding the rounding-mode from div instructions. So that wouldn't help. What standard requires that "integer division is not permitted to raise the "inexact" exception

[Bug middle-end/93919] [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

2020-02-27 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919 Matthias Kretz (Vir) changed: What|Removed |Added Resolution|FIXED |DUPLICATE --- Comment #6 from

[Bug tree-optimization/93843] [10 Regression] wrong code at -O3 on x86_64-linux-gnu

2020-02-27 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93843 --- Comment #11 from Matthias Kretz (Vir) --- *** Bug 93919 has been marked as a duplicate of this bug. ***

[Bug middle-end/93919] [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

2020-02-27 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919 Matthias Kretz (Vir) changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug tree-optimization/93843] [10 Regression] wrong code at -O3 on x86_64-linux-gnu

2020-02-25 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93843 --- Comment #7 from Matthias Kretz (Vir) --- This one exhibits the issue without -ftree-vectorize (`-O1` suffices) (cf. https://godbolt.org/z/Swx-jW): using M [[gnu::vector_size(2)]] = char; using MM [[gnu::vector_size(4)]] = short; MM cvt(M

[Bug tree-optimization/93843] [10 Regression] wrong code at -O3 on x86_64-linux-gnu

2020-02-25 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93843 Matthias Kretz (Vir) changed: What|Removed |Added CC||kretz at kde dot org --- Comment

[Bug middle-end/93919] [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

2020-02-25 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919 --- Comment #4 from Matthias Kretz (Vir) --- Yes, this is the same issue. FWIW, a vectorization with SSE4.1 could do: pxor xmm0, xmm0 pinsrw xmm0, WORD PTR in[rip], 0 pmovsxbw xmm0, xmm0 movd DWORD PTR out[rip], xmm0 Whether that's

[Bug tree-optimization/93919] New: [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

2020-02-25 Thread kretz at kde dot org
: wrong-code Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case (https://godbolt.org/z/8QYarZ

[Bug target/93828] New: [10 Regression] incorrect shufps instruction emitted for -march=k8

2020-02-19 Thread kretz at kde dot org
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-* Test case (https://godbolt.org/z/ramAe3): using float2 [[gnu::vector_size(8

[Bug tree-optimization/93780] New: [10 Regression] ICE in SET_TYPE_VECTOR_SUBPARTS

2020-02-16 Thread kretz at kde dot org
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case (https://godbolt.org/z/ic8eXp): #include using V [[gnu::vector_size(32

[Bug target/45414] _mm_prefetch parameter is "char const *" in ICC

2020-02-14 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45414 Matthias Kretz (Vir) changed: What|Removed |Added Keywords||rejects-valid --- Comment #2

[Bug c++/93729] New: [concepts] binding bit-field to lvalue reference in requires expression should be SFINAE

2020-02-13 Thread kretz at kde dot org
Keywords: rejects-valid Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Test case (https://godbolt.org/z/VaSCCA): template concept foo = requires(T&

[Bug c++/93698] New: ICE on concept using generic lambda

2020-02-12 Thread kretz at kde dot org
: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Test case (-std=c++2a): template concept foo = [](auto) constexpr -> bool { return true; }(N); bool a = foo<2>; Extended test case (u

[Bug c++/93549] New: [10 Regression] ICE / Segfault in constexpr expansion involving vector_size(16) short COND_EXPR

2020-02-03 Thread kretz at kde dot org
Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Test case (https://godbolt.org/z/_ErsXE): struct simd { using _Short8

[Bug c++/93530] New: [10 Regression] ICE on invalid alignas

2020-01-31 Thread kretz at kde dot org
Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Test case (no flags required): template struct a { using b = a; void c() alignas(b::d); }; This test case fell out of creduce while trying

[Bug c++/89357] [8 regression][C++11] alignas for automatic variables with alignment greater than 16 fails

2020-01-30 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89357 --- Comment #10 from Matthias Kretz (Vir) --- (In reply to Jason Merrill from comment #9) > Fixed for GCC 9.3/10. The patch doesn't apply cleanly to the GCC 8 branch, > is it important to fix there? Not important for me. Thank you for

[Bug target/91838] [8/9 Regression] incorrect use of shr and shrx to shift by 64, missed optimization of vector shift

2020-01-23 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838 --- Comment #6 from Matthias Kretz (Vir) --- FWIW, I'd prefer gnu::vector_size(N) to not introduce any additional UB over the scalar arithmetic types, i.e. to behave as if promotion happened, just with a final assignment back to T (truncation).
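
For reference, the scalar behaviour that comment #6 refers to can be sketched as below. This only illustrates what standard C++ does for a single `unsigned char` value, not what GCC implements for vector lanes; `shl_promoted` and `shr_promoted` are hypothetical names.

```cpp
#include <cstdint>

// The left operand is promoted to int before shifting, so a shift count of 8
// is well defined for an 8-bit type. The result is truncated back to 8 bits.
// Sketch only: n is assumed to lie in [0, 16), well below the width of int.
inline std::uint8_t shl_promoted(std::uint8_t x, int n) {
    return static_cast<std::uint8_t>(x << n);
}

inline std::uint8_t shr_promoted(std::uint8_t x, int n) {
    return static_cast<std::uint8_t>(x >> n);
}
```

With these definitions, shifting `0xFF` by 8 in either direction yields 0 rather than undefined behaviour, which is the per-lane result the comment would like vector shifts to match.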

[Bug target/91838] [8/9 Regression] incorrect use of shr and shrx to shift by 64, missed optimization of vector shift

2020-01-23 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838 --- Comment #4 from Matthias Kretz (Vir) --- Good point. Since gnu::vector_size(N) types are defined by you, you might be able to say that for char and short this is also UB. After all the left operand isn't actually promoted to int.

[Bug target/93172] New: with AVX512 masked mov assigning zero can use {z}

2020-01-06 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Testcase (cf. https://godbolt.org/z/DMQf9-): #include // missed optimization: __m512 f

[Bug target/93133] New: __builtin_isgreater emits trapping compare instruction

2020-01-02 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: aarch64-*-* Compile the following test case for aarch64 with -O2: #include int f(float x, float y) { std

[Bug target/91861] New: invalid vectorization of isless, islessequal, etc.

2019-09-23 Thread kretz at kde dot org
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case (cf. https://godbolt.org/z/z3TH9F): #include using V [[gnu

[Bug target/91841] vector_size(8) passes MMX register without emms cleanup

2019-09-20 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91841 --- Comment #4 from Matthias Kretz --- (In reply to Uroš Bizjak from comment #3) > [f]emms should be emitted by an intrinsic (_mm_empty), inserted by the > programmer. The programmer can mix FP and MMX instructions in the same > function, so

[Bug target/91841] vector_size(8) passes MMX register without emms cleanup

2019-09-20 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91841 --- Comment #2 from Matthias Kretz --- Ah, because of: typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__)); ? To be pedantic, only `int [[gnu::vector_size(8)]]` equals __m64. But I see your point. I guess clang interprets

[Bug target/91841] New: vector_size(8) passes MMX register without emms cleanup

2019-09-20 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: i?86-*-* Test case `g++ -O2 -m32` (cf. https://godbolt.org/z/RDUZo9): #include using T = unsigned short; using V [[gnu

[Bug target/91838] incorrect use of shr and shrx to shift by 64, missed optimization of vector shift

2019-09-20 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838 --- Comment #1 from Matthias Kretz --- https://godbolt.org/z/zxmCTz

[Bug target/91838] New: incorrect use of shr and shrx to shift by 64, missed optimization of vector shift

2019-09-20 Thread kretz at kde dot org
Keywords: missed-optimization, wrong-code Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-* Test case: using T = unsigned char

[Bug target/85482] unnecessary vmovaps/vmovapd/vmovdqa emitted

2019-09-11 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85482 --- Comment #3 from Matthias Kretz --- Seems like trunk (10.0.0 20190910) resolves the issue.

[Bug target/85538] kortest for 32 and 64 bit masks incorrectly uses k0

2019-09-11 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85538 Matthias Kretz changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/87767] Missing AVX512 memory broadcast for constant vector

2019-09-04 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767 --- Comment #5 from Matthias Kretz --- > So for #c3 you are essentially asking for a .rodata size optimization. Comment #1 also does so, no? But yes, this is a .rodata optimization and thus potentially a visible reduction on cache pressure.

[Bug target/87767] Missing AVX512 memory broadcast for constant vector

2019-09-03 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767 Matthias Kretz changed: What|Removed |Added CC||kretz at kde dot org --- Comment #3

[Bug target/91533] New: abs pattern generates MMX instructions but fails to call EMMS

2019-08-23 Thread kretz at kde dot org
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case (cf. https://godbolt.org/z/IfL1mF): using V [[gnu

[Bug target/91142] New: Incorrect aligned vector load instruction emitted because of vinserti32x4 elision

2019-07-11 Thread kretz at kde dot org
Keywords: wrong-code Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Testcase (cf. https://godbolt.org/z/xBEtqT

[Bug target/90993] New: simd integer division not optimized

2019-06-25 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case (https://godbolt.org/z/CYipz7): template using V [[gnu::vector_size(16)]] = T; V f(V a, V b

[Bug target/88918] [meta-bug] x86 intrinsic issues

2019-05-22 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88918 Bug 88918 depends on bug 56253, which changed state. Bug 56253 Summary: fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253 What|Removed

[Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)

2019-05-22 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253 Matthias Kretz changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug c++/88752] [8 Regression] ICE in enclosing_instantiation_of, at cp/pt.c:13328

2019-05-17 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88752 Matthias Kretz changed: What|Removed |Added Known to work||7.4.0, 9.1.0 Known to fail|

[Bug target/58790] [missed optimization] reduction of masks of builtin vectors not transformed to ptest or movemask instructions

2019-05-16 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58790 Matthias Kretz changed: What|Removed |Added Version|4.9.0 |10.0 --- Comment #2 from Matthias

[Bug target/90483] input to ptest not optimized

2019-05-15 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90483 --- Comment #1 from Matthias Kretz --- https://godbolt.org/z/7BFMdG (for quick verification)

[Bug target/90487] New: optimize SSE & AVX char compares with subsequent movmskb [negation]

2019-05-15 Thread kretz at kde dot org
ssed-optimization Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Testcase (cf. https://godbolt.org/z/7NiU7O): #inc

[Bug target/90483] New: input to ptest not optimized

2019-05-15 Thread kretz at kde dot org
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* The (V)PTEST instruction of SSE4.1/AVX produces ZF = `(a & b) == 0` and CF = `(~a & b) == 0`. Generic usage
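
The flag semantics stated in the report can be modelled in portable C++ without intrinsics. This is a sketch of the two formulas only; `Bits128`, `PtestFlags` and `ptest_model` are hypothetical names, and the 128-bit operands are represented as two 64-bit halves.

```cpp
#include <array>
#include <cstdint>

// A 128-bit value stored as two 64-bit halves (illustrative representation).
using Bits128 = std::array<std::uint64_t, 2>;

struct PtestFlags {
    bool zf;  // set when (a & b) == 0
    bool cf;  // set when (~a & b) == 0
};

// Scalar model of the ZF/CF results described for (V)PTEST.
inline PtestFlags ptest_model(const Bits128& a, const Bits128& b) {
    PtestFlags flags;
    flags.zf = ((a[0] & b[0]) | (a[1] & b[1])) == 0;
    flags.cf = ((~a[0] & b[0]) | (~a[1] & b[1])) == 0;
    return flags;
}
```

In this model ZF reports that `a` and `b` share no set bits, while CF reports that every bit set in `b` is also set in `a`.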

[Bug tree-optimization/90460] Inefficient vector construction from pieces

2019-05-14 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90460 --- Comment #1 from Matthias Kretz --- PR85048 and PR77399 are related

[Bug target/90424] memcpy into vector builtin not optimized

2019-05-13 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90424 --- Comment #2 from Matthias Kretz --- FWIW, I agree that "bit-inserting into a default-def" isn't a good idea. My code, in the meantime, looks more like this (https://godbolt.org/z/D-yfZJ): template using V [[gnu::vector_size(16)]] = T;

[Bug target/90424] New: memcpy into vector builtin not optimized

2019-05-10 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Testcase (cf. https://godbolt.org/z/LsKcii): template using V [[gnu::vector_size(16)]] = T; template

[Bug target/88152] optimize SSE & AVX char compares with subsequent movmskb

2019-05-09 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88152 Matthias Kretz changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug c++/90243] New: diagnostic notes that belong to a suppressed error about an uninitialized variable in a constexpr function are still shown

2019-04-25 Thread kretz at kde dot org
Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Test case (https://godbolt.org/z/34KB20): struct Z { int y

[Bug libstdc++/88066] [7 Regression] Relative includes in bits/locale_conv.h should be prefixed

2019-03-28 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88066 Matthias Kretz changed: What|Removed |Added CC||kretz at kde dot org --- Comment #9

[Bug c++/89357] alignas for automatic variables with alignment greater than 16 fails

2019-02-19 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89357 --- Comment #2 from Matthias Kretz --- I agree. The corresponding C test case produces equivalent f0 and f1: void g(int*); void f0() { __attribute__((aligned(128))) int x; g(&x); } void f1() { _Alignas(128) int x; g(&x); } And I agree

[Bug target/89357] New: alignas for automatic variables with alignment greater than 16 fails

2019-02-14 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: aarch64-*-*, arm-*-* Test case (cf. https://godbolt.org/z/ubJge4): void g(int &); auto f0() { __attribu

[Bug target/89224] New: subscript of NEON intrinsic discards const

2019-02-06 Thread kretz at kde dot org
: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Test case (cf. https://godbolt.org/z/RFrftn): #include template void g(T &) { x = 1; } auto f(const __Int8x8_t ) { g(x[0]); //x[0] = 1; // ill-formed } decltype

[Bug target/89189] New: missed optimization for 16/8-bit vector shuffle

2019-02-04 Thread kretz at kde dot org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Testcase `-O2 -msse2`, further missed optimization with SSSE3 / SSE4.1 (cf. https://godbolt.org/z

[Bug target/24073] (vector float){a, b, 0, 0} code gen is not good

2019-01-17 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24073 Matthias Kretz changed: What|Removed |Added CC||kretz at kde dot org --- Comment #8

[Bug tree-optimization/88854] redundant store after load that would makes aliasing UB

2019-01-15 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88854 --- Comment #7 from Matthias Kretz --- (In reply to rguent...@suse.de from comment #5) > Yeah, we do not perform this kind of "flow-sensitive" TBAA. So > when trying to DSE *a = x; we only look at > > int x = *a; > *b = 1; > *a

[Bug tree-optimization/88854] redundant store after load that would makes aliasing UB

2019-01-15 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88854 --- Comment #6 from Matthias Kretz --- Regarding gcc.dg/tree-ssa/ssa-pre-30.c I'd argue that for `bar`, GCC may assume b == 0, because otherwise f would be read both via int and float pointer, which is UB. So bar can be optimized to `foo`

[Bug tree-optimization/88854] redundant store after load that would makes aliasing UB

2019-01-15 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88854 --- Comment #4 from Matthias Kretz --- Another test case, which the patch doesn't optimize: short f(int *a, short *b) { short y = *b; // 1 int x = *a; // 2 *b = 1; *a = x; return y; } The loads in 1+2 are either UB or a

[Bug tree-optimization/88854] New: redundant store after load that would makes aliasing UB

2019-01-15 Thread kretz at kde dot org
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org CC: rguenth at gcc dot gnu.org Target Milestone: --- Test cases: This is optimized at -O1 and with GCC 5 at -O2. -fdisable-tree-fre1 and -fno

[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2019-01-14 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #10 from Matthias Kretz --- Experience from testing my simd implementation: I had failures (2 ULP deviation from long double result) when using auto __xx = abs(__x); auto __yy = abs(__y); auto __zz =

[Bug target/88808] New: bitwise operators on AVX512 masks fail to use the new mask instructions

2019-01-11 Thread kretz at kde dot org
-optimization Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case (https://godbolt.org/z/gyCN12): #include

[Bug target/80517] [missed optimization] constant propagation through Intel intrinsics

2019-01-11 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517 --- Comment #4 from Matthias Kretz --- A similar test case showing that something is still missing (https://gcc.godbolt.org/z/t1DT7E): #include inline __m128i cmp(__m128i x, __m128i y) { return _mm_cmpeq_epi16(x, y); } inline unsigned

[Bug target/80517] [missed optimization] constant propagation through Intel intrinsics

2019-01-11 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517 Matthias Kretz changed:
What    |Removed |Added
Version |8.0     |9.0
--- Comment #3 from Matthias Kretz

[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2019-01-10 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #9 from Matthias Kretz --- (In reply to emsr from comment #7) > What does this do? > > auto __hi_exp = > __hi & simd<_T, _Abi>(std::numeric_limits<_T>::infinity()); // no error component-wise bitwise and of __hi and +inf. Or

[Bug target/88794] New: fixupimm intrinsics are unusable [9.0 regression]

2019-01-10 Thread kretz at kde dot org
: target Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Target: x86_64-*-*, i?86-*-* Test case: ``` #include __m128 f(__m128 x, __m128 ) { y = _mm_fixupimm_ps(x, _mm_set1_epi32(0x), 0x00); return x

[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2019-01-10 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #6 from Matthias Kretz --- (In reply to Marc Glisse from comment #4) > Your "reference" number seems strange. Why not do the computation with > double (or long double or mpfr) or use __builtin_hypotf? Note that it > changes the

[Bug rtl-optimization/88785] New: ICE in as_a, at machmode.h:353

2019-01-09 Thread kretz at kde dot org
Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Created attachment 45398 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45398&action=edit reduced test case Compile the attached test case with `-g -O2 -std=gnu++17 -march=skylake-avx

[Bug c++/88752] ICE in enclosing_instantiation_of, at cp/pt.c:13328

2019-01-09 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88752 Matthias Kretz changed:
What                          |Removed |Added
Attachment #45376 is obsolete |0       |1

[Bug c++/88752] ICE in enclosing_instantiation_of, at cp/pt.c:13328

2019-01-08 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88752 Matthias Kretz changed:
What                          |Removed |Added
Attachment #45375 is obsolete |0       |1

[Bug c++/88752] New: ICE in enclosing_instantiation_of, at cp/pt.c:13328

2019-01-08 Thread kretz at kde dot org
: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kretz at kde dot org Target Milestone: --- Created attachment 45375 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45375&action=edit not-reduced test case Compile attached test case with `-std=gnu++17 -march=skylake -m

[Bug c++/85052] Implement support for clang's __builtin_convertvector

2019-01-07 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052 --- Comment #12 from Matthias Kretz --- (In reply to Jakub Jelinek from comment #11) > [...] though for 8x conversions we > are e.g. on x86 already outside of the realm of natively supported vectors > (we don't really want MMX and for 1024 bit

[Bug c++/85052] Implement support for clang's __builtin_convertvector

2019-01-05 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052 --- Comment #9 from Matthias Kretz --- (In reply to Devin Hussey from comment #7) > Wait, silly me, this isn't about optimizations, this is about patterns. Regarding optimizations, PR85048 is a first step (it lists all x86 single-instruction
