[Bug target/80517] [missed optimization] constant propagation through Intel intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24928

--- Comment #8 from Andrew Pinski ---
Status of this bug:
comment #0 first testcase: fixed since GCC 9
comment #0 second testcase: still needs improvement
comment #3 is now basically PR 24928
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517

--- Comment #7 from Uroš Bizjak ---
See also PR55894.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517

--- Comment #6 from Marc Glisse ---
(In reply to Matthias Kretz from comment #4)
> A similar test case showing that something is still missing

You don't seem to be passing constants here, so this is unrelated to this PR.
If you file a new one, please annotate your example explaining where you
expect what to simplify to what and why.

> (https://gcc.godbolt.org/z/t1DT7E):

Adding -fdump-tree-optimized=- -g0 and showing the compiler output makes this
more understandable for me...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517

--- Comment #5 from Marc Glisse ---
(In reply to Matthias Kretz from comment #3)
> GCC 9 almost resolves this. However, for some reason this extended test case
> is not fully optimized: https://gcc.godbolt.org/z/jRrHth
> i.e. the call to dont_call_me() should be eliminated as dead code

We are left with:

_GLOBAL__sub_I__Z1fv ()
{
  [local count: 1073741824]:
  d = 125;
  return;
}

f ()
{
  unsigned int d.1_1;

  [local count: 1073741824]:
  d.1_1 = d;
  if (d.1_1 == 125)
[...]

This is a classic: if the initialization of global variables is only noticed
to be constant after optimizations (as opposed to in the front-end), gcc
doesn't manage to turn the dynamic initialization into a static one. Making
the intrinsics constexpr may help, but really this is something that would be
nice to fix eventually; there are several PRs blocked by this.
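The distinction between dynamic and static initialization can be reproduced without intrinsics. This is a minimal sketch that only illustrates the language rule; it says nothing about what GCC emits. `low_byte_runtime` and `low_byte_constexpr` are hypothetical stand-ins for a non-`constexpr` wrapper such as `to_bits` and for a `constexpr` version of it. Only the second initializer is a constant expression, so only `d_static` is guaranteed constant initialization by the front end. For `d_dynamic`, the language prescribes dynamic initialization, and making it static is left to the optimizer.

```cpp
// Hypothetical helper standing in for a non-constexpr intrinsic wrapper.
inline unsigned low_byte_runtime(unsigned v) { return v & 0xffu; }

// The same computation, but usable in constant expressions.
constexpr unsigned low_byte_constexpr(unsigned v) { return v & 0xffu; }

// Initializer is not a constant expression: dynamic initialization as far as
// the language is concerned (like `d` in comment #3).
extern const unsigned d_dynamic = low_byte_runtime(0x17d);

// Initializer is a constant expression: constant initialization, done by the
// front end.
extern const unsigned d_static = low_byte_constexpr(0x17d);

static_assert(d_static == 125, "d_static is a constant expression");
// static_assert(d_dynamic == 125, "");  // ill-formed: not a constant expression
```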
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517

--- Comment #4 from Matthias Kretz ---
A similar test case showing that something is still missing
(https://gcc.godbolt.org/z/t1DT7E):

#include

inline __m128i cmp(__m128i x, __m128i y) {
  return _mm_cmpeq_epi16(x, y);
}

inline unsigned to_bits(__m128i mask0) {
  return _pext_u32(_mm_movemask_epi8(mask0), 0x);
}

inline __m128i to_vmask(unsigned bits) {
  __m128i mask = _mm_set1_epi16(bits);
  mask = _mm_and_si128(mask, _mm_setr_epi16(1, 2, 4, 8, 16, 32, 64, 128));
  mask = _mm_cmpeq_epi16(mask, _mm_setzero_si128());
  mask = _mm_xor_si128(mask, _mm_cmpeq_epi16(mask, mask));
  return mask;
}

auto f(__m128i x, __m128i y) {
  // should be:
  // vpcmpeqw %xmm1, %xmm0, %xmm0
  // ret
  return to_vmask(to_bits(cmp(x, y)));
}

auto f(unsigned bits) {
  // should be equivalent to `return 0xff & bits;`
  return to_bits(to_vmask(bits));
}
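Both expectations in comment #4 are lane-wise identities, and a scalar model can check them at compile time. This sketch views the vector as 8 x 16-bit lanes. The report's pext constant is truncated (`0x`), so the sketch uses a hypothetical mask 0x5555 that keeps one bit per lane. 0xaaaa would behave the same, because both bytes of a compare-result lane are equal. `Lanes`, `cmpeq16`, `movemask16`, `pext32` and `lanes_equal` are illustrative names, not intrinsics.

```cpp
#include <cstdint>

// Scalar stand-in for __m128i viewed as 8 x 16-bit lanes (lane 0 = lowest).
struct Lanes {
    std::uint16_t lane[8];
};

// Model of _mm_cmpeq_epi16: 0xffff where lanes are equal, 0 elsewhere.
constexpr Lanes cmpeq16(const Lanes& x, const Lanes& y) {
    Lanes r{};
    for (int i = 0; i < 8; ++i)
        r.lane[i] = x.lane[i] == y.lane[i] ? 0xffff : 0;
    return r;
}

// Model of _mm_movemask_epi8: bit j is the MSB of byte j; lane i holds
// bytes 2i (bits 0-7) and 2i+1 (bits 8-15).
constexpr std::uint32_t movemask16(const Lanes& v) {
    std::uint32_t r = 0;
    for (int i = 0; i < 8; ++i) {
        r |= ((v.lane[i] >> 7) & 1u) << (2 * i);
        r |= ((v.lane[i] >> 15) & 1u) << (2 * i + 1);
    }
    return r;
}

// Model of _pext_u32: packs the bits of src selected by mask into the low bits.
constexpr std::uint32_t pext32(std::uint32_t src, std::uint32_t mask) {
    std::uint32_t r = 0;
    int k = 0;
    for (int i = 0; i < 32; ++i)
        if (mask & (1u << i))
            r |= ((src >> i) & 1u) << k++;
    return r;
}

// Model of to_bits; 0x5555 is an assumed mask (the original is truncated).
constexpr unsigned to_bits(const Lanes& m) { return pext32(movemask16(m), 0x5555u); }

// Model of to_vmask: broadcast the low 16 bits, AND lane i with 1 << i,
// compare with zero, then invert by xor with all-ones.
constexpr Lanes to_vmask(unsigned bits) {
    Lanes r{};
    for (int i = 0; i < 8; ++i) {
        const unsigned masked = (bits & 0xffffu) & (1u << i);
        const unsigned is_zero = masked == 0 ? 0xffffu : 0u;  // cmpeq with zero
        r.lane[i] = static_cast<std::uint16_t>(is_zero ^ 0xffffu);
    }
    return r;
}

constexpr bool lanes_equal(const Lanes& x, const Lanes& y) {
    for (int i = 0; i < 8; ++i)
        if (x.lane[i] != y.lane[i]) return false;
    return true;
}

// f(unsigned): to_bits(to_vmask(bits)) should equal bits & 0xff.
constexpr bool roundtrip_bits_ok() {
    for (unsigned bits = 0; bits < 0x200; ++bits)
        if (to_bits(to_vmask(bits)) != (bits & 0xffu)) return false;
    return true;
}
static_assert(roundtrip_bits_ok(), "to_bits(to_vmask(bits)) == (bits & 0xff)");

// f(__m128i, __m128i): to_vmask(to_bits(m)) gives back any compare result m.
constexpr Lanes x{{1, 2, 3, 4, 5, 6, 7, 8}};
constexpr Lanes y{{1, 0, 3, 0, 5, 0, 7, 9}};
static_assert(lanes_equal(to_vmask(to_bits(cmpeq16(x, y))), cmpeq16(x, y)),
              "to_vmask(to_bits(cmp(x, y))) == cmp(x, y)");
```

Because these identities hold for every input, not just constants, they are algebraic simplifications rather than constant folding. This is the distinction Marc Glisse draws in comment #6.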
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517

Matthias Kretz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|8.0                         |9.0

--- Comment #3 from Matthias Kretz ---
GCC 9 almost resolves this. However, for some reason this extended test case
is not fully optimized: https://gcc.godbolt.org/z/jRrHth
i.e. the call to dont_call_me() should be eliminated as dead code

#include

inline __m128i cmp(__m128i x, __m128i y) {
  return _mm_cmpeq_epi16(x, y);
}

inline unsigned to_bits(__m128i mask0) {
  return _pext_u32(_mm_movemask_epi8(mask0), 0x);
}

inline __m128i to_vmask(unsigned bits) {
  __m128i mask = _mm_set1_epi16(bits);
  mask = _mm_and_si128(mask, _mm_setr_epi16(1, 2, 4, 8, 16, 32, 64, 128));
  mask = _mm_cmpeq_epi16(mask, _mm_setzero_si128());
  mask = _mm_xor_si128(mask, _mm_cmpeq_epi16(mask, mask));
  return mask;
}

inline bool is_eq(unsigned bits, __m128i vmask) {
  return to_bits(vmask) == bits;
}

extern const auto a = __m128i{0x0001'0002'0004'0003, 0x0009'0008'0007'0006};
extern const auto b = __m128i{0x0001'0002'0005'0003, 0x'0008'0007'0006};
extern const auto c = cmp(a, b);
extern const auto d = to_bits(c);

void call_me();
void dont_call_me();

void f() {
  if (is_eq(d, cmp(b, a))) {
    call_me();
  } else {
    dont_call_me();
  }
}
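The `else` branch is dead because `_mm_cmpeq_epi16` is lane-wise equality, which is symmetric. `cmp(b, a)` therefore yields the same mask as `cmp(a, b)`, and `is_eq(d, cmp(b, a))` holds for any `a` and `b`. The scalar sketch below checks this, and the value of `d`, under constexpr evaluation. It is a model, not the intrinsics, and rests on these assumptions:

- `Lanes`, `cmpeq16`, `movemask16` and `pext32` are hypothetical stand-ins.
- Lane 0 is the low 16 bits of the first 64-bit element.
- The truncated pext constant is taken to be 0x5555.
- b's top lane is also truncated in the archive. It is given the hypothetical value 0x000a; any value other than 9 yields the `d = 125` seen in the GCC dump in comment #5.

```cpp
#include <cstdint>

// Scalar stand-in for __m128i as 8 x 16-bit lanes; lane 0 is the low
// 16 bits of the first 64-bit element.
struct Lanes {
    std::uint16_t lane[8];
};

// Model of cmp / _mm_cmpeq_epi16: 0xffff where lanes are equal, else 0.
constexpr Lanes cmpeq16(const Lanes& x, const Lanes& y) {
    Lanes r{};
    for (int i = 0; i < 8; ++i)
        r.lane[i] = x.lane[i] == y.lane[i] ? 0xffff : 0;
    return r;
}

// Model of _mm_movemask_epi8: lane i supplies byte MSBs at bits 2i and 2i+1.
constexpr std::uint32_t movemask16(const Lanes& v) {
    std::uint32_t r = 0;
    for (int i = 0; i < 8; ++i) {
        r |= ((v.lane[i] >> 7) & 1u) << (2 * i);
        r |= ((v.lane[i] >> 15) & 1u) << (2 * i + 1);
    }
    return r;
}

// Model of _pext_u32: packs the bits of src selected by mask into the low bits.
constexpr std::uint32_t pext32(std::uint32_t src, std::uint32_t mask) {
    std::uint32_t r = 0;
    int k = 0;
    for (int i = 0; i < 32; ++i)
        if (mask & (1u << i))
            r |= ((src >> i) & 1u) << k++;
    return r;
}

// Model of to_bits; 0x5555 is an assumed mask (the original is truncated).
constexpr unsigned to_bits(const Lanes& m) { return pext32(movemask16(m), 0x5555u); }

// Model of is_eq from comment #3.
constexpr bool is_eq(unsigned bits, const Lanes& vmask) { return to_bits(vmask) == bits; }

// a and b split into 16-bit lanes; b's lane 7 (0x000a) is hypothetical.
constexpr Lanes a{{0x0003, 0x0004, 0x0002, 0x0001, 0x0006, 0x0007, 0x0008, 0x0009}};
constexpr Lanes b{{0x0003, 0x0005, 0x0002, 0x0001, 0x0006, 0x0007, 0x0008, 0x000a}};

// Lanes 1 and 7 differ, so d = 0b0111'1101.
constexpr unsigned d = to_bits(cmpeq16(a, b));
static_assert(d == 125, "matches d = 125 in the GCC dump");

// Equality is symmetric, so the condition in f() is constant true.
static_assert(is_eq(d, cmpeq16(b, a)), "dont_call_me() is unreachable");
```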
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-04-26
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener ---
Confirmed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517

--- Comment #1 from Marc Glisse ---
Several of those intrinsics are implemented using vector extensions, and
constant propagation works fine on those. What seems to be missing here is
constant folding of the very specific __builtin_ia32_pmovmskb128 and
__builtin_ia32_pext_si in ix86_fold_builtin. It should not be very hard; it
mostly needs someone motivated ;-)
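Folding either builtin with constant operands amounts to evaluating its scalar semantics at compile time. `ix86_fold_builtin` works on GCC trees, so the following is not a patch. It only sketches, in standalone C++, the arithmetic such a fold would have to perform; `movemask_epi8_ref` and `pext_u32_ref` are hypothetical names. PMOVMSKB collects the most significant bit of each of the 16 bytes. PEXT packs the bits of its first operand that are selected by the mask into contiguous low-order bits.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// PMOVMSKB, 128-bit form: bit i of the result is the most significant bit
// of byte i; the upper 16 bits of the result are zero.
constexpr std::uint32_t movemask_epi8_ref(const std::array<std::uint8_t, 16>& bytes) {
    std::uint32_t r = 0;
    for (std::size_t i = 0; i < 16; ++i)
        r |= static_cast<std::uint32_t>(bytes[i] >> 7) << i;
    return r;
}

// PEXT, 32-bit form: each bit of src whose mask bit is set is copied, in
// order, to the next free low-order bit of the result.
constexpr std::uint32_t pext_u32_ref(std::uint32_t src, std::uint32_t mask) {
    std::uint32_t r = 0;
    int k = 0;
    for (int i = 0; i < 32; ++i)
        if (mask & (1u << i))
            r |= ((src >> i) & 1u) << k++;
    return r;
}

// With constant operands both results are compile-time constants, i.e. the
// values a fold would substitute for the builtin calls.
constexpr std::array<std::uint8_t, 16> sample = {
    0x80, 0x00, 0xff, 0x7f, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x80};
static_assert(movemask_epi8_ref(sample) == 0x8005u, "bytes 0, 2 and 15 have the MSB set");
static_assert(pext_u32_ref(0xb6u, 0xf0u) == 0xbu, "bits 4-7 of 0xb6 are 1011");
```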