[Bug target/80517] [missed optimization] constant propagation through Intel intrinsics

2023-08-21 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517

Andrew Pinski  changed:

               What|Removed     |Added
--------------------------------------------------------------------------
           See Also|            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24928

--- Comment #8 from Andrew Pinski  ---
Status of this bug:
comment #0, first testcase: fixed since GCC 9
comment #0, second testcase: still needs improvement

comment #3 is now essentially PR 24928.

2021-12-21 Thread pinskia at gcc dot gnu.org via Gcc-bugs

Andrew Pinski  changed:

               What|Removed     |Added
--------------------------------------------------------------------------
           Severity|normal      |enhancement

2019-01-21 Thread ubizjak at gmail dot com

--- Comment #7 from Uroš Bizjak  ---
See also PR55894.

2019-01-11 Thread glisse at gcc dot gnu.org

--- Comment #6 from Marc Glisse  ---
(In reply to Matthias Kretz from comment #4)
> A similar test case showing that something is still missing

You don't seem to be passing constants here, so this is unrelated to this PR.
If you file a new one, please annotate your example to explain what you expect
to simplify, to what, and why.

> (https://gcc.godbolt.org/z/t1DT7E):

Adding -fdump-tree-optimized=- -g0 and showing the compiler output would make
this more understandable for me...

2019-01-11 Thread glisse at gcc dot gnu.org

--- Comment #5 from Marc Glisse  ---
(In reply to Matthias Kretz from comment #3)
> GCC 9 almost resolves this. However, for some reason this extended test case
> is not fully optimized: https://gcc.godbolt.org/z/jRrHth
> i.e. the call to dont_call_me() should be eliminated as dead code

We are left with:

_GLOBAL__sub_I__Z1fv ()
{
   [local count: 1073741824]:
  d = 125;
  return;

}

f ()
{
  unsigned int d.1_1;

   [local count: 1073741824]:
  d.1_1 = d;
  if (d.1_1 == 125)
[...]

This is a classic: if the initialization of a global variable is only found to
be constant after optimization (as opposed to in the front end), GCC doesn't
manage to turn the dynamic initialization into a static one. Making the
intrinsics constexpr may help, but this is really something that would be nice
to fix eventually; several PRs are blocked by it.
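To illustrate why constexpr helps, here is a minimal sketch that swaps the intrinsics for hypothetical constexpr scalar stand-ins (`V8u16`, `cmp` and `to_bits` below are illustrative, not the real intrinsics). It assumes `__m128i` can be modeled as eight 16-bit lanes with lane 0 lowest. The top lane of `b` is cut off in the test case; the dump's `d = 125` implies only that it differs from a's 9, so an arbitrary value is used. Because every call is a constant expression, `d` is constant-initialized rather than set by a dynamic initializer, and the branch condition in `f()` becomes foldable.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar stand-in for __m128i: eight 16-bit lanes, lane 0 lowest.
using V8u16 = std::array<std::uint16_t, 8>;

// Stand-in for _mm_cmpeq_epi16: 0xFFFF where lanes are equal, else 0.
constexpr V8u16 cmp(const V8u16& x, const V8u16& y) {
    V8u16 r{};
    for (int i = 0; i < 8; ++i)
        r[i] = (x[i] == y[i]) ? 0xFFFF : 0;
    return r;
}

// Stand-in for to_bits(): bit i is set when lane i is nonzero (all-ones).
constexpr unsigned to_bits(const V8u16& m) {
    unsigned bits = 0;
    for (int i = 0; i < 8; ++i)
        if (m[i] != 0) bits |= 1u << i;
    return bits;
}

// Lanes of a and b from comment #3, lowest lane first.
// b's top lane is elided in the report; 0x0010 is an arbitrary value != 9.
constexpr V8u16 a{3, 4, 2, 1, 6, 7, 8, 9};
constexpr V8u16 b{3, 5, 2, 1, 6, 7, 8, 0x0010};

// A constant expression: d is statically initialized, no dynamic initializer.
constexpr unsigned d = to_bits(cmp(a, b));
static_assert(d == 125, "lanes 1 and 7 differ, matching d = 125 in the dump");
static_assert(to_bits(cmp(b, a)) == d, "cmp is symmetric, so is_eq() is always true");
```

The second `static_assert` is the fact the optimizer would need in order to delete the `dont_call_me()` branch.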

2019-01-11 Thread kretz at kde dot org

--- Comment #4 from Matthias Kretz  ---
A similar test case showing that something is still missing
(https://gcc.godbolt.org/z/t1DT7E):

#include 

inline __m128i cmp(__m128i x, __m128i y) {
    return _mm_cmpeq_epi16(x, y);
}
inline unsigned to_bits(__m128i mask0) {
    return _pext_u32(_mm_movemask_epi8(mask0), 0x);
}

inline __m128i to_vmask(unsigned bits) {
    __m128i mask = _mm_set1_epi16(bits);
    mask = _mm_and_si128(mask, _mm_setr_epi16(1, 2, 4, 8, 16, 32, 64, 128));
    mask = _mm_cmpeq_epi16(mask, _mm_setzero_si128());
    mask = _mm_xor_si128(mask, _mm_cmpeq_epi16(mask, mask));
    return mask;
}

auto f(__m128i x, __m128i y) {
    // should be:
    // vpcmpeqw %xmm1, %xmm0, %xmm0
    // ret
    return to_vmask(to_bits(cmp(x, y)));
}

auto f(unsigned bits) {
    // should be equivalent to `return 0xff & bits;`
    return to_bits(to_vmask(bits));
}
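Why both overloads of `f` should collapse can be checked with a scalar model of the two helpers. This is a sketch under stated assumptions, not the intrinsics themselves: `__m128i` is modeled as eight 16-bit lanes, and the pext mask in `to_bits()` (not shown in the test case) is assumed to keep exactly one of the two movemask bits of each lane, so bit i of the result reflects lane i. All names below are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Scalar model of the helpers in comment #4 (illustration only).
using V8u16 = std::array<std::uint16_t, 8>;

// Models to_vmask(): broadcast bits to every lane (_mm_set1_epi16 keeps the
// low 16 bits), AND lane i with (1 << i), then map nonzero lanes to all-ones
// (cmpeq with zero followed by XOR with all-ones).
constexpr V8u16 to_vmask(unsigned bits) {
    V8u16 mask{};
    const auto lane = static_cast<std::uint16_t>(bits);
    for (int i = 0; i < 8; ++i)
        mask[i] = (lane & (1u << i)) ? 0xFFFF : 0;
    return mask;
}

// Models to_bits(), assuming the elided pext mask picks one bit per lane.
constexpr unsigned to_bits(const V8u16& mask) {
    unsigned bits = 0;
    for (int i = 0; i < 8; ++i)
        if (mask[i] == 0xFFFF) bits |= 1u << i;
    return bits;
}

// f(unsigned): to_bits(to_vmask(x)) should equal (x & 0xff).
constexpr bool round_trip_holds(unsigned bits) {
    return to_bits(to_vmask(bits)) == (bits & 0xffu);
}

constexpr bool round_trip_holds_for_all_16bit() {
    for (unsigned x = 0; x <= 0xFFFFu; ++x)
        if (!round_trip_holds(x)) return false;
    return true;
}

// f(__m128i, __m128i): a compare result (each lane 0 or 0xFFFF) should
// survive to_vmask(to_bits(m)) unchanged, leaving only the vpcmpeqw.
bool vmask_round_trip_holds() {
    for (unsigned pattern = 0; pattern < 256; ++pattern) {
        V8u16 m{};
        for (int i = 0; i < 8; ++i)
            m[i] = ((pattern >> i) & 1u) ? 0xFFFF : 0;
        if (to_vmask(to_bits(m)) != m) return false;
    }
    return true;
}
```

Lanes 0 through 7 only test input bits 0 through 7, so no higher bit can survive the round trip; that is where the `0xff &` in the expected result comes from.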

2019-01-11 Thread kretz at kde dot org

Matthias Kretz  changed:

               What|Removed     |Added
--------------------------------------------------------------------------
            Version|8.0         |9.0

--- Comment #3 from Matthias Kretz  ---
GCC 9 almost resolves this. However, for some reason this extended test case is
not fully optimized: https://gcc.godbolt.org/z/jRrHth
i.e. the call to dont_call_me() should be eliminated as dead code

#include 

inline __m128i cmp(__m128i x, __m128i y) {
    return _mm_cmpeq_epi16(x, y);
}
inline unsigned to_bits(__m128i mask0) {
    return _pext_u32(_mm_movemask_epi8(mask0), 0x);
}

inline __m128i to_vmask(unsigned bits) {
    __m128i mask = _mm_set1_epi16(bits);
    mask = _mm_and_si128(mask, _mm_setr_epi16(1, 2, 4, 8, 16, 32, 64, 128));
    mask = _mm_cmpeq_epi16(mask, _mm_setzero_si128());
    mask = _mm_xor_si128(mask, _mm_cmpeq_epi16(mask, mask));
    return mask;
}

inline bool is_eq(unsigned bits, __m128i vmask) {
    return to_bits(vmask) == bits;
}

extern const auto a = __m128i{0x0001'0002'0004'0003, 0x0009'0008'0007'0006};
extern const auto b = __m128i{0x0001'0002'0005'0003, 0x'0008'0007'0006};
extern const auto c = cmp(a, b);
extern const auto d = to_bits(c);

void call_me();
void dont_call_me();
void f() {
    if (is_eq(d, cmp(b, a))) {
        call_me();
    } else {
        dont_call_me();
    }
}

2017-04-26 Thread rguenth at gcc dot gnu.org

Richard Biener  changed:

               What|Removed     |Added
--------------------------------------------------------------------------
           Keywords|            |missed-optimization
             Target|            |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED |NEW
   Last reconfirmed|            |2017-04-26
     Ever confirmed|0           |1

--- Comment #2 from Richard Biener  ---
Confirmed.

2017-04-25 Thread glisse at gcc dot gnu.org

--- Comment #1 from Marc Glisse  ---
Several of those intrinsics are implemented using vector extensions, and
constant propagation works fine on those. What seems to be missing here is
constant folding of the very specific __builtin_ia32_pmovmskb128 and
__builtin_ia32_pext_si builtins in ix86_fold_builtin. It should not be very
hard; it mostly needs someone motivated ;-)
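Folding these two builtins amounts to evaluating them on constant operands. The sketch below spells out that arithmetic in portable C++; `movemask_epi8_ref` and `pext_u32_ref` are hypothetical names, and this is not GCC's ix86_fold_builtin code, only the bit manipulation such a fold would have to reproduce. The masks in the assertions are arbitrary examples.

```cpp
#include <array>
#include <cstdint>

// Semantics of __builtin_ia32_pmovmskb128 (_mm_movemask_epi8):
// bit i of the result is the most significant bit of byte i.
constexpr std::uint32_t movemask_epi8_ref(const std::array<std::uint8_t, 16>& bytes) {
    std::uint32_t r = 0;
    for (int i = 0; i < 16; ++i)
        r |= static_cast<std::uint32_t>(bytes[i] >> 7) << i;
    return r;
}

// Semantics of __builtin_ia32_pext_si (BMI2 PEXT, 32-bit): gather the bits of
// src selected by mask into the low bits of the result, preserving order.
constexpr std::uint32_t pext_u32_ref(std::uint32_t src, std::uint32_t mask) {
    std::uint32_t r = 0;
    int out = 0;
    for (int i = 0; i < 32; ++i) {
        if (mask & (1u << i)) {
            r |= ((src >> i) & 1u) << out;
            ++out;
        }
    }
    return r;
}

// With constant inputs, both reduce to plain numbers at compile time.
static_assert(movemask_epi8_ref({0x80, 0x00, 0xFF, 0x7F}) == 0x5,
              "bytes 0 and 2 have their top bit set");
static_assert(pext_u32_ref(0xF0F0u, 0xFF00u) == 0xF0u, "bits 8..15 packed down");
static_assert(pext_u32_ref(0xBu, 0x6u) == 0x1u, "bit 1 set, bit 2 clear");
```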