[Bug target/88473] AVX512: constant folding on mask does not remove unnecessary instructions

2021-09-05 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88473

--- Comment #9 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #8)
> (In reply to Andrew Pinski from comment #7)
> > The UNSPEC_MASKOP ones are still there.
> > 
> > PR 93885 is the same issue.
> 
> #include <immintrin.h>
> 
> void test(void* data, void* data2)
> {
>     __m128i v = _mm_load_si128((__m128i const*)data);
>     __mmask8 m = _mm_testn_epi16_mask(v, v);
>     /* Plain | on the mask: 0x0f | 0xf0 covers all 8 lanes, so m folds to 0xff. */
>     m = m | 0x0f;
>     m = m | 0xf0;
>     v = _mm_maskz_add_epi16(m, v, v);
>     _mm_store_si128((__m128i*)data2, v);
> }
> 
> Should be ok.
> 
> Currently we rely on the RA to choose between a mask register and a GPR
> for bitwise operations. This means that if we remove UNSPEC_MASKOP,
> _kor_mask8 will only generate a GPR orb. UNSPEC_MASKOP is therefore
> necessary to ensure the correspondence between intrinsic and instruction.
> If the user wants GCC to optimize bitwise operations on masks, it is
> recommended to use the bitwise operators instead of the intrinsics.

The same applies to PR 88476.

[Bug target/88473] AVX512: constant folding on mask does not remove unnecessary instructions

2021-09-05 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88473

--- Comment #8 from Hongtao.liu  ---
(In reply to Andrew Pinski from comment #7)
> The UNSPEC_MASKOP ones are still there.
> 
> PR 93885 is the same issue.

#include <immintrin.h>

void test(void* data, void* data2)
{
    __m128i v = _mm_load_si128((__m128i const*)data);
    __mmask8 m = _mm_testn_epi16_mask(v, v);
    /* Plain | on the mask: 0x0f | 0xf0 covers all 8 lanes, so m folds to 0xff. */
    m = m | 0x0f;
    m = m | 0xf0;
    v = _mm_maskz_add_epi16(m, v, v);
    _mm_store_si128((__m128i*)data2, v);
}

Should be ok.

Currently we rely on the RA to choose between a mask register and a GPR for
bitwise operations. This means that if we remove UNSPEC_MASKOP, _kor_mask8
will only generate a GPR orb. UNSPEC_MASKOP is therefore necessary to ensure
the correspondence between intrinsic and instruction. If the user wants GCC to
optimize bitwise operations on masks, it is recommended to use the bitwise
operators instead of the intrinsics.
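
As a rough illustration of the distinction drawn in this comment, the sketch
below writes the same computation twice: once with the | operator on __mmask8
and once with _kor_mask8. The function names, the MASK_TARGET macro and the
runtime check are assumptions made for this example and do not come from the
bug report. It assumes GCC or Clang, since it relies on the target attribute
and __builtin_cpu_supports. Both variants must produce the same values; per
the comment, only the operator form is expected to let GCC fold the mask away.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper macro: enable the needed ISA extensions per function,
   so the file builds without -mavx512* on the command line. */
#define MASK_TARGET \
    __attribute__((target("avx512f,avx512bw,avx512vl,avx512dq")))

/* Variant 1: plain | on __mmask8, which is ordinary integer arithmetic. */
MASK_TARGET
void add_with_operator(const void *src, void *dst)
{
    __m128i v = _mm_load_si128((const __m128i *)src);
    __mmask8 m = _mm_testn_epi16_mask(v, v);
    m = m | 0x0f;
    m = m | 0xf0;                      /* m is 0xff regardless of v */
    v = _mm_maskz_add_epi16(m, v, v);
    _mm_store_si128((__m128i *)dst, v);
}

/* Variant 2: the _kor_mask8 intrinsic, which the thread says is kept as
   UNSPEC_MASKOP so that it corresponds to a mask instruction. */
MASK_TARGET
void add_with_intrinsic(const void *src, void *dst)
{
    __m128i v = _mm_load_si128((const __m128i *)src);
    __mmask8 m = _mm_testn_epi16_mask(v, v);
    m = _kor_mask8(m, (__mmask8)0x0f);
    m = _kor_mask8(m, (__mmask8)0xf0); /* same value: 0xff */
    v = _mm_maskz_add_epi16(m, v, v);
    _mm_store_si128((__m128i *)dst, v);
}
#endif

/* Returns 1 if the AVX-512 path ran, 0 if it was skipped because the
   target or the CPU does not support it. */
int check_operator_matches_intrinsic(void)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512bw") &&
        __builtin_cpu_supports("avx512vl") &&
        __builtin_cpu_supports("avx512dq")) {
        _Alignas(16) uint16_t in[8] = {0, 1, 2, 3, 4, 5, 6, 7};
        _Alignas(16) uint16_t a[8], b[8];

        add_with_operator(in, a);
        add_with_intrinsic(in, b);
        for (int i = 0; i < 8; i++) {
            assert(a[i] == (uint16_t)(2 * in[i])); /* all lanes are added */
            assert(b[i] == a[i]);                  /* both variants agree */
        }
        return 1;
    }
#endif
    return 0;
}
```

Comparing the assembly of the two functions at -O2 (for example in Compiler
Explorer) is one way to see whether the mask instructions remain in the
intrinsic variant on a given GCC version.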

[Bug target/88473] AVX512: constant folding on mask does not remove unnecessary instructions

2021-09-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88473

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |93885

--- Comment #7 from Andrew Pinski  ---
The UNSPEC_MASKOP ones are still there.

PR 93885 is the same issue.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93885
[Bug 93885] Spurious instruction kshiftlw issued

[Bug target/88473] AVX512: constant folding on mask does not remove unnecessary instructions

2020-08-20 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88473

Hongtao.liu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #6 from Hongtao.liu  ---
Fixed in GCC 11 by
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=388cb292a94f98a276548cd6ce01285cf36d17df

[Bug target/88473] AVX512: constant folding on mask does not remove unnecessary instructions

2018-12-14 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88473

--- Comment #5 from Jakub Jelinek  ---
The rationale for doing it the way it currently is done:
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02612.html

[Bug target/88473] AVX512: constant folding on mask does not remove unnecessary instructions

2018-12-14 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88473

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kyukhin at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
The *s on the =k, km *mov{di,si}_internal patterns (which I've copied to the
*zero_extend?i?i2 patterns) were introduced in
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01113.html
but there wasn't any discussion on why they were introduced.  Was that a
fear that the register allocator would start using the mask registers for cases
like
  memory1 = memory2 | memory3;
instead of the GPRs?  I'd say it would be useful to slightly disparage
transfers from GPRs to mask registers and back, and perhaps also slightly
disparage mask stores into memory, if needed to prevent using mask registers for
the logical ops or shifts with only memory arguments, and keep the rest of the
alternative constraints (like =k, km) without any modifiers.  And if that works,
change the mask intrinsics to be normal arithmetic instead of special builtins
with UNSPECs at RTL.  Thoughts on that?

[Bug target/88473] AVX512: constant folding on mask does not remove unnecessary instructions

2018-12-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88473

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-12-14
     Ever confirmed|0                           |1

--- Comment #3 from Richard Biener  ---
Confirmed.

[Bug target/88473] AVX512: constant folding on mask does not remove unnecessary instructions

2018-12-12 Thread bugzi...@poradnik-webmastera.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88473

--- Comment #2 from Daniel Fruzynski  ---
I was playing with Compiler Explorer to see how compilers optimize various
pieces of code. I found that the next clang version (currently trunk) will be
able to analyze expressions that span vectors, masks and GPRs. I logged Bug
88476 to do something similar in gcc, please take a look. I think an approach
like clang's would be more beneficial.

In the past I also thought about a template-based library that would wrap
vector operations. One of its unique concepts was to create separate types to
hold a vector of bool values and another one for int masks. With lazy
instantiation this should lead to faster resulting code. I have not tried to
write it yet, but overall this approach looks promising to me. With it, cases
such as the one in this bug can appear as a side effect of inlining.

[Bug target/88473] AVX512: constant folding on mask does not remove unnecessary instructions

2018-12-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88473

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
Well, if you want this constant folded, why are you using the _k* intrinsics at
all rather than just normal arithmetic on the __mmask8 etc. types?
GCC handles the intrinsics in a way that forces them to actually be those
instructions.  We could surely implement them just by doing normal arithmetic
inside the headers, but then it would be more likely that normal GPR
arithmetic would be used for them rather than mask logic etc. operations.
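
To make the alternative mentioned in this comment concrete, here is a minimal
sketch of what "normal arithmetic inside the headers" could look like. This is
not GCC's actual header code. The sketch_ prefix and the sketch_mmask8 typedef
are hypothetical names chosen for this example; only the operations they mirror
(_kor_mask8, _kand_mask8, _kxor_mask8, _knot_mask8, _kandn_mask8) are real
intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __mmask8, which is an 8-bit unsigned integer. */
typedef uint8_t sketch_mmask8;

/* Each helper is plain integer arithmetic, so the compiler can constant-fold
   it, but is also free to use GPR instructions instead of mask instructions. */
static inline sketch_mmask8 sketch_kor_mask8(sketch_mmask8 a, sketch_mmask8 b)
{
    return (sketch_mmask8)(a | b);
}

static inline sketch_mmask8 sketch_kand_mask8(sketch_mmask8 a, sketch_mmask8 b)
{
    return (sketch_mmask8)(a & b);
}

static inline sketch_mmask8 sketch_kxor_mask8(sketch_mmask8 a, sketch_mmask8 b)
{
    return (sketch_mmask8)(a ^ b);
}

static inline sketch_mmask8 sketch_knot_mask8(sketch_mmask8 a)
{
    return (sketch_mmask8)~a;
}

/* KANDN computes (NOT a) AND b. */
static inline sketch_mmask8 sketch_kandn_mask8(sketch_mmask8 a, sketch_mmask8 b)
{
    return (sketch_mmask8)(~a & b);
}

/* The pattern from this bug: OR-ing in 0x0f and then 0xf0 sets all 8 bits,
   so the result does not depend on the incoming mask. */
sketch_mmask8 sketch_fold_example(sketch_mmask8 m)
{
    m = sketch_kor_mask8(m, 0x0f);
    m = sketch_kor_mask8(m, 0xf0);
    assert(m == 0xff);
    return m;
}
```

With helpers written this way the result of sketch_fold_example is a
compile-time constant, which is the folding the reporter asked for. The cost,
as the comment notes, is that nothing ties the operation to a mask instruction
any more.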