[Bug target/102812] Unoptimal (and wrong) code for _Float16 insert
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102812

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |12.0

--- Comment #6 from Uroš Bizjak ---
Fixed.

--- Comment #5 from CVS Commits ---
The master branch has been updated by Hongyu Wang:

https://gcc.gnu.org/g:c8a889fc0e115d40a2d02f32842655f3eadc8fa1

commit r12-4601-gc8a889fc0e115d40a2d02f32842655f3eadc8fa1
Author: Hongyu Wang
Date:   Wed Oct 20 13:13:39 2021 +0800

    i386: Fix wrong codegen for V8HF move without TARGET_AVX512F

    Since _Float16 type is enabled under sse2 target, returning V8HFmode
    vector without AVX512F target would generate wrong vmovdqa64
    instruction. Adjust ix86_get_ssemov to avoid this.

    gcc/ChangeLog:

            PR target/102812
            * config/i386/i386.c (ix86_get_ssemov): Adjust HFmode vector
            move to use the same logic as HImode.

    gcc/testsuite/ChangeLog:

            PR target/102812
            * gcc.target/i386/pr102812.c: New test.
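The fix hinges on which full-register move is legal on which ISA. The sketch below is a deliberately simplified model of that constraint, assuming a hypothetical struct isa and hypothetical helpers move_is_available and pick_move. It is not GCC's ix86_get_ssemov, which also handles misaligned moves, AVX512VL, AVX512BW and many other modes. It only illustrates why emitting vmovdqa64 without AVX-512F is wrong: that instruction exists only in the EVEX encoding, so SSE2-only or AVX-only CPUs cannot execute it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ISA description for this sketch; not a GCC data structure. */
struct isa
{
  bool sse2;
  bool avx;
  bool avx512f;
};

/* Can the target execute this aligned full-register integer move?
   movdqa needs SSE2, vmovdqa (VEX) needs AVX, vmovdqa64 (EVEX) needs AVX-512F. */
bool
move_is_available (const char *insn, struct isa t)
{
  if (strcmp (insn, "movdqa") == 0)
    return t.sse2;
  if (strcmp (insn, "vmovdqa") == 0)
    return t.avx;
  if (strcmp (insn, "vmovdqa64") == 0)
    return t.avx512f;
  return false; /* unknown mnemonic */
}

/* Pick the move from the enabled ISA rather than from the element type,
   so a V8HF/V16HF move is treated like a V8HI/V16HI move.  evex_reg models
   xmm16-xmm31, which exist only with AVX-512 (at 128/256 bits they also
   need AVX512VL, which this sketch ignores). */
const char *
pick_move (struct isa t, bool evex_reg)
{
  if (evex_reg && t.avx512f)
    return "vmovdqa64";
  if (t.avx)
    return "vmovdqa";
  if (t.sse2)
    return "movdqa";
  return NULL; /* no SSE at all: no vector move available */
}
```

For an SSE2-only target, pick_move returns "movdqa", which move_is_available accepts for that target, while "vmovdqa64" is rejected for an AVX-only target.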

--- Comment #4 from Hongtao.liu ---
(In reply to Hongyu Wang from comment #3)
> (In reply to Uroš Bizjak from comment #2)
> > Please note that the code above should compile via ix86_expand_vector_set,
> > similar to:
> >
> > --cut here--
> > typedef short v8hi __attribute__((__vector_size__(16)));
> >
> > v8hi foo (short a)
> > {
> >   return (v8hi) {a, 0, 0, 0, 0, 0, 0, 0 };
> > }
> > --cut here--
> >
> > that results in:
> >
> >         vpxor   %xmm0, %xmm0, %xmm0
> >         vpinsrw $0, %edi, %xmm0, %xmm0
> >         ret
>
> Currently we have
>
>   if (TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode))
>     return true;
>
> in ix86_vector_mode_supported_p, so for an SSE2 target V8HFmode would be
> returned in BLKmode.
>
> After I add V8HFmode to VALID_SSE2_REG_MODE, the code looks like:
>
>         vmovss  %xmm0, %xmm0, %xmm1
>         vpxor   %xmm0, %xmm0, %xmm0
>         pextrw  $0, %xmm1, -10(%rsp)
>         vpinsrw $0, -10(%rsp), %xmm0, %xmm0
>
> It seems IRA spills the HF register to memory.
>
> I wonder whether we should move vector mode support to sse2 for now, as we
> don't have sufficient HF vector arithmetic emulation for non-avx512fp16
> targets.

According to the documentation, maybe we can:

@deftypefn {Target Hook} bool TARGET_VECTOR_MODE_SUPPORTED_P (machine_mode @var{mode})
Define this to return nonzero if the port is prepared to handle insns
involving vector mode @var{mode}.  At the very least, it must have move
patterns for this mode.
@end deftypefn
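The argument above can be sketched as two versions of a mode-support predicate. The enum, struct isa, supported_before and supported_after below are hypothetical stand-ins, not GCC's ix86_vector_mode_supported_p or its VALID_*_REG_MODE macros. The mode-to-ISA mapping assumes the usual x86 register sizes: xmm for 16 bytes (SSE2), ymm for 32 bytes (AVX) and zmm for 64 bytes (AVX-512F).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC machine modes and ISA flags. */
enum mode { V8HImode, V8HFmode, V16HFmode, V32HFmode };

struct isa
{
  bool sse2, avx, avx512f, avx512fp16;
};

/* Behaviour described in comment #3: HF vector modes are accepted only
   with AVX512FP16, so on an SSE2-only target V8HFmode is rejected and
   such a value falls back to BLKmode. */
bool
supported_before (enum mode m, struct isa t)
{
  switch (m)
    {
    case V8HImode:
      return t.sse2;
    case V8HFmode:
    case V16HFmode:
    case V32HFmode:
      return t.avx512fp16;
    }
  return false;
}

/* The proposal: accept an HF vector mode whenever plain moves of its size
   exist, which is the minimum the hook documentation asks for.
   16 bytes -> SSE2 (xmm), 32 bytes -> AVX (ymm), 64 bytes -> AVX-512F (zmm). */
bool
supported_after (enum mode m, struct isa t)
{
  switch (m)
    {
    case V8HImode:
    case V8HFmode:
      return t.sse2;
    case V16HFmode:
      return t.avx;
    case V32HFmode:
      return t.avx512f;
    }
  return false;
}
```

With these definitions, an SSE2-only target rejects V8HFmode before the change and accepts it after, exactly as it already accepts V8HImode. Comment #1 and the commit follow from this: once such modes can exist under TARGET_SSE2/TARGET_AVX, every move helper must handle them without AVX-512.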

--- Comment #3 from Hongyu Wang ---
(In reply to Uroš Bizjak from comment #2)
> Please note that the code above should compile via ix86_expand_vector_set,
> similar to:
>
> --cut here--
> typedef short v8hi __attribute__((__vector_size__(16)));
>
> v8hi foo (short a)
> {
>   return (v8hi) {a, 0, 0, 0, 0, 0, 0, 0 };
> }
> --cut here--
>
> that results in:
>
>         vpxor   %xmm0, %xmm0, %xmm0
>         vpinsrw $0, %edi, %xmm0, %xmm0
>         ret

Currently we have

  if (TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode))
    return true;

in ix86_vector_mode_supported_p, so for an SSE2 target V8HFmode would be
returned in BLKmode.

After I add V8HFmode to VALID_SSE2_REG_MODE, the code looks like:

        vmovss  %xmm0, %xmm0, %xmm1
        vpxor   %xmm0, %xmm0, %xmm0
        pextrw  $0, %xmm1, -10(%rsp)
        vpinsrw $0, -10(%rsp), %xmm0, %xmm0

It seems IRA spills the HF register to memory.

I wonder whether we should move vector mode support to sse2 for now, as we
don't have sufficient HF vector arithmetic emulation for non-avx512fp16
targets.
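To see why the sequence above is merely slow rather than wrong, here is a sketch with SSE2 intrinsics. It assumes the _Float16 argument sits in the low 16 bits of an __m128i, modeled as a raw bit pattern so that no _Float16 compiler support is needed. The function names are hypothetical: insert_via_stack mirrors the pextrw/vpinsrw round trip through a stack slot, and insert_in_regs is one possible register-only alternative using a lane mask, not necessarily what GCC emits. Both produce the same bits.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Mirrors the spilling sequence: the low 16 bits of the argument register
   (the _Float16 bits) go through a stack slot and are then inserted into
   lane 0 of a zeroed vector. */
__m128i
insert_via_stack (__m128i arg)
{
  volatile uint16_t slot;                       /* stands in for -10(%rsp) */
  slot = (uint16_t) _mm_extract_epi16 (arg, 0); /* pextrw $0, %xmm1, mem */
  return _mm_insert_epi16 (_mm_setzero_si128 (), slot, 0); /* vpxor + vpinsrw */
}

/* Same result without touching memory: keep lane 0, clear lanes 1-7. */
__m128i
insert_in_regs (__m128i arg)
{
  /* _mm_set_epi16 lists lanes from 7 down to 0, so -1 (0xffff) is lane 0. */
  const __m128i lane0_mask = _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, -1);
  return _mm_and_si128 (arg, lane0_mask);
}
```

For example, with 0x3c00 (1.0 in IEEE binary16) in lane 0 and arbitrary values in lanes 1-7, both functions return a vector whose lane 0 is 0x3c00 and whose other lanes are zero.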

--- Comment #2 from Uroš Bizjak ---
Please note that the code above should compile via ix86_expand_vector_set,
similar to:

--cut here--
typedef short v8hi __attribute__((__vector_size__(16)));

v8hi foo (short a)
{
  return (v8hi) {a, 0, 0, 0, 0, 0, 0, 0 };
}
--cut here--

that results in:

        vpxor   %xmm0, %xmm0, %xmm0
        vpinsrw $0, %edi, %xmm0, %xmm0
        ret
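The expected output above can also be written by hand with SSE2 intrinsics, which makes the mapping from each instruction to its operation explicit. This is a sketch for x86-64; foo_intrin is a hypothetical name. Without -mavx the non-VEX pxor/pinsrw forms would be expected instead of the vpxor/vpinsrw forms shown.

```c
#include <emmintrin.h> /* SSE2: _mm_setzero_si128, _mm_insert_epi16 */

typedef short v8hi __attribute__((__vector_size__(16)));

/* The example from comment #2, using GCC vector extensions. */
v8hi
foo (short a)
{
  return (v8hi) {a, 0, 0, 0, 0, 0, 0, 0};
}

/* Hand-written equivalent of the expected code:
     vpxor   %xmm0, %xmm0, %xmm0      ->  _mm_setzero_si128 ()
     vpinsrw $0, %edi, %xmm0, %xmm0   ->  _mm_insert_epi16 (zero, a, 0)  */
__m128i
foo_intrin (short a)
{
  return _mm_insert_epi16 (_mm_setzero_si128 (), a, 0);
}
```

Both functions return the same 16 bytes for any input: a in lane 0 and zeros in lanes 1-7.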

--- Comment #1 from Hongtao.liu ---
ix86_get_ssemov needs to be updated for V8HF/V16HF, since these modes can
now exist under TARGET_SSE2/TARGET_AVX.