[Bug target/102812] Unoptimal (and wrong) code for _Float16 insert

2021-12-16 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102812

Uroš Bizjak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |12.0

--- Comment #6 from Uroš Bizjak  ---
Fixed.

[Bug target/102812] Unoptimal (and wrong) code for _Float16 insert

2021-10-21 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102812

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Hongyu Wang:

https://gcc.gnu.org/g:c8a889fc0e115d40a2d02f32842655f3eadc8fa1

commit r12-4601-gc8a889fc0e115d40a2d02f32842655f3eadc8fa1
Author: Hongyu Wang 
Date:   Wed Oct 20 13:13:39 2021 +0800

i386: Fix wrong codegen for V8HF move without TARGET_AVX512F

Since _Float16 type is enabled under sse2 target, returning
V8HFmode vector without AVX512F target would generate wrong
vmovdqa64 instruction. Adjust ix86_get_ssemov to avoid this.

gcc/ChangeLog:
PR target/102812
* config/i386/i386.c (ix86_get_ssemov): Adjust HFmode vector
move to use the same logic as HImode.

gcc/testsuite/ChangeLog:
PR target/102812
* gcc.target/i386/pr102812.c: New test.

[Bug target/102812] Unoptimal (and wrong) code for _Float16 insert

2021-10-20 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102812

--- Comment #4 from Hongtao.liu  ---
(In reply to Hongyu Wang from comment #3)
> (In reply to Uroš Bizjak from comment #2)
> > Please note that the code above should compile via ix86_expand_vector_set,
> > similar to:
> > 
> > --cut here--
> > typedef short v8hi __attribute__((__vector_size__(16)));
> > 
> > v8hi foo (short a)
> > {
> >   return (v8hi) {a, 0, 0, 0, 0, 0, 0, 0 };
> > }
> > --cut here--
> > 
> > that results in:
> > 
> > vpxor   %xmm0, %xmm0, %xmm0
> > vpinsrw $0, %edi, %xmm0, %xmm0
> > ret
> 
> Currently we have
> 
> if (TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode))
>   return true;
> 
> in ix86_vector_mode_supported_p, so for SSE2 target V8HFmode would be
> returned in BLKmode.
> 
> After I put V8HFmode to VALID_SSE2_REG_MODE the code would be like
> 
> vmovss  %xmm0, %xmm0, %xmm1
> vpxor   %xmm0, %xmm0, %xmm0
> pextrw  $0, %xmm1, -10(%rsp)   
> vpinsrw $0, -10(%rsp), %xmm0, %xmm0
> 
> It seems IRA spills the HF reg to memory.
> 
> I wonder whether we should move vector mode support to sse2 for now, as we
> don't have sufficient HF vector arithmetic emulation for non-avx512fp16
> target.
According to the documentation, maybe we can:
@deftypefn {Target Hook} bool TARGET_VECTOR_MODE_SUPPORTED_P (machine_mode @var{mode})
Define this to return nonzero if the port is prepared to handle
insns involving vector mode @var{mode}.  At the very least, it
must have move patterns for this mode.
@end deftypefn

[Bug target/102812] Unoptimal (and wrong) code for _Float16 insert

2021-10-20 Thread wwwhhhyyy333 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102812

--- Comment #3 from Hongyu Wang  ---
(In reply to Uroš Bizjak from comment #2)
> Please note that the code above should compile via ix86_expand_vector_set,
> similar to:
> 
> --cut here--
> typedef short v8hi __attribute__((__vector_size__(16)));
> 
> v8hi foo (short a)
> {
>   return (v8hi) {a, 0, 0, 0, 0, 0, 0, 0 };
> }
> --cut here--
> 
> that results in:
> 
> vpxor   %xmm0, %xmm0, %xmm0
> vpinsrw $0, %edi, %xmm0, %xmm0
> ret

Currently we have

if (TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode))
  return true;

in ix86_vector_mode_supported_p, so for SSE2 target V8HFmode would be returned
in BLKmode.

After I put V8HFmode to VALID_SSE2_REG_MODE the code would be like

vmovss  %xmm0, %xmm0, %xmm1
vpxor   %xmm0, %xmm0, %xmm0
pextrw  $0, %xmm1, -10(%rsp)   
vpinsrw $0, -10(%rsp), %xmm0, %xmm0

It seems IRA spills the HF reg to memory.

I wonder whether we should move vector mode support to sse2 for now, as we
don't have sufficient HF vector arithmetic emulation for non-avx512fp16 target.

[Bug target/102812] Unoptimal (and wrong) code for _Float16 insert

2021-10-20 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102812

--- Comment #2 from Uroš Bizjak  ---
Please note that the code above should compile via ix86_expand_vector_set,
similar to:

--cut here--
typedef short v8hi __attribute__((__vector_size__(16)));

v8hi foo (short a)
{
  return (v8hi) {a, 0, 0, 0, 0, 0, 0, 0 };
}
--cut here--

that results in:

vpxor   %xmm0, %xmm0, %xmm0
vpinsrw $0, %edi, %xmm0, %xmm0
ret
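
For reference, the semantics that any instruction sequence for this insert must preserve can be checked with a small sketch. It reuses the v8hi example above and relies only on the GCC/Clang vector extensions; the helpers lane0_only and self_check are hypothetical names introduced here for illustration and are not part of the bug report or of the committed test (gcc.target/i386/pr102812.c).

```c
#include <assert.h>

/* Same vector type and function as in the example above. */
typedef short v8hi __attribute__((__vector_size__(16)));

v8hi foo (short a)
{
  return (v8hi) {a, 0, 0, 0, 0, 0, 0, 0 };
}

/* Hypothetical helper: returns 1 when lane 0 holds A and lanes 1..7
   are all zero, 0 otherwise.  */
int lane0_only (v8hi v, short a)
{
  if (v[0] != a)
    return 0;
  for (int i = 1; i < 8; i++)
    if (v[i] != 0)
      return 0;
  return 1;
}

/* Hypothetical self-check: the insert must place the scalar in lane 0
   and leave every other lane cleared, whatever instructions are used.  */
void self_check (void)
{
  assert (lane0_only (foo (42), 42));
  assert (lane0_only (foo (-1), -1));
}
```

The same check should apply to a _Float16 variant by changing the element type, assuming a compiler and target where _Float16 vectors are available.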

[Bug target/102812] Unoptimal (and wrong) code for _Float16 insert

2021-10-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102812

--- Comment #1 from Hongtao.liu  ---
ix86_get_ssemov needs to be updated for V8HF/V16HF, since they can exist
under TARGET_SSE2/TARGET_AVX.