[Bug target/104978] [avx512fp16] wrong code for _mm_mask_fcmadd_round_sch

2022-03-22 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978

Hongtao.liu changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #6 from Hongtao.liu ---
.

[Bug target/104978] [avx512fp16] wrong code for _mm_mask_fcmadd_round_sch

2022-03-21 Thread wwwhhhyyy333 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978

--- Comment #5 from Hongyu Wang ---
Fixed for GCC 12.

[Bug target/104978] [avx512fp16] wrong code for _mm_mask_fcmadd_round_sch

2022-03-21 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978

--- Comment #4 from CVS Commits ---
The master branch has been updated by Hongyu Wang:

https://gcc.gnu.org/g:7bce0be03b857eefe5990c3ef0af06ea8f8ae04e

commit r12-7747-g7bce0be03b857eefe5990c3ef0af06ea8f8ae04e
Author: Hongyu Wang 
Date:   Sat Mar 19 01:16:29 2022 +0800

AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

For complex scalar intrinsics like _mm_mask_fcmadd_sch, the
mask should be ANDed with 1 to ensure that only its lowest bit
selects the result. Use a masked vmovss to perform the same
operation, since it ignores the higher bits of the mask.
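
(In C terms, the fix amounts to something like the sketch below. This is
my reading of the commit message, not the actual sse.md change;
_mm_mask_move_ss is the intrinsic form of the masked vmovss, and it
inherently honours only bit 0 of the mask.)

#include <immintrin.h>

/* Sketch of the corrected final blend (compile with -mavx512fp16);
   `r' stands for the unmasked vfcmaddcsh result. */
static inline __m128h
blend_sch_result (__m128h a, __mmask8 m, __m128h r)
{
  /* Masked vmovss: dst[31:0] = m[0] ? r[31:0] : a[31:0] and
     dst[127:32] = a[127:32]; higher mask bits are ignored, unlike
     the buggy vmovaps xmm0{k1}, xmm2.  */
  return _mm_castps_ph (_mm_mask_move_ss (_mm_castph_ps (a), m,
                                          _mm_castph_ps (a),
                                          _mm_castph_ps (r)));
}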

gcc/ChangeLog:

PR target/104978
* config/i386/sse.md
(avx512fp16_fmaddcsh_v8hf_mask1

[Bug target/104978] [avx512fp16] wrong code for _mm_mask_fcmadd_round_sch

2022-03-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978

--- Comment #3 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #2)
> (In reply to Hongtao.liu from comment #0)
> > #include <immintrin.h>
> > __m128h
> > foo (__m128h a, __m128h b, __m128h c, __mmask8 m)
> > { 
> > return _mm_mask_fcmadd_round_sch (a, m, b, c, 8);
> > }
> > 
> > 
> > _Z3fooDv8_DF16_S_S_h:
> > kmovd   k1, edi
> > vfcmaddcsh  xmm2{k1}, xmm0, xmm1, {rn-sae}
> > vmovaps xmm0{k1}, xmm2
> > ret
> > 
> > k1 must be ANDed with 1 before vmovaps xmm0{k1}, xmm2.
> 
> Or just vmovaps xmm0, xmm2, since vfcmaddcsh copies the upper bits [127:32]
> from src1 (xmm0) here.

No, the intrinsics guide says it uses writemask k (elements are copied from a
when mask bit 0 is not set).
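
To make that concrete, here is a reference model of the documented masking
(my sketch from the discussion in this comment, assuming -mavx512fp16; the
unmasked intrinsic supplies the complex arithmetic). When mask bit 0 is
clear, the low dword must come from a, not from c or the computed value,
which is why dropping the mask entirely is not enough:

#include <immintrin.h>
#include <string.h>

static __m128h
mask_fcmadd_sch_ref (__m128h a, __mmask8 m, __m128h b, __m128h c)
{
  /* Unmasked complex multiply-add, rounding immediate as in the
     testcase above.  */
  __m128h r = _mm_fcmadd_round_sch (a, b, c, 8);
  if (!(m & 1))            /* the "& 1" this PR is about */
    return a;              /* elements are copied from a */
  /* Keep a's bits [127:32], take only the computed low dword.  */
  memcpy (&a, &r, 4);
  return a;
}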

[Bug target/104978] [avx512fp16] wrong code for _mm_mask_fcmadd_round_sch

2022-03-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978

--- Comment #2 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #0)
> #include <immintrin.h>
> __m128h
> foo (__m128h a, __m128h b, __m128h c, __mmask8 m)
> { 
> return _mm_mask_fcmadd_round_sch (a, m, b, c, 8);
> }
> 
> 
> _Z3fooDv8_DF16_S_S_h:
> kmovd   k1, edi
> vfcmaddcsh  xmm2{k1}, xmm0, xmm1, {rn-sae}
> vmovaps xmm0{k1}, xmm2
> ret
> 
> k1 must be ANDed with 1 before vmovaps xmm0{k1}, xmm2.

Or just vmovaps xmm0, xmm2, since vfcmaddcsh copies the upper bits [127:32]
from src1 (xmm0) here.

[Bug target/104978] [avx512fp16] wrong code for _mm_mask_fcmadd_round_sch

2022-03-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978

--- Comment #1 from Hongtao.liu ---
The same issue exists for _mm_mask_fmadd_round_sch.
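
For reference, a quick runtime self-check along these lines (my sketch, not
the testcase from the PR; build with -mavx512fp16 -O2 and run on AVX512FP16
hardware) exercises the masking for the fcmadd variant; the fmadd variant
can be checked the same way:

#include <immintrin.h>
#include <string.h>
#include <stdio.h>

__attribute__((noinline)) static __m128h
foo (__m128h a, __m128h b, __m128h c, __mmask8 m)
{
  return _mm_mask_fcmadd_round_sch (a, m, b, c, 8);
}

int
main (void)
{
  __m128h a = _mm_set1_ph (1.0);
  __m128h b = _mm_set1_ph (2.0);
  __m128h c = _mm_set1_ph (3.0);
  /* Mask bit 0 clear, stray higher bits set: the documented result
     is `a' in every element, so any difference means the high mask
     bits leaked into the final blend.  */
  __m128h r = foo (a, b, c, 0xfe);
  puts (memcmp (&r, &a, sizeof (r)) == 0 ? "PASS" : "FAIL");
  return 0;
}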