[Bug target/104978] [avx512fp16] wrong code for _mm_mask_fcmadd_round_sch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978

Hongtao.liu changed:

           What    |Removed     |Added
----------------------------------------
         Resolution|---         |FIXED
             Status|UNCONFIRMED |RESOLVED

--- Comment #6 from Hongtao.liu ---
.
--- Comment #5 from Hongyu Wang ---
Fixed for GCC 12.
--- Comment #4 from CVS Commits ---
The master branch has been updated by Hongyu Wang:

https://gcc.gnu.org/g:7bce0be03b857eefe5990c3ef0af06ea8f8ae04e

commit r12-7747-g7bce0be03b857eefe5990c3ef0af06ea8f8ae04e
Author: Hongyu Wang
Date:   Sat Mar 19 01:16:29 2022 +0800

    AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

    For complex scalar intrinsics like _mm_mask_fcmadd_sch, the mask
    should be ANDed with 1 to ensure that it is bound to the lowest bit.
    Alternatively, use a masked vmovss to perform the same operation,
    which ignores the higher bits of the mask.

    gcc/ChangeLog:

            PR target/104978
            * config/i386/sse.md (avx512fp16_fmaddcsh_v8hf_mask1
--- Comment #3 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #2)
> (In reply to Hongtao.liu from comment #0)
> > #include <immintrin.h>
> > __m128h
> > foo (__m128h a, __m128h b, __m128h c, __mmask8 m)
> > {
> >   return _mm_mask_fcmadd_round_sch (a, m, b, c, 8);
> > }
> >
> > _Z3fooDv8_DF16_S_S_h:
> >         kmovd   k1, edi
> >         vfcmaddcsh      xmm2{k1}, xmm0, xmm1, {rn-sae}
> >         vmovaps xmm0{k1}, xmm2
> >         ret
> >
> > k1 must be ANDed with 1 before vmovaps xmm0{k1}, xmm2.
>
> Or just use vmovaps xmm0, xmm2, since vfcmaddcsh will copy the upper bits
> [32:128] from src1 (xmm0) here.

No, the intrinsic guide specifies merge-masking with writemask k (elements are
copied from a when mask bit 0 is not set).
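Comment #3 pins down the intended semantics: only mask bit 0 may select the computed complex scalar, and every other lane must come from a. A minimal reference model in plain C can make the off-by-mask bug concrete. This is a sketch under my own assumptions, not GCC's code: the helper names, the float-based lane model, and the a*conj(b)+c reading of vfcmaddcsh are illustrative.

```c
/* Hypothetical reference model for the merge-masking of
   _mm_mask_fcmadd_sch.  The 8 FP16 lanes of __m128h are modeled as 8
   floats; lanes 0..1 hold the scalar complex value (real, imag). */
#include <assert.h>

typedef struct { float v[8]; } vec8;

/* Complex conjugate multiply-add on lane pair 0/1:
   out = a * conj(b) + c  (one plausible reading of vfcmaddcsh). */
static void cmul_conj_add(const float a[2], const float b[2],
                          const float c[2], float out[2])
{
    out[0] = a[0] * b[0] + a[1] * b[1] + c[0]; /* real part */
    out[1] = a[1] * b[0] - a[0] * b[1] + c[1]; /* imaginary part */
}

/* Reference semantics: if (m & 1), take the computed complex scalar,
   otherwise keep a's lanes 0..1; lanes 2..7 are always copied from a.
   The bug in PR 104978: the generated vmovaps used the full mask k1
   instead of k1 & 1, so stray set bits above bit 0 wrongly merged
   lanes of the temporary result into the output. */
static vec8 fcmadd_sch_ref(vec8 a, unsigned char m, vec8 b, vec8 c)
{
    vec8 dst = a;              /* all lanes default to a */
    if (m & 1)                 /* only mask bit 0 matters */
        cmul_conj_add(a.v, b.v, c.v, dst.v);
    return dst;
}
```

With m = 2 (bit 0 clear but bit 1 set), the correct result is exactly a; the buggy `vmovaps xmm0{k1}, xmm2` would instead merge lane 1 from the temporary result, because it consulted the whole mask rather than k1 & 1.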
--- Comment #2 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #0)
> #include <immintrin.h>
> __m128h
> foo (__m128h a, __m128h b, __m128h c, __mmask8 m)
> {
>   return _mm_mask_fcmadd_round_sch (a, m, b, c, 8);
> }
>
> _Z3fooDv8_DF16_S_S_h:
>         kmovd   k1, edi
>         vfcmaddcsh      xmm2{k1}, xmm0, xmm1, {rn-sae}
>         vmovaps xmm0{k1}, xmm2
>         ret
>
> k1 must be ANDed with 1 before vmovaps xmm0{k1}, xmm2.

Or just use vmovaps xmm0, xmm2, since vfcmaddcsh will copy the upper bits
[32:128] from src1 (xmm0) here.
--- Comment #1 from Hongtao.liu ---
Similar for _mm_mask_fmadd_round_sch.