[Bug target/93069] Assembler messages: Error: unsupported masking for `vextracti32x8'

2020-05-06 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93069

Jakub Jelinek  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #10 from Jakub Jelinek  ---
Fixed now.

[Bug target/93069] Assembler messages: Error: unsupported masking for `vextracti32x8'

2020-05-06 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93069

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:319eafce3e54c8cb10e3fddce6823a6a558fca8b

commit r11-147-g319eafce3e54c8cb10e3fddce6823a6a558fca8b
Author: Jakub Jelinek 
Date:   Wed May 6 20:05:02 2020 +0200

x86: Fix vextract* masked patterns [PR93069]

The AVX512F documentation clearly states that in instructions where the
destination is a memory only merging-masking is possible, not zero-masking,
and the assembler enforces that.

The testcase in this patch fails to assemble because of
Error: unsupported masking for `vextracti32x8'
on
vextracti32x8   $0x0, %zmm1, -64(%rsp){%k1}{z}
For the vector extraction patterns, we apparently have 7 *_maskm patterns
that only accept memory destinations and rtx_equal_p merge-masking source
for it, 7 * corresponding patterns that allow memory destination
only for the non-masked cases (through ), then 2
* patterns (lo ssehalf V16FI and lo ssehalf VI8F_256 ones) which
do allow memory destination even for masked cases and are the cause of the
testsuite failure, because we must not allow C constraint if the destination
is m, and finally one pair of patterns (separate * and *_mask, hi ssehalf
VI4F_256), which has another issue (for which I don't have a testcase
though), where if it would match zero-masking with register destination,
it wouldn't emit the needed {z} into assembly.
The attached patch fixes those 3 issues only, perhaps more suitable for
backporting.
But, even with that fixed, we are missing 3 further *_maskm patterns and
more importantly, I find the split into 3 separate patterns after subst,
*_maskm for masking with memory destination, *_mask for masking with
register destination and * for non-masking unnecessarily complex and harder
for reload, so the included patch below (non-attached) instead kills all
*_maskm patterns and splits the * patterns into * and *_mask
by hand instead of subst, where the *_mask ones make sure that with v
destination they use 0C, while with m destination they use 0 and as
condition enforce that either destination is not MEM, or rtx_equal_p between
the destination and corresponding merging-masking operand source.
If we had those 3 missing *_maskm patterns, this patch would actually result
in both shorter sse.md and shorter machine description after subst (e.g.
length of tmp-mddump.md), as we don't have them, the patch is actually 16
lines longer sse.md, but still shorter tmp-mddump.md.
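
As an illustration only (not code from the patch), the insn condition described above can be modelled in plain C. This is a minimal sketch under stated assumptions: `struct operand`, `operands_equal` and `mask_insn_condition` are hypothetical names, the string comparison merely stands in for what rtx_equal_p does on real RTL, and the zero-masking source (what the C constraint is assumed to accept) is represented as an ordinary non-memory operand.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for an RTL operand.
   Real GCC operands are rtx objects; this only records what matters
   for the condition being illustrated.  */
struct operand {
    bool is_mem;        /* true for a memory operand, false otherwise */
    const char *text;   /* printable form, e.g. "%ymm0" or "-64(%rsp)" */
};

/* Stand-in for rtx_equal_p: two operands are "equal" when they have the
   same kind and the same printable form.  */
bool operands_equal(const struct operand *a, const struct operand *b)
{
    return a->is_mem == b->is_mem && strcmp(a->text, b->text) == 0;
}

/* Model of the condition described for the hand-written *_mask patterns:
   either the destination is not memory, or the destination and the
   merge-masking source operand are the same location.  A zero-masking
   source can never equal a memory destination, so zero-masking into
   memory is rejected.  */
bool mask_insn_condition(const struct operand *dest,
                         const struct operand *merge_src)
{
    return !dest->is_mem || operands_equal(dest, merge_src);
}
```

With a register destination the model accepts any masking source, including the zero operand. With a memory destination it accepts only the case where the merge source is that same memory location, which is the merge-masking store the assembler allows.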

2020-05-06  Jakub Jelinek  

PR target/93069
* config/i386/subst.md (store_mask_constraint, store_mask_predicate):
Remove.
(avx512dq_vextract64x2_1_maskm, avx512f_vextract32x4_1_maskm,
vec_extract_lo__maskm, vec_extract_hi__maskm): Remove.
(avx512dq_vextract64x2_1): Split into ...
(*avx512dq_vextract64x2_1, avx512dq_vextract64x2_1_mask): ... these new
define_insns.  Even in the masked variant allow memory output but in
that case use 0 rather than 0C constraint on the source of masked-out
elts.
(avx512f_vextract32x4_1): Split into ...
(*avx512f_vextract32x4_1, avx512f_vextract32x4_1_mask): ... these new
define_insns.  Even in the masked variant allow memory output but in
that case use 0 rather than 0C constraint on the source of masked-out
elts.
(vec_extract_lo_): Split into ...
(vec_extract_lo_, vec_extract_lo__mask): ... these new
define_insns.  Even in the masked variant allow memory output but in
that case use 0 rather than 0C constraint on the source of masked-out
elts.
(vec_extract_hi_): Split into ...
(vec_extract_hi_, vec_extract_hi__mask): ... these new
define_insns.  Even in the masked variant allow memory output but in
that case use 0 rather than 0C constraint on the source of masked-out
elts.

[Bug target/93069] Assembler messages: Error: unsupported masking for `vextracti32x8'

2020-04-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93069

--- Comment #8 from Jakub Jelinek  ---
Yes, there is a larger patch approved for GCC11, but not for GCC10.

[Bug target/93069] Assembler messages: Error: unsupported masking for `vextracti32x8'

2020-04-11 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93069

--- Comment #7 from Arseny Solokha  ---
Is there some further work pending, or can this PR be closed?

[Bug target/93069] Assembler messages: Error: unsupported masking for `vextracti32x8'

2020-04-07 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93069

--- Comment #6 from CVS Commits  ---
The releases/gcc-9 branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:57e276f3e304ef92483763ee1028e5b3e1345e0f

commit r9-8473-g57e276f3e304ef92483763ee1028e5b3e1345e0f
Author: Jakub Jelinek 
Date:   Tue Apr 7 21:00:28 2020 +0200

Fix vextract* masked patterns [PR93069]

The AVX512F documentation clearly states that in instructions where the
destination is a memory only merging-masking is possible, not zero-masking,
and the assembler enforces that.

The testcase in this patch fails to assemble because of
Error: unsupported masking for `vextracti32x8'
on
vextracti32x8   $0x0, %zmm1, -64(%rsp){%k1}{z}
For the vector extraction patterns, we apparently have 7 *_maskm patterns
that only accept memory destinations and rtx_equal_p merge-masking source
for it, 7 * corresponding patterns that allow memory destination
only for the non-masked cases (through ), then 2
* patterns (lo ssehalf V16FI and lo ssehalf VI8F_256 ones) which
do allow memory destination even for masked cases and are the cause of the
testsuite failure, because we must not allow C constraint if the destination
is m, and finally one pair of patterns (separate * and *_mask, hi ssehalf
VI4F_256), which has another issue (for which I don't have a testcase
though), where if it would match zero-masking with register destination,
it wouldn't emit the needed {z} into assembly.
The attached patch fixes those 3 issues only, perhaps more suitable for
backporting.

2020-03-30  Jakub Jelinek  

PR target/93069
* config/i386/sse.md (vec_extract_lo_): Use
 instead of m in output operand constraint.
(vec_extract_hi_): Use  instead of
%{%3%}.

* gcc.target/i386/avx512vl-pr93069.c: New test.
* gcc.dg/vect/pr93069.c: New test.

[Bug target/93069] Assembler messages: Error: unsupported masking for `vextracti32x8'

2020-03-30 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93069

--- Comment #5 from Jakub Jelinek  ---
Smaller fix applied to GCC 10, larger one queued for GCC 11.

[Bug target/93069] Assembler messages: Error: unsupported masking for `vextracti32x8'

2020-03-30 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93069

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:ec919cfcef8d7fcbaab24d0e0d472c65e5329ca6

commit r10-7457-gec919cfcef8d7fcbaab24d0e0d472c65e5329ca6
Author: Jakub Jelinek 
Date:   Mon Mar 30 17:38:21 2020 +0200

Fix vextract* masked patterns [PR93069]

The AVX512F documentation clearly states that in instructions where the
destination is a memory only merging-masking is possible, not zero-masking,
and the assembler enforces that.

The testcase in this patch fails to assemble because of
Error: unsupported masking for `vextracti32x8'
on
vextracti32x8   $0x0, %zmm1, -64(%rsp){%k1}{z}
For the vector extraction patterns, we apparently have 7 *_maskm patterns
that only accept memory destinations and rtx_equal_p merge-masking source
for it, 7 * corresponding patterns that allow memory destination
only for the non-masked cases (through ), then 2
* patterns (lo ssehalf V16FI and lo ssehalf VI8F_256 ones) which
do allow memory destination even for masked cases and are the cause of the
testsuite failure, because we must not allow C constraint if the destination
is m, and finally one pair of patterns (separate * and *_mask, hi ssehalf
VI4F_256), which has another issue (for which I don't have a testcase
though), where if it would match zero-masking with register destination,
it wouldn't emit the needed {z} into assembly.
The attached patch fixes those 3 issues only, perhaps more suitable for
backporting.

2020-03-30  Jakub Jelinek  

PR target/93069
* config/i386/sse.md (vec_extract_lo_): Use
 instead of m in output operand constraint.
(vec_extract_hi_): Use  instead of
%{%3%}.

* gcc.target/i386/avx512vl-pr93069.c: New test.
* gcc.dg/vect/pr93069.c: New test.

[Bug target/93069] Assembler messages: Error: unsupported masking for `vextracti32x8'

2019-12-28 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93069

--- Comment #3 from Jakub Jelinek  ---
Note, the above isn't the smallest possible fix, so perhaps for backporting
instead of the gcc/config/i386/ changes
--- gcc/config/i386/sse.md.jj   2019-12-27 18:16:48.146431083 +0100
+++ gcc/config/i386/sse.md  2019-12-28 16:54:09.536217497 +0100
@@ -8782,7 +8782,8 @@ (define_expand "avx_vextractf128"
 })

 (define_insn "vec_extract_lo_"
-  [(set (match_operand: 0 "nonimmediate_operand" "=v,v,m")
+  [(set (match_operand: 0 ""
+ "=v,v,")
(vec_select:
  (match_operand:V16FI 1 ""
 "v,,v")
@@ -8834,7 +8835,8 @@ (define_split
 })

 (define_insn "vec_extract_lo_"
-  [(set (match_operand: 0 "" "=v,v,m")
+  [(set (match_operand: 0 ""
+ "=v,v,")
(vec_select:
  (match_operand:VI8F_256 1 ""
"v,,v")
@@ -8844,7 +8846,7 @@ (define_insn "vec_extract_lo_ || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
   if ()
-return "vextract64x2\t{$0x0, %1, %0%{%3%}|%0%{%3%}, %1, 0x0}";
+return "vextract64x2\t{$0x0, %1, %0|%0, %1, 0x0}";
   else
 return "#";
 }
might be better, i.e. insist on _maskm patterns for masked operations into
memory and force non-memory destination for the rest of vector extractions,
plus the %{%3%} instead of  I think can't work properly if zero
masking is needed into register destination.

[Bug target/93069] Assembler messages: Error: unsupported masking for `vextracti32x8'

2019-12-28 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93069

--- Comment #2 from Jakub Jelinek  ---
Created attachment 47556
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47556&action=edit
gcc10-pr93069.patch

Untested fix.

[Bug target/93069] Assembler messages: Error: unsupported masking for `vextracti32x8'

2019-12-28 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93069

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2019-12-28
 CC||jakub at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Jakub Jelinek  ---
Indeed, if the destination is memory, {z} is not allowed.
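
For readers unfamiliar with the two masking modes, here is a rough scalar model in C of what a masked extract of the low eight 32-bit elements does, together with the encodability rule stated above. It is a sketch for illustration, not GCC or binutils code: the enum and function names (`mask_mode`, `dest_kind`, `masking_is_encodable`, `masked_extract_lo8`) are made up, and the immediate is assumed to be 0 (low half) as in the failing instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, used only for this illustration.  */
enum mask_mode { MASK_MERGE, MASK_ZERO };
enum dest_kind { DEST_REGISTER, DEST_MEMORY };

/* The rule from the report: with a memory destination only
   merge-masking can be encoded, so {z} with memory is rejected.  */
bool masking_is_encodable(enum dest_kind dest, enum mask_mode mode)
{
    return !(dest == DEST_MEMORY && mode == MASK_ZERO);
}

/* Scalar model of a masked extract of the low 8 dwords of a 16-dword
   vector.  Bit i of k selects element i: selected elements are copied
   from src; unselected elements are either left untouched
   (merge-masking) or cleared (zero-masking).  */
void masked_extract_lo8(uint32_t dst[8], const uint32_t src[16],
                        uint8_t k, enum mask_mode mode)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < 8; i++) {
        if (k & (1u << i))
            dst[i] = src[i];      /* element selected by the mask */
        else if (mode == MASK_ZERO)
            dst[i] = 0;           /* zero-masking clears the element */
        /* MASK_MERGE: previous contents of dst[i] are preserved */
    }
}
```

In this model the failing `vextracti32x8 $0x0, %zmm1, -64(%rsp){%k1}{z}` corresponds to `DEST_MEMORY` with `MASK_ZERO`, the one combination `masking_is_encodable` rejects.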