Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector

2021-01-21 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 06, 2021 at 11:34:32AM +0800, Hongtao Liu via Gcc-patches wrote:
> > >>
> > >> Note there's a data dependency between them.  insn 7 feeds insn 9.  When
> > >> there's a data dependency, combiner patterns are usually the better
> > >> choice than peepholes.  I think you'd be looking to match something
> > >> likethis (from the . combine dump):
> > >>
> 
> Using combiner patterns, details is discussed in PR98348
> 
> Boottrapped and regtested on x86_64-linux-gnu{-m32,} for both GCC10 and trunk.
> gcc/ChangeLog:
> 
> PR target/96891
> PR target/98348
> * config/i386/sse.md (VI_128_256): New mode iterator.
> (*avx_cmp3_1, *avx_cmp3_2, *avx_cmp3_3,
>  *avx_cmp3_4, *avx2_eq3, *avx2_pcmp3_1,
>  *avx2_pcmp3_2, *avx2_gt3): New
> define_insn_and_split to lower avx512 vector comparison to avx
> version when dest is vector.
> (*_cmp3,*_cmp3,*_ucmp3):
> define_insn_and_split for negating the comparison result.
> * config/i386/predicates.md (float_vector_all_ones_operand):
> New predicate.
> * config/i386/i386-expand.c (ix86_expand_sse_movcc): Use
> general NOT operator without UNSPEC_MASKOP.
> 
> gcc/testsuite/ChangeLog:
> 
> PR target/96891
> PR target/98348
> * gcc.target/i386/avx512bw-pr96891-1.c: New test.
> * gcc.target/i386/avx512f-pr96891-1.c: New test.
> * gcc.target/i386/avx512f-pr96891-2.c: New test.
> * gcc.target/i386/avx512f-pr96891-3.c: New test.
> * g++.target/i386/avx512f-pr96891-1.C: New test.
> * gcc.target/i386/bitwise_mask_op-3.c: Adjust testcase.

Ok for trunk.  I'd prefer not to backport it to GCC 10.

Jakub



Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector

2021-01-05 Thread Hongtao Liu via Gcc-patches
> >>
> >> Note there's a data dependency between them.  insn 7 feeds insn 9.  When
> >> there's a data dependency, combiner patterns are usually the better
> >> choice than peepholes.  I think you'd be looking to match something
> >> likethis (from the . combine dump):
> >>

Using combiner patterns, details is discussed in PR98348

Boottrapped and regtested on x86_64-linux-gnu{-m32,} for both GCC10 and trunk.
gcc/ChangeLog:

PR target/96891
PR target/98348
* config/i386/sse.md (VI_128_256): New mode iterator.
(*avx_cmp3_1, *avx_cmp3_2, *avx_cmp3_3,
 *avx_cmp3_4, *avx2_eq3, *avx2_pcmp3_1,
 *avx2_pcmp3_2, *avx2_gt3): New
define_insn_and_split to lower avx512 vector comparison to avx
version when dest is vector.
(*_cmp3,*_cmp3,*_ucmp3):
define_insn_and_split for negating the comparison result.
* config/i386/predicates.md (float_vector_all_ones_operand):
New predicate.
* config/i386/i386-expand.c (ix86_expand_sse_movcc): Use
general NOT operator without UNSPEC_MASKOP.

gcc/testsuite/ChangeLog:

PR target/96891
PR target/98348
* gcc.target/i386/avx512bw-pr96891-1.c: New test.
* gcc.target/i386/avx512f-pr96891-1.c: New test.
* gcc.target/i386/avx512f-pr96891-2.c: New test.
* gcc.target/i386/avx512f-pr96891-3.c: New test.
* g++.target/i386/avx512f-pr96891-1.C: New test.
* gcc.target/i386/bitwise_mask_op-3.c: Adjust testcase.

>
> Jeff
>



--
BR,
Hongtao
From 240c830b3d35f7571da876a21aa71e263c3abe80 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Fri, 18 Dec 2020 15:56:06 +0800
Subject: [PATCH] Lower AVX512 vector comparison to AVX version when dest is
 vector.

gcc/ChangeLog:

	PR target/96891
	PR target/98348
	* config/i386/sse.md (VI_128_256): New mode iterator.
	(*avx_cmp3_1, *avx_cmp3_2, *avx_cmp3_3,
	 *avx_cmp3_4, *avx2_eq3, *avx2_pcmp3_1,
	 *avx2_pcmp3_2, *avx2_gt3): New
	define_insn_and_split to lower avx512 vector comparison to avx
	version when dest is vector.
	(*_cmp3,*_cmp3,*_ucmp3):
	define_insn_and_split for negating the comparison result.
	* config/i386/predicates.md (float_vector_all_ones_operand):
	New predicate.
	* config/i386/i386-expand.c (ix86_expand_sse_movcc): Use
	general NOT operator without UNSPEC_MASKOP.

gcc/testsuite/ChangeLog:

	PR target/96891
	PR target/98348
	* gcc.target/i386/avx512bw-pr96891-1.c: New test.
	* gcc.target/i386/avx512f-pr96891-1.c: New test.
	* gcc.target/i386/avx512f-pr96891-2.c: New test.
	* gcc.target/i386/avx512f-pr96891-3.c: New test.
	* g++.target/i386/avx512f-pr96891-1.C: New test.
	* gcc.target/i386/bitwise_mask_op-3.c: Adjust testcase.
---
 gcc/config/i386/i386-expand.c |  14 +-
 gcc/config/i386/predicates.md |  47 
 gcc/config/i386/sse.md| 261 +-
 .../g++.target/i386/avx512f-pr96891-1.C   |  37 +++
 .../gcc.target/i386/avx512bw-pr96891-1.c  |  75 +
 .../gcc.target/i386/avx512f-pr96891-1.c   |  40 +++
 .../gcc.target/i386/avx512f-pr96891-2.c   |  30 ++
 .../gcc.target/i386/avx512f-pr96891-3.c   |  39 +++
 .../gcc.target/i386/bitwise_mask_op-3.c   |   1 -
 9 files changed, 531 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/avx512f-pr96891-1.C
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-3.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 6e08fd32726..b4f8b275718 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -3568,17 +3568,11 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
 		  ? force_reg (mode, op_false) : op_false);
   if (op_true == CONST0_RTX (mode))
 	{
-	  rtx (*gen_not) (rtx, rtx);
-	  switch (cmpmode)
-	{
-	case E_QImode: gen_not = gen_knotqi; break;
-	case E_HImode: gen_not = gen_knothi; break;
-	case E_SImode: gen_not = gen_knotsi; break;
-	case E_DImode: gen_not = gen_knotdi; break;
-	default: gcc_unreachable ();
-	}
 	  rtx n = gen_reg_rtx (cmpmode);
-	  emit_insn (gen_not (n, cmp));
+	  if (cmpmode == E_DImode && !TARGET_64BIT)
+	emit_insn (gen_knotdi (n, cmp));
+	  else
+	emit_insn (gen_rtx_SET (n, gen_rtx_fmt_e (NOT, cmpmode, cmp)));
 	  cmp = n;
 	  /* Reverse op_true op_false.  */
 	  std::swap (op_true, op_false);
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index be5aaa4d76f..0bb0729e933 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1069,6 +1069,53 @@ (define_predicate "zero_extended_scalar_load_operand"
   return true;
 })
 
+/* Return true if operand is a float vector constant that is all ones. */

Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector

2020-11-30 Thread Jeff Law via Gcc-patches



On 11/16/20 8:10 PM, Hongtao Liu wrote:
> On Tue, Nov 17, 2020 at 8:05 AM Jeff Law  wrote:
>>
>> On 9/2/20 3:34 AM, Hongtao Liu via Gcc-patches wrote:
>>> Hi:
>>>   Add define_peephole2 to eliminate potential redundant conversion
>>> from mask to vector.
>>>   Bootstrap is ok, regression test is ok for i386/x86-64 backend.
>>>   Ok for trunk?
>>>
>>> gcc/ChangeLog:
>>> PR target/96891
>>> * config/i386/sse.md (VI_128_256): New mode iterator.
>>> (define_peephole2): Lower avx512 vector compare to avx version
>>> when dest is vector.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/i386/avx512bw-pr96891-1.c: New test.
>>> * gcc.target/i386/avx512f-pr96891-1.c: New test.
>>> * gcc.target/i386/avx512f-pr96891-2.c: New test.
>> Aren't these the two insns in question:
>>
>>
>> (insn 7 4 8 2 (set (reg:QI 86)
>> (unspec:QI [
>> (reg:V8SF 90)
>> (reg:V8SF 89)
>> (const_int 2 [0x2])
>> ] UNSPEC_PCMP)) "j.c":4:14 1911 {avx512vl_cmpv8sf3}
>>  (expr_list:REG_DEAD (reg:V8SF 90)
>> (expr_list:REG_DEAD (reg:V8SF 89)
>> (nil
>> (note 8 7 9 2 NOTE_INSN_DELETED)
>> (insn 9 8 14 2 (set (reg:V8SI 82 [ _2 ])
>> (vec_merge:V8SI (const_vector:V8SI [
>> (const_int -1 [0x]) repeated x8
>> ])
>> (const_vector:V8SI [
>> (const_int 0 [0]) repeated x8
>> ])
>> (reg:QI 86))) "j.c":4:14 2705 {*avx512vl_cvtmask2dv8si}
>>  (expr_list:REG_DEAD (reg:QI 86)
>> (nil)))
>>
>>
>> Note there's a data dependency between them.  insn 7 feeds insn 9.  When
>> there's a data dependency, combiner patterns are usually the better
>> choice than peepholes.  I think you'd be looking to match something
>> likethis (from the . combine dump):
>>
>> (set (reg:V8SI 82 [ _2 ])
>> (vec_merge:V8SI (const_vector:V8SI [
>> (const_int -1 [0x]) repeated x8
>> ])
>> (const_vector:V8SI [
>> (const_int 0 [0]) repeated x8
>> ])
>> (unspec:QI [
>> (reg:V8SF 90)
>> (reg:V8SF 89)
>> (const_int 2 [0x2])
>> ] UNSPEC_PCMP)))
>>
>>
>> Jeff
>>
> Yes, as discussed in [1], maybe it's better to refactor avx512 integer
> mask with VnBImode. Then unspec_pcmp could be dropped and simplify_rtx
> could handle vector comparison more effectively.
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521#c4
Thanks for the pointer.   I didn't realize this patch was essentially
abandoned.

Jeff



Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector

2020-11-16 Thread Hongtao Liu via Gcc-patches
On Tue, Nov 17, 2020 at 8:05 AM Jeff Law  wrote:
>
>
> On 9/2/20 3:34 AM, Hongtao Liu via Gcc-patches wrote:
> > Hi:
> >   Add define_peephole2 to eliminate potential redundant conversion
> > from mask to vector.
> >   Bootstrap is ok, regression test is ok for i386/x86-64 backend.
> >   Ok for trunk?
> >
> > gcc/ChangeLog:
> > PR target/96891
> > * config/i386/sse.md (VI_128_256): New mode iterator.
> > (define_peephole2): Lower avx512 vector compare to avx version
> > when dest is vector.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/avx512bw-pr96891-1.c: New test.
> > * gcc.target/i386/avx512f-pr96891-1.c: New test.
> > * gcc.target/i386/avx512f-pr96891-2.c: New test.
>
> Aren't these the two insns in question:
>
>
> (insn 7 4 8 2 (set (reg:QI 86)
> (unspec:QI [
> (reg:V8SF 90)
> (reg:V8SF 89)
> (const_int 2 [0x2])
> ] UNSPEC_PCMP)) "j.c":4:14 1911 {avx512vl_cmpv8sf3}
>  (expr_list:REG_DEAD (reg:V8SF 90)
> (expr_list:REG_DEAD (reg:V8SF 89)
> (nil
> (note 8 7 9 2 NOTE_INSN_DELETED)
> (insn 9 8 14 2 (set (reg:V8SI 82 [ _2 ])
> (vec_merge:V8SI (const_vector:V8SI [
> (const_int -1 [0x]) repeated x8
> ])
> (const_vector:V8SI [
> (const_int 0 [0]) repeated x8
> ])
> (reg:QI 86))) "j.c":4:14 2705 {*avx512vl_cvtmask2dv8si}
>  (expr_list:REG_DEAD (reg:QI 86)
> (nil)))
>
>
> Note there's a data dependency between them.  insn 7 feeds insn 9.  When
> there's a data dependency, combiner patterns are usually the better
> choice than peepholes.  I think you'd be looking to match something
> likethis (from the . combine dump):
>
> (set (reg:V8SI 82 [ _2 ])
> (vec_merge:V8SI (const_vector:V8SI [
> (const_int -1 [0x]) repeated x8
> ])
> (const_vector:V8SI [
> (const_int 0 [0]) repeated x8
> ])
> (unspec:QI [
> (reg:V8SF 90)
> (reg:V8SF 89)
> (const_int 2 [0x2])
> ] UNSPEC_PCMP)))
>
>
> Jeff
>

Yes, as discussed in [1], maybe it's better to refactor avx512 integer
mask with VnBImode. Then unspec_pcmp could be dropped and simplify_rtx
could handle vector comparison more effectively.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521#c4
-- 
BR,
Hongtao


Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector

2020-11-16 Thread Jeff Law via Gcc-patches


On 9/2/20 3:34 AM, Hongtao Liu via Gcc-patches wrote:
> Hi:
>   Add define_peephole2 to eliminate potential redundant conversion
> from mask to vector.
>   Bootstrap is ok, regression test is ok for i386/x86-64 backend.
>   Ok for trunk?
>
> gcc/ChangeLog:
> PR target/96891
> * config/i386/sse.md (VI_128_256): New mode iterator.
> (define_peephole2): Lower avx512 vector compare to avx version
> when dest is vector.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512bw-pr96891-1.c: New test.
> * gcc.target/i386/avx512f-pr96891-1.c: New test.
> * gcc.target/i386/avx512f-pr96891-2.c: New test.

Aren't these the two insns in question:


(insn 7 4 8 2 (set (reg:QI 86)
    (unspec:QI [
    (reg:V8SF 90)
    (reg:V8SF 89)
    (const_int 2 [0x2])
    ] UNSPEC_PCMP)) "j.c":4:14 1911 {avx512vl_cmpv8sf3}
 (expr_list:REG_DEAD (reg:V8SF 90)
    (expr_list:REG_DEAD (reg:V8SF 89)
    (nil
(note 8 7 9 2 NOTE_INSN_DELETED)
(insn 9 8 14 2 (set (reg:V8SI 82 [ _2 ])
    (vec_merge:V8SI (const_vector:V8SI [
    (const_int -1 [0x]) repeated x8
    ])
    (const_vector:V8SI [
    (const_int 0 [0]) repeated x8
    ])
    (reg:QI 86))) "j.c":4:14 2705 {*avx512vl_cvtmask2dv8si}
 (expr_list:REG_DEAD (reg:QI 86)
    (nil)))


Note there's a data dependency between them.  insn 7 feeds insn 9.  When
there's a data dependency, combiner patterns are usually the better
choice than peepholes.  I think you'd be looking to match something
likethis (from the . combine dump):

(set (reg:V8SI 82 [ _2 ])
    (vec_merge:V8SI (const_vector:V8SI [
    (const_int -1 [0x]) repeated x8
    ])
    (const_vector:V8SI [
    (const_int 0 [0]) repeated x8
    ])
    (unspec:QI [
    (reg:V8SF 90)
    (reg:V8SF 89)
    (const_int 2 [0x2])
    ] UNSPEC_PCMP)))


Jeff



Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector

2020-09-02 Thread H.J. Lu via Gcc-patches
On Wed, Sep 2, 2020 at 2:33 AM Hongtao Liu via Gcc-patches
 wrote:
>
> Hi:
>   Add define_peephole2 to eliminate potential redundant conversion
> from mask to vector.
>   Bootstrap is ok, regression test is ok for i386/x86-64 backend.
>   Ok for trunk?
>
> gcc/ChangeLog:
> PR target/96891
> * config/i386/sse.md (VI_128_256): New mode iterator.
> (define_peephole2): Lower avx512 vector compare to avx version
> when dest is vector.
>
> gcc/testsuite/ChangeLog:

Missing PR target/96891

> * gcc.target/i386/avx512bw-pr96891-1.c: New test.
> * gcc.target/i386/avx512f-pr96891-1.c: New test.
> * gcc.target/i386/avx512f-pr96891-2.c: New test.
>
> --
> BR,
> Hongtao



-- 
H.J.


[PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector

2020-09-02 Thread Hongtao Liu via Gcc-patches
Hi:
  Add define_peephole2 to eliminate potential redundant conversion
from mask to vector.
  Bootstrap is ok, regression test is ok for i386/x86-64 backend.
  Ok for trunk?

gcc/ChangeLog:
PR target/96891
* config/i386/sse.md (VI_128_256): New mode iterator.
(define_peephole2): Lower avx512 vector compare to avx version
when dest is vector.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bw-pr96891-1.c: New test.
* gcc.target/i386/avx512f-pr96891-1.c: New test.
* gcc.target/i386/avx512f-pr96891-2.c: New test.

-- 
BR,
Hongtao
From ba76432c08f47e4ecc1f355c0dfdea8908aaf9f4 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Wed, 2 Sep 2020 17:14:39 +0800
Subject: [PATCH] Lower AVX512 vector compare to AVX version when dest is
 vector.

gcc/ChangeLog:
	PR target/96891
	* config/i386/sse.md (VI_128_256): New mode iterator.
	(define_peephole2): Lower avx512 vector compare to avx version
	when dest is vector.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bw-pr96891-1.c: New test.
	* gcc.target/i386/avx512f-pr96891-1.c: New test.
	* gcc.target/i386/avx512f-pr96891-2.c: New test.
---
 gcc/config/i386/sse.md| 93 +++
 .../gcc.target/i386/avx512bw-pr96891-1.c  | 36 +++
 .../gcc.target/i386/avx512f-pr96891-1.c   | 40 
 .../gcc.target/i386/avx512f-pr96891-2.c   | 30 ++
 4 files changed, 199 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8250325e1a3..31e0dc2a600 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -629,6 +629,9 @@ (define_mode_iterator VI_128 [V16QI V8HI V4SI V2DI])
 ;; All 256bit vector integer modes
 (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI])
 
+;; All 128 and 256bit vector integer modes
+(define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI])
+
 ;; Various 128bit vector integer mode combinations
 (define_mode_iterator VI12_128 [V16QI V8HI])
 (define_mode_iterator VI14_128 [V16QI V4SI])
@@ -6703,6 +6706,96 @@ (define_insn "*_cvtmask2"
(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
+/* Lower avx512 parallel floating compare to avx compare when dst is vector.  */
+(define_peephole2
+  [(set (match_operand: 0 "register_operand")
+	(unspec:
+	  [(match_operand:VF_128_256 1 "register_operand")
+	   (match_operand:VF_128_256 2 "nonimmediate_operand")
+	   (match_operand:SI 3 "const_0_to_31_operand")]
+	  UNSPEC_PCMP))
+   (set (match_operand: 4 "register_operand")
+	(vec_merge:
+	  (match_operand: 5 "vector_all_ones_operand")
+	  (match_operand: 6 "const0_operand")
+	  (match_dup 0)))]
+  "!EXT_REX_SSE_REGNO_P (REGNO (operands[4]))
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
+  && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2])))
+  && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 7)
+	(unspec:VF_128_256
+	  [(match_dup 1)
+	   (match_dup 2)
+	   (match_dup 3)] UNSPEC_PCMP))]
+  "operands[7] = gen_rtx_REG (mode, REGNO (operands[4]));")
+
+/* Lower avx512 parallel integral compare to avx compare when dst is vector.  */
+(define_peephole2
+  [(set (match_operand: 0 "register_operand")
+	(unspec:
+	  [(match_operand:VI_128_256 1 "register_operand")
+	   (match_operand:VI_128_256 2 "nonimmediate_operand")]
+	  UNSPEC_MASKED_EQ))
+   (set (match_operand:VI_128_256 4 "register_operand")
+	(vec_merge:VI_128_256
+	  (match_operand:VI_128_256 5 "vector_all_ones_operand")
+	  (match_operand:VI_128_256 6 "const0_operand")
+	  (match_dup 0)))]
+  "!EXT_REX_SSE_REGNO_P (REGNO (operands[4]))
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
+  && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2])))
+  && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 4)
+  	(eq:VI_128_256
+	  (match_dup 1)
+	  (match_dup 2)))])
+
+(define_peephole2
+  [(set (match_operand: 0 "register_operand")
+	(unspec:
+	  [(match_operand:VI_128_256 1 "register_operand")
+	   (match_operand:VI_128_256 2 "nonimmediate_operand")]
+	  UNSPEC_MASKED_GT))
+   (set (match_operand:VI_128_256 4 "register_operand")
+	(vec_merge:VI_128_256
+	  (match_operand:VI_128_256 5 "vector_all_ones_operand")
+	  (match_operand:VI_128_256 6 "const0_operand")
+	  (match_dup 0)))]
+  "!EXT_REX_SSE_REGNO_P (REGNO (operands[4]))
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
+  && !(REG_P (operands[2]) && EXT_REX_SSE_REGN