Ping: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-06-29 Thread Xionghu Luo via Gcc-patches

Gentle ping, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html


On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote:

Hi,

On 2021/5/13 18:49, Segher Boessenkool wrote:

Hi!

On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote:

The vsel instruction is a bit-wise select instruction.  Using an
IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code
being generated in the combine pass.  Per element selection is a
subset of per bit-wise selection,with the patch the pattern is
written using bit operations.  But there are 8 different patterns
to define "op0 := (op1 & ~op3) | (op2 & op3)":

(~op3) | (op3),
(~op3) | (op2),
(op3) | (~op3),
(op2) | (~op3),
(op1&~op3) | (op3),
(op1&~op3) | (op2),
(op3) | (op1&~op3),
(op2) | (op1&~op3),

Combine pass will swap (op1&~op3) to (~op3) due to commutative
canonical, which could reduce it to the FIRST 4 patterns, but it won't
swap (op2) | (~op3) to (~op3) | (op2), so this patch
handles it with two patterns with different NOT op3 position and check
equality inside it.


Yup, that latter case does not have canonicalisation rules.  Btw, not
only combine does this canonicalisation: everything should,
non-canonical RTL is invalid RTL (in the instruction stream, you can do
everything in temporary code of course, as long as the RTL isn't
malformed).


-(define_insn "*altivec_vsel"
+(define_insn "altivec_vsel"
    [(set (match_operand:VM 0 "altivec_register_operand" "=v")
-    (if_then_else:VM
- (ne:CC (match_operand:VM 1 "altivec_register_operand" "v")
-    (match_operand:VM 4 "zero_constant" ""))
- (match_operand:VM 2 "altivec_register_operand" "v")
- (match_operand:VM 3 "altivec_register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (mode)"
-  "vsel %0,%3,%2,%1"
+    (ior:VM
+ (and:VM
+  (not:VM (match_operand:VM 3 "altivec_register_operand" "v"))
+  (match_operand:VM 1 "altivec_register_operand" "v"))
+ (and:VM
+  (match_operand:VM 2 "altivec_register_operand" "v")
+  (match_operand:VM 4 "altivec_register_operand" "v"]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
+  && (rtx_equal_p (operands[2], operands[3])
+  || rtx_equal_p (operands[4], operands[3]))"
+  {
+    if (rtx_equal_p (operands[2], operands[3]))
+  return "vsel %0,%1,%4,%3";
+    else
+  return "vsel %0,%1,%2,%3";
+  }
    [(set_attr "type" "vecmove")])


That rtx_equal_p stuff is nice and tricky, but it is a bit too tricky I
think.  So please write this as two patterns (and keep the expand if
that helps).


I was a bit concerned that there would be a lot of duplicate code if we
write two patterns for each vsel, totally 4 similar patterns in
altivec.md and another 4 in vsx.md make it difficult to maintain, however
I updated it since you prefer this way, as you pointed out the xxsel in
vsx.md could be folded by later patch.




+(define_insn "altivec_vsel2"


(same here of course).


  ;; Fused multiply add.
diff --git a/gcc/config/rs6000/rs6000-call.c 
b/gcc/config/rs6000/rs6000-call.c

index f5676255387..d65bdc01055 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -3362,11 +3362,11 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
  RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 
RS6000_BTI_unsigned_V2DI },

    { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI,
  RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 
RS6000_BTI_V2DI },

-  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI,
+  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI_UNS,


Are the _uns things still used for anything?  But, let's not change
this until Bill's stuff is in :-)

Why do you want to change this here, btw?  I don't understand.


OK, they are actually "unsigned type" overload builtin functions, change
it or not so far won't cause functionality issue, I will revert this change
in the updated patch.




+  if (target == 0
+  || GET_MODE (target) != tmode
+  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))


No space after ! and other unary operators (except for casts and other
operators you write with alphanumerics, like "sizeof").  I know you
copied this code, but :-)


OK, thanks.



@@ -15608,8 +15606,6 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx 
op_true, rtx op_false,

  case GEU:
  case LTU:
  case LEU:
-  /* Mark unsigned tests with CCUNSmode.  */
-  cc_mode = CCUNSmode;
    /* Invert condition to avoid compound test if necessary.  */
    if (rcode == GEU || rcode == LEU)


So this is related to the _uns thing.  Could you split off that change?
Probably as an earlier patch (but either works for me).


Not related to the ALTIVEC_BUILTIN_VSEL_2DI_UNS things, previously cc_mode
is a parameter to generate the condition for IF_THEN_ELSE instruction, now
we don't need it again as we use IOR (AND... AND...) style, remove it to 
avoid

build error.


-  cond2 = gen_rtx_fmt_ee (NE, cc_mode, gen_lowpart (dest_mode, mask),

Ping: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-06-06 Thread Xionghu Luo via Gcc-patches

Gentle ping, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html


On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote:

Hi,

On 2021/5/13 18:49, Segher Boessenkool wrote:

Hi!

On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote:

The vsel instruction is a bit-wise select instruction.  Using an
IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code
being generated in the combine pass.  Per element selection is a
subset of per bit-wise selection,with the patch the pattern is
written using bit operations.  But there are 8 different patterns
to define "op0 := (op1 & ~op3) | (op2 & op3)":

(~op3) | (op3),
(~op3) | (op2),
(op3) | (~op3),
(op2) | (~op3),
(op1&~op3) | (op3),
(op1&~op3) | (op2),
(op3) | (op1&~op3),
(op2) | (op1&~op3),

Combine pass will swap (op1&~op3) to (~op3) due to commutative
canonical, which could reduce it to the FIRST 4 patterns, but it won't
swap (op2) | (~op3) to (~op3) | (op2), so this patch
handles it with two patterns with different NOT op3 position and check
equality inside it.


Yup, that latter case does not have canonicalisation rules.  Btw, not
only combine does this canonicalisation: everything should,
non-canonical RTL is invalid RTL (in the instruction stream, you can do
everything in temporary code of course, as long as the RTL isn't
malformed).


-(define_insn "*altivec_vsel"
+(define_insn "altivec_vsel"
    [(set (match_operand:VM 0 "altivec_register_operand" "=v")
-    (if_then_else:VM
- (ne:CC (match_operand:VM 1 "altivec_register_operand" "v")
-    (match_operand:VM 4 "zero_constant" ""))
- (match_operand:VM 2 "altivec_register_operand" "v")
- (match_operand:VM 3 "altivec_register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (mode)"
-  "vsel %0,%3,%2,%1"
+    (ior:VM
+ (and:VM
+  (not:VM (match_operand:VM 3 "altivec_register_operand" "v"))
+  (match_operand:VM 1 "altivec_register_operand" "v"))
+ (and:VM
+  (match_operand:VM 2 "altivec_register_operand" "v")
+  (match_operand:VM 4 "altivec_register_operand" "v"]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
+  && (rtx_equal_p (operands[2], operands[3])
+  || rtx_equal_p (operands[4], operands[3]))"
+  {
+    if (rtx_equal_p (operands[2], operands[3]))
+  return "vsel %0,%1,%4,%3";
+    else
+  return "vsel %0,%1,%2,%3";
+  }
    [(set_attr "type" "vecmove")])


That rtx_equal_p stuff is nice and tricky, but it is a bit too tricky I
think.  So please write this as two patterns (and keep the expand if
that helps).


I was a bit concerned that there would be a lot of duplicate code if we
write two patterns for each vsel, totally 4 similar patterns in
altivec.md and another 4 in vsx.md make it difficult to maintain, however
I updated it since you prefer this way, as you pointed out the xxsel in
vsx.md could be folded by later patch.




+(define_insn "altivec_vsel2"


(same here of course).


  ;; Fused multiply add.
diff --git a/gcc/config/rs6000/rs6000-call.c 
b/gcc/config/rs6000/rs6000-call.c

index f5676255387..d65bdc01055 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -3362,11 +3362,11 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
  RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 
RS6000_BTI_unsigned_V2DI },

    { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI,
  RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 
RS6000_BTI_V2DI },

-  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI,
+  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI_UNS,


Are the _uns things still used for anything?  But, let's not change
this until Bill's stuff is in :-)

Why do you want to change this here, btw?  I don't understand.


OK, they are actually "unsigned type" overload builtin functions, change
it or not so far won't cause functionality issue, I will revert this change
in the updated patch.




+  if (target == 0
+  || GET_MODE (target) != tmode
+  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))


No space after ! and other unary operators (except for casts and other
operators you write with alphanumerics, like "sizeof").  I know you
copied this code, but :-)


OK, thanks.



@@ -15608,8 +15606,6 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx 
op_true, rtx op_false,

  case GEU:
  case LTU:
  case LEU:
-  /* Mark unsigned tests with CCUNSmode.  */
-  cc_mode = CCUNSmode;
    /* Invert condition to avoid compound test if necessary.  */
    if (rcode == GEU || rcode == LEU)


So this is related to the _uns thing.  Could you split off that change?
Probably as an earlier patch (but either works for me).


Not related to the ALTIVEC_BUILTIN_VSEL_2DI_UNS things, previously cc_mode
is a parameter to generate the condition for IF_THEN_ELSE instruction, now
we don't need it again as we use IOR (AND... AND...) style, remove it to 
avoid

build error.


-  cond2 = gen_rtx_fmt_ee (NE, cc_mode, gen_lowpart (dest_mode, mask),

*Ping*: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-05-12 Thread Xionghu Luo via Gcc-patches




On 2021/4/30 14:32, Xionghu Luo wrote:

The vsel instruction is a bit-wise select instruction.  Using an
IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code
being generated in the combine pass.  Per element selection is a
subset of per bit-wise selection,with the patch the pattern is
written using bit operations.  But there are 8 different patterns
to define "op0 := (op1 & ~op3) | (op2 & op3)":

(~op3) | (op3),
(~op3) | (op2),
(op3) | (~op3),
(op2) | (~op3),
(op1&~op3) | (op3),
(op1&~op3) | (op2),
(op3) | (op1&~op3),
(op2) | (op1&~op3),

Combine pass will swap (op1&~op3) to (~op3) due to commutative
canonical, which could reduce it to the FIRST 4 patterns, but it won't
swap (op2) | (~op3) to (~op3) | (op2), so this patch
handles it with two patterns with different NOT op3 position and check
equality inside it.

Tested pass on Power8LE, any comments?

gcc/ChangeLog:

* config/rs6000/altivec.md (*altivec_vsel): Change to ...
(altivec_vsel): ... this and update define.
(*altivec_vsel_uns): Delete.
(altivec_vsel2): New define_insn.
* config/rs6000/rs6000-call.c (altivec_expand_vec_sel_builtin):
New.
(altivec_expand_builtin): Call altivec_expand_vec_sel_builtin to
expand vel_sel.
* config/rs6000/rs6000.c (rs6000_emit_vector_cond_expr): Use
bit-wise selection instead of per element.
* config/rs6000/vector.md:
* config/rs6000/vsx.md (*vsx_xxsel): Change to ...
(vsx_xxsel): ... this and update define.
(*vsx_xxsel_uns): Delete.
(vsx_xxsel2): New define_insn.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr94613.c: New test.
---
  gcc/config/rs6000/altivec.md   | 50 -
  gcc/config/rs6000/rs6000-call.c| 81 +++---
  gcc/config/rs6000/rs6000.c | 19 +++--
  gcc/config/rs6000/vector.md| 26 ---
  gcc/config/rs6000/vsx.md   | 56 ++-
  gcc/testsuite/gcc.target/powerpc/pr94613.c | 38 ++
  6 files changed, 201 insertions(+), 69 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr94613.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 1351dafbc41..f874f975519 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -667,26 +667,44 @@ (define_insn "*altivec_gev4sf"
"vcmpgefp %0,%1,%2"
[(set_attr "type" "veccmp")])
  
-(define_insn "*altivec_vsel"

+(define_insn "altivec_vsel"
[(set (match_operand:VM 0 "altivec_register_operand" "=v")
-   (if_then_else:VM
-(ne:CC (match_operand:VM 1 "altivec_register_operand" "v")
-   (match_operand:VM 4 "zero_constant" ""))
-(match_operand:VM 2 "altivec_register_operand" "v")
-(match_operand:VM 3 "altivec_register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (mode)"
-  "vsel %0,%3,%2,%1"
+   (ior:VM
+(and:VM
+ (not:VM (match_operand:VM 3 "altivec_register_operand" "v"))
+ (match_operand:VM 1 "altivec_register_operand" "v"))
+(and:VM
+ (match_operand:VM 2 "altivec_register_operand" "v")
+ (match_operand:VM 4 "altivec_register_operand" "v"]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
+  && (rtx_equal_p (operands[2], operands[3])
+  || rtx_equal_p (operands[4], operands[3]))"
+  {
+if (rtx_equal_p (operands[2], operands[3]))
+  return "vsel %0,%1,%4,%3";
+else
+  return "vsel %0,%1,%2,%3";
+  }
[(set_attr "type" "vecmove")])
  
-(define_insn "*altivec_vsel_uns"

+(define_insn "altivec_vsel2"
[(set (match_operand:VM 0 "altivec_register_operand" "=v")
-   (if_then_else:VM
-(ne:CCUNS (match_operand:VM 1 "altivec_register_operand" "v")
-  (match_operand:VM 4 "zero_constant" ""))
-(match_operand:VM 2 "altivec_register_operand" "v")
-(match_operand:VM 3 "altivec_register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (mode)"
-  "vsel %0,%3,%2,%1"
+   (ior:VM
+(and:VM
+ (match_operand:VM 1 "altivec_register_operand" "v")
+ (match_operand:VM 2 "altivec_register_operand" "v"))
+(and:VM
+ (not:VM (match_operand:VM 3 "altivec_register_operand" "v"))
+ (match_operand:VM 4 "altivec_register_operand" "v"]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
+  && (rtx_equal_p (operands[1], operands[3])
+  || rtx_equal_p (operands[2], operands[3]))"
+  {
+if (rtx_equal_p (operands[1], operands[3]))
+  return "vsel %0,%4,%2,%3";
+else
+  return "vsel %0,%4,%1,%3";
+ }
[(set_attr "type" "vecmove")])
  
  ;; Fused multiply add.

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index f5676255387..d65bdc01055 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -3362,11 +3362,11 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
  RS6000_BTI_V2DI, RS6000_BTI_V2DI,