Ping: [PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]

2021-06-29 Thread Xionghu Luo via Gcc-patches

Gentle ping, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html


On 2021/6/9 16:03, Xionghu Luo via Gcc-patches wrote:

Hi,

On 2021/6/9 07:25, Segher Boessenkool wrote:

On Mon, May 24, 2021 at 04:02:13AM -0500, Xionghu Luo wrote:

vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
5, 21, 6, 22, 7, 23} no matter for BE or LE in ISA, similarly for 
vmrghlb.


(vmrglb)


+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+  gen_altivec_vmrghb_direct (operands[0], operands[1], 
operands[2]));

+  else
+    emit_insn (
+  gen_altivec_vmrglb_direct (operands[0], operands[2], 
operands[1]));


Please don't indent like that, it doesn't match what we do elsewhere.
For better or for worse (for worse imo), we use deep hanging indents.
If you have to, you can do something like

   rtx insn;
   if (BYTES_BIG_ENDIAN)
 insn = gen_altivec_vmrghb_direct (operands[0], operands[1], 
operands[2]);

   else
 insn = gen_altivec_vmrglb_direct (operands[0], operands[2], 
operands[1]);

   emit_insn (insn);

(this is better even, in that it has only one emit_insn), or even

   rtx (*fun) () = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
   : gen_altivec_vmrglb_direct;
   if (!BYTES_BIG_ENDIAN)
 std::swap (operands[1], operands[2]);
   emit_insn (fun (operands[0], operands[1], operands[2]));

Well, C++ does not allow that last example like that, sigh, so
   rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghb_direct

    : gen_altivec_vmrglb_direct;

This is shorter than the other two options ;-)


Changed.




+(define_insn "altivec_vmrghb_direct"
    [(set (match_operand:V16QI 0 "register_operand" "=v")
+    (vec_select:V16QI


This should be indented one space more.


    "TARGET_ALTIVEC"
    "@
-   xxmrghw %x0,%x1,%x2
-   vmrghw %0,%1,%2"
+  xxmrghw %x0,%x1,%x2
+  vmrghw %0,%1,%2"


The original indent was correct, please restore.


-  emit_insn (gen_altivec_vmrghw_direct (operands[0], ve, vo));
+  emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));


When you see a mode as part of a pattern name, chances are that it will
be a good candidate for using parameterized names with.  (But don't do
that now, just keep it in mind as a nice cleanup to do).


OK.



@@ -23022,8 +23022,8 @@ altivec_expand_vec_perm_const (rtx target, 
rtx op0, rtx op1,

 : CODE_FOR_altivec_vmrglh_direct),
    {  0,  1, 16, 17,  2,  3, 18, 19,  4,  5, 20, 21,  6,  7, 22, 
23 } },

  { OPTION_MASK_ALTIVEC,
-  (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct
-   : CODE_FOR_altivec_vmrglw_direct),
+  (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
+   : CODE_FOR_altivec_vmrglw_direct_v4si),


The correct way is to align the ? and the : (or put everything on one
line of course, if that fits)

The parens around this are not needed btw, and are a distraction.


Changed.




--- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
@@ -317,10 +317,10 @@ int main ()
  /* { dg-final { scan-assembler-times "vctuxs" 2 } } */
  /* { dg-final { scan-assembler-times "vmrghb" 4 { target be } } } */
-/* { dg-final { scan-assembler-times "vmrghb" 5 { target le } } } */
+/* { dg-final { scan-assembler-times "vmrghb" 6 { target le } } } */
  /* { dg-final { scan-assembler-times "vmrghh" 8 } } */
-/* { dg-final { scan-assembler-times "xxmrghw" 8 } } */
-/* { dg-final { scan-assembler-times "xxmrglw" 8 } } */
+/* { dg-final { scan-assembler-times "xxmrghw" 4 } } */
+/* { dg-final { scan-assembler-times "xxmrglw" 4 } } */
  /* { dg-final { scan-assembler-times "vmrglh" 8 } } */
  /* { dg-final { scan-assembler-times "xxlnor" 6 } } */
  /* { dg-final { scan-assembler-times {\mvpkudus\M} 1 } } */
@@ -347,7 +347,7 @@ int main ()
  /* { dg-final { scan-assembler-times "vspltb" 6 } } */
  /* { dg-final { scan-assembler-times "vspltw" 0 } } */
  /* { dg-final { scan-assembler-times "vmrgow" 8 } } */
-/* { dg-final { scan-assembler-times "vmrglb" 5 { target le } } } */
+/* { dg-final { scan-assembler-times "vmrglb" 4 { target le } } } */
  /* { dg-final { scan-assembler-times "vmrglb" 6 { target be } } } */
  /* { dg-final { scan-assembler-times "vmrgew" 8 } } */
  /* { dg-final { scan-assembler-times "vsplth" 8 } } */


Are those changes correct?  It looks like a vmrglb became a vmrghb, and
that 4 each of xxmrghw and xxmrglw disappeared?  Both seem wrong?



This case is built with "-mdejagnu-cpu=power8 -O0 -mno-fold-gimple -dp"
and it also counted the generated instruction patterns.

1) "vsx_xxmrghw_v4si" is replaced by "altivec_vmrglw_direct_v4si/0", so 
it decreases from 8 to 4. (Likewise for vsx_xxmrglw_v4si.)


     li 9,48  # 1282 [c=4 l=4]  *movdi_internal64/3
-   lxvd2x 0,31,9    # 31   [c=8 l=4]  *vsx_lxvd2x4_le_v4si
-   xxpermdi 0,0,0,2 # 32   [c=4 l=4]  xxswapd_v4si
-   xxmrglw 0,0,12   # 33   [c=4 l=4]  vsx_xxmrghw_v4si
+   

Ping: [PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]

2021-06-06 Thread Xionghu Luo via Gcc-patches

Ping, thanks.


On 2021/5/24 17:02, Xionghu Luo wrote:

From: Xiong Hu Luo 

vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
5, 21, 6, 22, 7, 23} no matter for BE or LE in ISA, similarly for vmrghlb.
Remove UNSPEC_VMRGH_DIRECT/UNSPEC_VMRGL_DIRECT pattern as vec_select
+ vec_concat as normal RTL.

Tested pass on P8LE, P9LE and P8BE{m32}, ok for trunk?

gcc/ChangeLog:

* config/rs6000/altivec.md (*altivec_vmrghb_internal): Delete.
(altivec_vmrghb_direct): New.
(*altivec_vmrghh_internal): Delete.
(altivec_vmrghh_direct): New.
(*altivec_vmrghw_internal): Delete.
(altivec_vmrghw_direct_): New.
(altivec_vmrghw_direct): Delete.
(*altivec_vmrglb_internal): Delete.
(altivec_vmrglb_direct): New.
(*altivec_vmrglh_internal): Delete.
(altivec_vmrglh_direct): New.
(*altivec_vmrglw_internal): Delete.
(altivec_vmrglw_direct_): New.
(altivec_vmrglw_direct): Delete.
* config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Adjust.
* config/rs6000/rs6000.c (altivec_expand_vec_perm_const):
Adjust.
* config/rs6000/vsx.md (vsx_xxmrghw_): Adjust.
(vsx_xxmrglw_):

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/builtins-1.c: Update instruction counts.
---
  gcc/config/rs6000/altivec.md  | 231 ++
  gcc/config/rs6000/rs6000-p8swap.c |   2 -
  gcc/config/rs6000/rs6000.c|  10 +-
  gcc/config/rs6000/vsx.md  |  18 +-
  gcc/testsuite/gcc.target/powerpc/builtins-1.c |   8 +-
  5 files changed, 95 insertions(+), 174 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 208d6343225..cae05be2c2d 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -143,8 +143,6 @@ (define_c_enum "unspec"
 UNSPEC_VUPKHU_V4SF
 UNSPEC_VUPKLU_V4SF
 UNSPEC_VGBBD
-   UNSPEC_VMRGH_DIRECT
-   UNSPEC_VMRGL_DIRECT
 UNSPEC_VSPLT_DIRECT
 UNSPEC_VMRGEW_DIRECT
 UNSPEC_VMRGOW_DIRECT
@@ -1291,44 +1289,29 @@ (define_expand "altivec_vmrghb"
 (use (match_operand:V16QI 2 "register_operand"))]
"TARGET_ALTIVEC"
  {
-  rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT 
(17),
-  GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19),
-  GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21),
-  GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23));
-  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
-  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
-  emit_insn (gen_rtx_SET (operands[0], x));
+  if (BYTES_BIG_ENDIAN)
+emit_insn (
+  gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2]));
+  else
+emit_insn (
+  gen_altivec_vmrglb_direct (operands[0], operands[2], operands[1]));
DONE;
  })
  
-(define_insn "*altivec_vmrghb_internal"

+(define_insn "altivec_vmrghb_direct"
[(set (match_operand:V16QI 0 "register_operand" "=v")
-(vec_select:V16QI
+(vec_select:V16QI
  (vec_concat:V32QI
(match_operand:V16QI 1 "register_operand" "v")
(match_operand:V16QI 2 "register_operand" "v"))
- (parallel [(const_int 0) (const_int 16)
-(const_int 1) (const_int 17)
-(const_int 2) (const_int 18)
-(const_int 3) (const_int 19)
-(const_int 4) (const_int 20)
-(const_int 5) (const_int 21)
-(const_int 6) (const_int 22)
-(const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
-{
-  if (BYTES_BIG_ENDIAN)
-return "vmrghb %0,%1,%2";
-  else
-return "vmrglb %0,%2,%1";
-}
-  [(set_attr "type" "vecperm")])
-
-(define_insn "altivec_vmrghb_direct"
-  [(set (match_operand:V16QI 0 "register_operand" "=v")
-   (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")
-  (match_operand:V16QI 2 "register_operand" "v")]
- UNSPEC_VMRGH_DIRECT))]
+ (parallel [(const_int  0) (const_int 16)
+(const_int  1) (const_int 17)
+(const_int  2) (const_int 18)
+(const_int  3) (const_int 19)
+(const_int  4) (const_int 20)
+(const_int  5) (const_int 21)
+(const_int  6) (const_int 22)
+(const_int  7) (const_int 23)])))]
"TARGET_ALTIVEC"
"vmrghb %0,%1,%2"
[(set_attr "type" "vecperm")])
@@ -1339,16 +1322,16 @@ (define_expand "altivec_vmrghh"
 (use (match_operand:V8HI 2 "register_operand"))]
"TARGET_ALTIVEC"
  {
-  rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9),
-  GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11));
-  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1],