Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2023-03-30 Thread Xionghu Luo via Gcc-patches

Thanks,

On 2023/3/31 03:30, Segher Boessenkool wrote:

Hi!

On Fri, Feb 10, 2023 at 10:59:52AM +0800, Xionghu Luo via Gcc-patches wrote:

The native RTL expression for vec_mrghw should be same for BE and LE as
they are register and endian-independent.


This isn't so obvious at all.  All elements of these constructs are
very much not endian-independent, because of very unfortunate choices
in the meaning of some RTL constructs.  It is possible all things in
this negate all other things, but please show that then.


  So both BE and LE need
generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
   (subreg:V4SI (reg:V16QI 139) 0)
   (subreg:V4SI (reg:V16QI 140) 0))
   [const_int 0 4 1 5]))


With BE, if the source vecs are ABCD and EFGH, the vec_concat gives
ABCDEFGH, and the vec_select than gives AEBF.

What happens for LE?


on LE, the sources looks like DCBA and HGFE, vec_concat gives HGFEACBA 
with index reversed [7 6 5 4 3 2 1 0], so it also chooses FBEA like BE.



Take the case as example on P8LE:

test.c

__attribute__ ((__noinline__))
vector int bar (vector int a, vector int b)
{
  return vec_vmrghw (a, b);
}

int main ()
{

  vector int a = {0xa1345678, 0xa2345678,0xa3345678, 0xa4345678};
  vector int b = {0xb1345678, 0xb2345678,0xb3345678, 0xb4345678};
  vector int c = bar (a, b);
  printf("%x,%x,%x,%x\n", c[0], c[1], c[2], c[3]);
  return c[0];
}


.expand:

_3 = VEC_PERM_EXPR ;

(insn 7 4 8 2 (set (reg:V16QI 122)
(subreg:V16QI (reg/v:V4SI 118 [ a ]) 0)) "test.c":15:10 -1
 (nil))
(insn 8 7 9 2 (set (reg:V16QI 123)
(subreg:V16QI (reg/v:V4SI 119 [ b ]) 0)) "test.c":15:10 -1
 (nil))
(insn 9 8 10 2 (set (reg:V4SI 124)
(vec_select:V4SI (vec_concat:V8SI (subreg:V4SI (reg:V16QI 122) 0)
(subreg:V4SI (reg:V16QI 123) 0))
(parallel [
(const_int 0 [0])
(const_int 4 [0x4])
(const_int 1 [0x1])
(const_int 5 [0x5])
]))) "test.c":15:10 -1
 (nil))


And .vregs to .final:

(insn 15 9 16 (set (reg/i:V4SI 66 %v2)
(vec_select:V4SI (vec_concat:V8SI (reg:V4SI 66 %v2 [125])
(reg:V4SI 67 %v3 [126]))
(parallel [
(const_int 0 [0])
(const_int 4 [0x4])
(const_int 1 [0x1])
(const_int 5 [0x5])
]))) "test.c":16:1 1825 {altivec_vmrglw_direct_v4si_le}
 (expr_list:REG_DEAD (reg:V4SI 67 %v3 [126])
(nil)))


As altivec_vmrglw_direct_v4si_le is defined as with this patch:


(define_insn "altivec_vmrglw_direct__le"
  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
(vec_select:VSX_W
  (vec_concat:
(match_operand:VSX_W 2 "register_operand" "wa,v")
(match_operand:VSX_W 1 "register_operand" "wa,v"))
  (parallel [(const_int 0) (const_int 4)
 (const_int 1) (const_int 5)])))]
  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
  "@
   xxmrglw %x0,%x1,%x2
   vmrglw %0,%1,%2"
  [(set_attr "type" "vecperm")])


ASM:

bar:
.LFB11:
.cfi_startproc
xxmrglw 34,35,34
blr


./test
a1345678,b1345678,a2345678,b2345678

Exactly matches [a1 b1 a2 b2].  Does this look reasonable?


BR,
Xionghu



Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2023-03-30 Thread Segher Boessenkool
Hi!

On Fri, Feb 10, 2023 at 10:59:52AM +0800, Xionghu Luo via Gcc-patches wrote:
> The native RTL expression for vec_mrghw should be same for BE and LE as
> they are register and endian-independent.

This isn't so obvious at all.  All elements of these constructs are
very much not endian-independent, because of very unfortunate choices
in the meaning of some RTL constructs.  It is possible all things in
this negate all other things, but please show that then.

>  So both BE and LE need
> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
> 
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>  (subreg:V4SI (reg:V16QI 139) 0)
>  (subreg:V4SI (reg:V16QI 140) 0))
>  [const_int 0 4 1 5]))

With BE, if the source vecs are ABCD and EFGH, the vec_concat gives
ABCDEFGH, and the vec_select than gives AEBF.

What happens for LE?


Segher


Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2023-02-27 Thread Xionghu Luo via Gcc-patches

Hi Segher, Ping this for stage 4...


On 2023/2/10 10:59, Xionghu Luo via Gcc-patches wrote:

Resend this patch...

v4: Update per comments.
v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern to be and le with same RTL but different insn.

The native RTL expression for vec_mrghw should be same for BE and LE as
they are register and endian-independent.  So both BE and LE need
generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
   (subreg:V4SI (reg:V16QI 139) 0)
   (subreg:V4SI (reg:V16QI 140) 0))
   [const_int 0 4 1 5]))

Then combine pass could do the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check need only once at ASM generation finally.
ASM would be better due to nested vec_select simplified to simple scalar
load.

Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux.

gcc/ChangeLog:

PR target/106069
* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
(altivec_vmrghb_direct_be): New pattern for BE.
(altivec_vmrghb_direct_le): New pattern for LE.
(altivec_vmrghh_direct): Remove.
(altivec_vmrghh_direct_be): New pattern for BE.
(altivec_vmrghh_direct_le): New pattern for LE.
(altivec_vmrghw_direct_): Remove.
(altivec_vmrghw_direct__be): New pattern for BE.
(altivec_vmrghw_direct__le): New pattern for LE.
(altivec_vmrglb_direct): Remove.
(altivec_vmrglb_direct_be): New pattern for BE.
(altivec_vmrglb_direct_le): New pattern for LE.
(altivec_vmrglh_direct): Remove.
(altivec_vmrglh_direct_be): New pattern for BE.
(altivec_vmrglh_direct_le): New pattern for LE.
(altivec_vmrglw_direct_): Remove.
(altivec_vmrglw_direct__be): New pattern for BE.
(altivec_vmrglw_direct__le): New pattern for LE.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
Adjust.
* config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

PR target/106069
* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo 
---
  gcc/config/rs6000/altivec.md| 222 ++--
  gcc/config/rs6000/rs6000.cc |  24 +--
  gcc/config/rs6000/vsx.md|  28 +--
  gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++
  4 files changed, 307 insertions(+), 85 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 30606b8ab21..4bfeecec224 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
 (use (match_operand:V16QI 2 "register_operand"))]
"TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-   : gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+emit_insn (
+  gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
+  else
+emit_insn (
+  gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
DONE;
  })
  
-(define_insn "altivec_vmrghb_direct"

+(define_insn "altivec_vmrghb_direct_be"
[(set (match_operand:V16QI 0 "register_operand" "=v")
(vec_select:V16QI
  (vec_concat:V32QI
@@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
 (const_int 5) (const_int 21)
 (const_int 6) (const_int 22)
 (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+   (vec_select:V16QI
+ (vec_concat:V32QI
+   (match_operand:V16QI 2 "register_operand" "v")
+   (match_operand:V16QI 1 "register_operand" "v"))
+ (parallel [(const_int  8) (const_int 24)
+(const_int  9) (const_int 25)
+(const_int 10) (const_int 26)
+(const_int 11) (const_int 27)
+(const_int 12) (const_int 28)
+(const_int 13) 

[PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2023-02-09 Thread Xionghu Luo via Gcc-patches
Resend this patch...

v4: Update per comments.
v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern to be and le with same RTL but different insn.

The native RTL expression for vec_mrghw should be same for BE and LE as
they are register and endian-independent.  So both BE and LE need
generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
   (subreg:V4SI (reg:V16QI 139) 0)
   (subreg:V4SI (reg:V16QI 140) 0))
   [const_int 0 4 1 5]))

Then combine pass could do the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check need only once at ASM generation finally.
ASM would be better due to nested vec_select simplified to simple scalar
load.

Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux.

gcc/ChangeLog:

PR target/106069
* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
(altivec_vmrghb_direct_be): New pattern for BE.
(altivec_vmrghb_direct_le): New pattern for LE.
(altivec_vmrghh_direct): Remove.
(altivec_vmrghh_direct_be): New pattern for BE.
(altivec_vmrghh_direct_le): New pattern for LE.
(altivec_vmrghw_direct_): Remove.
(altivec_vmrghw_direct__be): New pattern for BE.
(altivec_vmrghw_direct__le): New pattern for LE.
(altivec_vmrglb_direct): Remove.
(altivec_vmrglb_direct_be): New pattern for BE.
(altivec_vmrglb_direct_le): New pattern for LE.
(altivec_vmrglh_direct): Remove.
(altivec_vmrglh_direct_be): New pattern for BE.
(altivec_vmrglh_direct_le): New pattern for LE.
(altivec_vmrglw_direct_): Remove.
(altivec_vmrglw_direct__be): New pattern for BE.
(altivec_vmrglw_direct__le): New pattern for LE.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
Adjust.
* config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

PR target/106069
* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo 
---
 gcc/config/rs6000/altivec.md| 222 ++--
 gcc/config/rs6000/rs6000.cc |  24 +--
 gcc/config/rs6000/vsx.md|  28 +--
 gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++
 4 files changed, 307 insertions(+), 85 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 30606b8ab21..4bfeecec224 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
(use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-   : gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+emit_insn (
+  gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
+  else
+emit_insn (
+  gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
(vec_select:V16QI
  (vec_concat:V32QI
@@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
 (const_int 5) (const_int 21)
 (const_int 6) (const_int 22)
 (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+   (vec_select:V16QI
+ (vec_concat:V32QI
+   (match_operand:V16QI 2 "register_operand" "v")
+   (match_operand:V16QI 1 "register_operand" "v"))
+ (parallel [(const_int  8) (const_int 24)
+(const_int  9) (const_int 25)
+(const_int 10) (const_int 26)
+(const_int 11) (const_int 27)
+(const_int 12) (const_int 28)
+(const_int 13) (const_int 29)
+(const_int 14) (const_int 30)
+(const_int 15) (const_int 

Re: Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2023-02-09 Thread Segher Boessenkool
On Thu, Feb 09, 2023 at 10:15:22AM +0800, Xionghu Luo wrote:
> Thanks Kewen!
> Ping this again @Segher.
> Maybe we could also merge this patch if no objections from Segher as 
> several reviews and tests taken on this already...

Please send the patch as the head of its own thread, not as a reply deep
in a thread of an older version?


Segher


Re: Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2023-02-08 Thread Xionghu Luo via Gcc-patches

Thanks Kewen!
Ping this again @Segher.
Maybe we could also merge this patch if no objections from Segher as 
several reviews and tests taken on this already...



BR,
Xionghu


On 2023/1/18 17:11, Kewen.Lin wrote:

Hi Segher,

I guessed that this patch escaped from your radar. :)

As Jakub asked the status in PR106069, I applied this attached patch from 
Xionghu
to the latest trunk, re-tested it and confirmed that it's still bootstrapped and
regtested on powerpc64-linux-gnu P8 and powerpc64le-linux-gnu P9 and P10.

This new version has separated out direct le and be, it's more clear than 
before,
it looked good to me.  What do you think of this?  Looking forward to your 
opinion.

btw, the link in archives:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600169.html

BR,
Kewen

on 2022/8/24 09:24, Xionghu Luo wrote:

主题:
Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the 
UNSPECS [PR106069]
From:
Xionghu Luo 
日期:
2022/8/24, 09:24

收件人:
"Kewen.Lin" , Segher Boessenkool 

抄送:
Xionghu Luo , gcc-patches@gcc.gnu.org, David Edelsohn 
, Segher Boessenkool 


Hi Segher, I'd like to resend and ping for this patch. Thanks.

v4-0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removi.patch

 From 23bffdacdf0eb1140c7a3571e6158797f4818d57 Mon Sep 17 00:00:00 2001
From: Xionghu Luo 
Date: Thu, 4 Aug 2022 03:44:58 +
Subject: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the
  UNSPECS [PR106069]

v4: Update per comments.
v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern to be and le with same RTL but different insn.

The native RTL expression for vec_mrghw should be same for BE and LE as
they are register and endian-independent.  So both BE and LE need
generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
   (subreg:V4SI (reg:V16QI 139) 0)
   (subreg:V4SI (reg:V16QI 140) 0))
   [const_int 0 4 1 5]))

Then combine pass could do the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check need only once at ASM generation finally.
ASM would be better due to nested vec_select simplified to simple scalar
load.

Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux.

gcc/ChangeLog:

PR target/106069
* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
(altivec_vmrghb_direct_be): New pattern for BE.
(altivec_vmrghb_direct_le): New pattern for LE.
(altivec_vmrghh_direct): Remove.
(altivec_vmrghh_direct_be): New pattern for BE.
(altivec_vmrghh_direct_le): New pattern for LE.
(altivec_vmrghw_direct_): Remove.
(altivec_vmrghw_direct__be): New pattern for BE.
(altivec_vmrghw_direct__le): New pattern for LE.
(altivec_vmrglb_direct): Remove.
(altivec_vmrglb_direct_be): New pattern for BE.
(altivec_vmrglb_direct_le): New pattern for LE.
(altivec_vmrglh_direct): Remove.
(altivec_vmrglh_direct_be): New pattern for BE.
(altivec_vmrglh_direct_le): New pattern for LE.
(altivec_vmrglw_direct_): Remove.
(altivec_vmrglw_direct__be): New pattern for BE.
(altivec_vmrglw_direct__le): New pattern for LE.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
Adjust.
* config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

PR target/106069
* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo 
---
  gcc/config/rs6000/altivec.md| 222 ++--
  gcc/config/rs6000/rs6000.cc |  24 +--
  gcc/config/rs6000/vsx.md|  28 +--
  gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++
  4 files changed, 307 insertions(+), 85 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..c6a381908cb 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
 (use (match_operand:V16QI 2 "register_operand"))]
"TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-   : gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-std::swap (operands[1], operands[2]);
-  emit_insn

Re: Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2023-01-18 Thread Kewen.Lin via Gcc-patches
Hi Segher,

I guessed that this patch escaped from your radar. :)

As Jakub asked the status in PR106069, I applied this attached patch from 
Xionghu
to the latest trunk, re-tested it and confirmed that it's still bootstrapped and
regtested on powerpc64-linux-gnu P8 and powerpc64le-linux-gnu P9 and P10.

This new version has separated out direct le and be, it's more clear than 
before,
it looked good to me.  What do you think of this?  Looking forward to your 
opinion.

btw, the link in archives:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600169.html

BR,
Kewen

on 2022/8/24 09:24, Xionghu Luo wrote:
> 主题:
> Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the 
> UNSPECS [PR106069]
> From:
> Xionghu Luo 
> 日期:
> 2022/8/24, 09:24
> 
> 收件人:
> "Kewen.Lin" , Segher Boessenkool 
> 
> 抄送:
> Xionghu Luo , gcc-patches@gcc.gnu.org, David Edelsohn 
> , Segher Boessenkool 
> 
> 
> Hi Segher, I'd like to resend and ping for this patch. Thanks.
> 
> v4-0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removi.patch
> 
> From 23bffdacdf0eb1140c7a3571e6158797f4818d57 Mon Sep 17 00:00:00 2001
> From: Xionghu Luo 
> Date: Thu, 4 Aug 2022 03:44:58 +
> Subject: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the
>  UNSPECS [PR106069]
> 
> v4: Update per comments.
> v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
> patterns.
> v2: Split the direct pattern to be and le with same RTL but different insn.
> 
> The native RTL expression for vec_mrghw should be same for BE and LE as
> they are register and endian-independent.  So both BE and LE need
> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
> 
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>  (subreg:V4SI (reg:V16QI 139) 0)
>  (subreg:V4SI (reg:V16QI 140) 0))
>  [const_int 0 4 1 5]))
> 
> Then combine pass could do the nested vec_select optimization
> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
> 
> =>
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
> 
> The endianness check need only once at ASM generation finally.
> ASM would be better due to nested vec_select simplified to simple scalar
> load.
> 
> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
> Linux.
> 
> gcc/ChangeLog:
> 
>   PR target/106069
>   * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
>   (altivec_vmrghb_direct_be): New pattern for BE.
>   (altivec_vmrghb_direct_le): New pattern for LE.
>   (altivec_vmrghh_direct): Remove.
>   (altivec_vmrghh_direct_be): New pattern for BE.
>   (altivec_vmrghh_direct_le): New pattern for LE.
>   (altivec_vmrghw_direct_): Remove.
>   (altivec_vmrghw_direct__be): New pattern for BE.
>   (altivec_vmrghw_direct__le): New pattern for LE.
>   (altivec_vmrglb_direct): Remove.
>   (altivec_vmrglb_direct_be): New pattern for BE.
>   (altivec_vmrglb_direct_le): New pattern for LE.
>   (altivec_vmrglh_direct): Remove.
>   (altivec_vmrglh_direct_be): New pattern for BE.
>   (altivec_vmrglh_direct_le): New pattern for LE.
>   (altivec_vmrglw_direct_): Remove.
>   (altivec_vmrglw_direct__be): New pattern for BE.
>   (altivec_vmrglw_direct__le): New pattern for LE.
>   * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
>   Adjust.
>   * config/rs6000/vsx.md: Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/106069
>   * g++.target/powerpc/pr106069.C: New test.
> 
> Signed-off-by: Xionghu Luo 
> ---
>  gcc/config/rs6000/altivec.md| 222 ++--
>  gcc/config/rs6000/rs6000.cc |  24 +--
>  gcc/config/rs6000/vsx.md|  28 +--
>  gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++
>  4 files changed, 307 insertions(+), 85 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2c4940f2e21..c6a381908cb 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
> (use (match_operand:V16QI 2 "register_operand"))]
>"

Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2022-08-23 Thread Xionghu Luo via Gcc-patches

Hi Segher, I'd like to resend and ping for this patch. Thanks.
From 23bffdacdf0eb1140c7a3571e6158797f4818d57 Mon Sep 17 00:00:00 2001
From: Xionghu Luo 
Date: Thu, 4 Aug 2022 03:44:58 +
Subject: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the
 UNSPECS [PR106069]

v4: Update per comments.
v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern to be and le with same RTL but different insn.

The native RTL expression for vec_mrghw should be same for BE and LE as
they are register and endian-independent.  So both BE and LE need
generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
   (subreg:V4SI (reg:V16QI 139) 0)
   (subreg:V4SI (reg:V16QI 140) 0))
   [const_int 0 4 1 5]))

Then combine pass could do the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check need only once at ASM generation finally.
ASM would be better due to nested vec_select simplified to simple scalar
load.

Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux.

gcc/ChangeLog:

PR target/106069
* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
(altivec_vmrghb_direct_be): New pattern for BE.
(altivec_vmrghb_direct_le): New pattern for LE.
(altivec_vmrghh_direct): Remove.
(altivec_vmrghh_direct_be): New pattern for BE.
(altivec_vmrghh_direct_le): New pattern for LE.
(altivec_vmrghw_direct_): Remove.
(altivec_vmrghw_direct__be): New pattern for BE.
(altivec_vmrghw_direct__le): New pattern for LE.
(altivec_vmrglb_direct): Remove.
(altivec_vmrglb_direct_be): New pattern for BE.
(altivec_vmrglb_direct_le): New pattern for LE.
(altivec_vmrglh_direct): Remove.
(altivec_vmrglh_direct_be): New pattern for BE.
(altivec_vmrglh_direct_le): New pattern for LE.
(altivec_vmrglw_direct_): Remove.
(altivec_vmrglw_direct__be): New pattern for BE.
(altivec_vmrglw_direct__le): New pattern for LE.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
Adjust.
* config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

PR target/106069
* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo 
---
 gcc/config/rs6000/altivec.md| 222 ++--
 gcc/config/rs6000/rs6000.cc |  24 +--
 gcc/config/rs6000/vsx.md|  28 +--
 gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++
 4 files changed, 307 insertions(+), 85 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..c6a381908cb 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
(use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-   : gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+emit_insn (
+  gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
+  else
+emit_insn (
+  gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
(vec_select:V16QI
  (vec_concat:V32QI
@@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
 (const_int 5) (const_int 21)
 (const_int 6) (const_int 22)
 (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+   (vec_select:V16QI
+ (vec_concat:V32QI
+   (match_operand:V16QI 2 "register_operand" "v")
+

Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2022-08-17 Thread Xionghu Luo via Gcc-patches




On 2022/8/16 14:53, Kewen.Lin wrote:

Hi Xionghu,

Thanks for the updated version of patch, some comments are inlined.

on 2022/8/11 14:15, Xionghu Luo wrote:



On 2022/8/11 01:07, Segher Boessenkool wrote:

On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:

On 2022/8/9 11:01, Kewen.Lin wrote:

I have some concern on those changed "altivec_*_direct", IMHO the suffix
"_direct" is normally to indicate the define_insn is mapped to the
corresponding hw insn directly.  With this change, for example,
altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
misleading.  Maybe we can add the corresponding _direct_le and _direct_be
versions, both are mapped into the same insn but have different RTL
patterns.  Looking forward to Segher's and David's suggestions.


Thanks!  Do you mean same RTL patterns with different hw insn?


A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
instruction, never a vmrglb instead.  Misleading names are an expensive
problem.




Thanks.  Then on LE platforms, if user calls altivec_vmrghw,it will be
expanded to RTL (vec_select (vec_concat (R0 R1 (0 4 1 5))), and
finally matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw".
For BE just strict forward, seems more clear :-), OK for master?


[PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS 
[PR106069]

v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern to be and le with same RTL but different insn.

The native RTL expression for vec_mrghw should be same for BE and LE as
they are register and endian-independent.  So both BE and LE need
generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
    (subreg:V4SI (reg:V16QI 139) 0)
    (subreg:V4SI (reg:V16QI 140) 0))
    [const_int 0 4 1 5]))

Then combine pass could do the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check need only once at ASM generation finally.
ASM would be better due to nested vec_select simplified to simple scalar
load.

Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{64}
Linux(Thanks to Kewen).

gcc/ChangeLog:

 PR target/106069
 * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
 (altivec_vmrghb_direct_be): New pattern for BE.
 (altivec_vmrglb_direct_le): New pattern for LE.
 (altivec_vmrghh_direct): Remove.
 (altivec_vmrghh_direct_be): New pattern for BE.
 (altivec_vmrglh_direct_le): New pattern for LE.
 (altivec_vmrghw_direct_): Remove.
 (altivec_vmrghw_direct__be): New pattern for BE.
 (altivec_vmrglw_direct__le): New pattern for LE.
 (altivec_vmrglb_direct): Remove.
 (altivec_vmrglb_direct_be): New pattern for BE.
 (altivec_vmrghb_direct_le): New pattern for LE.
 (altivec_vmrglh_direct): Remove.
 (altivec_vmrglh_direct_be): New pattern for BE.
 (altivec_vmrghh_direct_le): New pattern for LE.
 (altivec_vmrglw_direct_): Remove.
 (altivec_vmrglw_direct__be): New pattern for BE.
 (altivec_vmrghw_direct__le): New pattern for LE.
 * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
 Adjust.
 * config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

 PR target/106069
 * g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo 
---
  gcc/config/rs6000/altivec.md    | 223 ++--
  gcc/config/rs6000/rs6000.cc |  36 ++--
  gcc/config/rs6000/vsx.md    |  24 +--
  gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++
  4 files changed, 305 insertions(+), 98 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..78245f470e9 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb"
     (use (match_operand:V16QI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-    : gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT 
(17),
+  GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19),
+  GEN_INT (4), GEN_INT (20), GEN_INT (5),