[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-15 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

Vineet Gupta  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #22 from Vineet Gupta  ---
Fixed for gcc-14.

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #21 from CVS Commits  ---
The master branch has been updated by Vineet Gupta :

https://gcc.gnu.org/g:d1189ceefc1da1515e039c9cfd2f5a67b5820207

commit r14-5507-gd1189ceefc1da1515e039c9cfd2f5a67b5820207
Author: Juzhe-Zhong 
Date:   Tue Nov 14 19:23:19 2023 -0800

RISC-V: fix vsetvli pass testsuite failure [PR/112447]

Fixes: f0e28d8c1371 ("RISC-V: Fix failed hoist in LICM of vmv.v.x
instruction")

Since above commit, we have following failure:

  FAIL: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions execution test
  FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test

The issue was not the commit but rather it unravelled an issue in the
vsetvli pass.

Here's Juzhe's analysis:

We have 2 types of global vsetvls insertion.
One is earliest fusion of each end of the block.
The other is LCM suggested edge vsetvls.

So before this patch, insertion as follows:

|  (insn 2817 2820 2818 361 (set (reg:SI 67 vtype)
|(unspec:SI [
|(const_int 8 [0x8])
|(const_int 7 [0x7])
|(const_int 1 [0x1]) repeated x2
|] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))
|  (insn 2818 2817 999 361 (set (reg:SI 67 vtype)
|(unspec:SI [
|(const_int 32 [0x20])
|(const_int 1 [0x1]) repeated x3
|] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))

After this patch:

|  (insn 2817 2820 2819 361 (set (reg:SI 67 vtype)
|(unspec:SI [
|(const_int 32 [0x20])
|(const_int 1 [0x1]) repeated x3
|] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))
|  (insn 2819 2817 999 361 (set (reg:SI 67 vtype)
|(unspec:SI [
|(const_int 8 [0x8])
|(const_int 7 [0x7])
|(const_int 1 [0x1]) repeated x2
|] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))

The original insertion order is incorrect.

We should first insert earliest fusion since it is the vsetvls information
already there which was seen by later LCM. We just delay the insertion.
So it should be come before the LCM suggested insertion.

PR target/112447

gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): Insert
local vsetvl info before LCM suggested one.

Tested-by: Patrick O'Neill  # pre-commit-CI
#679
Co-developed-by: Vineet Gupta 

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-14 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #20 from JuzheZhong  ---
This is the reason of this patch:

The whole analysis is correct.But we made a mistake on inserting vsetvls.

This is the story.

We have 2 types of global vsetvls insertion.
One is earliest fusion of each end of the block.
The other is LCM suggested edge vsetvls.

So before this patch, insertion as follows:

(insn 2817 2820 2818 361 (set (reg:SI 67 vtype)
(unspec:SI [
(const_int 8 [0x8])
(const_int 7 [0x7])
(const_int 1 [0x1]) repeated x2
] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
 (nil))
(insn 2818 2817 999 361 (set (reg:SI 67 vtype)
(unspec:SI [
(const_int 32 [0x20])
(const_int 1 [0x1]) repeated x3
] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
 (nil))

After this patch:

(insn 2817 2820 2819 361 (set (reg:SI 67 vtype)
(unspec:SI [
(const_int 32 [0x20])
(const_int 1 [0x1]) repeated x3
] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
 (nil))
(insn 2819 2817 999 361 (set (reg:SI 67 vtype)
(unspec:SI [
(const_int 8 [0x8])
(const_int 7 [0x7])
(const_int 1 [0x1]) repeated x2
] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
 (nil))

The original insertion order is incorrect.

We should first insert earliest fusion since it is the vsetvls information
already there which was seen by later LCM. We just delay the insertion.
So it should be come before the LCM suggested insertion.

Feel free to investigate it with comparing before and after the patch.
Then feel free to send patch with this fix after you understand the reasons.

Thanks.

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-14 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #19 from JuzheZhong  ---
Created attachment 56589
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56589=edit
bug fix patch

Hi,Vineet.

The attachment is the bug fix patch.
I have tested on both RV32 and RV64 C/C++.
No regression with reducing memset-3.c FAIL.

You can verify and send the patch.

Thanks.

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-14 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #18 from JuzheZhong  ---
I know how to fix it.

Testing a candidate patch.

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-14 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #17 from JuzheZhong  ---
The incorrect elimination happens on pre_global_vsetvl_info

You can try simple hack like this:

diff --git a/gcc/config/riscv/riscv-vsetvl.cc
b/gcc/config/riscv/riscv-vsetvl.cc
index 8466b5d019e..65dcf931808 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3135,6 +3135,8 @@ pre_vsetvl::pre_global_vsetvl_info ()
   for (const bb_info *bb : crtl->ssa->bbs ())
 {
   sbitmap d = m_del[bb->index ()];
+  if (bb->index () == 113 || bb->index () == 54)
+continue;
   if (bitmap_count_bits (d) == 0)
continue;


FAIL will be fixed.

So, the idea is that we should investigate why LCM calculation return

m_del to be true on BB 54 and BB 113.

The calculation is done by pre_edge_lcm_avs

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-14 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #16 from JuzheZhong  ---
The victim should be these 2 pieces of codes:

.L20:
lbu a1,0(a3)
li  t1,97
bne a1,t1,.L21
lbu t1,1(a3)
bne t1,a1,.L21
lbu a1,2(a3)
bne a1,t1,.L21
lbu t1,3(a3)
bne t1,a1,.L21
lbu a1,4(a3)
bne a1,t1,.L21
lbu t1,5(a3)
bne t1,a1,.L21
lbu a1,6(a3)
bne a1,t1,.L21
lbu a3,7(a3)
bne a3,a1,.L21
lui a3,%hi(A)
lbu a3,%lo(A)(a3)
mv  t1,a5
mv  a1,a4
bltua4,t3,.L24
mv  t1,t4
addia1,a4,-8
vmv.v.x v2,a3
vse8.v  v2,0(a2)
.L24:

.L29:
lbu t1,0(a1)
li  t6,97
bne t1,t6,.L21
lbu t6,1(a1)
bne t6,t1,.L21
lbu t1,2(a1)
bne t1,t6,.L21
lbu t6,3(a1)
bne t6,t1,.L21
lbu t1,4(a1)
bne t1,t6,.L21
lbu t6,5(a1)
bne t6,t1,.L21
lbu t1,6(a1)
bne t1,t6,.L21
lbu a1,7(a1)
bne a1,t1,.L21
mv  t1,a5
mv  a1,a4
bltua4,t3,.L31
li  t6,66
mv  t1,t4
addia1,a4,-8
vmv.v.x v2,t6
vse8.v  v2,0(a2)
.L31:

They are located on BB 54 and BB 113. Their VSETVLs should not be eliminiated.

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-14 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #15 from Vineet Gupta  ---
(In reply to JuzheZhong from comment #14)
> Let me give you some guide which helps you to dig into the problem.
> 
> First, reduce the case as follows:

Did your msg get truncated or pressed send too soon ?

Because the reduced test you pasted is exactly what I uploaded to the bug and I
can't reduce it any further.

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-14 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #14 from JuzheZhong  ---
Let me give you some guide which helps you to dig into the problem.

First, reduce the case as follows:

#include 

void abort (void);
void exit (int);

#ifndef MAX_OFFSET
#define MAX_OFFSET (sizeof (long long))
#endif

#ifndef MAX_COPY
#define MAX_COPY 15
#endif

#ifndef MAX_EXTRA
#define MAX_EXTRA (sizeof (long long))
#endif

#define MAX_LENGTH (MAX_OFFSET + MAX_COPY + MAX_EXTRA)

static union {
  char buf[MAX_LENGTH];
  long long align_int;
  long double align_fp;
} u;

char A = 'A';

void reset ()
{
  int i;

  for (i = 0; i < MAX_LENGTH; i++)
u.buf[i] = 'a';
}

void check (int off, int len, int ch)
{
  char *q;
  int i;

  q = u.buf;
  for (i = 0; i < off; i++, q++)
if (*q != 'a')
  abort ();

  for (i = 0; i < len; i++, q++)
if (*q != ch)
  abort ();

  for (i = 0; i < MAX_EXTRA; i++, q++)
if (*q != 'a')
  abort ();
}

int main ()
{
  int len;
  char *p;

  /* off == 0 */
  for (len = 0; len < MAX_COPY; len++)
{
  reset ();

  p = memset (u.buf, '\0', len);
  if (p != u.buf) abort ();
  check (0, len, '\0');

  p = memset (u.buf, A, len);
  if (p != u.buf) abort ();
  check (0, len, 'A');

  p = memset (u.buf, 'B', len);
  if (p != u.buf) abort ();
  check (0, len, 'B');
}

  /* off == 1 */
  for (len = 0; len < MAX_COPY; len++)
{
  reset ();

  p = memset (u.buf+1, '\0', len);
  if (p != u.buf+1) abort ();
  check (1, len, '\0');

  p = memset (u.buf+1, A, len);
  if (p != u.buf+1) abort ();
  check (1, len, 'A');

  p = memset (u.buf+1, 'B', len);
  if (p != u.buf+1) abort ();
  check (1, len, 'B');
}

  exit (0);
}

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-14 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #13 from Vineet Gupta  ---
Then I don't know where the problem actually is ?

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-14 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #12 from JuzheZhong  ---
The merge is correct here.

vmv.x.s demand SEW = 32 wheras vle/vse demand RATIO = 16 (e8mf2)

So, to make vsetvl valid for both of them, the vtype should be sew = 32 and
lmul = M2 (32 / 16 = 2).

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-14 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #11 from Vineet Gupta  ---
As a hack I commented out set_delete() to see what the extraneous vsetvli
would have been.

```
  .L36:
   # bb 3: start of outer loop: off 0

vsetvli zero,zero,e8,mf2,ta,ma # insn 2915
vse8.v  v1,0(t3)   # insn 2874, bb 3
vse8.v  v1,0(t6)
vse8.v  v1,0(s0)
  ...

# bb 181

addia4,a4,1
li  a7,15
bne a4,a7,.L36 # insn 1082

   # bb 182: start of outer loop: off 1

vsetvli zero,zero,e32,mf2,ta,ma# insn 2919
vmv.x.s a3,v1  # insn 1858
vsetvli zero,zero,e16,mf2,ta,ma
sw  a3,8(sp)
vmv.x.s a3,v1
```

Essentially phase 2 is fusing vsetvl info for insn 2874 and insn 1858
But the fused info doesn't seem right. 

vsetvli zero,zero,e32,m2,ta,ma
j   .L36

Manually modifying it to orig value fixes the test.

vsetvli zero,zero,e8,mf2,ta,ma
j   .L36

Phase 2 logs

```
  Try lift up 1.

  earliest:
Edge(bb 0 -> bb 2): n_bits = 13, set = {0 }
Edge(bb 62 -> bb 63): n_bits = 13, set = {4 }
Edge(bb 180 -> bb 181): n_bits = 13, set = {8 }
Edge(bb 181 -> bb 3): n_bits = 13, set = {2 }

Fuse curr info since prev info compatible with it:
  prev_info: VALID (insn 1858, bb 181)   <-- due to Edge(bb 181 -> bb
3)
Demand fields: demand_sew_only
SEW=32, VLMUL=mf2, RATIO=64, MAX_SEW=64
TAIL_POLICY=agnostic, MASK_POLICY=agnostic
AVL=(nil)
VL=(nil)
  curr_info: VALID (insn 2874, bb 3)
Demand fields: demand_ratio_only demand_avl
SEW=8, VLMUL=mf2, RATIO=16, MAX_SEW=64
TAIL_POLICY=agnostic, MASK_POLICY=agnostic
AVL=(const_int 8 [0x8])
VL=(nil)

  prev_info after fused: VALID (insn 1858, bb 181)
Demand fields: demand_sew_lmul demand_avl
SEW=32, VLMUL=m2, RATIO=16, MAX_SEW=64
TAIL_POLICY=agnostic, MASK_POLICY=agnostic
AVL=(const_int 8 [0x8])
VL=(nil)
```

This fuse in turn is happening from 

DEF_SEW_LMUL_RULE (sew_only, ratio_only, sew_lmul,
   next_ratio_valid_for_prev_sew_p, always_false,
   modify_lmul_with_next_ratio)

I'm not really sure if the merge callback here is correct.

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #10 from JuzheZhong  ---
(In reply to Vineet Gupta from comment #9)
> (In reply to JuzheZhong from comment #7)
> > Oh. I missed it:
> > 
> >   vmv.v.x   v2,s0
> > vse8.v  v2,0(a5)
> > 
> > Leave it to me today. It should be simple fix.
> > 
> > Thanks for report it.
> 
> Can I request you to let me continue to debug and fix it. I want to
> familiarize myself with the vsetv pass and this seems like a good
> opportunity to do so considering you think the fix is not hard.

OK. Thanks.

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-08 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

Vineet Gupta  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-11-08
 Status|UNCONFIRMED |ASSIGNED

--- Comment #9 from Vineet Gupta  ---
(In reply to JuzheZhong from comment #7)
> Oh. I missed it:
> 
>   vmv.v.x v2,s0
>   vse8.v  v2,0(a5)
> 
> Leave it to me today. It should be simple fix.
> 
> Thanks for report it.

Can I request you to let me continue to debug and fix it. I want to familiarize
myself with the vsetv pass and this seems like a good opportunity to do so
considering you think the fix is not hard.

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #8 from JuzheZhong  ---
Could you continue debug more cases ?


FAIL: gcc.c-torture/execute/pr89369.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr89369.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr89369.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test
FAIL: gcc.c-torture/execute/pr89369.c   -O3 -g  execution test
FAIL: gcc.dg/pr96239.c execution test
FAIL: gcc.dg/vshift-5.c execution test
FAIL: gcc.dg/torture/pr61346.c   -O2  execution test
FAIL: gcc.dg/torture/pr61346.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
FAIL: gcc.dg/torture/pr61346.c   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.dg/torture/pr61346.c   -O3 -g  execution test

They are RV32 system. memset issue I will fix it soon today.

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #7 from JuzheZhong  ---
Oh. I missed it:

vmv.v.x v2,s0
vse8.v  v2,0(a5)

Leave it to me today. It should be simple fix.

Thanks for report it.

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-08 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #6 from Vineet Gupta  ---
I have debugged this by single stepped in qemu 

when the test fails (first loop for offset 0, iteration 8)

The last VSETVLI is this one, 

   0x10a3e   0d107057  vsetvli  zero,zero,e32,m2,ta,ma
   0x10a42   j  0x10666

We eventually hit a VMV.v.x. which creates invalid pattern due to e32.

   (gdb) info reg vtype
   vtype  0xd1  209 # SEW = 010’b / e32, LMUL = 001’b / m2
   (gdb) info reg vl
   vl 0x8   8
   (gdb) info reg a0
   a0 0x41  65

   vmv.v.x  v2,a0

  (gdb) info reg v2
  v2 {q = {0x41004100410041} 
  (gdb) info reg v3
  v2 {q = {0x41004100410041}

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #5 from JuzheZhong  ---
I don't think VSETVL is wrong.


vsetivlizero,8,e8,mf2,ta,ma
sd  ra,120(sp)
vmv.x.s a1,v1
...
.L36:
vse8.v
...
vsetivlizero,8,e32,m2,ta,ma
j.L36

Both e8mf2 and e32m2 are valid for vse8.v since they have same ratio = 16

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-08 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #4 from Vineet Gupta  ---
Created attachment 56541
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56541=edit
asm output nok

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-08 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #3 from Vineet Gupta  ---
Created attachment 56540
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56540=edit
asm output ok

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-08 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #2 from Vineet Gupta  ---
Created attachment 56539
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56539=edit
manually reduced src

[Bug target/112447] risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3

2023-11-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112447

--- Comment #1 from JuzheZhong  ---
Could you share more assembly ?