[Bug tree-optimization/115073] New: RISC-V: Gimple fold not honor C[LT]Z_DEFINED_VALUE_AT_ZERO

2024-05-13 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115073

            Bug ID: 115073
           Summary: RISC-V: Gimple fold not honor C[LT]Z_DEFINED_VALUE_AT_ZERO
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kito at gcc dot gnu.org
  Target Milestone: ---
            Target: riscv64-unknown-linux-gnu

# What's up?

A loop induction variable is initialized from __builtin_ctz (x), the loop
bound is 32, and the increment is one; GCC turns this into an infinite loop
when x is 0.

However, RISC-V defines CTZ_DEFINED_VALUE_AT_ZERO as 32 for SImode, so it's
not UB IMO, but it seems gimple-range-op.cc and match.pd do not handle that.

# Command to reproduce
```
$ riscv64-unknown-elf-gcc -O3 -march=rv64gc_zba_zbb_zbc
```

# Testcase
```c
void f();
void foo(unsigned int id, unsigned int x)
{
  for (unsigned int idx = __builtin_ctz(x); idx < 32; idx++) {
    f();
  }
}
```

# Asm output with comment:

```
foo:
        addi    sp,sp,-32
        sd      s0,16(sp)
        sd      s1,8(sp)
        sd      ra,24(sp)
        ctzw    s0,a1         # s0 is 32 if a1 is 0
        li      s1,32
.L2:
        addiw   s0,s0,1       # then s0 becomes 33 here
        call    f
        bne     s0,s1,.L2     # compare with 32, which never terminates
        ld      ra,24(sp)
        ld      s0,16(sp)
        ld      s1,8(sp)
        addi    sp,sp,32
        jr      ra
```
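
The behaviour in the annotated listing can be modelled in plain C. This is only a sketch for illustration: `ctz32_defined_at_zero`, `calls_in_source_loop` and `calls_in_emitted_loop` are hypothetical helper names, the value 32 at zero is the SImode value mentioned above, and the emitted loop is capped at `cap` calls so that the model itself terminates.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ctzw: count trailing zeros, yielding 32 for a zero input
   (the SImode value at zero discussed above).  */
unsigned ctz32_defined_at_zero(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Number of calls to f() made by the loop as written in the testcase.  */
unsigned calls_in_source_loop(uint32_t x)
{
    unsigned calls = 0;
    for (unsigned idx = ctz32_defined_at_zero(x); idx < 32; idx++)
        calls++;
    return calls;
}

/* Number of calls made by the emitted loop shape (increment, call,
   exit when idx == 32), stopped after `cap` calls.  */
unsigned calls_in_emitted_loop(uint32_t x, unsigned cap)
{
    unsigned idx = ctz32_defined_at_zero(x);
    unsigned calls = 0;
    assert(cap > 0);
    do {
        idx++;
        calls++;
    } while (idx != 32 && calls < cap);
    return calls;
}
```

For any nonzero `x` the two functions agree. For `x == 0` the source loop makes no calls, while the modelled emitted loop only stops at the cap, which corresponds to the non-terminating `bne` in the listing.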


# What I tried?

I tried calling CTZ_DEFINED_VALUE_AT_ZERO in gimple-range-op.cc, but that does
not seem to help for this test case; then I found the code gets screwed up by
match.pd during the ccp pass.

It applies one of the CTZ simplifications in match.pd:

```
 (for op (eq ne)
  (simplify
   /* __builtin_ctz (x) == C -> (x & ((1 << (C + 1)) - 1)) == (1 << C).  */
   (op (ctz:s @0) INTEGER_CST@1)
    (with { tree type0 = TREE_TYPE (@0);
            int prec = TYPE_PRECISION (type0);
          }
     (if (prec <= MAX_FIXED_MODE_SIZE)
      (if (tree_int_cst_sgn (@1) < 0 || wi::to_widest (@1) >= prec)
       { constant_boolean_node (op == EQ_EXPR ? false : true, type); }
       (op (bit_and @0 { wide_int_to_tree (type0,
                                           wi::mask (tree_to_uhwi (@1) + 1,
                                                     false, prec)); })
           { wide_int_to_tree (type0,
                               wi::shifted_mask (tree_to_uhwi (@1), 1,
                                                 false, prec)); })))
```
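
To see where this simplification and a defined value at zero disagree, here is a small C model of the fold for a 32-bit operand. It is a sketch, not GCC code: `ctz32_defined_at_zero` and `folded_ctz_eq` are hypothetical names, and 32 is assumed as the value at zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { PREC = 32 };   /* precision of the operand type */

/* ctz with a defined result of 32 for a zero input (assumption).  */
unsigned ctz32_defined_at_zero(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Model of what the pattern turns `ctz (x) == c` into.  */
bool folded_ctz_eq(uint32_t x, int c)
{
    if (c < 0 || c >= PREC)
        return false;                                   /* folded to a constant */
    uint64_t low_mask = (UINT64_C(1) << (c + 1)) - 1;   /* bits 0..c */
    uint64_t only_bit = UINT64_C(1) << c;               /* bit c alone */
    return (x & low_mask) == only_bit;
}
```

For nonzero `x` the folded form agrees with the direct comparison. For `x == 0` and `c == 32`, `ctz32_defined_at_zero(0) == 32` is true while `folded_ctz_eq(0, 32)` is false, which is the mismatch described in this report.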

Then I found that this pattern used to check CTZ_DEFINED_VALUE_AT_ZERO
(g:75f8900159133ce069ef1d2edf3b67c7bc82e305) until
g:7383cb56e1170789929201b0dadc156888928fdd, and I realized that is because
CLZ_DEFINED_VALUE_AT_ZERO does not really work well here.

So I did an aggressive experiment: convert __builtin_ctz to IFN_CTZ with a
second operand (ideally taken from C[LT]Z_DEFINED_VALUE_AT_ZERO). It works *IF*
the backend provides patterns for ctz, but does NOT work when the backend does
not provide them. That could be a problem for RISC-V, since ctz is not included
in the baseline ISA for RISC-V.

It might be arguable what should happen if the target doesn't have a ctz/clz
pattern but the backend still provides C[LT]Z_DEFINED_VALUE_AT_ZERO; I think
the middle-end optimizations should still honor it?

Another thought is to convert that into a target macro to resolve the issue
described in g:75f8900159133ce069ef1d2edf3b67c7bc82e305?


# Aggressive experiment:
```
diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index 494da49791d..d84469a6dca 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -858,7 +858,16 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p ATTRIBUTE_UNUSED,
  c, CALL_EXPR_ARG (*expr_p, 1));
return GS_OK;
  }
-   break;
+  if (fndecl && fndecl_built_in_p(fndecl, BUILT_IN_CTZ) &&
+  call_expr_nargs(*expr_p) == 1) {
+tree a = save_expr(CALL_EXPR_ARG(*expr_p, 0));
+*expr_p = build_call_expr_internal_loc(
+EXPR_LOCATION(*expr_p), IFN_CTZ, TREE_TYPE(a), 2, a,
+build_int_cst(TREE_TYPE(a), 32));
+return GS_OK;
+  }
+
+break;
   }

 default:;
```

[Bug target/114988] RISC-V: ICE in intrinsic __riscv_vfwsub_wf_f32mf2

2024-05-09 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114988

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #3 from Kito Cheng  ---
> Can this fix backport to GCC-14 ?

Sure, GCC 14.1 has been released, so the branch is open to accept fixes now :)

[Bug target/114747] [13 only] [RISC-V RVV] Wrong SEW set for mixed-size intrinsics

2024-05-06 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114747

Kito Cheng  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Kito Cheng  ---
Fixed on gcc 13 branch, and GCC 13.3 will have the fix :)

[Bug target/113095] [13 Regression] RISC-V: movcc no longer used for coremark crc functions with -mtune=sifive-7-series

2024-04-30 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113095

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #8 from Kito Cheng  ---
Fixed on both trunk and GCC 13

[Bug target/111234] [13] RISC-V: ICE in vsetvl pass

2024-04-29 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111234

Kito Cheng  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Kito Cheng  ---
Backported to GCC 13

[Bug c/114885] RISC-V: ICE of unrecog insn when graphite for both the c/c++ and fortran

2024-04-29 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114885

Kito Cheng  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 CC||kito at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2024-04-29

--- Comment #1 from Kito Cheng  ---
I can reproduce on my side

[Bug target/114172] [13 only] RISC-V: ICE with riscv rvv VSETVL intrinsic

2024-04-24 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114172

Kito Cheng  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Kito Cheng  ---
Fixed; GCC 13.3 will contain the fix, and that should be released in the near
future :)

[Bug target/111935] gcc ICE with risc-v vector intrinsics

2024-04-24 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111935

Kito Cheng  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
 CC||kito at gcc dot gnu.org

--- Comment #5 from Kito Cheng  ---
Checked that this has been fixed on trunk and the GCC 13 branch

[Bug target/111234] [13] RISC-V: ICE in vsetvl pass

2024-04-24 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111234

--- Comment #4 from Kito Cheng  ---
Fixed on trunk, but still ICE on 13

[Bug target/114714] [RISC-V][RVV] ICE: insn does not satisfy its constraints (postreload)

2024-04-15 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2024-04-15
 Status|UNCONFIRMED |NEW

--- Comment #3 from Kito Cheng  ---
Reduced case, not the final result, but the reduction has already been running for 8+ hours...
```
typedef int a;
typedef short b;
typedef unsigned c;
template < typename > using e = unsigned;
template < typename > void ab();
#pragma riscv intrinsic "vector"
template < typename f, int, int ac > struct g {
  using i = f;
  template < typename m > using j = g< m, 0, ac >;
  using k = g< i, 1, ac - 1 >;
  using ad = g< i, 1, ac + 1 >;
};
namespace ae {
struct af {
  using h = g< short, 6, 0 < 3 >;
};
struct ag {
  using h = af::h;
};
} template < typename, int > using ah = ae::ag::h;
template < class ai > using aj = typename ai::i;
template < class i, class ai > using j = typename ai::j< i >;
template < class ai > using ak = j< e< ai >, ai >;
template < class ai > using k = typename ai::k;
template < class ai > using ad = typename ai::ad;
template < a ap > vuint16m1_t ar(g< b, ap, 0 >, b);
template < a ap > vuint16m2_t ar(g< b, ap, 1 >, b);
template < a ap > vuint32m2_t ar(g< c, ap, 1 >, c);
template < a ap > vuint32m4_t ar(g< c, ap, 2 >, c);
template < class ai > using as = decltype(ar(ai(), aj< ai >()));
template < class ai > as< ai > at(ai);
namespace ae {
template < int ap > vuint32m4_t au(g< c, ap, 1 + 1 >, vuint32m2_t l) {
  return __riscv_vlmul_ext_v_u32m2_u32m4(l);
}
} template < int ap > vuint32m2_t aw(g< c, ap, 1 >, vuint16m1_t l) {
  return __riscv_vzext_vf2_u32m2(l, 0);
}
namespace ae {
vuint32m4_t ax(vuint32m4_t, vuint32m4_t, a);
}
template < class ay, class an > as< ay > az(ay ba, an bc) {
  an bb;
  return ae::ax(ae::au(ba, bc), ae::au(ba, bb), 2);
}
template < class bd > as< bd > be(bd, as< ad< bd > >);
namespace ae {
template < class bh, class bi > void bj(bh bk, bi bl) {
  ad< decltype(bk) > bn;
  az(bn, bl);
}
} template < int ap, int ac, class bp, class bq >
void br(g< c, ap, ac > bk, bp, bq bl) {
  ae::bj(bk, bl);
}
template < class ai > using bs = decltype(at(ai()));
struct bt;
template < int ac = 1 > class bu {
public:
  template < typename i > void operator()(i) {
ah< i, ac > d;
bt()(i(), d);
  }
};
struct bt {
  template < typename bv, class bf > void operator()(bv, bf bw) {
using bx = bv;
ak< bf > by;
k< bf > bz;
using bq = bs< decltype(by) >;
using bp = bs< decltype(bw) >;
bp cb;
ab< bx >();
for (;;) {
  bp cc;
  bq bl = aw(by, be(bz, cc));
  br(by, cb, bl);
}
  }
};
void d() { bu()(b()); }

```

[Bug target/114130] [11 Regression] RISC-V: `__atomic_compare_exchange` does not use sign-extended value for RV64

2024-04-12 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114130

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #6 from Kito Cheng  ---
Fixed on trunk and also backported to the 11~13 branches.

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-08 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #4 from Kito Cheng  ---
Reduced case:
```c
typedef long c;
#pragma riscv intrinsic "vector"
template  struct d {};
struct e {
  using f = d<0>;
};
struct g {
  using f = e::f;
};
template  using h = g::f;
template  long k(d);
vbool16_t j(vuint64m4_t a) {
  c b;
  return __riscv_vmsne_vx_u64m4_b16(a, b, k(h()));
}

```

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-08 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

Kito Cheng  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-04-08

--- Comment #1 from Kito Cheng  ---
Confirmed, and trying to reduce the testcase.

[Bug target/106530] RISCV documentation for -march= is very lacking

2024-02-16 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106530

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Kito Cheng  ---
g:19260a04ba6f75b1fae52afab50dcb43d44eb259 and
g:5a22bb250d8f4ad239e12fea9828c18a0aa23e38 should address this issue :)

[Bug target/109349] riscv: Add --print-supported-extensions

2024-02-16 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109349

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #6 from Kito Cheng  ---
Implemented on trunk now :)

[Bug target/113742] ICE: RTL check: expected elt 1 type 'i' or 'n', have 'e' (rtx set) in riscv_macro_fusion_pair_p, at config/riscv/riscv.cc:8416 with -O2 -finstrument-functions -mtune=sifive-p600-se

2024-02-04 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113742

--- Comment #1 from Kito Cheng  ---
Thanks, forwarded and assigned this to our (SiFive) engineer :)

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-19 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #23 from Kito Cheng  ---
> I am considering whether we should disable LICM for RISC-V by default if 
> vector is enabled ?

That will cause regressions for other programs, and may also hurt programs
that are not vectorized but benefit from LICM.

[Bug target/113240] Use wrong rule to pass fixed-length(size<=2*XLEN) vector argument

2024-01-04 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113240

--- Comment #6 from Kito Cheng  ---
> There needs to be a -Wabi warning for this too for the change between 
> versions.

This bug only happened on trunk, and GCC 13 is OK, so I think it's not the
case?

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-12 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #20 from Kito Cheng  ---
```
.L15:
li  a3,9
lui a4,%hi(s)
sw  a3,%lo(j)(t2)
sh  a5,%lo(s)(a4)            <-- a4 holds the address of s
beq t0,zero,.L42
sw  t5,8(t4)
vsetvli zero,a4,e8,m8,ta,ma  <-- a4 used as AVL
```

[Bug target/112817] RISC-V: RVV: provide a preprocessor macro for VLS codegen

2023-12-05 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112817

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #8 from Kito Cheng  ---
This topic was raised at the last RISC-V GCC sync meeting, and one action item
for me is to chat with JuzheZhong about the -mrvv-vector-bits=zvl /
__riscv_v_fixed_vlen / riscv_rvv_vector_bits stuff.

[Bug target/112478] riscv: asm clobbers not honored

2023-11-16 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112478

Kito Cheng  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Kito Cheng  ---
Fixed on trunk :)

[Bug target/112109] Missing riscv vectorized strcmp (and other) expanders

2023-11-15 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112109

--- Comment #1 from Kito Cheng  ---
Just a note:

I would like to introduce `-mstringop-strategy=`, `-mmemcpy-strategy=` and
`-mmemset-strategy=` options to control the behavior, like x86.

The possible option values I have in mind are:

- auto: current status, use scalar or vector
- libcall: always fall back to a lib call
- scalar: only scalar
- vector: only vector

I guess we may need a few more options to control some details, but those could
be added as --param later.

[Bug target/112537] Is there a way to disable cpymem pass for rvv

2023-11-14 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112537

--- Comment #11 from Kito Cheng  ---
It's out of scope for auto-vectorization, so I would suggest adding something
like `-mstringop-strategy=*` or `-mmemcpy-strategy=*` (from x86) or
`-param=riscv-mops-memcpy-size-threshold=` (from aarch64).

Personally I prefer the x86 approach.

[Bug target/112537] Is there a way to disable cpymem pass for rvv

2023-11-14 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112537

--- Comment #8 from Kito Cheng  ---
That reminds me we may need an option like -mgeneral-regs-only in aarch64,
and also as a target attribute.

BTW, clang has a generic option called -mno-implicit-float that can do a
similar thing.

[Bug target/112478] riscv: asm clobbers not honored

2023-11-14 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112478

--- Comment #8 from Kito Cheng  ---
Proposed fix:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636466.html

[Bug target/112527] RVV integer vector instructions generated with rv64gc_zvfh

2023-11-14 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112527

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #1 from Kito Cheng  ---
Just a boring supplement on the damn arch string, since I guess not everyone
knows the rule well:

zvfh requires zve32f and zfhmin,

which means rv64gc_zvfh is equivalent to rv64gc_zvfh_zfhmin_zve32f,

so rv64gc_zvfh has vector, but only zve32f.
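
The rule above can be sketched as a tiny implication table. This is illustrative C, not GCC's actual arch-string handling: the table holds only the one rule stated in this comment, and `implied_by` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct implication {
    const char *ext;          /* extension named in -march */
    const char *implies[3];   /* NULL-terminated list of implied extensions */
};

/* Only the rule quoted above; a real toolchain knows many more.  */
static const struct implication rules[] = {
    { "zvfh", { "zve32f", "zfhmin", NULL } },
};

/* Return nonzero if `ext` directly implies `candidate` according to `rules`.  */
int implied_by(const char *ext, const char *candidate)
{
    assert(ext != NULL && candidate != NULL);
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (strcmp(rules[i].ext, ext) != 0)
            continue;
        for (size_t j = 0; rules[i].implies[j] != NULL; j++)
            if (strcmp(rules[i].implies[j], candidate) == 0)
                return 1;
    }
    return 0;
}
```

With this table, `implied_by("zvfh", "zve32f")` is nonzero while `implied_by("zvfh", "v")` is zero, which is the point of the comment: zvfh brings in a vector subset, not the full `v` extension.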

[Bug target/112478] riscv: asm clobbers not honored

2023-11-13 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112478

Kito Cheng  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 CC||kito at gcc dot gnu.org
   Last reconfirmed||2023-11-14
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |kito at gcc dot gnu.org

--- Comment #6 from Kito Cheng  ---
Oh, I guess I know what happened. I was confused by the commit you referred to
(but it is the root cause, as you pointed out!) since I thought it might be
related to far jumps, but... actually not. The problem is what you describe in
the title, and it can be demonstrated by the following small program:

```c
void foo() {
asm volatile("# " : ::"ra");
}

```

Before that commit:
```asm
foo:
addi    sp,sp,-16
sd  ra,8(sp)
 #APP
# 2 "x.c" 1
# 
# 0 "" 2
 #NO_APP
ld  ra,8(sp)
addi    sp,sp,16
jr  ra

```

After that commit:
```asm
foo:
.LFB0:
.cfi_startproc
#APP
# 2 "x.c" 1
# 
# 0 "" 2
#NO_APP
ret
```

But why? Because ra accidentally became a caller-saved register with the
following change:

https://github.com/gcc-mirror/gcc/commit/71f906498ada9ec2780660b03bd6e27a93ad350c#diff-4083cffa971a940af1d435359a45dbfd4d5934384275b0ae5e0c71dece5fd866R331

So we no longer save it in the prologue and restore it in the epilogue...
anyway, I will take this and send a patch to fix it soon.

[Bug target/112433] RISC-V GCC-15 feature: Split register allocation into RVV and non-RVV, and make vsetvl PASS run between them

2023-11-13 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112433

--- Comment #4 from Kito Cheng  ---
Yeah, the 3 major goals in LLVM are improving scheduling, partial spilling and
re-materialization, but none of those points is an issue for RISC-V GCC :P

Ref:
https://docs.google.com/presentation/d/1BOYNYKe1T-u3Q5HXRrcObLUkdKSPASmnuQTkALvJXto/edit

[Bug target/112438] RISC-V: Wrong auto-vectorization on induction variable of RVV

2023-11-08 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #12 from Kito Cheng  ---
Oh, yeah, you are right: it already takes a5 for the splat, so it's right, and
as you said it must be VLMAX, unless AVL propagation applies to both the splat
and the following vadd.vv.

[Bug target/112438] RISC-V: Wrong auto-vectorization on induction variable of RVV

2023-11-08 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #10 from Kito Cheng  ---
(In reply to JuzheZhong from comment #9)
> I have a draft patch to fix it:
> 
> foo:
>   ble a0,zero,.L5
>   vsetvli a5,zero,e32,m1,ta,ma
>   vid.v   v2
> .L3:
>   vsetvli a5,a0,e32,m1,ta,ma
>   slli    a4,a5,2
>   vle32.v v3,0(a1)
>   sub a0,a0,a5
>   vadd.vv v1,v2,v3
>   vse32.v v1,0(a2)
>   add a1,a1,a4
>   add a2,a2,a4
>   vsetvli a4,zero,e32,m1,ta,ma
>   vmv.v.x v1,a5

This splat must be under "vsetvli a5,a0,e32,m1,ta,ma" rather than
"vsetvli a4,zero,e32,m1,ta,ma".

>   vadd.vv v2,v2,v1
>   bne a0,zero,.L3
> .L5:
>   ret
> 
> Seems correct ?

[Bug target/112438] RISC-V: Wrong auto-vectorization on induction variable of RVV

2023-11-08 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #8 from Kito Cheng  ---
> Oh. I understand it now. I think it's a bug.
> 
> And.. I just take a look at my internal LLVM...
> Also has same issue
> 
> I think we need to adapt the Gimple IR here:
> 
>   _35 = .SELECT_VL (ivtmp_33, POLY_INT_CST [4, 4]);
>   _21 = vect_vec_iv_.6_22 + { POLY_INT_CST [4, 4], ... };
> 
> change it into:
> 
>   _35 = .SELECT_VL (ivtmp_33, POLY_INT_CST [4, 4]);
>   _21 = vect_vec_iv_.6_22 + _35;

Yeah, so... I guess the original report is still valid; it just brings up
another potential bug :P

Personally I really hate that magic constraint for vl, but it's just too
late.

[Bug target/112438] RISC-V: Failed to AVL propagation through induction variable

2023-11-07 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #6 from Kito Cheng  ---
The key is that the splat of VLMAX needs to move into the loop body, but AVL
propagation should still be possible:

```
foo(int, int*, int*):
ble a0,zero,.L5
csrr    a5,vlenb
srli    a5,a5,2
vsetvli a3,zero,e32,m1,ta,ma
vid.v   v2
.L3:
vsetvli a5,a0,e32,m1,ta,ma
slli    a4,a5,2
vle32.v v1,0(a1)
sub a0,a0,a5
vadd.vv v1,v1,v2
vse32.v v1,0(a2)
add a1,a1,a4
vmv.v.x v4,a5   # Moved here: splat vl (a5) rather than VLMAX
vsetvli a5,zero,e32,m1,ta,ma   # ---> redundant

add a2,a2,a4
vadd.vv v2,v2,v4
bne a0,zero,.L3
.L5:
ret
```

[Bug target/112438] RISC-V: Failed to AVL propagation through induction variable

2023-11-07 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #5 from Kito Cheng  ---
Assume:

VLEN = 128, n = 5, and *in is {0, 0, 0, 0, 0},
so VLMAX = 4 for e32m1.

It can run with vl = 4 for the first iteration and vl = 1 for the second
iteration.

But it could also be something like vl = 3 for the first iteration and vl = 2
for the second iteration. OK, let's run the code with that:

foo(int, int*, int*):
        ble     a0,zero,.L5
        csrr    a5,vlenb
        srli    a5,a5,2
        vsetvli a3,zero,e32,m1,ta,ma
        vmv.v.x v4,a5                 # v4 = {4, 4, 4, 4}
        vid.v   v2                    # v2 = {0, 1, 2, 3}
.L3:
        vsetvli a5,a0,e32,m1,ta,ma    # first iteration gets vl = 3
        slli    a4,a5,2
        vle32.v v1,0(a1)              # v1 = {0, 0, 0}
        sub     a0,a0,a5
        vadd.vv v1,v1,v2              # v1 = {0, 0, 0} + {0, 1, 2}
        vse32.v v1,0(a2)              # out = {0, 1, 2, 0, 0}
        add     a1,a1,a4
        vsetvli a5,zero,e32,m1,ta,ma
        add     a2,a2,a4
        vadd.vv v2,v2,v4              # v2 = {0, 1, 2, 3} + {4, 4, 4, 4}
                                      #    = {4, 5, 6, 7}
        bne     a0,zero,.L3
.L5:
        ret

OK, let's run the second iteration:

.L3:
        vsetvli a5,a0,e32,m1,ta,ma    # second iteration gets vl = 2
        slli    a4,a5,2
        vle32.v v1,0(a1)              # v1 = {0, 0}
        sub     a0,a0,a5
        vadd.vv v1,v1,v2              # v1 = {0, 0} + {4, 5}
        vse32.v v1,0(a2)              # out = {0, 1, 2, 4, 5}
        add     a1,a1,a4
        vsetvli a5,zero,e32,m1,ta,ma
        add     a2,a2,a4
        vadd.vv v2,v2,v4              # v2 = {4, 5, 6, 7} + {4, 4, 4, 4}
                                      #    = {8, 9, 10, 11}
        bne     a0,zero,.L3

And then you will get {0, 1, 2, 4, 5} rather than {0, 1, 2, 3, 4}.
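
The walkthrough above can be replayed with a scalar model. This is a minimal sketch under the same assumptions (VLEN = 128, e32, m1, so VLMAX = 4); `simulate_loop` is a hypothetical name, and the per-iteration vl values are passed in by the caller rather than chosen by hardware.

```c
#include <assert.h>
#include <stddef.h>

enum { VLMAX = 4 };   /* VLEN = 128, e32, m1 */

/* Scalar model of the loop above.  `vls` lists the vl returned by each
   vsetvli; `step_by_vl` selects whether the induction vector advances
   by VLMAX (as in the code above) or by the vl actually used.  */
void simulate_loop(const int *in, int *out, const size_t *vls,
                   size_t iterations, int step_by_vl)
{
    int iv[VLMAX] = {0, 1, 2, 3};              /* vid.v v2 */
    size_t pos = 0;

    for (size_t it = 0; it < iterations; it++) {
        size_t vl = vls[it];
        assert(vl >= 1 && vl <= VLMAX);
        for (size_t i = 0; i < vl; i++)        /* vle32.v, vadd.vv, vse32.v */
            out[pos + i] = in[pos + i] + iv[i];
        pos += vl;

        int step = step_by_vl ? (int)vl : VLMAX;
        for (size_t i = 0; i < VLMAX; i++)     /* vadd.vv v2,v2,v4 */
            iv[i] += step;
    }
}
```

With an all-zero input of length 5, the vl sequence {4, 1} gives {0, 1, 2, 3, 4} either way, while {3, 2} gives {0, 1, 2, 4, 5} when stepping by VLMAX and {0, 1, 2, 3, 4} when stepping by vl.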

[Bug target/112438] RISC-V: Failed to AVL propagation through induction variable

2023-11-07 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #2 from Kito Cheng  ---
Oh, but the root cause might be a little bit deeper, not just a matter of
whether the AVL is propagated or not.

[Bug target/112438] RISC-V: Failed to AVL propagation through induction variable

2023-11-07 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #1 from Kito Cheng  ---
Actually I suspect this is a bug rather than a missed optimization, one that
will only trigger on some CPU implementations, because the ISA spec doesn't
guarantee that the penultimate iteration will always get VLMAX for vl...

https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#63-constraints-on-setting-vl
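
As I read the constraints in the linked section, they can be written as a small predicate. This is a sketch of my reading, not text from the spec; `vl_is_permitted` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Is `vl` a value that vsetvli may return for the given AVL and VLMAX?  */
bool vl_is_permitted(size_t avl, size_t vlmax, size_t vl)
{
    assert(vlmax > 0);
    if (avl <= vlmax)
        return vl == avl;                            /* vl = AVL */
    if (avl < 2 * vlmax)
        return vl >= (avl + 1) / 2 && vl <= vlmax;   /* ceil(AVL/2) <= vl <= VLMAX */
    return vl == vlmax;                              /* vl = VLMAX */
}
```

For AVL = 5 and VLMAX = 4, both vl = 4 and vl = 3 pass this check, so an implementation is not obliged to return VLMAX when fewer than 2 * VLMAX elements remain.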

[Bug c/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions

2023-11-07 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #3 from Kito Cheng  ---
Sharing some thoughts from my end: we've tried at least 3 different approaches
on the LLVM side before, and now we model that as "partial early clobber". We
plan to upstream this on the LLVM side, but it just hasn't got high enough
priority yet :(

What does that mean? Here are some practical examples to demonstrate the idea:

1. A normal live range without early clobber

vadd x, y, z # y and z are dead after this use.

|-|
| read  | yz  |
| write | x   |
|-|


2. A live range with early clobber.

vadd x, y, z # y and z are dead after this use, and assume x is early clobber.

|-|
| read  | x   yz  |
| write | x   |
|-|


3. A live range with partial early clobber.

vwadd.vv x, y, z # x is twice as large as y and z

So we split x into xh and xl to represent the high part and the low part, and
assume the high part can overlap with others.

||
| read  |xl  yz  |
| write | xh xl  |
||

And the following case assumes the high part can overlap with others:

||
| read  | xh yz  |
| write | xh xl  |
||

Then the register allocator should be able to do the overlapping allocation
naturally IF we build the live ranges this way.

[Bug c/112433] RISC-V GCC-15 feature: Split register allocation into RVV and non-RVV, and make vsetvl PASS run between them

2023-11-07 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112433

--- Comment #1 from Kito Cheng  ---
Some more background on why LLVM must do it that way: LLVM can't allocate a new
pseudo register during the register allocation process; however, spilling a
vector register with a specific length may require a scratch register to set
the VL.

And the benefit of more exact live ranges for GPRs is kind of a by-product
which we weren't aware of during the discussion stage :P

[Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c

2023-10-26 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #4 from Kito Cheng  ---
The testcase itself looks tricky but is right; this typically can be used to
optimize mixed-width (mixed-SEW) operations.

You can refer to the EEW stuff in the v-spec[1]: most loads and stores encode a
static EEW, so such a vsetvli fusion optimization can be applied.

[1]
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#52-vector-operands

Give a (more) practical example here:

```c
#include "riscv_vector.h"

void foo(int32_t *in1, int16_t *in2, int16_t *in3, int32_t *out, size_t n,
         int cond, int avl) {
size_t vl = __riscv_vsetvl_e16mf2(avl);
vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
vint16mf2_t b = __riscv_vle16_v_i16mf2(in2, vl);
vint16mf2_t c = __riscv_vle16_v_i16mf2(in3, vl);
vint32m1_t x = __riscv_vwmacc_vv_i32m1(a, b, c, vl);
__riscv_vse32_v_i32m1(out, x, vl);
}

```
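
One reason a single vsetvli can serve both the e16/mf2 and the e32/m1 operations in this example is that the two settings share the same SEW/LMUL ratio and therefore the same VLMAX. A minimal sketch of that arithmetic, assuming the VLMAX = LMUL * VLEN / SEW formula from the vector spec; `vlmax` and its fraction parameters are hypothetical names:

```c
#include <assert.h>

/* VLMAX = LMUL * VLEN / SEW, with LMUL given as lmul_num / lmul_den
   so that fractional settings such as mf2 (1/2) can be expressed.  */
unsigned vlmax(unsigned vlen, unsigned sew, unsigned lmul_num, unsigned lmul_den)
{
    assert(sew != 0 && lmul_den != 0);
    return vlen * lmul_num / (sew * lmul_den);
}
```

For VLEN = 128, `vlmax(128, 16, 1, 2)` (e16, mf2) and `vlmax(128, 32, 1, 1)` (e32, m1) both evaluate to 4, and the equality holds for any VLEN because the ratio SEW/LMUL is 32 in both cases.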

> Is is guaranteed by the RVV specification that the value of `vl' produced
> (which is then supplied as an argument to `__riscv_vle32_v_i32m1', etc.;
> I presume implicitly via the VL CSR as I can't see it in actual assembly
> produced) is going to be the same for all microarchitectures for both:
>
>   vsetvli zero,a6,e32,m1,tu,ma
>
>and:
>
>   vsetvli zero,a6,e16,mf2,ta,ma

This is another trick in this case: tail agnostic vs. tail undisturbed.

Tail undisturbed has stronger semantics than tail agnostic, so using tail
undisturbed where agnostic is requested is always safe and satisfies the
semantics; the same holds for mask agnostic vs. mask undisturbed.

But performance is another story. As far as I know, some uArchs implement
agnostic as undisturbed, which means there is not much difference between the
two, so fusing those two vsetvli becomes a kind of optimization.

However, as you could imagine, that also means some uArchs implement agnostic
in another way: agnostic MAY have better performance than undisturbed, and we
should not fuse those vsetvli IF we are targeting such a target. Anyway, our
cost model for RVV is still in an initial state, so personally I am fine with
that for now, but I guess we need to add some more stuff to -mtune to handle
those differences.
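
The "stronger semantics" point can be illustrated with two small checks on the tail elements of a destination register. This is a sketch assuming byte-sized elements and the spec's rule that an agnostic tail element either keeps its old value or is overwritten with all 1s; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Tail-undisturbed: every element in [vl, vlmax) must keep its old value.  */
bool tail_ok_for_undisturbed(const uint8_t *before, const uint8_t *after,
                             size_t vl, size_t vlmax)
{
    assert(vl <= vlmax);
    for (size_t i = vl; i < vlmax; i++)
        if (after[i] != before[i])
            return false;
    return true;
}

/* Tail-agnostic: each tail element may keep its old value or become all 1s.  */
bool tail_ok_for_agnostic(const uint8_t *before, const uint8_t *after,
                          size_t vl, size_t vlmax)
{
    assert(vl <= vlmax);
    for (size_t i = vl; i < vlmax; i++)
        if (after[i] != before[i] && after[i] != 0xFF)
            return false;
    return true;
}
```

Any result accepted by `tail_ok_for_undisturbed` is also accepted by `tail_ok_for_agnostic`, but not the other way around, which is why substituting undisturbed for agnostic is safe.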

[Bug target/111926] RISC-V: Use vsetvl insn replace csrr vlenb insn

2023-10-22 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111926

--- Comment #2 from Kito Cheng  ---
Forgot to mention: personally I love the idea of simplifying code gen, and I
can imagine that it's definitely an optimization for specific uarchs :)

[Bug target/111926] RISC-V: Use vsetvl insn replace csrr vlenb insn

2023-10-22 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111926

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #1 from Kito Cheng  ---
Plz leave an option so the user has a choice. For performance things it is hard
to say which is absolutely better for all uarchs; my thought is to let both
-mtune and a command line option control that.

[Bug tree-optimization/111791] New: RISC-V: Strange loop vectorizaion on popcount function

2023-10-12 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

            Bug ID: 111791
           Summary: RISC-V: Strange loop vectorizaion on popcount function
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kito at gcc dot gnu.org
  Target Milestone: ---

Symptom:

A typical popcount implementation using Brian Kernighan’s algorithm: the
vectorizer has recognized it as popcount, but... it comes with a strange
vectorization result. I know that might be because I added
-fno-vect-cost-model, but I still don't understand why it was vectorized, so I
guess it's worth reporting.

NOTE:
This bad/strange code gen goes away once a scalar popcount instruction is
available.

Case:
```
int popcount(unsigned long value)
{
  int nbits;
  for (nbits = 0; value != 0; value &= value - 1)
    nbits++;
  return nbits;
}

```

Command to reproduce:
```
$ riscv64-unknown-linux-gnu-gcc x.c -march=rv64gcv -o - -S -fno-vect-cost-model -O3
```

Sha1: g:faae30c49560f1481f036061fa2f894b0f7257f8 (some random point of top of
trunk)

Current output:
```
.globl  popcount
.type   popcount, @function
popcount:
.LFB0:
.cfi_startproc
beq a0,zero,.L4
addi    sp,sp,-16
.cfi_def_cfa_offset 16
sd  ra,8(sp)
.cfi_offset 1, -8
call    __popcountdi2
csrr    a2,vlenb
sext.w  a0,a0
srli    a2,a2,2
vsetvli a3,zero,e32,m1,ta,ma
vid.v   v1
.L3:
vsetvli a5,a0,e8,mf4,ta,ma
sub a0,a0,a5
vsetvli a3,zero,e32,m1,ta,ma
vmv1r.v v3,v1
vmv.v.x v2,a2
vadd.vv v1,v1,v2
bne a0,zero,.L3
ld  ra,8(sp)
.cfi_restore 1
addi    a5,a5,-1
vadd.vi v3,v3,1
vslidedown.vx   v3,v3,a5
addi    sp,sp,16
.cfi_def_cfa_offset 0
vmv.x.s a0,v3
jr  ra
.L4:
li  a0,0
ret
.cfi_endproc
.LFE0:
.size   popcount, .-popcount

```

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-03 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600

--- Comment #14 from Kito Cheng  ---
Some info for generated files:

------------------------------------------------------
File                      blank    comment       code
------------------------------------------------------
insn-output.cc             3532      35029    1631721
insn-emit.cc              37288      40240    1161790
insn-recog.cc             44203         23     428130
insn-attrtab.cc            1014         30     169934
gimple-match-2.cc            77          2      49303
gimple-match-9.cc            29          2      33073
insn-extract.cc             241          8      28934
gimple-match-1.cc           105          2      25578
gimple-match-8.cc           114          2      24348
options.cc                  325          1      24175
insn-opinit.cc               12          6      20156
generic-match-9.cc           55          2      19080
gimple-match-3.cc            98          2      17433
gimple-match-7.cc           108          2      17105
gimple-match-10.cc          129          2      16888
gimple-match-6.cc           115          2      16836
gimple-match-4.cc            97          2      16830
gimple-match-5.cc            99          2      16377
generic-match-3.cc           57          2      16138
options-save.cc            1037         19      15121
generic-match-4.cc           70          2      14095
gtype-desc.cc               679         30      12597
insn-automata.cc             73         11      11735
generic-match-2.cc           60          2      11543
generic-match-1.cc           56          2      11504
generic-match-7.cc           66          2      10238
generic-match-5.cc           71          2      10231
generic-match-10.cc          66          2       9860
generic-match-6.cc           61          2       9853
generic-match-8.cc           53          2       9651
insn-modes.cc               750        410       7655
min-insn-modes.cc             9          2       2280
gengtype-lex.cc             398        424       2126
insn-preds.cc               146         32       1515
insn-dfatab.cc               31          3       1230
insn-latencytab.cc           26          3       1142
gcc-ranlib.cc                55         49        196
insn-enums.cc                 6          2        173
insn-peep.cc                  7          2         34
cc1-checksum.cc               0          0          3
cc1plus-checksum.cc           0          0          3

[Bug bootstrap/111664] [14 regression] Fails to build with mawk (error in gcc/opt-read.awk) after r14-4354-ge4a4b8e983bac8

2023-10-02 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111664

Kito Cheng  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2023-10-03

--- Comment #3 from Kito Cheng  ---
Proposed fix, and verified with mawk on my machine :)

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631785.html

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-02 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #13 from Kito Cheng  ---
I guess we may need something like g:703417a0 for the generators that process
the md files?

[Bug target/111412] RISC-V:ICE in phase 6 of vsetvl pass

2023-09-18 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111412

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
 CC||kito at gcc dot gnu.org

--- Comment #2 from Kito Cheng  ---
fixed

[Bug target/111372] libgcc: RISCV C++ exception handling stack usage grew in 13.1

2023-09-14 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111372

--- Comment #5 from Kito Cheng  ---
> Ok, but it's better to have configure option or something else just
> for toolchains that definitely do not use vector extension

I can understand that there would be such a demand in the embedded world, but
it's not a critical issue, so it won't get high priority from most RISC-V GCC
developers. It would be appreciated if you could send a patch for that.

[Bug target/110277] RISC-V: ICE when build RVV intrinsic float reduction with "-march=rv32gc_zve64d -mabi=ilp32d", both GCC 14 and 13.

2023-09-07 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110277

Kito Cheng  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||kito at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #3 from Kito Cheng  ---
Fixed on trunk

[Bug target/110299] RISC-V: ICE when build RVV intrinsic widen with "-march=rv32gc_zve64d -mabi=ilp32d", both GCC 14 and 13.

2023-09-07 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110299

Kito Cheng  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||kito at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #2 from Kito Cheng  ---
Fixed on trunk

[Bug target/111037] RISC-V: Invalid vsetvli fusion

2023-09-07 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111037

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Kito Cheng  ---
Fixed

[Bug target/111074] RISC-V: segmentation fault during RTL pass: vsetvl

2023-09-07 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111074

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
 CC||kito at gcc dot gnu.org

--- Comment #2 from Kito Cheng  ---
Fixed

[Bug target/110560] internal compiler error: in extract_constrain_insn_cached, at recog.cc:2704

2023-09-07 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110560

Kito Cheng  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Kito Cheng  ---
Should be fixed now

[Bug target/109773] RISC-V: ICE when build RVV Intrinsic in Both GCC 13 && GCC 14

2023-09-07 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109773

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Kito Cheng  ---
Fixed on upstream for a while.

[Bug target/109725] [14 Regression] ICE: RTL check: expected code 'const_int', have 'reg' in riscv_print_operand, at config/riscv/riscv.cc:4430

2023-08-29 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109725

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #6 from Kito Cheng  ---
Ok for back port :)

[Bug target/111065] [RISCV] t-linux-multilib specifies incorrect multilib reuse patterns

2023-08-18 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111065

--- Comment #4 from Kito Cheng  ---
I guess I skipped too much detail here: the multilib for Linux hasn't really
honored the reuse rules in the multilib config file for a while.

The config only controls how the multilibs are built, e.g. which arch is used
to build ilp32, and we will then find the matched ABI. Why did we do that? The
reason is to simplify the reuse rules: RISC-V has a huge number of extensions
now, so enumerating the possible combinations is almost impossible.

But why can't it use the same scheme as baremetal? Okay, that's because we
encode only the ABI in the path, unlike baremetal where we encode both the ABI
and the arch. It's kind of a de facto ABI in Linux/glibc, and it also doesn't
make much sense to have too many different multilibs within a (RISC-V) Linux
system.

[Bug target/111065] [RISCV] t-linux-multilib specifies incorrect multilib reuse patterns

2023-08-18 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111065

Kito Cheng  changed:

   What|Removed |Added

Version|og13 (devel/omp/gcc-13) |14.0
 CC||kito at gcc dot gnu.org

--- Comment #1 from Kito Cheng  ---
One major issue around multilib for Linux is that we only encode the ABI in the
path, so it's hard to extend that the way the baremetal toolchain does.

[Bug target/111037] New: RISC-V: Invalid vsetvli fusion

2023-08-16 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111037

Bug ID: 111037
   Summary: RISC-V: Invalid vsetvli fusion
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kito at gcc dot gnu.org
CC: juzhe.zhong at rivai dot ai
  Target Milestone: ---
Target: riscv64

Reduced case:
```
#include <riscv_vector.h>

void foo(_Float16 y, int64_t *i64p)
{
  vint64m1_t vx =__riscv_vle64_v_i64m1 (i64p, 1);
  vx = __riscv_vadd_vv_i64m1 (vx, vx, 1);
  vfloat16m1_t vy =__riscv_vfmv_s_f_f16m1 (y, 1);
  asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy));
}
```

Command to reproduce:
$ riscv64-unknown-elf-gcc -O3 -march=rv64gczve64f_zvfh

foo:
vsetivli zero,1,e64,m1,ta,ma
vle64.v v1,0(a0)
vfmv.s.f v2,fa0 # Will raise illegal instruction here, because we don't have F64 for vector
vadd.vv v1,v1,v1
ret

[Bug target/110812] Missing TARGET_OPTION_SAVE/RESTORE on riscv

2023-07-26 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110812

Kito Cheng  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||kito at gcc dot gnu.org

--- Comment #1 from Kito Cheng  ---
Oops, I thought those target hooks only needed to be implemented once we had
implemented the target attribute. Anyway, thanks for the hint!

[Bug target/110751] RISC-V: Suport undefined value that allows VSETVL PASS use TA/MA

2023-07-20 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751

--- Comment #4 from Kito Cheng  ---
> OK, so TA is either merge or all-ones.

Yes, your understanding is correct. One more detail: the result can be a mix
of merge and all-ones on a per-element basis.

e.g.

A 4 x i32 vector with mask 1 0 1 0

Op  =  | a | b | c | d |
Mask = | 1 | 0 | 1 | 0 |

the result could be:
| a | b | c | d |
| a | all-1 | c | d |
| a | all-1 | c | all-1 |
| a | b | c | all-1 |
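
The enumeration above can be sketched in plain C. This is only a scalar model of the "agnostic" rule, not RVV intrinsic code: the names `ma_result` and `ma_outcome_count` are hypothetical, and the model assumes each inactive element independently either keeps its old value or becomes all-ones.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_ELEMS 4

/* Hypothetical model of one permitted result of a mask-agnostic operation.
   Active elements (mask[i] != 0) always receive the new value.  For an
   inactive element, bit i of `choice` selects all-ones (1) or the old,
   undisturbed value (0).  Choice bits of active elements are ignored.  */
void ma_result(const uint32_t *old_vals, const uint32_t *new_vals,
               const int *mask, unsigned choice, uint32_t *out)
{
    assert(old_vals && new_vals && mask && out);
    for (int i = 0; i < NUM_ELEMS; i++) {
        if (mask[i])
            out[i] = new_vals[i];
        else if (choice & (1u << i))
            out[i] = UINT32_MAX;
        else
            out[i] = old_vals[i];
    }
}

/* Number of distinct permitted results: 2^(number of inactive elements).  */
unsigned ma_outcome_count(const int *mask)
{
    unsigned inactive = 0;
    for (int i = 0; i < NUM_ELEMS; i++)
        if (!mask[i])
            inactive++;
    return 1u << inactive;
}
```

With mask 1 0 1 0 there are two inactive elements, so `ma_outcome_count` returns 4, one for each combination of the two inactive elements being kept or set to all-ones.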


> Not sure how you can use MA at the moment since you specify an existing
> operand in your target hook.  As far as I can see there's no value the
> target hook can provide that matches any of the implementation semantics?

That's the key point: we don't know how to return an undefined value there. We
have an intrinsic that can generate an undefined value, but it seems
impossible to generate that within the hook.

[Bug target/110748] RISC-V: optimize store of DF 0.0

2023-07-20 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748

--- Comment #2 from Kito Cheng  ---
And it seems we have already had such a constraint for a while, so I'm not
sure why GCC 13 does that. I saw the status has changed to ASSIGNED, so I
assume Vineet is already spending time on it, and I will just stop here :)

[Bug target/110748] RISC-V: optimize store of DF 0.0

2023-07-20 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #1 from Kito Cheng  ---
Hmm, weird, GCC 12 did well, but something went wrong in GCC 13?

https://godbolt.org/z/ToM1qTxrq

void zd(double *d) { *d = 0.0;  }
void zf(float *f) { *f = 0.0;  }

GCC 12:

zd:
sd  zero,0(a0)
ret
zf:
sw  zero,0(a0)
ret

GCC 13:
zd:
fmv.d.x fa5,zero
fsd fa5,0(a0)
ret
zf:
fmv.s.x fa5,zero
fsw fa5,0(a0)
ret
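
As a side note on why the GCC 12 output is valid, here is a small sketch assuming IEEE 754 formats for `float` and `double` (the helper names `double_bits` and `float_bits` are hypothetical). Positive zero is the all-zero bit pattern, so storing the integer `zero` register writes exactly the same bytes as storing 0.0 through an FP register.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a double (assumes a 64-bit IEEE 754 double). */
uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Return the raw bit pattern of a float (assumes a 32-bit IEEE 754 float). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Same functions as in the comment above. */
void zd(double *d) { *d = 0.0; }
void zf(float *f) { *f = 0.0; }
```

Note that this only holds for positive zero: `-0.0` has the sign bit set, so it cannot be stored with a plain `sd zero`/`sw zero`.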

[Bug target/110696] RISC-V: -march doesn't imply correctly

2023-07-17 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110696

Kito Cheng  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-07-17
 Status|UNCONFIRMED |NEW

--- Comment #2 from Kito Cheng  ---
Fixed on upstream, but will wait one more week for backporting to GCC 13 branch

[Bug target/110478] RISC-V multilib gcc zicsr in the -march causing incorrect libgcc to be used

2023-06-29 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110478

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #3 from Kito Cheng  ---
I've fixed a few multilib issues on Linux; unfortunately, they were fixed after
the GCC 13.1 release... Could you try trunk or the releases/gcc-13 branch to
see if that issue is resolved?

https://github.com/gcc-mirror/gcc/commit/6f0eb99c9bda726f953bdbe06dd3489a26af2823
https://github.com/gcc-mirror/gcc/commit/49d596e90deedbe9c7a1aa5824fb484fe3ad3193
https://github.com/gcc-mirror/gcc/commit/554aabc26786891ffb4d542c359eca0cef407ed1

[Bug target/110448] [RISC-V] RVV intrinsic api test error

2023-06-29 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110448

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Kito Cheng  ---
That might be annoying, but we (SiFive) promise that we won't make any
incompatible changes after the RVV intrinsic 1.0 release.

So I'm going to close this bug as resolved/invalid.

[Bug target/110448] [RISC-V] RVV intrinsic api test error

2023-06-28 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110448

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #1 from Kito Cheng  ---
That's an incompatible change on the RVV intrinsic spec side.

see https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222

[Bug target/110264] internal compiler error: riscv_vector::vector_insn_info::get_avl_reg_rtx

2023-06-26 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110264

Kito Cheng  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Kito Cheng  ---
Fixed on trunk and backported to GCC 13

[Bug target/110188] gcc for RISC-V stack aligned error

2023-06-09 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110188

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #5 from Kito Cheng  ---
Each stack area is aligned to 16 bytes. That could be optimized in theory, but
it would complicate the frame layout implementation.

sp - 0
a9 / outgoing stack arguments area
sp - 4
 / outgoing stack arguments area
sp - 8
 / outgoing stack arguments area
sp - 12
 / outgoing stack arguments area
sp - 16
 / GPR save area
sp - 20
 / GPR save area
sp - 24
 / GPR save area
sp - 28
ra   / GPR save area
sp - 32
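
The arithmetic behind the 32-byte frame above can be sketched as follows. This is an illustration only, not the code in riscv.cc: `align_up`, `frame_size_per_area` and `frame_size_packed` are hypothetical names, and the sketch assumes just two areas (outgoing arguments and GPR save) of 4 bytes each, as in the picture.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_ALIGN 16  /* bytes; each area is rounded up to this */

/* Round n up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Behavior described above: every area is aligned on its own. */
size_t frame_size_per_area(size_t outgoing_args, size_t gpr_save)
{
    return align_up(outgoing_args, STACK_ALIGN)
           + align_up(gpr_save, STACK_ALIGN);
}

/* The theoretical optimization: pack the areas, then align the total once. */
size_t frame_size_packed(size_t outgoing_args, size_t gpr_save)
{
    return align_up(outgoing_args + gpr_save, STACK_ALIGN);
}
```

For one 4-byte outgoing argument plus a 4-byte `ra` slot, `frame_size_per_area(4, 4)` gives 32 while `frame_size_packed(4, 4)` gives 16, which is the saving that would come at the price of a more complicated layout.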



The complete layout is documented in riscv.cc:

        +-------------------------------+
        |                               |
        |  incoming stack arguments     |
        |                               |
        +-------------------------------+ <-- incoming stack pointer
        |                               |
        |  callee-allocated save area   |
        |  for arguments that are       |
        |  split between registers and  |
        |  the stack                    |
        |                               |
        +-------------------------------+ <-- arg_pointer_rtx
        |                               |
        |  callee-allocated save area   |
        |  for register varargs         |
        |                               |
        +-------------------------------+ <-- hard_frame_pointer_rtx;
        |                               |     stack_pointer_rtx + gp_sp_offset
        |  GPR save area                |       + UNITS_PER_WORD
        |                               |
        +-------------------------------+ <-- stack_pointer_rtx + fp_sp_offset
        |                               |       + UNITS_PER_HWVALUE
        |  FPR save area                |
        |                               |
        +-------------------------------+ <-- frame_pointer_rtx (virtual)
        |                               |
        |  local variables              |
        |                               |
      P +-------------------------------+
        |                               |
        |  outgoing stack arguments     |
        |                               |
        +-------------------------------+ <-- stack_pointer_rtx

[Bug target/109972] RISC-V: Could use umodsi3/udivsi3/divsi3 libcalls for 32-bit division/remainder on RV64 without M extension

2023-06-02 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109972

--- Comment #3 from Kito Cheng  ---
We care, but it's lower priority compared to other configurations, so creating
a bug to track it here should be the best solution for now :P

[Bug target/109974] RISCV: RVV VSETVL Pass ICE in SLP auto-vectorization

2023-05-29 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109974

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Kito Cheng  ---
Fixed on trunk

[Bug target/109547] [13] RISC-V: Multiple vsetvli for load/store loop

2023-05-29 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109547

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #5 from Kito Cheng  ---
Fixed on both trunk and gcc 13

[Bug target/109743] RISC-V: Unnecessary VSETVLI of the RVV intrinsic in loop

2023-05-12 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109743

Kito Cheng  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Kito Cheng  ---
Fixed on trunk

[Bug target/109748] RISC-V: Mis code gen for the RVV intrinsic VSETVL

2023-05-05 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109748

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Kito Cheng  ---
Should be resolved at trunk.

[Bug target/109748] RISC-V: Mis code gen for the RVV intrinsic VSETVL

2023-05-05 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109748

--- Comment #1 from Kito Cheng  ---
Does this also happen on the GCC 13 branch?

[Bug target/109535] [13 regression] internal compiler error: in finalize_new_accesses, at rtl-ssa/changes.cc:471

2023-05-03 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109535

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #16 from Kito Cheng  ---
Fixed both on trunk and GCC 13 branch :)

[Bug target/109617] RISC-V: ICE for vlmul_ext_v intrinsic API

2023-05-02 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109617

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Kito Cheng  ---
fixed on trunk

[Bug target/109272] RISCV: vbool*_t opportunities of a better code generation

2023-04-25 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109272

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Kito Cheng  ---
Fixed on trunk

[Bug target/109547] [13] RISC-V: Multiple vsetvli for load/store loop

2023-04-21 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109547

Kito Cheng  changed:

   What|Removed |Added

Summary|RISC-V: Multiple vsetvli|[13] RISC-V: Multiple
   |for load/store loop |vsetvli for load/store loop
   Target Milestone|--- |13.2

[Bug target/109535] [13/14] internal compiler error: in finalize_new_accesses, at rtl-ssa/changes.cc:471

2023-04-20 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109535

Kito Cheng  changed:

   What|Removed |Added

   Target Milestone|--- |13.2
Summary|internal compiler error: in |[13/14] internal compiler
   |finalize_new_accesses, at   |error: in
   |rtl-ssa/changes.cc:471  |finalize_new_accesses, at
   ||rtl-ssa/changes.cc:471

[Bug target/109547] RISC-V: Multiple vsetvli for load/store loop

2023-04-18 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109547

Kito Cheng  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-04-19

--- Comment #1 from Kito Cheng  ---
Confirmed.

[Bug target/109535] internal compiler error: in finalize_new_accesses, at rtl-ssa/changes.cc:471

2023-04-17 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109535

Kito Cheng  changed:

   What|Removed |Added

   Last reconfirmed||2023-04-17
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |juzhe.zhong at rivai dot ai

[Bug target/109104] [13/14 Regression] ICE: in gen_reg_rtx, at emit-rtl.cc:1171 with -fzero-call-used-regs=all -march=rv64gv

2023-04-17 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109104

Kito Cheng  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Kito Cheng  ---
Fixed on trunk

[Bug target/109535] internal compiler error: in finalize_new_accesses, at rtl-ssa/changes.cc:471

2023-04-17 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109535

--- Comment #5 from Kito Cheng  ---
Confirmed that the output is a text file; it's just suffixed with .out

[Bug target/109479] [RISC-V] Build vint64m1_t with rv64gc_zve32x_zvl64b should promote information like "vint64m1_t requires the 'zve64x' extensions"

2023-04-12 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109479

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #8 from Kito Cheng  ---
Fixed on upstream now :)

[Bug target/109479] [RISC-V] Build with rv64gc_zve32x_zvl64b should fail but actually not

2023-04-12 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109479

Kito Cheng  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-04-12

--- Comment #3 from Kito Cheng  ---
The title might be a little bit misleading: -march=rv64gc_zve32x_zvl64b is a
valid arch configuration; the invalid part is that vint64m*_t and vuint64m*_t
are not available for rv64gc_zve32x.

[Bug bootstrap/109461] build gcc for riscv target failed with `execvp: /bin/sh: Argument list too long error when using with --with-multilib-generator`

2023-04-09 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109461

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #2 from Kito Cheng  ---
https://github.com/gcc-mirror/gcc/commit/5ca9980fc86242505ffdaaf62bca1fd5db26550b
https://github.com/gcc-mirror/gcc/commit/d72ca12b846a9f5c01674b280b1817876c77888f

The new multilib selection scheme should improve this, so that you don't need
to specify such a long multilib config.

I guess I should write more documentation and add a release note to mention
that.

[Bug target/109328] [13 Regression] Build fail in RISC-V port

2023-03-31 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109328

Kito Cheng  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Kito Cheng  ---
Verified with crosstool-ng, and also fixed several missing dependencies in
t-riscv

[Bug target/109349] riscv: Add --print-supported-extensions

2023-03-30 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109349

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #4 from Kito Cheng  ---
MaskRay:

I would prefer adding -march=help rather than --print-supported-extensions on
the GNU toolchain side; that should satisfy the conventions in GCC and also be
consistent with clang. I personally prefer -march=? over -march=help, but I
know clang has renamed -mcpu=? and -mtune=? to -mcpu=help and -mtune=help;
anyway, that's minor.

BTW, 4vtomat is a member of our (SiFive) team, so we actually plan to add that
on the GNU toolchain side, but because GCC is in stage 4 I am still holding
off :P


Andrew Pinski:

Yeah, I plan to catch up on the documentation and release notes in April...

[Bug target/109104] [13 Regression] ICE: in gen_reg_rtx, at emit-rtl.cc:1171 with -fzero-call-used-regs=all -march=rv64gv

2023-03-30 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109104

Kito Cheng  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||kito at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |pan2.li at intel dot com

--- Comment #4 from Kito Cheng  ---
Pan Li from Intel is working on fixing that

[Bug target/109328] [13 Regression] Build fail in RISC-V port

2023-03-30 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109328

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #4 from Kito Cheng  ---
I can reproduce the problem with crosstool-ng, and it is resolved by Andrew
Pinski's fix. I am reviewing the dependencies in the file.

I plan to post a complete version of the patch later :)

[Bug target/109312] Missing __riscv_v_intrinsic

2023-03-28 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109312

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Kito Cheng  ---
Fixed, let me know if you got any issue on RVV intrinsic, thanks :)

[Bug target/109228] warning: implicit declaration of function '__riscv_vlenb'

2023-03-22 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109228

Kito Cheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #5 from Kito Cheng  ---
Fixed!

[Bug target/109244] internal compiler error: in setup_preferred_alternate_classes_for_new_pseudos, at ira.cc:2892

2023-03-22 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109244

Kito Cheng  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Kito Cheng  ---
Fixed, let us know if you got any issue on compiling or testing highway!

[Bug target/109244] internal compiler error: in setup_preferred_alternate_classes_for_new_pseudos, at ira.cc:2892

2023-03-22 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109244

--- Comment #4 from Kito Cheng  ---
I'm going to commit the fix soon. The following code is the reduced case,
reduced from your attachment.


Reduced case (reduced by creduce)

typedef int a;
using c = float;
template < typename > using e = int;
#pragma riscv intrinsic "vector"
template < typename, int, int f > struct aa {
  using g = int;
  template < typename > static constexpr int h() { return f; }
  template < typename i > using ab = aa< i, 0, h< i >() >;
};
template < int f > struct p { using j = aa< float, 6, f >; };
template < int f > struct k { using j = typename p< f >::j; };
template < typename, int f > using ac = typename k< f >::j;
template < class ad > using l = typename ad::g;
template < class g, class ad > using ab = typename ad::ab< g >;
template < class ad > using ae = ab< e< ad >, ad >;
template < int m > vuint32mf2_t ai(aa< a, m, -1 >, a aj) {
  return __riscv_vmv_v_x_u32mf2(aj, 0);
}
template < int m > vfloat32mf2_t ai(aa< c, m, -1 >, c);
template < class ad > using ak = decltype(ai(ad(), l< ad >()));
template < class ad > ak< ad > al(ad d) {
  ae< decltype(d) > am;
  return an(d, ai(am, 0));
}
template < typename g, int m > vuint8mf2_t ao(aa< g, m, -1 >, vuint32mf2_t n) {
  return __riscv_vreinterpret_v_u32mf2_u8mf2(n);
}
template < int m > vuint32mf2_t ap(aa< a, m, -1 >, vuint8mf2_t n) {
  return __riscv_vreinterpret_v_u8mf2_u32mf2(n);
}
template < typename g, int m > vuint8mf2_t ao(aa< g, m, -1 >, vfloat32mf2_t n)
{
  return __riscv_vreinterpret_v_u32mf2_u8mf2(
  __riscv_vreinterpret_v_f32mf2_u32mf2(n));
}
template < int m > vfloat32mf2_t ap(aa< c, m, -1 >, vuint8mf2_t);
template < class ad, class aq > ak< ad > an(ad d, aq n) {
  return ap(d, ao(d, n));
}
vbool64_t av(vuint32mf2_t, vuint32mf2_t);
template < class ad > bool ba(ad, vbool64_t);
template < class ad > using bb = decltype(al(ad()));
template < typename g > using be = ac< g, -1 >;
struct bf {
  template < class ad > bool bh(ad, bb< ad > bi) {
ae< ad > am;
return ba(am, av(an(am, bi), al(am)));
  }
};
int bo;
template < class ad, class bl, typename g > void o(ad d, bl bn, g) {
  bb< ad > bq = al(d);
  for (; bo;) {
int br = bn.bh(d, bq);
if (__builtin_expect(br, 0))
  for (;;)
;
  }
}
template < class ad, class bl, typename g > void bs(ad d, bl bn, g) {
  g bu;
  o(d, bn, bu);
}
template < class ad, class bl, typename g >
void bv(ad d, bl bn, g *, int, g *bt) {
  bs(d, bn, bt);
}
float by;
int bz;
float ca;
void b() {
  be< float > d;
  bf bn;
  bv(d, bn, &by, bz, &ca);
}

[Bug c/109228] warning: implicit declaration of function '__riscv_vlenb'

2023-03-21 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109228

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #1 from Kito Cheng  ---
Thanks for the report! We definitely missed that...

[Bug target/108185] [RISC-V] Sub-optimal code-gen for vsetvli: redundant stack store

2023-03-07 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108185

Kito Cheng  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Kito Cheng  ---
Resolved by Pan's patch :)

[Bug target/108339] [11/10 only] riscv64-linux-gnu: fails to link libgcc_s.so on the GCC 10 branch

2023-02-20 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108339

Kito Cheng  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Kito Cheng  ---
Backported to GCC 10 branch.

[Bug target/108764] [RISCV] Cost model for RVB is too aggressive

2023-02-12 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108764

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #3 from Kito Cheng  ---
> I think one solution is to change the cost model of such complex instructions 
> to the sum of the cost for each part. E.g. 
> cost for shNadd = COSTS_N_INSNS (SINGLE_SHIFT_COST) + COSTS_N_INSNS (1) # 
> cost of addition

As far as I know, some RISC-V core implementations do execute shNadd in one
cycle, but I know that's not true for every implementation.

Anyway, it's really uarch dependent, so I would prefer to keep it as is for
now, and then extend the cost model functions to make it easier to handle
different uarchs (-mtune) when GCC 14 opens.
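
For readers unfamiliar with the instruction, here is a rough C model of what is being costed. `shnadd` follows the Zba semantics (rd = rs2 + (rs1 << N), N in 1..3). The cost helpers are only an illustration of the two models in this thread: `COST_PER_INSN` is a stand-in assumed to mirror GCC's COSTS_N_INSNS scaling, and the function names are hypothetical, not taken from riscv.cc.

```c
#include <assert.h>
#include <stdint.h>

/* Model of sh1add/sh2add/sh3add on RV64: rd = rs2 + (rs1 << n). */
uint64_t shnadd(unsigned n, uint64_t rs1, uint64_t rs2)
{
    assert(n >= 1 && n <= 3);
    return rs2 + (rs1 << n);
}

/* Assumed stand-in for the COSTS_N_INSNS (1) unit. */
#define COST_PER_INSN 4

/* Current model: shNadd is costed like a single instruction. */
int cost_single_insn(void)
{
    return COST_PER_INSN;
}

/* Quoted proposal: cost of the shift plus cost of the addition. */
int cost_summed(int single_shift_cost)
{
    return COST_PER_INSN * single_shift_cost + COST_PER_INSN;
}
```

A typical use is array indexing: the address of element `i` of a 4-byte array at `base` is `shnadd(2, i, base)`. Which cost is right depends on whether the core really executes that in one cycle, which is the uarch dependence mentioned above.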

[Bug middle-end/88345] -Os overrides -falign-functions=N on the command line

2023-01-17 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

--- Comment #13 from Kito Cheng  ---
A patch was posted before, but it seems not everybody agrees:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603049.html

[Bug target/108185] [RISC-V] Sub-optimal code-gen for vsetvli: redundant stack store

2023-01-02 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108185

Kito Cheng  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-01-03

--- Comment #4 from Kito Cheng  ---
So it's about the code gen quality instead of correctness, let me update the
title.

[Bug target/108185] [RISC-V]RVV assemble not set vsetvli correct.

2022-12-29 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108185

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #2 from Kito Cheng  ---
It seems right to me?


```
$ riscv64-unknown-elf-gcc pr108185.c -march=rv64gcv -mabi=lp64d -O3 -S   -o - 
.file   "pr108185.c"
.option nopic
.attribute arch, "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_v1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
.text
.align  1
.globl  foo5_3
.type   foo5_3, @function
foo5_3:
csrr    t0,vlenb
slli    t1,t0,1
csrr    a5,vlenb
sub sp,sp,t1
slli    a3,a5,1
add a3,a3,sp
vl1re8.v    v25,0(a0) # Load value from *(vint8m1_t*)in
sub a5,a3,a5
vs1r.v  v25,0(a1) # Store value to *(vint8m1_t*)out
vs1r.v  v25,0(a5) # Store value to stack, although it's unused.
addi    a4,a1,800
csrr    t0,vlenb
slli    t1,t0,1
vsetvli a5,zero,e8,m1,ta,ma   # Right vsetvli for vsm.v
vsm.v   v25,0(a4)
add sp,sp,t1
jr  ra
.size   foo5_3, .-foo5_3
.ident  "GCC: (g44b22ab81cf) 13.0.0 20221229 (experimental)"
```

[Bug middle-end/88345] -Os overrides -falign-functions=N on the command line

2022-09-01 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #7 from Kito Cheng  ---
We are hitting this issue on RISC-V and got some complaints from Linux kernel
developers, but in a different form from the original report: we found that
cold functions, including any function marked as cold by
`-fguess-branch-probability`, do not honor the -falign-functions=N setting.
That becomes a problem for some Linux kernel features, since they want to
control the minimal alignment to make sure they can atomically update an
instruction, which requires 4-byte alignment.

However, current GCC behavior can't guarantee that even when
-falign-functions=4 is given. There are four options in my mind:

1. Fix -falign-functions=N so it works as expected with -Os and for all cold
functions.
2. Force 4-byte alignment if -fpatchable-function-entry is given; that should
be doable by adjusting RISC-V's FUNCTION_BOUNDARY.
3. Adjust RISC-V's FUNCTION_BOUNDARY so it honors -falign-functions=N.
4. Add a -malign-functions=N... Okay, I know that's a bad idea, x86 already
deprecated that.

But I think ideally this should be fixed by option 1 if possible.

Testcase from RISC-V kernel guy:
```
/* { dg-do compile } */
/* { dg-options "-march=rv64gc -mabi=lp64d -O1 -falign-functions=128" } */
/* { dg-final { scan-assembler-times ".align 7" 2 } } */

// Using 128 byte align rather than 4 byte align since it easier to observe.

__attribute__((__cold__)) void a() {} // This function isn't align to 128 byte
void b() {} // This function align to 128 byte.
```
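
The test scans for `.align 7` rather than `.align 128` because on RISC-V the assembler's `.align` operand is a power-of-two exponent. A minimal sketch of that mapping, with the hypothetical helper name `align_log2`:

```c
#include <assert.h>

/* Return log2(n) for a power-of-two byte alignment n, i.e. the operand
   that a power-of-two style `.align` directive would carry.  */
unsigned align_log2(unsigned n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    unsigned log = 0;
    while (n > 1) {
        n >>= 1;
        log++;
    }
    return log;
}
```

So -falign-functions=128 corresponds to `.align 7`, and the 4-byte alignment the kernel needs corresponds to `.align 2`.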

Proposed fix:
```
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 49d5cda122f..6f8ed85fea9 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -1907,8 +1907,7 @@ assemble_start_function (tree decl, const char *fnname)
  Note that we still need to align to DECL_ALIGN, as above,
  because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at all.  */
   if (! DECL_USER_ALIGN (decl)
-  && align_functions.levels[0].log > align
-  && optimize_function_for_speed_p (cfun))
+  && align_functions.levels[0].log > align)
 {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
   int align_log = align_functions.levels[0].log;

```
