Re: [PATCH v7 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-12-22 Thread waffl3x
On Friday, December 22nd, 2023 at 10:26 AM, Jason Merrill  
wrote:
> 
> 
> On 12/22/23 04:01, waffl3x wrote:
> 
> > int n = 0;
> > auto f = [](this Self){
> > static_assert(__is_same (decltype(n), int));
> > decltype((n)) a; // { dg-error {is not captured} }
> > };
> > f();
> > 
> > Could you clarify if this error being removed was intentional. I do
> > recall that Patrick Palka wanted to remove this error in his patch, but
> > it seemed to me like you stated it would be incorrect to allow it.
> > Since the error is no longer present I assume I am misunderstanding the
> > exchange.
> > 
> > In any case, let me know if I need to modify my test case or if this
> > error needs to be added back in.
> 
> 
> Removing the error was correct under
> https://eel.is/c++draft/expr.prim#id.unqual-3
> Naming n in that lambda would not refer to a capture by copy, so the
> decltype is the same as outside the lambda.
> 
> Jason

Alright, I've fixed my tests to reflect that.

I've got defaulting assignment operators working. Defaulting equality
and comparison operators seemed to work out of the box somehow, so I
just have to make some fleshed out tests for those cases.

There can always be more tests; I have a few ideas for what still needs
to be covered, mostly involving dependent lambdas. Tests for xobj conversion
operators definitely need to be fleshed out more. I also need to
formulate some tests to make sure constraints are not being taken into
account when the object parameters should not correspond, but that's a
little harder to test for than the valid cases.

Tests aside, is there anything you can think of that the patch is
missing? Other than the aforementioned tests, I'm pretty confident
everything is done.

To recap, I have CWG2789 implemented on my end with the change we
discussed to require corresponding object parameters instead of the
same type, and I have CWG2586 implemented. I can't recall what other
outstanding issues we had, and my notes don't mention anything other
than tests. So I'm assuming everything is good.

Alex


Re: [PATCH] LoongArch: Expand left rotate to right rotate with negated amount

2023-12-22 Thread chenglulu

Hi,

This patch will cause the following tests to fail:

+FAIL: gcc.dg/vect/pr97081-2.c (internal compiler error: in extract_insn, at recog.cc:2812)
+FAIL: gcc.dg/vect/pr97081-2.c (test for excess errors)
+FAIL: gcc.dg/vect/pr97081-2.c -flto -ffat-lto-objects (internal compiler error: in extract_insn, at recog.cc:2812)
+FAIL: gcc.dg/vect/pr97081-2.c -flto -ffat-lto-objects (test for excess errors)


在 2023/12/18 下午9:43, Xi Ruoyao 写道:

gcc/ChangeLog:

* config/loongarch/loongarch.md (rotl3):
New define_expand.
* config/loongarch/simd.md (vrotl3): Likewise.
(rotl3): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/rotl-with-rotr.c: New test.
* gcc.target/loongarch/rotl-with-vrotr.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch.md | 12 
  gcc/config/loongarch/simd.md  | 29 +++
  .../gcc.target/loongarch/rotl-with-rotr.c |  9 ++
  .../gcc.target/loongarch/rotl-with-vrotr.c| 24 +++
  .../gcc.target/loongarch/rotl-with-xvrotr.c   |  7 +
  5 files changed, 81 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-xvrotr.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 30025bf1908..939432b83e0 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2903,6 +2903,18 @@ (define_insn "rotrsi3_extend"
[(set_attr "type" "shift,shift")
 (set_attr "mode" "SI")])
  
+;; Expand left rotate to right rotate.
+(define_expand "rotl3"
+  [(set (match_dup 3)
+   (neg:SI (match_operand:SI 2 "register_operand")))
+   (set (match_operand:GPR 0 "register_operand")
+   (rotatert:GPR (match_operand:GPR 1 "register_operand")
+ (match_dup 3)))]
+  ""
+  {
+operands[3] = gen_reg_rtx (SImode);
+  });
+
  ;; The following templates were added to generate "bstrpick.d + alsl.d"
  ;; instruction pairs.
  ;; It is required that the values of const_immalsl_operand and
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 13202f79bee..a42e20eb8fc 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -268,6 +268,35 @@ (define_insn "vrotr3"
[(set_attr "type" "simd_int_arith")
 (set_attr "mode" "")])
  
+;; Expand left rotate to right rotate.
+(define_expand "vrotl3"
+  [(set (match_dup 3)
+   (neg:IVEC (match_operand:IVEC 2 "register_operand")))
+   (set (match_operand:IVEC 0 "register_operand")
+   (rotatert:IVEC (match_operand:IVEC 1 "register_operand")
+  (match_dup 3)))]
+  ""
+  {
+operands[3] = gen_reg_rtx (mode);
+  });
+
+;; Expand left rotate with a scalar amount to right rotate: negate the
+;; scalar before broadcasting it because scalar negation is cheaper than
+;; vector negation.
+(define_expand "rotl3"
+  [(set (match_dup 3)
+   (neg:SI (match_operand:SI 2 "register_operand")))
+   (set (match_dup 4)
+   (vec_duplicate:IVEC (match_dup 3)))
+   (set (match_operand:IVEC 0 "register_operand")
+   (rotatert:IVEC (match_operand:IVEC 1 "register_operand")
+  (match_dup 4)))]
+  ""
+  {
+operands[3] = gen_reg_rtx (SImode);
+operands[4] = gen_reg_rtx (mode);
+  });
+
  ;; vrotri.{b/h/w/d}
  
  (define_insn "rotr3"

diff --git a/gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c 
b/gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c
new file mode 100644
index 000..84cc53cecaf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "rotr\\.w" } } */
+
+unsigned
+t (unsigned a, unsigned b)
+{
+  return a << b | a >> (32 - b);
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr.c 
b/gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr.c
new file mode 100644
index 000..3ebf7e3c083
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlsx -fno-vect-cost-model" } */
+/* { dg-final { scan-assembler-times "vrotr\\.w" 2 } } */
+/* { dg-final { scan-assembler-times "vneg\\.w" 1 } } */
+
+#ifndef VLEN
+#define VLEN 16
+#endif
+
+typedef unsigned int V __attribute__((vector_size(VLEN)));
+V a, b, c;
+
+void
+test (int x)
+{
+  b = a << x | a >> (32 - x);
+}
+
+void
+test2 (void)
+{
+  for (int i = 0; i < VLEN / sizeof (int); i++)
+c[i] = a[i] << b[i] | a[i] >> (32 - b[i]);
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/rotl-with-xvrotr.c 
b/gcc/testsuite/gcc.target/loongarch/rotl-with-xvrotr.c
new file mode 100644
index 000.

Re: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension

2023-12-22 Thread joshua
Hi Juzhe,
Sorry, but I'm not quite familiar with the group_overlap framework. Could you 
take this pattern as an example to show how to disable an alternative for a 
particular target?
Joshua
--
From: juzhe.zh...@rivai.ai 
Sent: Friday, December 22, 2023, 18:32
To: "cooper.joshua"; 
"gcc-patches"
Cc: Jim Wilson; palmer; 
andrew; "philipp.tomsich"; 
jeffreyalaw; 
"christoph.muellner"; 
jinma; "cooper.qu"
Subject: Re: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension
Yeah.
(define_insn "@pred_msbc"
 [(set (match_operand: 0 "register_operand" "=vr, vr, &vr")
 (unspec:
 [(minus:VI
 (match_operand:VI 1 "register_operand" " 0, vr, vr")
 (match_operand:VI 2 "register_operand" " vr, 0, vr"))
 (match_operand: 3 "register_operand" " vm, vm, vm")
 (unspec:
 [(match_operand 4 "vector_length_operand" " rK, rK, rK")
 (match_operand 5 "const_int_operand" " i, i, i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)] UNSPEC_VMSBC))]
"TARGET_VECTOR"
"vmsbc.vvm\t%0,%1,%2,%3"
 [(set_attr "type" "vicalu")
 (set_attr "mode" "")
 (set_attr "vl_op_idx" "4")
 (set (attr "avl_type_idx") (const_int 5))])
You should use an attribute to disable alternative 0 and alternative 1 
constraint.
juzhe.zh...@rivai.ai
From: joshua 
Sent: 2023-12-22 18:29
To: juzhe.zh...@rivai.ai ; gcc-patches 

Cc: Jim Wilson ; palmer 
; andrew ; 
philipp.tomsich ; jeffreyalaw 
; christoph.muellner 
; jinma ; 
cooper.qu 
Subject: Re: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension
Hi Juzhe,
What xtheadvector needs to handle is just that the destination vector register 
cannot overlap the source vector register group for instructions like vmadc/vmsbc. 
That is not what group_overlap means. We need to add "&" to the registers in 
the corresponding xtheadvector patterns, while RVV 1.0 doesn't have this 
constraint.
(define_insn "@pred_th_msbc"
 [(set (match_operand: 0 "register_operand" "=&vr")
 (unspec:
 [(minus:VI
 (match_operand:VI 1 "register_operand" " vr")
 (match_operand:VI 2 "register_operand" " vr"))
 (match_operand: 3 "register_operand" " vm")
 (unspec:
 [(match_operand 4 "vector_length_operand" " rK")
 (match_operand 5 "const_int_operand" " i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)] UNSPEC_VMSBC))]
 "TARGET_XTHEADVECTOR"
 "vmsbc.vvm\t%0,%1,%2,%3"
 [(set_attr "type" "vicalu")
 (set_attr "mode" "")
 (set_attr "vl_op_idx" "4")
 (set (attr "avl_type_idx") (const_int 5))])
Joshua
--
From: juzhe.zh...@rivai.ai 
Sent: Friday, December 22, 2023, 16:07
To: "cooper.joshua"; 
"gcc-patches"
Cc: Jim Wilson; palmer; 
andrew; "philipp.tomsich"; 
jeffreyalaw; 
"christoph.muellner"; 
jinma; "cooper.qu"
Subject: Re: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension
You mean theadvector doesn't want the current RVV 1.0 register overlap magic as 
follows?

 * The destination EEW is smaller than the source EEW and the overlap is in the 
   lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi 
   v0, v0, 3 is legal, but a destination of v1 is not).

 * The destination EEW is greater than the source EEW, the source EMUL is at least 
   1, and the overlap is in the highest-numbered part of the destination register 
   group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or 
   v4 is not).
If yes, I suggest disabling the overlap constraint using an attribute. More details 
you can learn from 
(set_attr "group_overlap"
juzhe.zh...@rivai.ai
From: joshua 
Sent: 2023-12-22 11:33
To: 钟居哲 ; gcc-patches 

Cc: jim.wilson.gcc ; palmer 
; andrew ; 
philipp.tomsich ; Jeff Law 
; Christoph Müllner 
; jinma ; 
Cooper Qu 
Subject: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension
Hi Juzhe,
Thank you for your comprehensive comments.
Classifying theadvector intrinsics into 3 kinds is really important to make our 
patchset more organized. 
For 1) and 3), I will split out the patches soon and hope they will be merged 
quickly.
For 2), according to the differences between vector and xtheadvector, it can be 
classified into 3 kinds.
First is renamed load/store, renamed narrowing integer right shift, renamed 
narrowing fixed-point clip, etc. I think we can use an ASM target hook to 
rewrite the whole string of the instruct

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-22 Thread chenglulu


在 2023/12/23 上午10:26, chenglulu 写道:


在 2023/12/22 下午3:21, chenglulu 写道:


在 2023/12/22 下午3:09, Xi Ruoyao 写道:

On Fri, 2023-12-22 at 11:44 +0800, chenglulu wrote:

在 2023/12/21 下午8:00, chenglulu 写道:

Sorry, I've been busy with something else these two days. I don't
think there's anything wrong with the code,

but I need to test the spec.:-)

Hi, Ruoyao:

After applying this patch, spec2006 464.h264 ref will have a 6.4%
performance drop. So I'm going to retest it.

I think 6.4% is large enough not to be a random error.

Is there an example showing the code regression?

And I'm wondering if keeping the peephole besides the new
define_insn_and_split produces a better result instead of solely 
relying

on define_insn_and_split?

I haven't debugged this yet, I'm retesting, if there is still such a 
big performance gap,


I think I need to see the reason.

The performance drop has nothing to do with this patch. I found that 
the h264 performance compiled by r14-6787 dropped by 6.4% compared to 
r14-6421.


But there is a problem. My regression test has the following two FAIL 
items (based on r14-6787):

+FAIL: gcc.dg/cpp/_Pragma3.c (test for excess errors)
+FAIL: gcc.dg/pr86617.c scan-rtl-dump-times final "mem/v" 6



Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-22 Thread chenglulu



在 2023/12/22 下午3:21, chenglulu 写道:


在 2023/12/22 下午3:09, Xi Ruoyao 写道:

On Fri, 2023-12-22 at 11:44 +0800, chenglulu wrote:

在 2023/12/21 下午8:00, chenglulu 写道:

Sorry, I've been busy with something else these two days. I don't
think there's anything wrong with the code,

but I need to test the spec.:-)

Hi, Ruoyao:

After applying this patch, spec2006 464.h264 ref will have a 6.4%
performance drop. So I'm going to retest it.

I think 6.4% is large enough not to be a random error.

Is there an example showing the code regression?

And I'm wondering if keeping the peephole besides the new
define_insn_and_split produces a better result instead of solely relying
on define_insn_and_split?

I haven't debugged this yet, I'm retesting, if there is still such a 
big performance gap,


I think I need to see the reason.

The performance drop has nothing to do with this patch. I found that the 
h264 performance compiled by r14-6787 dropped by 6.4% compared to 
r14-6421.




Re: [PATCH] LoongArch: Add sign_extend pattern for 32-bit rotate shift

2023-12-22 Thread chenglulu

LGTM!

Thanks!

在 2023/12/17 下午11:16, Xi Ruoyao 写道:

Remove a redundant sign extension.

gcc/ChangeLog:

* config/loongarch/loongarch.md (rotrsi3_extend): New
define_insn.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/rotrw.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch.md  | 10 ++
  gcc/testsuite/gcc.target/loongarch/rotrw.c | 17 +
  2 files changed, 27 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/rotrw.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index c7058282a21..30025bf1908 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2893,6 +2893,16 @@ (define_insn "rotr3"
[(set_attr "type" "shift,shift")
 (set_attr "mode" "")])
  
+(define_insn "rotrsi3_extend"
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
+   (sign_extend:DI
+ (rotatert:SI (match_operand:SI 1 "register_operand" "r,r")
+  (match_operand:SI 2 "arith_operand" "r,I"]
+  "TARGET_64BIT"
+  "rotr%i2.w\t%0,%1,%2"
+  [(set_attr "type" "shift,shift")
+   (set_attr "mode" "SI")])
+
  ;; The following templates were added to generate "bstrpick.d + alsl.d"
  ;; instruction pairs.
  ;; It is required that the values of const_immalsl_operand and
diff --git a/gcc/testsuite/gcc.target/loongarch/rotrw.c 
b/gcc/testsuite/gcc.target/loongarch/rotrw.c
new file mode 100644
index 000..6ed45e8b86c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/rotrw.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "rotr\\.w\t\\\$r4,\\\$r4,\\\$r5" } } */
+/* { dg-final { scan-assembler "rotri\\.w\t\\\$r4,\\\$r4,5" } } */
+/* { dg-final { scan-assembler-not "slli\\.w" } } */
+
+unsigned
+rotr (unsigned a, unsigned b)
+{
+  return a >> b | a << 32 - b;
+}
+
+unsigned
+rotri (unsigned a)
+{
+  return a >> 5 | a << 27;
+}




Re: Re: [PATCH] RISC-V: Make PHI initial value occupy live V_REG in dynamic LMUL cost model analysis

2023-12-22 Thread 钟居哲
Committed. Thanks Jeff.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-12-23 00:58
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Make PHI initial value occupy live V_REG in 
dynamic LMUL cost model analysis
 
 
On 12/22/23 02:51, Juzhe-Zhong wrote:
> Consider this following case:
> 
> foo:
>  ble a0,zero,.L11
>  lui a2,%hi(.LANCHOR0)
>  addisp,sp,-128
>  addia2,a2,%lo(.LANCHOR0)
>  mv  a1,a0
>  vsetvli a6,zero,e32,m8,ta,ma
>  vid.v   v8
>  vs8r.v  v8,0(sp) ---> spill
> .L3:
>  vl8re32.v   v16,0(sp)---> reload
>  vsetvli a4,a1,e8,m2,ta,ma
>  li  a3,0
>  vsetvli a5,zero,e32,m8,ta,ma
>  vmv8r.v v0,v16
>  vmv.v.x v8,a4
>  vmv.v.i v24,0
>  vadd.vv v8,v16,v8
>  vmv8r.v v16,v24
>  vs8r.v  v8,0(sp)---> spill
> .L4:
>  addiw   a3,a3,1
>  vadd.vv v8,v0,v16
>  vadd.vi v16,v16,1
>  vadd.vv v24,v24,v8
>  bne a0,a3,.L4
>  vsetvli zero,a4,e32,m8,ta,ma
>  sub a1,a1,a4
>  vse32.v v24,0(a2)
>  sllia4,a4,2
>  add a2,a2,a4
>  bne a1,zero,.L3
>  li  a0,0
>  addisp,sp,128
>  jr  ra
> .L11:
>  li  a0,0
>  ret
> 
> Pick unexpected LMUL = 8.
> 
> The root cause is we didn't involve PHI initial value in the dynamic LMUL 
> calculation:
> 
># j_17 = PHI---> # 
> vect_vec_iv_.8_24 = PHI <_25(9), { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }(5)>
> 
> We didn't count { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 } as consuming a vector register, but it does 
> allocate a vector register group for it.
Yup.  There are analogues in the scalar space.  Depending on the context 
we might consider the value live on the edge, at the end of e->src or at 
the start of e->dest.
 
In the scalar space we commonly have multiple constant values and we try 
to account for them as best as we can as each distinct constant can 
result in a constant load.  We also try to find pseudos that happen to 
already have the value we want so that they participate in the 
coalescing process.  I doubt either of these cases are particularly 
important for vector though.
 
 
> 
> This patch fixes this missing count. Then after this patch we pick up perfect 
> LMUL (LMUL = M4)
> 
> foo:
> ble a0,zero,.L9
> lui a4,%hi(.LANCHOR0)
> addi a4,a4,%lo(.LANCHOR0)
> mv a2,a0
> vsetivli zero,16,e32,m4,ta,ma
> vid.v v20
> .L3:
> vsetvli a3,a2,e8,m1,ta,ma
> li a5,0
> vsetivli zero,16,e32,m4,ta,ma
> vmv4r.v v16,v20
> vmv.v.i v12,0
> vmv.v.x v4,a3
> vmv4r.v v8,v12
> vadd.vv v20,v20,v4
> .L4:
> addiw a5,a5,1
> vmv4r.v v4,v8
> vadd.vi v8,v8,1
> vadd.vv v4,v16,v4
> vadd.vv v12,v12,v4
> bne a0,a5,.L4
> slli a5,a3,2
> vsetvli zero,a3,e32,m4,ta,ma
> sub a2,a2,a3
> vse32.v v12,0(a4)
> add a4,a4,a5
> bne a2,zero,.L3
> .L9:
> li a0,0
> ret
> 
> Tested on --with-arch=gcv no regression. Ok for trunk ?
> 
> PR target/113112
> 
> gcc/ChangeLog:
> 
> * config/riscv/riscv-vector-costs.cc (max_number_of_live_regs): Refine dump 
> information.
> (preferred_new_lmul_p): Make PHI initial value into live regs calculation.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: New test.
OK assuming you've done the necessary regression testing.
 
jeff
 


[Committed] RISC-V: Make PHI initial value occupy live V_REG in dynamic LMUL cost model analysis

2023-12-22 Thread Juzhe-Zhong
Consider this following case:

foo:
ble a0,zero,.L11
lui a2,%hi(.LANCHOR0)
addisp,sp,-128
addia2,a2,%lo(.LANCHOR0)
mv  a1,a0
vsetvli a6,zero,e32,m8,ta,ma
vid.v   v8
vs8r.v  v8,0(sp) ---> spill
.L3:
vl8re32.v   v16,0(sp)---> reload
vsetvli a4,a1,e8,m2,ta,ma
li  a3,0
vsetvli a5,zero,e32,m8,ta,ma
vmv8r.v v0,v16
vmv.v.x v8,a4
vmv.v.i v24,0
vadd.vv v8,v16,v8
vmv8r.v v16,v24
vs8r.v  v8,0(sp)---> spill
.L4:
addiw   a3,a3,1
vadd.vv v8,v0,v16
vadd.vi v16,v16,1
vadd.vv v24,v24,v8
bne a0,a3,.L4
vsetvli zero,a4,e32,m8,ta,ma
sub a1,a1,a4
vse32.v v24,0(a2)
sllia4,a4,2
add a2,a2,a4
bne a1,zero,.L3
li  a0,0
addisp,sp,128
jr  ra
.L11:
li  a0,0
ret

Pick unexpected LMUL = 8.

The root cause is we didn't involve PHI initial value in the dynamic LMUL 
calculation:

  # j_17 = PHI---> # vect_vec_iv_.8_24 = 
PHI <_25(9), { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }(5)>

We didn't count { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 } as consuming a vector register, but it does 
allocate a vector register group for it.

This patch fixes this missing count. Then after this patch we pick up perfect 
LMUL (LMUL = M4)

foo:
ble a0,zero,.L9
lui a4,%hi(.LANCHOR0)
addia4,a4,%lo(.LANCHOR0)
mv  a2,a0
vsetivlizero,16,e32,m4,ta,ma
vid.v   v20
.L3:
vsetvli a3,a2,e8,m1,ta,ma
li  a5,0
vsetivlizero,16,e32,m4,ta,ma
vmv4r.v v16,v20
vmv.v.i v12,0
vmv.v.x v4,a3
vmv4r.v v8,v12
vadd.vv v20,v20,v4
.L4:
addiw   a5,a5,1
vmv4r.v v4,v8
vadd.vi v8,v8,1
vadd.vv v4,v16,v4
vadd.vv v12,v12,v4
bne a0,a5,.L4
sllia5,a3,2
vsetvli zero,a3,e32,m4,ta,ma
sub a2,a2,a3
vse32.v v12,0(a4)
add a4,a4,a5
bne a2,zero,.L3
.L9:
li  a0,0
ret

Tested on --with-arch=gcv no regression.

PR target/113112

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (max_number_of_live_regs): Refine 
dump information.
(preferred_new_lmul_p): Make PHI initial value into live regs 
calculation.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc| 45 ---
 .../vect/costmodel/riscv/rvv/pr113112-1.c | 31 +
 2 files changed, 71 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index a316603e207..946eb4a9fc6 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -355,10 +355,11 @@ max_number_of_live_regs (const basic_block bb,
 }
 
   if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
-"Maximum lmul = %d, %d number of live V_REG at program "
-"point %d for bb %d\n",
-lmul, max_nregs, live_point, bb->index);
+dump_printf_loc (
+  MSG_NOTE, vect_location,
+  "Maximum lmul = %d, At most %d number of live V_REG at program "
+  "point %d for bb %d\n",
+  lmul, max_nregs, live_point, bb->index);
   return max_nregs;
 }
 
@@ -472,6 +473,41 @@ update_local_live_ranges (
  tree def = gimple_phi_arg_def (phi, j);
  auto *live_ranges = live_ranges_per_bb.get (bb);
  auto *live_range = live_ranges->get (def);
+ if (poly_int_tree_p (def))
+   {
+ /* Insert live range of INTEGER_CST or POLY_CST since we will
+need to allocate a vector register for it.
+
+E.g. # j_17 = PHI  will be transformed
+into # vect_vec_iv_.8_24 = PHI <_25(9), { 0, ... }(5)>
+
+The live range for such value is short which only lives
+from program point 0 to 1.  */
+ if (live_range)
+   {
+ unsigned int start = (*live_range).first;
+ (*live_range).first = 0;
+ if (dump_enabled_p ())
+   dump_printf_loc (
+ MSG_NOTE, vect_location,
+ "Update %T start point from %d to 0:\n", def, start);
+   }
+ else
+   {
+ live_range

[COMMITTED] robots.txt: Disallow a few more bugzilla queries

2023-12-22 Thread Mark Wielaard
Some spiders are hitting bugzilla hard generating dependency trees
or graphs, downloading large attachments or requesting all bugs
in xml format. Disallow all that.
---
 htdocs/robots.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/htdocs/robots.txt b/htdocs/robots.txt
index b9fc830d..057c5899 100644
--- a/htdocs/robots.txt
+++ b/htdocs/robots.txt
@@ -10,4 +10,8 @@ Disallow: /cgit/
 Disallow: /svn
 Disallow: /cgi-bin/
 Disallow: /bugzilla/buglist.cgi
+Disallow: /bugzilla/show_bug.cgi*ctype=xml*
+Disallow: /bugzilla/attachment.cgi
+Disallow: /bugzilla/showdependencygraph.cgi
+Disallow: /bugzilla/showdependencytree.cgi
 Crawl-Delay: 60
-- 
2.39.3



[PATCH] RISC-V: RVV: add toggle to control vsetvl pass behavior

2023-12-22 Thread Vineet Gupta
RVV requires VSET?VL? instructions to dynamically configure the vector
length at runtime. There's a custom pass to do that, which has a simple
mode that generates a VSETVL for each V insn and a lazy/optimal mode
that uses LCM dataflow to move VSETVLs around and identify/delete the
redundant ones.

Currently simple mode is the default for !optimize invocations, while
lazy mode is the default otherwise.

This patch allows simple mode to be forced via a toggle, independent of
the optimization level. A lot of gcc developers are currently doing this
in some form in their local setups, since issues are expected in the
initial phase of autovec development. It makes sense to provide this
facility upstream. It could potentially also be used by distro builders
for quick workarounds of future autovec bugs.

gcc/ChangeLog:
* config/riscv/riscv.opt: New -param=vsetvl-strategy.
* config/riscv/riscv-opts.h: New enum vsetvl_strategy_enum.
* config/riscv/riscv-vsetvl.cc
(pre_vsetvl::pre_global_vsetvl_info): Use vsetvl_strategy.
(pass_vsetvl::execute): Use vsetvl_strategy.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv-opts.h| 11 +++
 gcc/config/riscv/riscv-vsetvl.cc | 49 +---
 gcc/config/riscv/riscv.opt   | 17 +++
 3 files changed, 54 insertions(+), 23 deletions(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index dbf213926572..ad21ca63fb59 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -116,6 +116,17 @@ enum stringop_strategy_enum {
   STRATEGY_AUTO = STRATEGY_SCALAR | STRATEGY_VECTOR
 };
 
+/* Behavior of VSETVL Pass.  */
+enum vsetvl_strategy_enum {
+  /* Simple: Insert a vsetvl* instruction for each Vector instruction.  */
+  VSETVL_SIMPLE = 1,
+  /* Optimal but with a caveat: run LCM dataflow analysis, which could
+ lead to redundant vsetvl* instructions, which are retained.  */
+  VSETVL_OPT_NO_DEL = 2,
+  /* Optimal: Same as optimal with any redundant vsetvl* removed.  */
+  VSETVL_OPT = 4
+};
+
 #define TARGET_ZICOND_LIKE (TARGET_ZICOND || (TARGET_XVENTANACONDOPS && 
TARGET_64BIT))
 
 /* Bit of riscv_zvl_flags will set contintuly, N-1 bit will set if N-bit is
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index eabaef80f898..e4f1372bce3f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3244,31 +3244,34 @@ pre_vsetvl::pre_global_vsetvl_info ()
}
 }
 
-  /* Remove vsetvl infos as LCM suggest */
-  for (const bb_info *bb : crtl->ssa->bbs ())
+  if (!(vsetvl_strategy & VSETVL_OPT_NO_DEL))
 {
-  sbitmap d = m_del[bb->index ()];
-  if (bitmap_count_bits (d) == 0)
-   continue;
-  gcc_assert (bitmap_count_bits (d) == 1);
-  unsigned expr_index = bitmap_first_set_bit (d);
-  vsetvl_info &info = *m_exprs[expr_index];
-  gcc_assert (info.valid_p ());
-  gcc_assert (info.get_bb () == bb);
-  const vsetvl_block_info &block_info = get_block_info (info.get_bb ());
-  gcc_assert (block_info.get_entry_info () == info);
-  info.set_delete ();
-}
+  /* Remove vsetvl infos as LCM suggest.  */
+  for (const bb_info *bb : crtl->ssa->bbs ())
+   {
+ sbitmap d = m_del[bb->index ()];
+ if (bitmap_count_bits (d) == 0)
+   continue;
+ gcc_assert (bitmap_count_bits (d) == 1);
+ unsigned expr_index = bitmap_first_set_bit (d);
+ vsetvl_info &info = *m_exprs[expr_index];
+ gcc_assert (info.valid_p ());
+ gcc_assert (info.get_bb () == bb);
+ const vsetvl_block_info &block_info = get_block_info (info.get_bb ());
+ gcc_assert (block_info.get_entry_info () == info);
+ info.set_delete ();
+   }
 
-  /* Remove vsetvl infos if all precessors are available to the block.  */
-  for (const bb_info *bb : crtl->ssa->bbs ())
-{
-  vsetvl_block_info &block_info = get_block_info (bb);
-  if (block_info.empty_p () || !block_info.full_available)
-   continue;
+  /* Remove vsetvl infos if all precessors are available to the block.  */
+  for (const bb_info *bb : crtl->ssa->bbs ())
+   {
+ vsetvl_block_info &block_info = get_block_info (bb);
+ if (block_info.empty_p () || !block_info.full_available)
+   continue;
 
-  vsetvl_info &info = block_info.get_entry_info ();
-  info.set_delete ();
+ vsetvl_info &info = block_info.get_entry_info ();
+ info.set_delete ();
+   }
 }
 
   for (const bb_info *bb : crtl->ssa->bbs ())
@@ -3627,7 +3630,7 @@ pass_vsetvl::execute (function *)
   if (!has_vector_insn (cfun))
 return 0;
 
-  if (!optimize)
+  if (!optimize || vsetvl_strategy & VSETVL_SIMPLE)
 simple_vsetvl ();
   else
 lazy_vsetvl ();
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 2bc1273fe284..5f824e2ddb3d 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/ris

Re: [PATCH v7 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-12-22 Thread Jason Merrill

On 12/22/23 04:01, waffl3x wrote:


   int n = 0;
   auto f = [](this Self){
 static_assert(__is_same (decltype(n), int));
 decltype((n)) a; // { dg-error {is not captured} }
   };
   f();

Could you clarify if this error being removed was intentional. I do
recall that Patrick Palka wanted to remove this error in his patch, but
it seemed to me like you stated it would be incorrect to allow it.
Since the error is no longer present I assume I am misunderstanding the
exchange.

In any case, let me know if I need to modify my test case or if this
error needs to be added back in.


Removing the error was correct under
https://eel.is/c++draft/expr.prim#id.unqual-3
Naming n in that lambda would not refer to a capture by copy, so the 
decltype is the same as outside the lambda.


Jason



Re: 回复:[PATCH v3 0/6] RISC-V: Support XTheadVector extension

2023-12-22 Thread Jeff Law




On 12/22/23 01:07, juzhe.zh...@rivai.ai wrote:
You mean theadvector doesn't want the current RVV 1.0 register overlap 
magic as follows?


  * The destination EEW is smaller than the source EEW and the overlap
    is in the lowest-numbered part of the source register group (e.g.,
    when LMUL=1, vnsrl.wi v0, v0, 3 is legal, but a destination of
    v1 is not).

  * The destination EEW is greater than the source EEW, the source EMUL
    is at least 1, and the overlap is in the highest-numbered part of
    the destination register group (e.g., when LMUL=8, vzext.vf4 v0,
    v6 is legal, but a source of v0, v2, or v4 is not).


If yes, I suggest disabling the overlap constraint using an attribute. More 
details you can learn from
Yea, if there's alternatives we want to allow for xthead, but not rvv or 
vice-versa, I would think the "enabled" attribute would be a reasonable 
option.  Essentially it allows alternatives to be available or 
unavailable based on the subtarget.
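
A minimal sketch of how such per-subtarget gating might look in the machine description — the attribute names, values, and pattern shape below are purely illustrative, not the actual RISC-V or XTheadVector implementation:

```
;; Hypothetical sketch: a per-alternative attribute recording which
;; alternatives rely on RVV 1.0 overlap rules, plus an "enabled"
;; attribute that switches those alternatives off for XTheadVector.
(define_attr "rvv10_overlap" "no,yes" (const_string "no"))

(define_attr "enabled" "no,yes"
  (cond [(and (eq_attr "rvv10_overlap" "yes")
              (match_test "TARGET_XTHEADVECTOR"))
         (const_string "no")]
        (const_string "yes")))

;; In a pattern like pred_msbc, the alternatives that allow operand 0
;; to overlap the sources (0 and 1) would then be tagged, keeping only
;; the earlyclobber alternative live on XTheadVector:
;;   (set_attr "rvv10_overlap" "yes,yes,no")
```

With this approach one pattern can serve both subtargets, instead of duplicating every affected pattern with tightened constraints.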


It sounds like this may be necessary because of differences in how 
overlap is handled across 0.7 vs 1.0.


Jeff




Re: [PATCH] RISC-V: Make PHI initial value occupy live V_REG in dynamic LMUL cost model analysis

2023-12-22 Thread Jeff Law




On 12/22/23 02:51, Juzhe-Zhong wrote:

Consider this following case:

foo:
 ble a0,zero,.L11
 lui a2,%hi(.LANCHOR0)
 addisp,sp,-128
 addia2,a2,%lo(.LANCHOR0)
 mv  a1,a0
 vsetvli a6,zero,e32,m8,ta,ma
 vid.v   v8
 vs8r.v  v8,0(sp) ---> spill
.L3:
 vl8re32.v   v16,0(sp)---> reload
 vsetvli a4,a1,e8,m2,ta,ma
 li  a3,0
 vsetvli a5,zero,e32,m8,ta,ma
 vmv8r.v v0,v16
 vmv.v.x v8,a4
 vmv.v.i v24,0
 vadd.vv v8,v16,v8
 vmv8r.v v16,v24
 vs8r.v  v8,0(sp)---> spill
.L4:
 addiw   a3,a3,1
 vadd.vv v8,v0,v16
 vadd.vi v16,v16,1
 vadd.vv v24,v24,v8
 bne a0,a3,.L4
 vsetvli zero,a4,e32,m8,ta,ma
 sub a1,a1,a4
 vse32.v v24,0(a2)
 sllia4,a4,2
 add a2,a2,a4
 bne a1,zero,.L3
 li  a0,0
 addisp,sp,128
 jr  ra
.L11:
 li  a0,0
 ret

Pick unexpected LMUL = 8.

The root cause is we didn't involve PHI initial value in the dynamic LMUL 
calculation:

   # j_17 = PHI---> # vect_vec_iv_.8_24 = PHI 
<_25(9), { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0 }(5)>

We didn't count { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 } as consuming a vector register, but it does 
allocate a vector register group for it.
Yup.  There are analogues in the scalar space.  Depending on the context 
we might consider the value live on the edge, at the end of e->src or at 
the start of e->dest.


In the scalar space we commonly have multiple constant values and we try 
to account for them as best as we can as each distinct constant can 
result in a constant load.  We also try to find pseudos that happen to 
already have the value we want so that they participate in the 
coalescing process.  I doubt either of these cases are particularly 
important for vector though.





This patch fixes the missing count. After this patch we pick the optimal
LMUL (LMUL = M4):

foo:
	ble	a0,zero,.L9
	lui	a4,%hi(.LANCHOR0)
	addi	a4,a4,%lo(.LANCHOR0)
	mv	a2,a0
	vsetivli	zero,16,e32,m4,ta,ma
	vid.v	v20
.L3:
	vsetvli	a3,a2,e8,m1,ta,ma
	li	a5,0
	vsetivli	zero,16,e32,m4,ta,ma
	vmv4r.v	v16,v20
	vmv.v.i	v12,0
	vmv.v.x	v4,a3
	vmv4r.v	v8,v12
	vadd.vv	v20,v20,v4
.L4:
	addiw	a5,a5,1
	vmv4r.v	v4,v8
	vadd.vi	v8,v8,1
	vadd.vv	v4,v16,v4
	vadd.vv	v12,v12,v4
	bne	a0,a5,.L4
	slli	a5,a3,2
	vsetvli	zero,a3,e32,m4,ta,ma
	sub	a2,a2,a3
	vse32.v	v12,0(a4)
	add	a4,a4,a5
	bne	a2,zero,.L3
.L9:
	li	a0,0
	ret

Tested with --with-arch=gcv, no regressions. OK for trunk?

PR target/113112

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (max_number_of_live_regs): Refine 
dump information.
(preferred_new_lmul_p): Make PHI initial value into live regs 
calculation.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: New test.

OK assuming you've done the necessary regression testing.

jeff


Re: [V6] c23: construct composite type for tagged types

2023-12-22 Thread Joseph Myers
On Thu, 21 Dec 2023, Martin Uecker wrote:

> This version now sets  DECL_NONADDRESSABLE_P, DECL_PADDING_P 
> and C_DECL_VARIABLE_SIZE and adds three new tests:
> c23-tag-alias-7.c, c23-tag-composite-10.c, and 
> gnu23-tag-composite-5.c.

This version is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-12-22 Thread Richard Biener



> Am 22.12.2023 um 16:05 schrieb Di Zhao OS :
> 
> Updated the fix in attachment.
> 
> Is it OK for trunk?

Ok

> Tested on aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu.
> 
> Thanks,
> Di Zhao
> 
>> -Original Message-
>> From: Di Zhao OS 
>> Sent: Sunday, December 17, 2023 8:31 PM
>> To: Thomas Schwinge ; gcc-patches@gcc.gnu.org
>> Cc: Richard Biener 
>> Subject: RE: [PATCH v4] [tree-optimization/110279] Consider FMA in
>> get_reassociation_width
>> 
>> Hello Thomas,
>> 
>>> -Original Message-
>>> From: Thomas Schwinge 
>>> Sent: Friday, December 15, 2023 5:46 PM
>>> To: Di Zhao OS ; gcc-patches@gcc.gnu.org
>>> Cc: Richard Biener 
>>> Subject: RE: [PATCH v4] [tree-optimization/110279] Consider FMA in
>>> get_reassociation_width
>>> 
>>> Hi!
>>> 
>>> On 2023-12-13T08:14:28+, Di Zhao OS 
>> wrote:
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/pr110279-2.c
 @@ -0,0 +1,41 @@
 +/* PR tree-optimization/110279 */
 +/* { dg-do compile } */
 +/* { dg-options "-Ofast --param tree-reassoc-width=4 --param fully-
>>> pipelined-fma=1 -fdump-tree-reassoc2-details -fdump-tree-optimized" } */
 +/* { dg-additional-options "-march=armv8.2-a" { target aarch64-*-* } } */
 +
 +#define LOOP_COUNT 8
 +typedef double data_e;
 +
 +#include 
 +
 +__attribute_noinline__ data_e
 +foo (data_e in)
>>> 
>>> Pushed to master branch commit 91e9e8faea4086b3b8aef2355fc12c1559d425f6
>>> "Fix 'gcc.dg/pr110279-2.c' syntax error due to '__attribute_noinline__'",
>>> see attached.
>>> 
>>> However:
>>> 
 +{
 +  data_e a1, a2, a3, a4;
 +  data_e tmp, result = 0;
 +  a1 = in + 0.1;
 +  a2 = in * 0.1;
 +  a3 = in + 0.01;
 +  a4 = in * 0.59;
 +
 +  data_e result2 = 0;
 +
 +  for (int ic = 0; ic < LOOP_COUNT; ic++)
 +{
 +  /* Test that a complete FMA chain with length=4 is not broken.  */
 +  tmp = a1 + a2 * a2 + a3 * a3 + a4 * a4 ;
 +  result += tmp - ic;
 +  result2 = result2 / 2 - tmp;
 +
 +  a1 += 0.91;
 +  a2 += 0.1;
 +  a3 -= 0.01;
 +  a4 -= 0.89;
 +
 +}
 +
 +  return result + result2;
 +}
 +
 +/* { dg-final { scan-tree-dump-not "was chosen for reassociation"
>>> "reassoc2"} } */
 +/* { dg-final { scan-tree-dump-times {\.FMA } 3 "optimized"} } */
>> 
>> Thank you for the fix.
>> 
>>> ..., I still see these latter two tree dump scans FAIL, for GCN:
>>> 
>>>$ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
>>>  2 *: a3_40
>>>  2 *: a2_39
>>>Width = 4 was chosen for reassociation
>>>Transforming _15 = powmult_1 + powmult_3;
>>> into _63 = powmult_1 + a1_38;
>>>$ grep -F .FMA pr110279-2.c.265t.optimized
>>>  _63 = .FMA (a2_39, a2_39, a1_38);
>>>  _64 = .FMA (a3_40, a3_40, powmult_5);
>>> 
>>> ..., nvptx:
>>> 
>>>$ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
>>>  2 *: a3_40
>>>  2 *: a2_39
>>>Width = 4 was chosen for reassociation
>>>Transforming _15 = powmult_1 + powmult_3;
>>> into _63 = powmult_1 + a1_38;
>>>$ grep -F .FMA pr110279-2.c.265t.optimized
>>>  _63 = .FMA (a2_39, a2_39, a1_38);
>>>  _64 = .FMA (a3_40, a3_40, powmult_5);
>> 
>> For these 2 targets, the reassoc_width for FMUL is 1 (the default value),
>> while the testcase assumes it to be 4. The bug was introduced when I
>> updated the patch but forgot to update the testcase.
>> 
>>> ..., but also x86_64-pc-linux-gnu:
>>> 
>>>$  grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
>>>  2 *: a3_40
>>>  2 *: a2_39
>>>Width = 2 was chosen for reassociation
>>>Transforming _15 = powmult_1 + powmult_3;
>>> into _63 = powmult_1 + powmult_3;
>>>$ grep -cF .FMA pr110279-2.c.265t.optimized
>>>0
>> 
>> For x86_64 this needs "-mfma"; sorry, the compile options missed that.
>> Can the change below fix these issues? I moved them into
>> testsuite/gcc.target/aarch64, since they rely on tunings.
>> 
>> Tested on aarch64-unknown-linux-gnu.
>> 
>>> 
>>> Grüße
>>> Thomas
>>> 
>>> 
>>> -
>>> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201,
>> 80634
>>> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas
>>> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht
>>> München, HRB 106955
>> 
>> Thanks,
>> Di Zhao
>> 
>> ---
>> gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-1.c | 3 +--
>> gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-2.c | 3 +--
>> 2 files changed, 2 insertions(+), 4 deletions(-)
>> rename gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-1.c (83%)
>> rename gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-2.c (78%)
>> 
>> diff --git a/gcc/testsuite/gcc.dg/pr110279-1.c
>> b/gcc/testsuite/gcc.target/aarch64/pr110279-1.c
>> similarity index 83%
>> rename from gcc/testsuite/gcc.dg/pr110279-1.c
>> re

[PATCH] libgccjit: Implement sizeof operator

2023-12-22 Thread Antoni Boucher
Hi.
This patch adds the support of the sizeof operator.
I was wondering if this new API entrypoint should take a location as a
parameter. What do you think?
Thanks for the review.
From e86e00efae450f04bc92ae6e4e61cf92c38d9b7d Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Tue, 19 Sep 2023 22:10:47 -0400
Subject: [PATCH] libgccjit: Implement sizeof operator

gcc/jit/ChangeLog:

	* docs/topics/compatibility.rst (LIBGCCJIT_ABI_26): New ABI tag.
	* docs/topics/expressions.rst: Document gcc_jit_context_new_sizeof.
	* jit-playback.cc (new_sizeof): New method.
	* jit-playback.h (new_sizeof): New method.
	* jit-recording.cc (recording::context::new_sizeof,
	recording::memento_of_new_sizeof::replay_into,
	recording::memento_of_new_sizeof::make_debug_string,
	recording::memento_of_new_sizeof::write_reproducer): New methods.
	* jit-recording.h (class memento_of_new_sizeof): New class.
	* libgccjit.cc (gcc_jit_context_new_sizeof): New function.
	* libgccjit.h (gcc_jit_context_new_sizeof): New function.
	* libgccjit.map: New function.

gcc/testsuite/ChangeLog:

	* jit.dg/all-non-failing-tests.h: New test.
	* jit.dg/test-sizeof.c: New test.
---
 gcc/jit/docs/topics/compatibility.rst|  7 +++
 gcc/jit/docs/topics/expressions.rst  | 14 ++
 gcc/jit/jit-playback.cc  | 10 
 gcc/jit/jit-playback.h   |  3 ++
 gcc/jit/jit-recording.cc | 52 
 gcc/jit/jit-recording.h  | 28 +++
 gcc/jit/libgccjit.cc | 18 +++
 gcc/jit/libgccjit.h  | 12 +
 gcc/jit/libgccjit.map|  5 ++
 gcc/testsuite/jit.dg/all-non-failing-tests.h | 10 
 gcc/testsuite/jit.dg/test-sizeof.c   | 50 +++
 11 files changed, 209 insertions(+)
 create mode 100644 gcc/testsuite/jit.dg/test-sizeof.c

diff --git a/gcc/jit/docs/topics/compatibility.rst b/gcc/jit/docs/topics/compatibility.rst
index ebede440ee4..00cf88a4666 100644
--- a/gcc/jit/docs/topics/compatibility.rst
+++ b/gcc/jit/docs/topics/compatibility.rst
@@ -378,3 +378,10 @@ alignment of a variable:
 
 ``LIBGCCJIT_ABI_25`` covers the addition of
 :func:`gcc_jit_type_get_restrict`
+
+.. _LIBGCCJIT_ABI_26:
+
+``LIBGCCJIT_ABI_26``
+
+``LIBGCCJIT_ABI_26`` covers the addition of
+:func:`gcc_jit_context_new_sizeof`
diff --git a/gcc/jit/docs/topics/expressions.rst b/gcc/jit/docs/topics/expressions.rst
index 42cfee36302..a18909f4890 100644
--- a/gcc/jit/docs/topics/expressions.rst
+++ b/gcc/jit/docs/topics/expressions.rst
@@ -126,6 +126,20 @@ Simple expressions
underlying string, so it is valid to pass in a pointer to an on-stack
buffer.
 
+.. function:: gcc_jit_rvalue *\
+  gcc_jit_context_new_sizeof (gcc_jit_context *ctxt, \
+  gcc_jit_type *type)
+
+   Generate an rvalue that is equal to the size of ``type``.
+
+   The parameter ``type`` must be non-NULL.
+
+   This is equivalent to this C code:
+
+   .. code-block:: c
+
+ sizeof (type)
+
 Constructor expressions
 ***
 
diff --git a/gcc/jit/jit-playback.cc b/gcc/jit/jit-playback.cc
index 537f3b1..09b5e89942f 100644
--- a/gcc/jit/jit-playback.cc
+++ b/gcc/jit/jit-playback.cc
@@ -974,6 +974,16 @@ new_rvalue_from_const  (type *type,
 
 /* Construct a playback::rvalue instance (wrapping a tree).  */
 
+playback::rvalue *
+playback::context::
+new_sizeof (type *type)
+{
+  tree inner = TYPE_SIZE_UNIT (type->as_tree ());
+  return new rvalue (this, inner);
+}
+
+/* Construct a playback::rvalue instance (wrapping a tree).  */
+
 playback::rvalue *
 playback::context::
 new_string_literal (const char *value)
diff --git a/gcc/jit/jit-playback.h b/gcc/jit/jit-playback.h
index b0166f8f6ce..a537a5433c3 100644
--- a/gcc/jit/jit-playback.h
+++ b/gcc/jit/jit-playback.h
@@ -139,6 +139,9 @@ public:
   new_rvalue_from_const (type *type,
 			 HOST_TYPE value);
 
+  rvalue *
+  new_sizeof (type *type);
+
   rvalue *
   new_string_literal (const char *value);
 
diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc
index 9b5b8005ebe..c1e4e68cae6 100644
--- a/gcc/jit/jit-recording.cc
+++ b/gcc/jit/jit-recording.cc
@@ -1076,6 +1076,21 @@ recording::context::new_global_init_rvalue (lvalue *variable,
   gbl->set_rvalue_init (init); /* Needed by the global for write dump.  */
 }
 
+/* Create a recording::memento_of_sizeof instance and add it
+   to this context's list of mementos.
+
+   Implements the post-error-checking part of
+   gcc_jit_context_new_sizeof.  */
+
+recording::rvalue *
+recording::context::new_sizeof (recording::type *type)
+{
+  recording::rvalue *result =
+new memento_of_new_sizeof (this, NULL, type);
+  record (result);
+  return result;
+}
+
 /* Create a recording::memento_of_new_string_literal instance and add it
to this context's list of mementos.
 
@@ -5310,6 +5325,43 @@ memento_of_new_rval

RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-12-22 Thread Di Zhao OS
Updated the fix in attachment.

Is it OK for trunk?

Tested on aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu.

Thanks,
Di Zhao

> -Original Message-
> From: Di Zhao OS 
> Sent: Sunday, December 17, 2023 8:31 PM
> To: Thomas Schwinge ; gcc-patches@gcc.gnu.org
> Cc: Richard Biener 
> Subject: RE: [PATCH v4] [tree-optimization/110279] Consider FMA in
> get_reassociation_width
> 
> Hello Thomas,
> 
> > -Original Message-
> > From: Thomas Schwinge 
> > Sent: Friday, December 15, 2023 5:46 PM
> > To: Di Zhao OS ; gcc-patches@gcc.gnu.org
> > Cc: Richard Biener 
> > Subject: RE: [PATCH v4] [tree-optimization/110279] Consider FMA in
> > get_reassociation_width
> >
> > Hi!
> >
> > On 2023-12-13T08:14:28+, Di Zhao OS 
> wrote:
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/pr110279-2.c
> > > @@ -0,0 +1,41 @@
> > > +/* PR tree-optimization/110279 */
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-Ofast --param tree-reassoc-width=4 --param fully-
> > pipelined-fma=1 -fdump-tree-reassoc2-details -fdump-tree-optimized" } */
> > > +/* { dg-additional-options "-march=armv8.2-a" { target aarch64-*-* } } */
> > > +
> > > +#define LOOP_COUNT 8
> > > +typedef double data_e;
> > > +
> > > +#include 
> > > +
> > > +__attribute_noinline__ data_e
> > > +foo (data_e in)
> >
> > Pushed to master branch commit 91e9e8faea4086b3b8aef2355fc12c1559d425f6
> > "Fix 'gcc.dg/pr110279-2.c' syntax error due to '__attribute_noinline__'",
> > see attached.
> >
> > However:
> >
> > > +{
> > > +  data_e a1, a2, a3, a4;
> > > +  data_e tmp, result = 0;
> > > +  a1 = in + 0.1;
> > > +  a2 = in * 0.1;
> > > +  a3 = in + 0.01;
> > > +  a4 = in * 0.59;
> > > +
> > > +  data_e result2 = 0;
> > > +
> > > +  for (int ic = 0; ic < LOOP_COUNT; ic++)
> > > +{
> > > +  /* Test that a complete FMA chain with length=4 is not broken.  */
> > > +  tmp = a1 + a2 * a2 + a3 * a3 + a4 * a4 ;
> > > +  result += tmp - ic;
> > > +  result2 = result2 / 2 - tmp;
> > > +
> > > +  a1 += 0.91;
> > > +  a2 += 0.1;
> > > +  a3 -= 0.01;
> > > +  a4 -= 0.89;
> > > +
> > > +}
> > > +
> > > +  return result + result2;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump-not "was chosen for reassociation"
> > "reassoc2"} } */
> > > +/* { dg-final { scan-tree-dump-times {\.FMA } 3 "optimized"} } */
> 
> Thank you for the fix.
> 
> > ..., I still see these latter two tree dump scans FAIL, for GCN:
> >
> > $ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
> >   2 *: a3_40
> >   2 *: a2_39
> > Width = 4 was chosen for reassociation
> > Transforming _15 = powmult_1 + powmult_3;
> >  into _63 = powmult_1 + a1_38;
> > $ grep -F .FMA pr110279-2.c.265t.optimized
> >   _63 = .FMA (a2_39, a2_39, a1_38);
> >   _64 = .FMA (a3_40, a3_40, powmult_5);
> >
> > ..., nvptx:
> >
> > $ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
> >   2 *: a3_40
> >   2 *: a2_39
> > Width = 4 was chosen for reassociation
> > Transforming _15 = powmult_1 + powmult_3;
> >  into _63 = powmult_1 + a1_38;
> > $ grep -F .FMA pr110279-2.c.265t.optimized
> >   _63 = .FMA (a2_39, a2_39, a1_38);
> >   _64 = .FMA (a3_40, a3_40, powmult_5);
> 
> For these 2 targets, the reassoc_width for FMUL is 1 (the default value),
> while the testcase assumes it to be 4. The bug was introduced when I
> updated the patch but forgot to update the testcase.
> 
> > ..., but also x86_64-pc-linux-gnu:
> >
> > $  grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
> >   2 *: a3_40
> >   2 *: a2_39
> > Width = 2 was chosen for reassociation
> > Transforming _15 = powmult_1 + powmult_3;
> >  into _63 = powmult_1 + powmult_3;
> > $ grep -cF .FMA pr110279-2.c.265t.optimized
> > 0
> 
> For x86_64 this needs "-mfma"; sorry, the compile options missed that.
> Can the change below fix these issues? I moved them into
> testsuite/gcc.target/aarch64, since they rely on tunings.
> 
> Tested on aarch64-unknown-linux-gnu.
> 
> >
> > Grüße
> >  Thomas
> >
> >
> 
> Thanks,
> Di Zhao
> 
> ---
>  gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-1.c | 3 +--
>  gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-2.c | 3 +--
>  2 files changed, 2 insertions(+), 4 deletions(-)
>  rename gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-1.c (83%)
>  rename gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-2.c (78%)
> 
> diff --git a/gcc/testsuite/gcc.dg/pr110279-1.c
> b/gcc/testsuite/gcc.target/aarch64/pr110279-1.c
> similarity index 83%
> rename from gcc/testsuite/gcc.dg/pr110279-1.c
> rename to gcc/testsuite/gcc.target/aarch64/pr110279-1.

[PATCH] libgccjit: Add missing builtins needed by optimizations

2023-12-22 Thread Antoni Boucher
Hi.
This patch adds missing builtins needed by optimizations.
Thanks for the review.
From 5ef20748a140d3384294a4218e6db7420cef692d Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Tue, 3 Jan 2023 15:04:41 -0500
Subject: [PATCH] libgccjit: Add missing builtins needed by optimizations

gcc/jit/ChangeLog:

	* jit-builtins.cc (ensure_optimization_builtins_exist): Add
	popcount builtins.

gcc/testsuite/ChangeLog:

	* jit.dg/all-non-failing-tests.h: New test.
	* jit.dg/test-popcount.c: New test.
---
 gcc/jit/jit-builtins.cc  |  3 +
 gcc/testsuite/jit.dg/all-non-failing-tests.h | 10 +++
 gcc/testsuite/jit.dg/test-popcount.c | 84 
 3 files changed, 97 insertions(+)
 create mode 100644 gcc/testsuite/jit.dg/test-popcount.c

diff --git a/gcc/jit/jit-builtins.cc b/gcc/jit/jit-builtins.cc
index fdd0739789d..c84f2613f6a 100644
--- a/gcc/jit/jit-builtins.cc
+++ b/gcc/jit/jit-builtins.cc
@@ -609,6 +609,9 @@ builtins_manager::ensure_optimization_builtins_exist ()
  We can't loop through all of the builtin_data array, we don't
  support all types yet.  */
   (void)get_builtin_function_by_id (BUILT_IN_TRAP);
+  (void)get_builtin_function_by_id (BUILT_IN_POPCOUNT);
+  (void)get_builtin_function_by_id (BUILT_IN_POPCOUNTL);
+  (void)get_builtin_function_by_id (BUILT_IN_POPCOUNTLL);
 }
 
 /* Playback support.  */
diff --git a/gcc/testsuite/jit.dg/all-non-failing-tests.h b/gcc/testsuite/jit.dg/all-non-failing-tests.h
index e762563f9bd..b768c8977f0 100644
--- a/gcc/testsuite/jit.dg/all-non-failing-tests.h
+++ b/gcc/testsuite/jit.dg/all-non-failing-tests.h
@@ -268,6 +268,13 @@
 #undef create_code
 #undef verify_code
 
+/* test-popcount.c */
+#define create_code create_code_popcount
+#define verify_code verify_code_popcount
+#include "test-popcount.c"
+#undef create_code
+#undef verify_code
+
 /* test-pr103562.c: We don't add this one, since it touches
the optimization level of the context as a whole.  */
 
@@ -488,6 +495,9 @@ const struct testcase testcases[] = {
   {"nested_loop",
create_code_nested_loop,
verify_code_nested_loop},
+  {"popcount",
+   create_code_popcount,
+   verify_code_popcount},
   {"pr66700_observing_write_through_ptr",
create_code_pr66700_observing_write_through_ptr,
verify_code_pr66700_observing_write_through_ptr},
diff --git a/gcc/testsuite/jit.dg/test-popcount.c b/gcc/testsuite/jit.dg/test-popcount.c
new file mode 100644
index 000..6ad241fd2de
--- /dev/null
+++ b/gcc/testsuite/jit.dg/test-popcount.c
@@ -0,0 +1,84 @@
+#include 
+#include 
+#include 
+#include 
+
+#include "libgccjit.h"
+
+#include "harness.h"
+
+void
+create_code (gcc_jit_context *ctxt, void *user_data)
+{
+  /* Let's try to inject the equivalent of:
+int
+popcount (unsigned int x)
+{
+  int i = 0;
+  while (x)
+{
+  x &= x - 1;
+  ++i;
+}
+  return i;
+}
+   */
+  gcc_jit_type *int_type =
+gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT);
+  gcc_jit_type *uint_type =
+gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_UNSIGNED_INT);
+
+  gcc_jit_param *param_x =
+gcc_jit_context_new_param (
+  ctxt,
+  NULL,
+  uint_type, "x");
+  gcc_jit_param *params[1] = {param_x};
+  gcc_jit_function *func =
+gcc_jit_context_new_function (ctxt,
+  NULL,
+  GCC_JIT_FUNCTION_EXPORTED,
+  int_type,
+  "popcount",
+  1, params, 0);
+
+  gcc_jit_lvalue *x = gcc_jit_param_as_lvalue (param_x);
+  gcc_jit_rvalue *x_rvalue = gcc_jit_lvalue_as_rvalue (x);
+  gcc_jit_lvalue *i =
+gcc_jit_function_new_local (func, NULL, int_type, "i");
+  gcc_jit_rvalue *zero = gcc_jit_context_zero (ctxt, int_type);
+
+  gcc_jit_block *initial =
+gcc_jit_function_new_block (func, "initial");
+  gcc_jit_block *while_block =
+gcc_jit_function_new_block (func, "while");
+
+  gcc_jit_block_add_assignment (initial, NULL, i, zero);
+  gcc_jit_block_end_with_jump (initial, NULL, while_block);
+
+  gcc_jit_block *after =
+gcc_jit_function_new_block (func, "after");
+
+  gcc_jit_block *while_body =
+gcc_jit_function_new_block (func, "while_body");
+  gcc_jit_rvalue *uzero = gcc_jit_context_zero (ctxt, uint_type);
+  gcc_jit_rvalue *cmp =
+gcc_jit_context_new_comparison (ctxt, NULL, GCC_JIT_COMPARISON_NE, x_rvalue, uzero);
+  gcc_jit_block_end_with_conditional (while_block, NULL, cmp, while_body, after);
+
+  gcc_jit_rvalue *uone = gcc_jit_context_one (ctxt, uint_type);
+  gcc_jit_rvalue *sub = gcc_jit_context_new_binary_op (ctxt, NULL, GCC_JIT_BINARY_OP_MINUS, uint_type, x_rvalue, uone);
+  gcc_jit_block_add_assignment_op (while_body, NULL, x, GCC_JIT_BINARY_OP_BITWISE_AND, sub);
+
+  gcc_jit_rvalue *one = gcc_jit_context_one (ctxt, int_type);
+  gcc_jit_block_add_assignment_op (while_body, NULL, i, GCC_JIT_BINARY_OP_PLUS, one);
+  gcc_jit_block_end_with_jump (while_body, NULL, while_block);
+
+  gcc_jit_block_end_with_return(after, NULL, gcc_jit_lvalue_as_rvalue (i));
+}
+
+void
+verify_code (gcc_jit_context *ctxt, gcc_j

Re: [PATCH v3] AArch64: Cleanup memset expansion

2023-12-22 Thread Wilco Dijkstra
v3: rebased to latest trunk

Cleanup memset implementation.  Similar to memcpy/memmove, use an offset and
bytes throughout.  Simplify the complex calculations when optimizing for size
by using a fixed limit.

Passes regress & bootstrap.

gcc/ChangeLog:
* config/aarch64/aarch64.h (MAX_SET_SIZE): New define.
* config/aarch64/aarch64.cc (aarch64_progress_pointer): Remove function.
(aarch64_set_one_block_and_progress_pointer): Simplify and clean up.
(aarch64_expand_setmem): Clean up implementation, use byte offsets,
simplify size calculation.

---

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
3ae42be770400da96ea3d9d25d6e1b2d393d034d..dd3b7988d585277181c478cd022fd7b6285929d0
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -1178,6 +1178,10 @@ typedef struct
mode that should actually be used.  We allow pairs of registers.  */
 #define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TImode)
 
+/* Maximum bytes set for an inline memset expansion.  With -Os use 3 STP
+   and 1 MOVI/DUP (same size as a call).  */
+#define MAX_SET_SIZE(speed) (speed ? 256 : 96)
+
 /* Maximum bytes moved by a single instruction (load/store pair).  */
 #define MOVE_MAX (UNITS_PER_WORD * 2)
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
f9850320f61c5ddccf47e6583d304e5f405a484f..0909b319d16b9a1587314bcfda0a8112b42a663f
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26294,15 +26294,6 @@ aarch64_move_pointer (rtx pointer, poly_int64 amount)
next, amount);
 }
 
-/* Return a new RTX holding the result of moving POINTER forward by the
-   size of the mode it points to.  */
-
-static rtx
-aarch64_progress_pointer (rtx pointer)
-{
-  return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE (pointer)));
-}
-
 typedef auto_vec, 12> copy_ops;
 
 /* Copy one block of size MODE from SRC to DST at offset OFFSET.  */
@@ -26457,45 +26448,21 @@ aarch64_expand_cpymem (rtx *operands, bool is_memmove)
   return true;
 }
 
-/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
-   SRC is a register we have created with the duplicated value to be set.  */
+/* Set one block of size MODE at DST at offset OFFSET to value in SRC.  */
 static void
-aarch64_set_one_block_and_progress_pointer (rtx src, rtx *dst,
-   machine_mode mode)
+aarch64_set_one_block (rtx src, rtx dst, int offset, machine_mode mode)
 {
-  /* If we are copying 128bits or 256bits, we can do that straight from
- the SIMD register we prepared.  */
-  if (known_eq (GET_MODE_BITSIZE (mode), 256))
-{
-  mode = GET_MODE (src);
-  /* "Cast" the *dst to the correct mode.  */
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memset.  */
-  emit_insn (aarch64_gen_store_pair (*dst, src, src));
-
-  /* Move the pointers forward.  */
-  *dst = aarch64_move_pointer (*dst, 32);
-  return;
-}
-  if (known_eq (GET_MODE_BITSIZE (mode), 128))
+  /* Emit explicit store pair instructions for 32-byte writes.  */
+  if (known_eq (GET_MODE_SIZE (mode), 32))
 {
-  /* "Cast" the *dst to the correct mode.  */
-  *dst = adjust_address (*dst, GET_MODE (src), 0);
-  /* Emit the memset.  */
-  emit_move_insn (*dst, src);
-  /* Move the pointers forward.  */
-  *dst = aarch64_move_pointer (*dst, 16);
+  mode = V16QImode;
+  rtx dst1 = adjust_address (dst, mode, offset);
+  emit_insn (aarch64_gen_store_pair (dst1, src, src));
   return;
 }
-  /* For copying less, we have to extract the right amount from src.  */
-  rtx reg = lowpart_subreg (mode, src, GET_MODE (src));
-
-  /* "Cast" the *dst to the correct mode.  */
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memset.  */
-  emit_move_insn (*dst, reg);
-  /* Move the pointer forward.  */
-  *dst = aarch64_progress_pointer (*dst);
+  if (known_lt (GET_MODE_SIZE (mode), 16))
+src = lowpart_subreg (mode, src, GET_MODE (src));
+  emit_move_insn (adjust_address (dst, mode, offset), src);
 }
 
 /* Expand a setmem using the MOPS instructions.  OPERANDS are the same
@@ -26524,7 +26491,7 @@ aarch64_expand_setmem_mops (rtx *operands)
 bool
 aarch64_expand_setmem (rtx *operands)
 {
-  int n, mode_bits;
+  int mode_bytes;
   unsigned HOST_WIDE_INT len;
   rtx dst = operands[0];
   rtx val = operands[2], src;
@@ -26537,11 +26504,9 @@ aarch64_expand_setmem (rtx *operands)
   || (STRICT_ALIGNMENT && align < 16))
 return aarch64_expand_setmem_mops (operands);
 
-  bool size_p = optimize_function_for_size_p (cfun);
-
   /* Default the maximum to 256-bytes when considering only libcall vs
  SIMD broadcast sequence.  */
-  unsigned max_set_size = 256;
+  unsigned max_set_size = MAX_SET_SIZE (optimize_function_for_speed_p (cfun));
   unsigned mops_threshold = aarch64_mops_memset_size_threshold;
 

[PATCH v2] object lifetime instrumentation for Valgrind [PR66487]

2023-12-22 Thread Alexander Monakov
From: Daniil Frolov 

PR 66487 is asking to provide sanitizer-like detection for C++ object
lifetime violations that are worked around with -fno-lifetime-dse or
-flifetime-dse=1 in Firefox, LLVM (PR 106943), OpenJade (PR 69534).

The discussion in the PR was centered around extending MSan, but MSan
was not ported to GCC (and requires rebuilding everything with
instrumentation).

Instead, allow Valgrind to see lifetime boundaries by emitting client
requests along *this = { CLOBBER }.  The client request marks the
"clobbered" memory as undefined for Valgrind; clobbering assignments
mark the beginning of ctor and end of dtor execution for C++ objects.
Hence, attempts to read object storage after the destructor, or
"pre-initialize" its fields prior to the constructor will be caught.

Valgrind client requests are offered as macros that emit inline asm.
For use in code generation, let's wrap them as libgcc builtins.

gcc/ChangeLog:

* Makefile.in (OBJS): Add gimple-valgrind-interop.o.
* builtins.def (BUILT_IN_VALGRIND_MAKE_UNDEFINED): New.
* common.opt (-fvalgrind-annotations): New option.
* doc/install.texi (--enable-valgrind-interop): Document.
* doc/invoke.texi (-fvalgrind-annotations): Document.
* passes.def (pass_instrument_valgrind): Add.
* tree-pass.h (make_pass_instrument_valgrind): Declare.
* gimple-valgrind-interop.cc: New file.

libgcc/ChangeLog:

* Makefile.in (LIB2ADD_ST): Add valgrind-interop.c.
* config.in: Regenerate.
* configure: Regenerate.
* configure.ac (--enable-valgrind-interop): New flag.
* libgcc2.h (__gcc_vgmc_make_mem_undefined): Declare.
* valgrind-interop.c: New file.

gcc/testsuite/ChangeLog:

* g++.dg/valgrind-annotations-1.C: New test.
* g++.dg/valgrind-annotations-2.C: New test.

Co-authored-by: Alexander Monakov 
---
Changes in v2:

* Take new clobber kinds into account.
* Do not link valgrind-interop.o into libgcc_s.so.

 gcc/Makefile.in   |   1 +
 gcc/builtins.def  |   3 +
 gcc/common.opt|   4 +
 gcc/doc/install.texi  |   5 +
 gcc/doc/invoke.texi   |  27 
 gcc/gimple-valgrind-interop.cc| 125 ++
 gcc/passes.def|   1 +
 gcc/testsuite/g++.dg/valgrind-annotations-1.C |  22 +++
 gcc/testsuite/g++.dg/valgrind-annotations-2.C |  12 ++
 gcc/tree-pass.h   |   1 +
 libgcc/Makefile.in|   3 +
 libgcc/config.in  |   6 +
 libgcc/configure  |  22 ++-
 libgcc/configure.ac   |  15 ++-
 libgcc/libgcc2.h  |   2 +
 libgcc/valgrind-interop.c |  40 ++
 16 files changed, 287 insertions(+), 2 deletions(-)
 create mode 100644 gcc/gimple-valgrind-interop.cc
 create mode 100644 gcc/testsuite/g++.dg/valgrind-annotations-1.C
 create mode 100644 gcc/testsuite/g++.dg/valgrind-annotations-2.C
 create mode 100644 libgcc/valgrind-interop.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9373800018..d027548203 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1507,6 +1507,7 @@ OBJS = \
gimple-ssa-warn-restrict.o \
gimple-streamer-in.o \
gimple-streamer-out.o \
+   gimple-valgrind-interop.o \
gimple-walk.o \
gimple-warn-recursion.o \
gimplify.o \
diff --git a/gcc/builtins.def b/gcc/builtins.def
index f03df32f98..b05e20e062 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -1194,6 +1194,9 @@ DEF_GCC_BUILTIN (BUILT_IN_LINE, "LINE", BT_FN_INT, 
ATTR_NOTHROW_LEAF_LIST)
 /* Control Flow Redundancy hardening out-of-line checker.  */
 DEF_BUILTIN_STUB (BUILT_IN___HARDCFR_CHECK, "__builtin___hardcfr_check")
 
+/* Wrappers for Valgrind client requests.  */
+DEF_EXT_LIB_BUILTIN (BUILT_IN_VALGRIND_MAKE_UNDEFINED, 
"__gcc_vgmc_make_mem_undefined", BT_FN_VOID_PTR_SIZE, ATTR_NOTHROW_LEAF_LIST)
+
 /* Synchronization Primitives.  */
 #include "sync-builtins.def"
 
diff --git a/gcc/common.opt b/gcc/common.opt
index d263a959df..2be5b8d0a6 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3377,6 +3377,10 @@ Enum(auto_init_type) String(pattern) 
Value(AUTO_INIT_PATTERN)
 EnumValue
 Enum(auto_init_type) String(zero) Value(AUTO_INIT_ZERO)
 
+fvalgrind-annotations
+Common Var(flag_valgrind_annotations) Optimization
+Annotate lifetime boundaries with Valgrind client requests.
+
 ; -fverbose-asm causes extra commentary information to be produced in
 ; the generated assembly code (to make it more readable).  This option
 ; is generally only of use to those who actually need to read the
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index d20b43a5b2..d6e5e5fdaf 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1563,6 +1563,11 @@ Disab

Re: [PATCH] combine: Don't optimize paradoxical SUBREG AND CONST_INT on WORD_REGISTER_OPERATIONS targets [PR112758]

2023-12-22 Thread Eric Botcazou
> Bootstrapped/regtested on x86_64-linux and i686-linux (neither of which
> is WORD_REGISTER_OPERATIONS target), tested on the testcase using
> cross to riscv64-linux but don't have an easy access to a
> WORD_REGISTER_OPERATIONS target to bootstrap/regtest it there.
> 
> Ok for trunk?

Yes, thanks for fixing this.

-- 
Eric Botcazou




Re: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension

2023-12-22 Thread juzhe.zh...@rivai.ai
Yeah.

(define_insn "@pred_msbc<mode>"
  [(set (match_operand:<VM> 0 "register_operand"        "=vr, vr, &vr")
	(unspec:<VM>
	   [(minus:VI
	      (match_operand:VI 1 "register_operand"     "  0, vr,  vr")
	      (match_operand:VI 2 "register_operand"     " vr,  0,  vr"))
	    (match_operand:<VM> 3 "register_operand"     " vm, vm,  vm")
	    (unspec:<VM>
	      [(match_operand 4 "vector_length_operand"  " rK, rK,  rK")
	       (match_operand 5 "const_int_operand"      "  i,  i,   i")
	       (reg:SI VL_REGNUM)
	       (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)] UNSPEC_VMSBC))]
  "TARGET_VECTOR"
  "vmsbc.vvm\t%0,%1,%2,%3"
  [(set_attr "type" "vicalu")
   (set_attr "mode" "<MODE>")
   (set_attr "vl_op_idx" "4")
   (set (attr "avl_type_idx") (const_int 5))])

You should use an attribute to disable the constraints of alternatives 0
and 1.


juzhe.zh...@rivai.ai
 
From: joshua
Sent: 2023-12-22 18:29
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw;
christoph.muellner; jinma; cooper.qu
Subject: Re: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension
Hi Juzhe,
What xtheadvector needs to handle is just that the destination vector register 
cannot overlap the source vector register group for instructions like vmadc/vmsbc. 
That is not what group_overlap means. We need to add "&" to the registers in 
the corresponding xtheadvector patterns, while rvv 1.0 doesn't have this 
constraint.

(define_insn "@pred_th_msbc"
  [(set (match_operand: 0 "register_operand""=&vr")
(unspec:
[(minus:VI
  (match_operand:VI 1 "register_operand" "  vr")
  (match_operand:VI 2 "register_operand" " vr"))
(match_operand: 3 "register_operand"" vm")
(unspec:
  [(match_operand 4 "vector_length_operand" " rK")
(match_operand 5 "const_int_operand" "  i")
(reg:SI VL_REGNUM)
(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)] UNSPEC_VMSBC))]
  "TARGET_XTHEADVECTOR"
  "vmsbc.vvm\t%0,%1,%2,%3"
  [(set_attr "type" "vicalu")
  (set_attr "mode" "")
  (set_attr "vl_op_idx" "4")
  (set (attr "avl_type_idx") (const_int 5))])

Joshua







--
From: juzhe.zh...@rivai.ai 
Sent: 2023-12-22 (Friday) 16:07
To: "cooper.joshua"; 
"gcc-patches"
Cc: Jim Wilson; palmer; 
andrew; "philipp.tomsich"; 
jeffreyalaw; 
"christoph.muellner"; 
jinma; "cooper.qu"
Subject: Re: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension

You mean theadvector doesn't want the current RVV1.0 register overlap magic as 
follows?
The destination EEW is smaller than the source EEW and the overlap is in the 
lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi 
v0, v0, 3 is legal, but a destination of v1 is not).
The destination EEW is greater than the source EEW, the source EMUL is at least 
1, and the overlap is in the highest-numbered part of the destination register 
group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or 
v4 is not).
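
For readers less familiar with these two rules, they can be sketched as a small
legality check. This is only an illustrative model that treats register groups
as index ranges [base, base + EMUL); the function names are made up and it is
not part of the patch or the spec wording:

```python
def no_overlap(a, a_n, b, b_n):
    """True if register groups [a, a+a_n) and [b, b+b_n) are disjoint."""
    return a + a_n <= b or b + b_n <= a

def narrow_overlap_ok(dest, dest_lmul, src, src_emul):
    """Dest EEW < source EEW: overlap is legal only when the destination
    occupies the lowest-numbered part of the source register group."""
    return no_overlap(dest, dest_lmul, src, src_emul) or dest == src

def widen_overlap_ok(dest, dest_lmul, src, src_emul):
    """Dest EEW > source EEW (source EMUL >= 1): overlap is legal only when
    the source occupies the highest-numbered part of the destination group."""
    return (no_overlap(dest, dest_lmul, src, src_emul)
            or src + src_emul == dest + dest_lmul)

# LMUL=1 vnsrl.wi: source group v0-v1 (EMUL=2), single-register destination.
assert narrow_overlap_ok(0, 1, 0, 2)        # vnsrl.wi v0, v0, 3 is legal
assert not narrow_overlap_ok(1, 1, 0, 2)    # a destination of v1 is not

# LMUL=8 vzext.vf4: destination group v0-v7, source group of EMUL=2.
assert widen_overlap_ok(0, 8, 6, 2)         # vzext.vf4 v0, v6 is legal
assert not widen_overlap_ok(0, 8, 0, 2)     # a source of v0 is not
assert not widen_overlap_ok(0, 8, 2, 2)     # ... nor v2
```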

If yes, I suggest disabling the overlap constraint using an attribute.  More details 
you can learn from 

(set_attr "group_overlap"


juzhe.zh...@rivai.ai
 
From: joshua
Sent: 2023-12-22 11:33
To: 钟居哲; gcc-patches
Cc: jim.wilson.gcc; palmer; andrew; philipp.tomsich; Jeff Law; Christoph 
Müllner; jinma; Cooper Qu
Subject: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension
Hi Juzhe,

Thank you for your comprehensive comments.

Classifying theadvector intrinsics into 3 kinds is really important to make our 
patchset more organized. 

For 1) and 3), I will split out the patches soon and hope they will be merged 
quickly.
For 2), according to the differences between vector and xtheadvector, it can be 
classified into 3 kinds.

First is renamed instructions: renamed load/store, renamed narrowing integer 
right shift, renamed narrowing fixed-point clip, etc. I think we can use an ASM 
target hook to rewrite the whole instruction string, although it will still be 
heavy work.
Second is missing pseudo instructions like vneg/vfneg. We will add these pseudo 
instructions in binutils to make xtheadvector more compatible with vector.
Third is that the destination vector register cannot overlap the source vector 
register group for vmadc/vmsbc/widening arithmetic/narrowing arithmetic. Currently 
I cannot come up with any better way than pattern copying.  Do you have any 
suggestions?

Joshua




--
发件人:钟居哲 
发送时间:2023年12月21日(星期四) 07:04
收件人:"cooper.joshua"; 
"gcc-patches"
抄 送:"jim.wilson.gcc"; palmer; 
andrew; "philipp.tomsich"; Jeff 
Law; "Christoph Müllner"; 
"cooper.joshua"; 
jinma; Cooper Qu
主 题:Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension

Hi, Joshua.

Thanks for working hard on cleaning up the code and supporting tons of work on 
theadvector.

After fully reviewing this patch, I understand you have 3 kinds of theadvector 
intrinsics relative to the codebase of current RVV1.0 GCC.

1). instructions that can leverage all current codes of RVV1.0 intrinsic with 

Re: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension

2023-12-22 Thread joshua
Hi Juzhe,
What xtheadvector needs to handle is just that the destination vector register 
cannot overlap the source vector register group for instructions like vmadc/vmsbc. 
That is not what group_overlap means. We need to add "&" to the registers in 
the corresponding xtheadvector patterns, while rvv 1.0 doesn't have this 
constraint.
(define_insn "@pred_th_msbc"
 [(set (match_operand: 0 "register_operand" "=&vr")
 (unspec:
 [(minus:VI
 (match_operand:VI 1 "register_operand" " vr")
 (match_operand:VI 2 "register_operand" " vr"))
 (match_operand: 3 "register_operand" " vm")
 (unspec:
 [(match_operand 4 "vector_length_operand" " rK")
 (match_operand 5 "const_int_operand" " i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)] UNSPEC_VMSBC))]
 "TARGET_XTHEADVECTOR"
 "vmsbc.vvm\t%0,%1,%2,%3"
 [(set_attr "type" "vicalu")
 (set_attr "mode" "")
 (set_attr "vl_op_idx" "4")
 (set (attr "avl_type_idx") (const_int 5))])
Joshua
--
From: juzhe.zh...@rivai.ai 
Sent: 2023-12-22 (Friday) 16:07
To: "cooper.joshua"; 
"gcc-patches"
Cc: Jim Wilson; palmer; 
andrew; "philipp.tomsich"; 
jeffreyalaw; 
"christoph.muellner"; 
jinma; "cooper.qu"
Subject: Re: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension
You mean theadvector doesn't want the current RVV1.0 register overlap magic as 
follows?

 * 
The destination EEW is smaller than the source EEW and the overlap is in the 
lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi 
v0, v0, 3 is legal, but a destination of v1 is not).

 * 
The destination EEW is greater than the source EEW, the source EMUL is at least 
1, and the overlap is in the highest-numbered part of the destination register 
group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or 
v4 is not).
If yes, I suggest disabling the overlap constraint using an attribute.  More details 
you can learn from 
(set_attr "group_overlap"
juzhe.zh...@rivai.ai
From: joshua 
Sent: 2023-12-22 11:33
To: 钟居哲 ; gcc-patches 

Cc: jim.wilson.gcc ; palmer 
; andrew ; 
philipp.tomsich ; Jeff Law 
; Christoph Müllner 
; jinma ; 
Cooper Qu 
Subject: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension
Hi Juzhe,
Thank you for your comprehensive comments.
Classifying theadvector intrinsics into 3 kinds is really important to make our 
patchset more organized. 
For 1) and 3), I will split out the patches soon and hope they will be merged 
quickly.
For 2), according to the differences between vector and xtheadvector, it can be 
classified into 3 kinds.
First is renamed instructions: renamed load/store, renamed narrowing integer 
right shift, renamed narrowing fixed-point clip, etc. I think we can use an ASM 
target hook to rewrite the whole instruction string, although it will still be 
heavy work.
Second is missing pseudo instructions like vneg/vfneg. We will add these pseudo 
instructions in binutils to make xtheadvector more compatible with vector.
Third is that the destination vector register cannot overlap the source vector 
register group for vmadc/vmsbc/widening arithmetic/narrowing arithmetic. Currently 
I cannot come up with any better way than pattern copying. Do you have any 
suggestions?
Joshua
--
发件人:钟居哲 
发送时间:2023年12月21日(星期四) 07:04
收件人:"cooper.joshua"; 
"gcc-patches"
抄 送:"jim.wilson.gcc"; palmer; 
andrew; "philipp.tomsich"; Jeff 
Law; "Christoph Müllner"; 
"cooper.joshua"; 
jinma; Cooper Qu
主 题:Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension
Hi, Joshua.
Thanks for working hard on cleaning up the code and supporting tons of work on 
theadvector.
After fully reviewing this patch, I understand you have 3 kinds of theadvector 
intrinsics relative to the codebase of current RVV1.0 GCC.
1). instructions that can leverage all the current code of the RVV1.0 intrinsics by 
simply adding the "th." prefix directly.
2). instructions that leverage the current MD patterns but need some tweaks and 
pattern copies since they are not simply prefixed with "th.".
3). new instructions that current RVV1.0 doesn't have, like the vlb instructions.
Overall, 1) and 3) look reasonable to me. But 2) needs me some time to figure out 
a better way to do it (copying patterns as in the current patch is not an 
approach I like).
So, I hope you can break this big patch into 3 different patch series.
1. Support the subset of theadvector instructions that can be leveraged directly 
from current RVV1.0 by simply adding the "th." prefix.
2. Support theadvector instructions with totally different names that share the 
same patterns as RVV1.0 instructions.
3. Support new theadvector instructions like vlb, etc.
I think 1 and 3 separate patc

[x86_64 PATCH] PR target/112992: Optimize mode for broadcast of constants.

2023-12-22 Thread Roger Sayle

This patch resolves the second part of PR target/112992, building upon
Hongtao Liu's solution to the first part.

The issue addressed by this patch is that when initializing vectors by
broadcasting integer constants, the compiler has the flexibility to
select the most appropriate vector mode to perform the broadcast, as
long as the resulting vector has an identical bit pattern.  For
example, the following constants are all equivalent:
V4SImode {0x01010101, 0x01010101, 0x01010101, 0x01010101 }
V8HImode {0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101 }
V16QImode {0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, ... 0x01 }
So instruction sequences that construct any of these can be used to
construct the others (with a suitable cast/SUBREG).
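
The bit-pattern equivalence of these three constants is easy to check outside
the compiler; a quick sketch (illustrative only, not part of the patch):

```python
import struct

# The same 128-bit vector constant built as V4SI, V8HI and V16QI broadcasts.
v4si  = struct.pack("<4I",  *([0x01010101] * 4))
v8hi  = struct.pack("<8H",  *([0x0101] * 8))
v16qi = struct.pack("<16B", *([0x01] * 16))

# All three broadcasts produce the identical 16-byte pattern, so a sequence
# constructing any one of them can stand in for the others modulo a
# cast/SUBREG.
assert v4si == v8hi == v16qi
assert v4si == b"\x01" * 16
```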

On x86_64, it turns out that broadcasts of SImode constants are preferred,
as DImode constants often require a longer movabs instruction, and
HImode and QImode broadcasts require multiple uops on some architectures.
Hence, SImode is always at least tied for the shortest/fastest implementation.

Examples of this improvement can be seen in the testsuite.

gcc.target/i386/pr102021.c
Before:
   0:   48 b8 0c 00 0c 00 0cmovabs $0xc000c000c000c,%rax
   7:   00 0c 00
   a:   62 f2 fd 28 7c c0   vpbroadcastq %rax,%ymm0
  10:   c3  retq

After:
   0:   b8 0c 00 0c 00  mov$0xc000c,%eax
   5:   62 f2 7d 28 7c c0   vpbroadcastd %eax,%ymm0
   b:   c3  retq

and
gcc.target/i386/pr90773-17.c:
Before:
   0:   48 8b 15 00 00 00 00mov0x0(%rip),%rdx# 7 
   7:   b8 0c 00 00 00  mov$0xc,%eax
   c:   62 f2 7d 08 7a c0   vpbroadcastb %eax,%xmm0
  12:   62 f1 7f 08 7f 02   vmovdqu8 %xmm0,(%rdx)
  18:   c7 42 0f 0c 0c 0c 0cmovl   $0xc0c0c0c,0xf(%rdx)
  1f:   c3  retq

After:
   0:   48 8b 15 00 00 00 00mov0x0(%rip),%rdx# 7 
   7:   b8 0c 0c 0c 0c  mov$0xc0c0c0c,%eax
   c:   62 f2 7d 08 7c c0   vpbroadcastd %eax,%xmm0
  12:   62 f1 7f 08 7f 02   vmovdqu8 %xmm0,(%rdx)
  18:   c7 42 0f 0c 0c 0c 0cmovl   $0xc0c0c0c,0xf(%rdx)
  1f:   c3  retq

where, according to Agner Fog's instruction tables, broadcastd is slightly
faster on some microarchitectures, for example Knights Landing.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-12-21  Roger Sayle  

gcc/ChangeLog
PR target/112992
* config/i386/i386-expand.cc
(ix86_convert_const_wide_int_to_broadcast): Allow call to
ix86_expand_vector_init_duplicate to fail, and return NULL_RTX.
(ix86_broadcast_from_constant): Revert recent change; Return a
suitable MEMREF independently of mode/target combinations.
(ix86_expand_vector_move): Allow ix86_expand_vector_init_duplicate
to decide whether expansion is possible/preferable.  Only try
forcing DImode constants to memory (and trying again) if calling
ix86_expand_vector_init_duplicate fails with a DImode immediate
constant.
(ix86_expand_vector_init_duplicate) : Try using
V4SImode for suitable immediate constants.
: Try using V8SImode for suitable constants.
: Use constant pool for AVX without AVX2.
: Fail for CONST_INT_P, i.e. use constant pool.
: Likewise.
: For CONST_INT_P try using V4SImode via widen.
: For CONST_INT_P try using V8HImode via widen.
: Handle CONST_INTs via simplify_binary_operation.
Allow recursive calls to ix86_expand_vector_init_duplicate to fail.
: For CONST_INT_P try V8SImode via widen.
: For CONST_INT_P try V16HImode via widen.
(ix86_expand_vector_init): Move try using a broadcast for all_same
with ix86_expand_vector_init_duplicate before using constant pool.

gcc/testsuite/ChangeLog
* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Update test case.
* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512fp16-13.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/pr100865-10a.c: Likewise.
* gcc.target/i386/pr100865-10b.c: Likewise.
* gcc.target/i386/pr100865-11c.c: Likewise.
* gcc.target/i386/pr100865-12c.c: Likewise.
* gcc.target/i386/pr100865-2.c: Likewise.
* gcc.target/i386/pr100865-3.c: Likewise.
* gcc.target/i386/pr100865-4a.c: Likewise.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-5a.c: Likewise.
* gcc.target/i386/pr100865-5b.c: Likewise.
* gcc.target/i386/pr100865-9a.c: Likewise.
* gcc.target/i386/pr100865-9b.c: Likewise.
* gcc.target/i386/pr102021.c: Likewise.
* gcc.

[x86_PATCH] peephole2 to resolve failure of gcc.target/i386/pr43644-2.c

2023-12-22 Thread Roger Sayle

This patch resolves the failure of pr43644-2.c in the testsuite, a code
quality test I added back in July, that started failing as the code GCC
generates for 128-bit values (and their parameter passing) has been in
flux.  After a few attempts at tweaking pattern constraints in the hope
of convincing reload to produce a more aggressive (but potentially
unsafe) register allocation, I think the best solution is to use a
peephole2 to catch/clean-up this specific case.

Specifically, the function:

unsigned __int128 foo(unsigned __int128 x, unsigned long long y) {
  return x+y;
}

currently generates:

foo:movq%rdx, %rcx
movq%rdi, %rax
movq%rsi, %rdx
addq%rcx, %rax
adcq$0, %rdx
ret

and with this patch/peephole2 now generates:

foo:movq%rdx, %rax
movq%rsi, %rdx
addq%rdi, %rax
adcq$0, %rdx
ret

which I believe is optimal.
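
The double-word arithmetic that both sequences implement can be modeled
directly. A small sketch of the addq/adcq semantics (illustrative only; the
function name is made up):

```python
M64 = (1 << 64) - 1

def add128(x_lo, x_hi, y):
    """Model of the generated sequence: addq adds y into the low word,
    then adcq $0 folds the carry out of the low word into the high word."""
    lo = (x_lo + y) & M64
    carry = (x_lo + y) >> 64
    hi = (x_hi + carry) & M64
    return lo, hi

# Carry propagates from the low word into the high word.
assert add128(M64, 0, 1) == (0, 1)

# Cross-check against full 128-bit arithmetic.
x = (37 << 64) | 12345
y = 0xDEADBEEF
lo, hi = add128(x & M64, x >> 64, y)
assert (hi << 64) | lo == (x + y) & ((1 << 128) - 1)
```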


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-12-21  Roger Sayle  

gcc/ChangeLog
PR target/43644
* config/i386/i386.md (define_peephole2): Tweak register allocation
of *add3_doubleword_concat_zext.

gcc/testsuite/ChangeLog
PR target/43644
* gcc.target/i386/pr43644-2.c: Expect 2 movq instructions.


Thanks in advance, and for your patience with this testsuite noise.
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e862368..5967208 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6428,6 +6428,38 @@
  (clobber (reg:CC FLAGS_REG))])]
  "split_double_mode (mode, &operands[0], 1, &operands[0], &operands[5]);")
 
+(define_peephole2
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+   (match_operand:SWI48 1 "general_reg_operand"))
+   (set (match_operand:SWI48 2 "general_reg_operand")
+   (match_operand:SWI48 3 "general_reg_operand"))
+   (set (match_dup 1) (match_operand:SWI48 4 "general_reg_operand"))
+   (parallel [(set (reg:CCC FLAGS_REG)
+  (compare:CCC
+(plus:SWI48 (match_dup 2) (match_dup 0))
+(match_dup 2)))
+ (set (match_dup 2)
+  (plus:SWI48 (match_dup 2) (match_dup 0)))])]
+  "REGNO (operands[0]) != REGNO (operands[1])
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && REGNO (operands[0]) != REGNO (operands[3])
+   && REGNO (operands[0]) != REGNO (operands[4])
+   && REGNO (operands[1]) != REGNO (operands[2])
+   && REGNO (operands[1]) != REGNO (operands[3])
+   && REGNO (operands[1]) != REGNO (operands[4])
+   && REGNO (operands[2]) != REGNO (operands[3])
+   && REGNO (operands[2]) != REGNO (operands[4])
+   && REGNO (operands[3]) != REGNO (operands[4])
+   && peep2_reg_dead_p (4, operands[0])"
+  [(set (match_dup 2) (match_dup 1))
+   (set (match_dup 1) (match_dup 4))
+   (parallel [(set (reg:CCC FLAGS_REG)
+   (compare:CCC
+ (plus:SWI48 (match_dup 2) (match_dup 3))
+ (match_dup 2)))
+  (set (match_dup 2)
+   (plus:SWI48 (match_dup 2) (match_dup 3)))])])
+
 (define_insn "*add_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r,r,r")
(plus:SWI48
diff --git a/gcc/testsuite/gcc.target/i386/pr43644-2.c 
b/gcc/testsuite/gcc.target/i386/pr43644-2.c
index d470b0a..3316ac6 100644
--- a/gcc/testsuite/gcc.target/i386/pr43644-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr43644-2.c
@@ -6,4 +6,4 @@ unsigned __int128 foo(unsigned __int128 x, unsigned long long y)
   return x+y;
 }
 
-/* { dg-final { scan-assembler-times "movq" 1 } } */
+/* { dg-final { scan-assembler-times "movq" 2 } } */


Re: [PATCH] symtab-thunks: Use aggregate_value_p even on is_gimple_reg_type returns [PR112941]

2023-12-22 Thread Richard Biener



> Am 22.12.2023 um 09:26 schrieb Jakub Jelinek :
> 
> Hi!
> 
> Large/huge _BitInt types are returned in memory and the bitint lowering
> pass right now relies on that.
> The gimplification etc. use aggregate_value_p to see if it should be
> returned in memory or not and use
>   <retval> = _123;
>  return <retval>;
> rather than
>  return _123;
> But expand_thunk used e.g. by IPA-ICF was performing an optimization,
> assuming is_gimple_reg_type is always passed in registers and not calling
> aggregate_value_p in that case.  The following patch changes it to match
> what the gimplification etc. are doing.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2023-12-22  Jakub Jelinek  
> 
>PR tree-optimization/112941
>* symtab-thunks.cc (expand_thunk): Check aggregate_value_p regardless
>of whether is_gimple_reg_type (restype) or not.
> 
>* gcc.dg/bitint-60.c: New test.
> 
> --- gcc/symtab-thunks.cc.jj2023-08-24 15:37:28.698418172 +0200
> +++ gcc/symtab-thunks.cc2023-12-21 16:42:41.406127267 +0100
> @@ -479,21 +479,15 @@ expand_thunk (cgraph_node *node, bool ou
> resdecl,
> build_int_cst (TREE_TYPE (resdecl), 0));
>}
> -  else if (!is_gimple_reg_type (restype))
> +  else if (aggregate_value_p (resdecl, TREE_TYPE (thunk_fndecl)))
>{
> -  if (aggregate_value_p (resdecl, TREE_TYPE (thunk_fndecl)))
> -{
> -  restmp = resdecl;
> +  restmp = resdecl;
> 
> -  if (VAR_P (restmp))
> -{
> -  add_local_decl (cfun, restmp);
> -  BLOCK_VARS (DECL_INITIAL (current_function_decl))
> -= restmp;
> -}
> +  if (VAR_P (restmp))
> +{
> +  add_local_decl (cfun, restmp);
> +  BLOCK_VARS (DECL_INITIAL (current_function_decl)) = restmp;
>}
> -  else
> -restmp = create_tmp_var (restype, "retval");
>}
>  else
>restmp = create_tmp_reg (restype, "retval");
> --- gcc/testsuite/gcc.dg/bitint-60.c.jj2023-12-21 16:49:41.289298560 +0100
> +++ gcc/testsuite/gcc.dg/bitint-60.c2023-12-21 16:49:09.061746003 +0100
> @@ -0,0 +1,20 @@
> +/* PR tree-optimization/112941 */
> +/* { dg-do compile { target bitint575 } } */
> +/* { dg-options "-O2 -std=c23" } */
> +
> +unsigned _BitInt(495) f1 (signed _BitInt(381) x) { unsigned _BitInt(539) y = 
> x; return y; }
> +unsigned _BitInt(495) f2 (unsigned _BitInt(381) x) { unsigned _BitInt(539) y 
> = x; return y; }
> +unsigned _BitInt(495) f3 (signed _BitInt(381) x) { _BitInt(539) y = x; 
> return y; }
> +unsigned _BitInt(495) f4 (unsigned _BitInt(381) x) { _BitInt(539) y = x; 
> return y; }
> +_BitInt(495) f5 (signed _BitInt(381) x) { unsigned _BitInt(539) y = x; 
> return y; }
> +_BitInt(495) f6 (unsigned _BitInt(381) x) { unsigned _BitInt(539) y = x; 
> return y; }
> +_BitInt(495) f7 (signed _BitInt(381) x) { _BitInt(539) y = x; return y; }
> +_BitInt(495) f8 (unsigned _BitInt(381) x) { _BitInt(539) y = x; return y; }
> +unsigned _BitInt(495) f9 (signed _BitInt(381) x) { return (unsigned 
> _BitInt(539)) x; }
> +unsigned _BitInt(495) f10 (unsigned _BitInt(381) x) { return (unsigned 
> _BitInt(539)) x; }
> +unsigned _BitInt(495) f11 (signed _BitInt(381) x) { return (_BitInt(539)) x; 
> }
> +unsigned _BitInt(495) f12 (unsigned _BitInt(381) x) { return (_BitInt(539)) 
> x; }
> +_BitInt(495) f13 (signed _BitInt(381) x) { return (unsigned _BitInt(539)) x; 
> }
> +_BitInt(495) f14 (unsigned _BitInt(381) x) { return (unsigned _BitInt(539)) 
> x; }
> +_BitInt(495) f15 (signed _BitInt(381) x) { return (_BitInt(539)) x; }
> +_BitInt(495) f16 (unsigned _BitInt(381) x) { return (_BitInt(539)) x; }
> 
>Jakub
> 


Re: [PATCH] lower-bitint: Handle unreleased SSA_NAMEs from earlier passes gracefully [PR113102]

2023-12-22 Thread Richard Biener



> Am 22.12.2023 um 09:17 schrieb Jakub Jelinek :
> 
> Hi!
> 
> On the following testcase earlier passes leave around an unreleased
> SSA_NAME - non-GIMPLE_NOP SSA_NAME_DEF_STMT which isn't in any bb.
> The following patch makes bitint lowering resistant against those,
> the first hunk is where we'd for certain kinds of stmts try to amend
> them and the latter is where we'd otherwise try to remove them,
> neither of which works.  The other loops over all SSA_NAMEs either
> already also check gimple_bb (SSA_NAME_DEF_STMT (s)) or it doesn't
> matter that much if we process it or not (worst case it means e.g.
> the pass wouldn't return early even when it otherwise could).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

> 2023-12-22  Jakub Jelinek  
> 
>PR tree-optimization/113102
>* gimple-lower-bitint.cc (gimple_lower_bitint): Handle unreleased
>large/huge _BitInt SSA_NAMEs.
> 
>* gcc.dg/bitint-59.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj2023-12-21 13:28:56.953120687 +0100
> +++ gcc/gimple-lower-bitint.cc2023-12-21 14:08:00.199704511 +0100
> @@ -5827,7 +5827,7 @@ gimple_lower_bitint (void)
>  tree_code rhs_code;
>  /* Unoptimize certain constructs to simpler alternatives to
> avoid having to lower all of them.  */
> -  if (is_gimple_assign (stmt))
> +  if (is_gimple_assign (stmt) && gimple_bb (stmt))
>switch (rhs_code = gimple_assign_rhs_code (stmt))
>  {
>  default:
> @@ -6690,6 +6690,11 @@ gimple_lower_bitint (void)
>  release_ssa_name (s);
>  continue;
>}
> +  if (gimple_bb (g) == NULL)
> +{
> +  release_ssa_name (s);
> +  continue;
> +}
>  if (gimple_code (g) != GIMPLE_ASM)
>{
>  gimple_stmt_iterator gsi = gsi_for_stmt (g);
> --- gcc/testsuite/gcc.dg/bitint-59.c.jj2023-12-21 14:12:01.860350727 +0100
> +++ gcc/testsuite/gcc.dg/bitint-59.c2023-12-21 14:11:54.766449179 +0100
> @@ -0,0 +1,14 @@
> +/* PR tree-optimization/113102 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23 -O2" } */
> +
> +unsigned x;
> +
> +#if __BITINT_MAXWIDTH__ >= 191
> +void
> +foo (void)
> +{
> +  unsigned _BitInt(191) b = x;
> +  ~(b >> x) % 3;
> +}
> +#endif
> 
>Jakub
> 


Re: [PATCH] lower-bitint: Fix handle_cast ICE [PR113102]

2023-12-22 Thread Richard Biener



> Am 22.12.2023 um 09:12 schrieb Jakub Jelinek :
> 
> Hi!
> 
> My recent change to use m_data[save_data_cnt] instead of
> m_data[save_data_cnt + 1] when inside of a loop (m_bb is non-NULL)
> broke the following testcase.  When we create a PHI node on the loop
> using prepare_data_in_out, both m_data[save_data_cnt{, + 1}] are
> computed and the fix was right, but there are also cases when we in
> a loop (m_bb non-NULL) emit a nested cast with too few limbs and
> then just use constant indexes for all accesses - in that case
> only m_data[save_data_cnt + 1] is initialized and m_data[save_data_cnt]
> is NULL.  In those cases, we want to use the former.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

> 2023-12-22  Jakub Jelinek  
> 
>PR tree-optimization/113102
>* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): Only
>use m_data[save_data_cnt] if it is non-NULL.
> 
>* gcc.dg/bitint-58.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj2023-12-21 11:13:32.0 +0100
> +++ gcc/gimple-lower-bitint.cc2023-12-21 13:28:56.953120687 +0100
> @@ -1491,7 +1491,7 @@ bitint_large_huge::handle_cast (tree lhs
>m_data_cnt = tree_to_uhwi (m_data[save_data_cnt + 2]);
>  if (TYPE_UNSIGNED (rhs_type))
>t = build_zero_cst (m_limb_type);
> -  else if (m_bb)
> +  else if (m_bb && m_data[save_data_cnt])
>t = m_data[save_data_cnt];
>  else
>t = m_data[save_data_cnt + 1];
> --- gcc/testsuite/gcc.dg/bitint-58.c.jj2023-12-21 13:33:25.882383838 +0100
> +++ gcc/testsuite/gcc.dg/bitint-58.c2023-12-21 13:32:54.408821172 +0100
> @@ -0,0 +1,31 @@
> +/* PR tree-optimization/113102 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23 -O2" } */
> +
> +_BitInt(3) a;
> +#if __BITINT_MAXWIDTH__ >= 4097
> +_BitInt(8) b;
> +_BitInt(495) c;
> +_BitInt(513) d;
> +_BitInt(1085) e;
> +_BitInt(4096) f;
> +
> +void
> +foo (void)
> +{
> +  a -= (_BitInt(4097)) d >> b;
> +}
> +
> +void
> +bar (void)
> +{
> +  __builtin_sub_overflow ((_BitInt(767)) c >> e, 0, &a);
> +}
> +
> +void
> +baz (void)
> +{
> +  _BitInt(768) x = (_BitInt(257))f;
> +  b /= x >> 0 / 0;/* { dg-warning "division by zero" } */
> +}
> +#endif
> 
>Jakub
> 


[committed] c++: testsuite: Remove testsuite_tr1.h includes

2023-12-22 Thread Ken Matsui
This patch removes the testsuite_tr1.h dependency from g++.dg/ext/is_*.C
tests since the header is supposed to be used only by libstdc++, not the
front end.  This also includes test code consistency fixes.

For the record this fixes the test failures reported at
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641058.html

gcc/testsuite/ChangeLog:

* g++.dg/ext/is_array.C: Remove testsuite_tr1.h.  Add necessary
definitions accordingly.  Tweak macros for consistency across
test codes.
* g++.dg/ext/is_bounded_array.C: Likewise.
* g++.dg/ext/is_function.C: Likewise.
* g++.dg/ext/is_member_function_pointer.C: Likewise.
* g++.dg/ext/is_member_object_pointer.C: Likewise.
* g++.dg/ext/is_member_pointer.C: Likewise.
* g++.dg/ext/is_object.C: Likewise.
* g++.dg/ext/is_reference.C: Likewise.
* g++.dg/ext/is_scoped_enum.C: Likewise.

Signed-off-by: Ken Matsui 
Reviewed-by: Patrick Palka 
Reviewed-by: Jason Merrill 
---
 gcc/testsuite/g++.dg/ext/is_array.C   | 15 ---
 gcc/testsuite/g++.dg/ext/is_bounded_array.C   | 20 -
 gcc/testsuite/g++.dg/ext/is_function.C| 41 +++
 .../g++.dg/ext/is_member_function_pointer.C   | 14 +++
 .../g++.dg/ext/is_member_object_pointer.C | 26 ++--
 gcc/testsuite/g++.dg/ext/is_member_pointer.C  | 29 ++---
 gcc/testsuite/g++.dg/ext/is_object.C  | 21 --
 gcc/testsuite/g++.dg/ext/is_reference.C   | 28 +++--
 gcc/testsuite/g++.dg/ext/is_scoped_enum.C | 12 ++
 9 files changed, 101 insertions(+), 105 deletions(-)

diff --git a/gcc/testsuite/g++.dg/ext/is_array.C 
b/gcc/testsuite/g++.dg/ext/is_array.C
index facfed5c7cb..f1a6e08b87a 100644
--- a/gcc/testsuite/g++.dg/ext/is_array.C
+++ b/gcc/testsuite/g++.dg/ext/is_array.C
@@ -1,15 +1,14 @@
 // { dg-do compile { target c++11 } }
 
-#include 
+#define SA(X) static_assert((X),#X)
 
-using namespace __gnu_test;
+#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(const volatile TYPE) == EXPECT)
 
-#define SA(X) static_assert((X),#X)
-#define SA_TEST_CATEGORY(TRAIT, X, expect) \
-  SA(TRAIT(X) == expect);  \
-  SA(TRAIT(const X) == expect);\
-  SA(TRAIT(volatile X) == expect); \
-  SA(TRAIT(const volatile X) == expect)
+class ClassType { };
 
 SA_TEST_CATEGORY(__is_array, int[2], true);
 SA_TEST_CATEGORY(__is_array, int[], true);
diff --git a/gcc/testsuite/g++.dg/ext/is_bounded_array.C 
b/gcc/testsuite/g++.dg/ext/is_bounded_array.C
index 346790eba12..b5fe435de95 100644
--- a/gcc/testsuite/g++.dg/ext/is_bounded_array.C
+++ b/gcc/testsuite/g++.dg/ext/is_bounded_array.C
@@ -1,21 +1,19 @@
 // { dg-do compile { target c++11 } }
 
-#include 
-
-using namespace __gnu_test;
-
 #define SA(X) static_assert((X),#X)
 
-#define SA_TEST_CONST(TRAIT, TYPE, EXPECT) \
+#define SA_TEST_FN(TRAIT, TYPE, EXPECT)\
   SA(TRAIT(TYPE) == EXPECT);   \
-  SA(TRAIT(const TYPE) == EXPECT)
+  SA(TRAIT(const TYPE) == EXPECT);
 
 #define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
-  SA(TRAIT(TYPE) == EXPECT);   \
-  SA(TRAIT(const TYPE) == EXPECT); \
-  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
   SA(TRAIT(const volatile TYPE) == EXPECT)
 
+class ClassType { };
+
 SA_TEST_CATEGORY(__is_bounded_array, int[2], true);
 SA_TEST_CATEGORY(__is_bounded_array, int[], false);
 SA_TEST_CATEGORY(__is_bounded_array, int[2][3], true);
@@ -31,8 +29,8 @@ SA_TEST_CATEGORY(__is_bounded_array, ClassType[][3], false);
 SA_TEST_CATEGORY(__is_bounded_array, int(*)[2], false);
 SA_TEST_CATEGORY(__is_bounded_array, int(*)[], false);
 SA_TEST_CATEGORY(__is_bounded_array, int(&)[2], false);
-SA_TEST_CONST(__is_bounded_array, int(&)[], false);
+SA_TEST_FN(__is_bounded_array, int(&)[], false);
 
 // Sanity check.
 SA_TEST_CATEGORY(__is_bounded_array, ClassType, false);
-SA_TEST_CONST(__is_bounded_array, void(), false);
+SA_TEST_FN(__is_bounded_array, void(), false);
diff --git a/gcc/testsuite/g++.dg/ext/is_function.C 
b/gcc/testsuite/g++.dg/ext/is_function.C
index 2e1594b12ad..1fc3c96df1f 100644
--- a/gcc/testsuite/g++.dg/ext/is_function.C
+++ b/gcc/testsuite/g++.dg/ext/is_function.C
@@ -1,16 +1,19 @@
 // { dg-do compile { target c++11 } }
 
-#include 
+#define SA(X) static_assert((X),#X)
 
-using namespace __gnu_test;
+#define SA_TEST_FN(TRAIT, TYPE, EXPECT)\
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT);
 
-#define SA(X) static_assert((X),#X)
 #define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
-  SA(TRAIT(TYPE) == EXPECT);  

Re:[pushed] [PATCH v2] LoongArch: Add asm modifiers to the LSX and LASX directives in the doc.

2023-12-22 Thread chenglulu

Pushed to r14-6800.

On 2023-12-05 at 2:44 PM, chenxiaolong wrote:

gcc/ChangeLog:

* doc/extend.texi: Add the LSX and LASX asm operand modifiers to the doc.
* doc/md.texi: Refine the description of the modifier 'f' in the doc.
---
  gcc/doc/extend.texi | 47 +
  gcc/doc/md.texi |  2 +-
  2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 32ae15e1d5b..d87a079704c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -11820,10 +11820,57 @@ The list below describes the supported modifiers and 
their effects for LoongArch
  @item @code{d} @tab Same as @code{c}.
  @item @code{i} @tab Print the character ''@code{i}'' if the operand is not a 
register.
  @item @code{m} @tab Same as @code{c}, but the printed value is @code{operand 
- 1}.
+@item @code{u} @tab Print a LASX register.
+@item @code{w} @tab Print a LSX register.
  @item @code{X} @tab Print a constant integer operand in hexadecimal.
  @item @code{z} @tab Print the operand in its unmodified form, followed by a 
comma.
  @end multitable
  
+References to input and output operands in the assembler template of extended

+asm statements can use modifiers to affect the way the operands are formatted
+in the code output to the assembler.  For example, the following code uses the
+'w' modifier for LoongArch:
+
+@example
+test-asm.c:
+
+#include 
+
+__m128i foo (void)
+@{
+__m128i  a,b,c;
+__asm__ ("vadd.d %w0,%w1,%w2\n\t"
+   :"=f" (c)
+   :"f" (a),"f" (b));
+
+return c;
+@}
+
+@end example
+
+@noindent
+The compile command for the test case is as follows:
+
+@example
+gcc test-asm.c -mlsx -S -o test-asm.s
+@end example
+
+@noindent
+The assembly statement produces the following assembly code:
+
+@example
+vadd.d $vr0,$vr0,$vr1
+@end example
+
+This is a 128-bit vector addition instruction, @code{c} (referred to in the
+template string as %0) is the output, and @code{a} (%1) and @code{b} (%2) are
+the inputs.  @code{__m128i} is a vector data type defined in the  file
+@code{lsxintrin.h} (@xref{LoongArch SX Vector Intrinsics}).  The symbol '=f'
+represents a constraint using a floating-point register as an output type, and
+the 'f' in the input operand represents a constraint using a floating-point
+register operand, which can refer to the definition of a constraint
+(@xref{Constraints}) in gcc.
+
  @anchor{riscvOperandmodifiers}
  @subsubsection RISC-V Operand Modifiers
  
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 536ce997f01..2274da5ff69 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -2881,7 +2881,7 @@ $r1h
  @item LoongArch---@file{config/loongarch/constraints.md}
  @table @code
  @item f
-A floating-point register (if available).
+A floating-point or vector register (if available).
  @item k
  A memory operand whose address is formed by a base register and
  (optionally scaled) index register.




Re: [PATCH] testsuite: Remove testsuite_tr1.h

2023-12-22 Thread Ken Matsui
On Thu, Dec 21, 2023 at 11:38 AM Jason Merrill  wrote:
>
> On 12/21/23 10:52, Patrick Palka wrote:
> > On Thu, Dec 21, 2023 at 8:29 AM Patrick Palka  wrote:
> >>
> >> On Wed, 20 Dec 2023, Ken Matsui wrote:
> >>
> >>> This patch removes the testsuite_tr1.h dependency from g++.dg/ext/is_*.C
> >>> tests since the header is supposed to be used only by libstdc++, not
> >>> front-end.  This also includes test code consistency fixes.
> >
> > For the record this fixes the test failures reported at
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641058.html
> >
> >>
> >> LGTM
> >
> > Very minor but let's use the commit title
> >
> >c++: testsuite: Remove testsuite_tr1.h includes
> >
> > to convey that the commit only touches C++ tests, and isn't removing
> > the file testsuite_tr1.h but rather #includes of it :)
>
> OK with that change.
>

Thank you all for your reviews!  I will push it soon.

> Jason
>


[PATCH] RISC-V: Make PHI initial value occupy live V_REG in dynamic LMUL cost model analysis

2023-12-22 Thread Juzhe-Zhong
Consider this following case:

foo:
ble a0,zero,.L11
lui a2,%hi(.LANCHOR0)
addi sp,sp,-128
addi a2,a2,%lo(.LANCHOR0)
mv  a1,a0
vsetvli a6,zero,e32,m8,ta,ma
vid.v   v8
vs8r.v  v8,0(sp) ---> spill
.L3:
vl8re32.v   v16,0(sp)---> reload
vsetvli a4,a1,e8,m2,ta,ma
li  a3,0
vsetvli a5,zero,e32,m8,ta,ma
vmv8r.v v0,v16
vmv.v.x v8,a4
vmv.v.i v24,0
vadd.vv v8,v16,v8
vmv8r.v v16,v24
vs8r.v  v8,0(sp)---> spill
.L4:
addiw   a3,a3,1
vadd.vv v8,v0,v16
vadd.vi v16,v16,1
vadd.vv v24,v24,v8
bne a0,a3,.L4
vsetvli zero,a4,e32,m8,ta,ma
sub a1,a1,a4
vse32.v v24,0(a2)
slli a4,a4,2
add a2,a2,a4
bne a1,zero,.L3
li  a0,0
addi sp,sp,128
jr  ra
.L11:
li  a0,0
ret

Pick unexpected LMUL = 8.

The root cause is that we didn't include the PHI initial value in the dynamic 
LMUL calculation:

  # j_17 = PHI ---> # vect_vec_iv_.8_24 = 
PHI <_25(9), { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }(5)>

We didn't count { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 } as consuming a vector register, but a 
vector register group is still allocated for it.

This patch fixes this missing count.  After this patch we pick the expected 
LMUL (LMUL = M4):

foo:
ble a0,zero,.L9
lui a4,%hi(.LANCHOR0)
addi a4,a4,%lo(.LANCHOR0)
mv  a2,a0
vsetivli zero,16,e32,m4,ta,ma
vid.v   v20
.L3:
vsetvli a3,a2,e8,m1,ta,ma
li  a5,0
vsetivli zero,16,e32,m4,ta,ma
vmv4r.v v16,v20
vmv.v.i v12,0
vmv.v.x v4,a3
vmv4r.v v8,v12
vadd.vv v20,v20,v4
.L4:
addiw   a5,a5,1
vmv4r.v v4,v8
vadd.vi v8,v8,1
vadd.vv v4,v16,v4
vadd.vv v12,v12,v4
bne a0,a5,.L4
slli a5,a3,2
vsetvli zero,a3,e32,m4,ta,ma
sub a2,a2,a3
vse32.v v12,0(a4)
add a4,a4,a5
bne a2,zero,.L3
.L9:
li  a0,0
ret

Tested on --with-arch=gcv with no regressions.  OK for trunk?

PR target/113112

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (max_number_of_live_regs): Refine 
dump information.
(preferred_new_lmul_p): Add the PHI initial value into the live regs 
calculation.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc| 45 ---
 .../vect/costmodel/riscv/rvv/pr113112-1.c | 31 +
 2 files changed, 71 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index a316603e207..2d4b82a643a 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -355,10 +355,11 @@ max_number_of_live_regs (const basic_block bb,
 }
 
   if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
-"Maximum lmul = %d, %d number of live V_REG at program "
-"point %d for bb %d\n",
-lmul, max_nregs, live_point, bb->index);
+dump_printf_loc (
+  MSG_NOTE, vect_location,
+  "Maximum lmul = %d, At most %d number of live V_REG at program "
+  "point %d for bb %d\n",
+  lmul, max_nregs, live_point, bb->index);
   return max_nregs;
 }
 
@@ -472,6 +473,41 @@ update_local_live_ranges (
  tree def = gimple_phi_arg_def (phi, j);
  auto *live_ranges = live_ranges_per_bb.get (bb);
  auto *live_range = live_ranges->get (def);
+ if (poly_int_tree_p (def))
+   {
+ /* Insert live range of INTEGER_CST since we will need to
+allocate a vector register for it.
+
+E.g. # j_17 = PHI  will be transformed
+into # vect_vec_iv_.8_24 = PHI <_25(9), { 0, ... }(5)>
+
+The live range for such value is short which only lives
+at program point 0.  */
+ if (live_range)
+   {
+ unsigned int start = (*live_range).first;
+ (*live_range).first = 0;
+ if (dump_enabled_p ())
+   dump_printf_loc (
+ MSG_NOTE, vect_location,
+ "Update %T start point from %d to 0:\n", def, start);
+   }
+ else
+   {
+ live_ranges->p

Re: [r14-6770 Regression] FAIL: gcc.dg/gnu23-tag-4.c (test for excess errors) on Linux/x86_64

2023-12-22 Thread Martin Uecker


Hm, this is weird, as it really seems to depend on the
-march= setting.  So if there is really a difference
between those structs which makes them incompatible on
some archs, we should not consider them to be
compatible in general.

struct g { int a[n]; int b; } *y;
{ struct g { int a[4]; int b; } *y2 = y; }

But I do not see what could go wrong here, as
sizeof / alignment is the same for n = 4.  So there
is something else I missed.



Am Freitag, dem 22.12.2023 um 05:07 +0800 schrieb haochen.jiang:
> On Linux/x86_64,
> 
> 23fee88f84873b0b8b41c8e5a9b229d533fb4022 is the first bad commit
> commit 23fee88f84873b0b8b41c8e5a9b229d533fb4022
> Author: Martin Uecker 
> Date:   Tue Aug 15 14:58:32 2023 +0200
> 
> c23: tag compatibility rules for struct and unions
> 
> caused
> 
> FAIL: gcc.dg/gnu23-tag-4.c (test for excess errors)
> 
> with GCC configured with
> 
> ../../gcc/configure 
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-6770/usr 
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
> --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/gnu23-tag-4.c 
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/gnu23-tag-4.c 
> --target_board='unix{-m64\ -march=cascadelake}'"
> 
> (Please do not reply to this email, for question about this report, contact 
> me at haochen dot jiang at intel.com.)
> (If you met problems with cascadelake related, disabling AVX512F in command 
> line might save that.)
> (However, please make sure that there is no potential problems with AVX512.)



[PATCH] RISC-V: Support -m[no-]unaligned-access

2023-12-22 Thread Wang Pengcheng
These two options are negative aliases of -m[no-]strict-align.

This matches the LLVM implementation.

gcc/ChangeLog:

* config/riscv/riscv.opt: Add option alias.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-align-10.c: New test.
* gcc.target/riscv/predef-align-7.c: New test.
* gcc.target/riscv/predef-align-8.c: New test.
* gcc.target/riscv/predef-align-9.c: New test.

Signed-off-by: Wang Pengcheng
---
gcc/config/riscv/riscv.opt | 4 
gcc/testsuite/gcc.target/riscv/predef-align-10.c | 16 
gcc/testsuite/gcc.target/riscv/predef-align-7.c | 15 +++
gcc/testsuite/gcc.target/riscv/predef-align-8.c | 16 
gcc/testsuite/gcc.target/riscv/predef-align-9.c | 15 +++
5 files changed, 66 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-9.c

diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index cf207d4dcdf..1e22998ce6e 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -116,6 +116,10 @@ mstrict-align
Target Mask(STRICT_ALIGN) Save
Do not generate unaligned memory accesses.

+munaligned-access
+Target Alias(mstrict-align) NegativeAlias
+Enable unaligned memory accesses.
+
Enum
Name(code_model) Type(enum riscv_code_model)
Known code models (for use with the -mcmodel= option):
diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-10.c
b/gcc/testsuite/gcc.target/riscv/predef-align-10.c
new file mode 100644
index 000..c86b2c7a5ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-align-10.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mtune=rocket -munaligned-access" } */
+
+int main() {
+
+/* rocket default is cpu tune param misaligned access slow */
+#if !defined(__riscv_misaligned_slow)
+#error "__riscv_misaligned_slow is not set"
+#endif
+
+#if defined(__riscv_misaligned_avoid) || defined(__riscv_misaligned_fast)
+#error "__riscv_misaligned_avoid or __riscv_misaligned_fast is unexpectedly set"
+#endif
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-7.c
b/gcc/testsuite/gcc.target/riscv/predef-align-7.c
new file mode 100644
index 000..405f3686c2e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-align-7.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mtune=thead-c906 -mno-unaligned-access" } */
+
+int main() {
+
+#if !defined(__riscv_misaligned_avoid)
+#error "__riscv_misaligned_avoid is not set"
+#endif
+
+#if defined(__riscv_misaligned_fast) || defined(__riscv_misaligned_slow)
+#error "__riscv_misaligned_fast or __riscv_misaligned_slow is unexpectedly set"
+#endif
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-8.c
b/gcc/testsuite/gcc.target/riscv/predef-align-8.c
new file mode 100644
index 000..64072c04a47
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-align-8.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mtune=thead-c906 -munaligned-access" } */
+
+int main() {
+
+/* thead-c906 default is cpu tune param misaligned access fast */
+#if !defined(__riscv_misaligned_fast)
+#error "__riscv_misaligned_fast is not set"
+#endif
+
+#if defined(__riscv_misaligned_avoid) || defined(__riscv_misaligned_slow)
+#error "__riscv_misaligned_avoid or __riscv_misaligned_slow is unexpectedly set"
+#endif
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-9.c
b/gcc/testsuite/gcc.target/riscv/predef-align-9.c
new file mode 100644
index 000..f5418de87cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-align-9.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mtune=rocket -mno-unaligned-access" } */
+
+int main() {
+
+#if !defined(__riscv_misaligned_avoid)
+#error "__riscv_misaligned_avoid is not set"
+#endif
+
+#if defined(__riscv_misaligned_fast) || defined(__riscv_misaligned_slow)
+#error "__riscv_misaligned_fast or __riscv_misaligned_slow is unexpectedly set"
+#endif
+
+ return 0;
+}
-- 
2.20.1


Re: [PATCH v7 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-12-22 Thread waffl3x


  int n = 0;
  auto f = [](this Self){
static_assert(__is_same (decltype(n), int));
decltype((n)) a; // { dg-error {is not captured} }
  };
  f();

Could you clarify whether this error being removed was intentional?  I do
recall that Patrick Palka wanted to remove this error in his patch, but
it seemed to me like you stated it would be incorrect to allow it.
Since the error is no longer present I assume I am misunderstanding the
exchange.

In any case, let me know if I need to modify my test case or if this
error needs to be added back in.

Alex


[PATCH] combine: Don't optimize paradoxical SUBREG AND CONST_INT on WORD_REGISTER_OPERATIONS targets [PR112758]

2023-12-22 Thread Jakub Jelinek
Hi!

As discussed in the PR, the following testcase is miscompiled on RISC-V
64-bit, because num_sign_bit_copies in one spot pretends the bits in
a paradoxical SUBREG beyond SUBREG_REG SImode are all sign bit copies:
5444  /* For paradoxical SUBREGs on machines where all register 
operations
5445 affect the entire register, just look inside.  Note that 
we are
5446 passing MODE to the recursive call, so the number of sign 
bit
5447 copies will remain relative to that mode, not the inner 
mode.
5448
5449 This works only if loads sign extend.  Otherwise, if we 
get a
5450 reload for the inner part, it may be loaded from the 
stack, and
5451 then we lose all sign bit copies that existed before the 
store
5452 to the stack.  */
5453  if (WORD_REGISTER_OPERATIONS
5454  && load_extend_op (inner_mode) == SIGN_EXTEND
5455  && paradoxical_subreg_p (x)
5456  && MEM_P (SUBREG_REG (x)))
and then optimizes based on that in one place, but then the
r7-1077 optimization kicks in and treats all the upper bits in the
paradoxical SUBREG as undefined and performs based on that another
optimization.  The r7-1077 optimization is done only if SUBREG_REG
is either a REG or MEM, from the discussions in the PR seems that if
it is a REG, the upper bits in paradoxical SUBREG on
WORD_REGISTER_OPERATIONS targets aren't really undefined, but we can't
tell what values they have because we don't see the operation which
computed that REG, and for MEM it depends on load_extend_op - if
it is SIGN_EXTEND, the upper bits are sign bit copies and so something
not really usable for the optimization, if ZERO_EXTEND, they are zeros
and it is usable for the optimization, for UNKNOWN I think it is better
to punt as well.

So, the following patch basically disables the r7-1077 optimization
on WORD_REGISTER_OPERATIONS unless we know it is still ok for sure,
which is either if sub_width is >= BITS_PER_WORD because then the
WORD_REGISTER_OPERATIONS rules don't apply, or load_extend_op on a MEM
is ZERO_EXTEND.

Bootstrapped/regtested on x86_64-linux and i686-linux (neither of which
is WORD_REGISTER_OPERATIONS target), tested on the testcase using
cross to riscv64-linux but don't have an easy access to a
WORD_REGISTER_OPERATIONS target to bootstrap/regtest it there.

Ok for trunk?

2023-12-22  Jakub Jelinek  

PR rtl-optimization/112758
* combine.cc (make_compound_operation_int): Optimize AND of a SUBREG
based on nonzero_bits of SUBREG_REG and constant mask on
WORD_REGISTER_OPERATIONS targets only if it is a zero extending
MEM load.

* gcc.c-torture/execute/pr112758.c: New test.

--- gcc/combine.cc.jj   2023-12-11 23:52:03.528513943 +0100
+++ gcc/combine.cc  2023-12-21 20:25:45.461737423 +0100
@@ -8227,12 +8227,20 @@ make_compound_operation_int (scalar_int_
  int sub_width;
  if ((REG_P (sub) || MEM_P (sub))
  && GET_MODE_PRECISION (sub_mode).is_constant (&sub_width)
- && sub_width < mode_width)
+ && sub_width < mode_width
+ && (!WORD_REGISTER_OPERATIONS
+ || sub_width >= BITS_PER_WORD
+ /* On WORD_REGISTER_OPERATIONS targets the bits
+beyond sub_mode aren't considered undefined,
+so optimize only if it is a MEM load when MEM loads
+zero extend, because then the upper bits are all zero.  */
+ || (MEM_P (sub)
+ && load_extend_op (sub_mode) == ZERO_EXTEND)))
{
  unsigned HOST_WIDE_INT mode_mask = GET_MODE_MASK (sub_mode);
  unsigned HOST_WIDE_INT mask;
 
- /* original AND constant with all the known zero bits set */
+ /* Original AND constant with all the known zero bits set.  */
  mask = UINTVAL (XEXP (x, 1)) | (~nonzero_bits (sub, sub_mode));
  if ((mask & mode_mask) == mode_mask)
{
--- gcc/testsuite/gcc.c-torture/execute/pr112758.c.jj   2023-12-21 
21:01:43.780755959 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr112758.c  2023-12-21 
21:01:30.521940358 +0100
@@ -0,0 +1,15 @@
+/* PR rtl-optimization/112758 */
+
+int a = -__INT_MAX__ - 1;
+
+int
+main ()
+{
+  if (-__INT_MAX__ - 1U == 0x8000ULL)
+{
+  unsigned long long b = 0x00ffULL;
+  if ((b & a) != 0x00ff8000ULL)
+   __builtin_abort ();
+}
+  return 0;
+}

Jakub



[PATCH v1] LoongArch: Fix insn output of vec_concat templates for LASX.

2023-12-22 Thread Chenghui Pan
When investigating the failure of gcc.dg/vect/slp-reduc-sad.c, the following
instruction block is generated by vec_concatv32qi (which is
generated by vec_initv32qiv16qi) at the entry of the foo() function:

  vldx        $vr3,$r5,$r6
  vld         $vr2,$r5,0
  xvpermi.q   $xr2,$xr3,0x20

This reverses the high and low 128-bit parts of the
vec_initv32qiv16qi operation.

According to other targets' similar implementations and the LSX implementation
of the following RTL representation, the current definitions of "vec_concat"
in lasx.md are wrong:

  (set (op0) (vec_concat (op1) (op2)))

For correct behavior, the last argument of xvpermi.q should be 0x02
instead of 0x20.  This patch fixes the issue and cleans up the vec_concat
template implementation.

gcc/ChangeLog:

* config/loongarch/lasx.md (vec_concatv4di): Delete.
(vec_concatv8si): Delete.
(vec_concatv16hi): Delete.
(vec_concatv32qi): Delete.
(vec_concatv4df): Delete.
(vec_concatv8sf): Delete.
(vec_concat): New template with insn output fixed.
---
 gcc/config/loongarch/lasx.md | 74 
 1 file changed, 7 insertions(+), 67 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index eeac8cd984b..a9d948bb606 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -590,77 +590,17 @@ (define_insn "lasx_xvinsgr2vr_"
   [(set_attr "type" "simd_insert")
(set_attr "mode" "")])
 
-(define_insn "vec_concatv4di"
-  [(set (match_operand:V4DI 0 "register_operand" "=f")
-   (vec_concat:V4DI
- (match_operand:V2DI 1 "register_operand" "0")
- (match_operand:V2DI 2 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-{
-  return "xvpermi.q\t%u0,%u2,0x20";
-}
-  [(set_attr "type" "simd_splat")
-   (set_attr "mode" "V4DI")])
-
-(define_insn "vec_concatv8si"
-  [(set (match_operand:V8SI 0 "register_operand" "=f")
-   (vec_concat:V8SI
- (match_operand:V4SI 1 "register_operand" "0")
- (match_operand:V4SI 2 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-{
-  return "xvpermi.q\t%u0,%u2,0x20";
-}
-  [(set_attr "type" "simd_splat")
-   (set_attr "mode" "V4DI")])
-
-(define_insn "vec_concatv16hi"
-  [(set (match_operand:V16HI 0 "register_operand" "=f")
-   (vec_concat:V16HI
- (match_operand:V8HI 1 "register_operand" "0")
- (match_operand:V8HI 2 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-{
-  return "xvpermi.q\t%u0,%u2,0x20";
-}
-  [(set_attr "type" "simd_splat")
-   (set_attr "mode" "V4DI")])
-
-(define_insn "vec_concatv32qi"
-  [(set (match_operand:V32QI 0 "register_operand" "=f")
-   (vec_concat:V32QI
- (match_operand:V16QI 1 "register_operand" "0")
- (match_operand:V16QI 2 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-{
-  return "xvpermi.q\t%u0,%u2,0x20";
-}
-  [(set_attr "type" "simd_splat")
-   (set_attr "mode" "V4DI")])
-
-(define_insn "vec_concatv4df"
-  [(set (match_operand:V4DF 0 "register_operand" "=f")
-   (vec_concat:V4DF
- (match_operand:V2DF 1 "register_operand" "0")
- (match_operand:V2DF 2 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-{
-  return "xvpermi.q\t%u0,%u2,0x20";
-}
-  [(set_attr "type" "simd_splat")
-   (set_attr "mode" "V4DF")])
-
-(define_insn "vec_concatv8sf"
-  [(set (match_operand:V8SF 0 "register_operand" "=f")
-   (vec_concat:V8SF
- (match_operand:V4SF 1 "register_operand" "0")
- (match_operand:V4SF 2 "register_operand" "f")))]
+(define_insn "vec_concat"
+  [(set (match_operand:LASX 0 "register_operand" "=f")
+   (vec_concat:LASX
+ (match_operand: 1 "register_operand" "0")
+ (match_operand: 2 "register_operand" "f")))]
   "ISA_HAS_LASX"
 {
-  return "xvpermi.q\t%u0,%u2,0x20";
+  return "xvpermi.q\t%u0,%u2,0x02";
 }
   [(set_attr "type" "simd_splat")
-   (set_attr "mode" "V4DI")])
+   (set_attr "mode" "")])
 
 ;; xshuf.w
 (define_insn "lasx_xvperm_"
-- 
2.39.3



[PATCH] symtab-thunks: Use aggregate_value_p even on is_gimple_reg_type returns [PR112941]

2023-12-22 Thread Jakub Jelinek
Hi!

Large/huge _BitInt types are returned in memory and the bitint lowering
pass right now relies on that.
The gimplification etc. use aggregate_value_p to see if it should be
returned in memory or not and use
   = _123;
  return ;
rather than
  return _123;
But expand_thunk used e.g. by IPA-ICF was performing an optimization,
assuming is_gimple_reg_type is always passed in registers and not calling
aggregate_value_p in that case.  The following patch changes it to match
what the gimplification etc. are doing.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-22  Jakub Jelinek  

PR tree-optimization/112941
* symtab-thunks.cc (expand_thunk): Check aggregate_value_p regardless
of whether is_gimple_reg_type (restype) or not.

* gcc.dg/bitint-60.c: New test.

--- gcc/symtab-thunks.cc.jj 2023-08-24 15:37:28.698418172 +0200
+++ gcc/symtab-thunks.cc2023-12-21 16:42:41.406127267 +0100
@@ -479,21 +479,15 @@ expand_thunk (cgraph_node *node, bool ou
 resdecl,
 build_int_cst (TREE_TYPE (resdecl), 0));
}
- else if (!is_gimple_reg_type (restype))
+ else if (aggregate_value_p (resdecl, TREE_TYPE (thunk_fndecl)))
{
- if (aggregate_value_p (resdecl, TREE_TYPE (thunk_fndecl)))
-   {
- restmp = resdecl;
+ restmp = resdecl;
 
- if (VAR_P (restmp))
-   {
- add_local_decl (cfun, restmp);
- BLOCK_VARS (DECL_INITIAL (current_function_decl))
-   = restmp;
-   }
+ if (VAR_P (restmp))
+   {
+ add_local_decl (cfun, restmp);
+ BLOCK_VARS (DECL_INITIAL (current_function_decl)) = restmp;
}
- else
-   restmp = create_tmp_var (restype, "retval");
}
  else
restmp = create_tmp_reg (restype, "retval");
--- gcc/testsuite/gcc.dg/bitint-60.c.jj 2023-12-21 16:49:41.289298560 +0100
+++ gcc/testsuite/gcc.dg/bitint-60.c2023-12-21 16:49:09.061746003 +0100
@@ -0,0 +1,20 @@
+/* PR tree-optimization/112941 */
+/* { dg-do compile { target bitint575 } } */
+/* { dg-options "-O2 -std=c23" } */
+
+unsigned _BitInt(495) f1 (signed _BitInt(381) x) { unsigned _BitInt(539) y = 
x; return y; }
+unsigned _BitInt(495) f2 (unsigned _BitInt(381) x) { unsigned _BitInt(539) y = 
x; return y; }
+unsigned _BitInt(495) f3 (signed _BitInt(381) x) { _BitInt(539) y = x; return 
y; }
+unsigned _BitInt(495) f4 (unsigned _BitInt(381) x) { _BitInt(539) y = x; 
return y; }
+_BitInt(495) f5 (signed _BitInt(381) x) { unsigned _BitInt(539) y = x; return 
y; }
+_BitInt(495) f6 (unsigned _BitInt(381) x) { unsigned _BitInt(539) y = x; 
return y; }
+_BitInt(495) f7 (signed _BitInt(381) x) { _BitInt(539) y = x; return y; }
+_BitInt(495) f8 (unsigned _BitInt(381) x) { _BitInt(539) y = x; return y; }
+unsigned _BitInt(495) f9 (signed _BitInt(381) x) { return (unsigned 
_BitInt(539)) x; }
+unsigned _BitInt(495) f10 (unsigned _BitInt(381) x) { return (unsigned 
_BitInt(539)) x; }
+unsigned _BitInt(495) f11 (signed _BitInt(381) x) { return (_BitInt(539)) x; }
+unsigned _BitInt(495) f12 (unsigned _BitInt(381) x) { return (_BitInt(539)) x; 
}
+_BitInt(495) f13 (signed _BitInt(381) x) { return (unsigned _BitInt(539)) x; }
+_BitInt(495) f14 (unsigned _BitInt(381) x) { return (unsigned _BitInt(539)) x; 
}
+_BitInt(495) f15 (signed _BitInt(381) x) { return (_BitInt(539)) x; }
+_BitInt(495) f16 (unsigned _BitInt(381) x) { return (_BitInt(539)) x; }

Jakub



[PATCH v1] LoongArch: Fix ICE when passing two same vector argument consecutively

2023-12-22 Thread Chenghui Pan
The following code causes an ICE on the LoongArch target:

  #include <lsxintrin.h>

  extern void bar (__m128i, __m128i);

  __m128i a;

  void
  foo ()
  {
bar (a, a);
  }

It is caused by a missing constraint definition in mov_lsx.  This
patch fixes the template and removes the unnecessary processing from
the loongarch_split_move () function.

This patch also cleans up redundant definitions related to
loongarch_split_move () and loongarch_split_move_p ().

gcc/ChangeLog:

* config/loongarch/lasx.md: Use loongarch_split_move and
  loongarch_split_move_p directly.
* config/loongarch/loongarch-protos.h
(loongarch_split_move): Remove unnecessary argument.
(loongarch_split_move_insn_p): Delete.
(loongarch_split_move_insn): Delete.
* config/loongarch/loongarch.cc
(loongarch_split_move_insn_p): Delete.
(loongarch_load_store_insns): Use loongarch_split_move_p
  directly.
(loongarch_split_move): remove the unnecessary processing.
(loongarch_split_move_insn): Delete.
* config/loongarch/lsx.md: Use loongarch_split_move and
  loongarch_split_move_p directly.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lsx/lsx-mov-1.c: New test.
---
 gcc/config/loongarch/lasx.md  |  4 +-
 gcc/config/loongarch/loongarch-protos.h   |  4 +-
 gcc/config/loongarch/loongarch.cc | 49 +--
 gcc/config/loongarch/lsx.md   | 10 ++--
 .../loongarch/vector/lsx/lsx-mov-1.c  | 14 ++
 5 files changed, 24 insertions(+), 57 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-mov-1.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index eeac8cd984b..6418ff52fe5 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -912,10 +912,10 @@ (define_split
   [(set (match_operand:LASX 0 "nonimmediate_operand")
(match_operand:LASX 1 "move_operand"))]
   "reload_completed && ISA_HAS_LASX
-   && loongarch_split_move_insn_p (operands[0], operands[1])"
+   && loongarch_split_move_p (operands[0], operands[1])"
   [(const_int 0)]
 {
-  loongarch_split_move_insn (operands[0], operands[1], curr_insn);
+  loongarch_split_move (operands[0], operands[1]);
   DONE;
 })
 
diff --git a/gcc/config/loongarch/loongarch-protos.h 
b/gcc/config/loongarch/loongarch-protos.h
index c66ab932d67..7bf21a45c69 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -82,11 +82,9 @@ extern rtx loongarch_legitimize_call_address (rtx);
 
 extern rtx loongarch_subword (rtx, bool);
 extern bool loongarch_split_move_p (rtx, rtx);
-extern void loongarch_split_move (rtx, rtx, rtx);
+extern void loongarch_split_move (rtx, rtx);
 extern bool loongarch_addu16i_imm12_operand_p (HOST_WIDE_INT, machine_mode);
 extern void loongarch_split_plus_constant (rtx *, machine_mode);
-extern bool loongarch_split_move_insn_p (rtx, rtx);
-extern void loongarch_split_move_insn (rtx, rtx, rtx);
 extern void loongarch_split_128bit_move (rtx, rtx);
 extern bool loongarch_split_128bit_move_p (rtx, rtx);
 extern void loongarch_split_256bit_move (rtx, rtx);
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 390e3206a17..98709123770 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2562,7 +2562,6 @@ loongarch_split_const_insns (rtx x)
   return low + high;
 }
 
-bool loongarch_split_move_insn_p (rtx dest, rtx src);
 /* Return one word of 128-bit value OP, taking into account the fixed
endianness of certain registers.  BYTE selects from the byte address.  */
 
@@ -2602,7 +2601,7 @@ loongarch_load_store_insns (rtx mem, rtx_insn *insn)
 {
   set = single_set (insn);
   if (set
- && !loongarch_split_move_insn_p (SET_DEST (set), SET_SRC (set)))
+ && !loongarch_split_move_p (SET_DEST (set), SET_SRC (set)))
might_split_p = false;
 }
 
@@ -4220,7 +4219,7 @@ loongarch_split_move_p (rtx dest, rtx src)
SPLIT_TYPE describes the split condition.  */
 
 void
-loongarch_split_move (rtx dest, rtx src, rtx insn_)
+loongarch_split_move (rtx dest, rtx src)
 {
   rtx low_dest;
 
@@ -4258,33 +4257,6 @@ loongarch_split_move (rtx dest, rtx src, rtx insn_)
   loongarch_subword (src, true));
}
 }
-
-  /* This is a hack.  See if the next insn uses DEST and if so, see if we
- can forward SRC for DEST.  This is most useful if the next insn is a
- simple store.  */
-  rtx_insn *insn = (rtx_insn *) insn_;
-  struct loongarch_address_info addr = {};
-  if (insn)
-{
-  rtx_insn *next = next_nonnote_nondebug_insn_bb (insn);
-  if (next)
-   {
- rtx set = single_set (next);
- if (set && SET_SRC (set) == dest)
-   {
- if (MEM_P (src))
-   {
- rtx tmp = XEXP (src, 0);
- loongarch_c

[PATCH] lower-bitint: Handle unreleased SSA_NAMEs from earlier passes gracefully [PR113102]

2023-12-22 Thread Jakub Jelinek
Hi!

On the following testcase, earlier passes leave around an unreleased
SSA_NAME - one with a non-GIMPLE_NOP SSA_NAME_DEF_STMT which isn't in any bb.
The following patch makes bitint lowering resistant against those:
the first hunk is where we'd try to amend certain kinds of stmts,
and the latter is where we'd otherwise try to remove them,
neither of which works.  The other loops over all SSA_NAMEs either
already also check gimple_bb (SSA_NAME_DEF_STMT (s)) or it doesn't
matter much whether we process them or not (worst case it means e.g.
the pass wouldn't return early even when it otherwise could).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-22  Jakub Jelinek  

PR tree-optimization/113102
* gimple-lower-bitint.cc (gimple_lower_bitint): Handle unreleased
large/huge _BitInt SSA_NAMEs.

* gcc.dg/bitint-59.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2023-12-21 13:28:56.953120687 +0100
+++ gcc/gimple-lower-bitint.cc  2023-12-21 14:08:00.199704511 +0100
@@ -5827,7 +5827,7 @@ gimple_lower_bitint (void)
  tree_code rhs_code;
  /* Unoptimize certain constructs to simpler alternatives to
 avoid having to lower all of them.  */
- if (is_gimple_assign (stmt))
+ if (is_gimple_assign (stmt) && gimple_bb (stmt))
switch (rhs_code = gimple_assign_rhs_code (stmt))
  {
  default:
@@ -6690,6 +6690,11 @@ gimple_lower_bitint (void)
  release_ssa_name (s);
  continue;
}
+ if (gimple_bb (g) == NULL)
+   {
+ release_ssa_name (s);
+ continue;
+   }
  if (gimple_code (g) != GIMPLE_ASM)
{
  gimple_stmt_iterator gsi = gsi_for_stmt (g);
--- gcc/testsuite/gcc.dg/bitint-59.c.jj 2023-12-21 14:12:01.860350727 +0100
+++ gcc/testsuite/gcc.dg/bitint-59.c2023-12-21 14:11:54.766449179 +0100
@@ -0,0 +1,14 @@
+/* PR tree-optimization/113102 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-std=c23 -O2" } */
+
+unsigned x;
+
+#if __BITINT_MAXWIDTH__ >= 191
+void
+foo (void)
+{
+  unsigned _BitInt(191) b = x;
+  ~(b >> x) % 3;
+}
+#endif

Jakub



[PATCH] lower-bitint: Fix handle_cast ICE [PR113102]

2023-12-22 Thread Jakub Jelinek
Hi!

My recent change to use m_data[save_data_cnt] instead of
m_data[save_data_cnt + 1] when inside of a loop (m_bb is non-NULL)
broke the following testcase.  When we create a PHI node on the loop
using prepare_data_in_out, both m_data[save_data_cnt{, + 1}] are
computed and the fix was right, but there are also cases when we in
a loop (m_bb non-NULL) emit a nested cast with too few limbs and
then just use constant indexes for all accesses - in that case
only m_data[save_data_cnt + 1] is initialized and m_data[save_data_cnt]
is NULL.  In those cases, we want to use the former.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-22  Jakub Jelinek  

PR tree-optimization/113102
* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): Only
use m_data[save_data_cnt] if it is non-NULL.

* gcc.dg/bitint-58.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2023-12-21 11:13:32.0 +0100
+++ gcc/gimple-lower-bitint.cc  2023-12-21 13:28:56.953120687 +0100
@@ -1491,7 +1491,7 @@ bitint_large_huge::handle_cast (tree lhs
m_data_cnt = tree_to_uhwi (m_data[save_data_cnt + 2]);
  if (TYPE_UNSIGNED (rhs_type))
t = build_zero_cst (m_limb_type);
- else if (m_bb)
+ else if (m_bb && m_data[save_data_cnt])
t = m_data[save_data_cnt];
  else
t = m_data[save_data_cnt + 1];
--- gcc/testsuite/gcc.dg/bitint-58.c.jj 2023-12-21 13:33:25.882383838 +0100
+++ gcc/testsuite/gcc.dg/bitint-58.c2023-12-21 13:32:54.408821172 +0100
@@ -0,0 +1,31 @@
+/* PR tree-optimization/113102 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-std=c23 -O2" } */
+
+_BitInt(3) a;
+#if __BITINT_MAXWIDTH__ >= 4097
+_BitInt(8) b;
+_BitInt(495) c;
+_BitInt(513) d;
+_BitInt(1085) e;
+_BitInt(4096) f;
+
+void
+foo (void)
+{
+  a -= (_BitInt(4097)) d >> b;
+}
+
+void
+bar (void)
+{
+  __builtin_sub_overflow ((_BitInt(767)) c >> e, 0, &a);
+}
+
+void
+baz (void)
+{
+  _BitInt(768) x = (_BitInt(257))f;
+  b /= x >> 0 / 0; /* { dg-warning "division by zero" } */
+}
+#endif

Jakub



Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension

2023-12-22 Thread juzhe.zh...@rivai.ai
You mean theadvector doesn't want the current RVV1.0 register overlap magic, as
follows?
The destination EEW is smaller than the source EEW and the overlap is in the 
lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi 
v0, v0, 3 is legal, but a destination of v1 is not).
The destination EEW is greater than the source EEW, the source EMUL is at least 
1, and the overlap is in the highest-numbered part of the destination register 
group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or 
v4 is not).
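The two overlap rules quoted above can be illustrated directly in RVV assembly (a sketch built from the spec's own examples):

```asm
# Rule 1: destination EEW < source EEW; overlap must be in the
# lowest-numbered part of the source register group.  At LMUL=1 the
# narrowing shift reads a 2-register source group v0-v1:
vnsrl.wi v0, v0, 3    # legal: destination overlaps lowest part of source
# vnsrl.wi v1, v0, 3  # illegal: destination v1 is the upper part of v0-v1

# Rule 2: destination EEW > source EEW, source EMUL >= 1; overlap must
# be in the highest-numbered part of the destination group.  At LMUL=8
# vzext.vf4 writes v0-v7 and reads a 2-register source group:
vzext.vf4 v0, v6      # legal: source v6-v7 is the highest part of v0-v7
# vzext.vf4 v0, v0    # illegal: source v0/v2/v4 overlaps a lower part
```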

If yes, I suggest disabling the overlap constraint using an attribute. You can
learn more details from:

(set_attr "group_overlap"

juzhe.zh...@rivai.ai
 
From: joshua
Date: 2023-12-22 11:33
To: 钟居哲; gcc-patches
Cc: jim.wilson.gcc; palmer; andrew; philipp.tomsich; Jeff Law; Christoph
Müllner; jinma; Cooper Qu
Subject: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension
Hi Juzhe,

Thank you for your comprehensive comments.

Classifying theadvector intrinsics into 3 kinds is really important to make our 
patchset more organized. 

For 1) and 3), I will split out the patches soon and hope they will be merged 
quickly.
For 2), according to the differences between vector and xtheadvector, it can be 
classified into 3 kinds.

First is renamed load/store, renamed narrowing integer right shift, renamed 
narrowing fixed-point clip, etc. I think we can use the ASM targethook to 
rewrite the whole string of the instructions, although it will still be heavy 
work.
Second is instructions with no pseudo instruction, like vneg/vfneg. We will add 
these pseudo instructions in binutils to make xtheadvector more compatible with 
vector.
Third is that the destination vector register cannot overlap the source vector 
register group for vmadc/vmsbc/widening arithmetic/narrowing arithmetic. 
Currently I cannot come up with any better way than pattern copying.  Do you 
have any suggestions?

Joshua




--
From: 钟居哲
Date: December 21, 2023 (Thursday) 07:04
To: "cooper.joshua"; "gcc-patches"
Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich"; Jeff Law; 
"Christoph Müllner"; "cooper.joshua"; jinma; Cooper Qu
Subject: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension

Hi, Joshua.

Thanks for working hard on cleaning up the code and for the substantial work 
supporting theadvector.

After fully reviewing this patch, I understand you have 3 kinds of theadvector 
intrinsics derived from the codebase of the current RVV1.0 GCC.

1) instructions that can leverage all the current code of the RVV1.0 
intrinsics by simply adding the "th." prefix directly.
2) instructions that leverage current MD patterns but need some tweaks and 
pattern copies, since simply adding "th." is not enough.
3) new instructions that the current RVV1.0 doesn't have, like the vlb 
instructions.

Overall, 1) and 3) look reasonable to me. But 2) needs some time for me to 
figure out a better way to do it (the current approach in this patch of 
copying patterns is not one I like).

So, I hope you can break this big patch into 3 different series patches.

1. Support the subset of theadvector instructions which leverage directly from 
the current RVV1.0 by simply adding the "th." prefix.
2. Support the differently named theadvector instructions that share the same 
patterns as RVV1.0 instructions.
3. Support new theadvector instructions like vlb, etc.

I think the separate patches for 1 and 3 can be merged quickly once I have 
reviewed and approved the details in the follow-up patches you send, e.g. a v4.

For 2, it's a bit more complicated, but I think we can do what ARM and other 
targets do: use the ASM targethook to rewrite the whole string of the 
instructions.
For example, for strided load/store, you can identify these instructions from 
the attribute:
(set_attr "type" "vlds")






juzhe.zh...@rivai.ai
 
From: Jun Sha (Joshua)
Date: 2023-12-20 20:20
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v3 0/6] RISC-V: Support XTheadVector extension
This patch series presents gcc implementation of the XTheadVector
extension [1].
 
[1] https://github.com/T-head-Semi/thead-extension-spec/
 
For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in order not to
generate instructions that xtheadvector does not support,
causing 36 changes in vector.md.
 
For the th. prefix issue, we use current_output_insn and
the ASM_OUTPUT_OPCODE hook instead of directly modifying
patterns in vector.md.
 
We have run the GCC test suite and can confirm that there
are no regressions.
 
All the test results can be found in the following links,
Run without xtheadvector:
https://gcc.gnu.org/pipermail/gcc-testresults/2023-December/803686.html
 
Run with xtheadvector:
https://gcc.gnu.org/pipermail/gcc-testresults/2023-December/803687.html
 
Furthermore, we have run the tests in 
https://github.com/riscv-non-isa/rvv-intrinsic-doc/tree/main/examples, 
and all the tests pass.