[PATCH 1/2] LoongArch: Use force_reg instead of gen_reg_rtx + emit_move_insn in vec_init expander [PR113033]

2023-12-18 Thread Xi Ruoyao
Jakub says: Then that seems like a bug in the loongarch vec_init pattern(s). Those really don't have a predicate in any of the backends on the input operand, so they need to force_reg it if it is something it can't handle. I've looked e.g. at i386 vec_init and that is exactly

[PATCH 2/2] LoongArch: Clean up vec_init expander

2023-12-18 Thread Xi Ruoyao
Non-functional change; clean up the code. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_expand_vector_init_same): Remove "temp2" and reuse "temp" instead. (loongarch_expand_vector_init): Use gcc_unreachable () instead of gcc_assert (0), and fix

Re: Fwd: [PATCH] LoongArch: Fix FP vector comparsons [PR113034]

2023-12-19 Thread Xi Ruoyao
the LSX/LASX code is wrong. > > Most seriously, the RTX code NE should be mapped to "cneq", not "cne". > > The "cneq" in the commit info may be "cune" according to the context? Oops, indeed. I'll push the patch with this typo fixed. > -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

[PATCH] LoongArch: Add sign_extend pattern for 32-bit rotate shift

2023-12-17 Thread Xi Ruoyao
Remove a redundant sign extension. gcc/ChangeLog: * config/loongarch/loongarch.md (rotrsi3_extend): New define_insn. gcc/testsuite/ChangeLog: * gcc.target/loongarch/rotrw.c: New test. --- Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?

[PATCH] LoongArch: Fix FP vector comparsons [PR113034]

2023-12-17 Thread Xi Ruoyao
We had the following mappings between vfcmp submnemonics and RTX codes: (define_code_attr fcc [(unordered "cun") (ordered "cor") (eq "ceq") (ne "cne") (uneq "cueq") (unle "cule") (unlt "cult") (le "cle")

Pushed: [PATCH 0/3] LoongArch: Fix instruction costs

2023-12-17 Thread Xi Ruoyao
On Sun, 2023-12-10 at 01:03 +0800, Xi Ruoyao wrote: > Update LoongArch instruction costs based on the micro-benchmark results > on LA464 and LA664.  In particular, this allows generating alsl/slli or > alsl/slli + add pairs for multiplying some constants as on LA464/LA664 > a mul instr

Re: [PATCH] LoongArch: Expand left rotate to right rotate with negated amount

2023-12-23 Thread Xi Ruoyao
On Sun, 2023-12-24 at 00:56 +0800, Xi Ruoyao wrote: > On Sat, 2023-12-23 at 15:00 +0800, chenglulu wrote: > > Hi, > > > > This patch will cause the following tests to fail: > > > > +FAIL: gcc.dg/vect/pr97081-2.c (internal compiler error: in extract_insn, > >

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-23 Thread Xi Ruoyao
here is a problem. My regression test has the following two fail > items.(based on r14-6787) > +FAIL: gcc.dg/cpp/_Pragma3.c (test for excess errors) > +FAIL: gcc.dg/pr86617.c scan-rtl-dump-times final "mem/v" 6 Strange. I didn't see them on r14-6650 (with or without the patch). --

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-23 Thread Xi Ruoyao
On Sat, 2023-12-23 at 18:44 +0800, Xi Ruoyao wrote: > On Sat, 2023-12-23 at 10:29 +0800, chenglulu wrote: > > > The performance drop has nothing to do with this patch. I found that the > > > h264 performance compiled > > > by r14-6787 compared to r14-6421 dropped

Re: [PATCH] LoongArch: Expand left rotate to right rotate with negated amount

2023-12-23 Thread Xi Ruoyao
ence may be caused by a different binutils version or some other changes in GCC. I'll figure it out... -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

[PATCH v2] LoongArch: Expand left rotate to right rotate with negated amount

2023-12-24 Thread Xi Ruoyao
gcc/ChangeLog: * config/loongarch/loongarch.md (rotl3): New define_expand. * config/loongarch/simd.md (vrotl3): Likewise. (rotl3): Likewise. gcc/testsuite/ChangeLog: * gcc.target/loongarch/rotl-with-rotr.c: New test. *
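The identity behind this expansion can be checked in plain C. A minimal sketch, assuming 32-bit operands; the helper names `rotr32` and `rotl32_via_rotr` are illustrative and do not come from the patch:

```c
#include <stdint.h>

/* Rotate x right by n bits; only the low 5 bits of n are used. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31;
    /* (32 - n) & 31 avoids an undefined shift by 32 when n == 0. */
    return (x >> n) | (x << ((32 - n) & 31));
}

/* Rotate left expressed as a rotate right by the negated amount. */
static uint32_t rotl32_via_rotr(uint32_t x, unsigned n)
{
    return rotr32(x, 0u - n);
}
```

Because a rotate only depends on the amount modulo the operand width, negating the amount turns a left rotate into a right rotate, so a target that provides only a right-rotate instruction needs no separate left-rotate instruction.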

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-24 Thread Xi Ruoyao
On Sat, 2023-12-23 at 18:47 +0800, Xi Ruoyao wrote: > On Sat, 2023-12-23 at 18:44 +0800, Xi Ruoyao wrote: > > On Sat, 2023-12-23 at 10:29 +0800, chenglulu wrote: > > > > The performance drop has nothing to do with this patch. I found that > > > > the h264 performa

Re: [PATCH] LoongArch: Expand left rotate to right rotate with negated amount

2023-12-24 Thread Xi Ruoyao
On Sun, 2023-12-24 at 01:04 +0800, Xi Ruoyao wrote: > On Sun, 2023-12-24 at 00:56 +0800, Xi Ruoyao wrote: > > On Sat, 2023-12-23 at 15:00 +0800, chenglulu wrote: > > > Hi, > > > > > > This patch will cause the following tests to fail: > > > > >

Re: [PATCH v1] LoongArch: Fixed bug in *bstrins__for_ior_mask template.

2023-12-25 Thread Xi Ruoyao
> +  "&& true" >    [(set (match_dup 0) (match_dup 1)) >     (set (zero_extract:GPR (match_dup 0) (match_dup 2) (match_dup 4)) >   (match_dup 3))] -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-25 Thread Xi Ruoyao
On Mon, 2023-12-25 at 10:08 +0800, chenglulu wrote: > > On 2023/12/24 at 8:59 PM, Xi Ruoyao wrote: > > On Sat, 2023-12-23 at 18:47 +0800, Xi Ruoyao wrote: > > > On Sat, 2023-12-23 at 18:44 +0800, Xi Ruoyao wrote: > > > > On Sat, 2023-12-23 at 10:29 +0800, chenglulu wrote:

Re: [PATCH] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT.

2023-12-12 Thread Xi Ruoyao
float t2y = a[3]; > +  float t1z = a[4]; > +  float t2z = a[5]; > + > +  if (t1x > t2y  || t2x < t1y  || t1x > t2z || t2x < t1z || t1y > t2z || t2y < t1z) > +    return 0; > + > +  return 1; > +} > +/* { dg-final { scan-tree-dump-times "if" 6 "gim

Re: [PATCH] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT.

2023-12-12 Thread Xi Ruoyao
2.html where I've raised fp_add cost (which is used for estimating floating-point compare cost) to 5 instructions and see if it solves your problem without LOGICAL_OP_NON_SHORT_CIRCUIT? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT.

2023-12-12 Thread Xi Ruoyao
z = a[4]; > +  float t2z = a[5]; > + > +  if (t1x > t2y  || t2x < t1y  || t1x > t2z || t2x < t1z || t1y > t2z || t2y < t1z) > +    return 0; > + > +  return 1; > +} > +/* { dg-final { scan-tree-dump-times "if" 6 "gimple" } } */ -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT.

2023-12-12 Thread Xi Ruoyao
On Tue, 2023-12-12 at 20:39 +0800, Xi Ruoyao wrote: > On Tue, 2023-12-12 at 19:59 +0800, Jiahao Xu wrote: > > > I guess here the problem is floating-point compare instruction is much > > > more costly than other instructions but the fact is not correctly > > >

PING^2: [PATCH v2] Only allow (int)trunc(x) to (int)x simplification with -ffp-int-builtin-inexact [PR107723]

2023-12-10 Thread Xi Ruoyao
Ping again. On Fri, 2023-12-01 at 13:44 +0800, Xi Ruoyao wrote: > Ping. > > On Fri, 2023-11-24 at 17:09 +0800, Xi Ruoyao wrote: > > With -fno-fp-int-builtin-inexact, trunc is not allowed to raise > > FE_INEXACT and it should produce an integral result (if the input is not &

Re: [PATCH v2] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT.

2023-12-12 Thread Xi Ruoyao
nstruction yet, so we are not really eliding the branches as LOGICAL_OP_NON_SHORT_CIRCUIT = 1 is supposed to do. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT.

2023-12-12 Thread Xi Ruoyao
entire SPEC 2017 suite. The problem with LOGICAL_OP_NON_SHORT_CIRCUIT = 0 is it may regress fixed-point only code. In practice the usage of -ffast-math is very rare ("real" Linux packages invoking floating-point operations often just malfunction with it) and it seems not good to regress common cas

Re: [PATCH v2] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT.

2023-12-12 Thread Xi Ruoyao
On Wed, 2023-12-13 at 14:32 +0800, Jiahao Xu wrote: > > On 2023/12/13 at 2:21 PM, Xi Ruoyao wrote: > > On Wed, 2023-12-13 at 14:17 +0800, Jiahao Xu wrote: > > > This test was extracted from the hot functions of 526.blender_r. Setting > > > LOGICAL_OP_NON_SHORT_CIRCUIT t

Re: [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]

2023-12-13 Thread Xi Ruoyao
On Wed, 2023-12-13 at 20:22 +0800, chenglulu wrote: On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote: Replace the instruction costs in loongarch_rtx_cost_data constructor based on micro-benchmark results on LA464 and LA664. This allows optimizations like "x * 17" to alsl, and "x * 68" to

[PATCH pushed] LoongArch: testsuite: Remove XFAIL in vect-ftint-no-inexact.c

2023-12-12 Thread Xi Ruoyao
After r14-6455 this no longer fails. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vect-ftint-no-inexact.c (xfail): Remove. --- Tested on loongarch64-linux-gnu. Pushed as obvious. gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c | 3 +-- 1 file changed, 1 insertion(+), 2

[PATCH] LoongArch: Use the movcf2gr instruction to implement cstore4

2023-12-13 Thread Xi Ruoyao
We used a branch to load floating-point comparison results into GPR. This is very slow when the branch is not predictable. Use the movcf2gr instruction to implement cstore4 if movcf2gr is fast enough. gcc/ChangeLog: * config/loongarch/genopts/loongarch.opt.in (muse-movcf2gr): New
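For context, the kind of source that exercises this pattern is a floating-point comparison whose result is consumed as an integer value rather than as a branch condition. A minimal sketch; the function name is illustrative, and which instructions the compiler actually emits depends on the target options:

```c
/* The comparison result is needed as a 0/1 value in an integer
   register, not as a branch condition. */
int fp_less(double a, double b)
{
    return a < b;
}
```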

Re: [PATCH] expr: catch more `a*bool` while expanding [PR 112935]

2023-12-10 Thread Xi Ruoyao
e)); Should we declare this in the file scope instead? > +   bool bit0_p = gimple_zero_one_valued_p (treeop0, nullptr); > +   bool bit1_p = gimple_zero_one_valued_p (treeop1, nullptr); -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

[PATCH] LoongArch: Fix infinite secondary reloading of FCCmode [PR113148]

2023-12-26 Thread Xi Ruoyao
The GCC internal doc says: X might be a pseudo-register or a 'subreg' of a pseudo-register, which could either be in a hard register or in memory. Use 'true_regnum' to find out; it will return -1 if the pseudo is in memory and the hard register number if it is in a register.

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-27 Thread Xi Ruoyao
ymbol_ref:DI ("*.LANCHOR0") [flags 0x182])) [0 S1 > A8]))) "volatile.c":5:11 -1 >  (nil)) > > The volatile property of the mem here is gone, so the test fails. Phew. I guess I couldn't reproduce it because I have Jeff's ext-dce patch in my local repo, which removed the zero_extend... I'll rework this patch. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

[PATCH v3] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-28 Thread Xi Ruoyao
The problem with peephole2 is it uses a naive sliding-window algorithm and misses many cases. For example: float a[1]; float t() { return a[0] + a[8000]; } is compiled to: la.local $r13,a la.local $r12,a+32768 fld.s $f1,$r13,0 fld.s $f0,$r12,-768
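The a+32768 and -768 pair in the example follows from splitting the byte offset of a[8000] into a displacement that fits a signed 12-bit field and a remainder folded into the address. A minimal sketch of that arithmetic, assuming 4-byte float elements and a signed 12-bit load displacement; the struct and function names are illustrative, not GCC internals:

```c
#include <stdint.h>

struct split_offset {
    int64_t hi;  /* added to the symbol address */
    int64_t lo;  /* load displacement, in [-2048, 2047] */
};

static struct split_offset split_si12(int64_t offset)
{
    struct split_offset s;
    int64_t low = offset & 0xfff;   /* low 12 bits, 0..4095 */
    if (low >= 2048)
        low -= 4096;                /* sign-extend into [-2048, 2047] */
    s.lo = low;
    s.hi = offset - low;
    return s;
}
```

For a[8000] the byte offset is 8000 * 4 = 32000, which splits into 32768 and -768, matching the operands in the quoted assembly.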

[PATCH v2] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-25 Thread Xi Ruoyao
The problem with peephole2 is it uses a naive sliding-window algorithm and misses many cases. For example: float a[1]; float t() { return a[0] + a[8000]; } is compiled to: la.local $r13,a la.local $r12,a+32768 fld.s $f1,$r13,0 fld.s $f0,$r12,-768

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-21 Thread Xi Ruoyao
e new define_insn_and_split produces a better result instead of solely relying on define_insn_and_split? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

[PATCH] LoongArch: Fix warnings building libgcc

2023-12-09 Thread Xi Ruoyao
We are excluding loongarch-opts.h from target libraries, but now struct loongarch_target and gcc_options are not declared in the target libraries, causing: In file included from ../.././gcc/options.h:8, from ../.././gcc/tm.h:49, from

[PATCH 2/3] LoongArch: Fix instruction costs [PR112936]

2023-12-09 Thread Xi Ruoyao
Replace the instruction costs in loongarch_rtx_cost_data constructor based on micro-benchmark results on LA464 and LA664. This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl and slli. gcc/ChangeLog: PR target/112936 * config/loongarch/loongarch-def.cc
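The two multiplications mentioned can be written as shift-and-add sequences. A minimal sketch, assuming alsl computes (rj << sa) + rk as described in the LoongArch reference manual; the helper names are illustrative:

```c
#include <stdint.h>

/* Model of alsl.w: shift rj left by sa (1..4) and add rk. */
static uint32_t alsl_w(uint32_t rj, uint32_t rk, unsigned sa)
{
    return (rj << sa) + rk;
}

/* x * 17 == (x << 4) + x: a single alsl. */
static uint32_t mul17(uint32_t x)
{
    return alsl_w(x, x, 4);
}

/* x * 68 == ((x << 4) + x) << 2: alsl followed by slli. */
static uint32_t mul68(uint32_t x)
{
    return alsl_w(x, x, 4) << 2;
}
```

Unsigned arithmetic is used so that wraparound is well defined in C; the low 32 bits of the result are the same for signed operands.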

[PATCH 3/3] LoongArch: Add alslsi3_extend

2023-12-09 Thread Xi Ruoyao
Following the instruction cost fix, we are generating alsl.w $a0, $a0, $a0, 4 instead of li.w $t0, 17 mul.w $a0, $t0 for "x * 17", because alsl.w is 4 times faster than mul.w. But we didn't have a sign-extending pattern for alsl.w, causing an extra slli.w instruction generated to

[PATCH 1/3] LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own

2023-12-09 Thread Xi Ruoyao
With loongarch-def.cc switched from C to C++, we can include rtl.h for COSTS_N_INSNS, instead of hard coding our own. This is a non-functional change for now, but it will make the code more future-proof in case COSTS_N_INSNS in rtl.h is changed. gcc/ChangeLog: *

[PATCH 0/3] LoongArch: Fix instruction costs

2023-12-09 Thread Xi Ruoyao
and regtested on loongarch64-linux-gnu. Ok for trunk? Xi Ruoyao (3): LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own LoongArch: Fix instruction costs [PR112936] LoongArch: Add alslsi3_extend gcc/config/loongarch/loongarch-def.cc | 42

[PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-11 Thread Xi Ruoyao
The problem with peephole2 is it uses a naive sliding-window algorithm and misses many cases. For example: float a[1]; float t() { return a[0] + a[8000]; } is compiled to: la.local $r13,a la.local $r12,a+32768 fld.s $f1,$r13,0 fld.s $f0,$r12,-768

Re: [PATCH v1] LoongArch: testsuite:Add the "-ffast-math" compilation option for the file vect-fmin-3.c.

2023-12-30 Thread Xi Ruoyao
duc_fmin_scal_*? > If so, we probably need a new target selector for fmin/fmax reduction. Let me check whether the [x]vf{min,max} instructions are IEEE-conformant. Volume 2 of the instruction manual still has not been released, so I can only experiment... -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1] LoongArch: testsuite:Add the "-ffast-math" compilation option for the file vect-fmin-3.c.

2023-12-30 Thread Xi Ruoyao
On Sat, 2023-12-30 at 20:25 +0800, Xi Ruoyao wrote: > On Sat, 2023-12-30 at 12:15 +, Richard Sandiford wrote: > > This shouldn't be necessary.  The test does: > > > >   for (int i = 0; i < n; i += 2) > >     { > >   x0 = __builtin_fmin (x0, ptr[i + 0]

[PATCH] LoongArch: Provide fmin/fmax RTL pattern for vectors

2023-12-31 Thread Xi Ruoyao
We already had smin/smax RTL patterns using vfmin/vfmax instructions. But for smin/smax, it's unspecified what will happen if either operand contains any NaN. So we would not vectorize the loop with -fno-finite-math-only (the default for all optimization levels except -Ofast). But,
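The difference matters only when NaNs are present. A minimal sketch contrasting a comparison-based minimum with the C fmin rule that a single quiet NaN operand is ignored; both helpers are illustrative and do not model the vfmin instruction itself:

```c
/* Comparison-based minimum: with a NaN operand the result depends on
   operand order, because every comparison with NaN is false. */
static double naive_min(double a, double b)
{
    return a < b ? a : b;
}

/* fmin-style minimum: if exactly one operand is NaN, return the other. */
static double fmin_like(double a, double b)
{
    if (a != a)          /* a is NaN */
        return b;
    if (b != b)          /* b is NaN */
        return a;
    return a < b ? a : b;
}
```

With a = 1.0 and b = NaN, naive_min returns NaN while fmin_like returns 1.0; swapping the operands makes naive_min return 1.0, which is the order dependence a vectorized reduction cannot tolerate unless NaNs are excluded.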

[PATCH v2] LoongArch: Implement FCCmode reload and cstore4

2023-12-15 Thread Xi Ruoyao
We used a branch to load floating-point comparison results into GPR. This is very slow when the branch is not predictable. Implement movfcc so we can reload FCCmode into GPRs, FPRs, and MEM. Then implement cstore4. gcc/ChangeLog: * config/loongarch/loongarch-tune.h

[PATCH] LoongArch: Remove constraint z from movsi_internal

2023-12-15 Thread Xi Ruoyao
We don't allow SImode in FCC, so constraint z is never really used here. gcc/ChangeLog: * config/loongarch/loongarch.md (movsi_internal): Remove constraint z. --- Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? gcc/config/loongarch/loongarch.md | 6 +++---

Re: [PATCH v3] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-29 Thread Xi Ruoyao
> +  return symbolic_pcrel_operand (op, Pmode) || > > +symbolic_pcrel_offset_operand (op, Pmode); > > +}) > > + > >   > Symbol '||' It shouldn't be at the end of the line. Indeed. > > +  return symbolic_pcrel_operand (op, Pmode) > +    || symbolic_pcrel_offset_operand (op, Pmode); > > Others LGTM. > Thanks! > > /* snip */ > -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Pushed: [PATCH v4] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-29 Thread Xi Ruoyao
Pushed v4 as attached, with the format issues fixed and a minor adjustment in the commit message ("define_insn_and_split" is changed to "define_insn_and_rewrite" to match the actual change). On Fri, 2023-12-29 at 19:55 +0800, Xi Ruoyao wrote: > On Fri, 2023-12-29 at 15:57

[PATCH pushed] LoongArch: Fix the format of bstrins__for_ior_mask condition (NFC)

2023-12-29 Thread Xi Ruoyao
gcc/ChangeLog: * config/loongarch/loongarch.md (bstrins__for_ior_mask): For the condition, remove unneeded trailing "\" and move "&&" to follow GNU coding style. NFC. --- Pushed as obvious. gcc/config/loongarch/loongarch.md | 4 ++-- 1 file changed, 2 insertions(+), 2

Pushed: [PATCH] LoongArch: Provide fmin/fmax RTL pattern for vectors

2024-01-03 Thread Xi Ruoyao
On Wed, 2024-01-03 at 16:24 +0800, chenglulu wrote: > LGTM! > > Thanks! Pushed r14-6890. FWIW the tree optimizer sometimes still fails to emit .reduc_f{max,min} or emits them sub-optimally. I've commented in PR112457 but maybe I should've created a new ticket... > On 2024/1/1, AM 3:1

Re: [RFA] [V3] new pass for sign/zero extension elimination

2024-01-04 Thread Xi Ruoyao
as possible.  Assuming the rest is ACK'd for the trunk we'll put it into > the list of optimizations enabled by -O2. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH]middle-end: Don't apply copysign optimization if target does not implement optab [PR112468]

2024-01-04 Thread Xi Ruoyao
_effective_target_s390_vx]) > > +|| ([istarget riscv*-*-*] > > + && [check_effective_target_riscv_v]) > > Unless I'm missing something, we have copysign in the scalar > floating-point ISAs as well.  So I think this should be > >   || ([istarget riscv*-*-*] >   && [check_effective_target_hard_float]) -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-05 Thread Xi Ruoyao
ive me several hours trying to implement this... -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-05 Thread Xi Ruoyao
On Fri, 2024-01-05 at 17:57 +0800, chenglulu wrote: > > On 2024/1/5 at 4:37 PM, Xi Ruoyao wrote: > > On Fri, 2024-01-05 at 11:40 +0800, Lulu Cheng wrote: > > >   bool > > >   loongarch_explicit_relocs_p (enum loongarch_symbol_type type) > > >   { > > >

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-05 Thread Xi Ruoyao
On Fri, 2024-01-05 at 18:25 +0800, Xi Ruoyao wrote: > On Fri, 2024-01-05 at 17:57 +0800, chenglulu wrote: > > > > On 2024/1/5 at 4:37 PM, Xi Ruoyao wrote: > > > On Fri, 2024-01-05 at 11:40 +0800, Lulu Cheng wrote: > > > >   bool > > > >   loongarch_ex

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-05 Thread Xi Ruoyao
On Fri, 2024-01-05 at 20:45 +0800, chenglulu wrote: > > On 2024/1/5 at 7:55 PM, Xi Ruoyao wrote: > > On Fri, 2024-01-05 at 18:25 +0800, Xi Ruoyao wrote: > > > On Fri, 2024-01-05 at 17:57 +0800, chenglulu wrote: > > > > On 2024/1/5 at 4:37 PM, Xi Ruoyao wrote: > > > >

Re: [PATCH 1/4] LoongArch: Handle ISA evolution switches along with other options

2024-01-05 Thread Xi Ruoyao
SA_HAS_DIV32 etc. in the code base? It seems some of them are not replaced. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH]middle-end: Don't apply copysign optimization if target does not implement optab [PR112468]

2024-01-05 Thread Xi Ruoyao
ve_target_loongarch_sx] ||" because SIMD requires hard float. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH 1/2] LoongArch: Add the macro implementation of mcmodel=extreme.

2024-01-03 Thread Xi Ruoyao
perand:DI 2 "register_operand "="))] And use gen_movdi_pcrel64 (operands[0], operands[1], gen_reg_rtx(DImode)) in expand. > + "TARGET_64BIT" > + "la.local %0,$r15,%1" > + [(set_attr "mode" "DI") > +  (set_attr "length" "5")]) -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH 1/2] LoongArch: Add the macro implementation of mcmodel=extreme.

2024-01-03 Thread Xi Ruoyao
On Thu, 2024-01-04 at 11:58 +0800, chenglulu wrote: > > On 2024/1/4 at 11:51 AM, Xi Ruoyao wrote: > > On Wed, 2023-12-27 at 16:46 +0800, Lulu Cheng wrote: > > > +(define_insn "movdi_pcrel64" > > > + [(set (match_operand:DI 0 "register_op

Re: [PATCH 2/3] LoongArch: Redundant sign extension elimination optimization.

2024-01-06 Thread Xi Ruoyao
_rtx (DImode); > +   emit_insn (gen_addsi3_extended (t, operands[1], operands[2])); AFAIK if !TARGET_64BIT a DImode value should actually be a pair of hardware registers, but addsi3_extended doesn't output such a pair so this seems invalid... > +   t = gen_lowpart (SImode, t); > +  

Re: [PATCH 1/3] LoongArch: Optimized some of the symbolic expansion instructions generated during bitwise operations.

2024-01-06 Thread Xi Ruoyao
> "")])
>
> +(define_insn "*nsi_internal"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> + (neg_bitwise:SI
> +     (not:SI (match_operand:SI 1 "register_operand" "r"))
> +     (match_operand:SI 2 "register_operand" "r")))]
> +  "TARGET_64BIT"
> +  "n\t%0,%2,%1"
> +  [(set_attr "type" "logical")
> +   (set_attr "mode" "SI")])
>
>  ;;
>  ;;
> @@ -3167,7 +3210,6 @@ (define_expand "condjump"
>     (label_ref (match_operand 1))
>     (pc)))])
>
> -
>
>  ;;
>  ;;
> @@ -3967,10 +4009,13 @@ (define_insn "bytepick_w_"
>  (define_insn "bytepick_w__extend"
>    [(set (match_operand:DI 0 "register_operand" "=r")
>   (sign_extend:DI
> -   (ior:SI (lshiftrt (match_operand:SI 1 "register_operand" "r")
> -     (const_int ))
> -   (ashift (match_operand:SI 2 "register_operand" "r")
> -   (const_int bytepick_w_ashift_amount)]
> + (subreg:SI
> +   (ior:DI (subreg:DI (lshiftrt
> +   (match_operand:SI 1 "register_operand" "r")
> +   (const_int )) 0)
> +   (subreg:DI (ashift
> +   (match_operand:SI 2 "register_operand" "r")
> +   (const_int bytepick_w_ashift_amount)) 0)) 0)))]
>    "TARGET_64BIT"
>    "bytepick.w\t%0,%1,%2,"
>    [(set_attr "mode" "SI")])
> diff --git a/gcc/testsuite/gcc.target/loongarch/sign-extend-bitwise.c b/gcc/testsuite/gcc.target/loongarch/sign-extend-bitwise.c
> new file mode 100644
> index 000..5753ef69db2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/sign-extend-bitwise.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mabi=lp64d -O2" } */
> +/* { dg-final { scan-assembler-not "slli.w\t\\\$r\[0-9\]+,\\\$r\[0-9\]+,0" } } */
> +
> +struct pmop
> +{
> +  unsigned int op_pmflags;
> +  unsigned int op_pmpermflags;
> +};
> +unsigned int PL_hints;
> +
> +struct pmop *pmop;
> +void
> +Perl_newPMOP (int type, int flags)
> +{
> +  if (PL_hints & 0x0010)
> +    pmop->op_pmpermflags |= 0x0001;
> +  if (PL_hints & 0x0004)
> +    pmop->op_pmpermflags |= 0x0800;
> +  pmop->op_pmflags = pmop->op_pmpermflags;
> +}
-- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH 3/3] LoongArch: Redundant sign extension elimination optimization 2.

2024-01-06 Thread Xi Ruoyao
can-assembler-times "slli.w\t\\\$r\[0-9\]+,\\\$r\[0-9\]+,0" > 0 } } */ Use scan-assembler-not instead of scan-assembler-times ... 0. Otherwise LGTM. >  #include >  #define my_min(x, y) ((x) < (y) ? (x) : (y)) -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
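Both directive spellings assert the same thing, but scan-assembler-not states the intent directly. A minimal sketch of a test file using it; the function is a hypothetical placeholder and the options line is an assumption, only the directive form is the point:

```c
/* { dg-do compile } */
/* { dg-options "-O2" } */
/* Fails if the pattern appears anywhere in the generated assembly. */
/* { dg-final { scan-assembler-not "slli.w\t\\\$r\[0-9\]+,\\\$r\[0-9\]+,0" } } */

int my_min_int(int x, int y)
{
    return x < y ? x : y;
}
```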

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-12 Thread Xi Ruoyao
enable-bootstrap > --enable-checking=release >     $ make BOOT_FLAGS="-mcmodel=extreme" > > What did I do wrong?:-( BOOT_CFLAGS, not BOOT_FLAGS :). -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-14 Thread Xi Ruoyao
On Mon, 2024-01-15 at 15:10 +0800, chenxiaolong wrote: > At 14:42 +0800 on Monday, 2024-01-15, Xi Ruoyao wrote: > > On Mon, 2024-01-15 at 14:32 +0800, YunQiang Su wrote: > > > Xi Ruoyao wrote at 12:11pm on Monday, January 15, 2024: > > >

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-14 Thread Xi Ruoyao
On Mon, 2024-01-15 at 14:32 +0800, YunQiang Su wrote: > Xi Ruoyao wrote on Monday, 2024-01-15 at 12:11: > > > > On Mon, 2024-01-15 at 09:29 +0800, chenxiaolong wrote: > > > At 21:13 +0800 on Saturday, 2024-01-13, Xi Ruoyao wrote: > > > > At 15:28 +0800 on Saturday 2024-01-1

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-14 Thread Xi Ruoyao
On Mon, 2024-01-15 at 09:29 +0800, chenxiaolong wrote: > At 21:13 +0800 on Saturday, 2024-01-13, Xi Ruoyao wrote: > > At 15:28 +0800 on Saturday 2024-01-13, chenxiaolong wrote: > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.dg/pr104992.c: Added addition

Re: Ping: [PATCH] LoongArch: Remove constraint z from movsi_internal

2024-01-15 Thread Xi Ruoyao
On Tue, 2024-01-16 at 14:16 +0800, chenglulu wrote: > > > On 2024/1/16 at 1:34 PM, Xi Ruoyao wrote: > > Ping. > > > > On Fri, 2023-12-15 at 20:56 +0800, Xi Ruoyao wrote: > > > We don't allow SImode in FCC, so constraint z is never really used >

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-16 Thread Xi Ruoyao
On Tue, 2024-01-16 at 12:58 +0800, Xi Ruoyao wrote: > On Tue, 2024-01-16 at 10:57 +0800, chenxiaolong wrote: > > At 15:50 +0800 on Monday, 2024-01-15, Xi Ruoyao wrote: > > > On Mon, 2024-01-15 at 15:10 +0800, chenxiaolong wrote: > > > > At 14:42 +0800 on Monday

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-15 Thread Xi Ruoyao
On Tue, 2024-01-16 at 10:57 +0800, chenxiaolong wrote: > At 15:50 +0800 on Monday, 2024-01-15, Xi Ruoyao wrote: > > On Mon, 2024-01-15 at 15:10 +0800, chenxiaolong wrote: > > > At 14:42 +0800 on Monday, 2024-01-15, Xi Ruoyao wrote: > > > > On Mon, 2024-01-15 at

Ping: [PATCH] LoongArch: Remove constraint z from movsi_internal

2024-01-15 Thread Xi Ruoyao
Ping. On Fri, 2023-12-15 at 20:56 +0800, Xi Ruoyao wrote: > We don't allow SImode in FCC, so constraint z is never really used > here. > > gcc/ChangeLog: > > * config/loongarch/loongarch.md (movsi_internal): Remove > constraint z. > --- > > Bootstrappe

Re: [PATCH] libstdc++: atomic: Add missing clear_padding in __atomic_float constructor

2024-01-16 Thread Xi Ruoyao
ibstdc++-v3/testsuite/lib/dg-options.exp > @@ -337,6 +337,7 @@ proc add_options_for_libatomic { flags } { >    || ([istarget powerpc*-*-*] && [check_effective_target_ilp32]) >    || [istarget riscv*-*-*] >    || ([istarget sparc*-*-linux-gnu] && [check_effective_target_ilp32]) > + || ([istarget i?86-*-*] || [istarget x86_64-*-*]) This seems like overkill, as "dg-add-options libatomic" is not intended to handle 16-byte atomics. Maybe we can fork this to a new dg-add-options like "add_options_for_libatomic_16b"? >         } { >   global TOOL_OPTIONS >   > --  > 2.25.1 -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-17 Thread Xi Ruoyao
On Wed, 2024-01-17 at 17:38 +0800, chenglulu wrote: > > On 2024/1/13 at 9:05 PM, Xi Ruoyao wrote: > > At 15:01 +0800 on Saturday, 2024-01-13, chenglulu wrote: > > > On 2024/1/12 at 7:42 PM, Xi Ruoyao wrote: > > > > At 09:46 +0800 on Friday, 2024-01-12, chenglulu wrote: > > > > > > >

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-17 Thread Xi Ruoyao
derstand the purpose of adding > '-fno-tree-vectorize' here. I don't think -fno-tree-vectorize will make a difference here. This test case uses __attribute__((vector_size(...))) explicitly, so vector operations will be used even with -fno-tree-vectorize. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
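To illustrate the point, code written with the GNU vector extension is already vector code before any vectorizer pass runs. A minimal sketch, assuming a compiler that supports the vector_size attribute (GCC or Clang); the type and function names are illustrative:

```c
/* Four 32-bit ints in one 16-byte vector type. */
typedef int v4si __attribute__((vector_size(16)));

/* Element-wise addition on the vector type; there is no loop for
   the auto-vectorizer to transform. */
v4si add_v4si(v4si a, v4si b)
{
    return a + b;
}
```

Whether the addition becomes a single SIMD instruction still depends on the target options, but it does not depend on -ftree-vectorize.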

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-13 Thread Xi Ruoyao
At 15:01 +0800 on Saturday, 2024-01-13, chenglulu wrote: > > On 2024/1/12 at 7:42 PM, Xi Ruoyao wrote: > > At 09:46 +0800 on Friday, 2024-01-12, chenglulu wrote: > > > > > > I found an issue bootstrapping GCC with -mcmodel=extreme in BOOT_CFLAGS: > > > > we n

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-13 Thread Xi Ruoyao
1 100644 > --- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f > +++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f > @@ -2,6 +2,7 @@ >  ! { dg-require-effective-target vect_double } >  ! { dg-options "-O3 --param vect-max-peeling-for-alignment=0 > -fpredictive

Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-21 Thread Xi Ruoyao
Ping :). On Tue, 2023-12-12 at 14:47 +0800, Xi Ruoyao wrote: > The problem with peephole2 is it uses a naive sliding-window algorithm > and misses many cases.  For example: > >     float a[1]; >     float t() { return a[0] + a[8000]; } > > is compiled to: >

Re: [PATCH] LoongArch: Added TLS Le Relax support.

2023-12-19 Thread Xi Ruoyao
_r". Or we'll hit: t.c:11:1: internal compiler error: output_operand: operand number missing after %-letter > +  [(set_attr "type" "move")] > +) > + -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1] LoongArch: Merge constant vector permuatation implementations.

2023-12-28 Thread Xi Ruoyao
   > rperm)); > +   tmp = gen_rtx_SUBREG (E_V4DImode, d->target, 0); Likewise. > +   emit_move_insn (tmp, sel); > +   break; > +     case E_V8SFmode: > +   sel = gen_rtx_CONST_VECTOR (E_V8SImode, gen_rtvec_v (d- > >nelt, > +

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-23 Thread Xi Ruoyao
On Thu, 2023-11-23 at 17:12 +0800, chenglulu wrote: > > On 2023/11/23 at 5:02 PM, Xi Ruoyao wrote: > > On Thu, 2023-11-23 at 16:13 +0800, chenglulu wrote: > > > The fix_truncv4sfv4si2 template is indeed called when debugging with > > > gdb. > > > > >

Re: [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact

2023-11-23 Thread Xi Ruoyao
)]) The problem is "lroundevenMN2" is not a standard pattern name. The SIMD version of ftintrne in patch 1 only works because we are expanding "roundevenM2" (it's a standard pattern name) to UNSPEC_SIMD_FRINTRNE, and then a define_insn can match (fix (UNSPEC_SIMD_FRINTRNE op)). But for non-SIMD we don't have roundevenM2. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-23 Thread Xi Ruoyao
On Thu, 2023-11-23 at 18:12 +0800, Xi Ruoyao wrote: > On Thu, 2023-11-23 at 17:12 +0800, chenglulu wrote: > > > > On 2023/11/23 at 5:02 PM, Xi Ruoyao wrote: > > > On Thu, 2023-11-23 at 16:13 +0800, chenglulu wrote: > > > > The fix_truncv4sfv4si2 template is indeed ca

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-23 Thread Xi Ruoyao
vst $vr0,$r12,16 jr $r1 But with a define_insn or define_insn_and_split: la.local $r12,.LANCHOR0 vld $vr0,$r12,0 vftint.w.s $vr0,$vr0 vst $vr0,$r12,16 jr $r1 (Our scalar code also generates sub-optimal frint.s

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-23 Thread Xi Ruoyao
t's "not sure if that's a good idea in general" (comment 1 in the PR) so we can do this in a target-specific way. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact

2023-11-23 Thread Xi Ruoyao
thing that time... > Thank you! > > On 2023/11/20 at 8:47 AM, Xi Ruoyao wrote: > > No functional change, just a cleanup. > > > > gcc/ChangeLog: > > > > * config/loongarch/loongarch.md (lrint_allow_inexact): > > Remove. > > (2): Check if >

Re: [PATCH] rs6000: Canonicalize copysign (x, -1) back to -abs (x) in the backend [PR112606]

2023-11-25 Thread Xi Ruoyao
rectly setting the sign bit with LSX vbitseti instruction - it will also set the sign bits for "junk" elements in the high bits of the vector register but there is no harm.) Can we make a target hook to control this? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
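
The sign-bit trick mentioned in this message can be illustrated in scalar C. This is only a sketch, not code from the patch or from any backend: `set_sign_bit` is a hypothetical helper, and it assumes `float` is an IEEE 754 binary32 value. Forcing the sign bit to 1 gives the same result as copysign (x, -1), which is -abs (x).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): force the sign bit of an
   IEEE 754 binary32 value to 1.  The result has the same magnitude as
   x and a negative sign, i.e. it equals -abs (x).  */
static float
set_sign_bit (float x)
{
  uint32_t bits;

  memcpy (&bits, &x, sizeof bits);   /* Reinterpret without aliasing UB.  */
  bits |= UINT32_C (0x80000000);     /* Bit 31 is the sign bit.  */
  memcpy (&x, &bits, sizeof x);
  return x;
}
```

A vector instruction that sets the same bit in every lane does this for all elements at once, which is why setting it in unused "junk" lanes is harmless.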

Pushed: [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations

2023-11-28 Thread Xi Ruoyao
On Mon, 2023-11-20 at 08:47 +0800, Xi Ruoyao wrote: > The [1/5] patch is the PR112578 fix at > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637097.html. > It has been changed to remove the nearbyint pattern (because nearbyint > should not raise FE_INEXACT even if -ffp

Re: [PATCH 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math.

2023-11-28 Thread Xi Ruoyao
ption{-fno-trapping-math}. > +Note that while the throughput of the sequence is higher than the throughput of > +the non-reciprocal instruction, the precision of the sequence can be decreased > +by up to 2 ulp (i.e. the inverse of 1.0 equals 0.9994). > + > +@opindex m
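
The throughput/precision trade-off described in the quoted documentation can be sketched in C. The sketch assumes the usual scheme of a cheap reciprocal estimate refined by one Newton-Raphson step; the thread snippet does not show the actual sequence, so this is an assumption. All names are hypothetical, and `approx_recip` merely imitates a low-precision hardware estimate by discarding mantissa bits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: the exact
   reciprocal with the low 12 mantissa bits cleared.  */
static float
approx_recip (float a)
{
  float r = 1.0f / a;
  uint32_t bits;

  memcpy (&bits, &r, sizeof bits);
  bits &= ~UINT32_C (0xFFF);
  memcpy (&r, &bits, sizeof r);
  return r;
}

/* One Newton-Raphson step for f(x) = 1/x - a: x1 = x0 * (2 - a * x0).
   The relative error is roughly squared by each step.  */
static float
refine_recip (float a, float x0)
{
  return x0 * (2.0f - a * x0);
}

/* Relative error of approx as an estimate of 1/a, computed in double.  */
static double
rel_error (float a, float approx)
{
  double e = (double) approx * (double) a - 1.0;
  return e < 0 ? -e : e;
}
```

The refined value is close to the correctly rounded quotient but not guaranteed to match it, which is the small precision loss the documentation warns about.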

Re: [PATCH v1] LoongArch: Remove duplicate definition of CLZ_DEFINED_VALUE_AT_ZERO.

2023-11-28 Thread Xi Ruoyao
> -  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2) > -#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \ > -  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2) -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math.

2023-11-28 Thread Xi Ruoyao
On Wed, 2023-11-29 at 10:23 +0800, Jiahao Xu wrote: > > On 2023/11/29 at 10:08 AM, Xi Ruoyao wrote: > > On Tue, 2023-11-28 at 11:29 +0800, Jiahao Xu wrote: > > > diff --git a/gcc/config/loongarch/predicates.md > > > b/gcc/config/loongarch/predicates.md > > >

Re: [PATCH v1 2/2] LoongArch: Optimize vector constant extract-{even/odd} permutation.

2023-11-29 Thread Xi Ruoyao
(loongarch_try_expand_lsx_vshuf_const): Adjust. > (loongarch_is_extraction_permutation): Adjust. > (loongarch_expand_vec_perm_const_2): Adjust. > > gcc/testsuite/ChangeLog: > > * gcc.target/loongarch/lasx-extract-even_odd-opt.c: New test. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [V2] New pass for sign/zero extension elimination -- not ready for "final" review

2023-11-29 Thread Xi Ruoyao
g in LoongArch backend in order to make ext_dce work for mem-extend.c too? If yes then any pointers? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-22 Thread Xi Ruoyao
;) > > + (fix: (match_operand:FVEC 1 "register_operand" "f")))] > > +  "" > > +  "#" > > +  "" > > +  [(const_int 0)] > > +  { > > +    emit_insn (gen__vftintrz__ ( > > +  operands[0], operands

Re: [PATCH v1 1/2] LoongArch: Switch loongarch-def from C to C++ to make it possible.

2023-12-02 Thread Xi Ruoyao
LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS) >  #include "loongarch-def.h" > +#endif With this change we can revert r14-5634 (remove the #if !defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS) guards in loongarch-def.h as they'll be unneeded). -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1 1/2] LoongArch: Switch loongarch-def from C to C++ to make it possible.

2023-12-02 Thread Xi Ruoyao
t that the code can't go here, I will add a prompt > message here.:-( If I read the code correctly, this is indeed unreachable so we can just put gcc_unreachable() here. But maybe I'm wrong. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1] LoongArch: Modify the check type of the vector builtin function.

2023-12-05 Thread Xi Ruoyao
sert_eq (const U &res, const V &ref, int line) { static_assert(sizeof (res) == sizeof (ref)); if (!memcmp (&res, &ref, sizeof(ref))) return; dump (res, sizeof (res), "res"); dump (ref, sizeof (ref), "ref"); } int main() { float x[4] = {}; int y[4] = {}; assert_eq(x, y, __LINE

Re: [V2] New pass for sign/zero extension elimination -- not ready for "final" review

2023-11-29 Thread Xi Ruoyao
On Wed, 2023-11-29 at 20:37 +0800, Xi Ruoyao wrote: > On Wed, 2023-11-29 at 17:33 +0800, Xi Ruoyao wrote: > > On Mon, 2023-11-27 at 23:06 -0700, Jeff Law wrote: > > > This has (of course) been tested on rv64.  It's also been bootstrapped > > > and regressio

Ping: [PATCH v2] Only allow (int)trunc(x) to (int)x simplification with -ffp-int-builtin-inexact [PR107723]

2023-11-30 Thread Xi Ruoyao
Ping. On Fri, 2023-11-24 at 17:09 +0800, Xi Ruoyao wrote: > With -fno-fp-int-builtin-inexact, trunc is not allowed to raise > FE_INEXACT and it should produce an integral result (if the input is not > NaN or Inf).  Thus FE_INEXACT should not be raised. > > But (int)x may raise FE

Re: [V2] New pass for sign/zero extension elimination -- not ready for "final" review

2023-11-30 Thread Xi Ruoyao
On Thu, 2023-11-30 at 08:44 -0700, Jeff Law wrote: > > > On 11/29/23 02:33, Xi Ruoyao wrote: > > On Mon, 2023-11-27 at 23:06 -0700, Jeff Law wrote: > > > This has (of course) been tested on rv64.  It's also been bootstrapped > > > and regression tested on x86.  B

Re: [PATCH v2 3/3] libphobos: LoongArch hardware support.

2023-12-01 Thread Xi Ruoyao
    version (D_SoftFloat) > +    return; > +    else > +    { > +    asm nothrow @nogc > +    { > +    "movgr2fcsr $r0,%0" : > +    : "r" (newState & (roundingMask | > allExceptions)); > +   

Re: [PATCH] LoongArch: Add support for TLS descriptors

2023-12-01 Thread Xi Ruoyao
On Fri, 2023-12-01 at 18:01 +0800, Xi Ruoyao wrote: > On Fri, 2023-12-01 at 17:55 +0800, mengqinggang wrote: > > Generate la.tls.desc macro instruction for TLS descriptors model. > > > > la.tls.desc expand to > >   pcalau12i $a0, %desc_pc_hi20(a) > >   ld.

Re: [PATCH] LoongArch: Add support for TLS descriptors

2023-12-01 Thread Xi Ruoyao
en GCC is configured to decide the default. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Add support for TLS descriptors

2023-12-01 Thread Xi Ruoyao
ult if it's supported by the assembler and --with-glibc-version= setting is high enough... Currently the only architecture (AFAIK) having TLS desc as the default is AArch64 because it has supported TLS desc from the start. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
