Jakub says:
Then that seems like a bug in the loongarch vec_init pattern(s).
Those really don't have a predicate in any of the backends on the
input operand, so they need to force_reg it if it is something it
can't handle. I've looked e.g. at i386 vec_init and that is exactly
Non functional change, clean up the code.
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_expand_vector_init_same): Remove "temp2" and reuse
"temp" instead.
(loongarch_expand_vector_init): Use gcc_unreachable () instead
of gcc_assert (0), and fix
the LSX/LASX code is wrong.
> > Most seriously, the RTX code NE should be mapped to "cneq", not "cne".
>
> The "cneq" in the commit info may be "cune" according to the context?
Oops, indeed.
I'll push the patch with this typo fixed.
>
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Remove a redundant sign extension.
gcc/ChangeLog:
* config/loongarch/loongarch.md (rotrsi3_extend): New
define_insn.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/rotrw.c: New test.
---
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?
We had the following mappings between vfcmp submnemonics and RTX
codes:
(define_code_attr fcc
[(unordered "cun")
(ordered "cor")
(eq "ceq")
(ne "cne")
(uneq "cueq")
(unle "cule")
(unlt "cult")
(le "cle")
On Sun, 2023-12-10 at 01:03 +0800, Xi Ruoyao wrote:
> Update LoongArch instruction costs based on the micro-benchmark results
> on LA464 and LA664. In particular, this allows generating alsl/slli or
> alsl/slli + add pairs for multiplying some constants as on LA464/LA664
> a mul instr
On Sun, 2023-12-24 at 00:56 +0800, Xi Ruoyao wrote:
> On Sat, 2023-12-23 at 15:00 +0800, chenglulu wrote:
> > Hi,
> >
> > This patch will cause the following tests to fail:
> >
> > +FAIL: gcc.dg/vect/pr97081-2.c (internal compiler error: in extract_insn,
> >
here is a problem. My regression test has the following two fail
> items.(based on r14-6787)
> +FAIL: gcc.dg/cpp/_Pragma3.c (test for excess errors)
> +FAIL: gcc.dg/pr86617.c scan-rtl-dump-times final "mem/v" 6
Strange. I didn't see them on r14-6650 (with or without the patch).
--
On Sat, 2023-12-23 at 18:44 +0800, Xi Ruoyao wrote:
> On Sat, 2023-12-23 at 10:29 +0800, chenglulu wrote:
> > > The performance drop has nothing to do with this patch. I found that the
> > > h264 performance compiled
> > > by r14-6787 compared to r14-6421 dropped
ence may be caused by a different binutils version or some
other changes in GCC. I'll figure it out...
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
gcc/ChangeLog:
* config/loongarch/loongarch.md (rotl3):
New define_expand.
* config/loongarch/simd.md (vrotl3): Likewise.
(rotl3): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/rotl-with-rotr.c: New test.
*
On Sat, 2023-12-23 at 18:47 +0800, Xi Ruoyao wrote:
> On Sat, 2023-12-23 at 18:44 +0800, Xi Ruoyao wrote:
> > On Sat, 2023-12-23 at 10:29 +0800, chenglulu wrote:
> > > > The performance drop has nothing to do with this patch. I found that
> > > > the h264 performa
On Sun, 2023-12-24 at 01:04 +0800, Xi Ruoyao wrote:
> On Sun, 2023-12-24 at 00:56 +0800, Xi Ruoyao wrote:
> > On Sat, 2023-12-23 at 15:00 +0800, chenglulu wrote:
> > > Hi,
> > >
> > > This patch will cause the following tests to fail:
> > >
> >
> + "&& true"
> [(set (match_dup 0) (match_dup 1))
> (set (zero_extract:GPR (match_dup 0) (match_dup 2) (match_dup 4))
> (match_dup 3))]
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Mon, 2023-12-25 at 10:08 +0800, chenglulu wrote:
>
> On 2023/12/24 at 20:59, Xi Ruoyao wrote:
> > On Sat, 2023-12-23 at 18:47 +0800, Xi Ruoyao wrote:
> > > On Sat, 2023-12-23 at 18:44 +0800, Xi Ruoyao wrote:
> > > > On Sat, 2023-12-23 at 10:29 +0800, chenglulu wrote:
float t2y = a[3];
> + float t1z = a[4];
> + float t2z = a[5];
> +
> + if (t1x > t2y || t2x < t1y || t1x > t2z || t2x < t1z || t1y > t2z || t2y
> < t1z)
> + return 0;
> +
> + return 1;
> +}
> +/* { dg-final { scan-tree-dump-times "if" 6 "gim
2.html
where I've raised fp_add cost (which is used for estimating floating-
point compare cost) to 5 instructions and see if it solves your problem
without LOGICAL_OP_NON_SHORT_CIRCUIT?
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
z = a[4];
> + float t2z = a[5];
> +
> + if (t1x > t2y || t2x < t1y || t1x > t2z || t2x < t1z || t1y > t2z || t2y
> < t1z)
> + return 0;
> +
> + return 1;
> +}
> +/* { dg-final { scan-tree-dump-times "if" 6 "gimple" } } */
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Tue, 2023-12-12 at 20:39 +0800, Xi Ruoyao wrote:
> On Tue, 2023-12-12 at 19:59 +0800, Jiahao Xu wrote:
> > > I guess here the problem is floating-point compare instruction is much
> > > more costly than other instructions but the fact is not correctly
> > > >
Ping again.
On Fri, 2023-12-01 at 13:44 +0800, Xi Ruoyao wrote:
> Ping.
>
> On Fri, 2023-11-24 at 17:09 +0800, Xi Ruoyao wrote:
> > With -fno-fp-int-builtin-inexact, trunc is not allowed to raise
> > FE_INEXACT and it should produce an integral result (if the input is not
>
nstruction yet, so we are not really eliding the branches as
LOGICAL_OP_NON_SHORT_CIRCUIT = 1 is supposed to do.
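
For readers unfamiliar with the term, here is a minimal C sketch of what non-short-circuit evaluation means at the source level. The function names are made up for illustration, and the sketch makes no claim about the instructions GCC actually emits on LoongArch; it only shows the two logically equivalent shapes being discussed.

```c
#include <assert.h>

/* Short-circuit form: the second comparison is only evaluated when the
   first one is false, which conceptually costs one branch per test.  */
static int
in_range_short_circuit (float lo, float hi, float x)
{
  if (x < lo || x > hi)
    return 0;
  return 1;
}

/* Non-short-circuit form: both comparisons are always evaluated, their
   0/1 results are combined with a bitwise OR, and one branch remains.
   This is only valid because neither comparison has side effects.  */
static int
in_range_combined (float lo, float hi, float x)
{
  int below = x < lo;
  int above = x > hi;
  if (below | above)
    return 0;
  return 1;
}

/* The two forms must agree for every input.  */
static void
check_forms_agree (void)
{
  static const float samples[] = { -2.0f, -1.0f, 0.0f, 0.5f, 1.0f, 3.0f };
  int n = (int) (sizeof samples / sizeof samples[0]);
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      for (int k = 0; k < n; k++)
        assert (in_range_short_circuit (samples[i], samples[j], samples[k])
                == in_range_combined (samples[i], samples[j], samples[k]));
}
```

The combined form only pays off if each comparison result can be materialized in an integer register without a branch of its own.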
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
entire SPEC 2017 suite.
The problem with LOGICAL_OP_NON_SHORT_CIRCUIT = 0 is it may regress
fixed-point only code. In practice the usage of -ffast-math is very
rare ("real" Linux packages invoking floating-point operations often
just malfunction with it) and it seems not good to regress common cas
On Wed, 2023-12-13 at 14:32 +0800, Jiahao Xu wrote:
>
> On 2023/12/13 at 14:21, Xi Ruoyao wrote:
> > On Wed, 2023-12-13 at 14:17 +0800, Jiahao Xu wrote:
> > > This test was extracted from the hot functions of 526.blender_r. Setting
> > > LOGICAL_OP_NON_SHORT_CIRCUIT t
On Wed, 2023-12-13 at 20:22 +0800, chenglulu wrote:
On 2023/12/10 at 01:03, Xi Ruoyao wrote:
Replace the instruction costs in loongarch_rtx_cost_data constructor
based on micro-benchmark results on LA464 and LA664.
This allows optimizations like "x * 17" to alsl, and "x * 68" to
After r14-6455 this no longer fails.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vect-ftint-no-inexact.c (xfail): Remove.
---
Tested on loongarch64-linux-gnu. Pushed as obvious.
gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c | 3 +--
1 file changed, 1 insertion(+), 2
We used a branch to load floating-point comparison results into GPR.
This is very slow when the branch is not predictable.
Use the movcf2gr instruction to implement cstore4 if movcf2gr
is fast enough.
gcc/ChangeLog:
* config/loongarch/genopts/loongarch.opt.in (muse-movcf2gr): New
e));
Should we declare this in the file scope instead?
> + bool bit0_p = gimple_zero_one_valued_p (treeop0, nullptr);
> + bool bit1_p = gimple_zero_one_valued_p (treeop1, nullptr);
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
The GCC internal doc says:
X might be a pseudo-register or a 'subreg' of a pseudo-register,
which could either be in a hard register or in memory. Use
'true_regnum' to find out; it will return -1 if the pseudo is in
memory and the hard register number if it is in a register.
ymbol_ref:DI ("*.LANCHOR0") [flags 0x182])) [0 S1
> A8]))) "volatile.c":5:11 -1
> (nil))
>
> The volatile property of the mem here is gone, so the test fails.
Phew. I guess I couldn't reproduce it because I have Jeff's ext-dce
patch in my local repo, which removed the zero_extend...
I'll rework this patch.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
The problem with peephole2 is it uses a naive sliding-window algorithm
and misses many cases. For example:
float a[1];
float t() { return a[0] + a[8000]; }
is compiled to:
la.local $r13,a
la.local $r12,a+32768
fld.s $f1,$r13,0
fld.s $f0,$r12,-768
e new
define_insn_and_split produces a better result instead of solely relying
on define_insn_and_split?
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
We are excluding loongarch-opts.h from target libraries, but now struct
loongarch_target and gcc_options are not declared in the target
libraries, causing:
In file included from ../.././gcc/options.h:8,
from ../.././gcc/tm.h:49,
from
Replace the instruction costs in loongarch_rtx_cost_data constructor
based on micro-benchmark results on LA464 and LA664.
This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl
and slli.
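
As a rough illustration of why these constants are cheap, the two multiplications can be written as shift-and-add sequences in plain C. This is only a sketch of the arithmetic identity; the helper names are hypothetical and the mapping to alsl/slli in the comments is an assumption based on the description above.

```c
#include <assert.h>
#include <stdint.h>

/* x * 17 == (x << 4) + x: a single shift-and-add, the shape that an
   alsl-style instruction provides.  */
static uint32_t
mul17 (uint32_t x)
{
  return (x << 4) + x;
}

/* x * 68 == ((x << 4) + x) << 2: a shift-and-add followed by a plain
   left shift (slli-style).  */
static uint32_t
mul68 (uint32_t x)
{
  return ((x << 4) + x) << 2;
}

/* Unsigned arithmetic wraps modulo 2^32, so the identities hold for
   every input, including ones that overflow.  */
static void
check_mul (void)
{
  for (uint32_t x = 0; x < 1000; x++)
    {
      assert (mul17 (x) == x * 17u);
      assert (mul68 (x) == x * 68u);
    }
  assert (mul17 (0xffffffffu) == 0xffffffffu * 17u);
  assert (mul68 (0xffffffffu) == 0xffffffffu * 68u);
}
```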
gcc/ChangeLog:
PR target/112936
* config/loongarch/loongarch-def.cc
Following the instruction cost fix, we are generating
alsl.w $a0, $a0, $a0, 4
instead of
li.w $t0, 17
mul.w $a0, $a0, $t0
for "x * 17", because alsl.w is 4 times faster than mul.w. But we didn't
have a sign-extending pattern for alsl.w, causing an extra slli.w
instruction generated to
With loongarch-def.cc switched from C to C++, we can include rtl.h for
COSTS_N_INSNS, instead of hard coding our own.
This is a non-functional change for now, but it will make the code more
future-proof in case COSTS_N_INSNS in rtl.h is changed.
gcc/ChangeLog:
*
and regtested on loongarch64-linux-gnu. Ok for trunk?
Xi Ruoyao (3):
LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our
own
LoongArch: Fix instruction costs [PR112936]
LoongArch: Add alslsi3_extend
gcc/config/loongarch/loongarch-def.cc | 42
duc_fmin_scal_*?
> If so, we probably need a new target selector for fmin/fmax reduction.
Let me check whether the [x]vf{min,max} instructions are IEEE-conformant. They've
still not released volume 2 of the instruction manual so I can only
try...
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Sat, 2023-12-30 at 20:25 +0800, Xi Ruoyao wrote:
> On Sat, 2023-12-30 at 12:15 +, Richard Sandiford wrote:
> > This shouldn't be necessary. The test does:
> >
> > for (int i = 0; i < n; i += 2)
> > {
> > x0 = __builtin_fmin (x0, ptr[i + 0]
We already had smin/smax RTL patterns using the vfmin/vfmax instructions.
But for smin/smax, it's unspecified what will happen if either operand
contains any NaN. So we would not vectorize the loop with
-fno-finite-math-only (the default for all optimization levels except
-Ofast).
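
To make the NaN issue concrete, here is a small C sketch contrasting a plain comparison-based minimum with one that follows the C fmin rule of returning the non-NaN operand. Both helpers are hypothetical and written out by hand; the ternary version is just one possible behaviour for an operation whose NaN handling is unspecified, not a description of what smin does.

```c
#include <assert.h>
#include <math.h>   /* only for the NAN macro; no libm calls are made */

/* Comparison-based minimum: the result depends on operand order when
   one operand is NaN, because every comparison with NaN is false.  */
static double
min_ternary (double a, double b)
{
  return a < b ? a : b;
}

/* fmin-like minimum: if exactly one operand is NaN, return the other
   one.  (x != x) is true only for NaN.  */
static double
min_ignore_nan (double a, double b)
{
  if (a != a)
    return b;
  if (b != b)
    return a;
  return a < b ? a : b;
}

static void
check_nan_handling (void)
{
  double r1 = min_ternary (NAN, 1.0);   /* NaN < 1.0 is false -> 1.0 */
  double r2 = min_ternary (1.0, NAN);   /* 1.0 < NaN is false -> NaN */
  assert (r1 == 1.0);
  assert (r2 != r2);
  assert (min_ignore_nan (NAN, 1.0) == 1.0);
  assert (min_ignore_nan (1.0, NAN) == 1.0);
}
```

A loop calling fmin can therefore only be vectorized with an instruction whose NaN behaviour is known to match the second helper, unless NaNs are assumed away with -ffinite-math-only.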
But,
We used a branch to load floating-point comparison results into GPR.
This is very slow when the branch is not predictable.
Implement movfcc so we can reload FCCmode into GPRs, FPRs, and MEM.
Then implement cstore4.
gcc/ChangeLog:
* config/loongarch/loongarch-tune.h
We don't allow SImode in FCC, so constraint z is never really used
here.
gcc/ChangeLog:
* config/loongarch/loongarch.md (movsi_internal): Remove
constraint z.
---
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?
gcc/config/loongarch/loongarch.md | 6 +++---
> + return symbolic_pcrel_operand (op, Pmode) ||
> > +symbolic_pcrel_offset_operand (op, Pmode);
> > +})
> > +
> >
> The '||' symbol shouldn't be at the end of the line.
Indeed.
>
> + return symbolic_pcrel_operand (op, Pmode)
> + || symbolic_pcrel_offset_operand (op, Pmode);
>
> Others LGTM.
> Thanks!
>
> /* snip */
>
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Pushed v4 as attached, with the format issues fixed and a minor
adjustment in the commit message ("define_insn_and_split" is changed to
"define_insn_and_rewrite" to match the actual change).
On Fri, 2023-12-29 at 19:55 +0800, Xi Ruoyao wrote:
> On Fri, 2023-12-29 at 15:57
gcc/ChangeLog:
* config/loongarch/loongarch.md (bstrins__for_ior_mask):
For the condition, remove unneeded trailing "\" and move "&&" to
follow GNU coding style. NFC.
---
Pushed as obvious.
gcc/config/loongarch/loongarch.md | 4 ++--
1 file changed, 2 insertions(+), 2
On Wed, 2024-01-03 at 16:24 +0800, chenglulu wrote:
> LGTM!
>
> Thanks!
Pushed r14-6890.
FWIW the tree optimizer sometimes still fails to emit .reduc_f{max,min} or
emits them sub-optimally. I've commented in PR112457 but maybe I
should've created a new ticket...
> On 2024/1/1 at 3:1
as possible. Assuming the rest is ACK'd for the trunk we'll put it into
> the list of optimizations enabled by -O2.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
_effective_target_s390_vx])
> > +|| ([istarget riscv*-*-*]
> > + && [check_effective_target_riscv_v])
>
> Unless I'm missing something, we have copysign in the scalar
> floating-point ISAs as well. So I think this should be
>
> || ([istarget riscv*-*-*]
> && [check_effective_target_hard_float])
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
ive me several hours trying to implement this...
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Fri, 2024-01-05 at 17:57 +0800, chenglulu wrote:
>
> On 2024/1/5 at 16:37, Xi Ruoyao wrote:
> > On Fri, 2024-01-05 at 11:40 +0800, Lulu Cheng wrote:
> > > bool
> > > loongarch_explicit_relocs_p (enum loongarch_symbol_type type)
> > > {
> > >
On Fri, 2024-01-05 at 18:25 +0800, Xi Ruoyao wrote:
> On Fri, 2024-01-05 at 17:57 +0800, chenglulu wrote:
> >
> > On 2024/1/5 at 16:37, Xi Ruoyao wrote:
> > > On Fri, 2024-01-05 at 11:40 +0800, Lulu Cheng wrote:
> > > > bool
> > > > loongarch_ex
On Fri, 2024-01-05 at 20:45 +0800, chenglulu wrote:
>
> On 2024/1/5 at 19:55, Xi Ruoyao wrote:
> > On Fri, 2024-01-05 at 18:25 +0800, Xi Ruoyao wrote:
> > > On Fri, 2024-01-05 at 17:57 +0800, chenglulu wrote:
> > > > On 2024/1/5 at 16:37, Xi Ruoyao wrote:
> > > >
SA_HAS_DIV32 etc. in the code base? It seems some of them are not
replaced.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
ve_target_loongarch_sx] ||" because SIMD
requires hard float.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
perand:DI 2 "register_operand "="))]
And use
gen_movdi_pcrel64 (operands[0], operands[1], gen_reg_rtx(DImode))
in expand.
> + "TARGET_64BIT"
> + "la.local %0,$r15,%1"
> + [(set_attr "mode" "DI")
> + (set_attr "length" "5")])
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Thu, 2024-01-04 at 11:58 +0800, chenglulu wrote:
>
> On 2024/1/4 at 11:51, Xi Ruoyao wrote:
> > On Wed, 2023-12-27 at 16:46 +0800, Lulu Cheng wrote:
> > > +(define_insn "movdi_pcrel64"
> > > + [(set (match_operand:DI 0 "register_op
_rtx (DImode);
> + emit_insn (gen_addsi3_extended (t, operands[1], operands[2]));
AFAIK if !TARGET_64BIT a DImode value should actually be a pair of hardware
registers, but addsi3_extended doesn't output such a pair so this seems
invalid...
> + t = gen_lowpart (SImode, t);
> +
"")])
>
> +(define_insn "*nsi_internal"
> + [(set (match_operand:SI 0 "register_operand" "=r")
> + (neg_bitwise:SI
> + (not:SI (match_operand:SI 1 "register_operand" "r"))
> + (match_operand:SI 2 "register_operand" "r")))]
> + "TARGET_64BIT"
> + "n\t%0,%2,%1"
> + [(set_attr "type" "logical")
> + (set_attr "mode" "SI")])
>
> ;;
> ;;
> @@ -3167,7 +3210,6 @@ (define_expand "condjump"
> (label_ref (match_operand 1))
> (pc)))])
>
> -
>
> ;;
> ;;
> @@ -3967,10 +4009,13 @@ (define_insn "bytepick_w_"
> (define_insn "bytepick_w__extend"
> [(set (match_operand:DI 0 "register_operand" "=r")
> (sign_extend:DI
> - (ior:SI (lshiftrt (match_operand:SI 1 "register_operand" "r")
> - (const_int ))
> - (ashift (match_operand:SI 2 "register_operand" "r")
> - (const_int bytepick_w_ashift_amount)]
> + (subreg:SI
> + (ior:DI (subreg:DI (lshiftrt
> + (match_operand:SI 1 "register_operand" "r")
> + (const_int )) 0)
> + (subreg:DI (ashift
> + (match_operand:SI 2 "register_operand" "r")
> + (const_int bytepick_w_ashift_amount)) 0)) 0)))]
> "TARGET_64BIT"
> "bytepick.w\t%0,%1,%2,"
> [(set_attr "mode" "SI")])
> diff --git a/gcc/testsuite/gcc.target/loongarch/sign-extend-bitwise.c
> b/gcc/testsuite/gcc.target/loongarch/sign-extend-bitwise.c
> new file mode 100644
> index 000..5753ef69db2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/sign-extend-bitwise.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mabi=lp64d -O2" } */
> +/* { dg-final { scan-assembler-not "slli.w\t\\\$r\[0-9\]+,\\\$r\[0-9\]+,0" }
> } */
> +
> +struct pmop
> +{
> + unsigned int op_pmflags;
> + unsigned int op_pmpermflags;
> +};
> +unsigned int PL_hints;
> +
> +struct pmop *pmop;
> +void
> +Perl_newPMOP (int type, int flags)
> +{
> + if (PL_hints & 0x0010)
> + pmop->op_pmpermflags |= 0x0001;
> + if (PL_hints & 0x0004)
> + pmop->op_pmpermflags |= 0x0800;
> + pmop->op_pmflags = pmop->op_pmpermflags;
> +}
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
can-assembler-times "slli.w\t\\\$r\[0-9\]+,\\\$r\[0-9\]+,0"
> 0 } } */
Use scan-assembler-not instead of scan-assembler-times ... 0.
Otherwise LGTM.
> #include
> #define my_min(x, y) ((x) < (y) ? (x) : (y))
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
enable-bootstrap
> --enable-checking=release
> $ make BOOT_FLAGS="-mcmodel=extreme"
>
> What did I do wrong?:-(
BOOT_CFLAGS, not BOOT_FLAGS :).
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Mon, 2024-01-15 at 15:10 +0800, chenxiaolong wrote:
> On Mon, 2024-01-15 at 14:42 +0800, Xi Ruoyao wrote:
> > On Mon, 2024-01-15 at 14:32 +0800, YunQiang Su wrote:
> > > Xi Ruoyao wrote at 12:11pm on Monday, January
> > > 15, 2024:
> > >
On Mon, 2024-01-15 at 14:32 +0800, YunQiang Su wrote:
> On Mon, 2024-01-15 at 12:11, Xi Ruoyao wrote:
> >
> > On Mon, 2024-01-15 at 09:29 +0800, chenxiaolong wrote:
> > > At 21:13 +0800 on Saturday, 2024-01-13, Xi Ruoyao wrote:
> > > > At 15:28 +0800 on Saturday 2024-01-1
On Mon, 2024-01-15 at 09:29 +0800, chenxiaolong wrote:
> At 21:13 +0800 on Saturday, 2024-01-13, Xi Ruoyao wrote:
> > At 15:28 +0800 on Saturday 2024-01-13, chenxiaolong wrote:
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/pr104992.c: Added addition
On Tue, 2024-01-16 at 14:16 +0800, chenglulu wrote:
>
>
> On 2024/1/16 at 13:34, Xi Ruoyao wrote:
> > Ping.
> >
> > On Fri, 2023-12-15 at 20:56 +0800, Xi Ruoyao wrote:
> > > We don't allow SImode in FCC, so constraint z is never really used
>
On Tue, 2024-01-16 at 12:58 +0800, Xi Ruoyao wrote:
> On Tue, 2024-01-16 at 10:57 +0800, chenxiaolong wrote:
> > On Mon, 2024-01-15 at 15:50 +0800, Xi Ruoyao wrote:
> > > On Mon, 2024-01-15 at 15:10 +0800, chenxiaolong wrote:
> > > > At 14:42 +0800 on Monday
On Tue, 2024-01-16 at 10:57 +0800, chenxiaolong wrote:
> On Mon, 2024-01-15 at 15:50 +0800, Xi Ruoyao wrote:
> > On Mon, 2024-01-15 at 15:10 +0800, chenxiaolong wrote:
> > > On Mon, 2024-01-15 at 14:42 +0800, Xi Ruoyao wrote:
> > > > On Mon, 2024-01-15 at
Ping.
On Fri, 2023-12-15 at 20:56 +0800, Xi Ruoyao wrote:
> We don't allow SImode in FCC, so constraint z is never really used
> here.
>
> gcc/ChangeLog:
>
> * config/loongarch/loongarch.md (movsi_internal): Remove
> constraint z.
> ---
>
> Bootstrappe
ibstdc++-v3/testsuite/lib/dg-options.exp
> @@ -337,6 +337,7 @@ proc add_options_for_libatomic { flags } {
> || ([istarget powerpc*-*-*] && [check_effective_target_ilp32])
> || [istarget riscv*-*-*]
> || ([istarget sparc*-*-linux-gnu] && [check_effective_target_ilp32])
> + || ([istarget i?86-*-*] || [istarget x86_64-*-*])
This seems like overkill, as "dg-add-options libatomic" is not intended to
handle 16-byte atomics. Maybe we can fork this to a new dg-add-options
like "add_options_for_libatomic_16b"?
> } {
> global TOOL_OPTIONS
>
> --
> 2.25.1
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Wed, 2024-01-17 at 17:38 +0800, chenglulu wrote:
>
> On 2024/1/13 at 21:05, Xi Ruoyao wrote:
> > On Sat, 2024-01-13 at 15:01 +0800, chenglulu wrote:
> > > On 2024/1/12 at 19:42, Xi Ruoyao wrote:
> > > > On Fri, 2024-01-12 at 09:46 +0800, chenglulu wrote:
> > > >
> > >
derstand the purpose of adding
> '-fno-tree-vectorize' here.
I don't think -fno-tree-vectorize will make a difference here. This
test case uses __attribute__((vector_size(...))) explicitly so the
vector operation will be used even with -fno-tree-vectorize.
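
A minimal sketch of the kind of code meant here, using the GCC vector extension (also accepted by Clang). The type and function names are illustrative and not taken from the test case under discussion. Because the vector type is spelled out in the source, the addition is a vector operation in the front end already and does not depend on the auto-vectorizer; whether it ends up as a hardware SIMD instruction still depends on the target options.

```c
#include <assert.h>

/* Four 32-bit ints packed into one 16-byte vector type.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise addition written directly on the vector type; no loop,
   so there is nothing for -ftree-vectorize to do.  */
static v4si
add4 (v4si a, v4si b)
{
  return a + b;
}

static void
check_add4 (void)
{
  v4si a = { 1, 2, 3, 4 };
  v4si b = { 10, 20, 30, 40 };
  v4si r = add4 (a, b);
  assert (r[0] == 11 && r[1] == 22 && r[2] == 33 && r[3] == 44);
}
```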
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Sat, 2024-01-13 at 15:01 +0800, chenglulu wrote:
>
> On 2024/1/12 at 19:42, Xi Ruoyao wrote:
> > On Fri, 2024-01-12 at 09:46 +0800, chenglulu wrote:
> >
> > > > I found an issue bootstrapping GCC with -mcmodel=extreme in BOOT_CFLAGS:
> > > > we n
1 100644
> --- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
> +++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
> @@ -2,6 +2,7 @@
> ! { dg-require-effective-target vect_double }
> ! { dg-options "-O3 --param vect-max-peeling-for-alignment=0
> -fpredictive
Ping :).
On Tue, 2023-12-12 at 14:47 +0800, Xi Ruoyao wrote:
> The problem with peephole2 is it uses a naive sliding-window algorithm
> and misses many cases. For example:
>
> float a[1];
> float t() { return a[0] + a[8000]; }
>
> is compiled to:
>
_r". Or we'll hit:
t.c:11:1: internal compiler error: output_operand: operand number
missing after %-letter
> + [(set_attr "type" "move")]
> +)
> +
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
> rperm));
> + tmp = gen_rtx_SUBREG (E_V4DImode, d->target, 0);
Likewise.
> + emit_move_insn (tmp, sel);
> + break;
> + case E_V8SFmode:
> + sel = gen_rtx_CONST_VECTOR (E_V8SImode, gen_rtvec_v (d-
> >nelt,
> +
On Thu, 2023-11-23 at 17:12 +0800, chenglulu wrote:
>
> On 2023/11/23 at 17:02, Xi Ruoyao wrote:
> > On Thu, 2023-11-23 at 16:13 +0800, chenglulu wrote:
> > > The fix_truncv4sfv4si2 template is indeed called when debugging with
> > > gdb.
> > >
> >
")])
The problem is "lroundevenMN2" is not a standard pattern name. The SIMD
version of ftintrne in patch 1 only works because we are expanding
"roundevenM2" (it's a standard pattern name) to UNSPEC_SIMD_FRINTRNE,
and then a define_insn can match (fix (UNSPEC_SIMD_FRINTRNE op)). But
for non-SIMD we don't have roundevenM2.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Thu, 2023-11-23 at 18:12 +0800, Xi Ruoyao wrote:
> On Thu, 2023-11-23 at 17:12 +0800, chenglulu wrote:
> >
> > On 2023/11/23 at 17:02, Xi Ruoyao wrote:
> > > On Thu, 2023-11-23 at 16:13 +0800, chenglulu wrote:
> > > > The fix_truncv4sfv4si2 template is indeed ca
vst $vr0,$r12,16
jr $r1
But with a define_insn or define_insn_and_split:
la.local $r12,.LANCHOR0
vld $vr0,$r12,0
vftint.w.s $vr0,$vr0
vst $vr0,$r12,16
jr $r1
(Our scalar code also generates sub-optimal frint.s
t's
"not sure if that's a good idea in general" (comment 1 in the PR) so we
can do this in a target-specific way.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
thing that time...
> Thank you!
>
> On 2023/11/20 at 08:47, Xi Ruoyao wrote:
> > No functional change, just a cleanup.
> >
> > gcc/ChangeLog:
> >
> > * config/loongarch/loongarch.md (lrint_allow_inexact):
> > Remove.
> > (2): Check if
>
rectly setting the sign bit
with LSX vbitseti instruction - it will also set the sign bits for
"junk" elements in the high bits of the vector register but there is no
harm.)
Can we make a target hook to control this?
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Mon, 2023-11-20 at 08:47 +0800, Xi Ruoyao wrote:
> The [1/5] patch is the PR112578 fix at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637097.html.
> It has been changed to remove the nearbyint pattern (because nearbyint
> should not raise FE_INEXACT even if -ffp
ption{-fno-trapping-math}.
> +Note that while the throughput of the sequence is higher than the throughput
> of
> +the non-reciprocal instruction, the precision of the sequence can be
> decreased
> +by up to 2 ulp (i.e. the inverse of 1.0 equals 0.9994).
> +
> +@opindex m
> - ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
> -#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
> - ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Wed, 2023-11-29 at 10:23 +0800, Jiahao Xu wrote:
>
> On 2023/11/29 at 10:08, Xi Ruoyao wrote:
> > On Tue, 2023-11-28 at 11:29 +0800, Jiahao Xu wrote:
> > > diff --git a/gcc/config/loongarch/predicates.md
> > > b/gcc/config/loongarch/predicates.md
> > >
(loongarch_try_expand_lsx_vshuf_const): Adjust.
> (loongarch_is_extraction_permutation): Adjust.
> (loongarch_expand_vec_perm_const_2): Adjust.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/loongarch/lasx-extract-even_odd-opt.c: New test.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
g in LoongArch backend in order to make ext_dce
work for mem-extend.c too? If yes then any pointers?
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
;)
> > + (fix: (match_operand:FVEC 1 "register_operand" "f")))]
> > + ""
> > + "#"
> > + ""
> > + [(const_int 0)]
> > + {
> > + emit_insn (gen__vftintrz__ (
> > + operands[0], operands
LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS)
> #include "loongarch-def.h"
> +#endif
With this change we can revert r14-5634 (remove the #if
!defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS)
guards in loongarch-def.h as they'll be unneeded).
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
t that the code can't go here, I will add a prompt
> message here.:-(
If I read the code correctly, this is indeed unreachable so we can just
put gcc_unreachable() here. But maybe I'm wrong.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
sert_eq (const U , const V , int line)
{
static_assert(sizeof (res) == sizeof (ref));
if (!memcmp (, , sizeof(ref)))
return;
dump (res, sizeof (res), "res");
dump (ref, sizeof (ref), "ref");
}
int main()
{
float x[4] = {};
int y[4] = {};
assert_eq(x, y, __LINE
On Wed, 2023-11-29 at 20:37 +0800, Xi Ruoyao wrote:
> On Wed, 2023-11-29 at 17:33 +0800, Xi Ruoyao wrote:
> > On Mon, 2023-11-27 at 23:06 -0700, Jeff Law wrote:
> > > This has (of course) been tested on rv64. It's also been bootstrapped
> > > and regressio
Ping.
On Fri, 2023-11-24 at 17:09 +0800, Xi Ruoyao wrote:
> With -fno-fp-int-builtin-inexact, trunc is not allowed to raise
> FE_INEXACT and it should produce an integral result (if the input is not
> NaN or Inf). Thus FE_INEXACT should not be raised.
>
> But (int)x may raise FE
On Thu, 2023-11-30 at 08:44 -0700, Jeff Law wrote:
>
>
> On 11/29/23 02:33, Xi Ruoyao wrote:
> > On Mon, 2023-11-27 at 23:06 -0700, Jeff Law wrote:
> > > This has (of course) been tested on rv64. It's also been bootstrapped
> > > and regression tested on x86. B
version (D_SoftFloat)
> + return;
> + else
> + {
> + asm nothrow @nogc
> + {
> + "movgr2fcsr $r0,%0" :
> + : "r" (newState & (roundingMask |
> allExceptions));
> +
On Fri, 2023-12-01 at 18:01 +0800, Xi Ruoyao wrote:
> On Fri, 2023-12-01 at 17:55 +0800, mengqinggang wrote:
> > Generate la.tls.desc macro instruction for TLS descriptors model.
> >
> > la.tls.desc expand to
> > pcalau12i $a0, %desc_pc_hi20(a)
> > ld.
en GCC is configured to decide the default.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
ult if
it's supported by the assembler and --with-glibc-version= setting is
high enough...
Currently the only architecture (AFAIK) having TLS desc as the default
is AArch64, because it has supported TLS desc since its inception.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University