[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-21 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #22 from chenglulu  ---
(In reply to Xi Ruoyao from comment #21)
> (In reply to chenglulu from comment #19)
> > diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
> > index e7835ae34ae..6a808cb0a5c 100644
> > --- a/gcc/config/loongarch/loongarch.cc
> > +++ b/gcc/config/loongarch/loongarch.cc
> > @@ -2383,7 +2383,7 @@ loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p)
> >          return factor;
> >  
> >        case ADDRESS_REG_REG:
> > -        return factor;
> > +        return factor * 3;
> >  
> >        case ADDRESS_CONST_INT:
> >          return lsx_p ? 0 : factor;
> > 
> > With this patch, -march=la464 has a score of 11.9.
> > However, the specific revision plan has not yet been decided.
> 
> Hmm are ldx and stx really so slow?

I think it is more likely because LDX/STX use an extra register.
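
To illustrate the extra-register point, here is a minimal sketch. It is not code from 548.exchange2_r, and the function names are hypothetical. It shows the same loop written once with base+index accesses and once with a single advancing pointer. Whether the compiler actually emits a reg+reg load such as ldx.w for the first form depends on its address cost model and on later loop optimizations.

```cpp
#include <cstddef>

// Indexed form: each access names a base and an index, which maps
// naturally onto a reg+reg load and can keep two registers live.
long sum_indexed(const int *base, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += base[i];
    return total;
}

// Pointer-bump form: one advancing pointer, which maps naturally
// onto a reg+offset load.
long sum_pointer(const int *p, std::size_t n) {
    long total = 0;
    for (const int *end = p + n; p != end; ++p)
        total += *p;
    return total;
}
```

Both functions compute the same sum; only the addressing pattern offered to the compiler differs.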

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-21 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #20 from chenglulu  ---
(In reply to chenglulu from comment #19)
> diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
> index e7835ae34ae..6a808cb0a5c 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -2383,7 +2383,7 @@ loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p)
>          return factor;
>  
>        case ADDRESS_REG_REG:
> -        return factor;
> +        return factor * 3;
>  
>        case ADDRESS_CONST_INT:
>          return lsx_p ? 0 : factor;
> 
> With this patch, -march=la464 has a score of 11.9.
> However, the specific revision plan has not yet been decided.

This score was measured on r14-9540.

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-21 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #19 from chenglulu  ---
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index e7835ae34ae..6a808cb0a5c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2383,7 +2383,7 @@ loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p)
         return factor;
 
       case ADDRESS_REG_REG:
-        return factor;
+        return factor * 3;
 
       case ADDRESS_CONST_INT:
         return lsx_p ? 0 : factor;

With this patch, -march=la464 has a score of 11.9.
However, the specific revision plan has not yet been decided.
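
As a rough illustration of why tripling the reg+reg cost changes code generation, the following toy model compares a reg+reg access against a reg+offset access that needs one extra add to form its address. This is a sketch under stated assumptions, not GCC code: `AddressKind`, `address_cost`, `prefers_reg_reg` and the `extra_add_cost` parameter are hypothetical, and the `lsx_p` special case from the real function is left out.

```cpp
enum class AddressKind { RegOffset, RegReg, ConstInt };

// Toy stand-in for an address cost hook: returns an abstract cost
// for one memory access of the given kind.
int address_cost(AddressKind kind, int factor, bool penalize_reg_reg) {
    switch (kind) {
    case AddressKind::RegOffset:
        return factor;
    case AddressKind::RegReg:
        // The patch under discussion multiplies this case by 3.
        return penalize_reg_reg ? factor * 3 : factor;
    case AddressKind::ConstInt:
        return factor;
    }
    return 0;
}

// Assumption: the reg+offset alternative needs one extra add
// (cost extra_add_cost) to compute the address first.
bool prefers_reg_reg(int factor, int extra_add_cost, bool penalize_reg_reg) {
    int reg_reg = address_cost(AddressKind::RegReg, factor, penalize_reg_reg);
    int reg_off = address_cost(AddressKind::RegOffset, factor, penalize_reg_reg)
                  + extra_add_cost;
    return reg_reg < reg_off;
}
```

With `factor = 1` and `extra_add_cost = 1`, the unpenalized model picks reg+reg (1 < 2), while the penalized model does not (3 is not less than 2).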

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-14 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #18 from chenglulu  ---
(In reply to Xi Ruoyao from comment #17)
> Strangely PR114074 is a wrong-code (instead of missed-optimization) and
> reverting its fix seems improving performance for other targets...

This is very strange. I tried turning off reg_reg addressing on top of
r14-9540, and the performance was not much different from r14-9539.
Unfortunately, I still don't know why.

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-14 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #16 from chenglulu  ---
The performance degradation on LoongArch is caused by one commit:

commit e0e9499aeffdaca88f0f29334384aa5f710a81a4 (HEAD -> trunk)
Author: Richard Biener 
Date:   Tue Mar 19 12:24:08 2024 +0100

tree-optimization/114151 - revert PR114074 fix

The following reverts the chrec_fold_multiply fix and only keeps
handling of constant overflow which keeps the original testcase
fixed.  A better solution might involve ranger improvements or
tracking of assumptions during SCEV analysis similar to what niter
analysis does.

PR tree-optimization/114151
PR tree-optimization/114269
PR tree-optimization/114322
PR tree-optimization/114074
* tree-chrec.cc (chrec_fold_multiply): Restrict the use of
unsigned arithmetic when actual overflow on constant operands
is observed.

* gcc.dg/pr68317.c: Revert last change.

The scores before and after this patch (-g -Ofast -march=la464) are:
r14-9539: 12.3
r14-9540: 9.26
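
For reference, the size of the drop between these two scores can be worked out with a small helper. The function name is hypothetical and the snippet is only arithmetic on the numbers quoted above.

```cpp
// Percentage by which a benchmark score fell from `before` to `after`.
// A positive result means the score got worse.
double score_drop_percent(double before, double after) {
    return (before - after) / before * 100.0;
}
```

`score_drop_percent(12.3, 9.26)` is about 24.7, which falls inside the 14%-28% range in the bug title.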

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-09 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #15 from chenglulu  ---
(In reply to Chen Chen from comment #14)
> (In reply to Xi Ruoyao from comment #13)
> > (In reply to Chen Chen from comment #12)
> > 
> > > No. I used system default gcc.
> > 
> > AOSC backports *many* changes not in upstream GCC 13.2 to their "13.2":
> > https://github.com/AOSC-Dev/aosc-os-abbs/tree/stable/core-devel/gcc/01-
> > runtime/patches
> > 
> > So the default GCC is simply not GCC 13.2.
> 
> You are correct. The above 13.2 results should be "AOSC system default gcc
> 13.2" results. Under AOSC system I recompiled official gcc 13.2 source with
> the same parameters except for "--with-tune=la664" (changed to
> "--with-tune=la464" since gcc 13.2 does not support "LA664" architecture).
> The test results from official gcc 13.2 are following:
> 
> -g -Ofast -march=native       : 6.54 (400s)
> -g -Ofast -march=native -flto : 6.57 (399s)
> -g -Ofast -march=la464        : 6.46 (405s)
> -g -Ofast -march=la464 -flto  : 6.57 (399s)

My results for GCC 13.2 are similar to these. I am currently testing GCC with
the AOSC patches.

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-09 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #11 from chenglulu  ---
(In reply to Chen Chen from comment #0)
> We tested Loongarch64 CPU Loongson 3A6000 with "LA664" architecture in Linux
> operating system AOSC OS 11.4.0 (default gcc version is 13.2.0). And we
> found the 548.exchange2_r benchmark from SPEC 2017 INTrate suite suffered
> significant regressions from 14% to 28% with various compiling options.
> 
> The rate-1 results are following:
> 
> after snapshot 20240317 score 14.3-19.3% lower with parameters "-g -Ofast
> -march=native":
> 13.2.0:    11.7 (223s) [gcc 13.2.0, system default]
Hi,

I can't reproduce the GCC 13.2 score. Have you made any modifications to it?

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-08 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #10 from chenglulu  ---
(In reply to Xi Ruoyao from comment #9)
> (In reply to chenglulu from comment #8)
> 
> > diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
> > index e8c129ce643..f27284cb20a 100644
> > --- a/gcc/config/loongarch/loongarch-def.cc
> > +++ b/gcc/config/loongarch/loongarch-def.cc
> > @@ -111,11 +111,7 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
> >     tune targets (i.e. -mtune=native while PRID does not correspond to
> >     any known "-mtune" type).  */
> >  array_tune<loongarch_rtx_cost_data> loongarch_cpu_rtx_cost_data =
> > -  array_tune<loongarch_rtx_cost_data> ()
> > -    .set (CPU_LA664,
> > -          loongarch_rtx_cost_data ()
> > -            .movcf2gr_ (COSTS_N_INSNS (1))
> > -            .movgr2cf_ (COSTS_N_INSNS (1)));
> > +  array_tune<loongarch_rtx_cost_data> ();
> 
> But why?  Isn't movcf2gr and movgr2cf one-cycle on LA664?

I think this is weird too. I'm still testing other situations, and I'll find
out the reason after the testing is completed.

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-08 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #8 from chenglulu  ---
(In reply to Chen Chen from comment #0)
> We tested Loongarch64 CPU Loongson 3A6000 with "LA664" architecture in Linux
> operating system AOSC OS 11.4.0 (default gcc version is 13.2.0). And we
> found the 548.exchange2_r benchmark from SPEC 2017 INTrate suite suffered
> significant regressions from 14% to 28% with various compiling options.
> 
> The rate-1 results are following:
> 
/* snip */
> 
> after snapshot 20240317 score 18-23.1% lower with parameters "-g -Ofast
> -march=la664":   
> 13.2.0:    "-march=la664" flag is not supported
> 20240317:  11.5 (227s)
> 20240324:  8.84 (296s)
> 20240430:  9.43 (278s)
> 14.1.0:    9.42 (278s)
> 
/* snip */
> 
> 
> after snapshot 20240317 score 26.3-26.6% lower with parameters "-g -Ofast
> -march=la464":   
> 13.2.0:    8.76 (299s)
> 20240317:  12.8 (205s)
> 20240324:  9.39 (279s)
> 20240430:  9.43 (278s)
> 14.1.0:    9.43 (278s)
> 
> 

> 20240317:  11.5 (227s) -march=la664
> 20240317:  12.8 (205s) -march=la464
I looked for the reason for the gap between the above two results. The
performance regression is caused by r14-6814. If the following modifications
are made, the scores of -march=la664 and -march=la464 will be the same.

diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
index e8c129ce643..f27284cb20a 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -111,11 +111,7 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
    tune targets (i.e. -mtune=native while PRID does not correspond to
    any known "-mtune" type).  */
 array_tune<loongarch_rtx_cost_data> loongarch_cpu_rtx_cost_data =
-  array_tune<loongarch_rtx_cost_data> ()
-    .set (CPU_LA664,
-          loongarch_rtx_cost_data ()
-            .movcf2gr_ (COSTS_N_INSNS (1))
-            .movgr2cf_ (COSTS_N_INSNS (1)));
+  array_tune<loongarch_rtx_cost_data> ();
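
The effect of removing the `.set (CPU_LA664, ...)` call can be pictured with a simplified model of a per-CPU cost table built with chained setters. This is only a sketch: `CostData`, `CostTable`, the `Cpu` enumerators other than the names quoted from the diff, and the placeholder default of 12 instructions are assumptions made for illustration, not the values or types used in GCC. Only the `n * 4` scaling mirrors GCC's `COSTS_N_INSNS` macro.

```cpp
#include <array>
#include <cstddef>

// Same scaling as GCC's COSTS_N_INSNS macro.
constexpr int costs_n_insns(int n) { return n * 4; }

// Simplified cost record. The default of 12 is a placeholder, not
// the value used by GCC.
struct CostData {
    int movcf2gr = costs_n_insns(12);
    int movgr2cf = costs_n_insns(12);

    // Chained setters return a modified copy, so calls can be strung
    // together in an initializer.
    CostData movcf2gr_(int c) const { CostData r = *this; r.movcf2gr = c; return r; }
    CostData movgr2cf_(int c) const { CostData r = *this; r.movgr2cf = c; return r; }
};

enum Cpu : std::size_t { CPU_GENERIC, CPU_LA464, CPU_LA664, N_CPUS };

// Per-CPU table: every entry starts at the defaults, and set()
// overrides one entry.
struct CostTable {
    std::array<CostData, N_CPUS> data{};

    CostTable set(Cpu cpu, const CostData &c) const {
        CostTable r = *this;
        r.data[cpu] = c;
        return r;
    }
};

// Before the change: LA664 gets one-instruction costs.
CostTable table_with_la664_override() {
    return CostTable().set(CPU_LA664,
                           CostData()
                               .movcf2gr_(costs_n_insns(1))
                               .movgr2cf_(costs_n_insns(1)));
}

// After the change: every CPU, including LA664, uses the defaults.
CostTable table_all_defaults() { return CostTable(); }
```

In this model the two tables differ only in the LA664 entry, which is why dropping the override makes -march=la664 behave like -march=la464 with respect to these two costs.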

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-07 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #5 from chenglulu  ---
I will verify it on multiple machines to see if the problem can be reproduced.

[Bug target/114848] loongarch: epilogue in _Unwind_RaiseException corrupts return value due to __builtin_eh_return

2024-04-27 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114848

--- Comment #5 from chenglulu  ---
(In reply to Xi Ruoyao from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > (In reply to Xi Ruoyao from comment #1)
> > > Hmm, AFAIK this should be already fixed with r14-6440?
> > > 
> > > I cannot reproduce it with r14-9823 but maybe it has regressed again in 
> > > the
> > > recent weeks.
> > 
> > Oh I only tested gcc 13.2.0. If it is fixed you can close it.
> 
> Hmm it looks like we need a backport to releases/gcc-13 (and 12?)

I have backported r14-6440 to gcc-13 and gcc-12 and am testing it.

> 
> I thought the bug was introduced by my shrink-wrap change (r14-545) so I
> didn't proposed a backport.  But it seems I was wrong and the bug exists
> even before r14-545.

[Bug libfortran/114304] [13/14 Regression] libgfortran I/O – bogus "Semicolon not allowed as separator with DECIMAL='point'"

2024-04-08 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114304

chenglulu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chenglulu at loongson dot cn

--- Comment #23 from chenglulu  ---
(In reply to GCC Commits from comment #22)
> The master branch has been updated by Jerry DeLisle :
> 
> https://gcc.gnu.org/g:93adf88cc6744aa2c732b765e1e3b96e66cb3300
> 
> commit r14-9822-g93adf88cc6744aa2c732b765e1e3b96e66cb3300
> Author: Jerry DeLisle 
> Date:   Fri Apr 5 19:25:13 2024 -0700
> 
> libfortran: Fix handling of formatted separators.
> 
> PR libfortran/114304
> PR libfortran/105473
> 
> libgfortran/ChangeLog:
> 
> * io/list_read.c (eat_separator): Add logic to handle spaces
> preceding a comma or semicolon such that that a 'null' read
> occurs without error at the end of comma or semicolon
> terminated input lines. Add check and error message for ';'.
> (list_formatted_read_scalar): Treat comma as a decimal point
> when specified by the decimal mode on the first item.
> 
> gcc/testsuite/ChangeLog:
> 
> * gfortran.dg/pr105473.f90: Modify to verify new error message.
> * gfortran.dg/pr114304.f90: New test.

Hi,
This patch causes the SPEC CPU 2017 tests 527 and 627 to fail.

Re: [pushed][PATCH] LoongArch: Fix missing plugin header

2024-04-02 Thread chenglulu

Pushed to r14-9743.

On 2024/4/2 at 9:20 AM, Yang Yujie wrote:

gcc/ChangeLog:

* config/loongarch/t-loongarch: Add loongarch-def-array.h
to OPTIONS_H_EXTRA.
---
  gcc/config/loongarch/t-loongarch | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/t-loongarch b/gcc/config/loongarch/t-loongarch
index 3dd7c4b031e..acf5da95310 100644
--- a/gcc/config/loongarch/t-loongarch
+++ b/gcc/config/loongarch/t-loongarch
@@ -18,8 +18,9 @@
  
  
  GTM_H += loongarch-multilib.h

-OPTIONS_H_EXTRA += $(srcdir)/config/loongarch/loongarch-def.h  \
-  $(srcdir)/config/loongarch/loongarch-tune.h  \
+OPTIONS_H_EXTRA += $(srcdir)/config/loongarch/loongarch-def.h  \
+  $(srcdir)/config/loongarch/loongarch-def-array.h \
+  $(srcdir)/config/loongarch/loongarch-tune.h  \
   $(srcdir)/config/loongarch/loongarch-cpucfg-map.h
  
  # Canonical target triplet from config.gcc




Re: [pushed][PATCH v5] LoongArch: Add support for TLS descriptors

2024-04-02 Thread chenglulu

Pushed to r14-9742.

Rebase to the latest, and modify invoke.texi to add a description of the 
TLS DESC compilation option.


On 2024/3/19 at 9:54 AM, mengqinggang wrote:

Add support for TLS descriptors on normal code model and extreme code model.

Normal code model instruction sequence:
  -mno-explicit-relocs:
    la.tls.desc  $r4, s
    add.d        $r12, $r4, $r2
  -mexplicit-relocs:
    pcalau12i    $r4,%desc_pc_hi20(s)
    addi.d       $r4,$r4,%desc_pc_lo12(s)
    ld.d         $r1,$r4,%desc_ld(s)
    jirl         $r1,$r1,%desc_call(s)
    add.d        $r12, $r4, $r2

Extreme code model instruction sequence:
  -mno-explicit-relocs:
    la.tls.desc  $r4, $r12, s
    add.d        $r12, $r4, $r2
  -mexplicit-relocs:
    pcalau12i    $r4,%desc_pc_hi20(s)
    addi.d       $r12,$r0,%desc_pc_lo12(s)
    lu32i.d      $r12,%desc64_pc_lo20(s)
    lu52i.d      $r12,$r12,%desc64_pc_hi12(s)
    add.d        $r4,$r4,$r12
    ld.d         $r1,$r4,%desc_ld(s)
    jirl         $r1,$r1,%desc_call(s)
    add.d        $r12, $r4, $r2

The default is still traditional TLS model, but can be configured with
--with-tls={trad,desc}. The default can change to TLS descriptors once
libc and LLVM support this.
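
For orientation, the kind of source that leads to a TLS address computation is an access to a thread-local variable, as in the minimal sketch below. The variable and function names are hypothetical. The assumption here is that, when such code is built as position-independent code with the `-mtls-dialect` option from this patch set to the descriptor flavor, the access goes through one of the descriptor sequences listed above; the sequence actually emitted depends on the TLS model the compiler selects.

```cpp
// One instance of this variable exists per thread.
thread_local int tls_counter = 0;

// Each call increments the calling thread's own copy and returns
// the new value.
int bump_tls_counter() {
    return ++tls_counter;
}
```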

gcc/ChangeLog:

* config.gcc: Add --with-tls option to change TLS flavor.
* config/loongarch/genopts/loongarch.opt.in: Add -mtls-dialect to
configure TLS flavor.
* config/loongarch/loongarch-def.h (struct loongarch_target): Add
tls_dialect.
* config/loongarch/loongarch-driver.cc (la_driver_init): Add tls
flavor.
* config/loongarch/loongarch-opts.cc (loongarch_init_target): Add
tls_dialect.
(loongarch_config_target): Ditto.
(loongarch_update_gcc_opt_status): Ditto.
* config/loongarch/loongarch-opts.h (loongarch_init_target): Ditto.
(TARGET_TLS_DESC): New define.
* config/loongarch/loongarch.cc (loongarch_symbol_insns): Add TLS DESC
instructions sequence length.
(loongarch_legitimize_tls_address): New TLS DESC instruction sequence.
(loongarch_option_override_internal): Add la_opt_tls_dialect.
(loongarch_option_restore): Add la_target.tls_dialect.
* config/loongarch/loongarch.md (@got_load_tls_desc): Normal
code model for TLS DESC.
(got_load_tls_desc_off64): Extreme code model for TLS DESC.
* config/loongarch/loongarch.opt: Regenerated.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/cmodel-extreme-1.c: Add -mtls-dialect=trad.
* gcc.target/loongarch/cmodel-extreme-2.c: Ditto.
* gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Ditto.
* gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c:
Ditto.
* gcc.target/loongarch/func-call-medium-1.c: Ditto.
* gcc.target/loongarch/func-call-medium-2.c: Ditto.
* gcc.target/loongarch/func-call-medium-3.c: Ditto.
* gcc.target/loongarch/func-call-medium-4.c: Ditto.
* gcc.target/loongarch/tls-extreme-macro.c: Ditto.
* gcc.target/loongarch/tls-gd-noplt.c: Ditto.
* gcc.target/loongarch/explicit-relocs-auto-extreme-tls-desc.c: New 
test.
* gcc.target/loongarch/explicit-relocs-auto-tls-desc.c: New test.
* gcc.target/loongarch/explicit-relocs-extreme-tls-desc.c: New test.
* gcc.target/loongarch/explicit-relocs-tls-desc.c: New test.

Co-authored-by: Lulu Cheng 
Co-authored-by: Xi Ruoyao 
---
Changes v4 -> v5:
- Use (reg:P 4) instead of match_operand in got_load_tls_desc and
  got_load_tls_desc_off64.
- Change instruction sequence to prevent additional white spaces in the
  output asm before tabs.

Changes v3 -> v4:
- Add TLS descriptors test cases.

Changes v2 -> v3:
- Set default to traditional TLS model.
- Add support for -mexplicit-relocs and extreme code model.

Changes v1 -> v2:
- Clobber fcc0-fcc7 registers in got_load_tls_desc template.
- Support --with-tls in configure.

v4 link: https://sourceware.org/pipermail/gcc-patches/2024-March/647597.html
v3 link: https://sourceware.org/pipermail/gcc-patches/2024-March/647578.html
v2 link: https://sourceware.org/pipermail/gcc-patches/2024-February/646817.html
v1 link: https://sourceware.org/pipermail/gcc-patches/2023-December/638907.html

  gcc/config.gcc                                | 19 +-
  gcc/config/loongarch/genopts/loongarch.opt.in | 14 
  gcc/config/loongarch/loongarch-def.h          |  7 ++
  gcc/config/loongarch/loongarch-driver.cc      |  2 +-
  gcc/config/loongarch/loongarch-opts.cc        | 12 +++-
  gcc/config/loongarch/loongarch-opts.h         |  2 +
  gcc/config/loongarch/loongarch.cc             | 47 +
  gcc/config/loongarch/loongarch.md             | 68 +++
  gcc/config/loongarch/loongarch.opt            | 14 
  .../gcc.target/loongarch/cmodel-extreme-1.c   |  2 +-
  .../gcc.target/loongarch/cmodel-extreme-2.c   |  2 +-
  .../explicit-relocs-auto-extreme-tls-desc.c   | 10 +++
  

Re:[pushed] [PATCH] Regenerate loongarch.opt.urls.

2024-04-01 Thread chenglulu

Pushed to r14-9741.

On 2024/4/1 at 11:08 AM, Lulu Cheng wrote:

Fixes: d28ea8e5a704 ("LoongArch: Split loongarch_option_override_internal
  into smaller procedures")

gcc/ChangeLog:

* config/loongarch/loongarch.opt.urls: Regenerate.
---
  gcc/config/loongarch/loongarch.opt.urls | 19 +--
  1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.opt.urls b/gcc/config/loongarch/loongarch.opt.urls
index c78bbfea2da..8e16304b66a 100644
--- a/gcc/config/loongarch/loongarch.opt.urls
+++ b/gcc/config/loongarch/loongarch.opt.urls
@@ -51,10 +51,10 @@ UrlSuffix(gcc/LoongArch-Options.html#index-mexplicit-relocs-1)
 mexplicit-relocs
 UrlSuffix(gcc/LoongArch-Options.html#index-mexplicit-relocs-1)
 
-mrecip
+mrecip=
 UrlSuffix(gcc/LoongArch-Options.html#index-mrecip)
 
-mrecip=
+mrecip
 UrlSuffix(gcc/LoongArch-Options.html#index-mrecip)
 
 ; skipping UrlSuffix for 'mcmodel=' due to finding no URLs
@@ -64,3 +64,18 @@ UrlSuffix(gcc/LoongArch-Options.html#index-mdirect-extern-access)
 
 ; skipping UrlSuffix for 'mrelax' due to finding no URLs
 
+mfrecipe
+UrlSuffix(gcc/LoongArch-Options.html#index-mfrecipe)
+
+mdiv32
+UrlSuffix(gcc/LoongArch-Options.html#index-mdiv32)
+
+mlam-bh
+UrlSuffix(gcc/LoongArch-Options.html#index-mlam-bh)
+
+mlamcas
+UrlSuffix(gcc/LoongArch-Options.html#index-mlamcas)
+
+mld-seq-sa
+UrlSuffix(gcc/LoongArch-Options.html#index-mld-seq-sa)
+




Re: [PATCH] Regenerate loongarch.opt.urls.

2024-04-01 Thread chenglulu



On 2024/4/1 at 7:24 PM, Mark Wielaard wrote:

Hi,

On Mon, Apr 01, 2024 at 11:08:08AM +0800, Lulu Cheng wrote:

Fixes: d28ea8e5a704 ("LoongArch: Split loongarch_option_override_internal
  into smaller procedures")

gcc/ChangeLog:

* config/loongarch/loongarch.opt.urls: Regenerate.

This looks OK to me. I cannot officially approve patches, but I think
this falls under the Obvious fixes rule.


Roger that. Thanks for your review. :-)

Lulu



Cheers,

Mark




Re: [PATCH v5] LoongArch: Add support for TLS descriptors

2024-04-01 Thread chenglulu



On 2024/4/1 at 9:51 PM, Xi Ruoyao wrote:

Is this patch targeting GCC 14 or 15?  If 14 I guess we'd commit now...

Generally we don't add features in stage 4, but if we keep trad as the
default I think it'd be OK.  And RISC-V guys plan to push their TLS desc
implementation this week too.


I've rebased the code onto the latest commit, and if the tests pass I'll
push it right away.


Thanks! :-)



On Tue, 2024-03-19 at 09:54 +0800, mengqinggang wrote:

Add support for TLS descriptors on normal code model and extreme code model.

Normal code model instruction sequence:
  -mno-explicit-relocs:
    la.tls.desc  $r4, s
    add.d        $r12, $r4, $r2
  -mexplicit-relocs:
    pcalau12i    $r4,%desc_pc_hi20(s)
    addi.d       $r4,$r4,%desc_pc_lo12(s)
    ld.d         $r1,$r4,%desc_ld(s)
    jirl         $r1,$r1,%desc_call(s)
    add.d        $r12, $r4, $r2

Extreme code model instruction sequence:
  -mno-explicit-relocs:
    la.tls.desc  $r4, $r12, s
    add.d        $r12, $r4, $r2
  -mexplicit-relocs:
    pcalau12i    $r4,%desc_pc_hi20(s)
    addi.d       $r12,$r0,%desc_pc_lo12(s)
    lu32i.d      $r12,%desc64_pc_lo20(s)
    lu52i.d      $r12,$r12,%desc64_pc_hi12(s)
    add.d        $r4,$r4,$r12
    ld.d         $r1,$r4,%desc_ld(s)
    jirl         $r1,$r1,%desc_call(s)
    add.d        $r12, $r4, $r2

The default is still traditional TLS model, but can be configured with
--with-tls={trad,desc}. The default can change to TLS descriptors once
libc and LLVM support this.

gcc/ChangeLog:

* config.gcc: Add --with-tls option to change TLS flavor.
* config/loongarch/genopts/loongarch.opt.in: Add -mtls-dialect to
configure TLS flavor.
* config/loongarch/loongarch-def.h (struct loongarch_target): Add
tls_dialect.
* config/loongarch/loongarch-driver.cc (la_driver_init): Add tls
flavor.
* config/loongarch/loongarch-opts.cc (loongarch_init_target): Add
tls_dialect.
(loongarch_config_target): Ditto.
(loongarch_update_gcc_opt_status): Ditto.
* config/loongarch/loongarch-opts.h (loongarch_init_target): Ditto.
(TARGET_TLS_DESC): New define.
* config/loongarch/loongarch.cc (loongarch_symbol_insns): Add TLS DESC
instructions sequence length.
(loongarch_legitimize_tls_address): New TLS DESC instruction sequence.
(loongarch_option_override_internal): Add la_opt_tls_dialect.
(loongarch_option_restore): Add la_target.tls_dialect.
* config/loongarch/loongarch.md (@got_load_tls_desc): Normal
code model for TLS DESC.
(got_load_tls_desc_off64): Extreme code model for TLS DESC.
* config/loongarch/loongarch.opt: Regenerated.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/cmodel-extreme-1.c: Add -mtls-dialect=trad.
* gcc.target/loongarch/cmodel-extreme-2.c: Ditto.
* gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Ditto.
* gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c:
Ditto.
* gcc.target/loongarch/func-call-medium-1.c: Ditto.
* gcc.target/loongarch/func-call-medium-2.c: Ditto.
* gcc.target/loongarch/func-call-medium-3.c: Ditto.
* gcc.target/loongarch/func-call-medium-4.c: Ditto.
* gcc.target/loongarch/tls-extreme-macro.c: Ditto.
* gcc.target/loongarch/tls-gd-noplt.c: Ditto.
* gcc.target/loongarch/explicit-relocs-auto-extreme-tls-desc.c: New 
test.
* gcc.target/loongarch/explicit-relocs-auto-tls-desc.c: New test.
* gcc.target/loongarch/explicit-relocs-extreme-tls-desc.c: New test.
* gcc.target/loongarch/explicit-relocs-tls-desc.c: New test.

Co-authored-by: Lulu Cheng 
Co-authored-by: Xi Ruoyao 




[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-04-01 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #21 from chenglulu  ---
(In reply to Xi Ruoyao from comment #20)
> (In reply to chenglulu from comment #19)
> > (In reply to Xi Ruoyao from comment #18)
> > > (In reply to chenglulu from comment #17)
> > > 
> > > > The results of spec2006 on LA464 are:
> > > > -falign-labels=4 -falign-functions=32 -falign-loops=16 -falign-jumps=16
> > > 
> > > Would you send a patch for them or prefer I to do it?
> > 
> > I'll send a patch tomorrow.
> 
> Ping.
> 
> I'd like to do another system rebuild after this patch lands for verifying
> GCC 14.

Oh sorry, I was waiting for Yujie's patch, which was merged only today. I'll
send this alignment patch tomorrow.

Re:[pushed] [PATCH] LoongArch: gcc12: Implement option save/restore.

2024-03-31 Thread chenglulu

Pushed to r12-10303.

On 2024/3/17 at 10:02 AM, Lulu Cheng wrote:

LTO option streaming and target attributes both require per-function
target configuration, which is achieved via option save/restore.

We implement TARGET_OPTION_{SAVE,RESTORE} to switch the la_target
context in addition to other automatically maintained option states
(via the "Save" option property in the .opt files).
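
The save/restore idea described above can be sketched with a toy model. This is not the GCC hook interface: `TargetContext`, `SavedOptions`, `option_save`, `option_restore` and `tune_seen_while_compiling` are hypothetical names, and the three integer fields merely stand in for the real target state.

```cpp
// Hypothetical stand-in for the per-target state; the real la_target
// structure in GCC has different fields.
struct TargetContext {
    int cpu_arch;
    int cpu_tune;
    int cmodel;
};

// Plays the role of the global target context.
TargetContext current_target = {0, 0, 0};

// Per-function snapshot, analogous to a saved option block.
struct SavedOptions {
    TargetContext target;
};

// Copy the global context into a snapshot.
void option_save(SavedOptions &slot) { slot.target = current_target; }

// Make a snapshot the active global context.
void option_restore(const SavedOptions &slot) { current_target = slot.target; }

// Switch to a function's saved options, read the tuning it would be
// compiled with, then switch back to the surrounding context.
int tune_seen_while_compiling(const SavedOptions &fn_opts) {
    SavedOptions outer;
    option_save(outer);
    option_restore(fn_opts);
    int tune = current_target.cpu_tune;
    option_restore(outer);
    return tune;
}
```

The point of the model is that each function carries its own snapshot, so switching between functions (for example when LTO streams in functions built with different options) never leaks one function's target settings into another.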

PR target/113233

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in: Mark options with
the "Save" property.
* config/loongarch/loongarch-opts.cc
(loongarch_update_gcc_opt_status): Update the value of the
la_target to global_options.
* config/loongarch/loongarch-opts.h
(loongarch_update_gcc_opt_status): Add a function declaration.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Call the function
loongarch_update_gcc_opt_status.
(loongarch_option_save): New functions.
(loongarch_option_restore): Likewise.
(TARGET_OPTION_SAVE): Define macro.
(TARGET_OPTION_RESTORE): Likewise.
* config/loongarch/loongarch.opt: Regenerate.
---
  gcc/config/loongarch/genopts/loongarch.opt.in | 22 ++--
  gcc/config/loongarch/loongarch-opts.cc        | 22 
  gcc/config/loongarch/loongarch-opts.h         |  6 
  gcc/config/loongarch/loongarch.cc             | 34 +--
  gcc/config/loongarch/loongarch.opt            | 22 ++--
  5 files changed, 82 insertions(+), 24 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 420a3941b3b..a3107cb2294 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -58,7 +58,7 @@ EnumValue
  Enum(isa_ext_fpu) String(@@STR_ISA_EXT_FPU64@@) Value(ISA_EXT_FPU64)
  
  m@@OPTSTR_ISA_EXT_FPU@@=

-Target RejectNegative Joined ToLower Enum(isa_ext_fpu) Var(la_opt_fpu) 
Init(M_OPTION_NOT_SEEN)
+Target RejectNegative Joined ToLower Enum(isa_ext_fpu) Var(la_opt_fpu) 
Init(M_OPTION_NOT_SEEN) Save
  -m@@OPTSTR_ISA_EXT_FPU@@=FPU  Generate code for the given FPU.
  
  m@@OPTSTR_ISA_EXT_FPU@@=@@STR_ISA_EXT_FPU0@@

@@ -92,11 +92,11 @@ EnumValue
  Enum(cpu_type) String(@@STR_CPU_LA464@@) Value(CPU_LA464)
  
  m@@OPTSTR_ARCH@@=

-Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_arch) 
Init(M_OPTION_NOT_SEEN)
+Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_arch) 
Init(M_OPTION_NOT_SEEN) Save
  -m@@OPTSTR_ARCH@@=PROCESSOR   Generate code for the given PROCESSOR ISA.
  
  m@@OPTSTR_TUNE@@=

-Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_tune) 
Init(M_OPTION_NOT_SEEN)
+Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_tune) 
Init(M_OPTION_NOT_SEEN) Save
  -m@@OPTSTR_TUNE@@=PROCESSOR   Generate optimized code for PROCESSOR.
  
  
@@ -127,31 +127,31 @@ int la_opt_abi_ext = M_OPTION_NOT_SEEN
  
  
  mbranch-cost=

-Target RejectNegative Joined UInteger Var(loongarch_branch_cost)
+Target RejectNegative Joined UInteger Var(loongarch_branch_cost) Save
  -mbranch-cost=COSTSet the cost of branches to roughly COST instructions.
  
  mcheck-zero-division

-Target Mask(CHECK_ZERO_DIV)
+Target Mask(CHECK_ZERO_DIV) Save
  Trap on integer divide by zero.
  
  mcond-move-int

-Target Var(TARGET_COND_MOVE_INT) Init(1)
+Target Var(TARGET_COND_MOVE_INT) Init(1) Save
  Conditional moves for integral are enabled.
  
  mcond-move-float

-Target Var(TARGET_COND_MOVE_FLOAT) Init(1)
+Target Var(TARGET_COND_MOVE_FLOAT) Init(1) Save
  Conditional moves for float are enabled.
  
  mmemcpy

-Target Mask(MEMCPY)
+Target Mask(MEMCPY) Save
  Prevent optimizing block moves, which is also the default behavior of -Os.
  
  mstrict-align

-Target Var(TARGET_STRICT_ALIGN) Init(0)
+Target Var(TARGET_STRICT_ALIGN) Init(0) Save
  Do not generate unaligned memory accesses.
  
  mmax-inline-memcpy-size=

-Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) 
Init(1024)
+Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) 
Init(1024) Save
  -mmax-inline-memcpy-size=SIZE Set the max size of memcpy to inline, default 
is 1024.
  
  ; The code model option names for -mcmodel.

@@ -175,7 +175,7 @@ EnumValue
  Enum(cmodel) String(@@STR_CMODEL_EXTREME@@) Value(CMODEL_EXTREME)
  
  mcmodel=

-Target RejectNegative Joined Enum(cmodel) Var(la_opt_cmodel) 
Init(CMODEL_NORMAL)
+Target RejectNegative Joined Enum(cmodel) Var(la_opt_cmodel) 
Init(CMODEL_NORMAL) Save
  Specify the code model.
  
  mrelax

diff --git a/gcc/config/loongarch/loongarch-opts.cc 
b/gcc/config/loongarch/loongarch-opts.cc
index eb9c2a52f9e..b55baeccd2f 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -575,3 +575,25 @@ multilib_enabled_abi_list ()
  
return XOBFINISH (_obstack, const char *);

  }
+
+/* option status feedback for "gcc --help=target 

Re: [pushed][PATCH] LoongArch: gcc13: Implement option save/restore.

2024-03-31 Thread chenglulu

Pushed to r13-8545.

On 2024/3/17 at 10:02 AM, Lulu Cheng wrote:

LTO option streaming and target attributes both require per-function
target configuration, which is achieved via option save/restore.

We implement TARGET_OPTION_{SAVE,RESTORE} to switch the la_target
context in addition to other automatically maintained option states
(via the "Save" option property in the .opt files).

PR target/113233

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in: Mark options with
the "Save" property.
* config/loongarch/loongarch-opts.cc
(loongarch_update_gcc_opt_status): Update the value of the
la_target to global_options.
* config/loongarch/loongarch-opts.h
(loongarch_update_gcc_opt_status): Add a function declaration.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Call the function
loongarch_update_gcc_opt_status.
(loongarch_option_save): New functions.
(loongarch_option_restore): Likewise.
(TARGET_OPTION_SAVE): Define macro.
(TARGET_OPTION_RESTORE): Likewise.
* config/loongarch/loongarch.opt: Regenerate.
---
  gcc/config/loongarch/genopts/loongarch.opt.in | 24 ++---
  gcc/config/loongarch/loongarch-opts.cc        | 22 
  gcc/config/loongarch/loongarch-opts.h         |  6 
  gcc/config/loongarch/loongarch.cc             | 34 +--
  gcc/config/loongarch/loongarch.opt            | 24 ++---
  5 files changed, 84 insertions(+), 26 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 76acd35d39c..aea4f2a4f61 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -58,7 +58,7 @@ EnumValue
  Enum(isa_ext_fpu) String(@@STR_ISA_EXT_FPU64@@) Value(ISA_EXT_FPU64)
  
  m@@OPTSTR_ISA_EXT_FPU@@=

-Target RejectNegative Joined ToLower Enum(isa_ext_fpu) Var(la_opt_fpu) 
Init(M_OPTION_NOT_SEEN)
+Target RejectNegative Joined ToLower Enum(isa_ext_fpu) Var(la_opt_fpu) 
Init(M_OPTION_NOT_SEEN) Save
  -m@@OPTSTR_ISA_EXT_FPU@@=FPU  Generate code for the given FPU.
  
  m@@OPTSTR_ISA_EXT_FPU@@=@@STR_ISA_EXT_FPU0@@

@@ -92,11 +92,11 @@ EnumValue
  Enum(cpu_type) String(@@STR_CPU_LA464@@) Value(CPU_LA464)
  
  m@@OPTSTR_ARCH@@=

-Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_arch) 
Init(M_OPTION_NOT_SEEN)
+Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_arch) 
Init(M_OPTION_NOT_SEEN) Save
  -m@@OPTSTR_ARCH@@=PROCESSOR   Generate code for the given PROCESSOR ISA.
  
  m@@OPTSTR_TUNE@@=

-Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_tune) 
Init(M_OPTION_NOT_SEEN)
+Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_tune) 
Init(M_OPTION_NOT_SEEN) Save
  -m@@OPTSTR_TUNE@@=PROCESSOR   Generate optimized code for PROCESSOR.
  
  
@@ -127,31 +127,31 @@ int la_opt_abi_ext = M_OPTION_NOT_SEEN
  
  
  mbranch-cost=

-Target RejectNegative Joined UInteger Var(loongarch_branch_cost)
+Target RejectNegative Joined UInteger Var(loongarch_branch_cost) Save
  -mbranch-cost=COSTSet the cost of branches to roughly COST instructions.
  
  mcheck-zero-division

-Target Mask(CHECK_ZERO_DIV)
+Target Mask(CHECK_ZERO_DIV) Save
  Trap on integer divide by zero.
  
  mcond-move-int

-Target Var(TARGET_COND_MOVE_INT) Init(1)
+Target Var(TARGET_COND_MOVE_INT) Init(1) Save
  Conditional moves for integral are enabled.
  
  mcond-move-float

-Target Var(TARGET_COND_MOVE_FLOAT) Init(1)
+Target Var(TARGET_COND_MOVE_FLOAT) Init(1) Save
  Conditional moves for float are enabled.
  
  mmemcpy

-Target Mask(MEMCPY)
+Target Mask(MEMCPY) Save
  Prevent optimizing block moves, which is also the default behavior of -Os.
  
  mstrict-align

-Target Var(TARGET_STRICT_ALIGN) Init(0)
+Target Var(TARGET_STRICT_ALIGN) Init(0) Save
  Do not generate unaligned memory accesses.
  
  mmax-inline-memcpy-size=

-Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) 
Init(1024)
+Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) 
Init(1024) Save
  -mmax-inline-memcpy-size=SIZE Set the max size of memcpy to inline, default 
is 1024.
  
  mexplicit-relocs

@@ -182,11 +182,11 @@ EnumValue
  Enum(cmodel) String(@@STR_CMODEL_EXTREME@@) Value(CMODEL_EXTREME)
  
  mcmodel=

-Target RejectNegative Joined Enum(cmodel) Var(la_opt_cmodel) 
Init(CMODEL_NORMAL)
+Target RejectNegative Joined Enum(cmodel) Var(la_opt_cmodel) 
Init(CMODEL_NORMAL) Save
  Specify the code model.
  
  mdirect-extern-access

-Target Var(TARGET_DIRECT_EXTERN_ACCESS) Init(0)
+Target Var(TARGET_DIRECT_EXTERN_ACCESS) Init(0) Save
  Avoid using the GOT to access external symbols.
  
  mrelax

diff --git a/gcc/config/loongarch/loongarch-opts.cc 
b/gcc/config/loongarch/loongarch-opts.cc
index a52e25236ea..e158de9a12f 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc

Re: [PATCH] LoongArch: Increase division costs

2024-03-31 Thread chenglulu



On 2024/4/1 at 9:29 AM, Xi Ruoyao wrote:

On Fri, 2024-03-29 at 09:23 +0800, chenglulu wrote:


I tested SPEC2006. In the floating-point programs, after removing the test
items with large fluctuations, the rest are basically unchanged.

The fixed-point 464.h264ref score with (10,10) was 6.7% higher than with
(5,5) and (10,22).

So IIUC (10,10) is better than (5,5), (10,22), and the originally
proposed (14,22)?  Then should I make a change to make all 4 costs (SF,
DF, SI, DI) 10?


I think this may require analysis of the SPEC test cases. I took another
look at the test results: the scores of SPEC INT 462.libquantum fluctuated
greatly, but the (10,22) combination showed an overall upward trend compared
to the scores of the other two combinations.

I don't know whether the (10,22) combination happens to hit the kind of
test case described in the changelog.

So can we change it together in GCC 15?



I'd still want DI % 17 to be reduced as reciprocal sequence (but
not SI % 17) since DI % (smaller const) is quite important for
some workloads like competitive programming.  However "adapting with
different modulos" is not possible w/o refactoring generic code so it
must be deferred to at least GCC 15.





Re: [pushed][PATCH v4] LoongArch: Split loongarch_option_override_internal into smaller procedures

2024-03-31 Thread chenglulu

Pushed to r14-9737.

On 2024/3/30 at 4:43 PM, Yang Yujie wrote:

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in: Mark -m[no-]recip as
aliases to -mrecip={all,none}, respectively.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch-def.h (ABI_FPU_64): Rename to...
(ABI_FPU64_P): ...this.
(ABI_FPU_32): Rename to...
(ABI_FPU32_P): ...this.
(ABI_FPU_NONE): Rename to...
(ABI_NOFPU_P): ...this.
(ABI_LP64_P): Define.
* config/loongarch/loongarch.cc (loongarch_init_print_operand_punct):
Merged into loongarch_global_init.
(loongarch_cpu_option_override): Renamed to
loongarch_target_option_override.
(loongarch_option_override_internal): Move the work after
loongarch_config_target into loongarch_target_option_override.
(loongarch_global_init): Define.
(INIT_TARGET_FLAG): Move to loongarch-opts.cc.
(loongarch_option_override): Call loongarch_global_init
separately.
* config/loongarch/loongarch-opts.cc (loongarch_parse_mrecip_scheme):
Split the parsing of -mrecip= from
loongarch_option_override_internal.
(loongarch_generate_mrecip_scheme): Define. Split from
loongarch_option_override_internal.
(loongarch_target_option_override): Define. Renamed from
loongarch_cpu_option_override.
(loongarch_init_misc_options): Define. Split from
loongarch_option_override_internal.
(INIT_TARGET_FLAG): Move from loongarch.cc.
* config/loongarch/loongarch-opts.h (loongarch_target_option_override):
New prototype.
(loongarch_parse_mrecip_scheme): New prototype.
(loongarch_init_misc_options): New prototype.
(TARGET_ABI_LP64): Simplify with ABI_LP64_P.
* config/loongarch/loongarch.h (TARGET_RECIP_DIV): Simplify.
Do not reference specific CPU architecture (LA664).
(TARGET_RECIP_SQRT): Same.
(TARGET_RECIP_RSQRT): Same.
(TARGET_RECIP_VEC_DIV): Same.
(TARGET_RECIP_VEC_SQRT): Same.
(TARGET_RECIP_VEC_RSQRT): Same.
---
  gcc/config/loongarch/genopts/loongarch.opt.in |   8 +-
  gcc/config/loongarch/loongarch-def.h  |  11 +-
  gcc/config/loongarch/loongarch-opts.cc| 253 ++
  gcc/config/loongarch/loongarch-opts.h |  27 +-
  gcc/config/loongarch/loongarch.cc | 253 +++---
  gcc/config/loongarch/loongarch.h  |  18 +-
  gcc/config/loongarch/loongarch.opt|   8 +-
  7 files changed, 342 insertions(+), 236 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 02f918053f5..a77893d31d9 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -197,14 +197,14 @@ mexplicit-relocs
  Target Alias(mexplicit-relocs=, always, none)
  Use %reloc() assembly operators (for backward compatibility).
  
-mrecip

-Target RejectNegative Var(la_recip) Save
-Generate approximate reciprocal divide and square root for better throughput.
-
  mrecip=
  Target RejectNegative Joined Var(la_recip_name) Save
  Control generation of reciprocal estimates.
  
+mrecip

+Target Alias(mrecip=, all, none)
+Generate approximate reciprocal divide and square root for better throughput.
+
  ; The code model option names for -mcmodel.
  Enum
  Name(cmodel) Type(int)
diff --git a/gcc/config/loongarch/loongarch-def.h 
b/gcc/config/loongarch/loongarch-def.h
index 2dbf006d013..0cbf9476690 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -90,11 +90,16 @@ extern loongarch_def_array
  
  #define TO_LP64_ABI_BASE(C) (C)
  
-#define ABI_FPU_64(abi_base) \

+#define ABI_LP64_P(abi_base) \
+  (abi_base == ABI_BASE_LP64D \
+   || abi_base == ABI_BASE_LP64F \
+   || abi_base == ABI_BASE_LP64S)
+
+#define ABI_FPU64_P(abi_base) \
(abi_base == ABI_BASE_LP64D)
-#define ABI_FPU_32(abi_base) \
+#define ABI_FPU32_P(abi_base) \
(abi_base == ABI_BASE_LP64F)
-#define ABI_FPU_NONE(abi_base) \
+#define ABI_NOFPU_P(abi_base) \
(abi_base == ABI_BASE_LP64S)
  
  
diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc

index 627f9148adf..e600f08f03b 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "coretypes.h"
  #include "tm.h"
  #include "obstack.h"
+#include "opts.h"
  #include "diagnostic-core.h"
  
  #include "loongarch-cpu.h"

@@ -32,8 +33,12 @@ along with GCC; see the file COPYING3.  If not see
  #include "loongarch-str.h"
  #include "loongarch-def.h"
  
+/* Target configuration */

  struct loongarch_target la_target;
  
+/* RTL cost information */

+const struct loongarch_rtx_cost_data *loongarch_cost;
+
  /* ABI-related 

Re:[pushed] [PATCH] LoongArch: Add descriptions of the compilation options.

2024-03-31 Thread chenglulu

Pushed to r14-9736.

On 2024/3/30 at 3:58 PM, Lulu Cheng wrote:

Add descriptions for the compilation options '-mfrecipe' '-mdiv32'
'-mlam-bh' '-mlamcas' and '-mld-seq-sa'.

gcc/ChangeLog:

* doc/invoke.texi: Add descriptions for the compilation
options.
---
  gcc/doc/invoke.texi | 45 +++--
  1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c584664e168..942103c23f5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1058,8 +1058,9 @@ Objective-C and Objective-C++ Dialects}.
  -mmax-inline-memcpy-size=@var{n}
  -mexplicit-relocs=@var{style} -mexplicit-relocs -mno-explicit-relocs
  -mdirect-extern-access -mno-direct-extern-access
--mcmodel=@var{code-model} -mrelax -mpass-mrelax-to-as}
--mrecip  -mrecip=@var{opt}
+-mcmodel=@var{code-model} -mrelax -mpass-mrelax-to-as
+-mrecip  -mrecip=@var{opt} -mfrecipe -mno-frecipe -mdiv32 -mno-div32
+-mlam-bh -mno-lam-bh -mlamcas -mno-lamcas -mld-seq-sa -mno-ld-seq-sa}
  
  @emph{M32R/D Options}

  @gccoptlist{-m32r2  -m32rx  -m32r
@@ -27095,6 +27096,46 @@ Enable the approximation for vectorized reciprocal 
square root.
  So, for example, @option{-mrecip=all,!sqrt} enables
  all of the reciprocal approximations, except for scalar square root.
  
+@opindex mfrecipe

+@opindex mno-frecipe
+@item -mfrecipe
+@itemx -mno-frecipe
+Use (do not use) @code{frecipe.@{s/d@}} and @code{frsqrte.@{s/d@}}
+instructions.  When build with @option{-march=la664}, it is enabled by default.
+The default is @option{-mno-frecipe}.
+
+@opindex mdiv32
+@opindex mno-div32
+@item -mdiv32
+@itemx -mno-div32
+Use (do not use) @code{div.w[u]} and @code{mod.w[u]} instructions with input
+not sign-extended.  When build with @option{-march=la664}, it is enabled by
+default.  The default is @option{-mno-div32}.
+
+@opindex mlam-bh
+@opindex mno-lam-bh
+@item -mlam-bh
+@itemx -mno-lam-bh
+Use (do not use) @code{am@{swap/add@}[_db].@{b/h@}} instructions.  When build
+with @option{-march=la664}, it is enabled by default.  The default is
+@option{-mno-lam-bh}.
+
+@opindex mlamcas
+@opindex mno-lamcas
+@item -mlamcas
+@itemx -mno-lamcas
+Use (do not use) @code{amcas[_db].@{b/h/w/d@}} instructions.  When build with
+@option{-march=la664}, it is enabled by default.  The default is
+@option{-mno-lamcas}.
+
+@opindex mld-seq-sa
+@opindex mno-ld-seq-sa
+@item -mld-seq-sa
+@itemx -mno-ld-seq-sa
+Whether a load-load barrier (@code{dbar 0x700}) is needed.  When build with
+@option{-march=la664}, it is enabled by default.  The default is
+@option{-mno-ld-seq-sa}, the load-load barrier is needed.
+
  @item loongarch-vect-unroll-limit
  The vectorizer will use available tuning information to determine whether it
  would be beneficial to unroll the main vectorized loop and by how much.  This




Re: [PATCH] LoongArch: Increase division costs

2024-03-28 Thread chenglulu



On 2024/3/27 at 8:42 PM, Xi Ruoyao wrote:

On Wed, 2024-03-27 at 18:39 +0800, Xi Ruoyao wrote:

On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:

On 2024/3/26 at 5:48 PM, Xi Ruoyao wrote:

The latency of LA464 and LA664 division instructions depends on the
input.  When I updated the costs in r14-6642, I unintentionally set the
division costs to the best-case latency (when the first operand is 0).
Per a recent discussion [1] we should use "something sensible" instead
of it.

Use the average of the minimum and maximum latency observed instead.
This enables multiplication to reciprocal sequence reduction and speeds
up the following test case for about 30%:

  int
  main (void)
  {
    unsigned long stat = 0xdeadbeef;
    for (int i = 0; i < 1; i++)
  stat = (stat * stat + stat * 114514 + 1919810) % 17;
    asm(""::"r"(stat));
  }

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html

The test case div-const-reduction.c is modified to assemble the instruction
sequence as follows:
lu12i.w $r12,999997440>>12    # 0x3b9ac000
ori     $r12,$r12,2567
mod.w   $r13,$r13,$r12

This sequence of instructions takes 5 clock cycles.

It actually may take 5 to 8 cycles depending on the input.  And
multiplication is fully pipelined while division is not, so the
reciprocal sequence should still produce a better throughput.


Hmm indeed, it seems a waste to do this reduction for int / 17.
I'll try to make a better heuristic as Richard suggests...

Oops, it seems impossible (w/o refactoring the generic code).  See my
reply to Richi :(.

Can you also try benchmarking with the costs of SI and DI division
increased to (10, 10) instead of (14, 22) - allowing more CSE but not
reciprocal sequence reduction, and (10, 22) - only allowing reduction
for DI but not SI?


I tested SPEC2006. In the floating-point programs, after removing the test
items with large fluctuations, the rest are basically unchanged.

The fixed-point 464.h264ref score with (10,10) was 6.7% higher than with
(5,5) and (10,22).



Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread chenglulu



On 2024/3/27 at 8:42 PM, Xi Ruoyao wrote:

On Wed, 2024-03-27 at 18:39 +0800, Xi Ruoyao wrote:

On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:

On 2024/3/26 at 5:48 PM, Xi Ruoyao wrote:

The latency of LA464 and LA664 division instructions depends on the
input.  When I updated the costs in r14-6642, I unintentionally set the
division costs to the best-case latency (when the first operand is 0).
Per a recent discussion [1] we should use "something sensible" instead
of it.

Use the average of the minimum and maximum latency observed instead.
This enables multiplication to reciprocal sequence reduction and speeds
up the following test case for about 30%:

  int
  main (void)
  {
    unsigned long stat = 0xdeadbeef;
    for (int i = 0; i < 1; i++)
  stat = (stat * stat + stat * 114514 + 1919810) % 17;
    asm(""::"r"(stat));
  }

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html

The test case div-const-reduction.c is modified to assemble the instruction
sequence as follows:
lu12i.w $r12,999997440>>12    # 0x3b9ac000
ori     $r12,$r12,2567
mod.w   $r13,$r13,$r12

This sequence of instructions takes 5 clock cycles.

It actually may take 5 to 8 cycles depending on the input.  And
multiplication is fully pipelined while division is not, so the
reciprocal sequence should still produce a better throughput.


Hmm indeed, it seems a waste to do this reduction for int / 17.
I'll try to make a better heuristic as Richard suggests...

Oops, it seems impossible (w/o refactoring the generic code).  See my
reply to Richi :(.

Can you also try benchmarking with the costs of SI and DI division
increased to (10, 10) instead of (14, 22) - allowing more CSE but not
reciprocal sequence reduction, and (10, 22) - only allowing reduction
for DI but not SI?

No problem, I'll test both cases ((10,10) and (10,22)).



[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-03-27 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #19 from chenglulu  ---
(In reply to Xi Ruoyao from comment #18)
> (In reply to chenglulu from comment #17)
> 
> > The results of spec2006 on LA464 are:
> > -falign-labels=4 -falign-functions=32 -falign-loops=16 -falign-jumps=16
> 
> Would you send a patch for them or prefer I to do it?

I'll send a patch tomorrow.

Re: [PATCH] LoongArch: Increase division costs

2024-03-26 Thread chenglulu



On 2024/3/26 at 5:48 PM, Xi Ruoyao wrote:

The latency of LA464 and LA664 division instructions depends on the
input.  When I updated the costs in r14-6642, I unintentionally set the
division costs to the best-case latency (when the first operand is 0).
Per a recent discussion [1] we should use "something sensible" instead
of it.

Use the average of the minimum and maximum latency observed instead.
This enables multiplication to reciprocal sequence reduction and speeds
up the following test case for about 30%:

 int
 main (void)
 {
   unsigned long stat = 0xdeadbeef;
   for (int i = 0; i < 1; i++)
 stat = (stat * stat + stat * 114514 + 1919810) % 17;
   asm(""::"r"(stat));
 }

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html


The test case div-const-reduction.c is modified to assemble the instruction
sequence as follows:
lu12i.w $r12,999997440>>12    # 0x3b9ac000
ori     $r12,$r12,2567
mod.w   $r13,$r13,$r12

This sequence of instructions takes 5 clock cycles.



The sequence of instructions after adding the patch is:
lu12i.w $r15,1152917504>>12   # 0x44b82000
ori     $r15,$r15,3993
mulh.w  $r12,$r16,$r15
srai.w  $r14,$r16,31
lu12i.w $r13,999997440>>12    # 0x3b9ac000
ori     $r13,$r13,2567
srai.w  $r12,$r12,28
sub.w   $r12,$r12,$r14
mul.w   $r12,$r12,$r13
sub.w   $r16,$r16,$r12
This sequence of instructions takes 11 clock cycles.

After the optimization, this test case takes 6 more clock cycles than
before, so I need to run SPEC.
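
For reference, the multiply-by-reciprocal sequence can be modelled in C. This
is only a sketch under stated assumptions: the magic constant 1152921497
(0x44b82000 | 3993) and the divisor 1000000007 (0x3b9ac000 | 2567) are read
off the lu12i.w/ori pairs in the listing, the function name is made up for
illustration, and `>>` on a negative signed value is assumed to be an
arithmetic shift, as it is with GCC.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the mulh.w/srai.w/mul.w/sub.w sequence:
   computes a % 1000000007 without a division instruction.  */
int32_t
mod_by_reciprocal (int32_t a)
{
  const int64_t magic = 1152921497;   /* ceil (2^60 / 1000000007) */
  const int32_t divisor = 1000000007;

  /* mulh.w: high 32 bits of the 64-bit product.  */
  int32_t hi = (int32_t) (((int64_t) a * magic) >> 32);

  /* srai.w by 28, then subtract the sign word (a >> 31 is 0 or -1) so
     that the quotient is rounded toward zero.  */
  int32_t q = (hi >> 28) - (a >> 31);

  /* mul.w + sub.w: remainder = a - q * divisor.  */
  return a - q * divisor;
}
```

Every step here is a pipelined multiply, shift or subtract, which is why the
longer sequence can still win on throughput even though its latency is higher.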

Thanks!


gcc/ChangeLog:

* config/loongarch/loongarch-def.cc
(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Increase
default division cost to the average of the best case and worst
case scenarios observed.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/div-const-reduction.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch-def.cc| 8 
  gcc/testsuite/gcc.target/loongarch/div-const-reduction.c | 9 +
  2 files changed, 13 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-const-reduction.c

diff --git a/gcc/config/loongarch/loongarch-def.cc 
b/gcc/config/loongarch/loongarch-def.cc
index e8c129ce643..93e72a520d5 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -95,12 +95,12 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
: fp_add (COSTS_N_INSNS (5)),
  fp_mult_sf (COSTS_N_INSNS (5)),
  fp_mult_df (COSTS_N_INSNS (5)),
-fp_div_sf (COSTS_N_INSNS (8)),
-fp_div_df (COSTS_N_INSNS (8)),
+fp_div_sf (COSTS_N_INSNS (12)),
+fp_div_df (COSTS_N_INSNS (15)),
  int_mult_si (COSTS_N_INSNS (4)),
  int_mult_di (COSTS_N_INSNS (4)),
-int_div_si (COSTS_N_INSNS (5)),
-int_div_di (COSTS_N_INSNS (5)),
+int_div_si (COSTS_N_INSNS (14)),
+int_div_di (COSTS_N_INSNS (22)),
  movcf2gr (COSTS_N_INSNS (7)),
  movgr2cf (COSTS_N_INSNS (15)),
  branch_cost (6),
diff --git a/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c 
b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
new file mode 100644
index 000..0ee86410dd7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=la464" } */
+/* { dg-final { scan-assembler-not "div\.\[dw\]" } } */
+
+int
+test (int a)
+{
+  return a % 17;
+}




[Bug tree-optimization/114027] [11/12 Regression] miscompile at `-O3 -fno-vect-cost-model -msse4.2`

2024-03-26 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

chenglulu  changed:

   What|Removed |Added

 CC||chenglulu at loongson dot cn

--- Comment #17 from chenglulu  ---
(In reply to Richard Biener from comment #14)
> int __attribute__((noipa))
> foo (int *f, int n)
> {
>   int res = 0;
>   for (int i = 0; i < n; ++i)
> {
>   if (f[2*i])
> res = 2;
>   if (f[2*i+1])
> res = -2;

Sorry, I have a question: the array f has 16 elements and the value of n is
16, so when the value of i is greater than 7, isn't it out of bounds to access
f[2*i] and f[2*i+1]?
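
The arithmetic behind this question can be checked with a small sketch. The
helper names are hypothetical; the only facts taken from the thread are that
the loop reads f[2*i] and f[2*i+1] for every i in [0, n).

```c
#include <stddef.h>

/* Largest index the loop body touches: f[2*i+1] at i == n - 1.
   Assumes n >= 1.  */
size_t
highest_index_read (size_t n)
{
  return 2 * (n - 1) + 1;
}

/* The loop stays in bounds only if 2 * n <= the number of elements.  */
int
loop_in_bounds (size_t n, size_t elems)
{
  return 2 * n <= elems;
}
```

With n = 16 the highest index read is 31, so a 16-element array would only be
safe for n up to 8.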

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-03-25 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #17 from chenglulu  ---
(In reply to Xi Ruoyao from comment #15)
> > Hi, Ruoyao:
> > 
> > The SPEC2006 results on the 3A6000 are in. After removing the more
> > volatile test items, the parameter set '-falign-loops=8
> > -falign-functions=8 -falign-jumps=32 -falign-labels=4' got the highest
> > score. This is the same combination of parameters as in the CoreMark
> > test by Xu Chenghua.
> > 
> > The test on the 3A5000 will also be completed around the 15th of this
> > month, so I want to change the code after the 3A5000 results are out.
> > What do you think?
> 
> Ok to me.
> 
> I'm getting some different results on LA664:
> 
> 22031.284424 Compiler flags : -O2 -falign-labels=4 -falign-functions=8
> -falign-loops=8 -falign-jumps=32 -DPERFORMANCE_RUN=1 -lrt
> 
> vs the "best" one:
> 
> 22075.055188 Compiler flags : -O2 -falign-labels=4 -falign-functions=32
> -falign-loops=16 -falign-jumps=8 -DPERFORMANCE_RUN=1 -lrt
> 
> maybe such a 0.1% difference is some random fluctuation, or hardware or
> kernel configuration difference anyway.

The results of spec2006 on LA464 are:
-falign-labels=4 -falign-functions=32 -falign-loops=16 -falign-jumps=16

Re: [pushed][PATCH v2 0/3] LoongArch: Cleanup unused/redundant codes.

2024-03-19 Thread chenglulu

Pushed to r14-9562...r14-9564.

On 2024/3/15 at 9:30 AM, Chenghui Pan wrote:

Changes from v1: Some corrections to the ChangeLog format.

There are some unused/redundant definitions in the LoongArch target support
code; these patches do a simple cleanup. Regression tests passed.

Chenghui Pan (3):
   LoongArch: Remove unused/useless definitions.
   LoongArch: Change loongarch_expand_vec_cmp()'s return type from bool
 to void.
   LoongArch: Combine UNITS_PER_FP_REG and UNITS_PER_FPREG macros.

  gcc/config/loongarch/lasx.md|  6 ++--
  gcc/config/loongarch/loongarch-protos.h |  7 +
  gcc/config/loongarch/loongarch.cc   | 39 -
  gcc/config/loongarch/loongarch.h|  7 ++---
  gcc/config/loongarch/lsx.md |  6 ++--
  5 files changed, 13 insertions(+), 52 deletions(-)





Re: [PATCH] LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-18 Thread chenglulu



On 2024/3/18 at 5:34 PM, Xi Ruoyao wrote:

We were assuming TYPE_NO_NAMED_ARGS_STDARG_P functions don't have any named
arguments and there is nothing to advance, but that is not the case
for (...) functions returning by hidden reference which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-6.c and
gcc.dg/c23-stdarg-8.c to fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.
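
The fixed condition can be written out as a tiny predicate. This is a
simplified model, not the GCC code: `no_named_args_stdarg` stands in for
TYPE_NO_NAMED_ARGS_STDARG_P (...) and `arg_type` for arg.type, and the
function name is invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the local copy of CUM must be advanced past the
   last named argument.  A non-NULL arg_type models the artificial
   hidden-reference argument of a (...) function that returns a large
   aggregate.  */
bool
should_advance_past_last_named_arg (bool no_named_args_stdarg,
                                    const void *arg_type)
{
  return !no_named_args_stdarg || arg_type != NULL;
}
```

Before the fix the second half of the condition was missing, so the hidden
argument of such a (...) function was never skipped.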

gcc/ChangeLog:

PR target/114175
* config/loongarch/loongarch.cc
(loongarch_setup_incoming_varargs): Only skip
loongarch_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
functions if arg.type is NULL.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch.cc | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 70e31bb831c..57de8ef7d20 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -767,7 +767,8 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum,
   argument.  Advance a local copy of CUM past the last "real" named
   argument, to find out how many registers are left over.  */
local_cum = *get_cumulative_args (cum);

I think it's important to add a comment here:

    /* When there is no hidden return argument passed, arg.type
       is always NULL.  */

Otherwise LGTM.

Thanks!


-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
  loongarch_function_arg_advance (pack_cumulative_args (_cum), arg);
  
/* Found out how many registers we need to save.  */




Re:[pushed] [PATCH v2] LoongArch: Remove masking process for operand 3 of xvpermi.q.

2024-03-14 Thread chenglulu

Pushed to r14-9486.

On 2024/3/14 at 9:26 AM, Chenghui Pan wrote:

The behavior of non-zero unused bits in the third operand of the
xvpermi.q instruction is undefined on LoongArch. Following our
discussion (https://github.com/llvm/llvm-project/pull/83540),
we think that leaving the original insn operand unmodified
is the better solution.

This patch partially reverts 7b158e036a95b1ab40793dd53bed7dbd770ffdaf.
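
For illustration, the masking that this patch removes amounts to the
following (the function name is hypothetical; the 0x33 mask is taken from the
removed lines in lasx.md). It also accounts for the new immediates in the
updated test: the old values 0x2a, 0xb9 and 0xca used to be masked to 0x22,
0x31 and 0x02, which the test now passes directly.

```c
/* The old behaviour: only bits 0-1 and 4-5 of the immediate were kept
   before the instruction was emitted.  */
unsigned int
old_xvpermi_q_mask (unsigned int imm)
{
  return imm & 0x33u;
}
```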

gcc/ChangeLog:

* config/loongarch/lasx.md (lasx_xvpermi_q_):
Remove masking of operand 3.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c:
Reposition operand 3's value into instruction's defined accept range.
---
  gcc/config/loongarch/lasx.md| 5 -
  .../gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c   | 6 +++---
  2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index ac84db7f0ce..3f25c0c1756 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -640,8 +640,6 @@ (define_insn "lasx_xvpermi_d__1"
 (set_attr "mode" "")])
  
  ;; xvpermi.q

-;; Unused bits in operands[3] need be set to 0 to avoid
-;; causing undefined behavior on LA464.
  (define_insn "lasx_xvpermi_q_"
[(set (match_operand:LASX 0 "register_operand" "=f")
(unspec:LASX
@@ -651,9 +649,6 @@ (define_insn "lasx_xvpermi_q_"
  UNSPEC_LASX_XVPERMI_Q))]
"ISA_HAS_LASX"
  {
-  int mask = 0x33;
-  mask &= INTVAL (operands[3]);
-  operands[3] = GEN_INT (mask);
return "xvpermi.q\t%u0,%u2,%3";
  }
[(set_attr "type" "simd_splat")
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c 
b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c
index dbc29d2fb22..f89dfc31120 100644
--- a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c
@@ -27,7 +27,7 @@ main ()
*((unsigned long*)& __m256i_result[2]) = 0x7fff7fff7fff;
*((unsigned long*)& __m256i_result[1]) = 0x7fe37fe3001d001d;
*((unsigned long*)& __m256i_result[0]) = 0x7fff7fff7fff;
-  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0x2a);
+  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0x22);
ASSERTEQ_64 (__LINE__, __m256i_result, __m256i_out);
  
*((unsigned long*)& __m256i_op0[3]) = 0x;

@@ -42,7 +42,7 @@ main ()
*((unsigned long*)& __m256i_result[2]) = 0x0019001c;
*((unsigned long*)& __m256i_result[1]) = 0x;
*((unsigned long*)& __m256i_result[0]) = 0x01fe;
-  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0xb9);
+  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0x31);
ASSERTEQ_64 (__LINE__, __m256i_result, __m256i_out);
  
*((unsigned long*)& __m256i_op0[3]) = 0x00ff00ff00ff00ff;

@@ -57,7 +57,7 @@ main ()
*((unsigned long*)& __m256i_result[2]) = 0x;
*((unsigned long*)& __m256i_result[1]) = 0x00ff00ff00ff00ff;
*((unsigned long*)& __m256i_result[0]) = 0x00ff00ff00ff00ff;
-  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0xca);
+  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0x02);
ASSERTEQ_64 (__LINE__, __m256i_result, __m256i_out);
  
return 0;




Re: [PATCH] LoongArch: Remove unused and incorrect "sge_" define_insn

2024-03-14 Thread chenglulu



On 2024/3/13 at 9:03 PM, Xi Ruoyao wrote:

If this insn is really used, we'll have something like

 slti $r4,$r0,$r5

in the code.  The assembler will reject it because slti wants 2
register operands and 1 immediate operand.  But we have not received any bug
report for this, indicating that this define_insn is not used at all.

Note that do_store_flag (in expr.cc) is already converting x >= 1 to
x > 0 unconditionally, so this define_insn is indeed unused and we can
just remove it.
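
The integer identity that makes the pattern redundant can be spelled out in
C. This sketch does not reproduce do_store_flag; the function names are
invented, and it only shows that "x >= 1" and "x > 0" agree for both the
signed (ge/gt) and unsigned (geu/gtu) comparisons.

```c
/* Signed case: for any int, x >= 1 holds exactly when x > 0.  */
int sge_one (int x) { return x >= 1; }
int sgt_zero (int x) { return x > 0; }

/* Unsigned case: x >= 1u holds exactly when x > 0u.  */
int sgeu_one (unsigned int x) { return x >= 1u; }
int sgtu_zero (unsigned int x) { return x > 0u; }
```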

gcc/ChangeLog:

* config/loongarch/loongarch.md (any_ge): Remove.
(sge_): Remove.
---

Not fully tested but should be obvious.  Ok for trunk?


LGTM!

Thanks!



  gcc/config/loongarch/loongarch.md | 10 --
  1 file changed, 10 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 525e1e82183..18fd9c1e7d5 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -517,7 +517,6 @@ (define_code_iterator equality_op [eq ne])
  ;; These code iterators allow the signed and unsigned scc operations to use
  ;; the same template.
  (define_code_iterator any_gt [gt gtu])
-(define_code_iterator any_ge [ge geu])
  (define_code_iterator any_lt [lt ltu])
  (define_code_iterator any_le [le leu])
  
@@ -3355,15 +3354,6 @@ (define_insn "*sgt_"

[(set_attr "type" "slt")
 (set_attr "mode" "")])
  
-(define_insn "*sge_"

-  [(set (match_operand:GPR 0 "register_operand" "=r")
-   (any_ge:GPR (match_operand:X 1 "register_operand" "r")
-(const_int 1)))]
-  ""
-  "slti\t%0,%.,%1"
-  [(set_attr "type" "slt")
-   (set_attr "mode" "")])
-
  (define_insn "*slt_"
[(set (match_operand:GPR 0 "register_operand" "=r")
(any_lt:GPR (match_operand:X 1 "register_operand" "r")




Re: [pushed][PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-08 Thread chenglulu



On 2024/3/9 at 9:48 AM, chenglulu wrote:

Pushed to r14-9407.

Cherry picked to r13-8413 and r12-10200.


On 2024/3/7 at 9:12 AM, Lulu Cheng wrote:
If the hardware does not support LAMCAS, atomic_compare_and_swapsi needs to be
implemented through "ll.w+sc.w". In the implementation of the instruction
sequence, it is necessary to determine whether the two registers are equal.
Since LoongArch's comparison instructions do not distinguish between 32-bit
and 64-bit, the two operand registers that need to be compared must be
sign-extended. One of the operand registers is obtained from memory through
the "ll.w" instruction, which ensures that the sign extension is carried out.
However, the value of the other operand register is not guaranteed to be
sign-extended.

gcc/ChangeLog:

* config/loongarch/sync.md (atomic_cas_value_strong):
In loongarch64, a sign extension operation is added when
operands[2] is a register operand and the mode is SImode.

gcc/testsuite/ChangeLog:

* g++.target/loongarch/atomic-cas-int.C: New test.
---
  gcc/config/loongarch/sync.md  | 46 ++-
  .../g++.target/loongarch/atomic-cas-int.C | 32 +
  2 files changed, 67 insertions(+), 11 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/loongarch/atomic-cas-int.C

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 8f35a5b48d2..d41c2d26811 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -245,18 +245,42 @@ (define_insn "atomic_cas_value_strong"
 (clobber (match_scratch:GPR 5 "="))]
    ""
  {
-  return "1:\\n\\t"
- "ll.\\t%0,%1\\n\\t"
- "bne\\t%0,%z2,2f\\n\\t"
- "or%i3\\t%5,$zero,%3\\n\\t"
- "sc.\\t%5,%1\\n\\t"
- "beqz\\t%5,1b\\n\\t"
- "b\\t3f\\n\\t"
- "2:\\n\\t"
- "%G4\\n\\t"
- "3:\\n\\t";
+  output_asm_insn ("1:", operands);
+  output_asm_insn ("ll.\t%0,%1", operands);
+
+  /* Like the test case atomic-cas-int.C, in loongarch64, O1 and higher, the
+ return value of the val_without_const_folding will not be truncated and
+ will be passed directly to the function compare_exchange_strong.
+ However, the instruction 'bne' does not distinguish between 32-bit and
+ 64-bit operations.  so if the upper 32 bits of the register are not
+ extended by the 32nd bit symbol, then the comparison may not be valid
+ here.  This will affect the result of the operation.  */
+
+  if (TARGET_64BIT && REG_P (operands[2])
+  && GET_MODE (operands[2]) == SImode)
+    {
+  output_asm_insn ("addi.w\t%5,%2,0", operands);
+  output_asm_insn ("bne\t%0,%5,2f", operands);
+    }
+  else
+    output_asm_insn ("bne\t%0,%z2,2f", operands);
+
+  output_asm_insn ("or%i3\t%5,$zero,%3", operands);
+  output_asm_insn ("sc.\t%5,%1", operands);
+  output_asm_insn ("beqz\t%5,1b", operands);
+  output_asm_insn ("b\t3f", operands);
+  output_asm_insn ("2:", operands);
+  output_asm_insn ("%G4", operands);
+  output_asm_insn ("3:", operands);
+
+  return "";
  }
-  [(set (attr "length") (const_int 28))])
+  [(set (attr "length")
+ (if_then_else
+    (and (match_test "GET_MODE (operands[2]) == SImode")
+ (match_test "REG_P (operands[2])"))
+    (const_int 32)
+    (const_int 28)))])
    (define_insn "atomic_cas_value_strong_amcas"
    [(set (match_operand:QHWD 0 "register_operand" "=")
diff --git a/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
new file mode 100644
index 000..830ce48267a
--- /dev/null
+++ b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include 
+#include 
+
+__attribute__ ((noinline)) long
+val_without_const_folding (long val)
+{
+  return val;
+}
+
+int
+main ()
+{
+  int oldval = 0xaa;
+  int newval = 0xbb;
+  std::atomic<int> amo;
+
+  amo.store (oldval);
+
+  long longval = val_without_const_folding (0xff80 + oldval);
+  oldval = static_cast<int> (longval);
+
+  amo.compare_exchange_strong (oldval, newval);
+
+  if (newval != amo.load (std::memory_order_relaxed))
+    __builtin_abort ();
+
+  return 0;
+}
+




Re: [pushed][PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-08 Thread chenglulu

Pushed to r14-9407.

On 2024/3/7 at 9:12 AM, Lulu Cheng wrote:

If the hardware does not support LAMCAS, atomic_compare_and_swapsi needs to be
implemented through "ll.w+sc.w". The instruction sequence has to determine
whether two registers are equal.
Since LoongArch's comparison instructions do not distinguish between 32-bit
and 64-bit operands, both registers being compared must hold sign-extended
values. One of them is loaded from memory by the "ll.w" instruction, which
guarantees that the sign extension is carried out.
However, the value in the other operand register is not guaranteed to be
sign-extended.
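
The problem described above can be modelled in portable C. This is only a sketch under stated assumptions: the helper names are hypothetical, registers are modelled as uint64_t, and the "dirty" upper bits used in the usage note are an arbitrary value chosen for illustration, not the constant from the test case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models `addi.w rd, rj, 0` on LoongArch64: keep the low 32 bits of the
   register and sign-extend them to 64 bits.  Hypothetical helper.  */
static uint64_t
sign_extend_low32 (uint64_t reg)
{
  uint64_t low = reg & 0xffffffffu;
  /* Replicate bit 31 into bits 63..32 using unsigned arithmetic only.  */
  return (low ^ 0x80000000u) - 0x80000000u;
}

/* Models `bne`: the branch is taken when the full 64-bit registers differ,
   regardless of the 32-bit type the values are meant to have.  */
static bool
bne_taken (uint64_t a, uint64_t b)
{
  return a != b;
}

/* Comparison as in the old sequence: the expected value is used as-is.  */
static bool
mismatch_without_extension (uint64_t loaded, uint64_t expected)
{
  return bne_taken (loaded, expected);
}

/* Comparison as in the patched sequence: the expected value is
   sign-extended into a scratch register first.  */
static bool
mismatch_with_extension (uint64_t loaded, uint64_t expected)
{
  return bne_taken (loaded, sign_extend_low32 (expected));
}
```

With loaded = 0xaa (what ll.w would produce for a stored 0xaa) and expected = 0x1234567800000000 + 0xaa, mismatch_without_extension reports a difference although the low 32 bits agree, while mismatch_with_extension does not.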

gcc/ChangeLog:

* config/loongarch/sync.md (atomic_cas_value_strong):
In loongarch64, a sign extension operation is added when
operands[2] is a register operand and the mode is SImode.

gcc/testsuite/ChangeLog:

* g++.target/loongarch/atomic-cas-int.C: New test.
---
  gcc/config/loongarch/sync.md  | 46 ++-
  .../g++.target/loongarch/atomic-cas-int.C | 32 +
  2 files changed, 67 insertions(+), 11 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/loongarch/atomic-cas-int.C

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 8f35a5b48d2..d41c2d26811 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -245,18 +245,42 @@ (define_insn "atomic_cas_value_strong"
 (clobber (match_scratch:GPR 5 "="))]
""
  {
-  return "1:\\n\\t"
-"ll.\\t%0,%1\\n\\t"
-"bne\\t%0,%z2,2f\\n\\t"
-"or%i3\\t%5,$zero,%3\\n\\t"
-"sc.\\t%5,%1\\n\\t"
-"beqz\\t%5,1b\\n\\t"
-"b\\t3f\\n\\t"
-"2:\\n\\t"
-"%G4\\n\\t"
-"3:\\n\\t";
+  output_asm_insn ("1:", operands);
+  output_asm_insn ("ll.\t%0,%1", operands);
+
+  /* Like the test case atomic-cas-int.C, in loongarch64, O1 and higher, the
+ return value of the val_without_const_folding will not be truncated and
+ will be passed directly to the function compare_exchange_strong.
+ However, the instruction 'bne' does not distinguish between 32-bit and
+ 64-bit operations.  so if the upper 32 bits of the register are not
+ extended by the 32nd bit symbol, then the comparison may not be valid
+ here.  This will affect the result of the operation.  */
+
+  if (TARGET_64BIT && REG_P (operands[2])
+  && GET_MODE (operands[2]) == SImode)
+{
+  output_asm_insn ("addi.w\t%5,%2,0", operands);
+  output_asm_insn ("bne\t%0,%5,2f", operands);
+}
+  else
+output_asm_insn ("bne\t%0,%z2,2f", operands);
+
+  output_asm_insn ("or%i3\t%5,$zero,%3", operands);
+  output_asm_insn ("sc.\t%5,%1", operands);
+  output_asm_insn ("beqz\t%5,1b", operands);
+  output_asm_insn ("b\t3f", operands);
+  output_asm_insn ("2:", operands);
+  output_asm_insn ("%G4", operands);
+  output_asm_insn ("3:", operands);
+
+  return "";
  }
-  [(set (attr "length") (const_int 28))])
+  [(set (attr "length")
+ (if_then_else
+   (and (match_test "GET_MODE (operands[2]) == SImode")
+(match_test "REG_P (operands[2])"))
+   (const_int 32)
+   (const_int 28)))])
  
  (define_insn "atomic_cas_value_strong_amcas"

[(set (match_operand:QHWD 0 "register_operand" "=")
diff --git a/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
new file mode 100644
index 000..830ce48267a
--- /dev/null
+++ b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include 
+#include 
+
+__attribute__ ((noinline)) long
+val_without_const_folding (long val)
+{
+  return val;
+}
+
+int
+main ()
+{
+  int oldval = 0xaa;
+  int newval = 0xbb;
+  std::atomic<int> amo;
+
+  amo.store (oldval);
+
+  long longval = val_without_const_folding (0xff80 + oldval);
+  oldval = static_cast<int> (longval);
+
+  amo.compare_exchange_strong (oldval, newval);
+
+  if (newval != amo.load (std::memory_order_relaxed))
+__builtin_abort ();
+
+  return 0;
+}
+




Re: [pushed][PATCH] LoongArch: testsuite: Add compilation options to the regname-fp-s9.c.

2024-03-08 Thread chenglulu

Pushed to r14-9408.

On 2024/3/7 at 9:50 AM, Lulu Cheng wrote:

When the value of the macro DEFAULT_CFLAGS is set to '-ansi -pedantic-errors',
the regname-fp-s9.c test fails. To solve this problem, add the compilation
option '-Wno-pedantic -std=gnu90' to this test case.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/regname-fp-s9.c: Add compilation option
'-Wno-pedantic -std=gnu90'.
---
  gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
index d2e3b80f83c..77a74f1f667 100644
--- a/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
+++ b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
@@ -1,3 +1,4 @@
  /* { dg-do compile } */
+/* { dg-additional-options "-Wno-pedantic -std=gnu90" } */
  register long s9 asm("s9"); /* { dg-note "conflicts with 's9'" } */
  register long fp asm("fp"); /* { dg-warning "register of 'fp' used for multiple global register variables" } */




Re: [PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-08 Thread chenglulu



On 2024/3/8 at 2:22 PM, Xi Ruoyao wrote:
> On Thu, 2024-03-07 at 21:07 +0800, chenglulu wrote:
> > On 2024/3/7 at 8:52 PM, Xi Ruoyao wrote:
> > > It should be better to extend the expected value before the ll/sc loop
> > > (like what LLVM does), instead of repeating the extending in each
> > > iteration.  Something like:
> > 
> > I wanted to do this at first, but it didn't work out.
> > 
> > But then I thought about it, and there are two benefits to putting it in
> > the middle of ll/sc:
> > 
> > 1. If there is an operation that uses the $r4 register after this atomic
> > operation, another register is required to store $r4.
> > 
> > 2. ll.w requires long cycles, so putting an addi.w command after ll.w
> > won't make a difference.
> > 
> > So based on the above, I didn't try again, but directly made the
> > modification as in the patch.
> 
> Ah, the explanation makes sense to me.  Ok with the original patch then.

Ok. Thank you so much! :-)



Re: [PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-07 Thread chenglulu



On 2024/3/7 at 8:52 PM, Xi Ruoyao wrote:
> It should be better to extend the expected value before the ll/sc loop
> (like what LLVM does), instead of repeating the extending in each
> iteration.  Something like:

I wanted to do this at first, but it didn't work out.

But then I thought about it, and there are two benefits to putting it in
the middle of ll/sc:

1. If there is an operation that uses the $r4 register after this atomic
operation, another register is required to store $r4.

2. ll.w requires long cycles, so putting an addi.w command after ll.w
won't make a difference.

So based on the above, I didn't try again, but directly made the
modification as in the patch.




[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-03-07 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #16 from chenglulu  ---
(In reply to Xi Ruoyao from comment #15)
> > Hi, Ruoyao:
> > 
> >  The results of spec2006 on 3A6000 were obtained, I removed the more volatile
> > test items, '-falign-loops=8 -falign-functions=8 -falign-jumps=32
> > -falign-labels=4' this set of parameters got the highest score. This is the
> > same combination of parameters as the coremark tested by Xu Chenghua.
> > 
> > The test of the 3A5000 will also be completed around the 15th of this month,
> > so I want to change the code after the test results of the 3a5000 are out.
> > What do you think?
> 
> Ok to me.
> 
> I'm getting some different results on LA664:
> 
> 22031.284424 Compiler flags : -O2 -falign-labels=4 -falign-functions=8
> -falign-loops=8 -falign-jumps=32 -DPERFORMANCE_RUN=1 -lrt
> 
> vs the "best" one:
> 
> 22075.055188 Compiler flags : -O2 -falign-labels=4 -falign-functions=32
> -falign-loops=16 -falign-jumps=8 -DPERFORMANCE_RUN=1 -lrt
> 
> maybe such a 0.1% difference is some random fluctuation, or hardware or
> kernel configuration difference anyway.

It's also possible that I'll find a few more machines to test the coremark
score.
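
For reference, the gap between the two coremark scores quoted above can be worked out directly. The helper below is a hypothetical illustration, not part of any benchmark tooling; it only computes the relative difference and says nothing about whether that difference is noise.

```c
/* Relative difference of CANDIDATE over BASELINE, in percent.
   Hypothetical helper used only to put the two quoted scores in
   perspective.  */
static double
relative_gain_percent (double baseline, double candidate)
{
  return (candidate - baseline) / baseline * 100.0;
}
```

Calling relative_gain_percent (22031.284424, 22075.055188) gives roughly 0.2, i.e. the "best" flag set is ahead by about 0.2% on that LA664 run, which is small enough to be consistent with the fluctuation mentioned in the comment.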

Re:[pushed] [PATCH v1] LoongArch: testsuite:Fix problems with incorrect results in vector test cases.

2024-03-07 Thread chenglulu

Pushed to r14-9352.

On 2024/3/6 at 4:54 PM, chenxiaolong wrote:

In simd_correctness_check.h, the role of the macro ASSERTEQ_64 is to check the
result of the passed vector values for the 64-bit data of each array element.
It turns out that it uses the abs() function, which checks only the lower
32 bits of the data at a time, so this patch replaces abs() with the llabs()
function.
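
The abs()/llabs() difference can be shown with a small sketch. The two functions below are hypothetical stand-ins, not the real ASSERTEQ_64 macro; the explicit cast to int models the narrowing that happens when a 64-bit difference is passed to abs(), and the result of that out-of-range conversion is implementation-defined (GCC keeps the low 32 bits).

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical check built on abs(): the 64-bit difference is narrowed to
   int first, so differences that are a multiple of 2^32 look like zero
   (with GCC's implementation-defined narrowing).  */
static bool
equal_via_abs (long long expected, long long actual)
{
  return abs ((int) (expected - actual)) == 0;
}

/* Hypothetical check built on llabs(): the full 64-bit difference is
   examined.  */
static bool
equal_via_llabs (long long expected, long long actual)
{
  return llabs (expected - actual) == 0;
}
```

For expected = 0x100000005 and actual = 5, equal_via_abs wrongly reports equality on GCC, while equal_via_llabs reports the mismatch.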

However, the following two problems may occur after this modification:

1. FAIL in lasx-xvfrint_s.c and lsx-vfrint_s.c
The cause of the error is that vector test cases that use __m{128,256} to
define vector types are composed of 32-bit primitive types, so they should use
ASSERTEQ_32 instead of ASSERTEQ_64 to check for correctness.

2. FAIL in lasx-xvshuf_b.c and lsx-vshuf.c
The cause of the error is that the expected result set in the test case is
incorrect.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c: Replace
ASSERTEQ_64 with the macro ASSERTEQ_32.
* gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c: Modify the expected
test results of some functions according to the function of the vector
instruction.
* gcc.target/loongarch/vector/lsx/lsx-vfrint_s.c: Same
modification as lasx-xvfrint_s.c.
* gcc.target/loongarch/vector/lsx/lsx-vshuf.c: Same
modification as lasx-xvshuf_b.c.
* gcc.target/loongarch/vector/simd_correctness_check.h: Use the llabs()
function instead of abs() to check the correctness of the results.
---
  .../loongarch/vector/lasx/lasx-xvfrint_s.c| 58 +--
  .../loongarch/vector/lasx/lasx-xvshuf_b.c | 14 ++---
  .../loongarch/vector/lsx/lsx-vfrint_s.c   | 50 
  .../loongarch/vector/lsx/lsx-vshuf.c  | 12 ++--
  .../loongarch/vector/simd_correctness_check.h |  2 +-
  5 files changed, 68 insertions(+), 68 deletions(-)

diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c
index fbfe300eac4..4538528a67f 100644
--- a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c
@@ -184,7 +184,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
  
*((int *)&__m256_op0[7]) = 0x;

*((int *)&__m256_op0[6]) = 0x;
@@ -203,7 +203,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
  
*((int *)&__m256_op0[7]) = 0x;

*((int *)&__m256_op0[6]) = 0x;
@@ -222,7 +222,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
  
*((int *)&__m256_op0[7]) = 0x01010101;

*((int *)&__m256_op0[6]) = 0x01010101;
@@ -241,7 +241,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
  
*((int *)&__m256_op0[7]) = 0x;

*((int *)&__m256_op0[6]) = 0x;
@@ -260,7 +260,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
  
*((int *)&__m256_op0[7]) = 0x;

*((int *)&__m256_op0[6]) = 0x;
@@ -279,7 +279,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
  
*((int *)&__m256_op0[7]) = 0x;

*((int *)&__m256_op0[6]) = 0x;
@@ -298,7 +298,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
  
*((int *)&__m256_op0[7]) = 0x01010101;

*((int *)&__m256_op0[6]) = 0x01010101;
@@ -317,7 +317,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 

Re:[pushed] [PATCH] LoongArch: Use /lib instead of /lib64 as the library search path for MUSL.

2024-03-07 Thread chenglulu

Pushed to r14-9351.

On 2024/3/6 at 9:19 AM, Yang Yujie wrote:

gcc/ChangeLog:

* config.gcc: Add a case for loongarch*-*-linux-musl*.
* config/loongarch/linux.h: Disable the multilib-compatible
treatment for *musl* targets.
* config/loongarch/musl.h: New file.
---
  gcc/config.gcc   |  3 +++
  gcc/config/loongarch/linux.h |  4 +++-
  gcc/config/loongarch/musl.h  | 23 +++
  3 files changed, 29 insertions(+), 1 deletion(-)
  create mode 100644 gcc/config/loongarch/musl.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index a1480b72c46..3293be16699 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2538,6 +2538,9 @@ riscv*-*-freebsd*)
  
  loongarch*-*-linux*)
tm_file="elfos.h gnu-user.h linux.h linux-android.h glibc-stdint.h ${tm_file}"
+case ${target} in
+ *-linux-musl*) tm_file="${tm_file} loongarch/musl.h"
+   esac
tm_file="${tm_file} loongarch/gnu-user.h loongarch/linux.h loongarch/loongarch-driver.h"
extra_options="${extra_options} linux-android.opt"
tmake_file="${tmake_file} loongarch/t-multilib loongarch/t-linux"
diff --git a/gcc/config/loongarch/linux.h b/gcc/config/loongarch/linux.h
index 17d9f87537b..40d9ba6d405 100644
--- a/gcc/config/loongarch/linux.h
+++ b/gcc/config/loongarch/linux.h
@@ -21,7 +21,9 @@ along with GCC; see the file COPYING3.  If not see
   * This ensures that a compiler configured with --disable-multilib
   * can work in a multilib environment.  */
  
-#if defined(LA_DISABLE_MULTILIB) && defined(LA_DISABLE_MULTIARCH)
+#if !defined(LA_DEFAULT_TARGET_MUSL) \
+  && defined(LA_DISABLE_MULTILIB) \
+  && defined(LA_DISABLE_MULTIARCH)
  
  #if DEFAULT_ABI_BASE == ABI_BASE_LP64D
  #define ABI_LIBDIR "lib64"
diff --git a/gcc/config/loongarch/musl.h b/gcc/config/loongarch/musl.h
new file mode 100644
index 000..fa43bc86606
--- /dev/null
+++ b/gcc/config/loongarch/musl.h
@@ -0,0 +1,23 @@
+/* Definitions for MUSL C library support.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+
+#ifndef LA_DEFAULT_TARGET_MUSL
+#define LA_DEFAULT_TARGET_MUSL
+#endif
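
The effect of the new preprocessor condition can be sketched as an ordinary function. This is a simplified model, not the real header: the function name is hypothetical, only the LP64D base ABI from the hunk is modelled, and the "lib" fallback is an assumption taken from the subject line ("Use /lib instead of /lib64").

```c
#include <stdbool.h>

/* Simplified model of the ABI_LIBDIR selection in linux.h after the patch.
   Each parameter stands for whether the corresponding macro
   (LA_DEFAULT_TARGET_MUSL, LA_DISABLE_MULTILIB, LA_DISABLE_MULTIARCH)
   is defined.  */
static const char *
abi_libdir (bool target_musl, bool disable_multilib, bool disable_multiarch)
{
  /* The multilib-compatible treatment now applies only to non-musl
     targets.  */
  if (!target_musl && disable_multilib && disable_multiarch)
    return "lib64";

  /* Assumed fallback when the block above is skipped.  */
  return "lib";
}
```

Under this model a musl target gets "lib" even when both multilib and multiarch are disabled, whereas a non-musl target in the same configuration keeps "lib64".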




Re: [PATCH v2] LoongArch: Add support for TLS descriptors

2024-03-07 Thread chenglulu


On 2024/3/1 at 5:39 PM, mengqinggang wrote:

Thanks, I will try to send a new version of the patch next week.


On 2024/2/29 at 2:08 PM, Xi Ruoyao wrote:

On Thu, 2024-02-29 at 09:42 +0800, mengqinggang wrote:

Generate la.tls.desc macro instruction for TLS descriptors model.

la.tls.desc expands to
   pcalau12i $a0, %desc_pc_hi20(a)
   ld.d  $a1, $a0, %desc_ld_pc_lo12(a)
   addi.d    $a0, $a0, %desc_add_pc_lo12(a)
   jirl  $ra, $a1, %desc_call(a)

The default is TLS descriptors, but this can be configured with
-mtls-dialect={desc,trad}.

Please keep trad as the default for now.  Glibc-2.40 will be released
after GCC 14.1 but we don't want to end up in a situation where the
default configuration of the latest GCC release creating something not
working with latest Glibc release.

And there's also musl libc we need to take into account.

Or you can write some autoconf test for if the assembler supports
tlsdesc and check TARGET_GLIBC_MAJOR & TARGET_GLIBC_MINOR for Glibc
version to decide if enable desc by default.  If you want this but don't
have time to implement you can leave trad the default and I'll take care
of this.
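
The default-selection idea suggested above could look roughly like the following sketch. It is an assumption-laden illustration, not code from the patch: the function name is hypothetical, the 2.40 threshold is only inferred from the remark about Glibc-2.40, the assembler check is reduced to a boolean, and musl is not modelled. The TLS_TRADITIONAL and TLS_DESCRIPTORS names follow the attached patch.

```c
#include <stdbool.h>

/* TLS dialect values, named as in the attached patch.  */
enum { TLS_TRADITIONAL = 0, TLS_DESCRIPTORS = 1 };

/* Hypothetical default selection: pick descriptors only when both the
   assembler and the target Glibc are known to support them.  The 2.40
   threshold is an assumption based on the discussion, not a confirmed
   value.  */
static int
default_tls_dialect (bool assembler_has_tlsdesc, int glibc_major,
                     int glibc_minor)
{
  bool glibc_ok = glibc_major > 2 || (glibc_major == 2 && glibc_minor >= 40);

  if (assembler_has_tlsdesc && glibc_ok)
    return TLS_DESCRIPTORS;

  return TLS_TRADITIONAL;
}
```

In the real build this decision would live in configure logic rather than in C, so the function only illustrates the condition being discussed.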


I think the implementation of the options also needs to be tweaked.

I've modified a version and the patch is attached.

And here you need to add test cases.

Thanks.




/* snip */


+(define_insn "@got_load_tls_desc"
+  [(set (match_operand:P 0 "register_operand" "=r")
+    (unspec:P
+        [(match_operand:P 1 "symbolic_operand" "")]
+        UNSPEC_TLS_DESC))
+    (clobber (reg:SI FCC0_REGNUM))
+    (clobber (reg:SI FCC1_REGNUM))
+    (clobber (reg:SI FCC2_REGNUM))
+    (clobber (reg:SI FCC3_REGNUM))
+    (clobber (reg:SI FCC4_REGNUM))
+    (clobber (reg:SI FCC5_REGNUM))
+    (clobber (reg:SI FCC6_REGNUM))
+    (clobber (reg:SI FCC7_REGNUM))
+    (clobber (reg:SI A1_REGNUM))
+    (clobber (reg:SI RETURN_ADDR_REGNUM))]

Ok, the clobber list is correct.


+  "TARGET_TLS_DESC"
+  "la.tls.desc\t%0,%1"

With -mexplicit-relocs=always we should emit %desc_pc_lo12 etc. instead
of la.tls.desc.  As we don't want to add too much code we can just hard
code the 4 instructions here instead of splitting this insn, just
something like

{ return TARGET_EXPLICIT_RELOCS_ALWAS ? ".." : "la.tls.desc\t%0,%1"; }



+  [(set_attr "got" "load")
+   (set_attr "mode" "")])

We need (set_attr "length" "16") in this list as this actually expands
into 16 bytes.


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 01ddc1a92f6..955e74d3bf9 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2547,7 +2547,6 @@ loongarch*-*-linux*)
 	# Force .init_array support.  The configure script cannot always
 	# automatically detect that GAS supports it, yet we require it.
 	gcc_cv_initfini_array=yes
-	with_tls=${with_tls:-desc}
 	;;
 
 loongarch*-*-elf*)
@@ -5924,6 +5923,11 @@ case ${target} in
 		lasx)tm_defines="$tm_defines DEFAULT_ISA_EXT_SIMD=ISA_EXT_SIMD_LASX" ;;
 		esac
 
+		case ${with_tls} in
+		"" | trad)	tm_defines="$tm_defines DEFAULT_TLS_TYPE=TLS_TRADITIONAL" ;;
+		desc)		tm_defines="$tm_defines DEFAULT_TLS_TYPE=TLS_DESCRIPTORS" ;;
+		esac
+
 		tmake_file="loongarch/t-loongarch $tmake_file"
 		;;
 
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index 2cc943ef683..7de107c3e3d 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -264,7 +264,7 @@ TargetVariable
 HOST_WIDE_INT la_isa_evolution = 0
 
 Enum
-Name(tls_type) Type(enum loongarch_tls_type)
+Name(tls_type) Type(int)
 The possible TLS dialects:
 
 EnumValue
@@ -274,5 +274,5 @@ EnumValue
 Enum(tls_type) String(desc) Value(TLS_DESCRIPTORS)
 
 mtls-dialect=
-Target RejectNegative Joined Enum(tls_type) Var(loongarch_tls_dialect) Init(TLS_DESCRIPTORS) Save
+Target RejectNegative Joined Enum(tls_type) Var(la_opt_tls_dialect) Init(M_OPT_UNSET) Save
 Specify TLS dialect.
diff --git a/gcc/config/loongarch/loongarch-def.h b/gcc/config/loongarch/loongarch-def.h
index 2dbf006d013..48d60e2b456 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -175,6 +175,7 @@ struct loongarch_target
   int cpu_arch;	/* CPU_ */
   int cpu_tune;	/* same */
   int cmodel;	/* CMODEL_ */
+  int tls_dialect;  /* TLS_ */
 };
 
 /* CPU model */
@@ -188,6 +189,12 @@ enum {
   N_TUNE_TYPES	= 5
 };
 
+/* TLS types.  */
+enum {
+  TLS_TRADITIONAL = 0,
+  TLS_DESCRIPTORS = 1
+};
+
 /* CPU model properties */
 extern loongarch_def_array
   loongarch_cpu_strings;
diff --git a/gcc/config/loongarch/loongarch-driver.cc b/gcc/config/loongarch/loongarch-driver.cc
index 62658f531ad..8c4ed34698b 100644
--- a/gcc/config/loongarch/loongarch-driver.cc
+++ b/gcc/config/loongarch/loongarch-driver.cc
@@ -45,7 +45,7 @@ la_driver_init (int argc ATTRIBUTE_UNUSED, const char **argv ATTRIBUTE_UNUSED)
   /* Initialize all fields of la_target.  */
   loongarch_init_target (_target, M_OPT_UNSET, M_OPT_UNSET, M_OPT_UNSET,
 			 M_OPT_UNSET, M_OPT_UNSET, 

Re: [PATCH] LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation

2024-03-06 Thread chenglulu



On 2024/3/7 at 12:05 PM, mengqinggang wrote:

Hi,

Thanks, this patch is LGTM.


I don't have a problem either.

Thanks.




On 2024/3/7 at 10:56 AM, Xi Ruoyao wrote:

On Thu, 2024-03-07 at 10:43 +0800, mengqinggang wrote:

Hi,

Should we add an option to control the generation of R_LARCH_RELAX,
similar to the -mrelax/-mno-relax options of as?

There are already -mrelax and -mno-relax, they can be checked in the
compiler code with TARGET_LINKER_RELAXATION.

/* snip */


+    case 'Q':
+  if (!TARGET_LINKER_RELAXATION)
+ break;

So with -mno-relax we'll break early here, then no R_LARCH_RELAX will be
printed.


+  if (code == HIGH)
+ op = XEXP (op, 0);
+
+  if (loongarch_classify_symbolic_expression (op) == SYMBOL_TLS_IE)
+ fprintf (file, ".reloc\t.,R_LARCH_RELAX\n\t");
+
+  break;

The tls-ie-norelax.c test case also checks for -mno-relax:


+/* { dg-do compile } */
+/* { dg-options "-O2 -mcmodel=normal -mexplicit-relocs -mno-relax" } */
+/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } } */

i.e. -mno-relax is used compiling this test case, and the compiled
assembly code should not contain R_LARCH_RELAX.
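
The control flow quoted above can be modelled in isolation. This is a sketch under assumptions: the function name and the reduced symbol_kind enum are hypothetical, the TARGET_LINKER_RELAXATION check is passed in as a boolean, and the output goes to a caller-supplied buffer instead of the assembly file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, reduced stand-in for the symbol classification.  */
enum symbol_kind { SYMBOL_OTHER, SYMBOL_TLS_IE };

/* Writes the R_LARCH_RELAX marker into BUF only when linker relaxation is
   enabled and the symbol uses the TLS IE model.  Returns the number of
   characters that the marker needs, or 0 when nothing is emitted.  */
static int
print_relax_marker (char *buf, size_t size, bool linker_relaxation,
                    enum symbol_kind kind)
{
  if (size == 0)
    return 0;
  buf[0] = '\0';

  /* Models the early break taken with -mno-relax.  */
  if (!linker_relaxation)
    return 0;

  if (kind != SYMBOL_TLS_IE)
    return 0;

  return snprintf (buf, size, ".reloc\t.,R_LARCH_RELAX\n\t");
}
```

With linker_relaxation set to false the buffer stays empty, which corresponds to what the tls-ie-norelax.c test expects from the real compiler under -mno-relax.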





Re: [PATCH] LoongArch: testsuite: Rewrite {x, }vfcmp-{d, f}.c to avoid named registers

2024-03-06 Thread chenglulu

This test case is so cleverly designed!

I have no problem. Thank you!

On 2024/3/5 at 9:00 PM, Xi Ruoyao wrote:

Loops on named vector registers are not vectorized (see comment 11 of
PR113622), so these test cases have been failing for a while.
Rewrite them using check-function-bodies to remove hard-coded register
names.  A barrier is needed to always load the first operand before the
second operand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vfcmp-f.c: Rewrite to avoid named
registers.
* gcc.target/loongarch/vfcmp-d.c: Likewise.
* gcc.target/loongarch/xvfcmp-f.c: Likewise.
* gcc.target/loongarch/xvfcmp-d.c: Likewise.
---

Tested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/testsuite/gcc.target/loongarch/vfcmp-d.c  | 202 --
  gcc/testsuite/gcc.target/loongarch/vfcmp-f.c  | 347 ++
  gcc/testsuite/gcc.target/loongarch/xvfcmp-d.c | 202 --
  gcc/testsuite/gcc.target/loongarch/xvfcmp-f.c | 204 --
  4 files changed, 816 insertions(+), 139 deletions(-)

diff --git a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
index 8b870ef38a0..87e4ed19e96 100644
--- a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
+++ b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
@@ -1,28 +1,188 @@
  /* { dg-do compile } */
-/* { dg-options "-O2 -mlsx -ffixed-f0 -ffixed-f1 -ffixed-f2 -fno-vect-cost-model" } */
+/* { dg-options "-O2 -mlsx -fno-vect-cost-model" } */
+/* { dg-final { check-function-bodies "**" "" } } */
  
  #define F double

  #define I long long
  
  #include "vfcmp-f.c"
  
-/* { dg-final { scan-assembler "compare_quiet_equal:.*\tvfcmp\\.ceq\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_equal\n" } } */
-/* { dg-final { scan-assembler "compare_quiet_not_equal:.*\tvfcmp\\.cune\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_equal\n" } } */
-/* { dg-final { scan-assembler "compare_signaling_greater:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater\n" } } */
-/* { dg-final { scan-assembler "compare_signaling_greater_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_equal\n" } } */
-/* { dg-final { scan-assembler "compare_signaling_less:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less\n" } } */
-/* { dg-final { scan-assembler "compare_signaling_less_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_equal\n" } } */
-/* { dg-final { scan-assembler "compare_signaling_not_greater:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_not_greater\n" } } */
-/* { dg-final { scan-assembler "compare_signaling_less_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_unordered\n" } } */
-/* { dg-final { scan-assembler "compare_signaling_not_less:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_not_less\n" } } */
-/* { dg-final { scan-assembler "compare_signaling_greater_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_unordered\n" } } */
-/* { dg-final { scan-assembler "compare_quiet_less:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less\n" } } */
-/* { dg-final { scan-assembler "compare_quiet_less_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_equal\n" } } */
-/* { dg-final { scan-assembler "compare_quiet_greater:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater\n" } } */
-/* { dg-final { scan-assembler "compare_quiet_greater_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_equal\n" } } */
-/* { dg-final { scan-assembler "compare_quiet_not_less:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_not_less\n" } } */
-/* { dg-final { scan-assembler "compare_quiet_greater_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_unordered\n" } } */
-/* { dg-final { scan-assembler "compare_quiet_not_greater:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_greater\n" } } */
-/* { dg-final { scan-assembler "compare_quiet_less_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_unordered\n" } } */
-/* { dg-final { scan-assembler "compare_quiet_unordered:.*\tvfcmp\\.cun\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_unordered\n" } } */
-/* { dg-final { scan-assembler "compare_quiet_ordered:.*\tvfcmp\\.cor\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_ordered\n" } } */
+/*
+** compare_quiet_equal:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.ceq.d (\$vr[0-9]+),(\1,\2|\2,\1)
+** vst \3,\$r6,0
+** jr  \$r1
+*/
+
+/*
+** compare_quiet_not_equal:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.cune.d (\$vr[0-9]+),(\1,\2|\2,\1)
+** vst \3,\$r6,0
+** jr  \$r1
+*/
+
+/*
+** compare_signaling_greater:
+** vld 

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-03-06 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #14 from chenglulu  ---
(In reply to chenglulu from comment #13)
> (In reply to Xi Ruoyao from comment #9)
> > (In reply to chenglulu from comment #8)
> > > (In reply to Xi Ruoyao from comment #7)
> > > > Any update? :)
> > > 
> > > Well, I haven't run it yet. Since this does not have a big impact on the
> > > spec score, I am currently testing it on a single-channel machine, so the
> > > test time will be longer.
> > > I will reply here as soon as the results are available.
> > 
> > Can we determine on LA664 if the current default alignment is better than
> > not aligning at all?  Coremarks results suggest the current default is even
> > worse than not aligning, but arguably Coremarks is far different from real
> > workloads. However if the current default is not better than not aligning
> > (or the difference is only marginal and is likely covered up by some random
> > fluctuation) we can disable the aligning for LA664.
> > 
> > (Maybe we and the HW engineers have done some repetitive work or even some
> > work cancelling each other out :(. )
> 
> The results of spec2006 on 3A6000 were obtained, I removed the more volatile
> test items, '-falign-loops=8 -falign-functions=8 -falign-jumps=32
> -falign-labels=4' this set of parameters got the highest score. This is the
> same combination of parameters as the coremark tested by Xu Chenghua.

Hi,Ruoyao:

The test of the 3a5000 will also be completed around the 15th of this month, so
I want to change the code after the test results of the 3a5000 are out.
What do you think?

(In reply to Xi Ruoyao from comment #9)
> (In reply to chenglulu from comment #8)
> > (In reply to Xi Ruoyao from comment #7)
> > > Any update? :)
> > 
> > Well, I haven't run it yet. Since this does not have a big impact on the
> > spec score, I am currently testing it on a single-channel machine, so the
> > test time will be longer.
> > I will reply here as soon as the results are available.
> 
> Can we determine on LA664 if the current default alignment is better than
> not aligning at all?  Coremarks results suggest the current default is even
> worse than not aligning, but arguably Coremarks is far different from real
> workloads. However if the current default is not better than not aligning
> (or the difference is only marginal and is likely covered up by some random
> fluctuation) we can disable the aligning for LA664.

Hi,Ruoyao:

 The results of spec2006 on 3A6000 were obtained, I removed the more volatile
test items, '-falign-loops=8 -falign-functions=8 -falign-jumps=32
-falign-labels=4' this set of parameters got the highest score. This is the
same combination of parameters as the coremark tested by Xu Chenghua.

The test of the 3A5000 will also be completed around the 15th of this month, so
I want to change the code after the test results of the 3a5000 are out.
What do you think?

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-03-06 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #13 from chenglulu  ---
(In reply to Xi Ruoyao from comment #9)
> (In reply to chenglulu from comment #8)
> > (In reply to Xi Ruoyao from comment #7)
> > > Any update? :)
> > 
> > Well, I haven't run it yet. Since this does not have a big impact on the
> > spec score, I am currently testing it on a single-channel machine, so the
> > test time will be longer.
> > I will reply here as soon as the results are available.
> 
> Can we determine on LA664 if the current default alignment is better than
> not aligning at all?  Coremarks results suggest the current default is even
> worse than not aligning, but arguably Coremarks is far different from real
> workloads. However if the current default is not better than not aligning
> (or the difference is only marginal and is likely covered up by some random
> fluctuation) we can disable the aligning for LA664.
> 
> (Maybe we and the HW engineers have done some repetitive work or even some
> work cancelling each other out :(. )

The SPEC2006 results on the 3A6000 are in. After removing the more volatile
test items, the parameter set '-falign-loops=8 -falign-functions=8
-falign-jumps=32 -falign-labels=4' got the highest score. This is the same
combination of parameters that came out of the CoreMark tests run by Xu Chenghua.

Re: [PATCH v2] LoongArch: Allow s9 as a register alias

2024-03-05 Thread chenglulu



On 2024/3/5 at 7:50 PM, Xi Ruoyao wrote:

The psABI allows using s9 as an alias of r22.

gcc/ChangeLog:

* config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add
s9 as an alias of r22.
---

v1 -> v2: Add a test case.

Ok for trunk?

Ok. Thanks!
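
As a rough illustration (not part of the patch), the effect of the extra table entry can be sketched in plain C. The table type, the lookup function, and the value of GP_REG_FIRST used here are hypothetical stand-ins for the real ADDITIONAL_REGISTER_NAMES handling in GCC; only the name/number pairs are taken from the patch hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed base of the GP register numbering; a placeholder for this
   sketch, not copied from loongarch.h.  */
#define GP_REG_FIRST 0

struct reg_alias
{
  const char *name;
  int regno;
};

/* Only the entries visible in the patch hunk, including the new "s9".  */
static const struct reg_alias aliases[] = {
  { "t8", 20 + GP_REG_FIRST },
  { "x",  21 + GP_REG_FIRST },
  { "fp", 22 + GP_REG_FIRST },
  { "s9", 22 + GP_REG_FIRST },
  { "s0", 23 + GP_REG_FIRST },
};

/* Return the register number for NAME, or -1 if NAME is not in the table.  */
int
lookup_reg (const char *name)
{
  for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
    if (strcmp (aliases[i].name, name) == 0)
      return aliases[i].regno;
  return -1;
}

/* Usage: both spellings resolve to the same hard register.  */
void
check_alias (void)
{
  assert (lookup_reg ("s9") == lookup_reg ("fp"));
}
```

Because both names resolve to the same register number, declaring global register variables under both names refers to one register twice, which is the situation the new test case exercises.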


  gcc/config/loongarch/loongarch.h   | 1 +
  gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c | 3 +++
  2 files changed, 4 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 8b453ab3140..bf2351f0968 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -931,6 +931,7 @@ typedef struct {
{ "t8",   20 + GP_REG_FIRST },\
{ "x",21 + GP_REG_FIRST },\
{ "fp",   22 + GP_REG_FIRST },\
+  { "s9",22 + GP_REG_FIRST },\
{ "s0",   23 + GP_REG_FIRST },\
{ "s1",   24 + GP_REG_FIRST },\
{ "s2",   25 + GP_REG_FIRST },\
diff --git a/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
new file mode 100644
index 000..d2e3b80f83c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
@@ -0,0 +1,3 @@
+/* { dg-do compile } */
+register long s9 asm("s9"); /* { dg-note "conflicts with 's9'" } */
+register long fp asm("fp"); /* { dg-warning "register of 'fp' used for multiple global register variables" } */




[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-03-01 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #12 from chenglulu  ---
(In reply to Xi Ruoyao from comment #11)
> (In reply to chenglulu from comment #10)
> > (In reply to Xi Ruoyao from comment #9)
> > > (In reply to chenglulu from comment #8)
> > > > (In reply to Xi Ruoyao from comment #7)
> > > > > Any update? :)
> > > > 
> > > > Well, I haven't run it yet. Since this does not have a big impact on the
> > > > spec score, I am currently testing it on a single-channel machine, so 
> > > > the
> > > > test time will be longer.
> > > > I will reply here as soon as the results are available.
> > > 
> > > Can we determine on LA664 if the current default alignment is better than
> > > not aligning at all?  Coremarks results suggest the current default is 
> > > even
> > > worse than not aligning, but arguably Coremarks is far different from real
> > > workloads. However if the current default is not better than not aligning
> > > (or the difference is only marginal and is likely covered up by some 
> > > random
> > > fluctuation) we can disable the aligning for LA664.
> > > 
> > > (Maybe we and the HW engineers have done some repetitive work or even some
> > > work cancelling each other out :(. )
> > On March 8th I should be able to get the test results on the 3A6000 machine,
> > I need to judge the fluctuation of the spec and then let's see if the
> > default alignment is set?
> 
> I just mean if we cannot get a decisive result before GCC 14 we may just
> turn off alignment.  But if we can get a decisive result as expected in Mar
> we can just use the best we'll find.

Well, the results should be available before GCC 14 is released. It also seems
that the 3A5000 settings need to be changed, because the value of
'-falign-labels' was affected by the macro ASM_OUTPUT_ALIGN_WITH_NOP in the
previous test.

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-03-01 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #10 from chenglulu  ---
(In reply to Xi Ruoyao from comment #9)
> (In reply to chenglulu from comment #8)
> > (In reply to Xi Ruoyao from comment #7)
> > > Any update? :)
> > 
> > Well, I haven't run it yet. Since this does not have a big impact on the
> > spec score, I am currently testing it on a single-channel machine, so the
> > test time will be longer.
> > I will reply here as soon as the results are available.
> 
> Can we determine on LA664 if the current default alignment is better than
> not aligning at all?  Coremarks results suggest the current default is even
> worse than not aligning, but arguably Coremarks is far different from real
> workloads. However if the current default is not better than not aligning
> (or the difference is only marginal and is likely covered up by some random
> fluctuation) we can disable the aligning for LA664.
> 
> (Maybe we and the HW engineers have done some repetitive work or even some
> work cancelling each other out :(. )
I should be able to get the test results on the 3A6000 machine on March 8th.
I need to judge the fluctuation of the SPEC scores first, and then we can
decide whether to set a default alignment.
In addition, I am also rerunning the test on the 3A5000, and the results will
be available around March 15th.
The conclusion from the CoreMark tests by our team leader Xu Chenghua is that
'-falign-labels' has a consistent effect on CoreMark performance: when the
value of '-falign-labels' is greater than 4 bytes, performance decreases
significantly.

Re: [PATCH] LoongArch: Allow s9 as a register alias

2024-02-29 Thread chenglulu



On 2024/2/29 at 3:14 PM, Xi Ruoyao wrote:

The psABI allows using s9 as an alias of r22.

gcc/ChangeLog:

* config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add
s9 as an alias of r22.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?


I think a test is needed.

Otherwise LGTM.

Thanks!



  gcc/config/loongarch/loongarch.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 8b453ab3140..bf2351f0968 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -931,6 +931,7 @@ typedef struct {
{ "t8",   20 + GP_REG_FIRST },\
{ "x",21 + GP_REG_FIRST },\
{ "fp",   22 + GP_REG_FIRST },\
+  { "s9",22 + GP_REG_FIRST },\
{ "s0",   23 + GP_REG_FIRST },\
{ "s1",   24 + GP_REG_FIRST },\
{ "s2",   25 + GP_REG_FIRST },\




Re: [PATCH 1/2] LoongArch: NFC: Deduplicate crc instruction defines

2024-02-26 Thread chenglulu

LGTM!

Thanks!

On 2024/2/26 at 12:28 PM, Xi Ruoyao wrote:

Introduce an iterator for UNSPEC_CRC and UNSPEC_CRCC to make the next
change easier.

gcc/ChangeLog:

* config/loongarch/loongarch.md (CRC): New define_int_iterator.
(crc): New define_int_attr.
(loongarch_crc_w_<size>_w, loongarch_crcc_w_<size>_w): Unify
into ...
(loongarch_<crc>_w_<size>_w): ... here.
---
  gcc/config/loongarch/loongarch.md | 18 +-
  1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 2ce7a151880..4ded1b3a117 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -4251,24 +4251,16 @@ (define_peephole2
  
  
  (define_mode_iterator QHSD [QI HI SI DI])

+(define_int_iterator CRC [UNSPEC_CRC UNSPEC_CRCC])
+(define_int_attr crc [(UNSPEC_CRC "crc") (UNSPEC_CRCC "crcc")])
  
-(define_insn "loongarch_crc_w_<size>_w"
+(define_insn "loongarch_<crc>_w_<size>_w"
[(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
   (match_operand:SI 2 "register_operand" "r")]
-UNSPEC_CRC))]
+CRC))]
""
-  "crc.w.<size>.w\t%0,%1,%2"
-  [(set_attr "type" "unknown")
-   (set_attr "mode" "<MODE>")])
-
-(define_insn "loongarch_crcc_w_<size>_w"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
-  (match_operand:SI 2 "register_operand" "r")]
-UNSPEC_CRCC))]
-  ""
-  "crcc.w.<size>.w\t%0,%1,%2"
+  "<crc>.w.<size>.w\t%0,%1,%2"
[(set_attr "type" "unknown")
 (set_attr "mode" "<MODE>")])
  




Re: [PATCH] LoongArch: Don't falsely claim gold supported in toplevel configure

2024-02-22 Thread chenglulu



On 2024/2/23 at 11:27 AM, Xi Ruoyao wrote:

On Fri, 2024-02-23 at 11:16 +0800, chenglulu wrote:

On 2024/2/22 at 5:17 PM, Xi Ruoyao wrote:

The gold linker has never been ported to LoongArch (and it seems
unlikely to be ported in the future as the new architectures are
focusing on lld and/or mold for fast linkers).

ChangeLog:

    * configure.ac (ENABLE_GOLD): Remove loongarch*-*-* from target
    list.
    * configure: Regenerate.
---

Ok for GCC trunk (to get synced into Binutils later)?

I have no problem. But I have a question: is this modification simply
because we don't support it, or is there an error somewhere?

If a user specifies --enable-gold when building Binutils, then with
loongarch in this list the build system will attempt to build gold and
fail.  With loongarch removed from the list, the build system will ignore
--enable-gold.


Okay, I understand.

Thanks!:-)



Re: [pushed][PATCH v1] LoongArch: When checking whether the assembler supports conditional branch relaxation, add compilation parameter "--fatal-warnings" to the assembler.

2024-02-22 Thread chenglulu

Pushed to r14-9142.

On 2024/2/21 at 11:30 AM, Lulu Cheng wrote:

In binutils 2.40 and earlier versions, only a warning will be reported
when a relocation immediate value is out of bounds. As a result,
the value of the macro HAVE_AS_COND_BRANCH_RELAXATION will also be
defined as 1 when the assembler does not support conditional branch
relaxation. Therefore, add the compilation option "--fatal-warnings"
to avoid this problem.
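
The failure mode described above can be modelled with a toy C sketch. This is not the configure logic itself; both function names are hypothetical, and the only assumptions are that a feature test treats exit status 0 as "supported" and that "--fatal-warnings" turns a warning into a non-zero exit status.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one assembler run: the exit status is non-zero on an
   error, and also on a warning when FATAL_WARNINGS is set.  */
int
assembler_exit_status (bool had_error, bool had_warning, bool fatal_warnings)
{
  return (had_error || (fatal_warnings && had_warning)) ? 1 : 0;
}

/* A configure-style feature test treats exit status 0 as "supported".  */
bool
feature_detected (int exit_status)
{
  return exit_status == 0;
}

/* Usage: an old assembler that only warns about the out-of-range branch.  */
void
check_detection (void)
{
  /* Without --fatal-warnings the probe succeeds: a false positive.  */
  assert (feature_detected (assembler_exit_status (false, true, false)));
  /* With --fatal-warnings the same warning makes the probe fail.  */
  assert (!feature_detected (assembler_exit_status (false, true, true)));
}
```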

gcc/ChangeLog:

* configure: Regenerate.
* configure.ac: Add parameter "--fatal-warnings" to the assembler
when checking whether the assembler supports conditional branch
relaxation.
---
  gcc/configure| 2 +-
  gcc/configure.ac | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 41b978b0380..f1d434fede0 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -31136,7 +31136,7 @@ else
 nop
 .endr
 beq $a0,$a1,a' > conftest.s
-if { ac_try='$gcc_cv_as $gcc_cv_as_flags  -o conftest.o conftest.s >&5'
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags --fatal-warnings -o conftest.o conftest.s >&5'
{ { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
(eval $ac_try) 2>&5
ac_status=$?
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 72012d61e67..9ebc578e4cc 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -5486,7 +5486,7 @@ x:
[Define if your assembler supports -mrelax option.])])
  gcc_GAS_CHECK_FEATURE([conditional branch relaxation support],
gcc_cv_as_loongarch_cond_branch_relax,
-  [],
+  [--fatal-warnings],
[a:
 .rept 32769
 nop




Re: [PATCH] LoongArch: Don't falsely claim gold supported in toplevel configure

2024-02-22 Thread chenglulu



On 2024/2/22 at 5:17 PM, Xi Ruoyao wrote:

The gold linker has never been ported to LoongArch (and it seems
unlikely to be ported in the future as the new architectures are
focusing on lld and/or mold for fast linkers).

ChangeLog:

* configure.ac (ENABLE_GOLD): Remove loongarch*-*-* from target
list.
* configure: Regenerate.
---

Ok for GCC trunk (to get synced into Binutils later)?


I have no problem. But I have a question: is this modification simply
because we don't support it, or is there an error somewhere?



  configure| 2 +-
  configure.ac | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 874966fb9f0..02b435c1163 100755
--- a/configure
+++ b/configure
@@ -3092,7 +3092,7 @@ case "${ENABLE_GOLD}" in
# Check for target supported by gold.
case "${target}" in
  i?86-*-* | x86_64-*-* | sparc*-*-* | powerpc*-*-* | arm*-*-* \
-| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-* | loongarch*-*-*)
+| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-*)
  configdirs="$configdirs gold"
  if test x${ENABLE_GOLD} = xdefault; then
default_ld=gold
diff --git a/configure.ac b/configure.ac
index 4f34004a072..1a19c07a27b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -364,7 +364,7 @@ case "${ENABLE_GOLD}" in
# Check for target supported by gold.
case "${target}" in
  i?86-*-* | x86_64-*-* | sparc*-*-* | powerpc*-*-* | arm*-*-* \
-| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-* | loongarch*-*-*)
+| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-*)
  configdirs="$configdirs gold"
  if test x${ENABLE_GOLD} = xdefault; then
default_ld=gold




Re: [GCC 13 PATCH] LoongArch: Don't default to -mno-explicit-relocs if -mno-relax

2024-02-22 Thread chenglulu



On 2024/2/22 at 6:20 PM, Xi Ruoyao wrote:

To improve Binutils compatibility we've had to backport relaxation
support.  But if a user just updates to GCC 13.3 and sticks with
Binutils 2.41, there is no reason to use -mno-explicit-relocs as the
default because we are turning off relaxation for Binutils 2.41 (it
lacks conditional branch relaxation support) anyway.

So like GCC 14, make the default of -m[no-]explicit-relocs depend on
-m[no-]relax instead of HAVE_AS_MRELAX_OPTION.  Also update the doc to
reflect the behavior change.
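
The "not seen" sentinel approach described above can be sketched as a small standalone C function. This is a minimal sketch and not the patch itself: the function name, its parameters, and the sentinel value -1 are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Sentinel meaning "neither -mexplicit-relocs nor -mno-explicit-relocs
   was given".  The value -1 is an assumption of this sketch.  */
#define OPTION_NOT_SEEN (-1)

/* Resolve the effective setting.  EXPLICIT_RELOCS is OPTION_NOT_SEEN,
   0 or 1.  AS_HAS_EXPLICIT_RELOCS models the build-time assembler
   capability and RELAX models the effective -mrelax setting; both
   parameter names are hypothetical.  */
int
resolve_explicit_relocs (int explicit_relocs, bool as_has_explicit_relocs,
                         bool relax)
{
  if (explicit_relocs == OPTION_NOT_SEEN)
    return as_has_explicit_relocs && !relax;
  /* An explicit user choice is left untouched.  */
  return explicit_relocs;
}

/* Usage: with relaxation off, a capable assembler gives explicit relocs.  */
void
check_default (void)
{
  assert (resolve_explicit_relocs (OPTION_NOT_SEEN, true, false) == 1);
  assert (resolve_explicit_relocs (OPTION_NOT_SEEN, true, true) == 0);
}
```

The point of the sentinel is that the default can be computed late, after -mrelax has been settled, instead of being fixed by a build-time macro in the option initializer.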

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in
(TARGET_EXPLICIT_RELOCS): Init to M_OPTION_NOT_SEEN.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Set the default of
TARGET_EXPLICIT_RELOCS to HAVE_AS_EXPLICIT_RELOCS
&& !loongarch_mrelax.
* doc/invoke.texi (-m[no-]explicit-relocs): Update for
LoongArch.
---

Ok for releases/gcc-13?


LGTM!

Thanks!



  gcc/config/loongarch/genopts/loongarch.opt.in |  2 +-
  gcc/config/loongarch/loongarch.cc |  4 
  gcc/config/loongarch/loongarch.opt|  2 +-
  gcc/doc/invoke.texi   | 11 +--
  4 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index da6fedd153e..76acd35d39c 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -155,7 +155,7 @@ Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init
  -mmax-inline-memcpy-size=SIZE Set the max size of memcpy to inline, default is 1024.
  
  mexplicit-relocs
-Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS & !HAVE_AS_MRELAX_OPTION)
+Target Var(TARGET_EXPLICIT_RELOCS) Init(M_OPTION_NOT_SEEN)
  Use %reloc() assembly operators.
  
  ; The code model option names for -mcmodel.

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 768e2427285..e78b81cd8fc 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6222,6 +6222,10 @@ loongarch_option_override_internal (struct gcc_options *opts)
gcc_unreachable ();
  }
  
+  if (TARGET_EXPLICIT_RELOCS == M_OPTION_NOT_SEEN)
+TARGET_EXPLICIT_RELOCS = (HAVE_AS_EXPLICIT_RELOCS
+ && !loongarch_mrelax);
+
/* Validate the guard size.  */
int guard_size = param_stack_clash_protection_guard_size;
  
diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt
index 59b1e06d3f2..e61fbaed2c1 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -162,7 +162,7 @@ Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init
  -mmax-inline-memcpy-size=SIZE Set the max size of memcpy to inline, default is 1024.
  
  mexplicit-relocs
-Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS & !HAVE_AS_MRELAX_OPTION)
+Target Var(TARGET_EXPLICIT_RELOCS) Init(M_OPTION_NOT_SEEN)
  Use %reloc() assembly operators.
  
  ; The code model option names for -mcmodel.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 99657fb44d8..792ce283bb9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -25830,12 +25830,11 @@ The default code model is @code{normal}.
  @itemx -mno-explicit-relocs
  Use or do not use assembler relocation operators when dealing with symbolic
  addresses.  The alternative is to use assembler macros instead, which may
-limit optimization.  The default value for the option is determined during
-GCC build-time by detecting corresponding assembler support:
-@code{-mexplicit-relocs} if said support is present,
-@code{-mno-explicit-relocs} otherwise.  This option is mostly useful for
-debugging, or interoperation with assemblers different from the build-time
-one.
+limit instruction scheduling but allow linker relaxation.  The default
+value for the option is determined with the assembler capability detected
+during GCC build-time and the setting of @code{-mrelax}:
+@code{-mexplicit-relocs} if the assembler supports relocation operators
+but @code{-mrelax} is not enabled, @code{-mno-explicit-relocs} otherwise.
  
  @opindex mdirect-extern-access

  @item -mdirect-extern-access




Re: [PATCH v2] LoongArch: Split loongarch_option_override_internal into smaller procedures

2024-02-21 Thread chenglulu

Hi, Yujie:

When this patch is used to compile the following test case, an ICE is reported.


 test.c

 float
foo(float a, float b)
{
  return a / b;
}

# ./gcc/cc1 test.c -o - -O2 -ffast-math -mrecip

recip.c: In function ‘foo’:
recip.c:5:1: error: unrecognizable insn:
    5 | }
  | ^
(insn 9 8 10 2 (set (reg:SF 84)
    (unspec:SF [
    (reg/v:SF 82 [ b ])
    ] UNSPEC_RECIPE)) "recip.c":4:12 -1
 (nil))
during RTL pass: vregs
recip.c:5:1: internal compiler error: in extract_insn, at recog.cc:2812
0x135d1d4 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/rtl-error.cc:108
0x135d215 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/rtl-error.cc:116
0x13111b6 extract_insn(rtx_insn*)
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/recog.cc:2812
0xf84e72 instantiate_virtual_regs_in_insn
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/function.cc:1611
0xf85e90 instantiate_virtual_regs
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/function.cc:1994
0xf85f56 execute
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/function.cc:2041
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


On 2024/2/21 at 11:36 AM, Yang Yujie wrote:

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in: Mark -m[no-]recip as
aliases to -mrecip={all,none}.
* config/loongarch/loongarch.opt: Same.
* config/loongarch/loongarch-def.h: Modify ABI condition macros for
convenience.
* config/loongarch/loongarch-opts.cc: Define option-handling
procedures split from the original loongarch_option_override_internal.
* config/loongarch/loongarch-opts.h: Same.
* config/loongarch/loongarch.cc: Clean up
loongarch_option_override_internal.
---
  gcc/config/loongarch/genopts/loongarch.opt.in |   8 +-
  gcc/config/loongarch/loongarch-def.h  |  11 +-
  gcc/config/loongarch/loongarch-opts.cc| 248 +
  gcc/config/loongarch/loongarch-opts.h |  27 +-
  gcc/config/loongarch/loongarch.cc | 253 +++---
  gcc/config/loongarch/loongarch.opt|   8 +-
  6 files changed, 325 insertions(+), 230 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index 02f918053f5..a77893d31d9 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -197,14 +197,14 @@ mexplicit-relocs
  Target Alias(mexplicit-relocs=, always, none)
  Use %reloc() assembly operators (for backward compatibility).
  
-mrecip
-Target RejectNegative Var(la_recip) Save
-Generate approximate reciprocal divide and square root for better throughput.
-
  mrecip=
  Target RejectNegative Joined Var(la_recip_name) Save
  Control generation of reciprocal estimates.
  
+mrecip
+Target Alias(mrecip=, all, none)
+Generate approximate reciprocal divide and square root for better throughput.
+
  ; The code model option names for -mcmodel.
  Enum
  Name(cmodel) Type(int)
diff --git a/gcc/config/loongarch/loongarch-def.h b/gcc/config/loongarch/loongarch-def.h
index 2dbf006d013..0cbf9476690 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -90,11 +90,16 @@ extern loongarch_def_array
  
  #define TO_LP64_ABI_BASE(C) (C)
  
-#define ABI_FPU_64(abi_base) \

+#define ABI_LP64_P(abi_base) \
+  (abi_base == ABI_BASE_LP64D \
+   || abi_base == ABI_BASE_LP64F \
+   || abi_base == ABI_BASE_LP64S)
+
+#define ABI_FPU64_P(abi_base) \
(abi_base == ABI_BASE_LP64D)
-#define ABI_FPU_32(abi_base) \
+#define ABI_FPU32_P(abi_base) \
(abi_base == ABI_BASE_LP64F)
-#define ABI_FPU_NONE(abi_base) \
+#define ABI_NOFPU_P(abi_base) \
(abi_base == ABI_BASE_LP64S)
  
  
diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc
index 7eeac43ed2f..380208f38bf 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "coretypes.h"
  #include "tm.h"
  #include "obstack.h"
+#include "opts.h"
  #include "diagnostic-core.h"
  
  #include "loongarch-cpu.h"

@@ -32,8 +33,12 @@ along with GCC; see the file COPYING3.  If not see
  #include "loongarch-str.h"
  #include "loongarch-def.h"
  
+/* Target configuration */

  struct loongarch_target la_target;
  
+/* RTL cost information */

+const struct loongarch_rtx_cost_data *loongarch_cost;
+
  /* ABI-related configuration.  */
  #define ABI_COUNT (sizeof(abi_priority_list)/sizeof(struct loongarch_abi))
  static const struct loong

Re:[pushed] [PATCH v1 0/4] Fix a series of problems caused by

2024-02-21 Thread chenglulu

Pushed to r13-8349...r13-8352.

On 2024/2/21 at 11:04 AM, Lulu Cheng wrote:

Binutils 2.42 corrects the implementation of
".align [abs-expr,[abs-expr[,abs-expr]]]".
The macro ASM_OUTPUT_ALIGN_WITH_NOP in GCC uses this assembler directive,
so an error now occurs. See the link below for a detailed description.
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645067.html

To solve the above problems, the following changes are made:

1. Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP. (cherry pick r14-4674)
2. Check whether binutils supports the relax function. (cherry pick r14-4160)
3. Disable relaxation if the assembler doesn't support
   conditional branch relaxation. (cherry pick r14-5434)

PR112299 is also fixed here.

Lulu Cheng (2):
   LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP.
   LoongArch: Check whether binutils supports the relax function. If
 supported, explicit relocs are turned off by default.

Xi Ruoyao (2):
   LoongArch: Disable relaxation if the assembler don't support
 conditional branch relaxation [PR112330]
   LoongArch: Define HAVE_AS_TLS to 0 if it's undefined [PR112299]

  gcc/config.in | 12 
  gcc/config/loongarch/genopts/loongarch.opt.in | 11 +++-
  gcc/config/loongarch/gnu-user.h   |  3 +-
  gcc/config/loongarch/loongarch-opts.h | 12 
  gcc/config/loongarch/loongarch.h  | 22 +--
  gcc/config/loongarch/loongarch.opt| 11 +++-
  gcc/configure | 66 +++
  gcc/configure.ac  | 14 
  gcc/doc/invoke.texi   | 24 ++-
  9 files changed, 165 insertions(+), 10 deletions(-)





Re: [pushed][PATCH v1 0/4] Fix a series of problems caused by ASM_OUTPUT_ALIGN_WITH_NOP (release/gcc-12).

2024-02-21 Thread chenglulu

Pushed to r12-10169...r12-10172.

On 2024/2/21 at 11:10 AM, Lulu Cheng wrote:

Binutils 2.42 corrects the implementation of
".align [abs-expr,[abs-expr[,abs-expr]]]".
The macro ASM_OUTPUT_ALIGN_WITH_NOP in GCC uses this assembler directive,
so an error now occurs. See the link below for a detailed description.
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645067.html

To solve the above problems, the following changes are made:

1. Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP. (cherry pick r14-4674)
2. Check whether binutils supports the relax function. (cherry pick r14-4160)
3. Disable relaxation if the assembler doesn't support
   conditional branch relaxation. (cherry pick r14-5434)

PR112299 is also fixed here.

Lulu Cheng (2):
   LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP.
   LoongArch: Check whether binutils supports the relax function. If
 supported, explicit relocs are turned off by default.

Xi Ruoyao (2):
   LoongArch: Disable relaxation if the assembler don't support
 conditional branch relaxation [PR112330]
   LoongArch: Define HAVE_AS_TLS to 0 if it's undefined [PR112299]

  gcc/config.in | 18 +
  gcc/config/loongarch/genopts/loongarch.opt.in |  9 +++
  gcc/config/loongarch/gnu-user.h   |  4 +-
  gcc/config/loongarch/loongarch-opts.h | 12 
  gcc/config/loongarch/loongarch.h  | 22 +--
  gcc/config/loongarch/loongarch.opt|  9 +++
  gcc/configure | 66 +++
  gcc/configure.ac  | 14 
  gcc/doc/invoke.texi   | 24 ++-
  9 files changed, 169 insertions(+), 9 deletions(-)





Re: [PATCH v1 0/4] Fix a series of problems caused by

2024-02-20 Thread chenglulu

Sorry, this title is incomplete and has been resent.

On 2024/2/21 at 11:08 AM, Lulu Cheng wrote:

Binutils 2.42 corrects the implementation of
".align [abs-expr,[abs-expr[,abs-expr]]]".
The macro ASM_OUTPUT_ALIGN_WITH_NOP in GCC uses this assembler directive,
so an error now occurs. See the link below for a detailed description.
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645067.html

To solve the above problems, the following changes are made:

1. Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP. (cherry pick r14-4674)
2. Check whether binutils supports the relax function. (cherry pick r14-4160)
3. Disable relaxation if the assembler doesn't support
   conditional branch relaxation. (cherry pick r14-5434)

PR112299 is also fixed here.

Lulu Cheng (2):
   LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP.
   LoongArch: Check whether binutils supports the relax function. If
 supported, explicit relocs are turned off by default.

Xi Ruoyao (2):
   LoongArch: Disable relaxation if the assembler don't support
 conditional branch relaxation [PR112330]
   LoongArch: Define HAVE_AS_TLS to 0 if it's undefined [PR112299]

  gcc/config.in | 18 +
  gcc/config/loongarch/genopts/loongarch.opt.in |  9 +++
  gcc/config/loongarch/gnu-user.h   |  4 +-
  gcc/config/loongarch/loongarch-opts.h | 12 
  gcc/config/loongarch/loongarch.h  | 22 +--
  gcc/config/loongarch/loongarch.opt|  9 +++
  gcc/configure | 66 +++
  gcc/configure.ac  | 14 
  gcc/doc/invoke.texi   | 24 ++-
  9 files changed, 169 insertions(+), 9 deletions(-)





Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread chenglulu



On 2024/2/20 at 7:54 PM, Xi Ruoyao wrote:

On Tue, 2024-02-20 at 19:50 +0800, chenglulu wrote:

On 2024/2/20 at 7:31 PM, Xi Ruoyao wrote:

On Tue, 2024-02-20 at 19:25 +0800, Xi Ruoyao wrote:

On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:


So I think that, without worrying about performance and while ensuring
that there is no problem with binutils, we can make the following
modifications:

     -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
     -   used for padding.  */
     +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
     +   default.  */
  #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
     -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
     +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))

What do you think of it?

Unfortunately it will cause warnings with GAS 2.41 or earlier like

t1.s:1: Warning: expected fill pattern missing
t1.s:5: Warning: expected fill pattern missing

And AFAIK these things may cause many test failures due to "excessive
errors" if running the GCC test suite with these earlier GAS versions.
Maybe we'll have to add some autoconf-based probing for the linker
anyway?

Or just silence the warning passing "--no-warn" to the assembler but I'm
highly unsure if this is really a good idea :(.


I am not opposed to adding detection code, but I looked at this problem
today and I think this change is the smallest one. I asked Meng Qinggang
and he said that the warning in GAS 2.41 can be removed.

Yes, but we cannot change a released binutils-2.41 tarball and Binutils
folks don't make point releases like GCC.

OK, I agree with you. I will backport r14-4674 and r14-5434 to GCC 12
and GCC 13.




Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread chenglulu



On 2024/2/20 at 7:31 PM, Xi Ruoyao wrote:

On Tue, 2024-02-20 at 19:25 +0800, Xi Ruoyao wrote:

On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:


So I think that, without worrying about performance and while ensuring
that there is no problem with binutils, we can make the following
modifications:

    -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
    -   used for padding.  */
    +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
    +   default.  */
     #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
    -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
    +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))

What do you think of it?

Unfortunately it will cause warnings with GAS 2.41 or earlier like

t1.s:1: Warning: expected fill pattern missing
t1.s:5: Warning: expected fill pattern missing

And AFAIK these things may cause many test failures due to "excessive
errors" if running the GCC test suite with these earlier GAS versions.
Maybe we'll have to add some autoconf-based probing for the linker
anyway?

Or just silence the warning passing "--no-warn" to the assembler but I'm
highly unsure if this is really a good idea :(.

I am not opposed to adding detection code, but I looked at this problem
today and I think this change is the smallest one. I asked Meng Qinggang
and he said that the warning in GAS 2.41 can be removed.



Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-19 Thread chenglulu



On 2024/2/9 at 4:08 PM, Xi Ruoyao wrote:

On Fri, 2024-02-09 at 00:02 +0800, chenglulu wrote:

On 2024/2/7 at 12:23 AM, Xi Ruoyao wrote:

Hi Lulu,

I'm proposing to backport r14-4674 "LoongArch: Delete macro definition
ASM_OUTPUT_ALIGN_WITH_NOP." to releases/gcc-12 and releases/gcc-13.  The
reasons:

1. Strictly speaking, the old ASM_OUTPUT_ALIGN_WITH_NOP macro may cause
a correctness issue.  For example, a developer may use
-falign-functions=16 and then use the low 4 bits of a function pointer to
encode some metainfo.  Then ASM_OUTPUT_ALIGN_WITH_NOP causes the functions
to not really be aligned to a 16-byte boundary, causing some breakage.

2. With Binutils-2.42,  ASM_OUTPUT_ALIGN_WITH_NOP can cause illegal
opcodes.  For example:

.globl _start
_start:
.balign 32
nop
nop
nop
addi.d $a0, $r0, 1
.balign 16,54525952,4
addi.d $a0, $a0, 1

is assembled and linked to:

0220 <_start>:
   220: 0340    nop
   224: 0340    nop
   228: 0340    nop
   22c: 02c00404    li.d$a0, 1
   230:     .word   0x   # <== OOPS!
   234: 02c00484    addi.d  $a0, $a0, 1

Arguably this is a bug in GAS (it should at least error out for the
unsupported case where .balign 16,54525952,4 appears with -mrelax; I'd
prefer it to support the 3-operand .align directive even with -mrelax, for
reasons I've given in [1]).  But we can at least work around it by
removing ASM_OUTPUT_ALIGN_WITH_NOP to allow using GCC 13.3 with Binutils
2.42.

3. Without ASM_OUTPUT_ALIGN_WITH_NOP, GCC just outputs something like
".align 5" which works as expected since Binutils-2.38.

4. GCC < 14 does not have a default setting of -falign-*, so changing
this won't affect anyone who does not specify -falign-* explicitly.
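
A minimal sketch of the idiom described in point 1, assuming a GCC-compatible
compiler.  The names handler_fn, tag_fn, get_tag, get_fn and demo_tagging are
hypothetical and do not come from any real project; the aligned attribute
stands in for what -falign-functions=16 is expected to guarantee for every
function:

```c
#include <assert.h>
#include <stdint.h>

#define TAG_BITS 4u
#define TAG_MASK ((uintptr_t) ((1u << TAG_BITS) - 1))

typedef int (*handler_fn) (int);

/* Request 16-byte alignment explicitly for this one function.  */
__attribute__ ((aligned (16))) int
add_one (int x)
{
  return x + 1;
}

/* Pack a small tag into the low 4 bits of a function pointer.  This is
   only valid if the function really starts on a 16-byte boundary.  */
uintptr_t
tag_fn (handler_fn fn, unsigned tag)
{
  uintptr_t raw = (uintptr_t) fn;
  assert ((raw & TAG_MASK) == 0);  /* Trips if alignment was not honored.  */
  assert (tag <= TAG_MASK);
  return raw | tag;
}

/* Recover the tag stored in the low bits.  */
unsigned
get_tag (uintptr_t tagged)
{
  return (unsigned) (tagged & TAG_MASK);
}

/* Recover the callable pointer by clearing the low bits.  */
handler_fn
get_fn (uintptr_t tagged)
{
  return (handler_fn) (tagged & ~TAG_MASK);
}

/* Round-trip a tag through a function pointer and call the result.  */
void
demo_tagging (void)
{
  uintptr_t tagged = tag_fn (add_one, 5);
  assert (get_tag (tagged) == 5);
  assert (get_fn (tagged) (41) == 42);
}
```

If the assembler output leaves a function on a boundary smaller than 16
bytes, the first assertion in tag_fn fails; without such a check the tag
would silently corrupt the address.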

[1]:https://github.com/loongson-community/discussions/issues/41#issuecomment-1925872603

Is it OK to backport r14-4674 into releases/gcc-12 and releases/gcc-13
then?


Ok, I agree with you.

Thanks!

Oops, with Binutils-2.41 GAS will fail to assemble some conditional
branches if we do this :(.

Not sure what to do (maybe backporting both this and a simplified
version of PR112330 fix?)  Let's reconsider after the holiday...

To solve this problem, r14-5434 also needs to be backported on top of
r14-4674.

But I took a look, and r14-5434 modifies quite a lot of files.

So, setting performance aside and just making sure there is no problem
with binutils, I think we can make the following modification:

  -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
  -   used for padding.  */
  +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
  +   default.  */
   #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
  -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
  +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))

What do you think of it?



Re:[pushed] [PATCH 2/2] LoongArch: Remove redundant symbol type conversions in larchintrin.h.

2024-02-17 Thread chenglulu

Pushed to r14-9054.

On 2024/2/6 at 10:10 AM, Lulu Cheng wrote:

gcc/ChangeLog:

* config/loongarch/larchintrin.h (__movgr2fcsr): Remove redundant
symbol type conversions.
(__cacop_d): Likewise.
(__cpucfg): Likewise.
(__asrtle_d): Likewise.
(__asrtgt_d): Likewise.
(__lddir_d): Likewise.
(__ldpte_d): Likewise.
(__crc_w_b_w): Likewise.
(__crc_w_h_w): Likewise.
(__crc_w_w_w): Likewise.
(__crc_w_d_w): Likewise.
(__crcc_w_b_w): Likewise.
(__crcc_w_h_w): Likewise.
(__crcc_w_w_w): Likewise.
(__crcc_w_d_w): Likewise.
(__csrrd_w): Likewise.
(__csrwr_w): Likewise.
(__csrxchg_w): Likewise.
(__csrrd_d): Likewise.
(__csrwr_d): Likewise.
(__csrxchg_d): Likewise.
(__iocsrrd_b): Likewise.
(__iocsrrd_h): Likewise.
(__iocsrrd_w): Likewise.
(__iocsrrd_d): Likewise.
(__iocsrwr_b): Likewise.
(__iocsrwr_h): Likewise.
(__iocsrwr_w): Likewise.
(__iocsrwr_d): Likewise.
(__frecipe_s): Likewise.
(__frecipe_d): Likewise.
(__frsqrte_s): Likewise.
(__frsqrte_d): Likewise.
---
  gcc/config/loongarch/larchintrin.h | 69 ++
  1 file changed, 33 insertions(+), 36 deletions(-)

diff --git a/gcc/config/loongarch/larchintrin.h 
b/gcc/config/loongarch/larchintrin.h
index 04672e71728..0f55bdae838 100644
--- a/gcc/config/loongarch/larchintrin.h
+++ b/gcc/config/loongarch/larchintrin.h
@@ -87,13 +87,13 @@ __rdtimel_w (void)
  /* Assembly instruction format:   fcsr, rj.  */
  /* Data types in instruction templates:  VOID, UQI, USI.  */
  #define __movgr2fcsr(/*ui5*/ _1, _2) \
-  __builtin_loongarch_movgr2fcsr ((_1), (unsigned int) _2);
+  __builtin_loongarch_movgr2fcsr ((_1), _2);
  
  #if defined __loongarch64

  /* Assembly instruction format:   ui5, rj, si12.  */
  /* Data types in instruction templates:  VOID, USI, UDI, SI.  */
  #define __cacop_d(/*ui5*/ _1, /*unsigned long int*/ _2, /*si12*/ _3) \
-  ((void) __builtin_loongarch_cacop_d ((_1), (unsigned long int) (_2), (_3)))
+  __builtin_loongarch_cacop_d ((_1), (_2), (_3))
  #else
  #error "Unsupported ABI."
  #endif
@@ -104,7 +104,7 @@ extern __inline unsigned int
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
  __cpucfg (unsigned int _1)
  {
-  return (unsigned int) __builtin_loongarch_cpucfg ((unsigned int) _1);
+  return __builtin_loongarch_cpucfg (_1);
  }
  
  #ifdef __loongarch64

@@ -114,7 +114,7 @@ extern __inline void
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
  __asrtle_d (long int _1, long int _2)
  {
-  __builtin_loongarch_asrtle_d ((long int) _1, (long int) _2);
+  __builtin_loongarch_asrtle_d (_1, _2);
  }
  
  /* Assembly instruction format:	rj, rk.  */

@@ -123,7 +123,7 @@ extern __inline void
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
  __asrtgt_d (long int _1, long int _2)
  {
-  __builtin_loongarch_asrtgt_d ((long int) _1, (long int) _2);
+  __builtin_loongarch_asrtgt_d (_1, _2);
  }
  #endif
  
@@ -131,7 +131,7 @@ __asrtgt_d (long int _1, long int _2)

  /* Assembly instruction format:   rd, rj, ui5.  */
  /* Data types in instruction templates:  DI, DI, UQI.  */
  #define __lddir_d(/*long int*/ _1, /*ui5*/ _2) \
-  ((long int) __builtin_loongarch_lddir_d ((long int) (_1), (_2)))
+  __builtin_loongarch_lddir_d ((_1), (_2))
  #else
  #error "Unsupported ABI."
  #endif
@@ -140,7 +140,7 @@ __asrtgt_d (long int _1, long int _2)
  /* Assembly instruction format:   rj, ui5.  */
  /* Data types in instruction templates:  VOID, DI, UQI.  */
  #define __ldpte_d(/*long int*/ _1, /*ui5*/ _2) \
-  ((void) __builtin_loongarch_ldpte_d ((long int) (_1), (_2)))
+  __builtin_loongarch_ldpte_d ((_1), (_2))
  #else
  #error "Unsupported ABI."
  #endif
@@ -151,7 +151,7 @@ extern __inline int
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
  __crc_w_b_w (char _1, int _2)
  {
-  return (int) __builtin_loongarch_crc_w_b_w ((char) _1, (int) _2);
+  return __builtin_loongarch_crc_w_b_w (_1, _2);
  }
  
  /* Assembly instruction format:	rd, rj, rk.  */

@@ -160,7 +160,7 @@ extern __inline int
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
  __crc_w_h_w (short _1, int _2)
  {
-  return (int) __builtin_loongarch_crc_w_h_w ((short) _1, (int) _2);
+  return __builtin_loongarch_crc_w_h_w (_1, _2);
  }
  
  /* Assembly instruction format:	rd, rj, rk.  */

@@ -169,7 +169,7 @@ extern __inline int
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
  __crc_w_w_w (int _1, int _2)
  {
-  return (int) __builtin_loongarch_crc_w_w_w ((int) _1, (int) _2);
+  return __builtin_loongarch_crc_w_w_w (_1, _2);
  }
  
  #ifdef __loongarch64

@@ -179,7 +179,7 @@ extern __inline int
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
  __crc_w_d_w (long int _1, 

Re:[pushed] [PATCH 1/2] LoongArch: Fix wrong return value type of __iocsrrd_h.

2024-02-17 Thread chenglulu

Pushed to r14-9053.

On 2024/2/6 at 10:10 AM, Lulu Cheng wrote:

gcc/ChangeLog:

* config/loongarch/larchintrin.h (__iocsrrd_h): Modify the
function return value type to unsigned short.
---
  gcc/config/loongarch/larchintrin.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/larchintrin.h 
b/gcc/config/loongarch/larchintrin.h
index ff2c9f460ac..04672e71728 100644
--- a/gcc/config/loongarch/larchintrin.h
+++ b/gcc/config/loongarch/larchintrin.h
@@ -268,7 +268,7 @@ __iocsrrd_b (unsigned int _1)
  
  /* Assembly instruction format:	rd, rj.  */

  /* Data types in instruction templates:  UHI, USI.  */
-extern __inline unsigned char
+extern __inline unsigned short
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
  __iocsrrd_h (unsigned int _1)
  {
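
A minimal sketch of why the return type matters, under the assumption that
the intrinsic yields a 16-bit value.  fake_read16, read_h_narrow, read_h_wide
and demo_return_width are hypothetical stand-ins, not the real intrinsic or
builtin:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 16-bit I/O register read; not a real intrinsic.  */
static uint16_t
fake_read16 (void)
{
  return 0xBEEF;
}

/* Declared with the old, too-narrow return type: the value is converted
   to unsigned char and the upper byte is lost.  */
unsigned char
read_h_narrow (void)
{
  return fake_read16 ();
}

/* Declared with the corrected return type: all 16 bits survive.  */
unsigned short
read_h_wide (void)
{
  return fake_read16 ();
}

/* Compare what a caller sees through each declaration.  */
void
demo_return_width (void)
{
  assert (read_h_narrow () == 0xEF);
  assert (read_h_wide () == 0xBEEF);
}
```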




Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-08 Thread chenglulu



On 2024/2/7 at 12:23 AM, Xi Ruoyao wrote:

Hi Lulu,

I'm proposing to backport r14-4674 "LoongArch: Delete macro definition
ASM_OUTPUT_ALIGN_WITH_NOP." to releases/gcc-12 and releases/gcc-13.  The
reasons:

1. Strictly speaking, the old ASM_OUTPUT_ALIGN_WITH_NOP macro may cause
a correctness issue.  For example, a developer may use
-falign-functions=16 and then use the low 4 bits of a function pointer
to encode some metainfo.  Then ASM_OUTPUT_ALIGN_WITH_NOP causes the
functions to not really be aligned to a 16-byte boundary, causing some
breakage.

2. With Binutils-2.42,  ASM_OUTPUT_ALIGN_WITH_NOP can cause illegal
opcodes.  For example:

.globl _start
_start:
.balign 32
nop
nop
nop
addi.d $a0, $r0, 1
.balign 16,54525952,4
addi.d $a0, $a0, 1

is assembled and linked to:

0000000000000220 <_start>:
   220: 03400000    nop
   224: 03400000    nop
   228: 03400000    nop
   22c: 02c00404    li.d    $a0, 1
   230: 00000000    .word   0x00000000   # <== OOPS!
   234: 02c00484    addi.d  $a0, $a0, 1

Arguably this is a bug in GAS (it should at least error out for the
unsupported case where .balign 16,54525952,4 appears with -mrelax; I'd
prefer it to support the 3-operand .align directive even with -mrelax,
for reasons I've given in [1]).  But we can at least work around it by
removing ASM_OUTPUT_ALIGN_WITH_NOP to allow using GCC 13.3 with Binutils
2.42.

3. Without ASM_OUTPUT_ALIGN_WITH_NOP, GCC just outputs something like
".align 5" which works as expected since Binutils-2.38.

4. GCC < 14 does not have a default setting of -falign-*, so changing
this won't affect anyone who does not specify -falign-* explicitly.

[1]:https://github.com/loongson-community/discussions/issues/41#issuecomment-1925872603

Is it OK to backport r14-4674 into releases/gcc-12 and releases/gcc-13
then?


Ok, I agree with you.

Thanks!



Re: Pushed: [PATCH] LoongArch: Avoid out-of-bounds access in loongarch_symbol_insns

2024-02-05 Thread chenglulu



On 2024/2/5 at 1:01 AM, Xi Ruoyao wrote:

I have a question. I see that you often add compilation options in
BOOT_CFLAGS.

I also want to test it. Do you have a recommended set of compilation
options?

When I build a compiler for my system I use
{BOOT_{C,CXX,LD}FLAGS,{C,CXX,LD}FLAGS_FOR_TARGET}="-O3 -march=la664
-mtune=la664 -pipe -fgraphite-identity -floop-nest-optimize -fipa-pta
-fdevirtualize-at-ltrans -fno-semantic-interposition -Wl,-O1
-Wl,--as-needed"

and enable PGO (make profiledbootstrap) and LTO
(--with-build-config=bootstrap-lto).

All of them except GRAPHITE (-fgraphite-identity -floop-nest-optimize)
seem "pretty safe" on the architectures I have hardware for.  GRAPHITE
causes a bootstrap failure on AArch64 with GCC 13 (PR109929) when
combined with PGO, and the real cause has not been found yet.

But when I do a test build I normally only enable the flags which may
help to catch some issues, for example when a change only affects LTO I
add --with-build-config=bootstrap-lto, when changing something related
to LASX I use -O3 -mlasx (or -O3 -march=la664) as BOOT_CFLAGS.



Thank you so much. I will try to add optimization options.



Re: [PATCH] LoongArch: libsanitizer: Enable build lsan and tsan for loongarch64.

2024-02-05 Thread chenglulu



On 2024/2/2 at 6:01 PM, Jakub Jelinek wrote:

On Tue, Jan 30, 2024 at 10:09:51AM +0800, Lulu Cheng wrote:

From: chenguoqi 

libsanitizer/ChangeLog:

* configure.tgt: Enable tsan and lsan for loongarch64.
* tsan/Makefile.am: Add tsan_rtl_loongarch64.S to 
EXTRA_libtsan_la_SOURCES.

This line is too long and should read
* tsan/Makefile.am (EXTRA_libtsan_la_SOURCES): Add
tsan_rtl_loongarch64.S.


I modified the description here and pushed the patch as r14-8816.

Thanks!




* tsan/Makefile.in: Regenerate.

Otherwise LGTM.

Jakub




Re:[pushed] [PATCH v1] LoongArch: testsuite: Fix gcc.dg/vect/vect-reduc-mul_{1,2}.c FAIL.

2024-02-04 Thread chenglulu

Pushed to r14-8784.

On 2024/2/2 at 9:42 AM, Li Wei wrote:

This FAIL was introduced by r14-6908. The reason is that when merging
constant vector permutation implementations, the 128-bit matching situation
was not fully considered. In fact, the expansion of 128-bit vectors after
merging only supports value-based 4 elements set shuffle, so this patch
provides a complete implementation of 128-bit vector constant permutation,
and also makes some structural adjustments to the code.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_expand_vselect): Adjust.
(loongarch_expand_vselect_vconcat): Ditto.
(loongarch_try_expand_lsx_vshuf_const): New, use vshuf to implement
all 128-bit constant permutation situations.
(loongarch_expand_lsx_shuffle): Adjust and rename function name.
(loongarch_is_imm_set_shuffle): Renamed function name.
(loongarch_expand_vec_perm_even_odd): Function forward declaration.
(loongarch_expand_vec_perm_even_odd_1): Add implement for 128-bit
extract-even and extract-odd permutations.
(loongarch_is_odd_extraction): Delete.
(loongarch_is_even_extraction): Ditto.
(loongarch_expand_vec_perm_const): Adjust.
---
  gcc/config/loongarch/loongarch.cc | 218 ++
  1 file changed, 163 insertions(+), 55 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 8bc18448753..61723844756 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -8029,7 +8029,8 @@ struct expand_vec_perm_d
  
  static bool

  loongarch_expand_vselect (rtx target, rtx op0,
- const unsigned char *perm, unsigned nelt)
+ const unsigned char *perm, unsigned nelt,
+ bool testing_p)
  {
rtx rperm[MAX_VECT_LEN], x;
rtx_insn *insn;
@@ -8048,6 +8049,9 @@ loongarch_expand_vselect (rtx target, rtx op0,
remove_insn (insn);
return false;
  }
+
+  if (testing_p)
+  remove_insn (insn);
return true;
  }
  
@@ -8055,7 +8059,8 @@ loongarch_expand_vselect (rtx target, rtx op0,
  
  static bool

  loongarch_expand_vselect_vconcat (rtx target, rtx op0, rtx op1,
- const unsigned char *perm, unsigned nelt)
+ const unsigned char *perm, unsigned nelt,
+ bool testing_p)
  {
machine_mode v2mode;
rtx x;
@@ -8063,7 +8068,7 @@ loongarch_expand_vselect_vconcat (rtx target, rtx op0, 
rtx op1,
if (!GET_MODE_2XWIDER_MODE (GET_MODE (op0)).exists ())
  return false;
x = gen_rtx_VEC_CONCAT (v2mode, op0, op1);
-  return loongarch_expand_vselect (target, x, perm, nelt);
+  return loongarch_expand_vselect (target, x, perm, nelt, testing_p);
  }
  
  static tree

@@ -8317,11 +8322,87 @@ loongarch_set_handled_components (sbitmap components)
  #define TARGET_ASM_ALIGNED_SI_OP "\t.word\t"
  #undef TARGET_ASM_ALIGNED_DI_OP
  #define TARGET_ASM_ALIGNED_DI_OP "\t.dword\t"
+
+/* Use the vshuf instruction to implement all 128-bit constant vector
+   permutation.  */
+
+static bool
+loongarch_try_expand_lsx_vshuf_const (struct expand_vec_perm_d *d)
+{
+  int i;
+  rtx target, op0, op1, sel, tmp;
+  rtx rperm[MAX_VECT_LEN];
+
+  if (GET_MODE_SIZE (d->vmode) == 16)
+{
+  target = d->target;
+  op0 = d->op0;
+  op1 = d->one_vector_p ? d->op0 : d->op1;
+
+  if (GET_MODE (op0) != GET_MODE (op1)
+ || GET_MODE (op0) != GET_MODE (target))
+   return false;
+
+  if (d->testing_p)
+   return true;
+
+  for (i = 0; i < d->nelt; i += 1)
+ rperm[i] = GEN_INT (d->perm[i]);
+
+  if (d->vmode == E_V2DFmode)
+   {
+ sel = gen_rtx_CONST_VECTOR (E_V2DImode, gen_rtvec_v (d->nelt, rperm));
+ tmp = simplify_gen_subreg (E_V2DImode, d->target, d->vmode, 0);
+ emit_move_insn (tmp, sel);
+   }
+  else if (d->vmode == E_V4SFmode)
+   {
+ sel = gen_rtx_CONST_VECTOR (E_V4SImode, gen_rtvec_v (d->nelt, rperm));
+ tmp = simplify_gen_subreg (E_V4SImode, d->target, d->vmode, 0);
+ emit_move_insn (tmp, sel);
+   }
+  else
+   {
+ sel = gen_rtx_CONST_VECTOR (d->vmode, gen_rtvec_v (d->nelt, rperm));
+ emit_move_insn (d->target, sel);
+   }
+
+  switch (d->vmode)
+   {
+   case E_V2DFmode:
+ emit_insn (gen_lsx_vshuf_d_f (target, target, op1, op0));
+ break;
+   case E_V2DImode:
+ emit_insn (gen_lsx_vshuf_d (target, target, op1, op0));
+ break;
+   case E_V4SFmode:
+ emit_insn (gen_lsx_vshuf_w_f (target, target, op1, op0));
+ break;
+   case E_V4SImode:
+ emit_insn (gen_lsx_vshuf_w (target, target, op1, op0));
+ break;
+   case E_V8HImode:
+ emit_insn (gen_lsx_vshuf_h (target, target, op1, op0));
+ break;
+   case 

Re: [PATCH] LoongArch: Fix wrong LSX FP vector negation

2024-02-03 Thread chenglulu



On 2024/2/3 at 4:58 PM, Xi Ruoyao wrote:

We expanded (neg x) to (minus const0 x) for LSX FP vectors, this is
wrong because -0.0 is not 0 - 0.0.  This causes some Python tests to
fail when Python is built with LSX enabled.

Use the vbitrevi.{d/w} instructions to simply reverse the sign bit
instead.  We are already doing this for LASX and now we can unify them
into simd.md.
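
The difference can be sketched for a single scalar double as follows.  This
assumes IEEE 754 doubles and the default rounding mode; neg_by_sub,
neg_by_bitflip and demo_neg_zero are hypothetical names, and the scalar code
only mirrors what the vector instructions do per element:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Negation the way the old expander did it: 0 - x.  */
double
neg_by_sub (double x)
{
  return 0.0 - x;
}

/* Negation by flipping bit 63, the sign bit of an IEEE 754 double.  */
double
neg_by_bitflip (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits ^= UINT64_C (1) << 63;
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* The two agree on ordinary values but differ on the sign of zero.  */
void
demo_neg_zero (void)
{
  assert (!signbit (neg_by_sub (0.0)));     /* 0.0 - 0.0 is +0.0.  */
  assert (signbit (neg_by_bitflip (0.0)));  /* Bit flip gives -0.0.  */
  assert (neg_by_sub (2.5) == neg_by_bitflip (2.5));
}
```

The two zeros compare equal with ==, so the mismatch only shows up through
signbit, copysign, division by the result, or formatted output.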

gcc/ChangeLog:

* config/loongarch/lsx.md (neg2): Remove the
incorrect expand.
* config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr.
(elmsgnbit): Likewise.
(neg2): New define_insn.
* config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they
are now instantiated in simd.md.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?


LGTM!

Thanks!



  gcc/config/loongarch/lasx.md | 16 
  gcc/config/loongarch/lsx.md  | 11 ---
  gcc/config/loongarch/simd.md | 18 ++
  3 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index e2115ffb884..ac84db7f0ce 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -3028,22 +3028,6 @@ (define_insn "absv8sf2"
[(set_attr "type" "simd_logic")
 (set_attr "mode" "V8SF")])
  
-(define_insn "negv4df2"

-  [(set (match_operand:V4DF 0 "register_operand" "=f")
-   (neg:V4DF (match_operand:V4DF 1 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-  "xvbitrevi.d\t%u0,%u1,63"
-  [(set_attr "type" "simd_logic")
-   (set_attr "mode" "V4DF")])
-
-(define_insn "negv8sf2"
-  [(set (match_operand:V8SF 0 "register_operand" "=f")
-   (neg:V8SF (match_operand:V8SF 1 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-  "xvbitrevi.w\t%u0,%u1,31"
-  [(set_attr "type" "simd_logic")
-   (set_attr "mode" "V8SF")])
-
  (define_insn "xvfmadd4"
[(set (match_operand:FLASX 0 "register_operand" "=f")
(fma:FLASX (match_operand:FLASX 1 "register_operand" "f")
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 7002edae4d4..b9b94b9079c 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -728,17 +728,6 @@ (define_expand "neg2"
DONE;
  })
  
-(define_expand "neg2"

-  [(set (match_operand:FLSX 0 "register_operand")
-   (neg:FLSX (match_operand:FLSX 1 "register_operand")))]
-  "ISA_HAS_LSX"
-{
-  rtx reg = gen_reg_rtx (mode);
-  emit_move_insn (reg, CONST0_RTX (mode));
-  emit_insn (gen_sub3 (operands[0], reg, operands[1]));
-  DONE;
-})
-
  (define_expand "lsx_vrepli"
[(match_operand:ILSX 0 "register_operand")
 (match_operand 1 "const_imm10_operand")]
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index cb0a19447a1..00ff2823a4e 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -85,12 +85,21 @@ (define_mode_attr simdfmt [(V2DF "d") (V4DF "d")
  (define_mode_attr simdifmt_for_f [(V2DF "l") (V4DF "l")
  (V4SF "w") (V8SF "w")])
  
+;; Suffix for integer mode in LSX or LASX instructions to operating FP

+;; vectors using integer vector operations.
+(define_mode_attr simdfmt_as_i [(V2DF "d") (V4DF "d")
+   (V4SF "w") (V8SF "w")])
+
  ;; Size of vector elements in bits.
  (define_mode_attr elmbits [(V2DI "64") (V4DI "64")
   (V4SI "32") (V8SI "32")
   (V8HI "16") (V16HI "16")
   (V16QI "8") (V32QI "8")])
  
+;; The index of sign bit in FP vector elements.

+(define_mode_attr elmsgnbit [(V2DF "63") (V4DF "63")
+(V4SF "31") (V8SF "31")])
+
  ;; This attribute is used to form an immediate operand constraint using
  ;; "const__operand".
  (define_mode_attr bitimm [(V16QI "uimm3") (V32QI "uimm3")
@@ -457,6 +466,15 @@ (define_expand "reduc__scal_"
DONE;
  })
  
+;; FP negation.

+(define_insn "neg2"
+  [(set (match_operand:FVEC 0 "register_operand" "=f")
+   (neg:FVEC (match_operand:FVEC 1 "register_operand" "f")))]
+  ""
+  "vbitrevi.\t%0,%1,"
+  [(set_attr "type" "simd_logic")
+   (set_attr "mode" "")])
+
  ; The LoongArch SX Instructions.
  (include "lsx.md")
  




Re: [PATCH] LoongArch: Avoid out-of-bounds access in loongarch_symbol_insns

2024-02-03 Thread chenglulu



On 2024/2/2 at 5:55 PM, Xi Ruoyao wrote:

We call loongarch_symbol_insns with mode = MAX_MACHINE_MODE sometimes.
But in loongarch_symbol_insns:

 if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))
   return 0;

And LSX_SUPPORTED_MODE_P is defined as:

 #define LSX_SUPPORTED_MODE_P(MODE) \
   (ISA_HAS_LSX \
&& GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG ... ...

GET_MODE_SIZE is expanded to a call to mode_to_bytes, which is defined:

 ALWAYS_INLINE poly_uint16
 mode_to_bytes (machine_mode mode)
 {
 #if GCC_VERSION >= 4001
   return (__builtin_constant_p (mode)
  ? mode_size_inline (mode) : mode_size[mode]);
 #else
   return mode_size[mode];
 #endif
 }

There is an assertion in mode_size_inline:

 gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);

Note that NUM_MACHINE_MODES = MAX_MACHINE_MODE (emitted by genmodes.cc),
thus if __builtin_constant_p (mode) is evaluated true (it happens when
GCC is bootstrapped with LTO+PGO), the assertion will be triggered and
cause an ICE.  OTOH if __builtin_constant_p (mode) is evaluated false,
mode_size[mode] is still an out-of-bound array access (the length of the
mode_size array is NUM_MACHINE_MODES).

So we shouldn't call LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P with
MAX_MACHINE_MODE in loongarch_symbol_insns.  This is very similar to a
MIPS bug PR98491 fixed by me about 3 years ago.
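
The shape of the problem and of the fix can be sketched in plain C.  All
names below (toy_mode, toy_mode_size, toy_vector_mode_p, toy_symbol_insns,
demo_sentinel_guard) are hypothetical simplifications, not the GCC
definitions:

```c
#include <assert.h>

/* A toy mode enum whose last enumerator doubles as the table length and
   as a "no particular mode" sentinel.  */
enum toy_mode { TOY_SI, TOY_DI, TOY_V16QI, TOY_MAX_MODE };

static const unsigned short toy_mode_size[TOY_MAX_MODE] = { 4, 8, 16 };

/* Indexing the table with the sentinel would read one element past the
   end; the assertion plays the role of the check in mode_size_inline.  */
static int
toy_vector_mode_p (enum toy_mode mode)
{
  assert ((int) mode >= 0 && mode < TOY_MAX_MODE);
  return toy_mode_size[mode] == 16;
}

/* Test for the sentinel first, so the lookup never sees it.  */
int
toy_symbol_insns (enum toy_mode mode)
{
  if (mode != TOY_MAX_MODE && toy_vector_mode_p (mode))
    return 0;
  return 1;
}

/* The sentinel falls through to the generic path without touching the
   table.  */
void
demo_sentinel_guard (void)
{
  assert (toy_symbol_insns (TOY_MAX_MODE) == 1);
  assert (toy_symbol_insns (TOY_V16QI) == 0);
  assert (toy_symbol_insns (TOY_SI) == 1);
}
```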

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_symbol_insns): Do not
use LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P if mode is
MAX_MACHINE_MODE.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?


LGTM!

I have a question. I see that you often add compilation options in
BOOT_CFLAGS.

I also want to test it. Do you have a recommended set of compilation
options?


Thanks!



  gcc/config/loongarch/loongarch.cc | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 963e86d61af..6badef45d62 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2007,7 +2007,8 @@ loongarch_symbol_insns (enum loongarch_symbol_type type, 
machine_mode mode)
  {
/* LSX LD.* and ST.* cannot support loading symbols via an immediate
   operand.  */
-  if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))
+  if (mode != MAX_MACHINE_MODE
+  && (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode)))
  return 0;
  
switch (type)




Re: [PATCH] LoongArch: libsanitizer: Enable build lsan and tsan for loongarch64.

2024-02-03 Thread chenglulu



On 2024/2/2 at 6:01 PM, Jakub Jelinek wrote:

On Tue, Jan 30, 2024 at 10:09:51AM +0800, Lulu Cheng wrote:

From: chenguoqi 

libsanitizer/ChangeLog:

* configure.tgt: Enable tsan and lsan for loongarch64.
* tsan/Makefile.am: Add tsan_rtl_loongarch64.S to 
EXTRA_libtsan_la_SOURCES.

This line is too long and should read
* tsan/Makefile.am (EXTRA_libtsan_la_SOURCES): Add
tsan_rtl_loongarch64.S.


* tsan/Makefile.in: Regenerate.

Otherwise LGTM.

Jakub


Thanks for your review.

I will send a patch for the V2 version immediately.




Re: [PATCH] LoongArch: Fix an ODR violation

2024-02-01 Thread chenglulu

LGTM!

Thanks!

On 2024/2/2 at 5:54 AM, Xi Ruoyao wrote:

When bootstrapping GCC 14 --with-build-config=bootstrap-lto, an ODR
violation is detected:

 ../../gcc/config/loongarch/loongarch-opts.cc:57: warning:
 'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr]
 57 | abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
 ../../gcc/config/loongarch/loongarch-def.cc:186: note:
 'abi_minimal_isa' was previously declared here
 186 |   abi_minimal_isa = array,
 ../../gcc/config/loongarch/loongarch-def.cc:186: note:
 code may be misoptimized unless '-fno-strict-aliasing' is used

Fix it by adding a proper declaration of abi_minimal_isa into
loongarch-def.h and remove the ODR-violating local declaration in
loongarch-opts.cc.

gcc/ChangeLog:

* config/loongarch/loongarch-def.h (abi_minimal_isa): Declare.
* config/loongarch/loongarch-opts.cc (abi_minimal_isa): Remove
the ODR-violating local declaration.
---

Bootstrapped on loongarch64-linux-gnu.  Not fully regtested but it
should be an obvious fix.  Ok for trunk?

  gcc/config/loongarch/loongarch-def.h   | 3 +++
  gcc/config/loongarch/loongarch-opts.cc | 2 --
  2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-def.h 
b/gcc/config/loongarch/loongarch-def.h
index a1237ecf1fd..2dbf006d013 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -203,5 +203,8 @@ extern loongarch_def_array
loongarch_cpu_align;
  extern loongarch_def_array
loongarch_cpu_rtx_cost_data;
+extern loongarch_def_array<
+  loongarch_def_array,
+  N_ABI_BASE_TYPES> abi_minimal_isa;
  
  #endif /* LOONGARCH_DEF_H */

diff --git a/gcc/config/loongarch/loongarch-opts.cc 
b/gcc/config/loongarch/loongarch-opts.cc
index b87299513c9..7eeac43ed2f 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -53,8 +53,6 @@ static const int tm_multilib_list[] = { TM_MULTILIB_LIST };
  static int enabled_abi_types[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES] = { 0 };
  
  #define isa_required(ABI) (abi_minimal_isa[(ABI).base][(ABI).ext])

-extern "C" const struct loongarch_isa
-abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
  
  static inline int

  is_multilib_enabled (struct loongarch_abi abi)




Re: [pushed][PATCH] LoongArch: Fix incorrect return type for frecipe/frsqrte intrinsic functions

2024-02-01 Thread chenglulu

Pushed to r14-8723.

On 2024/1/24 at 5:19 PM, Jiahao Xu wrote:

gcc/ChangeLog:

* config/loongarch/larchintrin.h
(__frecipe_s): Update function return type.
(__frecipe_d): Ditto.
(__frsqrte_s): Ditto.
(__frsqrte_d): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.

diff --git a/gcc/config/loongarch/larchintrin.h 
b/gcc/config/loongarch/larchintrin.h
index 7692415e04d..ff2c9f460ac 100644
--- a/gcc/config/loongarch/larchintrin.h
+++ b/gcc/config/loongarch/larchintrin.h
@@ -336,38 +336,38 @@ __iocsrwr_d (unsigned long int _1, unsigned int _2)
  #ifdef __loongarch_frecipe
  /* Assembly instruction format: fd, fj.  */
  /* Data types in instruction templates:  SF, SF.  */
-extern __inline void
+extern __inline float
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
  __frecipe_s (float _1)
  {
-  __builtin_loongarch_frecipe_s ((float) _1);
+  return (float) __builtin_loongarch_frecipe_s ((float) _1);
  }
  
  /* Assembly instruction format: fd, fj.  */

  /* Data types in instruction templates:  DF, DF.  */
-extern __inline void
+extern __inline double
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
  __frecipe_d (double _1)
  {
-  __builtin_loongarch_frecipe_d ((double) _1);
+  return (double) __builtin_loongarch_frecipe_d ((double) _1);
  }
  
  /* Assembly instruction format: fd, fj.  */

  /* Data types in instruction templates:  SF, SF.  */
-extern __inline void
+extern __inline float
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
  __frsqrte_s (float _1)
  {
-  __builtin_loongarch_frsqrte_s ((float) _1);
+  return (float) __builtin_loongarch_frsqrte_s ((float) _1);
  }
  
  /* Assembly instruction format: fd, fj.  */

  /* Data types in instruction templates:  DF, DF.  */
-extern __inline void
+extern __inline double
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
  __frsqrte_d (double _1)
  {
-  __builtin_loongarch_frsqrte_d ((double) _1);
+  return (double) __builtin_loongarch_frsqrte_d ((double) _1);
  }
  #endif
  
diff --git a/gcc/testsuite/gcc.target/loongarch/larch-frecipe-intrinsic.c b/gcc/testsuite/gcc.target/loongarch/larch-frecipe-intrinsic.c

new file mode 100644
index 000..6ce2bde0acf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/larch-frecipe-intrinsic.c
@@ -0,0 +1,30 @@
+/* Test intrinsics for frecipe.{s/d} and frsqrte.{s/d} instructions */
+/* { dg-do compile } */
+/* { dg-options "-mfrecipe -O2" } */
+/* { dg-final { scan-assembler-times 
"test_frecipe_s:.*frecipe\\.s.*test_frecipe_s" 1 } } */
+/* { dg-final { scan-assembler-times 
"test_frecipe_d:.*frecipe\\.d.*test_frecipe_d" 1 } } */
+/* { dg-final { scan-assembler-times 
"test_frsqrte_s:.*frsqrte\\.s.*test_frsqrte_s" 1 } } */
+/* { dg-final { scan-assembler-times 
"test_frsqrte_d:.*frsqrte\\.d.*test_frsqrte_d" 1 } } */
+
+#include <larchintrin.h>
+
+float
+test_frecipe_s (float _1)
+{
+  return __frecipe_s (_1);
+}
+double
+test_frecipe_d (double _1)
+{
+  return __frecipe_d (_1);
+}
+float
+test_frsqrte_s (float _1)
+{
+  return __frsqrte_s (_1);
+}
+double
+test_frsqrte_d (double _1)
+{
+  return __frsqrte_d (_1);
+}




Re: [PATCH] LoongArch: libsanitizer: Enable build lsan and tsan for loongarch64.

2024-02-01 Thread chenglulu

Ping?

On 2024/1/30 at 10:09 AM, Lulu Cheng wrote:

From: chenguoqi 

libsanitizer/ChangeLog:

* configure.tgt: Enable tsan and lsan for loongarch64.
* tsan/Makefile.am: Add tsan_rtl_loongarch64.S to 
EXTRA_libtsan_la_SOURCES.
* tsan/Makefile.in: Regenerate.
---
  libsanitizer/configure.tgt| 5 +
  libsanitizer/tsan/Makefile.am | 2 +-
  libsanitizer/tsan/Makefile.in | 3 ++-
  3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
index 38fc7001ff7..77a0e68222b 100644
--- a/libsanitizer/configure.tgt
+++ b/libsanitizer/configure.tgt
@@ -79,6 +79,11 @@ case "${target}" in
fi
;;
loongarch64-*-linux*)
+   if test x$ac_cv_sizeof_void_p = x8; then
+   TSAN_SUPPORTED=yes
+   LSAN_SUPPORTED=yes
+   TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_loongarch64.lo
+   fi
;;
*)
UNSUPPORTED=1
diff --git a/libsanitizer/tsan/Makefile.am b/libsanitizer/tsan/Makefile.am
index cb8bf2e705e..e8fca16be5f 100644
--- a/libsanitizer/tsan/Makefile.am
+++ b/libsanitizer/tsan/Makefile.am
@@ -50,7 +50,7 @@ tsan_files = \
tsan_vector_clock.cpp
  
  libtsan_la_SOURCES = $(tsan_files)

-EXTRA_libtsan_la_SOURCES = tsan_rtl_amd64.S tsan_rtl_aarch64.S 
tsan_rtl_mips64.S tsan_rtl_ppc64.S tsan_rtl_s390x.S tsan_rtl_riscv64.S
+EXTRA_libtsan_la_SOURCES = tsan_rtl_amd64.S tsan_rtl_aarch64.S 
tsan_rtl_loongarch64.S tsan_rtl_mips64.S tsan_rtl_ppc64.S tsan_rtl_s390x.S 
tsan_rtl_riscv64.S
  libtsan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la 
$(top_builddir)/interception/libinterception.la $(TSAN_TARGET_DEPENDENT_OBJECTS)
  libtsan_la_DEPENDENCIES = 
$(top_builddir)/sanitizer_common/libsanitizer_common.la 
$(top_builddir)/interception/libinterception.la $(TSAN_TARGET_DEPENDENT_OBJECTS)
  if LIBBACKTRACE_SUPPORTED
diff --git a/libsanitizer/tsan/Makefile.in b/libsanitizer/tsan/Makefile.in
index 5cc6f95a40a..5bbdf3915b8 100644
--- a/libsanitizer/tsan/Makefile.in
+++ b/libsanitizer/tsan/Makefile.in
@@ -456,7 +456,7 @@ tsan_files = \
tsan_vector_clock.cpp
  
  libtsan_la_SOURCES = $(tsan_files)

-EXTRA_libtsan_la_SOURCES = tsan_rtl_amd64.S tsan_rtl_aarch64.S 
tsan_rtl_mips64.S tsan_rtl_ppc64.S tsan_rtl_s390x.S tsan_rtl_riscv64.S
+EXTRA_libtsan_la_SOURCES = tsan_rtl_amd64.S tsan_rtl_aarch64.S 
tsan_rtl_loongarch64.S tsan_rtl_mips64.S tsan_rtl_ppc64.S tsan_rtl_s390x.S 
tsan_rtl_riscv64.S
  libtsan_la_LIBADD =  \
$(top_builddir)/sanitizer_common/libsanitizer_common.la \
$(top_builddir)/interception/libinterception.la \
@@ -614,6 +614,7 @@ distclean-compile:
  @AMDEP_TRUE@@am__include@ 
@am__quote@./$(DEPDIR)/tsan_rtl_aarch64.Plo@am__quote@
  @AMDEP_TRUE@@am__include@ 
@am__quote@./$(DEPDIR)/tsan_rtl_access.Plo@am__quote@
  @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tsan_rtl_amd64.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ 
@am__quote@./$(DEPDIR)/tsan_rtl_loongarch64.Plo@am__quote@
  @AMDEP_TRUE@@am__include@ 
@am__quote@./$(DEPDIR)/tsan_rtl_mips64.Plo@am__quote@
  @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tsan_rtl_mutex.Plo@am__quote@
  @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tsan_rtl_ppc64.Plo@am__quote@




Re:[pushed] [PATCH v2] LoongArch: Adjust cost of vector_stmt that match multiply-add pattern.

2024-02-01 Thread chenglulu

Pushed to r14-8722.

On 2024/1/26 at 4:41 PM, Li Wei wrote:

We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r
failed to vectorize effectively. For this reason, we adjust the cost of
128-bit vector_stmt that match the multiply-add pattern to facilitate 128-bit
vectorization.
The experimental results show that after the modification, 549.fotonik3d_r
performance can be improved by 9.77% under the 128-bit vectorization option.
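
The pattern check at the heart of this change can be sketched on a toy
expression tree.  The types and names below (toy_code, toy_expr,
toy_is_mul_result, toy_multiply_add_p, demo_multiply_add) are hypothetical
and far simpler than GIMPLE; the sketch only shows the rule "an add or
subtract with at least one operand produced by a multiply":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_code { TOY_LEAF, TOY_PLUS, TOY_MINUS, TOY_MULT };

struct toy_expr
{
  enum toy_code code;
  const struct toy_expr *lhs;
  const struct toy_expr *rhs;
};

/* True if the operand is itself the result of a multiplication.  */
static bool
toy_is_mul_result (const struct toy_expr *e)
{
  return e != NULL && e->code == TOY_MULT;
}

/* True for a + b*c, a*b + c, a - b*c and a*b - c.  */
bool
toy_multiply_add_p (const struct toy_expr *e)
{
  if (e == NULL || (e->code != TOY_PLUS && e->code != TOY_MINUS))
    return false;
  return toy_is_mul_result (e->lhs) || toy_is_mul_result (e->rhs);
}

/* Build a*b + c and a + b, then classify them.  */
void
demo_multiply_add (void)
{
  const struct toy_expr a = { TOY_LEAF, NULL, NULL };
  const struct toy_expr b = { TOY_LEAF, NULL, NULL };
  const struct toy_expr c = { TOY_LEAF, NULL, NULL };
  const struct toy_expr mul = { TOY_MULT, &a, &b };
  const struct toy_expr fma = { TOY_PLUS, &mul, &c };
  const struct toy_expr add = { TOY_PLUS, &a, &b };

  assert (toy_multiply_add_p (&fma));
  assert (!toy_multiply_add_p (&add));
  assert (!toy_multiply_add_p (&mul));
}
```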

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_multiply_add_p): New.
(loongarch_vector_costs::add_stmt_cost): Adjust.

gcc/testsuite/ChangeLog:

* gfortran.dg/vect/vect-10.f90: New test.
---
  gcc/config/loongarch/loongarch.cc  | 48 +++
  gcc/testsuite/gfortran.dg/vect/vect-10.f90 | 71 ++
  2 files changed, 119 insertions(+)
  create mode 100644 gcc/testsuite/gfortran.dg/vect/vect-10.f90

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index b494040d165..4d99e30828b 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4096,6 +4096,37 @@ 
loongarch_vector_costs::determine_suggested_unroll_factor (loop_vec_info loop_vi
return 1 << ceil_log2 (uf);
  }
  
+/* Check if assign stmt rhs op comes from a multiply-add operation.  */

+static bool
+loongarch_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info)
+{
+  gassign *assign = dyn_cast<gassign *> (stmt_info->stmt);
+  if (!assign)
+return false;
+  tree_code code = gimple_assign_rhs_code (assign);
+  if (code != PLUS_EXPR && code != MINUS_EXPR)
+return false;
+
+  auto is_mul_result = [&](int i)
+{
+  tree rhs = gimple_op (assign, i);
+  if (TREE_CODE (rhs) != SSA_NAME)
+   return false;
+
+  stmt_vec_info def_stmt_info = vinfo->lookup_def (rhs);
+  if (!def_stmt_info
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) != vect_internal_def)
+   return false;
+  gassign *rhs_assign = dyn_cast<gassign *> (def_stmt_info->stmt);
+  if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
+   return false;
+
+  return true;
+};
+
+  return is_mul_result (1) || is_mul_result (2);
+}
+
  unsigned
  loongarch_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
   stmt_vec_info stmt_info, slp_tree,
@@ -4108,6 +4139,23 @@ loongarch_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
  {
int stmt_cost = loongarch_builtin_vectorization_cost (kind, vectype,
misalign);
+  if (vectype && stmt_info)
+   {
+ gassign *assign = dyn_cast<gassign *> (STMT_VINFO_STMT (stmt_info));
+ machine_mode mode = TYPE_MODE (vectype);
+
+ /* We found through testing that this strategy (the stmt that
+matches the multiply-add pattern) has positive returns only
+when applied to the 128-bit vector stmt, so this restriction
+is currently made.  */
+ if (kind == vector_stmt && GET_MODE_SIZE (mode) == 16 && assign)
+   {
+ if (!vect_is_reduction (stmt_info)
+ && loongarch_multiply_add_p (m_vinfo, stmt_info))
+   stmt_cost = 0;
+   }
+   }
+
retval = adjust_cost_for_freq (stmt_info, where, count * stmt_cost);
m_costs[where] += retval;
  
diff --git a/gcc/testsuite/gfortran.dg/vect/vect-10.f90 b/gcc/testsuite/gfortran.dg/vect/vect-10.f90

new file mode 100644
index 000..b85bc2702a3
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/vect/vect-10.f90
@@ -0,0 +1,71 @@
+! { dg-do compile }
+! { dg-additional-options "-Ofast -mlsx -fvect-cost-model=dynamic" { target loongarch64*-*-* } }
+
+MODULE material_mod
+
+IMPLICIT NONE
+
+integer, parameter :: dfp = selected_real_kind (13, 99)
+integer, parameter :: rfp = dfp
+
+PUBLIC Mat_updateE, iepx, iepy, iepz
+
+PRIVATE
+
+integer, dimension (:, :, :), allocatable :: iepx, iepy, iepz
+real (kind = rfp), dimension (:), allocatable :: Dbdx, Dbdy, Dbdz
+integer :: imin, jmin, kmin
+integer, dimension (6) :: Exsize
+integer, dimension (6) :: Eysize
+integer, dimension (6) :: Ezsize
+integer, dimension (6) :: Hxsize
+integer, dimension (6) :: Hysize
+integer, dimension (6) :: Hzsize
+
+CONTAINS
+
+SUBROUTINE mat_updateE (nx, ny, nz, Hx, Hy, Hz, Ex, Ey, Ez)
+
+integer, intent (in) :: nx, ny, nz
+
+real (kind = rfp), intent (inout), &
+  dimension (Exsize (1) : Exsize (2), Exsize (3) : Exsize (4), Exsize (5) : Exsize (6)) :: Ex
+real (kind = rfp), intent (inout), &
+  dimension (Eysize (1) : Eysize (2), Eysize (3) : Eysize (4), Eysize (5) : Eysize (6)) :: Ey
+real (kind = rfp), intent (inout), &
+  dimension (Ezsize (1) : Ezsize (2), Ezsize (3) : Ezsize (4), Ezsize (5) : Ezsize (6)) :: Ez
+real (kind = rfp), intent (in),   

Re: [pushed][PATCH v5 0/5] When cmodel=extreme, add macro implementation and fix problems with explicit relos implementation.

2024-02-01 Thread chenglulu

Pushed to r14-8717...r14-8721.

On 2024/1/29 at 4:21 PM, Lulu Cheng wrote:

When cmodel=extreme, the symbol address is obtained through four instructions,
so errors may occur during linking in some cases. Xi Ruoyao's patch fixes this
problem.

https://github.com/loongson/la-abi-specs/blob/release/laelf.adoc#extreme-code-model


v4 -> v5:
   1. Modify code format.
   2. Add the implementation patch submitted by Xi Ruoyao about '-mcmodel=extreme -mexplicit-relocs=always'.

v3 -> v4:
   1. Add macro support for TLS symbols
   2. Added support for loading __get_tls_addr symbol address using call36.
   3. Merge template got_load_tls_{ld/gd/le/ie}.
   4. Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.


v2 -> v3:
   1. Modify the detection rules of a test case.

v1 -> v2:
   1. Use the temporarily allocated registers as intermediate registers to implement the extreme macro.
   2. Fixed bugs in v1 test cases.



Lulu Cheng (4):
   LoongArch: Merge template got_load_tls_{ld/gd/le/ie}.
   LoongArch: Add the macro implementation of mcmodel=extreme.
   LoongArch: Enable explicit reloc for extreme TLS GD/LD with
 -mexplicit-relocs=auto.
   LoongArch: Added support for loading __get_tls_addr symbol address
 using call36.

Xi Ruoyao (1):
   LoongArch: Don't split the instructions containing relocs for extreme
 code model.

  gcc/config/loongarch/loongarch-protos.h   |   1 +
  gcc/config/loongarch/loongarch.cc | 265 ++
  gcc/config/loongarch/loongarch.md | 125 ++---
  gcc/config/loongarch/predicates.md|  12 +
  .../loongarch/cmodel-extreme-mi-thunk-1.C |  11 +
  .../loongarch/cmodel-extreme-mi-thunk-2.C |   6 +
  .../loongarch/cmodel-extreme-mi-thunk-3.C |   6 +
  .../gcc.target/loongarch/attr-model-5.c   |   8 +
  .../gcc.target/loongarch/cmodel-extreme-1.c   |  18 ++
  .../gcc.target/loongarch/cmodel-extreme-2.c   |   7 +
  .../explicit-relocs-extreme-auto-tls-ld-gd.c  |   5 +
  .../explicit-relocs-medium-auto-tls-ld-gd.c   |   5 +
  ...icit-relocs-medium-call36-auto-tls-ld-gd.c |   5 +
  .../loongarch/func-call-extreme-1.c   |  14 +-
  .../loongarch/func-call-extreme-2.c   |  29 +-
  .../loongarch/func-call-extreme-3.c   |   2 +-
  .../loongarch/func-call-extreme-4.c   |   2 +-
  .../loongarch/func-call-extreme-5.c   |   7 +
  .../loongarch/func-call-extreme-6.c   |   7 +
  .../gcc.target/loongarch/tls-extreme-macro.c  |  35 +++
  20 files changed, 375 insertions(+), 195 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/loongarch/cmodel-extreme-mi-thunk-1.C
  create mode 100644 gcc/testsuite/g++.target/loongarch/cmodel-extreme-mi-thunk-2.C
  create mode 100644 gcc/testsuite/g++.target/loongarch/cmodel-extreme-mi-thunk-3.C
  create mode 100644 gcc/testsuite/gcc.target/loongarch/attr-model-5.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/cmodel-extreme-1.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/cmodel-extreme-2.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-medium-auto-tls-ld-gd.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-5.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-6.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-extreme-macro.c





Re: [pushed][PATCH v2] LoongArch: Modify the address calculation logic for obtaining array element values through fp.

2024-02-01 Thread chenglulu

Pushed to r14-8716.

On 2024/1/30 at 3:55 PM, Lulu Cheng wrote:

Change the address calculation from (((a x C) + fp) + offset) to
((fp + offset) + a x C).
This changes the register dependencies and lets the code be optimized better.
The value of C is 2, 4 or 8.
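
The two address forms above compute the same value; only the grouping differs. The following C sketch, offered as an illustration rather than as the compiler's code, shows this with hypothetical helper names (`addr_old`, `addr_new`) and with C expressed as a shift count of 1, 2 or 3. In the new form `fp + offset` is a separate subexpression that no longer depends on the index `a`.

```c
#include <assert.h>
#include <stdint.h>

/* Old grouping: (((a * C) + fp) + offset), with C == 1 << shift.  */
static uintptr_t
addr_old (uintptr_t fp, uintptr_t a, unsigned shift, intptr_t offset)
{
  assert (shift >= 1 && shift <= 3);
  return ((a << shift) + fp) + (uintptr_t) offset;
}

/* New grouping: ((fp + offset) + (a * C)).  The base does not depend
   on the index A, so it can be computed once and reused.  */
static uintptr_t
addr_new (uintptr_t fp, uintptr_t a, unsigned shift, intptr_t offset)
{
  assert (shift >= 1 && shift <= 3);
  uintptr_t base = fp + (uintptr_t) offset;  /* Index-independent part.  */
  return base + (a << shift);                /* Single shift-and-add.  */
}
```

Because unsigned arithmetic wraps consistently, both helpers agree for negative offsets as well; the benefit of the new grouping lies purely in how the instructions depend on each other.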

The following is the assembly code of a loop in SPEC CPU2006 401.bzip2 before
and after the modification:

  old                              | new
  .L71:                            | .L71:
    slli.d  $r12,$r15,2            |   slli.d  $r12,$r15,2
    ldx.w   $r13,$r22,$r12         |   ldx.w   $r13,$r22,$r12
    addi.d  $r15,$r15,-1           |   addi.d  $r15,$r15,-1
    slli.w  $r16,$r15,0            |   slli.w  $r16,$r15,0
    addi.w  $r13,$r13,-1           |   addi.w  $r13,$r13,-1
    slti    $r14,$r13,0            |   slti    $r14,$r13,0
    add.w   $r12,$r26,$r13         |   add.w   $r12,$r26,$r13
    maskeqz $r12,$r12,$r14         |   maskeqz $r12,$r12,$r14
    masknez $r14,$r13,$r14         |   masknez $r14,$r13,$r14
    or      $r12,$r12,$r14         |   or      $r12,$r12,$r14
    ldx.bu  $r14,$r30,$r12         |   ldx.bu  $r14,$r30,$r12
    lu12i.w $r13,4096>>12          |   alsl.d  $r14,$r14,$r18,2
    ori     $r13,$r13,432          |   ldptr.w $r13,$r14,0
    add.d   $r13,$r13,$r3          |   addi.w  $r17,$r13,-1
    alsl.d  $r14,$r14,$r13,2       |   stptr.w $r17,$r14,0
    ldptr.w $r13,$r14,-1968        |   slli.d  $r13,$r13,2
    addi.w  $r17,$r13,-1           |   stx.w   $r12,$r22,$r13
    st.w    $r17,$r14,-1968        |   ldptr.w $r12,$r19,0
    slli.d  $r13,$r13,2            |   blt     $r12,$r16,.L71
    stx.w   $r12,$r22,$r13         |   .align  4
    ldptr.w $r12,$r18,-2048        |
    blt     $r12,$r16,.L71         |
    .align  4                      |

This patch is ported from riscv's commit r14-3111.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (mem_shadd_or_shadd_rtx_p): New 
function.
(loongarch_legitimize_address): Add logical transformation code.

---
v1 -> v2:
   Modify code format and comment information.

---
  gcc/config/loongarch/loongarch.cc | 43 +++
  1 file changed, 43 insertions(+)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index b494040d165..b8f6f6689bb 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3219,6 +3219,22 @@ loongarch_split_symbol (rtx temp, rtx addr, machine_mode 
mode, rtx *low_out)
return true;
  }
  
+/* Helper loongarch_legitimize_address.  Given X, return true if it
+   is a left shift by 1, 2 or 3 positions or a multiply by 2, 4 or 8.
+
+   This respectively represent canonical shift-add rtxs or scaled
+   memory addresses.  */
+static bool
+mem_shadd_or_shadd_rtx_p (rtx x)
+{
+  return ((GET_CODE (x) == ASHIFT
+  || GET_CODE (x) == MULT)
+ && CONST_INT_P (XEXP (x, 1))
+ && ((GET_CODE (x) == ASHIFT && IN_RANGE (INTVAL (XEXP (x, 1)), 1, 3))
+ || (GET_CODE (x) == MULT
+ && IN_RANGE (exact_log2 (INTVAL (XEXP (x, 1))), 1, 3))));
+}
+
  /* This function is used to implement LEGITIMIZE_ADDRESS.  If X can
 be legitimized in a way that the generic machinery might not expect,
 return a new address, otherwise return NULL.  MODE is the mode of
@@ -3242,6 +3258,33 @@ loongarch_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
loongarch_split_plus (x, &base, &offset);
if (offset != 0)
  {
+  /* Handle (plus (plus (mult (a) (mem_shadd_constant)) (fp)) (C)) case.  */
+  if (GET_CODE (base) == PLUS && mem_shadd_or_shadd_rtx_p (XEXP (base, 0))
+ && IMM12_OPERAND (offset))
+   {
+ rtx index = XEXP (base, 0);
+ rtx fp = XEXP (base, 1);
+
+ if (REG_P (fp) && REGNO (fp) == VIRTUAL_STACK_VARS_REGNUM)
+   {
+ /* If we were given a MULT, we must fix the constant
+as we're going to create the ASHIFT form.  */
+ int shift_val = INTVAL (XEXP (index, 1));
+ if (GET_CODE (index) == MULT)
+   shift_val = exact_log2 (shift_val);
+
+ rtx reg1 = gen_reg_rtx (Pmode);
+ rtx reg3 = gen_reg_rtx (Pmode);
+ loongarch_emit_binary (PLUS, reg1, fp, GEN_INT (offset));
+ loongarch_emit_binary (PLUS, reg3,
+gen_rtx_ASHIFT (Pmode, XEXP (index, 0),
+GEN_INT (shift_val)),
+reg1);
+
+ return reg3;
+   }
+ 

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-02-01 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #8 from chenglulu  ---
(In reply to Xi Ruoyao from comment #7)
> Any update? :)

Well, I haven't run it yet. Since this does not have a big impact on the spec
score, I am currently testing it on a single-channel machine, so the test time
will be longer.
I will reply here as soon as the results are available.

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-28 Thread chenglulu



On 2024/1/27 at 10:03 PM, chenglulu wrote:


On 2024/1/27 at 7:11 PM, Xi Ruoyao wrote:

On Sat, 2024-01-27 at 18:02 +0800, Xi Ruoyao wrote:

On Sat, 2024-01-27 at 11:15 +0800, chenglulu wrote:

On 2024/1/26 at 6:57 PM, Xi Ruoyao wrote:

On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote:

On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote:

On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:

v3 -> v4:
    1. Add macro support for TLS symbols
    2. Added support for loading __get_tls_addr symbol address using call36.
    3. Merge template got_load_tls_{ld/gd/le/ie}.
    4. Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.

I've rebased and attached the patch to fix the bad split in -mexplicit-
relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
tested it seriously though (only tested the added and modified test
cases).


OK, I'll test the spec for correctness.

I suppose this still won't work yet because Binutils is not fully fixed.
GAS has been changed not to emit R_LARCH_RELAX for "la.tls.ie a0, t0,
foo", but ld is still not checking if an R_LARCH_RELAX is after
R_LARCH_TLS_IE_PC_{HI20,LO12} properly.  Thus an invalid "partial" TLS
transition can still happen.


The following situations are not handled in the patch:

diff --git a/gcc/config/loongarch/loongarch.cc
b/gcc/config/loongarch/loongarch.cc

index 3fab4b64453..6336a9f696f 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7472,7 +7472,13 @@ loongarch_output_mi_thunk (FILE *file, tree
thunk_fndecl ATTRIBUTE_UNUSED,
   {
 if (TARGET_CMODEL_EXTREME)
  {
- emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
+ if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
+   {
+ emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
+ emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
+   }
+ else
+   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));

It looks like this part is unreachable: with -mcmodel=extreme
use_sibcall_p will never be true.

So cleaned up this part and fixed an ERROR in the added test:

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc

index 3a97ba61362..7b8c85a1606 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7481,21 +7481,24 @@ loongarch_output_mi_thunk (FILE *file, tree 
thunk_fndecl ATTRIBUTE_UNUSED,

   allowed, otherwise load the address into a register first.  */
    if (use_sibcall_p)
  {
-  if (TARGET_CMODEL_EXTREME)
-    {
-  emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
-  insn = emit_call_insn (gen_sibcall_internal (temp1, const0_rtx));
-    }
-  else
-    insn = emit_call_insn (gen_sibcall_internal (fnaddr, const0_rtx));
+  /* If TARGET_CMODEL_EXTREME, we cannot do a direct jump at all
+ and const_call_insn_operand should have returned false. */
+  gcc_assert (!TARGET_CMODEL_EXTREME);
+
+  insn = emit_call_insn (gen_sibcall_internal (fnaddr, const0_rtx));
    SIBLING_CALL_P (insn) = 1;
  }
    else
  {
-  if (TARGET_CMODEL_EXTREME)
+  if (!TARGET_CMODEL_EXTREME)
+    loongarch_emit_move (temp1, fnaddr);
+  else if (la_opt_explicit_relocs == EXPLICIT_RELOCS_NONE)
  emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
    else
-    loongarch_emit_move (temp1, fnaddr);
+    {
+  emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
+  emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
+    }
      emit_jump_insn (gen_indirect_jump (temp1));
  }
diff --git 
a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c 
b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c 


index 27baf4886d6..35bd4570a9e 100644
--- 
a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
+++ 
b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c

@@ -1,5 +1,5 @@
  /* { dg-do compile } */
  /* { dg-options "-O2 -fPIC -mexplicit-relocs=auto -mcmodel=extreme -fno-plt" } */
-/* { dg-final { scan-assembler-not "la.tls.[lg]d" { target tls_native } } } */
+/* { dg-final { scan-assembler-not "la.tls.\[lg\]d" { target tls_native } } } */

    #include "./explicit-relocs-auto-tls-ld-gd.c"

And added 3 tests for output_mi_thunk.  The updated patch attached, now
running regression test.



@@ -2870,20 +2872,30 @@ loongarch_call_tls_get_addr (rtx sym, enum 
loongarch_symbol_type type, rtx v0)

    {
  if (loongarch_explicit_relocs_p (SYMBOL_GOT_DISP))
    {
- rtx tmp1 = gen_reg_rtx (Pmode);
- rtx high = gen_reg_rtx (Pmode);
+ gcc_assert (la_opt_explicit_relocs !=
+ EXPLICIT_RELOCS_NONE);

This operator is written at the end of the line, an

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-27 Thread chenglulu



On 2024/1/27 at 7:11 PM, Xi Ruoyao wrote:

On Sat, 2024-01-27 at 18:02 +0800, Xi Ruoyao wrote:

On Sat, 2024-01-27 at 11:15 +0800, chenglulu wrote:

On 2024/1/26 at 6:57 PM, Xi Ruoyao wrote:

On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote:

On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote:

On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:

v3 -> v4:
     1. Add macro support for TLS symbols
     2. Added support for loading __get_tls_addr symbol address using call36.
     3. Merge template got_load_tls_{ld/gd/le/ie}.
     4. Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.

I've rebased and attached the patch to fix the bad split in -mexplicit-
relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
tested it seriously though (only tested the added and modified test
cases).


OK, I'll test the spec for correctness.

I suppose this still won't work yet because Binutils is not fully fixed.
GAS has been changed not to emit R_LARCH_RELAX for "la.tls.ie a0, t0,
foo", but ld is still not checking if an R_LARCH_RELAX is after
R_LARCH_TLS_IE_PC_{HI20,LO12} properly.  Thus an invalid "partial" TLS
transition can still happen.


The following situations are not handled in the patch:

diff --git a/gcc/config/loongarch/loongarch.cc
b/gcc/config/loongarch/loongarch.cc

index 3fab4b64453..6336a9f696f 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7472,7 +7472,13 @@ loongarch_output_mi_thunk (FILE *file, tree
thunk_fndecl ATTRIBUTE_UNUSED,
   {
     if (TARGET_CMODEL_EXTREME)
  {
- emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
+ if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
+   {
+ emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
+ emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
+   }
+ else
+   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));

It looks like this part is unreachable: with -mcmodel=extreme
use_sibcall_p will never be true.

So cleaned up this part and fixed an ERROR in the added test:

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 3a97ba61362..7b8c85a1606 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7481,21 +7481,24 @@ loongarch_output_mi_thunk (FILE *file, tree 
thunk_fndecl ATTRIBUTE_UNUSED,
   allowed, otherwise load the address into a register first.  */
if (use_sibcall_p)
  {
-  if (TARGET_CMODEL_EXTREME)
-   {
- emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
- insn = emit_call_insn (gen_sibcall_internal (temp1, const0_rtx));
-   }
-  else
-   insn = emit_call_insn (gen_sibcall_internal (fnaddr, const0_rtx));
+  /* If TARGET_CMODEL_EXTREME, we cannot do a direct jump at all
+and const_call_insn_operand should have returned false.  */
+  gcc_assert (!TARGET_CMODEL_EXTREME);
+
+  insn = emit_call_insn (gen_sibcall_internal (fnaddr, const0_rtx));
SIBLING_CALL_P (insn) = 1;
  }
else
  {
-  if (TARGET_CMODEL_EXTREME)
+  if (!TARGET_CMODEL_EXTREME)
+   loongarch_emit_move (temp1, fnaddr);
+  else if (la_opt_explicit_relocs == EXPLICIT_RELOCS_NONE)
emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
else
-   loongarch_emit_move (temp1, fnaddr);
+   {
+ emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
+ emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
+   }
  
emit_jump_insn (gen_indirect_jump (temp1));

  }
diff --git 
a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c 
b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
index 27baf4886d6..35bd4570a9e 100644
--- 
a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
+++ 
b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
@@ -1,5 +1,5 @@
  /* { dg-do compile } */
  /* { dg-options "-O2 -fPIC -mexplicit-relocs=auto -mcmodel=extreme -fno-plt" } */
-/* { dg-final { scan-assembler-not "la.tls.[lg]d" { target tls_native } } } */
+/* { dg-final { scan-assembler-not "la.tls.\[lg\]d" { target tls_native } } } */
  
  #include "./explicit-relocs-auto-tls-ld-gd.c"


And added 3 tests for output_mi_thunk.  The updated patch attached, now
running regression test.



@@ -2870,20 +2872,30 @@ loongarch_call_tls_get_addr (rtx sym, enum 
loongarch_symbol_type type, rtx v0)

    {
  if (loongarch_explicit_relocs_p (SYMBOL_GOT_DISP))
    {
- rtx tmp1 = gen_reg_rtx (Pmode);
- rtx high = gen_reg_rtx (Pmode);
+ gcc_assert (la_opt_explicit_relocs !=
+ EXPLICIT_RELOCS_NONE);

This operator is written at the end of the line, and I thi

[Bug c/113626] New: The r14-8450 commit causes the loongarch [x]vfcmp-{d/f}.c test case to fail

2024-01-26 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113626

Bug ID: 113626
   Summary: The r14-8450 commit causes the loongarch
[x]vfcmp-{d/f}.c test case to fail
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chenglulu at loongson dot cn
  Target Milestone: ---

The r14-8450 commit causes the loongarch [x]vfcmp-{d/f}.c test case to fail

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-26 Thread chenglulu



On 2024/1/26 at 6:57 PM, Xi Ruoyao wrote:

On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote:

On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote:

On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:

v3 -> v4:
    1. Add macro support for TLS symbols
    2. Added support for loading __get_tls_addr symbol address using call36.
    3. Merge template got_load_tls_{ld/gd/le/ie}.
    4. Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.

I've rebased and attached the patch to fix the bad split in -mexplicit-
relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
tested it seriously though (only tested the added and modified test
cases).


OK, I'll test the spec for correctness.

I suppose this still won't work yet because Binutils is not fully fixed.
GAS has been changed not to emit R_LARCH_RELAX for "la.tls.ie a0, t0,
foo", but ld is still not checking if an R_LARCH_RELAX is after
R_LARCH_TLS_IE_PC_{HI20,LO12} properly.  Thus an invalid "partial" TLS
transition can still happen.



The following situations are not handled in the patch:

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc


index 3fab4b64453..6336a9f696f 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7472,7 +7472,13 @@ loongarch_output_mi_thunk (FILE *file, tree 
thunk_fndecl ATTRIBUTE_UNUSED,

 {
   if (TARGET_CMODEL_EXTREME)
    {
- emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
+ if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
+   {
+ emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
+ emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
+   }
+ else
+   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
  insn = emit_call_insn (gen_sibcall_internal (temp1, const0_rtx));
    }
   else
@@ -7482,7 +7488,15 @@ loongarch_output_mi_thunk (FILE *file, tree 
thunk_fndecl ATTRIBUTE_UNUSED,

   else
 {
   if (TARGET_CMODEL_EXTREME)
-   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
+   {
+ if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
+   {
+ emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
+ emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
+   }
+ else
+   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
+   }
   else
    loongarch_emit_move (temp1, fnaddr);




Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-26 Thread chenglulu



On 2024/1/26 at 6:57 PM, Xi Ruoyao wrote:

On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote:

On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote:

On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:

v3 -> v4:
    1. Add macro support for TLS symbols
    2. Added support for loading __get_tls_addr symbol address using call36.
    3. Merge template got_load_tls_{ld/gd/le/ie}.
    4. Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.

I've rebased and attached the patch to fix the bad split in -mexplicit-
relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
tested it seriously though (only tested the added and modified test
cases).


OK, I'll test the spec for correctness.

I suppose this still won't work yet because Binutils is not fully fixed.
GAS has been changed not to emit R_LARCH_RELAX for "la.tls.ie a0, t0,
foo", but ld is still not checking if an R_LARCH_RELAX is after
R_LARCH_TLS_IE_PC_{HI20,LO12} properly.  Thus an invalid "partial" TLS
transition can still happen.

I temporarily modified my binutils to turn this feature off. ;-)







Re: [PATCH v4 1/4] LoongArch: Merge template got_load_tls_{ld/gd/le/ie}.

2024-01-26 Thread chenglulu



On 2024/1/26 at 4:59 PM, chenglulu wrote:


On 2024/1/26 at 4:52 PM, Xi Ruoyao wrote:

On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:


+(define_insn "@load_tls"
    [(set (match_operand:P 0 "register_operand" "=r")
  (unspec:P
      [(match_operand:P 1 "symbolic_operand" "")]
-        UNSPEC_TLS_GD))]
+        UNSPEC_TLS))]

/* snip */


+{
+  enum loongarch_symbol_type symbol_type;
+  gcc_assert (loongarch_symbolic_constant_p (operands[1],
&symbol_type));

/* snip */


+  switch (symbol_type)
+    {
+    case SYMBOL_TLS_LE:
+  return "la.tls.le\t%0,%1";
+    case SYMBOL_TLS_IE:
+  return "la.tls.ie\t%0,%1";
+    case SYMBOL_TLSLDM:
+  return "la.tls.ld\t%0,%1";
+    case SYMBOL_TLSGD:
+  return "la.tls.gd\t%0,%1";

/* snip */


+    default:
+  gcc_unreachable ();
+    }
+}
+  [(set_attr "mode" "")
+   (set_attr "length" "2")])
When the symbol type is TLS LE and -mcmodel=extreme, 4 instructions are
generated here, so I will adjust this as well.



Should be 8, it's in bytes.


Um, sorry, I meant to use insn_count.
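
The point of the exchange above is a unit mismatch: a length in bytes versus a count of instructions. As a small hedged sketch (the helper name `length_from_insn_count` and the constant `INSN_BYTES` are made up for illustration and are not part of the patch), the conversion relies only on the fact that every LoongArch instruction is 4 bytes wide:

```c
#include <assert.h>

/* Every LoongArch instruction is a fixed 4 bytes wide.  */
enum { INSN_BYTES = 4 };

/* Convert an instruction count into a length in bytes.  */
static int
length_from_insn_count (int insn_count)
{
  assert (insn_count > 0);
  return insn_count * INSN_BYTES;
}
```

Under that assumption a two-instruction sequence has length 8, and the four-instruction TLS LE sequence mentioned for -mcmodel=extreme has length 16.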




Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-26 Thread chenglulu



On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote:

On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:

v3 -> v4:
   1. Add macro support for TLS symbols
   2. Added support for loading __get_tls_addr symbol address using call36.
   3. Merge template got_load_tls_{ld/gd/le/ie}.
   4. Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.

I've rebased and attached the patch to fix the bad split in -mexplicit-
relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
tested it seriously though (only tested the added and modified test
cases).


OK, I'll test the spec for correctness.



Re: [PATCH v4 1/4] LoongArch: Merge template got_load_tls_{ld/gd/le/ie}.

2024-01-26 Thread chenglulu



On 2024/1/26 at 4:52 PM, Xi Ruoyao wrote:

On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:


+(define_insn "@load_tls"
    [(set (match_operand:P 0 "register_operand" "=r")
    (unspec:P
        [(match_operand:P 1 "symbolic_operand" "")]
-       UNSPEC_TLS_GD))]
+       UNSPEC_TLS))]

/* snip */


+{
+  enum loongarch_symbol_type symbol_type;
+  gcc_assert (loongarch_symbolic_constant_p (operands[1],
&symbol_type));

/* snip */


+  switch (symbol_type)
+    {
+    case SYMBOL_TLS_LE:
+  return "la.tls.le\t%0,%1";
+    case SYMBOL_TLS_IE:
+  return "la.tls.ie\t%0,%1";
+    case SYMBOL_TLSLDM:
+  return "la.tls.ld\t%0,%1";
+    case SYMBOL_TLSGD:
+  return "la.tls.gd\t%0,%1";

/* snip */


+    default:
+  gcc_unreachable ();
+    }
+}
+  [(set_attr "mode" "")
+   (set_attr "length" "2")])

Should be 8, it's in bytes.


Um, sorry, I meant to use insn_count.



Re: [pushed][PATCH] LoongArch: Split vec_selects of bottom elements into simple move

2024-01-26 Thread chenglulu

Pushed to r14-8447.

On 2024/1/16 at 10:23 AM, Jiahao Xu wrote:

The pattern below can be treated as a simple move, because floating-point
and vector registers share the same physical registers on loongarch64.

(set (reg/v:SF 32 $f0 [orig:93 res ] [93])
   (vec_select:SF (reg:V8SF 32 $f0 [115])
   (parallel [
   (const_int 0 [0])
   ])))

gcc/ChangeLog:

* config/loongarch/lasx.md (vec_extract<mode>_0):
New define_insn_and_split pattern.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-extract.c: New test.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 72f7161311c..90f66ee4d24 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -761,6 +761,21 @@ (define_expand "vec_extract"
DONE;
  })
  
+(define_insn_and_split "vec_extract<mode>_0"
+  [(set (match_operand:<UNITMODE> 0 "register_operand" "=f")
+(vec_select:<UNITMODE>
+  (match_operand:FLASX 1 "register_operand" "f")
+  (parallel [(const_int 0)])))]
+  "ISA_HAS_LSX"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 1))]
+{
+  operands[1] = gen_rtx_REG (<UNITMODE>mode, REGNO (operands[1]));
+}
+  [(set_attr "move_type" "fmove")
+   (set_attr "mode" "<UNITMODE>")])
+
  (define_expand "vec_perm"
   [(match_operand:LASX 0 "register_operand")
(match_operand:LASX 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-extract.c 
b/gcc/testsuite/gcc.target/loongarch/vect-extract.c
new file mode 100644
index 000..ce126e3a4f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-extract.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -mlasx -fno-vect-cost-model -fno-unroll-loops" } */
+/* { dg-final { scan-assembler-not "xvpickve.w" } } */
+/* { dg-final { scan-assembler-not "xvpickve.d" } } */
+
+float
+sum_float (float *a, int n) {
+  float res = 0.0;
+  for (int i = 0; i < n; i++)
+res += a[i];
+  return res;
+}
+
+double
+sum_double (double *a, int n) {
+  double res = 0.0;
+  for (int i = 0; i < n; i++)
+res += a[i];
+  return res;
+}
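
The reason a vec_select of element 0 can become a plain move is that the scalar value occupies the lowest bits of the same register. The following C sketch models that with ordinary memory instead of registers; `struct v8sf`, `extract_lane` and `scalar_view` are hypothetical names used only for this illustration, and the struct merely stands in for a 256-bit register holding eight floats.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a 256-bit vector register holding eight floats.  */
struct v8sf
{
  float lane[8];
};

/* Generic element extraction, comparable to a vec_select of lane I.  */
static float
extract_lane (const struct v8sf *v, unsigned i)
{
  assert (i < 8);
  return v->lane[i];
}

/* Read the low 32 bits of the vector as a scalar, comparable to viewing
   the same register as a floating-point register.  */
static float
scalar_view (const struct v8sf *v)
{
  float low;
  memcpy (&low, v, sizeof low);
  return low;
}
```

For lane 0 the two helpers return the same value, which is why no dedicated extract instruction is needed in that case; for any other lane only `extract_lane` gives the right answer.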




Re: [pushed][PATCH v1] LoongArch: Optimize implementation of single-precision floating-point approximate division.

2024-01-26 Thread chenglulu

Pushed to r14-8444.

On 2024/1/24 at 5:44 PM, Li Wei wrote:

We found that in the SPEC CPU 2017 521.wrf program, some loop-invariant code
generated from the single-precision floating-point approximate division
sequence was not hoisted out of the loop. This is because the pseudo-register
that stores the intermediate result is overwritten in the implementation of
single-precision floating-point approximate division, so the loop2_invariant
pass fails to hoist the invariants. To fix this, the intermediate results are
stored in new pseudo-registers without breaking the read-write dependencies,
so that the loop2_invariant pass can recognize them as loop invariants.
After this optimization, the number of instructions executed by 521.wrf is
reduced by 0.18% (1716612948501 -> 1713471771364).
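
As a rough illustration of the sequence being discussed, the sketch below performs one refinement step of a/b in plain C, writing each intermediate result to its own variable. It is an assumption-laden model, not the RTL the compiler emits: `approx_div` is a hypothetical name, `x0` is passed in as a stand-in for the hardware reciprocal estimate, and the real code computes `e0` with a fused multiply-add.

```c
#include <assert.h>

/* One refinement step of a / b, given a rough estimate X0 of 1 / b.
   Each intermediate value gets its own variable instead of reusing x0,
   so x0 and e0 depend only on b.  */
static float
approx_div (float a, float b, float x0)
{
  assert (b != 0.0f);
  float e0 = 2.0f - b * x0;  /* Correction factor, depends on b only.  */
  float e1 = a * x0;         /* Separate temporary; x0 is left intact.  */
  return e0 * e1;            /* Approximately a / b.  */
}
```

Since `x0` is never overwritten, everything derived only from `b` stays constant across iterations when `b` itself is loop-invariant, which is the property the commit message relies on.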

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_emit_swdivsf): Adjust.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/invariant-recip.c: New test.
---
  gcc/config/loongarch/loongarch.cc | 19 +++
  .../gcc.target/loongarch/invariant-recip.c| 33 +++
  2 files changed, 46 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/invariant-recip.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 32a0b6f43e8..1b88147fd8c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -10894,16 +10894,23 @@ void loongarch_emit_swdivsf (rtx res, rtx a, rtx b, machine_mode mode)
/* x0 = 1./b estimate.  */
emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
  unspec)));
-  /* 2.0 - b * x0  */
+  /* e0 = 2.0 - b * x0.  */
emit_insn (gen_rtx_SET (e0, gen_rtx_FMA (mode,
   gen_rtx_NEG (mode, b), x0, mtwo)));
  
-  /* x0 = a * x0  */

if (a != CONST1_RTX (mode))
-emit_insn (gen_rtx_SET (x0, gen_rtx_MULT (mode, a, x0)));
-
-  /* res = e0 * x0  */
-  emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, x0)));
+{
+  rtx e1 = gen_reg_rtx (mode);
+  /* e1 = a * x0.  */
+  emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, a, x0)));
+  /* res = e0 * e1.  */
+  emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, e1)));
+}
+  else
+{
+  /* res = e0 * x0.  */
+  emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, x0)));
+}
  }
  
  static bool

diff --git a/gcc/testsuite/gcc.target/loongarch/invariant-recip.c b/gcc/testsuite/gcc.target/loongarch/invariant-recip.c
new file mode 100644
index 000..2f64f6ed5e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/invariant-recip.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=loongarch64 -mabi=lp64d -mrecip -mfrecipe -fdump-rtl-loop2_invariant " } */
+/* { dg-final { scan-rtl-dump "Decided to move dependent invariant" "loop2_invariant" } } */
+
+void
+nislfv_rain_plm (int im, int km, float dzl[im][km], float rql[im][km],
+ float dt)
+{
+  int i, k;
+  float con1, decfl;
+  float dz[km], qn[km], wi[km + 1];
+
+  for (i = 0; i < im; i++)
+{
+  for (k = 0; k < km; k++)
+{
+  dz[k] = dzl[i][k];
+}
+  con1 = 0.05;
+  for (k = km - 1; k >= 0; k--)
+{
+  decfl = (wi[k + 1] - wi[k]) * dt / dz[k];
+  if (decfl > con1)
+{
+  wi[k] = wi[k + 1] - con1 * dz[k] / dt;
+}
+}
+  for (k = 0; k < km; k++)
+{
+  rql[i][k] = qn[k];
+}
+}
+}




Re: [pushed][PATCH v3] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-26 Thread chenglulu



On 2024/1/26 3:32 PM, Richard Biener wrote:

On Fri, Jan 26, 2024 at 7:23 AM chenxiaolong  wrote:

gcc/testsuite/ChangeLog:

OK


Pushed to r14-8445.

Thank you everyone for your review!




 * gcc.dg/signbit-2.c: Added additional "-mlsx" compilation options.
 * gfortran.dg/graphite/vect-pr40979.f90: Ditto.
 * gfortran.dg/vect/fast-math-mgrid-resid.f: Ditto.
---
  gcc/testsuite/gcc.dg/signbit-2.c   | 1 +
  gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90| 1 +
  gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f | 1 +
  3 files changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
index 62bb4047d74..5511bb78149 100644
--- a/gcc/testsuite/gcc.dg/signbit-2.c
+++ b/gcc/testsuite/gcc.dg/signbit-2.c
@@ -5,6 +5,7 @@
  /* { dg-additional-options "-msse2 -mno-avx512f" { target { i?86-*-* x86_64-*-* } } } */
  /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
  /* { dg-additional-options "-maltivec" { target powerpc_altivec_ok } } */
+/* { dg-additional-options "-mlsx" { target loongarch_sx } } */
  /* { dg-skip-if "no fallback for MVE" { arm_mve } } */

  #include 
diff --git a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90 b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
index a42290948c4..6f2ad1166a4 100644
--- a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
+++ b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
@@ -1,6 +1,7 @@
  ! { dg-do compile }
  ! { dg-require-effective-target vect_double }
  ! { dg-additional-options "-msse2" { target { { i?86-*-* x86_64-*-* } && ilp32 } } }
+! { dg-additional-options "-mlsx" { target { loongarch*-*-* } } }

  module mqc_m
  integer, parameter, private :: longreal = selected_real_kind(15,90)
diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
index 08965cc5e20..97b88821731 100644
--- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
+++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
@@ -2,6 +2,7 @@
  ! { dg-require-effective-target vect_double }
  ! { dg-options "-O3 --param vect-max-peeling-for-alignment=0 -fpredictive-commoning -fdump-tree-pcom-details -std=legacy" }
  ! { dg-additional-options "-mprefer-avx128" { target { i?86-*-* x86_64-*-* } } }
+! { dg-additional-options "-mlsx" { target { loongarch*-*-* } } }
  ! { dg-additional-options "-mzarch" { target { s390*-*-* } } }

  *** RESID COMPUTES THE RESIDUAL:  R = V - AU
--
2.20.1





Re:[pushed] [PATCH v3] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT

2024-01-26 Thread chenglulu

Pushed to r14-8446.

On 2024/1/16 10:32 AM, Jiahao Xu wrote:

Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that a short-circuit condition is
compiled with short-circuit branches instead of being converted to
non-short-circuit logical operations.

SPEC CPU 2017 evaluation shows a 1% improvement in the fprate geometric mean
and no obvious regression elsewhere. Notably, 526.blender_r improves by 10.6%
on the 3A6000.
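
As a rough illustration of the two forms this macro chooses between, here is a small C sketch. The function names are made up for this example. When LOGICAL_OP_NON_SHORT_CIRCUIT is non-zero, GCC may rewrite a condition like the first function into something resembling the second when the operands are simple and free of side effects; defining it as 0 keeps the first form:

```c
#include <stdbool.h>

/* Short-circuit form: each comparison is its own conditional branch, and
   the second comparison is skipped once the result is known.  */
static bool
disjoint_short_circuit (float lo1, float hi1, float lo2, float hi2)
{
  if (lo1 > hi2 || hi1 < lo2)
    return true;
  return false;
}

/* Non-short-circuit form: both comparisons are always evaluated and their
   results are combined with a bitwise OR before a single test.  */
static bool
disjoint_non_short_circuit (float lo1, float hi1, float lo2, float hi2)
{
  if ((lo1 > hi2) | (hi1 < lo2))
    return true;
  return false;
}
```

Both functions return the same value for every input; only the shape of the generated control flow differs, which is what the new short-circuit.c test checks by counting "if" statements in the gimple dump.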

gcc/ChangeLog:

* config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Define.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/short-circuit.c: New test.

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 4e6ede926d3..8b453ab3140 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -869,6 +869,7 @@ typedef struct {
 1 is the default; other values are interpreted relative to that.  */
  
  #define BRANCH_COST(speed_p, predictable_p) la_branch_cost

+#define LOGICAL_OP_NON_SHORT_CIRCUIT 0
  
  /* Return the asm template for a conditional branch instruction.

 OPCODE is the opcode's mnemonic and OPERANDS is the asm template for
diff --git a/gcc/testsuite/gcc.target/loongarch/short-circuit.c b/gcc/testsuite/gcc.target/loongarch/short-circuit.c
new file mode 100644
index 000..bed585ee172
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/short-circuit.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-gimple" } */
+
+int
+short_circuit (float *a)
+{
+  float t1x = a[0];
+  float t2x = a[1];
+  float t1y = a[2];
+  float t2y = a[3];
+  float t1z = a[4];
+  float t2z = a[5];
+
+  if (t1x > t2y  || t2x < t1y  || t1x > t2z || t2x < t1z || t1y > t2z || t2y < t1z)
+return 0;
+
+  return 1;
+}
+/* { dg-final { scan-tree-dump-times "if" 6 "gimple" } } */




Re: [PATCH v1] LoongArch: Adjust cost of vector_stmt that match multiply-add pattern.

2024-01-26 Thread chenglulu



On 2024/1/24 5:36 PM, Li Wei wrote:

We found that when only 128-bit vectorization is enabled, 549.fotonik3d_r
is not vectorized effectively. To address this, we adjust the cost of a
128-bit vector_stmt that matches the multiply-add pattern, to favor 128-bit
vectorization.
Experimental results show that this change improves 549.fotonik3d_r
performance by 9.77% under the 128-bit vectorization option.
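
The idea can be sketched on a toy expression representation. Everything in this block (the `toy_stmt` type, the enum, the function names) is hypothetical and only mimics the shape of the check in the patch; it does not use GCC's gimple or vectorizer types:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy statement: an operation code plus the statements that define its
   two operands (NULL for operands that are not defined by a statement).  */
enum toy_code { TOY_LEAF, TOY_PLUS, TOY_MINUS, TOY_MULT };

struct toy_stmt
{
  enum toy_code code;
  const struct toy_stmt *op1;
  const struct toy_stmt *op2;
};

/* True if OP is the result of a multiplication.  */
static bool
toy_is_mul_result (const struct toy_stmt *op)
{
  return op != NULL && op->code == TOY_MULT;
}

/* True if STMT is an addition or subtraction with at least one operand
   produced by a multiplication, i.e. a multiply-add candidate.  */
static bool
toy_multiply_add_p (const struct toy_stmt *stmt)
{
  if (stmt->code != TOY_PLUS && stmt->code != TOY_MINUS)
    return false;
  return toy_is_mul_result (stmt->op1) || toy_is_mul_result (stmt->op2);
}

/* Cost adjustment in the spirit of the patch: a 16-byte vector statement
   that is not a reduction and matches the pattern is costed as 0.  */
static int
toy_stmt_cost (const struct toy_stmt *stmt, int vector_bytes,
	       bool is_reduction, int base_cost)
{
  if (vector_bytes == 16 && !is_reduction && toy_multiply_add_p (stmt))
    return 0;
  return base_cost;
}
```

The add or subtract is treated as free because it is expected to be fused with the multiplication into a single multiply-add instruction.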

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_multiply_add_p): New.
(loongarch_vector_costs::add_stmt_cost): Adjust.

gcc/testsuite/ChangeLog:

* gfortran.dg/vect/vect-10.f90: New test.
---
  gcc/config/loongarch/loongarch.cc  | 42 +
  gcc/testsuite/gfortran.dg/vect/vect-10.f90 | 71 ++
  2 files changed, 113 insertions(+)
  create mode 100644 gcc/testsuite/gfortran.dg/vect/vect-10.f90

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 072c68d97e3..32a0b6f43e8 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4096,6 +4096,36 @@ loongarch_vector_costs::determine_suggested_unroll_factor (loop_vec_info loop_vi
return 1 << ceil_log2 (uf);
  }
  
+static bool

+loongarch_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info)
+{
+  gassign *assign = dyn_cast<gassign *> (stmt_info->stmt);
+  if (!assign)
+return false;
+  tree_code code = gimple_assign_rhs_code (assign);
+  if (code != PLUS_EXPR && code != MINUS_EXPR)
+return false;
+
+  auto is_mul_result = [&](int i)
+{
+  tree rhs = gimple_op (assign, i);
+  if (TREE_CODE (rhs) != SSA_NAME)
+   return false;
+
+  stmt_vec_info def_stmt_info = vinfo->lookup_def (rhs);
+  if (!def_stmt_info
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) != vect_internal_def)
+   return false;
+  gassign *rhs_assign = dyn_cast<gassign *> (def_stmt_info->stmt);
+  if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
+   return false;
+
+  return true;
+};
+
+  return is_mul_result (1) || is_mul_result (2);
+}
+
  unsigned
  loongarch_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
   stmt_vec_info stmt_info, slp_tree,
@@ -4108,6 +4138,18 @@ loongarch_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
  {
int stmt_cost = loongarch_builtin_vectorization_cost (kind, vectype,
misalign);
+  if (vectype && stmt_info)
+   {
+ gassign *assign = dyn_cast<gassign *> (STMT_VINFO_STMT (stmt_info));
+ machine_mode mode = TYPE_MODE (vectype);


Hi, Liwei:

I think the code here needs a comment.

Thanks.


+ if (kind == vector_stmt && GET_MODE_SIZE (mode) == 16 && assign)
+   {
+ if (!vect_is_reduction (stmt_info)
+ && loongarch_multiply_add_p (m_vinfo, stmt_info))
+   stmt_cost = 0;
+   }
+   }
+
retval = adjust_cost_for_freq (stmt_info, where, count * stmt_cost);
m_costs[where] += retval;
  




Re: [pushed][PATCH] Loongarch: Remove vec_concatz pattern

2024-01-24 Thread chenglulu

Pushed to r14-8414.

On 2024/1/24 5:19 PM, Jiahao Xu wrote:

It is incorrect to use vld/vori to implement vec_concatz, because when an LSX
instruction is used to update the value of a vector register, the upper 128
bits of the vector register are not zeroed.
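
A toy model of the problem, assuming a 256-bit LASX register viewed as four 64-bit lanes. The struct and function names are invented for this sketch. Leaving the upper lanes unchanged on a 128-bit write is an assumption of the model; the description above only states that they are not zeroed:

```c
#include <stdint.h>

/* Toy model of a 256-bit LASX register as four 64-bit lanes.  */
struct toy_xreg
{
  uint64_t lane[4];
};

/* Models a 128-bit LSX write such as vld or vori.b: only lanes 0-1 are
   updated.  Preserving lanes 2-3 is an assumption of this model.  */
static void
toy_lsx_write (struct toy_xreg *r, uint64_t lo0, uint64_t lo1)
{
  r->lane[0] = lo0;
  r->lane[1] = lo1;
}

/* What vec_concat (low, 0) has to produce: the low half in lanes 0-1
   and zeros in lanes 2-3.  */
static void
toy_concat_with_zero (struct toy_xreg *r, uint64_t lo0, uint64_t lo1)
{
  r->lane[0] = lo0;
  r->lane[1] = lo1;
  r->lane[2] = 0;
  r->lane[3] = 0;
}
```

If the register previously held non-zero data in its upper half, `toy_lsx_write` leaves that data in place, so it cannot stand in for `toy_concat_with_zero`.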

gcc/ChangeLog:

* config/loongarch/lasx.md (@vec_concatz): Remove this define_insn pattern.
* config/loongarch/loongarch.cc (loongarch_expand_vector_group_init): Use vec_concat.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 90f66ee4d24..e2115ffb884 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -582,21 +582,6 @@ (define_insn "lasx_xvinsgr2vr_"
[(set_attr "type" "simd_insert")
 (set_attr "mode" "")])
  
-(define_insn "@vec_concatz"

-  [(set (match_operand:LASX 0 "register_operand" "=f")
-(vec_concat:LASX
-  (match_operand: 1 "nonimmediate_operand")
-  (match_operand: 2 "const_0_operand")))]
-  "ISA_HAS_LASX"
-{
-  if (MEM_P (operands[1]))
-return "vld\t%w0,%1";
-  else
-return "vori.b\t%w0,%w1,0";
-}
-  [(set_attr "type" "simd_splat")
-   (set_attr "mode" "")])
-
  (define_insn "vec_concat"
[(set (match_operand:LASX 0 "register_operand" "=f")
(vec_concat:LASX
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 072c68d97e3..cd335827570 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -9917,17 +9917,12 @@ loongarch_expand_vector_group_init (rtx target, rtx vals)
gcc_unreachable ();
  }
  
-  if (high == CONST0_RTX (half_mode))

-emit_insn (gen_vec_concatz (vmode, target, low, high));
-  else
-{
-  if (!register_operand (low, half_mode))
-   low = force_reg (half_mode, low);
-  if (!register_operand (high, half_mode))
-   high = force_reg (half_mode, high);
-  emit_insn (gen_rtx_SET (target,
- gen_rtx_VEC_CONCAT (vmode, low, high)));
-}
+  if (!register_operand (low, half_mode))
+low = force_reg (half_mode, low);
+  if (!register_operand (high, half_mode))
+high = force_reg (half_mode, high);
+  emit_insn (gen_rtx_SET (target,
+ gen_rtx_VEC_CONCAT (vmode, low, high)));
  }
  
  /* Expand initialization of a vector which has all same elements.  */




Re: [PATCH] Loongarch: Remove vec_concatz pattern

2024-01-24 Thread chenglulu

Jiahao:

 Note that the 'A' in "LoongArch" in the title needs to be capitalized.

 I fixed this in the patch and have already merged it.






Re:[pushed] [PATCH] LoongArch: Disable TLS type symbols from generating non-zero offsets.

2024-01-24 Thread chenglulu

Pushed to r14-8412.

On 2024/1/23 11:54 AM, Lulu Cheng wrote:

TLS GD, LD and IE symbols generate corresponding GOT entries, so non-zero
offsets cannot be generated for them.
Addressing a TLS LE symbol plus an addend is not yet implemented in binutils,
so non-zero offsets are not generated for it either for the time being.
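
The resulting policy can be sketched as a small standalone function. The enum and function names below are hypothetical and only loosely mirror the symbol types in the switch statement of the patch. Treating a zero offset as always acceptable is an assumption of this sketch, not something stated in the description:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of some of the symbol kinds handled in the patch.  */
enum toy_sym_kind
{
  TOY_PCREL,
  TOY_PCREL64,
  TOY_GOT_DISP,
  TOY_TLS_IE,
  TOY_TLS_LE,
  TOY_TLS_GD,
  TOY_TLS_LDM
};

/* Return true if symbol + OFFSET may be used as a symbolic constant.  */
static bool
toy_offset_allowed_p (enum toy_sym_kind kind, int64_t offset)
{
  /* Assumption of this sketch: a plain symbol is always fine.  */
  if (offset == 0)
    return true;

  switch (kind)
    {
    case TOY_PCREL:
    case TOY_PCREL64:
      /* The assembler rejects offsets outside [-2^31, 2^31-1].  */
      return offset >= INT32_MIN && offset <= INT32_MAX;

    default:
      /* GOT-based and TLS symbols: no non-zero offsets.  */
      return false;
    }
}
```

With this policy an access such as `tls_var + 8` is expressed as the symbol address followed by a separate addition, instead of folding the 8 into the relocation.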

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_symbolic_constant_p):
For symbols of type tls, non-zero Offset is not generated.
---
  gcc/config/loongarch/loongarch.cc | 18 +-
  1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 82467474288..f2ce1f6906d 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1924,11 +1924,7 @@ loongarch_symbolic_constant_p (rtx x, enum loongarch_symbol_type *symbol_type)
x = UNSPEC_ADDRESS (x);
  }
else if (SYMBOL_REF_P (x) || LABEL_REF_P (x))
-{
-  *symbol_type = loongarch_classify_symbol (x);
-  if (*symbol_type == SYMBOL_TLS)
-   return true;
-}
+*symbol_type = loongarch_classify_symbol (x);
else
  return false;
  
@@ -1939,17 +1935,21 @@ loongarch_symbolic_constant_p (rtx x, enum loongarch_symbol_type *symbol_type)

   relocations.  */
switch (*symbol_type)
  {
-case SYMBOL_TLS_IE:
-case SYMBOL_TLS_LE:
-case SYMBOL_TLSGD:
-case SYMBOL_TLSLDM:
  case SYMBOL_PCREL:
  case SYMBOL_PCREL64:
/* GAS rejects offsets outside the range [-2^31, 2^31-1].  */
return sext_hwi (INTVAL (offset), 32) == INTVAL (offset);
  
+/* The following symbol types do not allow non-zero offsets.  */

  case SYMBOL_GOT_DISP:
+case SYMBOL_TLS_IE:
+case SYMBOL_TLSGD:
+case SYMBOL_TLSLDM:
  case SYMBOL_TLS:
+/* From an implementation perspective, tls_le symbols are allowed to
+   have non-zero offsets, but currently binutils has not added support,
+   so the generation of non-zero offsets is prohibited here.  */
+case SYMBOL_TLS_LE:
return false;
  }
gcc_unreachable ();




Re: [PATCH] LoongArch: Fix incorrect return type for frecipe/frsqrte intrinsic functions

2024-01-24 Thread chenglulu



On 2024/1/24 5:58 PM, Jiahao Xu wrote:


On 2024/1/24 5:48 PM, Xi Ruoyao wrote:

On Wed, 2024-01-24 at 17:19 +0800, Jiahao Xu wrote:

gcc/ChangeLog:

* config/loongarch/larchintrin.h
(__frecipe_s): Update function return type.
(__frecipe_d): Ditto.
(__frsqrte_s): Ditto.
(__frsqrte_d): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.

diff --git a/gcc/config/loongarch/larchintrin.h b/gcc/config/loongarch/larchintrin.h
index 7692415e04d..ff2c9f460ac 100644
--- a/gcc/config/loongarch/larchintrin.h
+++ b/gcc/config/loongarch/larchintrin.h
@@ -336,38 +336,38 @@ __iocsrwr_d (unsigned long int _1, unsigned int _2)

  #ifdef __loongarch_frecipe
  /* Assembly instruction format: fd, fj.  */
  /* Data types in instruction templates:  SF, SF.  */
-extern __inline void
+extern __inline float
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
  __frecipe_s (float _1)
  {
-  __builtin_loongarch_frecipe_s ((float) _1);
+  return (float) __builtin_loongarch_frecipe_s ((float) _1);

I don't think the (float) conversion is needed.


Indeed, this float conversion is unnecessary; I simply included it to
align with the definitions of other intrinsic functions.


These definitions are generated in batches, like the vector intrinsics,
which is why such redundant type conversions appear.


We will remove the redundant conversions later.
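
For reference, the bug being fixed can be reproduced in miniature on any target. `toy_frecipe_s` below is a hypothetical stand-in for `__builtin_loongarch_frecipe_s` that returns the exact reciprocal, and the wrapper names are made up for this sketch:

```c
/* Hypothetical stand-in for the builtin so the sketch compiles anywhere.  */
static float
toy_frecipe_s (float x)
{
  return 1.0f / x;
}

/* Before the fix: the wrapper is declared void, so the computed value is
   discarded and callers cannot use it.  */
static void
toy_wrapper_before (float x)
{
  (void) toy_frecipe_s (x);
}

/* After the fix: the wrapper returns the result.  The callee already
   returns float, so no extra (float) cast is needed.  */
static float
toy_wrapper_after (float x)
{
  return toy_frecipe_s (x);
}
```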




Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-24 Thread chenglulu



On 2024/1/24 3:36 AM, Xi Ruoyao wrote:

On Mon, 2024-01-22 at 15:27 +0800, chenglulu wrote:

The failure of this test case was because the compiler believes that two
(UNSPEC_PCREL_64_PART2 [(symbol)]) instances would always produce the
same result, but this isn't true because the result depends on PC.  Thus
(pc) needed to be included in the RTX, like:

    [(set (match_operand:DI 0 "register_operand" "=r")
  (unspec:DI [(match_operand:DI 2 "") (pc)]
UNSPEC_LA_PCREL_64_PART1))
     (set (match_operand:DI 1 "register_operand" "=r")
  (unspec:DI [(match_dup 2) (pc)] UNSPEC_LA_PCREL_64_PART2))]

With this the buggy REG_UNUSED notes were gone.  But it then prevented
the CSE when loading the address of __tls_get_addr (i.e. if we address
10 TLS LD symbols in a function it would emit 10 instances of "la.global
__tls_get_addr") so I added an REG_EQUAL note for it.  For symbols other
than __tls_get_addr such notes are added automatically by optimization
passes.

Updated patch attached.


I'm eliminating redundant la.global directives in my macro
implementation.

I will be testing this patch.





With this patch, spec2006 can pass the test, but spec2017 621 and 654
tests fail.
I haven't debugged the specific cause of the problem yet.

Try removing the TARGET_DELEGITIMIZE_ADDRESS hook?  After eating some
unhealthy food at midnight I realized the hook only
papers over the same issue that caused the spec2006 failure.  I tried a bootstrap
with BOOT_CFLAGS=-O2 -g -mcmodel=extreme and TARGET_DELEGITIMIZE_ADDRESS
commented out, and there is no more spurious "note: non-delegitimized
UNSPEC UNSPEC_LA_PCREL_64_PART1 (42) found in variable location" things.
I feel that this hook is still written in a buggy way, so maybe removing
it will solve the spec2017 issue.

I found the problem. Binutils did not consider the four instructions when
converting the type from TLS IE to TLS LE, which caused the conversion error.




Re: [PATCH] LoongArch: testsuite: Disable stack protector for got-load.C

2024-01-23 Thread chenglulu

LGTM!

Thanks!

On 2024/1/23 7:35 PM, Xi Ruoyao wrote:

When building GCC with --enable-default-ssp, the stack protector is
enabled for got-load.C, causing additional GOT loads for
__stack_chk_guard.  So mem/u will be matched more than 2 times and the
test will fail.

Disable stack protector to fix this issue.

gcc/testsuite:

* g++.target/loongarch/got-load.C (dg-options): Add
-fno-stack-protector.
---

Ok for trunk?

  gcc/testsuite/g++.target/loongarch/got-load.C | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.target/loongarch/got-load.C b/gcc/testsuite/g++.target/loongarch/got-load.C
index 20924c73942..17870176ab4 100644
--- a/gcc/testsuite/g++.target/loongarch/got-load.C
+++ b/gcc/testsuite/g++.target/loongarch/got-load.C
@@ -1,5 +1,5 @@
  /* { dg-do compile } */
-/* { dg-options "-mabi=lp64d -O2 -mexplicit-relocs -mcmodel=normal -fdump-rtl-expand" } */
+/* { dg-options "-mabi=lp64d -O2 -mexplicit-relocs -mcmodel=normal -fdump-rtl-expand -fno-stack-protector" } */
  /* { dg-final { scan-rtl-dump-times "mem/u" 2 "expand" } } */
  
  #include 




Re: Pushed: [PATCH v2] LoongArch: Disable explicit reloc for TLS LD/GD with -mexplicit-relocs=auto

2024-01-23 Thread chenglulu



On 2024/1/23 4:04 PM, Xi Ruoyao wrote:

On Tue, 2024-01-23 at 10:37 +0800, chenglulu wrote:

LGTM!

Thanks!

Pushed v2 as attached.  The only change is in the comment: Qinggang told
me TLS LE relaxation actually *requires* explicit relocs.


I think another reason is that we cannot properly use a macro to describe
TLS LE relaxation.




Re: [pushed][PATCH v1] LoongArch: doc:Combined with the content of target-supports.exp, add the attribute description related to LoongArch.

2024-01-22 Thread chenglulu

Pushed to r14-8344.

On 2024/1/17 9:24 AM, chenxiaolong wrote:

gcc/ChangeLog:

* doc/sourcebuild.texi: Add attributes for keywords.
---
  gcc/doc/sourcebuild.texi | 20 
  1 file changed, 20 insertions(+)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 8082100a3c9..6c33237ac78 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2352,6 +2352,26 @@ AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
  instruction.
  @end table
  
+@subsubsection LoongArch specific attributes

+
+@table @code
+@item loongarch_sx
+LoongArch target that generates instructions for SX.
+
+@item loongarch_asx
+LoongArch target that generates instructions for ASX.
+
+@item loongarch_sx_hw
+LoongArch target that is able to generate and execute SX code.
+
+@item loongarch_asx_hw
+LoongArch target that is able to generate and execute ASX code.
+
+@item loongarch_call36_support
+LoongArch binutils supports call36 relocation.
+
+@end table
+
  @subsubsection MIPS-specific attributes
  
  @table @code




Re: [PATCH] LoongArch: Disable explicit reloc for TLS LD/GD with -mexplicit-relocs=auto

2024-01-22 Thread chenglulu

LGTM!

Thanks!

On 2024/1/23 2:42 AM, Xi Ruoyao wrote:

Binutils 2.42 supports TLS LD/GD relaxation which requires the assembler
macro.
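
The decision after this patch can be sketched as follows. This is a simplified model restricted to the four TLS symbol kinds; the enum and function names are hypothetical, and the real function also handles non-TLS symbols that are omitted here:

```c
#include <stdbool.h>

/* The three settings of -mexplicit-relocs.  */
enum toy_explicit_relocs
{
  TOY_RELOCS_NONE,
  TOY_RELOCS_AUTO,
  TOY_RELOCS_ALWAYS
};

/* The TLS access models discussed in this thread.  */
enum toy_tls_kind
{
  TOY_TLS_IE,
  TOY_TLS_LE,
  TOY_TLS_GD,
  TOY_TLS_LDM
};

/* Return true if explicit relocations should be emitted for a TLS symbol
   of kind KIND, false if the assembler macro should be used instead.  */
static bool
toy_tls_explicit_relocs_p (enum toy_explicit_relocs opt,
			   enum toy_tls_kind kind)
{
  /* "none" and "always" do not depend on the symbol kind.  */
  if (opt != TOY_RELOCS_AUTO)
    return opt == TOY_RELOCS_ALWAYS;

  switch (kind)
    {
    case TOY_TLS_IE:
    case TOY_TLS_LE:
      return true;

    case TOY_TLS_GD:
    case TOY_TLS_LDM:
      /* Use la.tls.gd / la.tls.ld so the linker can relax them.  */
      return false;
    }
  return false;
}
```

This matches the updated test, which now expects la.tls.ld and la.tls.gd to appear in the assembly under -mexplicit-relocs=auto.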

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
If la_opt_explicit_relocs is EXPLICIT_RELOCS_AUTO, return false
for SYMBOL_TLS_LDM and SYMBOL_TLS_GD.
(loongarch_call_tls_get_addr): Do not split symbols of
SYMBOL_TLS_LDM or SYMBOL_TLS_GD if la_opt_explicit_relocs is
EXPLICIT_RELOCS_AUTO.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Check
for la.tls.ld and la.tls.gd.
---

Bootstrapped & regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch.cc| 9 -
  .../loongarch/explicit-relocs-auto-tls-ld-gd.c   | 3 ++-
  2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 82467474288..58df0b5637d 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1970,11 +1970,10 @@ loongarch_explicit_relocs_p (enum loongarch_symbol_type type)
  {
case SYMBOL_TLS_IE:
case SYMBOL_TLS_LE:
-  case SYMBOL_TLSGD:
-  case SYMBOL_TLSLDM:
case SYMBOL_PCREL64:
-   /* The linker don't know how to relax TLS accesses or 64-bit
-  pc-relative accesses.  */
+   /* TLS IE cannot be relaxed.  TLS LE relaxation does not require
+  using the assembly macro.  The linker does not relax 64-bit
+  pc-relative accesses as at now.  */
return true;
case SYMBOL_GOT_DISP:
/* The linker don't know how to relax GOT accesses in extreme
@@ -2789,7 +2788,7 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
  
start_sequence ();
  
-  if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)

+  if (la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS)
  {
/* Split tls symbol to high and low.  */
rtx high = gen_rtx_HIGH (Pmode, copy_rtx (loc));
diff --git a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
index 957ff98df62..ca55fcfc53e 100644
--- a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
+++ b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
@@ -6,4 +6,5 @@ extern __thread int b __attribute__((visibility("default")));
  
  int test() { return a + b; }
  
-/* { dg-final { scan-assembler-not "la.tls" { target tls_native } } } */

+/* { dg-final { scan-assembler "la\\.tls\\.ld" { target tls_native } } } */
+/* { dg-final { scan-assembler "la\\.tls\\.gd" { target tls_native } } } */



