[PATCH] LoongArch: Guard REGNO with REG_P in loongarch_expand_conditional_move [PR115169]

2024-05-22 Thread Xi Ruoyao
gcc/ChangeLog:

PR target/115169
* config/loongarch/loongarch.cc
(loongarch_expand_conditional_move): Guard REGNO with REG_P.
---

Bootstrapped with --enable-checking=all.  Ok for trunk and 14?
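
A note for reviewers less familiar with RTL: REGNO is only meaningful on a
REG rtx, and with RTL checking enabled it aborts on anything else (a
constant, a SUBREG, ...).  Below is a minimal standalone sketch of the guard,
not GCC code.  toy_rtx, toy_reg_p, toy_regno and same_reg_p are made-up
stand-ins for rtx, REG_P, REGNO and the comparison used in the patch, and
the assert only models the checking failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an rtx: a tagged object whose payload depends on
   its code.  This is an illustration only, not GCC's representation.  */
enum toy_code { TOY_REG, TOY_CONST_INT };

struct toy_rtx
{
  enum toy_code code;
  union
  {
    unsigned int regno;	/* Valid only when code == TOY_REG.  */
    long value;		/* Valid only when code == TOY_CONST_INT.  */
  } u;
};

/* Stand-in for REG_P.  */
bool
toy_reg_p (const struct toy_rtx *x)
{
  return x->code == TOY_REG;
}

/* Stand-in for REGNO under RTL checking: asking a non-register for
   its register number is a hard error.  */
unsigned int
toy_regno (const struct toy_rtx *x)
{
  assert (toy_reg_p (x));
  return x->u.regno;
}

/* The guarded comparison: both operands must be registers before
   their numbers are read.  */
bool
same_reg_p (const struct toy_rtx *a, const struct toy_rtx *b)
{
  return toy_reg_p (a) && toy_reg_p (b) && toy_regno (a) == toy_regno (b);
}
```

With the guard, comparing a register against a constant operand simply
yields false instead of tripping the check.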

 gcc/config/loongarch/loongarch.cc | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index e7835ae34ae..1b6df6a4365 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5344,6 +5344,7 @@ loongarch_expand_conditional_move (rtx *operands)
   rtx op1_extend = op1;
 
   /* Record whether operands[2] and operands[3] modes are promoted to 
word_mode.  */
+  bool promote_op[2] = {false, false};
   bool promote_p = false;
   machine_mode mode = GET_MODE (operands[0]);
 
@@ -5351,9 +5352,15 @@ loongarch_expand_conditional_move (rtx *operands)
 loongarch_emit_float_compare (&code, &op0, &op1);
   else
 {
-  if ((REGNO (op0) == REGNO (operands[2])
-  || (REGNO (op1) == REGNO (operands[3]) && (op1 != const0_rtx)))
- && (GET_MODE_SIZE (GET_MODE (op0)) < word_mode))
+  if (GET_MODE_SIZE (GET_MODE (op0)) < word_mode)
+   {
+ promote_op[0] = (REG_P (op0) && REG_P (operands[2]) &&
+  REGNO (op0) == REGNO (operands[2]));
+ promote_op[1] = (REG_P (op1) && REG_P (operands[3]) &&
+  REGNO (op1) == REGNO (operands[3]));
+   }
+
+  if (promote_op[0] || promote_op[1])
{
  mode = word_mode;
  promote_p = true;
@@ -5395,7 +5402,7 @@ loongarch_expand_conditional_move (rtx *operands)
 
   if (promote_p)
{
- if (REGNO (XEXP (operands[1], 0)) == REGNO (operands[2]))
+ if (promote_op[0])
op2 = op0_extend;
  else
{
@@ -5403,7 +5410,7 @@ loongarch_expand_conditional_move (rtx *operands)
  op2 = force_reg (mode, op2);
}
 
- if (REGNO (XEXP (operands[1], 1)) == REGNO (operands[3]))
+ if (promote_op[1])
op3 = op1_extend;
  else
{
-- 
2.45.1



Re: [COMMITTED] Revert "Revert: "Enable prange support.""

2024-05-16 Thread Xi Ruoyao
On Thu, 2024-05-16 at 12:14 +0200, Aldy Hernandez wrote:
> Wait, what's the preferred way of reverting a patch?  I followed what
> I saw in:
> 
> commit 04ee1f788ceaa4c7f777ff3b9441ae076191439c
> Author: Jeff Law 
> Date:   Mon May 13 21:42:38 2024 -0600
> 
>     Revert "[PATCH v2 1/3] RISC-V: movmem for RISCV with V extension"
> 
>     This reverts commit df15eb15b5f820321c81efc75f0af13ff8c0dd5b.

A revert is OK, but a revert of a revert is not.

https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651144.html

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Ping: [PATCH 0/2] Fix two test failures with --enable-default-pie [PR70150]

2024-05-15 Thread Xi Ruoyao
Ping.

On Mon, 2024-05-06 at 12:45 +0800, Xi Ruoyao wrote:
> In GCC 14.1-rc1, there are two new failures (compared to GCC 13) if
> the build is configured with --enable-default-pie.  Let's fix them.
> 
> Tested on x86_64-linux-gnu.  Ok for trunk and releases/gcc-14?
> 
> Xi Ruoyao (2):
>   i386: testsuite: Add -no-pie for pr113689-1.c [PR70150]
>   i386: testsuite: Adapt fentryname3.c for r14-811 change [PR70150]
> 
>  gcc/testsuite/gcc.target/i386/fentryname3.c | 3 +--
>  gcc/testsuite/gcc.target/i386/pr113689-1.c  | 2 +-
>  2 files changed, 2 insertions(+), 3 deletions(-)

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Pushed: [PATCH] driver: Move -fdiagnostics-urls= early like -fdiagnostics-color= [PR114980]

2024-05-09 Thread Xi Ruoyao
On Thu, 2024-05-09 at 20:21 +0000, Joseph Myers wrote:
> On Wed, 8 May 2024, Xi Ruoyao wrote:
> 
> > In GCC 14 we started to emit URLs for "command-line option  is
> > valid for  but not " and "-Werror= argument
> > '-Werror=' is not valid for " warnings.  So we should
> > have moved -fdiagnostics-urls= early like -fdiagnostics-color=, or
> > -fdiagnostics-urls= wouldn't be able to control URLs in these warnings.
> > 
> > No test cases are added because with TERM=xterm-256colors PR114980
> > already triggers some test failures.
> > 
> > gcc/ChangeLog:
> > 
> > PR driver/114980
> > * opts-common.cc (prune_options): Move -fdiagnostics-urls=
> > early like -fdiagnostics-color=.
> 
> OK.

Pushed r15-355 and r14-10192.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] driver: Move -fdiagnostics-urls= early like -fdiagnostics-color= [PR114980]

2024-05-07 Thread Xi Ruoyao
In GCC 14 we started to emit URLs for "command-line option  is
valid for  but not " and "-Werror= argument
'-Werror=' is not valid for " warnings.  So we should
have moved -fdiagnostics-urls= early like -fdiagnostics-color=, or
-fdiagnostics-urls= wouldn't be able to control URLs in these warnings.

No test cases are added because with TERM=xterm-256colors PR114980
already triggers some test failures.

gcc/ChangeLog:

PR driver/114980
* opts-common.cc (prune_options): Move -fdiagnostics-urls=
early like -fdiagnostics-color=.
---

Bootstrapped and regtested on x86_64-linux-gnu.  Ok for trunk and
releases/gcc-14?
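
To illustrate the intent for anyone not familiar with prune_options: every
occurrence of the option is dropped from its original position, and only
the last one is re-inserted right after argv[0], so it is already in effect
when diagnostics about the remaining options are emitted.  A minimal
standalone sketch of that idea on plain strings follows.  move_last_early
is a hypothetical helper for illustration, not the code in opts-common.cc,
which works on cl_decoded_option arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not GCC code: copy the N argv-style entries of
   IN to OUT, dropping every option that starts with PREFIX and
   re-inserting only the last such option right after in[0].  OUT must
   have room for N entries.  Returns the number of entries written.  */
size_t
move_last_early (const char **in, size_t n, const char *prefix,
		 const char **out)
{
  size_t plen = strlen (prefix);
  size_t last = 0;	/* 0 means "not seen"; index 0 is the program name.  */
  size_t count = 0;

  assert (n > 0);

  /* Remember the index of the last matching option.  */
  for (size_t i = 1; i < n; i++)
    if (strncmp (in[i], prefix, plen) == 0)
      last = i;

  /* Program name first, then the surviving occurrence, then the rest.  */
  out[count++] = in[0];
  if (last != 0)
    out[count++] = in[last];
  for (size_t i = 1; i < n; i++)
    if (strncmp (in[i], prefix, plen) != 0)
      out[count++] = in[i];

  return count;
}
```

For example, "gcc -O2 -fdiagnostics-urls=always -c -fdiagnostics-urls=never"
would be reordered to "gcc -fdiagnostics-urls=never -O2 -c" by this sketch.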

 gcc/opts-common.cc | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/gcc/opts-common.cc b/gcc/opts-common.cc
index 4a2dff243b0..2d1e86ff94f 100644
--- a/gcc/opts-common.cc
+++ b/gcc/opts-common.cc
@@ -1152,6 +1152,7 @@ prune_options (struct cl_decoded_option **decoded_options,
   unsigned int options_to_prepend = 0;
   unsigned int Wcomplain_wrong_lang_idx = 0;
   unsigned int fdiagnostics_color_idx = 0;
+  unsigned int fdiagnostics_urls_idx = 0;
 
   /* Remove arguments which are negated by others after them.  */
   new_decoded_options_count = 0;
@@ -1185,6 +1186,12 @@ prune_options (struct cl_decoded_option **decoded_options,
++options_to_prepend;
  fdiagnostics_color_idx = i;
  continue;
+   case OPT_fdiagnostics_urls_:
+ gcc_checking_assert (i != 0);
+ if (fdiagnostics_urls_idx == 0)
+   ++options_to_prepend;
+ fdiagnostics_urls_idx = i;
+ continue;
 
default:
  gcc_assert (opt_idx < cl_options_count);
@@ -1248,6 +1255,12 @@ keep:
= old_decoded_options[fdiagnostics_color_idx];
  new_decoded_options_count++;
}
+  if (fdiagnostics_urls_idx != 0)
+   {
+ new_decoded_options[argv_0 + options_prepended++]
+   = old_decoded_options[fdiagnostics_urls_idx];
+ new_decoded_options_count++;
+   }
   gcc_checking_assert (options_to_prepend == options_prepended);
 }
 
-- 
2.45.0



Re: [pushed] [PATCH v4 1/2] LoongArch: Define ISA versions

2024-05-07 Thread Xi Ruoyao
On Tue, 2024-05-07 at 18:01 +0800, Lulu Cheng wrote:
> 
> On 2024/5/7 5:42 PM, Xi Ruoyao wrote:
> > On Tue, 2024-05-07 at 17:07 +0800, Xi Ruoyao wrote:
> > > Hmm, after this change the default (-march=la64v1.0) is enabling LSX:
> > > 
> > > $ echo "int dummy;" | cc -c -v |& tail -n1
> > > COLLECT_GCC_OPTIONS='-c' '-v' '-mabi=lp64d' '-march=la64v1.0' '-mfpu=64'
> > > '-msimd=lsx' '-mcmodel=normal' '-mtune=generic'
> > > 
> > > Is this expected, or is something wrong?
> > Note that
> > https://github.com/loongson/la-toolchain-conventions?tab=readme-ov-file#configuring-the-target-isa
> > says:
> > 
> > LoongArch V1.1 features:
> > 
> > Enable or disable features introduced by LoongArch V1.1. The LSX / LASX
> > part of the LoongArch v1.1 update should only be enabled with lsx / lasx
> > itself enabled.
> > 
> > So to me -march=la64v1.0 should not imply -mlsx.
> 
> 
> The link 
> https://github.com/loongson/la-toolchain-conventions?tab=readme-ov-file#target-presets
>  
> has a detailed description of -march.
> -march=la64v1.0 will enable LSX by default.

Hmm, I think it's worth noting in
https://gcc.gnu.org/gcc-14/changes.html then.  I.e., for the

"It is now recommended to use -march=la64v1.0 as the only compiler
option to describe the target ISA when building binaries for
distribution."

paragraph, add something like:

"GCC now defaults to -march=la64v1.0 for loongarch64-* targets unless 
configured with a different --with-arch= option.  The -march=la64v1.0
option also implies -mlsx."

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [pushed] [PATCH v4 1/2] LoongArch: Define ISA versions

2024-05-07 Thread Xi Ruoyao
On Tue, 2024-05-07 at 17:07 +0800, Xi Ruoyao wrote:
> Hmm, after this change the default (-march=la64v1.0) is enabling LSX:
> 
> $ echo "int dummy;" | cc -c -v |& tail -n1
> COLLECT_GCC_OPTIONS='-c' '-v' '-mabi=lp64d' '-march=la64v1.0' '-mfpu=64'
> '-msimd=lsx' '-mcmodel=normal' '-mtune=generic'
> 
> Is this expected, or is something wrong?

Note that
https://github.com/loongson/la-toolchain-conventions?tab=readme-ov-file#configuring-the-target-isa
says:

LoongArch V1.1 features:

Enable or disable features introduced by LoongArch V1.1. The LSX / LASX
part of the LoongArch v1.1 update should only be enabled with lsx / lasx
itself enabled.

So to me -march=la64v1.0 should not imply -mlsx.

> On Tue, 2024-04-23 at 11:31 +0800, Lulu Cheng wrote:
> > Pushed to r14-10083.
> > 
> > On 2024/4/23 10:42 AM, Yang Yujie wrote:
> > > These ISA versions are defined as -march= parameters and
> > > are recommended for building binaries for distribution.
> > > 
> > > Detailed description of these definitions can be found at
> > > https://github.com/loongson/la-toolchain-conventions, which
> > > the LoongArch GCC port aims to conform to.
> > > 
> > > gcc/ChangeLog:
> > > 
> > >   * config.gcc: Make la64v1.0 the default ISA preset of the
> > > lp64d ABI.
> > >   * config/loongarch/genopts/loongarch-strings: Define
> > > la64v1.0, la64v1.1.
> > >   * config/loongarch/genopts/loongarch.opt.in: Likewise.
> > >   * config/loongarch/loongarch-c.cc
> > > (LARCH_CPP_SET_PROCESSOR): Likewise.
> > >   (loongarch_cpu_cpp_builtins): Likewise.
> > >   * config/loongarch/loongarch-cpu.cc (get_native_prid):
> > > Likewise.
> > >   (fill_native_cpu_config): Likewise.
> > >   * config/loongarch/loongarch-def.cc (array_tune):
> > > Likewise.
> > >   * config/loongarch/loongarch-def.h: Likewise.
> > >   * config/loongarch/loongarch-driver.cc
> > > (driver_set_m_parm):
> > > Likewise.
> > >   (driver_get_normalized_m_opts): Likewise.
> > >   * config/loongarch/loongarch-opts.cc
> > > (default_tune_for_arch): Likewise.
> > >   (TUNE_FOR_ARCH): Likewise.
> > >   (arch_str): Likewise.
> > >   (loongarch_target_option_override): Likewise.
> > >   * config/loongarch/loongarch-opts.h (TARGET_uARCH_LA464):
> > > Likewise.
> > >   (TARGET_uARCH_LA664): Likewise.
> > >   * config/loongarch/loongarch-str.h (STR_CPU_ABI_DEFAULT):
> > > Likewise.
> > >   (STR_ARCH_ABI_DEFAULT): Likewise.
> > >   (STR_TUNE_GENERIC): Likewise.
> > >   (STR_ARCH_LA64V1_0): Likewise.
> > >   (STR_ARCH_LA64V1_1): Likewise.
> > >   * config/loongarch/loongarch.cc
> > > (loongarch_cpu_sched_reassociation_width): Likewise.
> > >   (loongarch_asm_code_end): Likewise.
> > >   * config/loongarch/loongarch.opt: Likewise.
> > >   * doc/invoke.texi: Likewise.
> > > ---
> > >   gcc/config.gcc    | 34 
> > >   .../loongarch/genopts/loongarch-strings   |  5 +-
> > >   gcc/config/loongarch/genopts/loongarch.opt.in | 43 --
> > >   gcc/config/loongarch/loongarch-c.cc   | 37 +++--
> > >   gcc/config/loongarch/loongarch-cpu.cc | 35 
> > >   gcc/config/loongarch/loongarch-def.cc | 83
> > > +--
> > > 
> > >   gcc/config/loongarch/loongarch-def.h  | 37 ++---
> > >   gcc/config/loongarch/loongarch-driver.cc  |  8 +-
> > >   gcc/config/loongarch/loongarch-opts.cc    | 66 +++--
> > > --
> > >   gcc/config/loongarch/loongarch-opts.h |  4 +-
> > >   gcc/config/loongarch/loongarch-str.h  |  5 +-
> > >   gcc/config/loongarch/loongarch.cc | 11 +--
> > >   gcc/config/loongarch/loongarch.opt    | 43 --
> > >   gcc/doc/invoke.texi   | 57 -
> > >   14 files changed, 300 insertions(+), 168 deletions(-)
> > > 
> > > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > > index 5df3c52f8e9..929695c25ab 100644
> > > --- a/gcc/config.gcc
> > > +++ b/gcc/config.gcc
> > > @@ -5072,7 +5072,7 @@ case "${target}" in
> > >   
> > >   # Perform initial sanity checks on --with-*
> > > options.
> > >   case ${with_arch} in
> > > - "" | abi-default | loongarch64 | la[46]64) ;; #
> > > OK,
> > > append here.
>

Re: [pushed] [PATCH v4 1/2] LoongArch: Define ISA versions

2024-05-07 Thread Xi Ruoyao
END_STRING (loongarch_cpu_strings[target->cpu_arch]);
> > +    APPEND_STRING (loongarch_arch_strings[target->cpu_arch]);
> >   
> >     APPEND1 ('\0')
> >     return XOBFINISH (_obstack, const char *);
> > @@ -956,7 +986,7 @@ loongarch_target_option_override (struct
> > loongarch_target *target,
> >     /* Other arch-specific overrides.  */
> >     switch (target->cpu_arch)
> >   {
> > -  case CPU_LA664:
> > +  case ARCH_LA664:
> >     /* Enable -mrecipe=all for LA664 by default.  */
> >     if (!opts_set->x_recip_mask)
> >       {
> > diff --git a/gcc/config/loongarch/loongarch-opts.h
> > b/gcc/config/loongarch/loongarch-opts.h
> > index 9844b27ed27..f80482357ac 100644
> > --- a/gcc/config/loongarch/loongarch-opts.h
> > +++ b/gcc/config/loongarch/loongarch-opts.h
> > @@ -127,8 +127,8 @@ struct loongarch_flags {
> >     (la_target.isa.evolution & OPTION_MASK_ISA_LD_SEQ_SA)
> >   
> >   /* TARGET_ macros for use in *.md template conditionals */
> > -#define TARGET_uARCH_LA464   (la_target.cpu_tune == CPU_LA464)
> > -#define TARGET_uARCH_LA664   (la_target.cpu_tune == CPU_LA664)
> > +#define TARGET_uARCH_LA464   (la_target.cpu_tune ==
> > TUNE_LA464)
> > +#define TARGET_uARCH_LA664   (la_target.cpu_tune ==
> > TUNE_LA664)
> >   
> >   /* Note: optimize_size may vary across functions,
> >  while -m[no]-memcpy imposes a global constraint.  */
> > diff --git a/gcc/config/loongarch/loongarch-str.h
> > b/gcc/config/loongarch/loongarch-str.h
> > index 20da2b169ed..47f761babb2 100644
> > --- a/gcc/config/loongarch/loongarch-str.h
> > +++ b/gcc/config/loongarch/loongarch-str.h
> > @@ -27,10 +27,13 @@ along with GCC; see the file COPYING3.  If not
> > see
> >   #define OPTSTR_TUNE "tune"
> >   
> >   #define STR_CPU_NATIVE "native"
> > -#define STR_CPU_ABI_DEFAULT "abi-default"
> > +#define STR_ARCH_ABI_DEFAULT "abi-default"
> > +#define STR_TUNE_GENERIC "generic"
> >   #define STR_CPU_LOONGARCH64 "loongarch64"
> >   #define STR_CPU_LA464 "la464"
> >   #define STR_CPU_LA664 "la664"
> > +#define STR_ARCH_LA64V1_0 "la64v1.0"
> > +#define STR_ARCH_LA64V1_1 "la64v1.1"
> >   
> >   #define STR_ISA_BASE_LA64 "la64"
> >   
> > diff --git a/gcc/config/loongarch/loongarch.cc
> > b/gcc/config/loongarch/loongarch.cc
> > index 6b92e7034c5..e7835ae34ae 100644
> > --- a/gcc/config/loongarch/loongarch.cc
> > +++ b/gcc/config/loongarch/loongarch.cc
> > @@ -9609,9 +9609,10 @@ loongarch_cpu_sched_reassociation_width
> > (struct loongarch_target *target,
> >   
> >     switch (target->cpu_tune)
> >   {
> > -    case CPU_LOONGARCH64:
> > -    case CPU_LA464:
> > -    case CPU_LA664:
> > +    case TUNE_GENERIC:
> > +    case TUNE_LOONGARCH64:
> > +    case TUNE_LA464:
> > +    case TUNE_LA664:
> >     /* Vector part.  */
> >     if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P
> > (mode))
> >     {
> > @@ -10980,9

[PATCH 0/2] Fix two test failures with --enable-default-pie [PR70150]

2024-05-05 Thread Xi Ruoyao
In GCC 14.1-rc1, there are two new failures (compared to GCC 13) if
the build is configured with --enable-default-pie.  Let's fix them.

Tested on x86_64-linux-gnu.  Ok for trunk and releases/gcc-14?

Xi Ruoyao (2):
  i386: testsuite: Add -no-pie for pr113689-1.c [PR70150]
  i386: testsuite: Adapt fentryname3.c for r14-811 change [PR70150]

 gcc/testsuite/gcc.target/i386/fentryname3.c | 3 +--
 gcc/testsuite/gcc.target/i386/pr113689-1.c  | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

-- 
2.45.0



[PATCH 2/2] i386: testsuite: Adapt fentryname3.c for r14-811 change [PR70150]

2024-05-05 Thread Xi Ruoyao
After r14-811 "call *nop@GOTPCREL(%rip)" is only generated with
-mno-direct-extern-access even if --enable-default-pie.  So the r13-1614
change to this file is not valid anymore.

gcc/testsuite/ChangeLog:

PR testsuite/70150
* gcc.target/i386/fentryname3.c (dg-final): Revert r13-1614
change.
---
 gcc/testsuite/gcc.target/i386/fentryname3.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/fentryname3.c b/gcc/testsuite/gcc.target/i386/fentryname3.c
index c14a4ebb0cf..bd7c997c178 100644
--- a/gcc/testsuite/gcc.target/i386/fentryname3.c
+++ b/gcc/testsuite/gcc.target/i386/fentryname3.c
@@ -3,8 +3,7 @@
 /* { dg-require-profiling "-pg" } */
 /* { dg-options "-pg -mfentry"  } */
 /* { dg-final { scan-assembler "section.*__entry_loc" } } */
-/* { dg-final { scan-assembler "0x0f, 0x1f, 0x44, 0x00, 0x00" { target nonpic } } } */
-/* { dg-final { scan-assembler "call\t\\*nop@GOTPCREL" { target { ! nonpic } } } } */
+/* { dg-final { scan-assembler "0x0f, 0x1f, 0x44, 0x00, 0x00" } } */
 /* { dg-final { scan-assembler-not "__fentry__" } } */
 
 __attribute__((fentry_name("nop"), fentry_section("__entry_loc")))
-- 
2.45.0



[PATCH 1/2] i386: testsuite: Add -no-pie for pr113689-1.c [PR70150]

2024-05-05 Thread Xi Ruoyao
For a --enable-default-pie build, using -fno-pic (for compiler) but
not -no-pie (for linker) triggers some linker warnings counted as
excess errors:

/usr/bin/ld: /tmp/cc8MgxiR.o: warning: relocation in read-only
section `.text.startup'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE

gcc/testsuite/ChangeLog:

PR testsuite/70150
* gcc.target/i386/pr113689-1.c (dg-options): Add -no-pie.
---
 gcc/testsuite/gcc.target/i386/pr113689-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr113689-1.c b/gcc/testsuite/gcc.target/i386/pr113689-1.c
index 9b8474ed933..0424db2dfdc 100644
--- a/gcc/testsuite/gcc.target/i386/pr113689-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr113689-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { lp64 && fpic } } } */
-/* { dg-options "-O2 -fno-pic -fprofile -mcmodel=large" } */
+/* { dg-options "-O2 -fno-pic -no-pie -fprofile -mcmodel=large" } */
 /* { dg-skip-if "PR90698" { *-*-darwin* } } */
 /* { dg-skip-if "PR113909" { *-*-solaris2* } } */
 
-- 
2.45.0



Pushed: [PATCH] LoongArch: Add constraints for bit string operation define_insn_and_split's [PR114861]

2024-04-26 Thread Xi Ruoyao
On Sat, 2024-04-27 at 11:04 +0800, Lulu Cheng wrote:
> LGTM!
> 
> Thanks.

Pushed r15-11 and r14-10142.

> On 2024/4/26 9:52 PM, Xi Ruoyao wrote:
> > Without the constraints, the compiler attempts to use a stack slot as the
> > target, causing an ICE when building the kernel with -Os:
> > 
> >  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c:3144:1:
> >  error: could not split insn
> >  (insn:TI 1764 67 1745
> >    (set (mem/c:DI (reg/f:DI 3 $r3) [707 %sfp+-80 S8 A64])
> >     (and:DI (reg/v:DI 28 $r28 [orig:422 raster_config ] [422])
> >     (const_int -50331649 [0xfffffffffcffffff])))
> >    "drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c":1386:21 111
> >    {*bstrins_di_for_mask}
> >    (nil))
> > 
> > Add these constraints to fix the issue.
> > 
> > gcc/ChangeLog:
> > 
> > PR target/114861
> > * config/loongarch/loongarch.md (bstrins_<mode>_for_mask): Add
> > constraints for operands.
> > (bstrins_<mode>_for_ior_mask): Likewise.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR target/114861
> > * gcc.target/loongarch/pr114861.c: New test.
> > ---
> >   gcc/config/loongarch/loongarch.md | 16 
> >   gcc/testsuite/gcc.target/loongarch/pr114861.c | 39 +++
> >   2 files changed, 47 insertions(+), 8 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/loongarch/pr114861.c
> > 
> > diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
> > index a316c8fb820..5c80c169cbf 100644
> > --- a/gcc/config/loongarch/loongarch.md
> > +++ b/gcc/config/loongarch/loongarch.md
> > @@ -1543,9 +1543,9 @@ (define_insn "and3_extended"
> >  (set_attr "mode" "")])
> >   
> >   (define_insn_and_split "*bstrins__for_mask"
> > -  [(set (match_operand:GPR 0 "register_operand")
> > -   (and:GPR (match_operand:GPR 1 "register_operand")
> > -(match_operand:GPR 2 "ins_zero_bitmask_operand")))]
> > +  [(set (match_operand:GPR 0 "register_operand" "=r")
> > +   (and:GPR (match_operand:GPR 1 "register_operand" "r")
> > +(match_operand:GPR 2 "ins_zero_bitmask_operand" "i")))]
> >     ""
> >     "#"
> >     ""
> > @@ -1563,11 +1563,11 @@ (define_insn_and_split "*bstrins__for_mask"
> >     })
> >   
> >   (define_insn_and_split "*bstrins__for_ior_mask"
> > -  [(set (match_operand:GPR 0 "register_operand")
> > -   (ior:GPR (and:GPR (match_operand:GPR 1 "register_operand")
> > -  (match_operand:GPR 2 "const_int_operand"))
> > -(and:GPR (match_operand:GPR 3 "register_operand")
> > -     (match_operand:GPR 4 "const_int_operand"))))]
> > +  [(set (match_operand:GPR 0 "register_operand" "=r")
> > +   (ior:GPR (and:GPR (match_operand:GPR 1 "register_operand" "r")
> > +     (match_operand:GPR 2 "const_int_operand" "i"))
> > +(and:GPR (match_operand:GPR 3 "register_operand" "r")
> > +     (match_operand:GPR 4 "const_int_operand" "i"))))]
> >     "loongarch_pre_reload_split ()
> >  && loongarch_use_bstrins_for_ior_with_mask (mode, operands)"
> >     "#"
> > diff --git a/gcc/testsuite/gcc.target/loongarch/pr114861.c b/gcc/testsuite/gcc.target/loongarch/pr114861.c
> > new file mode 100644
> > index 00000000000..e6507c406b9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/loongarch/pr114861.c
> > @@ -0,0 +1,39 @@
> > +/* PR114861: ICE building the kernel with -Os
> > +   Reduced from linux/fs/ntfs3/attrib.c at revision c942a0cd3603.  */
> > +/* { dg-do compile } */
> > +/* { dg-options "-Os -march=loongarch64 -msoft-float -mabi=lp64s" } */
> > +
> > +long evcn, attr_collapse_range_vbo, attr_collapse_range_bytes;
> > +unsigned short flags;
> > +int attr_collapse_range_ni_0_0;
> > +int *attr_collapse_range_mi;
> > +unsigned attr_collapse_range_svcn, attr_collapse_range_vcn1;
> > +void ni_insert_nonresident (unsigned, unsigned short, int **);
> > +int mi_pack_runs (int);
> > +int
> > +attr_collapse_range (void)
> > +{
> > +  _Bool __trans_tmp_1;
> > +  int run = attr_collapse_range_ni_0_0;
> > +  unsigned evcn1, vcn, end;
> > +  short a_flags = flags;
> > +  __trans_tmp_1 = flags & (32768 | 1);
> > +  if (__trans_tmp_1)
> > +    return 2;
> > +  vcn = attr_collapse_range_vbo;
> > +  end = attr_collapse_range_bytes;
> > +  evcn1 = evcn;
> > +  for (;;)
> > +    if (attr_collapse_range_svcn >= end)
> > +  {
> > +    unsigned eat, next_svcn = mi_pack_runs (42);
> > +    attr_collapse_range_vcn1 = (vcn ? vcn : attr_collapse_range_svcn);
> > +    eat = (0 < end) - attr_collapse_range_vcn1;
> > +    mi_pack_runs (run - eat);
> > +    if (next_svcn + eat)
> > +  ni_insert_nonresident (evcn1 - eat - next_svcn, a_flags,
> > + &attr_collapse_range_mi);
> > +  }
> > +    else
> > +  return 42;
> > +}
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] LoongArch: Add constraints for bit string operation define_insn_and_split's [PR114861]

2024-04-26 Thread Xi Ruoyao
Without the constraints, the compiler attempts to use a stack slot as the
target, causing an ICE when building the kernel with -Os:

drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c:3144:1:
error: could not split insn
(insn:TI 1764 67 1745
  (set (mem/c:DI (reg/f:DI 3 $r3) [707 %sfp+-80 S8 A64])
   (and:DI (reg/v:DI 28 $r28 [orig:422 raster_config ] [422])
   (const_int -50331649 [0xfffffffffcffffff])))
  "drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c":1386:21 111
  {*bstrins_di_for_mask}
  (nil))

Add these constraints to fix the issue.

gcc/ChangeLog:

PR target/114861
* config/loongarch/loongarch.md (bstrins_<mode>_for_mask): Add
constraints for operands.
(bstrins_<mode>_for_ior_mask): Likewise.

gcc/testsuite/ChangeLog:

PR target/114861
* gcc.target/loongarch/pr114861.c: New test.
---
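
For context on what these patterns match: an AND with a mask that has a
single contiguous run of zero bits can be done as a bit-field insertion of
zeros, which as far as I understand is what the *bstrins_<mode>_for_mask
split emits (bstrins with $r0 as the source).  A minimal sketch of
recovering the field bounds from such a mask follows.  zero_run is a
hypothetical helper for illustration, not the predicate GCC uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: if MASK has exactly one contiguous
   run of zero bits, store the lowest and highest bit positions of that
   run in *LSB and *MSB and return true.  Otherwise return false.  */
bool
zero_run (uint64_t mask, int *lsb, int *msb)
{
  assert (lsb && msb);

  uint64_t z = ~mask;	/* The run of zeros becomes a run of ones.  */
  if (z == 0)
    return false;	/* No zero bits at all.  */

  /* Find the lowest set bit of Z.  */
  int lo = 0;
  while (((z >> lo) & 1) == 0)
    lo++;

  /* Extend upwards while the next bit is also set.  */
  int hi = lo;
  while (hi < 63 && ((z >> (hi + 1)) & 1))
    hi++;

  /* Rebuild the run [lo, hi] and reject masks with other zero bits.  */
  uint64_t run = (hi - lo == 63)
		 ? ~UINT64_C (0)
		 : ((UINT64_C (1) << (hi - lo + 1)) - 1) << lo;
  if (z != run)
    return false;

  *lsb = lo;
  *msb = hi;
  return true;
}
```

For the constant in the failing insn, -50331649 (0xfffffffffcffffff), this
gives lsb = 24 and msb = 25.
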
 gcc/config/loongarch/loongarch.md | 16 
 gcc/testsuite/gcc.target/loongarch/pr114861.c | 39 +++
 2 files changed, 47 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr114861.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index a316c8fb820..5c80c169cbf 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1543,9 +1543,9 @@ (define_insn "and3_extended"
(set_attr "mode" "")])
 
 (define_insn_and_split "*bstrins_<mode>_for_mask"
-  [(set (match_operand:GPR 0 "register_operand")
-   (and:GPR (match_operand:GPR 1 "register_operand")
-(match_operand:GPR 2 "ins_zero_bitmask_operand")))]
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (and:GPR (match_operand:GPR 1 "register_operand" "r")
+(match_operand:GPR 2 "ins_zero_bitmask_operand" "i")))]
   ""
   "#"
   ""
@@ -1563,11 +1563,11 @@ (define_insn_and_split "*bstrins_<mode>_for_mask"
   })
 
 (define_insn_and_split "*bstrins_<mode>_for_ior_mask"
-  [(set (match_operand:GPR 0 "register_operand")
-   (ior:GPR (and:GPR (match_operand:GPR 1 "register_operand")
-  (match_operand:GPR 2 "const_int_operand"))
-(and:GPR (match_operand:GPR 3 "register_operand")
- (match_operand:GPR 4 "const_int_operand"))))]
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (ior:GPR (and:GPR (match_operand:GPR 1 "register_operand" "r")
+ (match_operand:GPR 2 "const_int_operand" "i"))
+(and:GPR (match_operand:GPR 3 "register_operand" "r")
+ (match_operand:GPR 4 "const_int_operand" "i"))))]
   "loongarch_pre_reload_split ()
&& loongarch_use_bstrins_for_ior_with_mask (mode, operands)"
   "#"
diff --git a/gcc/testsuite/gcc.target/loongarch/pr114861.c b/gcc/testsuite/gcc.target/loongarch/pr114861.c
new file mode 100644
index 00000000000..e6507c406b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr114861.c
@@ -0,0 +1,39 @@
+/* PR114861: ICE building the kernel with -Os
+   Reduced from linux/fs/ntfs3/attrib.c at revision c942a0cd3603.  */
+/* { dg-do compile } */
+/* { dg-options "-Os -march=loongarch64 -msoft-float -mabi=lp64s" } */
+
+long evcn, attr_collapse_range_vbo, attr_collapse_range_bytes;
+unsigned short flags;
+int attr_collapse_range_ni_0_0;
+int *attr_collapse_range_mi;
+unsigned attr_collapse_range_svcn, attr_collapse_range_vcn1;
+void ni_insert_nonresident (unsigned, unsigned short, int **);
+int mi_pack_runs (int);
+int
+attr_collapse_range (void)
+{
+  _Bool __trans_tmp_1;
+  int run = attr_collapse_range_ni_0_0;
+  unsigned evcn1, vcn, end;
+  short a_flags = flags;
+  __trans_tmp_1 = flags & (32768 | 1);
+  if (__trans_tmp_1)
+return 2;
+  vcn = attr_collapse_range_vbo;
+  end = attr_collapse_range_bytes;
+  evcn1 = evcn;
+  for (;;)
+if (attr_collapse_range_svcn >= end)
+  {
+unsigned eat, next_svcn = mi_pack_runs (42);
+attr_collapse_range_vcn1 = (vcn ? vcn : attr_collapse_range_svcn);
+eat = (0 < end) - attr_collapse_range_vcn1;
+mi_pack_runs (run - eat);
+if (next_svcn + eat)
+  ni_insert_nonresident (evcn1 - eat - next_svcn, a_flags,
+ &attr_collapse_range_mi);
+  }
+else
+  return 42;
+}
-- 
2.44.0



Re: [PATCH v2 1/2] LoongArch: Define ISA versions

2024-04-22 Thread Xi Ruoyao
On Sat, 2024-04-20 at 18:47 +0800, Yang Yujie wrote:
> +@item la664
> +LoongArch LA664-based processor with LSX, LASX and all LoongArch v1.1
> features.

I still prefer "v1.1 instructions" instead of "v1.1 features" because
LA664 (at least all launched LA664 CPUs) does not support HPTW, which
**is** a v1.1 feature.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 1/2] LoongArch: Define ISA versions

2024-04-20 Thread Xi Ruoyao
On Sat, 2024-04-20 at 11:26 +0800, Lulu Cheng wrote:

> 
> > One LoongArch v1.1 feature "Hardware Page Table Walker" is not
> > implemented by LA664.  Maybe "all LoongArch v1.1 **unprivileged**
> > features"?
> > 
> The description of -march is "+Generate instructions for the machine type 
> @var{arch-type}.",
> 
>  so is there no need to write it like this here?

Then maybe just say "all LoongArch v1.1 instructions" instead of
"features" here as well?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 1/2] LoongArch: Define ISA versions

2024-04-19 Thread Xi Ruoyao
On Fri, 2024-04-19 at 19:04 +0800, Yang Yujie wrote:
>  @table @samp
>  @item native
> -This selects the CPU to generate code for at compilation time by determining
> -the processor type of the compiling machine.  Using @option{-march=native}
> -enables all instruction subsets supported by the local machine (hence
> -the result might not run on different machines).  Using 
> @option{-mtune=native}
> -produces code optimized for the local machine under the constraints
> -of the selected instruction set.
> +Local processor type detected by the native compiler.
>  @item loongarch64
> -A generic CPU with 64-bit extensions.
> +Generic LoongArch 64-bit processor.
>  @item la464
> -LoongArch LA464 CPU with LBT, LSX, LASX, LVZ.
> +LoongArch LA464-based processor with LSX, LASX.
> +@item la664
> +LoongArch LA664-based processor with LSX, LASX and all LoongArch v1.1 
> features.

One LoongArch v1.1 feature "Hardware Page Table Walker" is not
implemented by LA664.  Maybe "all LoongArch v1.1 **unprivileged**
features"?

> +@item la64v1.0
> +LoongArch64 ISA version 1.0.
> +@item la64v1.1
> +LoongArch64 ISA version 1.1.

IMO it's better to use wording like that for LA664, i.e. "a CPU implementing
all LoongArch v1.1 unprivileged features" (emphasising "all", as the
v1.1 manual allows implementing only a subset of v1.1 features).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 1/2] LoongArch: Define ISA versions

2024-04-19 Thread Xi Ruoyao
On Fri, 2024-04-19 at 19:04 +0800, Yang Yujie wrote:
> These ISA versions are defined as -march= parameters and
> are recommended for building binaries for distribution.
> 
> Detailed description of these definitions can be found at
> https://github.com/loongson/la-toolchain-conventions, which
> the LoongArch GCC port aims to conform to.

The link seems broken.  Do you mean la-softdev-convention?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2] LoongArch: Enable switchable target

2024-04-08 Thread Xi Ruoyao
On Mon, 2024-04-08 at 16:46 +0800, Yang Yujie wrote:
> v1 -> v2:
> Remove spaces from changelog.

I've rebuilt the base system with a GCC including this patch.  LTO+PGO
bootstrap was fine, regtesting was fine, and no issues were observed.

I usually include the optimization flags in LDFLAGS when I do LTO,
though, so I don't really rely on this patch.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Enable switchable target

2024-04-07 Thread Xi Ruoyao
On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote:
>   * config/loongarch/loongarch-builtins.cc
> (loongarch_init_builtins):
>     Initialize all builtin functions at startup.

git gcc-verify complains that a tab should be used instead of spaces for
this line.

>   (loongarch_expand_builtin): Turn assertion of builtin
> availability
>     into a test.

and this line.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] ICF: Make ICF and SRA agree on padding

2024-04-07 Thread Xi Ruoyao
On Thu, 2024-04-04 at 23:19 +0200, Martin Jambor wrote:
> +/* Given two types in an assignment, return true either if any one cannot be
> +   totally scalarized or if they have padding (i.e. not copied bits)  */
> +
> +bool
> +sra_total_scalarization_would_copy_same_data_p (tree t1, tree t2)
> +{
> +  sra_padding_collecting p1;
> +  if (!check_ts_and_push_padding_to_vec (t1, &p1))
> +    return true;
> +
> +  sra_padding_collecting p2;
> +  if (!check_ts_and_push_padding_to_vec (t2, &p2))
> +    return true;
> +
> +  unsigned l = p1.m_padding.length ();
> +  if (l != p2.m_padding.length ())
> +    return false;
> +  for (unsigned i = 0; i < l; i++)
> +    if (p1.m_padding[i].first != p2.m_padding[i].first
> + || p1.m_padding[i].second != p2.m_padding[i].second)
> +  return false;
> +
> +  return true;
> +}
> +

Better remove this trailing empty line from tree-sra.cc.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] ICF: Make ICF and SRA agree on padding

2024-04-07 Thread Xi Ruoyao
On Thu, 2024-04-04 at 23:19 +0200, Martin Jambor wrote:
> The patch has been approved by Honza in Bugzilla. (I hope.  He did write
> it looked reasonable.)  Together with the patch for PR 113907, it has
> passed bootstrap, LTO bootstrap and LTO profiledbootstrap and testing on
> x86_64-linux and bootstrap and LTO bootstrap on ppc64le-linux.  It also
> passed normal bootstrap on aarch64-linux but there many testcases failed
> because the compiler timed out.  The machine is old and slow and might
> have been oversubscribed so my plan is to try again on gcc185 from
> cfarm.  If that goes well, I intend to commit the patch and then start
> working on backports.

I've tried these two patches out on my own 24-core AArch64 machine. 
Bootstrapped (but no LTO or PGO) and regtested fine.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Enable switchable target

2024-04-07 Thread Xi Ruoyao
On Sun, 2024-04-07 at 16:23 +0800, Yang Yujie wrote:
> On Sun, Apr 07, 2024 at 04:23:53PM +0800, Xi Ruoyao wrote:
> > On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote:
> > > This patch fixes the back-end context switching in cases where functions
> > > should be built with their own target contexts instead of the
> > > global one, such as LTO linking and functions with target attributes 
> > > (TBD).
> > > 
> > >   PR target/113233
> > 
> > Oops, so this PR isn't fixed with r14-7134 "LoongArch: Implement option
> > save/restore"?  Should I reopen it?
> > 
> > -- 
> > Xi Ruoyao 
> > School of Aerospace Science and Technology, Xidian University
> 
> Yes, the issue was not fixed with that patch. This one should do.

So I've reopened the PR.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Enable switchable target

2024-04-07 Thread Xi Ruoyao
On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote:
> This patch fixes the back-end context switching in cases where functions
> should be built with their own target contexts instead of the
> global one, such as LTO linking and functions with target attributes (TBD).
> 
>   PR target/113233

Oops, so this PR isn't fixed with r14-7134 "LoongArch: Implement option
save/restore"?  Should I reopen it?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1] LoongArch: Set default alignment for functions jumps and loops [PR112919].

2024-04-06 Thread Xi Ruoyao
On Tue, 2024-04-02 at 15:03 +0800, Lulu Cheng wrote:
> +/* Alignment for functions loops and jumps for best performance.  For new
> +   uarchs the value should be measured via benchmarking.  See the documentation
> +   for -falign-functions -falign-loops and -falign-jumps in invoke.texi for the
                           ^             ^

Better have two commas here.

Otherwise it should be OK.

> +   format.  */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v5] LoongArch: Add support for TLS descriptors

2024-04-01 Thread Xi Ruoyao
Is this patch targeting GCC 14 or 15?  If 14 I guess we'd commit now...

Generally we don't add features in stage 4, but if we keep trad as the
default I think it'd be OK.  And RISC-V guys plan to push their TLS desc
implementation this week too.

On Tue, 2024-03-19 at 09:54 +0800, mengqinggang wrote:
> Add support for TLS descriptors on normal code model and extreme code model.
> 
> Normal code model instruction sequence:
>   -mno-explicit-relocs:
>     la.tls.desc   $r4, s
>     add.d         $r12, $r4, $r2
>   -mexplicit-relocs:
>     pcalau12i     $r4,%desc_pc_hi20(s)
>     addi.d        $r4,$r4,%desc_pc_lo12(s)
>     ld.d          $r1,$r4,%desc_ld(s)
>     jirl          $r1,$r1,%desc_call(s)
>     add.d         $r12, $r4, $r2
> 
> Extreme code model instruction sequence:
>   -mno-explicit-relocs:
>     la.tls.desc   $r4, $r12, s
>     add.d         $r12, $r4, $r2
>   -mexplicit-relocs:
>     pcalau12i     $r4,%desc_pc_hi20(s)
>     addi.d        $r12,$r0,%desc_pc_lo12(s)
>     lu32i.d       $r12,%desc64_pc_lo20(s)
>     lu52i.d       $r12,$r12,%desc64_pc_hi12(s)
>     add.d         $r4,$r4,$r12
>     ld.d          $r1,$r4,%desc_ld(s)
>     jirl          $r1,$r1,%desc_call(s)
>     add.d         $r12, $r4, $r2
> 
> The default is still traditional TLS model, but can be configured with
> --with-tls={trad,desc}. The default can change to TLS descriptors once
> libc and LLVM support this.
> 
> gcc/ChangeLog:
> 
>   * config.gcc: Add --with-tls option to change TLS flavor.
>   * config/loongarch/genopts/loongarch.opt.in: Add -mtls-dialect to
>   configure TLS flavor.
>   * config/loongarch/loongarch-def.h (struct loongarch_target): Add
>   tls_dialect.
>   * config/loongarch/loongarch-driver.cc (la_driver_init): Add tls
>   flavor.
>   * config/loongarch/loongarch-opts.cc (loongarch_init_target): Add
>   tls_dialect.
>   (loongarch_config_target): Ditto.
>   (loongarch_update_gcc_opt_status): Ditto.
>   * config/loongarch/loongarch-opts.h (loongarch_init_target): Ditto.
>   (TARGET_TLS_DESC): New define.
>   * config/loongarch/loongarch.cc (loongarch_symbol_insns): Add TLS DESC
>   instructions sequence length.
>   (loongarch_legitimize_tls_address): New TLS DESC instruction sequence.
>   (loongarch_option_override_internal): Add la_opt_tls_dialect.
>   (loongarch_option_restore): Add la_target.tls_dialect.
>   * config/loongarch/loongarch.md (@got_load_tls_desc): Normal
>   code model for TLS DESC.
>   (got_load_tls_desc_off64): Extreme code model for TLS DESC.
>   * config/loongarch/loongarch.opt: Regenerated.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/loongarch/cmodel-extreme-1.c: Add -mtls-dialect=trad.
>   * gcc.target/loongarch/cmodel-extreme-2.c: Ditto.
>   * gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Ditto.
>   * gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c:
>   Ditto.
>   * gcc.target/loongarch/func-call-medium-1.c: Ditto.
>   * gcc.target/loongarch/func-call-medium-2.c: Ditto.
>   * gcc.target/loongarch/func-call-medium-3.c: Ditto.
>   * gcc.target/loongarch/func-call-medium-4.c: Ditto.
>   * gcc.target/loongarch/tls-extreme-macro.c: Ditto.
>   * gcc.target/loongarch/tls-gd-noplt.c: Ditto.
>   * gcc.target/loongarch/explicit-relocs-auto-extreme-tls-desc.c: New test.
>   * gcc.target/loongarch/explicit-relocs-auto-tls-desc.c: New test.
>   * gcc.target/loongarch/explicit-relocs-extreme-tls-desc.c: New test.
>   * gcc.target/loongarch/explicit-relocs-tls-desc.c: New test.
> 
> Co-authored-by: Lulu Cheng 
> Co-authored-by: Xi Ruoyao 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Increase division costs

2024-03-31 Thread Xi Ruoyao
On Mon, 2024-04-01 at 10:22 +0800, chenglulu wrote:
> 
> 在 2024/4/1 上午9:29, Xi Ruoyao 写道:
> > On Fri, 2024-03-29 at 09:23 +0800, chenglulu wrote:
> > 
> > > I tested spec2006. In the floating-point program, the test items with
> > > large fluctuations are removed, and the rest is basically unchanged.
> > > 
> > > The fixed-point 464.h264ref (10,10) was 6.7% higher than (5,5) and
> > > (10,22).
> > So IIUC (10,10) is better than (5,5), (10,22), and the originally
> > proposed (14,22)?  Then should I make a change to make all 4 costs (SF,
> > DF, SI, DI) 10?
> 
> I think this may require the analysis of the spec's test case. I took a
> look at the test results again, where the scores of SPEC INT
> 462.libquantum fluctuated greatly, but the combination of (10,22) showed
> an overall upward trend compared to the scores of the other two
> combinations.
> 
> I don't know if (10,22) this combination happens to have the kind of
> test cases in the changelog.
> 
> So can we change it together in GCC15?

Ok.  Abandoning this patch then.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Increase division costs

2024-03-31 Thread Xi Ruoyao
On Fri, 2024-03-29 at 09:23 +0800, chenglulu wrote:

> I tested spec2006. In the floating-point program, the test items with large
> fluctuations are removed, and the rest is basically unchanged.
> 
> The fixed-point 464.h264ref (10,10) was 6.7% higher than (5,5) and (10,22).

So IIUC (10,10) is better than (5,5), (10,22), and the originally
proposed (14,22)?  Then should I make a change to make all 4 costs (SF,
DF, SI, DI) 10?

I'd still want DI % 17 to be reduced to a reciprocal sequence (but
not SI % 17), since DI % (small constant) is quite important for
some workloads like competitive programming.  However, "adapting with
different modulos" is not possible without refactoring generic code, so
it must be deferred to at least GCC 15.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Ping: [PATCH] mips: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-28 Thread Xi Ruoyao
Ping.

On Wed, 2024-03-20 at 15:10 +0800, Xi Ruoyao wrote:
> We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
> arguments and there is nothing to advance, but that is not the case
> for (...) functions returning by hidden reference which have one such
> artificial argument.  This is causing gcc.dg/c23-stdarg-{6,8,9}.c to
> fail.
> 
> Fix the issue by checking if arg.type is NULL, as r14-9503 explains.
> 
> gcc/ChangeLog:
> 
>   PR target/114175
>   * config/mips/mips.cc (mips_setup_incoming_varargs): Only skip
>   mips_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
>   functions if arg.type is NULL.
> ---
> 
> Bootstrapped and regtested on mips64el-linux-gnuabi64.  Ok for trunk?
> 
>  gcc/config/mips/mips.cc | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
> index 68e2ae8d8fa..ce764a5cb35 100644
> --- a/gcc/config/mips/mips.cc
> +++ b/gcc/config/mips/mips.cc
> @@ -6834,7 +6834,13 @@ mips_setup_incoming_varargs (cumulative_args_t cum,
>   argument.  Advance a local copy of CUM past the last "real" named
>   argument, to find out how many registers are left over.  */
>    local_cum = *get_cumulative_args (cum);
> -  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
> +
> +  /* For a C23 variadic function w/o any named argument, and w/o an
> + artifical argument for large return value, skip advancing args.
> + There is such an artifical argument iff. arg.type is non-NULL
> + (PR 114175).  */
> +  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
> +  || arg.type != NULL_TREE)
>  mips_function_arg_advance (pack_cumulative_args (&local_cum), arg);
>  
>    /* Found out how many registers we need to save.  */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Xi Ruoyao
On Wed, 2024-03-27 at 18:39 +0800, Xi Ruoyao wrote:
> On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:
> > 
> > 在 2024/3/26 下午5:48, Xi Ruoyao 写道:
> > > The latency of LA464 and LA664 division instructions depends on the
> > > input.  When I updated the costs in r14-6642, I unintentionally set the
> > > division costs to the best-case latency (when the first operand is 0).
> > > Per a recent discussion [1] we should use "something sensible" instead
> > > of it.
> > > 
> > > Use the average of the minimum and maximum latency observed instead.
> > > This enables multiplication to reciprocal sequence reduction and speeds
> > > up the following test case for about 30%:
> > > 
> > >  int
> > >  main (void)
> > >  {
> > >    unsigned long stat = 0xdeadbeef;
> > >    for (int i = 0; i < 1; i++)
> > >  stat = (stat * stat + stat * 114514 + 1919810) % 17;
> > >    asm(""::"r"(stat));
> > >  }
> > > 
> > > [1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html
> > 
> > The test case div-const-reduction.c is modified to assemble the instruction
> > sequence as follows:
> > lu12i.w $r12,97440>>12  # 0x3b9ac000
> > ori $r12,$r12,2567
> > mod.w   $r13,$r13,$r12
> > 
> > This sequence of instructions takes 5 clock cycles.

It actually may take 5 to 8 cycles depending on the input.  And
multiplication is fully pipelined while division is not, so the
reciprocal sequence should still produce a better throughput.

> Hmm indeed, it seems a waste to do this reduction for int / 17.
> I'll try to make a better heuristic as Richard suggests...

Oops, it seems impossible (w/o refactoring the generic code).  See my
reply to Richi :(.

Can you also try benchmarking with the costs of SI and DI division
increased to (10, 10) instead of (14, 22) - allowing more CSE but not
reciprocal sequence reduction, and (10, 22) - only allowing reduction
for DI but not SI?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Xi Ruoyao
On Wed, 2024-03-27 at 08:54 +0100, Richard Biener wrote:
> On Tue, Mar 26, 2024 at 10:52 AM Xi Ruoyao  wrote:
> > 
> > The latency of LA464 and LA664 division instructions depends on the
> > input.  When I updated the costs in r14-6642, I unintentionally set the
> > division costs to the best-case latency (when the first operand is 0).
> > Per a recent discussion [1] we should use "something sensible" instead
> > of it.
> > 
> > Use the average of the minimum and maximum latency observed instead.
> > This enables multiplication to reciprocal sequence reduction and speeds
> > up the following test case for about 30%:
> > 
> >     int
> >     main (void)
> >     {
> >   unsigned long stat = 0xdeadbeef;
> >   for (int i = 0; i < 1; i++)
> >     stat = (stat * stat + stat * 114514 + 1919810) % 17;
> >   asm(""::"r"(stat));
> >     }
> 
> I think you should be able to see a constant divisor and thus could do
> better than return the same latency for everything.  For non-constant
> divisors using the best-case latency shouldn't be a problem.

Hmm, that doesn't seem really possible as of now.  expand_divmod does
something like:

  max_cost = (unsignedp
              ? udiv_cost (speed, compute_mode)
              : sdiv_cost (speed, compute_mode));

which reads the pre-calculated costs from a table.  Thus we don't
really know the denominator and cannot estimate the cost based on it :(.

CSE does invoke the cost hook with the actual (mod a (const_int 17))
RTX, but that is less important.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Xi Ruoyao
On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:
> 
> 在 2024/3/26 下午5:48, Xi Ruoyao 写道:
> > The latency of LA464 and LA664 division instructions depends on the
> > input.  When I updated the costs in r14-6642, I unintentionally set the
> > division costs to the best-case latency (when the first operand is 0).
> > Per a recent discussion [1] we should use "something sensible" instead
> > of it.
> > 
> > Use the average of the minimum and maximum latency observed instead.
> > This enables multiplication to reciprocal sequence reduction and speeds
> > up the following test case for about 30%:
> > 
> >  int
> >  main (void)
> >  {
> >    unsigned long stat = 0xdeadbeef;
> >    for (int i = 0; i < 1; i++)
> >  stat = (stat * stat + stat * 114514 + 1919810) % 17;
> >    asm(""::"r"(stat));
> >  }
> > 
> > [1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html
> 
> The test case div-const-reduction.c is modified to assemble the instruction
> sequence as follows:
>   lu12i.w $r12,97440>>12  # 0x3b9ac000
>   ori $r12,$r12,2567
>   mod.w   $r13,$r13,$r12
> 
> This sequence of instructions takes 5 clock cycles.

Hmm indeed, it seems a waste to do this reduction for int / 17.
I'll try to make a better heuristic as Richard suggests...


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2] MIPS: Add MIN/MAX.fmt instructions support for MIPS R6

2024-03-26 Thread Xi Ruoyao
On Tue, 2024-03-26 at 11:15 +0800, YunQiang Su wrote:

/* snip */

> With -ffinite-math-only -fno-signed-zeros, it does work with
>     x >= y ? x : y
> while without `-ffinite-math-only -fno-signed-zeros`, it cannot.
> @Xi Ruoyao Is it expected by IEEE?

When y is (quiet) NaN and x is not, fmax(x, y) should produce x but x >=
y ? x : y should produce y.  Thus -ffinite-math-only is needed.

When x is +0.0 and y is -0.0, x >= y ? x : y should produce +0.0 but
fmax(x, y) may produce +0.0 or -0.0 (IEEE allows both and I don't see a
stricter requirement in the MIPS 6.06 manual either).  Thus
-fno-signed-zeros is needed.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] LoongArch: Increase division costs

2024-03-26 Thread Xi Ruoyao
The latency of LA464 and LA664 division instructions depends on the
input.  When I updated the costs in r14-6642, I unintentionally set the
division costs to the best-case latency (when the first operand is 0).
Per a recent discussion [1], we should use "something sensible" rather
than the best case.

Use the average of the minimum and maximum latency observed instead.
This enables reducing a division by a constant to a multiplication by
its reciprocal, and speeds up the following test case by about 30%:

int
main (void)
{
  unsigned long stat = 0xdeadbeef;
  for (int i = 0; i < 1; i++)
stat = (stat * stat + stat * 114514 + 1919810) % 17;
  asm(""::"r"(stat));
}

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html

gcc/ChangeLog:

* config/loongarch/loongarch-def.cc
(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Increase
default division cost to the average of the best case and worst
case scenarios observed.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/div-const-reduction.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch-def.cc| 8 
 gcc/testsuite/gcc.target/loongarch/div-const-reduction.c | 9 +
 2 files changed, 13 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-const-reduction.c

diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
index e8c129ce643..93e72a520d5 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -95,12 +95,12 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
   : fp_add (COSTS_N_INSNS (5)),
 fp_mult_sf (COSTS_N_INSNS (5)),
 fp_mult_df (COSTS_N_INSNS (5)),
-fp_div_sf (COSTS_N_INSNS (8)),
-fp_div_df (COSTS_N_INSNS (8)),
+fp_div_sf (COSTS_N_INSNS (12)),
+fp_div_df (COSTS_N_INSNS (15)),
 int_mult_si (COSTS_N_INSNS (4)),
 int_mult_di (COSTS_N_INSNS (4)),
-int_div_si (COSTS_N_INSNS (5)),
-int_div_di (COSTS_N_INSNS (5)),
+int_div_si (COSTS_N_INSNS (14)),
+int_div_di (COSTS_N_INSNS (22)),
 movcf2gr (COSTS_N_INSNS (7)),
 movgr2cf (COSTS_N_INSNS (15)),
 branch_cost (6),
diff --git a/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
new file mode 100644
index 000..0ee86410dd7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=la464" } */
+/* { dg-final { scan-assembler-not "div\.\[dw\]" } } */
+
+int
+test (int a)
+{
+  return a % 17;
+}
-- 
2.44.0



TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)

2024-03-25 Thread Xi Ruoyao
On Mon, 2024-03-18 at 20:54 -0600, Jeff Law wrote:
> > +/* Costs to use when optimizing for xiangshan nanhu.  */
> > +static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* fp_add */
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* fp_mul */
> > +  {COSTS_N_INSNS (10), COSTS_N_INSNS (20)},/* fp_div */
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* int_mul */
> > +  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},  /* int_div */
> > +  6,   /* issue_rate */
> > +  3,   /* branch_cost */
> > +  3,   /* memory_cost */
> > +  3,   /* fmv_cost */
> > +  true,/* slow_unaligned_access */
> > +  false,   /* use_divmod_expansion */
> > +  RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH,  /* fusible_ops */
> > +  NULL,/* vector cost */

> Is your integer division really that fast?  The table above essentially 
> says that your cpu can do integer division in 6 cycles.

Hmm, I've just seen that I coded some even smaller values for LoongArch
CPUs, so forgive me for "hijacking" this thread...

The problem seems to be that integer division may take a different number
of cycles for different inputs: on LoongArch LA664 I've observed 5 cycles
for some inputs and 39 cycles for other inputs.

So should we use the minimum value, the maximum value, or something in
between for TARGET_RTX_COSTS and pipeline descriptions?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] MIPS: Add MIN/MAX.fmt instructions support for MIPS R6

2024-03-21 Thread Xi Ruoyao
On Thu, 2024-03-21 at 10:14 +0800, Jie Mei wrote:
> diff --git a/gcc/testsuite/gcc.target/mips/mips-minmax.c b/gcc/testsuite/gcc.target/mips/mips-minmax.c
> new file mode 100644
> index 000..2d234ac4b1d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/mips/mips-minmax.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mhard-float -ffinite-math-only -march=mips32r6" } */

You may want to add fmin<mode>3 and fmax<mode>3 in addition to
smin<mode>3 and smax<mode>3 so it will work without -ffinite-math-only.

‘fminM3’, ‘fmaxM3’
 IEEE-conformant minimum and maximum operations.  If one operand is
 a quiet ‘NaN’, then the other operand is returned.  If both
 operands are quiet ‘NaN’, then a quiet ‘NaN’ is returned.  In the
 case when gcc supports signaling ‘NaN’ (-fsignaling-nans) an
 invalid floating point exception is raised and a quiet ‘NaN’ is
 returned.

And the MIPS 6.06 manual says:

Numbers are preferred to NaNs: if one input is a NaN, but not both, the
value of the numeric input is returned. If both are NaNs, the NaN in fs
is returned.

for MAX.fmt and MIN.fmt, so they match fmin<mode>3 and fmax<mode>3.

> +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Pushed: [PATCH] LoongArch: Fix a typo [PR 114407]

2024-03-20 Thread Xi Ruoyao
gcc/ChangeLog:

PR target/114407
* config/loongarch/loongarch-opts.cc (loongarch_config_target):
Fix typo in diagnostic message, enabing -> enabling.
---

Pushed r14-9582 as obvious.

 gcc/config/loongarch/loongarch-opts.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc
index 7eeac43ed2f..627f9148adf 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -362,7 +362,7 @@ config_target_isa:
  gcc_assert (constrained.simd);
 
  inform (UNKNOWN_LOCATION,
- "enabing %qs promotes %<%s%s%> to %<%s%s%>",
+ "enabling %qs promotes %<%s%s%> to %<%s%s%>",
  loongarch_isa_ext_strings[t.isa.simd],
  OPTSTR_ISA_EXT_FPU, loongarch_isa_ext_strings[t.isa.fpu],
  OPTSTR_ISA_EXT_FPU, loongarch_isa_ext_strings[ISA_EXT_FPU64]);
-- 
2.44.0



[PATCH] mips: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-20 Thread Xi Ruoyao
We were assuming TYPE_NO_NAMED_ARGS_STDARG_P functions don't have any
named arguments and there is nothing to advance, but that is not the case
for (...) functions returning by hidden reference, which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-{6,8,9}.c to
fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.

gcc/ChangeLog:

PR target/114175
* config/mips/mips.cc (mips_setup_incoming_varargs): Only skip
mips_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
functions if arg.type is NULL.
---

Bootstrapped and regtested on mips64el-linux-gnuabi64.  Ok for trunk?

 gcc/config/mips/mips.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 68e2ae8d8fa..ce764a5cb35 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -6834,7 +6834,13 @@ mips_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+
+  /* For a C23 variadic function w/o any named argument, and w/o an
+ artifical argument for large return value, skip advancing args.
+ There is such an artifical argument iff. arg.type is non-NULL
+ (PR 114175).  */
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 mips_function_arg_advance (pack_cumulative_args (&local_cum), arg);
 
   /* Found out how many registers we need to save.  */
-- 
2.44.0



Pushed: [PATCH v2] LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-19 Thread Xi Ruoyao
On Tue, 2024-03-19 at 11:19 +0800, chenglulu wrote:
> 
> 在 2024/3/18 下午5:34, Xi Ruoyao 写道:
> > We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
> > arguments and there is nothing to advance, but that is not the case
> > for (...) functions returning by hidden reference which have one
> > such
> > artificial argument.  This is causing gcc.dg/c23-stdarg-6.c and
> > gcc.dg/c23-stdarg-8.c to fail.
> > 
> > Fix the issue by checking if arg.type is NULL, as r14-9503 explains.
> > 
> > gcc/ChangeLog:
> > 
> > PR target/114175
> > * config/loongarch/loongarch.cc
> > (loongarch_setup_incoming_varargs): Only skip
> > loongarch_function_arg_advance for
> > TYPE_NO_NAMED_ARGS_STDARG_P
> > functions if arg.type is NULL.
> > ---
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> > 
> >   gcc/config/loongarch/loongarch.cc | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/gcc/config/loongarch/loongarch.cc
> > b/gcc/config/loongarch/loongarch.cc
> > index 70e31bb831c..57de8ef7d20 100644
> > --- a/gcc/config/loongarch/loongarch.cc
> > +++ b/gcc/config/loongarch/loongarch.cc
> > @@ -767,7 +767,8 @@ loongarch_setup_incoming_varargs
> > (cumulative_args_t cum,
> >    argument.  Advance a local copy of CUM past the last "real"
> > named
> >    argument, to find out how many registers are left over.  */
> >     local_cum = *get_cumulative_args (cum);
> I think it's important to add annotation information here:
>  /* where there is no hidden return argument passed, arg.type
> 
>   is always NULL.  */
> 
> Others LTGM.
> 
> Thanks!

Pushed v2 with a comment added as attached.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
From c1fd4589c2bf9fd8409d51b94df219cb75107762 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Mon, 18 Mar 2024 17:18:34 +0800
Subject: [PATCH v2] LoongArch: Fix C23 (...) functions returning large
 aggregates [PR114175]

We were assuming TYPE_NO_NAMED_ARGS_STDARG_P functions don't have any
named arguments and there is nothing to advance, but that is not the case
for (...) functions returning by hidden reference, which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-6.c and
gcc.dg/c23-stdarg-8.c to fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.

gcc/ChangeLog:

	PR target/114175
	* config/loongarch/loongarch.cc
	(loongarch_setup_incoming_varargs): Only skip
	loongarch_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
	functions if arg.type is NULL.
---
 gcc/config/loongarch/loongarch.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 70e31bb831c..5344f2a6987 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -767,7 +767,13 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+
+  /* For a C23 variadic function w/o any named argument, and w/o an
+ artifical argument for large return value, skip advancing args.
+ There is such an artifical argument iff. arg.type is non-NULL
+ (PR 114175).  */
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 loongarch_function_arg_advance (pack_cumulative_args (&local_cum), arg);
 
   /* Found out how many registers we need to save.  */
-- 
2.44.0



[PATCH] LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-18 Thread Xi Ruoyao
We were assuming TYPE_NO_NAMED_ARGS_STDARG_P functions don't have any
named arguments and there is nothing to advance, but that is not the case
for (...) functions returning by hidden reference, which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-6.c and
gcc.dg/c23-stdarg-8.c to fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.

gcc/ChangeLog:

PR target/114175
* config/loongarch/loongarch.cc
(loongarch_setup_incoming_varargs): Only skip
loongarch_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
functions if arg.type is NULL.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 70e31bb831c..57de8ef7d20 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -767,7 +767,8 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 loongarch_function_arg_advance (pack_cumulative_args (&local_cum), arg);
 
   /* Found out how many registers we need to save.  */
-- 
2.44.0



[PATCH] LoongArch: Remove unused and incorrect "sge_" define_insn

2024-03-13 Thread Xi Ruoyao
If this insn were really used, we'd have something like

    slti $r4,$r0,$r5

in the code.  The assembler would reject it because slti wants 2
register operands and 1 immediate operand.  But we've not got any bug
report for this, indicating this define_insn is not used at all.

Note that do_store_flag (in expr.cc) is already converting x >= 1 to
x > 0 unconditionally, so this define_insn is indeed unused and we can
just remove it.

gcc/ChangeLog:

* config/loongarch/loongarch.md (any_ge): Remove.
(sge_): Remove.
---

Not fully tested but should be obvious.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 10 --
 1 file changed, 10 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 525e1e82183..18fd9c1e7d5 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -517,7 +517,6 @@ (define_code_iterator equality_op [eq ne])
 ;; These code iterators allow the signed and unsigned scc operations to use
 ;; the same template.
 (define_code_iterator any_gt [gt gtu])
-(define_code_iterator any_ge [ge geu])
 (define_code_iterator any_lt [lt ltu])
 (define_code_iterator any_le [le leu])
 
@@ -3355,15 +3354,6 @@ (define_insn "*sgt_"
   [(set_attr "type" "slt")
(set_attr "mode" "")])
 
-(define_insn "*sge_"
-  [(set (match_operand:GPR 0 "register_operand" "=r")
-   (any_ge:GPR (match_operand:X 1 "register_operand" "r")
-(const_int 1)))]
-  ""
-  "slti\t%0,%.,%1"
-  [(set_attr "type" "slt")
-   (set_attr "mode" "")])
-
 (define_insn "*slt_"
   [(set (match_operand:GPR 0 "register_operand" "=r")
(any_lt:GPR (match_operand:X 1 "register_operand" "r")
-- 
2.44.0



Re: [PATCH v1] LoongArch: Remove masking process for operand 3 of xvpermi.q.

2024-03-13 Thread Xi Ruoyao
On Tue, 2024-03-12 at 09:56 +0800, Chenghui Pan wrote:
> The behavior of non-zero unused bits in xvpermi.q instruction's
> third operand is undefined on LoongArch, according to our
> discussion (https://github.com/llvm/llvm-project/pull/83540),
> we think that keeping original insn operand as unmodified
> state is better solution.
> 
> This patch partially reverts 7b158e036a95b1ab40793dd53bed7dbd770ffdaf.
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/lasx.md: Remove masking of operand 3.

Add (lasx_xvpermi_q_) before ":".

> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c:
>     Reposition operand 3's value into instruction's defined accept range.
^^

Remove these two white spaces.

Should be OK with these ChangeLog style issues fixed.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] testsuite: Fix vfprintf-chk-1.c with -fhardened

2024-03-13 Thread Xi Ruoyao
On Tue, 2024-03-12 at 17:19 +0100, Jakub Jelinek wrote:
> On Thu, Feb 15, 2024 at 10:53:08PM +, Sam James wrote:
> > With _FORTIFY_SOURCE >= 2 (enabled by -fhardened), vfprintf-chk-1.c's
> > __vfprintf_chk ends up calling __vprintf_chk rather than vprintf.

Do we really want to support adding random CFLAGS when running the test
suite?  AFAIK adding random CFLAGS will just cause test failures here and
there.  We are adjusting the test suite for -fPIE -pie and
-fstack-protector-strong, but that is because they can be implicitly
enabled with --enable-default-* options, and we don't have
--enable-default-hardened as of now.

If we need to bootstrap a hardened GCC and test it, pass -fhardened as
"info gccinstall" suggests:

make BOOT_CFLAGS="-O2 -g -fhardened"

instead of

env C{,XX}FLAGS="-O2 -g -fhardened" /path/to/gcc/configure ...

which will taint the test suite with -fhardened.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-13 Thread Xi Ruoyao
On Wed, 2024-03-13 at 10:24 +0800, Xi Ruoyao wrote:
>    return TARGET_EXPLICIT_RELOCS
> -    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> -  \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\
> -  \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\
> -  \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\
> -  \tadd.d\t$r4,$r4,%2\n\
> -  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> -  \tjirl\t$r1,$r1,%%desc_call(%1)"
> -    : "la.tls.desc\t%0,%2,%1";
> +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%0)\n\t"
> +  "addi.d\t%1,$r0,%%desc_pc_lo12(%0)\n\t"
> +  "lu32i.d\t%1,%%desc64_pc_lo20(%0)\n\t"
> +  "lu52i.d\t%1,%2,%%desc64_pc_hi12(%0)\n\t"

Oops, the "%2" in the above line should be "%1".

> +  "add.d\t$r4,$r4,%1\n\t"
> +  "ld.d\t$r1,$r4,%%desc_ld(%0)\n\t"
> +  "jirl\t$r1,$r1,%%desc_call(%0)"
> +    : "la.tls.desc\t$r4,%1,%0";

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-13 Thread Xi Ruoyao
On Wed, 2024-03-13 at 11:06 +0800, mengqinggang wrote:
> 
> On 2024/3/13 at 6:15 AM, Xi Ruoyao wrote:
> > On Tue, 2024-03-12 at 17:20 +0800, mengqinggang wrote:
> > > +(define_insn "@got_load_tls_desc"
> > > +  [(set (match_operand:P 0 "register_operand" "=r")
> > > + (unspec:P
> > > +     [(match_operand:P 1 "symbolic_operand" "")]
> > > +     UNSPEC_TLS_DESC))
> > > +    (clobber (reg:SI FCC0_REGNUM))
> > > +    (clobber (reg:SI FCC1_REGNUM))
> > > +    (clobber (reg:SI FCC2_REGNUM))
> > > +    (clobber (reg:SI FCC3_REGNUM))
> > > +    (clobber (reg:SI FCC4_REGNUM))
> > > +    (clobber (reg:SI FCC5_REGNUM))
> > > +    (clobber (reg:SI FCC6_REGNUM))
> > > +    (clobber (reg:SI FCC7_REGNUM))
> > > +    (clobber (reg:SI RETURN_ADDR_REGNUM))]
> > > +  "TARGET_TLS_DESC"
> > > +{
> > > +  return TARGET_EXPLICIT_RELOCS
> > > +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> > > +  \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\
> > > +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> > > +  \tjirl\t$r1,$r1,%%desc_call(%1)"
> > Use something like
> > 
> >  ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\t"
> >    "addi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\t"
> >    "ld.d\t$r1,$r4,%%desc_ld(%1)\n\t"
> >    "jirl\t$r1,$r1,%%desc_call(%1)"
> >  : "la.tls.desc\t%0,%1";
> > 
> > to prevent additional white spaces in the output asm before tabs.
> > 
> > > +    : "la.tls.desc\t%0,%1";
> > > +}
> > > +  [(set_attr "got" "load")
> > > +   (set_attr "mode" "")
> > > +   (set_attr "length" "16")])
> > > +
> > > +(define_insn "got_load_tls_desc_off64"
> > > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > > + (unspec:DI
> > > +     [(match_operand:DI 1 "symbolic_operand" "")]
> > > +     UNSPEC_TLS_DESC_OFF64))
> > > +    (clobber (reg:SI FCC0_REGNUM))
> > > +    (clobber (reg:SI FCC1_REGNUM))
> > > +    (clobber (reg:SI FCC2_REGNUM))
> > > +    (clobber (reg:SI FCC3_REGNUM))
> > > +    (clobber (reg:SI FCC4_REGNUM))
> > > +    (clobber (reg:SI FCC5_REGNUM))
> > > +    (clobber (reg:SI FCC6_REGNUM))
> > > +    (clobber (reg:SI FCC7_REGNUM))
> > > +    (clobber (reg:SI RETURN_ADDR_REGNUM))
> > > +    (clobber (match_operand:DI 2 "register_operand" "="))]
> > > +  "TARGET_TLS_DESC && TARGET_CMODEL_EXTREME"
> > > +{
> > > +  return TARGET_EXPLICIT_RELOCS
> > > +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> > > +  \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\
> > > +  \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\
> > > +  \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\
> > > +  \tadd.d\t$r4,$r4,%2\n\
> > > +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> > > +  \tjirl\t$r1,$r1,%%desc_call(%1)"
> > > +    : "la.tls.desc\t%0,%2,%1";
> > Likewise.
> > 
> > > +}
> > > +  [(set_attr "got" "load")
> > > +   (set_attr "length" "28")])
> > Otherwise OK.
> > 
> > It's better to allow splitting these two instructions but we can do it
> > in another patch.  And IMO it's better to enable TLS desc by default if
> > supported by both the assembler and the libc, but we'll have to defer it
> > until Glibc 2.40 release.
> 
> 
> Do we need to wait until LLVM also supports TLS DESC  before setting it 
> as default?

Hmm, maybe...  I remember that when we added R_LARCH_ALIGN, lld was
broken for a while.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-12 Thread Xi Ruoyao
On Wed, 2024-03-13 at 06:56 +0800, Xi Ruoyao wrote:
> On Wed, 2024-03-13 at 06:15 +0800, Xi Ruoyao wrote:
> > > +(define_insn "@got_load_tls_desc"
> > > +  [(set (match_operand:P 0 "register_operand" "=r")
> 
> Hmm, and it looks like we should use (reg:P 4) instead of match_operand
> here, because the instruction does not work for a different register:
> with TARGET_EXPLICIT_RELOCS we are hard coding r4, and without
> TARGET_EXPLICIT_RELOCS the TLS desc function still only puts the return
> value in r4.

Suggested changes:

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 303666bf6d5..8f4d3f36c26 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2954,10 +2954,10 @@ loongarch_legitimize_tls_address (rtx loc)
  tp = gen_rtx_REG (Pmode, THREAD_POINTER_REGNUM);
 
  if (TARGET_CMODEL_EXTREME)
-   emit_insn (gen_got_load_tls_desc_off64 (a0, loc,
+   emit_insn (gen_got_load_tls_desc_off64 (loc,
gen_reg_rtx (DImode)));
  else
-   emit_insn (gen_got_load_tls_desc (Pmode, a0, loc));
+   emit_insn (gen_got_load_tls_desc (Pmode, loc));
 
  emit_insn (gen_add3_insn (dest, a0, tp));
}
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 0a1a6a24f61..8e8f1012344 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2772,9 +2772,9 @@ (define_insn "store_word"
 ;; Thread-Local Storage
 
 (define_insn "@got_load_tls_desc"
-  [(set (match_operand:P 0 "register_operand" "=r")
+  [(set (reg:P 4)
(unspec:P
-   [(match_operand:P 1 "symbolic_operand" "")]
+   [(match_operand:P 0 "symbolic_operand" "")]
UNSPEC_TLS_DESC))
 (clobber (reg:SI FCC0_REGNUM))
 (clobber (reg:SI FCC1_REGNUM))
@@ -2788,20 +2788,20 @@ (define_insn "@got_load_tls_desc"
   "TARGET_TLS_DESC"
 {
   return TARGET_EXPLICIT_RELOCS
-? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
-  \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\
-  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
-  \tjirl\t$r1,$r1,%%desc_call(%1)"
-: "la.tls.desc\t%0,%1";
+? "pcalau12i\t$r4,%%desc_pc_hi20(%0)\n\t"
+  "addi.d\t$r4,$r4,%%desc_pc_lo12(%0)\n\t"
+  "ld.d\t$r1,$r4,%%desc_ld(%0)\n\t"
+  "jirl\t$r1,$r1,%%desc_call(%0)"
+: "la.tls.desc\t$r4,%0";
 }
   [(set_attr "got" "load")
(set_attr "mode" "")
(set_attr "length" "16")])
 
 (define_insn "got_load_tls_desc_off64"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (reg:DI 4)
(unspec:DI
-   [(match_operand:DI 1 "symbolic_operand" "")]
+   [(match_operand:DI 0 "symbolic_operand" "")]
UNSPEC_TLS_DESC_OFF64))
 (clobber (reg:SI FCC0_REGNUM))
 (clobber (reg:SI FCC1_REGNUM))
@@ -2812,18 +2812,18 @@ (define_insn "got_load_tls_desc_off64"
 (clobber (reg:SI FCC6_REGNUM))
 (clobber (reg:SI FCC7_REGNUM))
 (clobber (reg:SI RETURN_ADDR_REGNUM))
-(clobber (match_operand:DI 2 "register_operand" "="))]
+(clobber (match_operand:DI 1 "register_operand" "="))]
   "TARGET_TLS_DESC && TARGET_CMODEL_EXTREME"
 {
   return TARGET_EXPLICIT_RELOCS
-? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
-  \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\
-  \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\
-  \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\
-  \tadd.d\t$r4,$r4,%2\n\
-  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
-  \tjirl\t$r1,$r1,%%desc_call(%1)"
-: "la.tls.desc\t%0,%2,%1";
+? "pcalau12i\t$r4,%%desc_pc_hi20(%0)\n\t"
+  "addi.d\t%1,$r0,%%desc_pc_lo12(%0)\n\t"
+  "lu32i.d\t%1,%%desc64_pc_lo20(%0)\n\t"
+  "lu52i.d\t%1,%2,%%desc64_pc_hi12(%0)\n\t"
+  "add.d\t$r4,$r4,%1\n\t"
+  "ld.d\t$r1,$r4,%%desc_ld(%0)\n\t"
+  "jirl\t$r1,$r1,%%desc_call(%0)"
+: "la.tls.desc\t$r4,%1,%0";
 }
   [(set_attr "got" "load")
(set_attr "length" "28")])

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-12 Thread Xi Ruoyao
On Wed, 2024-03-13 at 06:15 +0800, Xi Ruoyao wrote:
> > +(define_insn "@got_load_tls_desc"
> > +  [(set (match_operand:P 0 "register_operand" "=r")

Hmm, and it looks like we should use (reg:P 4) instead of match_operand
here, because the instruction does not work for a different register:
with TARGET_EXPLICIT_RELOCS we are hard coding r4, and without
TARGET_EXPLICIT_RELOCS the TLS desc function still only puts the return
value in r4.

> > +   (unspec:P
> > +       [(match_operand:P 1 "symbolic_operand" "")]
> > +       UNSPEC_TLS_DESC))
> > +    (clobber (reg:SI FCC0_REGNUM))
> > +    (clobber (reg:SI FCC1_REGNUM))
> > +    (clobber (reg:SI FCC2_REGNUM))
> > +    (clobber (reg:SI FCC3_REGNUM))
> > +    (clobber (reg:SI FCC4_REGNUM))
> > +    (clobber (reg:SI FCC5_REGNUM))
> > +    (clobber (reg:SI FCC6_REGNUM))
> > +    (clobber (reg:SI FCC7_REGNUM))
> > +    (clobber (reg:SI RETURN_ADDR_REGNUM))]
> > +  "TARGET_TLS_DESC"
> > +{
> > +  return TARGET_EXPLICIT_RELOCS
> > +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> > +  \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\
> > +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> > +  \tjirl\t$r1,$r1,%%desc_call(%1)"
> 
> Use something like
> 
>     ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\t"
>   "addi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\t"
>   "ld.d\t$r1,$r4,%%desc_ld(%1)\n\t"
>   "jirl\t$r1,$r1,%%desc_call(%1)"
>     : "la.tls.desc\t%0,%1";
> 
> to prevent additional white spaces in the output asm before tabs.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-12 Thread Xi Ruoyao
On Tue, 2024-03-12 at 17:20 +0800, mengqinggang wrote:
> +(define_insn "@got_load_tls_desc"
> +  [(set (match_operand:P 0 "register_operand" "=r")
> + (unspec:P
> +     [(match_operand:P 1 "symbolic_operand" "")]
> +     UNSPEC_TLS_DESC))
> +    (clobber (reg:SI FCC0_REGNUM))
> +    (clobber (reg:SI FCC1_REGNUM))
> +    (clobber (reg:SI FCC2_REGNUM))
> +    (clobber (reg:SI FCC3_REGNUM))
> +    (clobber (reg:SI FCC4_REGNUM))
> +    (clobber (reg:SI FCC5_REGNUM))
> +    (clobber (reg:SI FCC6_REGNUM))
> +    (clobber (reg:SI FCC7_REGNUM))
> +    (clobber (reg:SI RETURN_ADDR_REGNUM))]
> +  "TARGET_TLS_DESC"
> +{
> +  return TARGET_EXPLICIT_RELOCS
> +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> +  \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\
> +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> +  \tjirl\t$r1,$r1,%%desc_call(%1)"

Use something like

? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\t"
  "addi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\t"
  "ld.d\t$r1,$r4,%%desc_ld(%1)\n\t"
  "jirl\t$r1,$r1,%%desc_call(%1)"
: "la.tls.desc\t%0,%1";

to prevent additional white spaces in the output asm before tabs.
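
The difference between the two spellings can be checked in plain C.  This
is only a sketch: "sym" stands in for the real relocation operands, and the
variable and function names are invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Backslash-newline inside one literal: the two spaces that indent the
   continuation line end up in the string, right before the tab.  */
static const char continued[] = "pcalau12i\t$r4,sym\n\
  \taddi.d\t$r4,$r4,sym";

/* Adjacent literals: the indentation sits between the literals, so it is
   not part of the string.  */
static const char concatenated[] = "pcalau12i\t$r4,sym\n\t"
  "addi.d\t$r4,$r4,sym";

/* Verify where the stray spaces appear.  */
void
check_asm_template_whitespace (void)
{
  assert (strstr (continued, "\n  \t") != NULL);
  assert (strstr (concatenated, "\n  \t") == NULL);
  assert (strlen (continued) == strlen (concatenated) + 2);
}
```

The continued form is exactly two characters longer, which are the spaces
that would show up before the tab in the output asm.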

> +    : "la.tls.desc\t%0,%1";
> +}
> +  [(set_attr "got" "load")
> +   (set_attr "mode" "")
> +   (set_attr "length" "16")])
> +
> +(define_insn "got_load_tls_desc_off64"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (unspec:DI
> +     [(match_operand:DI 1 "symbolic_operand" "")]
> +     UNSPEC_TLS_DESC_OFF64))
> +    (clobber (reg:SI FCC0_REGNUM))
> +    (clobber (reg:SI FCC1_REGNUM))
> +    (clobber (reg:SI FCC2_REGNUM))
> +    (clobber (reg:SI FCC3_REGNUM))
> +    (clobber (reg:SI FCC4_REGNUM))
> +    (clobber (reg:SI FCC5_REGNUM))
> +    (clobber (reg:SI FCC6_REGNUM))
> +    (clobber (reg:SI FCC7_REGNUM))
> +    (clobber (reg:SI RETURN_ADDR_REGNUM))
> +    (clobber (match_operand:DI 2 "register_operand" "="))]
> +  "TARGET_TLS_DESC && TARGET_CMODEL_EXTREME"
> +{
> +  return TARGET_EXPLICIT_RELOCS
> +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> +  \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\
> +  \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\
> +  \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\
> +  \tadd.d\t$r4,$r4,%2\n\
> +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> +  \tjirl\t$r1,$r1,%%desc_call(%1)"
> +    : "la.tls.desc\t%0,%2,%1";

Likewise.

> +}
> +  [(set_attr "got" "load")
> +   (set_attr "length" "28")])

Otherwise OK.

It's better to allow splitting these two instructions but we can do it
in another patch.  And IMO it's better to enable TLS desc by default if
supported by both the assembler and the libc, but we'll have to defer it
until Glibc 2.40 release.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-07 Thread Xi Ruoyao
On Thu, 2024-03-07 at 21:07 +0800, chenglulu wrote:
> 
> On 2024/3/7 at 8:52 PM, Xi Ruoyao wrote:
> > It should be better to extend the expected value before the ll/sc loop
> > (like what LLVM does), instead of repeating the extending in each
> > iteration.  Something like:
> 
> I wanted to do this at first, but it didn't work out.
> 
> But then I thought about it, and there are two benefits to putting it in
> the middle of ll/sc:
> 
> 1. If there is an operation that uses the $r4 register after this atomic
> operation, another register is required to store $r4.
> 
> 2. ll.w requires long cycles, so putting an addi.w command after ll.w
> won't make a difference.
> 
> So based on the above, I didn't try again, but directly made the
> modification as in the patch.

Ah, the explanation makes sense to me.  Ok with the original patch then.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-07 Thread Xi Ruoyao
On Thu, 2024-03-07 at 09:12 +0800, Lulu Cheng wrote:

> +  output_asm_insn ("1:", operands);
> +  output_asm_insn ("ll.\t%0,%1", operands);
> +
> +  /* Like the test case atomic-cas-int.C, in loongarch64, O1 and higher, the
> + return value of the val_without_const_folding will not be truncated and
> + will be passed directly to the function compare_exchange_strong.
> + However, the instruction 'bne' does not distinguish between 32-bit and
> + 64-bit operations.  so if the upper 32 bits of the register are not
> + extended by the 32nd bit symbol, then the comparison may not be valid
> + here.  This will affect the result of the operation.  */
> +
> +  if (TARGET_64BIT && REG_P (operands[2])
> +  && GET_MODE (operands[2]) == SImode)
> +    {
> +  output_asm_insn ("addi.w\t%5,%2,0", operands);
> +  output_asm_insn ("bne\t%0,%5,2f", operands);

It should be better to extend the expected value before the ll/sc loop
(like what LLVM does), instead of repeating the extending in each
iteration.  Something like:

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 8f35a5b48d2..c21781947fd 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -234,11 +234,11 @@ (define_insn "atomic_exchange_short"
   "amswap%A3.\t%0,%z2,%1"
   [(set (attr "length") (const_int 4))])
 
-(define_insn "atomic_cas_value_strong"
+(define_insn "atomic_cas_value_strong"
   [(set (match_operand:GPR 0 "register_operand" "=")
(match_operand:GPR 1 "memory_operand" "+ZC"))
(set (match_dup 1)
-   (unspec_volatile:GPR [(match_operand:GPR 2 "reg_or_0_operand" "rJ")
+   (unspec_volatile:GPR [(match_operand:X 2 "reg_or_0_operand" "rJ")
  (match_operand:GPR 3 "reg_or_0_operand" "rJ")
  (match_operand:SI 4 "const_int_operand")]  ;; 
mod_s
 UNSPEC_COMPARE_AND_SWAP))
@@ -246,10 +246,10 @@ (define_insn "atomic_cas_value_strong"
   ""
 {
   return "1:\\n\\t"
-"ll.\\t%0,%1\\n\\t"
+"ll.\\t%0,%1\\n\\t"
 "bne\\t%0,%z2,2f\\n\\t"
 "or%i3\\t%5,$zero,%3\\n\\t"
-"sc.\\t%5,%1\\n\\t"
+"sc.\\t%5,%1\\n\\t"
 "beqz\\t%5,1b\\n\\t"
 "b\\t3f\\n\\t"
 "2:\\n\\t"
@@ -301,9 +301,23 @@ (define_expand "atomic_compare_and_swap"
 operands[3], 
operands[4],
 operands[6]));
   else
-emit_insn (gen_atomic_cas_value_strong (operands[1], operands[2],
- operands[3], operands[4],
- operands[6]));
+{
+  rtx (*cas)(rtx, rtx, rtx, rtx, rtx) =
+   TARGET_64BIT ? gen_atomic_cas_value_strongdi
+: gen_atomic_cas_value_strongsi;
+  rtx expect = operands[3];
+
+  if (mode == SImode
+ && TARGET_64BIT
+ && operands[3] != const0_rtx)
+   {
+ expect = gen_reg_rtx (DImode);
+ emit_insn (gen_extendsidi2 (expect, operands[3]));
+   }
+
+  emit_insn (cas (operands[1], operands[2], expect, operands[4],
+ operands[6]));
+}
 
   rtx compare = operands[1];
   if (operands[3] != const0_rtx)

It produces:

slli.w  $r4,$r4,0
1:
ll.w$r14,$r3,0
bne $r14,$r4,2f
or  $r15,$zero,$r12
sc.w$r15,$r3,0
beqz$r15,1b
b   3f
2:
dbar0b10100
3:

for the test case and the compiled test case runs successfully.  I've
not done a full bootstrap yet though.
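
To illustrate why the expected value has to be sign-extended at all, here
is a small C model of the register contents.  It is a sketch under stated
assumptions: the model_* and regs_equal helpers are hypothetical names, and
they only mimic the documented effect of ll.w (sign-extending load),
slli.w with a zero shift (sign-extend the low 32 bits) and bne (full
64-bit register compare).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of "ll.w": load 32 bits and sign-extend them into a 64-bit
   register.  */
static int64_t
model_ll_w (const int32_t *mem)
{
  return (int64_t) *mem;
}

/* Model of "slli.w rd,rj,0": sign-extend the low 32 bits of a register.  */
static int64_t
model_sext_w (uint64_t reg)
{
  uint32_t low = (uint32_t) reg;
  return (low & 0x80000000u) ? (int64_t) low - ((int64_t) 1 << 32)
                             : (int64_t) low;
}

/* Model of the "bne" test: it compares the full 64-bit registers.  */
static bool
regs_equal (int64_t a, int64_t b)
{
  return a == b;
}

/* Memory holds -1; the expected value -1 arrives with the upper 32 bits
   clear.  Without the extension the compare fails spuriously.  */
void
check_cas_expected_extension (void)
{
  int32_t mem = -1;
  uint64_t expected = UINT64_C (0x00000000ffffffff);

  assert (!regs_equal (model_ll_w (&mem), (int64_t) expected));
  assert (regs_equal (model_ll_w (&mem), model_sext_w (expected)));
}
```

In this model the extension is applied once to the expected value, which
corresponds to doing it before the loop rather than in each iteration.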

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation

2024-03-06 Thread Xi Ruoyao
On Thu, 2024-03-07 at 10:43 +0800, mengqinggang wrote:
> Hi,
> 
> Should we add an option to control the generation of R_LARCH_RELAX,
> similar to the -mrelax/-mno-relax options of as?

There are already -mrelax and -mno-relax; they can be checked in the
compiler code with TARGET_LINKER_RELAXATION.

/* snip */

> > +    case 'Q':
> > +  if (!TARGET_LINKER_RELAXATION)
> > +break;

So with -mno-relax we'll break early here, then no R_LARCH_RELAX will be
printed.

> > +  if (code == HIGH)
> > +op = XEXP (op, 0);
> > +
> > +  if (loongarch_classify_symbolic_expression (op) == SYMBOL_TLS_IE)
> > +fprintf (file, ".reloc\t.,R_LARCH_RELAX\n\t");
> > +
> > +  break;

The tls-ie-norelax.c test case also checks for -mno-relax:

> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mcmodel=normal -mexplicit-relocs -mno-relax" } */
> > +/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } 
> > } */

i.e. -mno-relax is used compiling this test case, and the compiled
assembly code should not contain R_LARCH_RELAX.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] LoongArch: testsuite: Rewrite {x, }vfcmp-{d, f}.c to avoid named registers

2024-03-05 Thread Xi Ruoyao
Loops on named vector registers are not vectorized (see comment 11 of
PR113622), so these test cases have been failing for a while.
Rewrite them using check-function-bodies to avoid hard-coding register
names.  A barrier is needed to always load the first operand before the
second operand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vfcmp-f.c: Rewrite to avoid named
registers.
* gcc.target/loongarch/vfcmp-d.c: Likewise.
* gcc.target/loongarch/xvfcmp-f.c: Likewise.
* gcc.target/loongarch/xvfcmp-d.c: Likewise.
---

Tested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/testsuite/gcc.target/loongarch/vfcmp-d.c  | 202 --
 gcc/testsuite/gcc.target/loongarch/vfcmp-f.c  | 347 ++
 gcc/testsuite/gcc.target/loongarch/xvfcmp-d.c | 202 --
 gcc/testsuite/gcc.target/loongarch/xvfcmp-f.c | 204 --
 4 files changed, 816 insertions(+), 139 deletions(-)

diff --git a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c 
b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
index 8b870ef38a0..87e4ed19e96 100644
--- a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
+++ b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
@@ -1,28 +1,188 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mlsx -ffixed-f0 -ffixed-f1 -ffixed-f2 
-fno-vect-cost-model" } */
+/* { dg-options "-O2 -mlsx -fno-vect-cost-model" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #define F double
 #define I long long
 
 #include "vfcmp-f.c"
 
-/* { dg-final { scan-assembler 
"compare_quiet_equal:.*\tvfcmp\\.ceq\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_equal:.*\tvfcmp\\.cune\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_not_greater:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_not_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_not_less:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_not_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_less:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_not_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_greater:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_unordered:.*\tvfcmp\\.cun\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_ordered:.*\tvfcmp\\.cor\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_ordered\n"
 } } */
+/*
+** compare_quiet_equal:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.ceq.d (\$vr[0-9]+),(\1,\2|\2,\1)
+** vst \3,\$r6,0
+** jr  \$r1
+*/
+
+/*
+** compare_quiet_not_equal:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.cune.d(\$vr[0-9]+),(\1,\2|\2,\1)
+** vst \3,\$r6,0
+** jr  \$r1
+*/
+
+/*
+** compare_signaling_greater:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.slt.d (\$vr[0-9]+),\2,\1
+** vst \3,\$r6,0
+** 

[PATCH v2] LoongArch: Allow s9 as a register alias

2024-03-05 Thread Xi Ruoyao
The psABI allows using s9 as an alias of r22.

gcc/ChangeLog:

* config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add
s9 as an alias of r22.
---

v1 -> v2: Add a test case.

Ok for trunk?
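
As a rough picture of what the alias entry does, the sketch below models a
name-to-number table in plain C.  Only the names "fp", "s9", "s0" and the
numbers 22 and 23 come from the patch; the struct, the table name and the
lookup function are hypothetical, and GP_REG_FIRST is assumed to be 0.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature of an alias table.  */
struct reg_alias
{
  const char *name;
  int regno;
};

static const struct reg_alias aliases[] = {
  { "fp", 22 },
  { "s9", 22 },   /* second name for the same register */
  { "s0", 23 },
};

/* Return the register number for NAME, or -1 if it is not in the table.  */
int
lookup_reg_alias (const char *name)
{
  for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
    if (strcmp (aliases[i].name, name) == 0)
      return aliases[i].regno;
  return -1;
}

/* Both names must resolve to the same register.  */
void
check_s9_alias (void)
{
  assert (lookup_reg_alias ("s9") == lookup_reg_alias ("fp"));
}
```

Because both names map to the same number, declaring global register
variables under both names refers to one register, which is what the new
test case expects the compiler to diagnose.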

 gcc/config/loongarch/loongarch.h   | 1 +
 gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c | 3 +++
 2 files changed, 4 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 8b453ab3140..bf2351f0968 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -931,6 +931,7 @@ typedef struct {
   { "t8",  20 + GP_REG_FIRST },\
   { "x",   21 + GP_REG_FIRST },\
   { "fp",  22 + GP_REG_FIRST },\
+  { "s9",  22 + GP_REG_FIRST },\
   { "s0",  23 + GP_REG_FIRST },\
   { "s1",  24 + GP_REG_FIRST },\
   { "s2",  25 + GP_REG_FIRST },\
diff --git a/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c 
b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
new file mode 100644
index 000..d2e3b80f83c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
@@ -0,0 +1,3 @@
+/* { dg-do compile } */
+register long s9 asm("s9"); /* { dg-note "conflicts with 's9'" } */
+register long fp asm("fp"); /* { dg-warning "register of 'fp' used for 
multiple global register variables" } */
-- 
2.44.0



[PATCH v3] testsuite: Add a test case for negating FP vectors containing zeros

2024-03-05 Thread Xi Ruoyao
Recently I've fixed two wrong FP vector negate implementations which
caused wrong sign bits in zeros in targets (r14-8786 and r14-8801).  To
prevent a similar issue from happening again, add a test case.

Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
(with MSA), LoongArch (with LSX and LASX).

gcc/testsuite:

* gcc.dg/vect/vect-neg-zero.c: New test.
---

- v1 -> v2: Remove { dg-do run } which may cause SIGILL.
- v2 -> v3: Add -fno-associative-math to fix an excessive warning on
  arm.

Ok for trunk?
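
For readers unfamiliar with the failure mode, here is a scalar C sketch of
one way a negation can get the sign of zero wrong.  It is an assumption
for illustration, not a description of the exact bugs fixed in the
revisions above: computing 0.0 - x instead of flipping the sign bit.  The
function names are made up, and the code assumes default floating-point
flags (no -ffast-math or -fno-signed-zeros).

```c
#include <assert.h>
#include <math.h>

/* Negation proper: flips the sign bit, so +0.0 becomes -0.0.  */
static double
negate (double x)
{
  return -x;
}

/* Subtraction from zero: 0.0 - 0.0 is +0.0 under round-to-nearest, so
   the sign of a zero input is lost.  */
static double
zero_minus (double x)
{
  return 0.0 - x;
}

void
check_negated_zero_sign (void)
{
  /* volatile keeps the operations from being folded at compile time.  */
  volatile double pz = 0.0;

  assert (signbit (negate (pz)) != 0);
  assert (signbit (zero_minus (pz)) == 0);
  /* -0.0 and +0.0 compare equal, so "==" cannot catch the difference.  */
  assert (negate (pz) == zero_minus (pz));
}
```

This is why the test case compares sign bits with __builtin_signbit
instead of comparing the values.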

 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 38 +++
 1 file changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c 
b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
new file mode 100644
index 000..21fa00cfa15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
@@ -0,0 +1,38 @@
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-fno-associative-math -fsigned-zeros" } */
+
+double x[4] = {-0.0, 0.0, -0.0, 0.0};
+float y[8] = {-0.0, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0};
+
+static __attribute__ ((always_inline)) inline void
+test (int factor)
+{
+  double a[4];
+  float b[8];
+
+  asm ("" ::: "memory");
+
+  for (int i = 0; i < 2 * factor; i++)
+a[i] = -x[i];
+
+  for (int i = 0; i < 4 * factor; i++)
+b[i] = -y[i];
+
+#pragma GCC novector
+  for (int i = 0; i < 2 * factor; i++)
+if (__builtin_signbit (a[i]) == __builtin_signbit (x[i]))
+  __builtin_abort ();
+
+#pragma GCC novector
+  for (int i = 0; i < 4 * factor; i++)
+if (__builtin_signbit (b[i]) == __builtin_signbit (y[i]))
+  __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test (1);
+  test (2);
+  return 0;
+}
-- 
2.44.0



Re: [PATCH v2] LoongArch: Fix inconsistent description in *sge_

2024-03-05 Thread Xi Ruoyao
On Tue, 2024-03-05 at 16:05 +0800, Guo Jie wrote:
> The constraint of op[1] is inconsistent with the output template.
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/loongarch.md
>   (define_insn "*sge_"): Fix inconsistency
>   error.
> 
> ---
> Update in v2:
>     Remove useless support for op[1] is const_imm12_operand.
> 
> ---
>  gcc/config/loongarch/loongarch.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/loongarch/loongarch.md 
> b/gcc/config/loongarch/loongarch.md
> index f3b5c641fce..e35a001e0ed 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -3360,7 +3360,7 @@ (define_insn "*sge_"
>   (any_ge:GPR (match_operand:X 1 "register_operand" "r")
>    (const_int 1)))]
>    ""
> -  "slti\t%0,%.,%1"
> +  "slt\t%0,%.,%1"
>    [(set_attr "type" "slt")
>     (set_attr "mode" "")])

Hmm, this define_insn seems to be never really used, or it would generate
something like "sltui $r4,$r0,$r4" and trigger an assembler failure.
The generic path seems to already convert "x >= 1" to "x > 0".

So it seems we should just remove this define_insn?


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Fix inconsistent description in *sge_

2024-03-04 Thread Xi Ruoyao
On Mon, 2024-03-04 at 11:03 +0800, Guo Jie wrote:
> The constraint of op[1] is inconsistent with the output template.
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/loongarch.md
>   (define_insn "*sge_"): Fix inconsistency
>   error.
>
> ---
>  gcc/config/loongarch/loongarch.md | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/loongarch/loongarch.md
> b/gcc/config/loongarch/loongarch.md
> index f3b5c641fce..2d25374bdc9 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -3357,10 +3357,10 @@ (define_insn "*sgt_"
>  
>  (define_insn "*sge_"
>    [(set (match_operand:GPR 0 "register_operand" "=r")
> - (any_ge:GPR (match_operand:X 1 "register_operand" "r")
> + (any_ge:GPR (match_operand:X 1 "arith_operand" "rI")
>    (const_int 1)))]

No, arith_operand is just register_operand or const_imm12_operand, but
comparing a const_imm12_operand with (const_int 1) should be folded into
a constant (even at -O0, AFAIK).  So allowing const_imm12_operand here
brings no benefit.

>    ""
> -  "slti\t%0,%.,%1"
> +  "slt%i1\t%0,%.,%1"
>    [(set_attr "type" "slt")
>     (set_attr "mode" "")])
>  

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2] testsuite: Add a test case for negating FP vectors containing zeros

2024-02-29 Thread Xi Ruoyao
On Thu, 2024-02-29 at 15:09 +0800, Xi Ruoyao wrote:
> Recently I've fixed two wrong FP vector negate implementations which
> caused wrong sign bits in zeros in targets (r14-8786 and r14-8801).  To
> prevent a similar issue from happening again, add a test case.
> 
> Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
> (with MSA), LoongArch (with LSX and LASX).
> 
> gcc/testsuite:
> 
>   * gcc.dg/vect/vect-neg-zero.c: New test.
> ---
> 
> v1->v2: Remove { dg-do run } which was likely triggering a SIGILL on
> Linaro ARM CI.

Oops, still failing ARM CI.  Not sure why...

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] LoongArch: Allow s9 as a register alias

2024-02-28 Thread Xi Ruoyao
The psABI allows using s9 as an alias of r22.

gcc/ChangeLog:

* config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add
s9 as an alias of r22.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 8b453ab3140..bf2351f0968 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -931,6 +931,7 @@ typedef struct {
   { "t8",  20 + GP_REG_FIRST },\
   { "x",   21 + GP_REG_FIRST },\
   { "fp",  22 + GP_REG_FIRST },\
+  { "s9",  22 + GP_REG_FIRST },\
   { "s0",  23 + GP_REG_FIRST },\
   { "s1",  24 + GP_REG_FIRST },\
   { "s2",  25 + GP_REG_FIRST },\
-- 
2.44.0



[PATCH] LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation

2024-02-28 Thread Xi Ruoyao
In Binutils, the IE to LE relaxation is only allowed when there is an
R_LARCH_RELAX after R_LARCH_TLS_IE_PC_{HI20,LO12}, so that an invalid
"partial" relaxation won't happen with the extreme code model.  So if we
are emitting %ie_pc_{hi20,lo12} in a non-extreme code model, emit an
R_LARCH_RELAX to allow the relaxation.  The IE to LE relaxation does not
require the pcalau12i and the ld instruction to be adjacent, so we don't
need to limit ourselves to using the macro.

For distro maintainers backporting changes: this change depends on
r14-8721.  Without r14-8721, R_LARCH_RELAX can be emitted mistakenly in
the extreme code model.
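
As a rough illustration (not part of the patch), the kind of access that
goes through the IE model looks like the sketch below.  The names counter
and bump_counter are made up for this example, and the tls_model attribute
is only there to force initial-exec regardless of other options.  With
explicit relocs and -mrelax in a non-extreme code model, the pcalau12i/ld
pair generated for such an access is what should get the R_LARCH_RELAX
marker.

```c
/* Hypothetical example, not taken from the patch or its tests.  */
__thread int counter __attribute__ ((tls_model ("initial-exec")));

/* Each call reads and updates the calling thread's copy of counter.  */
int
bump_counter (void)
{
  return ++counter;
}
```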

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Support 'Q' for R_LARCH_RELAX for TLS IE.
(loongarch_output_move): Use 'Q' to print R_LARCH_RELAX for TLS
IE.
* config/loongarch/loongarch.md (ld_from_got): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/tls-ie-relax.c: New test.
* gcc.target/loongarch/tls-ie-norelax.c: New test.
* gcc.target/loongarch/tls-ie-extreme.c: New test.
---

Bootstrapped & regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.cc | 15 ++-
 gcc/config/loongarch/loongarch.md |  2 +-
 .../gcc.target/loongarch/tls-ie-extreme.c |  5 +
 .../gcc.target/loongarch/tls-ie-norelax.c |  5 +
 gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c | 11 +++
 5 files changed, 36 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 0428b6e65d5..70e31bb831c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4981,7 +4981,7 @@ loongarch_output_move (rtx dest, rtx src)
  if (type == SYMBOL_TLS_LE)
return "lu12i.w\t%0,%h1";
  else
-   return "pcalau12i\t%0,%h1";
+   return "%Q1pcalau12i\t%0,%h1";
}
 
   if (src_code == CONST_INT)
@@ -6145,6 +6145,7 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool hi64_part,
'L'  Print the low-part relocation associated with OP.
'm' Print one less than CONST_INT OP in decimal.
'N' Print the inverse of the integer branch condition for comparison OP.
+   'Q'  Print R_LARCH_RELAX for TLS IE.
'r'  Print address 12-31bit relocation associated with OP.
'R'  Print address 32-51bit relocation associated with OP.
'T' Print 'f' for (eq:CC ...), 't' for (ne:CC ...),
@@ -6282,6 +6283,18 @@ loongarch_print_operand (FILE *file, rtx op, int letter)
letter);
   break;
 
+case 'Q':
+  if (!TARGET_LINKER_RELAXATION)
+   break;
+
+  if (code == HIGH)
+   op = XEXP (op, 0);
+
+  if (loongarch_classify_symbolic_expression (op) == SYMBOL_TLS_IE)
+   fprintf (file, ".reloc\t.,R_LARCH_RELAX\n\t");
+
+  break;
+
 case 'r':
   loongarch_print_operand_reloc (file, op, false /* hi64_part */,
 true /* lo_reloc */);
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index f3b5c641fce..525e1e82183 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2620,7 +2620,7 @@ (define_insn "@ld_from_got"
(match_operand:P 2 "symbolic_operand")))]
UNSPEC_LOAD_FROM_GOT))]
   ""
-  "ld.\t%0,%1,%L2"
+  "%Q2ld.\t%0,%1,%L2"
   [(set_attr "type" "move")]
 )
 
diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c b/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c
new file mode 100644
index 000..00c545a3e8c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d -mcmodel=extreme -mexplicit-relocs=auto -mrelax" } */
+/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } } */
+
+#include "tls-ie-relax.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c b/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c
new file mode 100644
index 000..dd6bf3634a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcmodel=normal -mexplicit-relocs -mno-relax" } */
+/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } } */
+
+#include "tls-ie-relax.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c b/gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c
new file mode 100644
index 000..e9f7569b1da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c
@@ 

[PATCH v2] testsuite: Add a test case for negating FP vectors containing zeros

2024-02-28 Thread Xi Ruoyao
Recently I've fixed two wrong FP vector negate implementations which
caused wrong sign bits in zeros on some targets (r14-8786 and r14-8801).
To prevent a similar issue from happening again, add a test case.

Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
(with MSA), LoongArch (with LSX and LASX).

gcc/testsuite:

* gcc.dg/vect/vect-neg-zero.c: New test.
---

v1->v2: Remove { dg-do run } which was likely triggering a SIGILL on
Linaro ARM CI.

Ok for trunk?

 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 38 +++
 1 file changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
new file mode 100644
index 000..6af4a02c517
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
@@ -0,0 +1,38 @@
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-fsigned-zeros" } */
+
+double x[4] = {-0.0, 0.0, -0.0, 0.0};
+float y[8] = {-0.0, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0};
+
+static __attribute__ ((always_inline)) inline void
+test (int factor)
+{
+  double a[4];
+  float b[8];
+
+  asm ("" ::: "memory");
+
+  for (int i = 0; i < 2 * factor; i++)
+a[i] = -x[i];
+
+  for (int i = 0; i < 4 * factor; i++)
+b[i] = -y[i];
+
+#pragma GCC novector
+  for (int i = 0; i < 2 * factor; i++)
+if (__builtin_signbit (a[i]) == __builtin_signbit (x[i]))
+  __builtin_abort ();
+
+#pragma GCC novector
+  for (int i = 0; i < 4 * factor; i++)
+if (__builtin_signbit (b[i]) == __builtin_signbit (y[i]))
+  __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test (1);
+  test (2);
+  return 0;
+}
-- 
2.44.0



[PATCH v2] testsuite: Make pr104992.c irrelated to target vector feature [PR113418]

2024-02-28 Thread Xi Ruoyao
The vect_int_mod target selector is evaluated with the options in
DEFAULT_VECTCFLAGS in effect, but these options are not automatically
passed to tests outside the vect directories.  So this test fails on
targets where the integer vector modulo operation is supported but
requires an option to enable, for example LoongArch.

In this test case, the only expected optimization that has not already
happened in the original dump is the one in corge, because it needs
forward propagation.  So we can scan the forwprop2 dump (where the
vector operation is not expanded to scalars yet) instead of optimized,
and then we don't need to care about vect_int_mod at all.
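
For readers who don't have the PR at hand: as far as I understand it, the
test is about folding x / y * y == x into x % y == 0.  A scalar sketch of
that identity could look like the following.  The function names are made
up here, and the real test also covers vector types.

```c
/* Hypothetical scalar illustration; y must be nonzero.  */
int
div_mul_roundtrip (unsigned x, unsigned y)
{
  return x / y * y == x;
}

/* The form the folding is expected to produce.  */
int
mod_is_zero (unsigned x, unsigned y)
{
  return x % y == 0;
}
```

Counting " % " in a dump is then a proxy for how many of the functions
were folded this way.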

gcc/testsuite/ChangeLog:

PR testsuite/113418
* gcc.dg/pr104992.c (dg-options): Use -fdump-tree-forwprop2
instead of -fdump-tree-optimized.
(dg-final): Scan forwprop2 dump instead of optimized, and remove
the use of vect_int_mod.
* lib/target-supports.exp (check_effective_target_vect_int_mod):
Remove because it's not used anymore.
---

v1->v2: Remove check_effective_target_vect_int_mod as it's now unused.

This fixes the test failure on loongarch64-linux-gnu.  Also tested on
x86_64-linux-gnu.  Ok for trunk?

 gcc/testsuite/gcc.dg/pr104992.c   |  5 ++---
 gcc/testsuite/lib/target-supports.exp | 13 -
 2 files changed, 2 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr104992.c b/gcc/testsuite/gcc.dg/pr104992.c
index 82f8c75559c..6fd513d34b2 100644
--- a/gcc/testsuite/gcc.dg/pr104992.c
+++ b/gcc/testsuite/gcc.dg/pr104992.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/104992 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
+/* { dg-options "-O2 -Wno-psabi -fdump-tree-forwprop2" } */
 
 #define vector __attribute__((vector_size(4*sizeof(int))))
 
@@ -54,5 +54,4 @@ __attribute__((noipa)) unsigned waldo (unsigned x, unsigned y, unsigned z) {
 return x / y * z == x;
 }
 
-/* { dg-final { scan-tree-dump-times " % " 9 "optimized" { target { ! vect_int_mod } } } } */
-/* { dg-final { scan-tree-dump-times " % " 6 "optimized" { target vect_int_mod } } } */
+/* { dg-final { scan-tree-dump-times " % " 6 "forwprop2" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4138cc9a662..ae33c4f1e3a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9064,19 +9064,6 @@ proc check_effective_target_vect_long_mult { } {
 return $answer
 }
 
-# Return 1 if the target supports vector int modulus, 0 otherwise.
-
-proc check_effective_target_vect_int_mod { } {
-return [check_cached_effective_target_indexed vect_int_mod {
-  expr { ([istarget powerpc*-*-*]
- && [check_effective_target_has_arch_pwr10])
- || [istarget amdgcn-*-*]
- || ([istarget loongarch*-*-*]
-&& [check_effective_target_loongarch_sx])
- || ([istarget riscv*-*-*]
-&& [check_effective_target_riscv_v]) }}]
-}
-
 # Return 1 if the target supports vector even/odd elements extraction, 0 otherwise.
 
 proc check_effective_target_vect_extract_even_odd { } {
-- 
2.44.0



Re: [PATCH v2] LoongArch: Add support for TLS descriptors

2024-02-28 Thread Xi Ruoyao
On Thu, 2024-02-29 at 14:08 +0800, Xi Ruoyao wrote:
> > +  "TARGET_TLS_DESC"
> > +  "la.tls.desc\t%0,%1"
> 
> With -mexplicit-relocs=always we should emit %desc_pc_lo12 etc. instead
> of la.tls.desc.  As we don't want to add too much code we can just hard
> code the 4 instructions here instead of splitting this insn, just
> something like
> 
> { return TARGET_EXPLICIT_RELOCS_ALWAYS ? ".." : "la.tls.desc\t%0,%1"; }

And with -mcmodel=extreme we should use a 3-operand la.tls.desc.  Or if
we don't want to support this, we can just error out on the combination
-mcmodel=extreme -mtls-dialect=desc.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2] LoongArch: Add support for TLS descriptors

2024-02-28 Thread Xi Ruoyao
On Thu, 2024-02-29 at 09:42 +0800, mengqinggang wrote:
> Generate la.tls.desc macro instruction for TLS descriptors model.
> 
> la.tls.desc expands to
>   pcalau12i $a0, %desc_pc_hi20(a)
>   ld.d  $a1, $a0, %desc_ld_pc_lo12(a)
>   addi.d    $a0, $a0, %desc_add_pc_lo12(a)
>   jirl  $ra, $a1, %desc_call(a)
> 
> The default is TLS descriptors, but this can be configured with
> -mtls-dialect={desc,trad}.

Please keep trad as the default for now.  Glibc-2.40 will be released
after GCC 14.1, and we don't want to end up in a situation where the
default configuration of the latest GCC release creates something that
does not work with the latest Glibc release.

And there's also musl libc we need to take into account.

Or you can write an autoconf test to check whether the assembler supports
tlsdesc, and check TARGET_GLIBC_MAJOR & TARGET_GLIBC_MINOR for the Glibc
version to decide whether to enable desc by default.  If you want this
but don't have time to implement it, you can leave trad as the default
and I'll take care of this.
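
To make the last suggestion a bit more concrete, the decision could be
sketched roughly as below.  This is only an illustration: the function
and enum names are invented, the result of the assembler check is passed
in as a plain flag, and the assumption that Glibc 2.40 is the first
release able to handle descriptors on LoongArch is mine.

```c
/* Hypothetical sketch; none of these names exist in GCC.  */
enum tls_dialect { TLS_DIALECT_TRAD, TLS_DIALECT_DESC };

/* Pick the default dialect from what configure found out.
   as_has_tlsdesc: nonzero if the assembler accepts TLS descriptor syntax.
   glibc_major/glibc_minor: target Glibc version, in the way
   TARGET_GLIBC_MAJOR and TARGET_GLIBC_MINOR would provide it.  */
enum tls_dialect
default_tls_dialect (int as_has_tlsdesc, int glibc_major, int glibc_minor)
{
  /* Assumed threshold: Glibc 2.40.  */
  int glibc_ok = glibc_major > 2 || (glibc_major == 2 && glibc_minor >= 40);

  return (as_has_tlsdesc && glibc_ok) ? TLS_DIALECT_DESC : TLS_DIALECT_TRAD;
}
```

Non-Glibc targets such as musl would need their own handling; in this
sketch they would pass a version of 0.0 and get trad.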

/* snip */

> +(define_insn "@got_load_tls_desc"
> +  [(set (match_operand:P 0 "register_operand" "=r")
> + (unspec:P
> +     [(match_operand:P 1 "symbolic_operand" "")]
> +     UNSPEC_TLS_DESC))
> +    (clobber (reg:SI FCC0_REGNUM))
> +    (clobber (reg:SI FCC1_REGNUM))
> +    (clobber (reg:SI FCC2_REGNUM))
> +    (clobber (reg:SI FCC3_REGNUM))
> +    (clobber (reg:SI FCC4_REGNUM))
> +    (clobber (reg:SI FCC5_REGNUM))
> +    (clobber (reg:SI FCC6_REGNUM))
> +    (clobber (reg:SI FCC7_REGNUM))
> +    (clobber (reg:SI A1_REGNUM))
> +    (clobber (reg:SI RETURN_ADDR_REGNUM))]

Ok, the clobber list is correct.

> +  "TARGET_TLS_DESC"
> +  "la.tls.desc\t%0,%1"

With -mexplicit-relocs=always we should emit %desc_pc_lo12 etc. instead
of la.tls.desc.  As we don't want to add too much code we can just hard
code the 4 instructions here instead of splitting this insn, just
something like

{ return TARGET_EXPLICIT_RELOCS_ALWAYS ? ".." : "la.tls.desc\t%0,%1"; }

> +  [(set_attr "got" "load")
> +   (set_attr "mode" "")])

We need (set_attr "length" "16") in this list as this actually expands
into 16 bytes.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH 2/2] LoongArch: Remove unneeded sign extension after crc/crcc instructions

2024-02-25 Thread Xi Ruoyao
The specification of crc/crcc instructions is clear that the output is
sign-extended to GRLEN.  Add a define_insn to tell the compiler this
fact and allow it to remove the unneeded sign extension on crc/crcc
output.  As crc/crcc instructions are usually used in a tight loop,
this should produce a significant performance gain.
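
For reference, "sign-extended to GRLEN" on a 64-bit target means bits
63..32 of the result register are copies of bit 31.  In C terms, the
value already in the register equals what the helper below computes, so
a separate extension after the instruction is redundant.  The helper name
is invented for illustration, and the conversion to int32_t relies on
GCC's modulo behaviour for out-of-range values.

```c
#include <stdint.h>

/* Hypothetical helper: widen a 32-bit word the way a sign-extending
   instruction leaves it in a 64-bit register.  */
int64_t
sign_extend_32 (uint32_t w)
{
  return (int64_t) (int32_t) w;
}
```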

gcc/ChangeLog:

* config/loongarch/loongarch.md
(loongarch__w__w_extended): New define_insn.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/crc-sext.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 11 +++
 gcc/testsuite/gcc.target/loongarch/crc-sext.c | 13 +
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/crc-sext.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 4ded1b3a117..525e1e82183 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -4264,6 +4264,17 @@ (define_insn "loongarch__w__w"
   [(set_attr "type" "unknown")
(set_attr "mode" "")])
 
+(define_insn "loongarch__w__w_extended"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI
+ (unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
+ (match_operand:SI 2 "register_operand" "r")]
+CRC)))]
+  "TARGET_64BIT"
+  ".w..w\t%0,%1,%2"
+  [(set_attr "type" "unknown")
+   (set_attr "mode" "")])
+
 ;; With normal or medium code models, if the only use of a pc-relative
 ;; address is for loading or storing a value, then relying on linker
 ;; relaxation is not better than emitting the machine instruction directly.
diff --git a/gcc/testsuite/gcc.target/loongarch/crc-sext.c b/gcc/testsuite/gcc.target/loongarch/crc-sext.c
new file mode 100644
index 000..9ade5a8e4ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/crc-sext.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+**my_crc:
+** crc.w.d.w   \$r4,\$r4,\$r5
+** jr  \$r1
+*/
+int my_crc(long long dword, int crc)
+{
+   return __builtin_loongarch_crc_w_d_w(dword, crc);
+}
-- 
2.44.0



[PATCH 1/2] LoongArch: NFC: Deduplicate crc instruction defines

2024-02-25 Thread Xi Ruoyao
Introduce an iterator for UNSPEC_CRC and UNSPEC_CRCC to make the next
change easier.

gcc/ChangeLog:

* config/loongarch/loongarch.md (CRC): New define_int_iterator.
(crc): New define_int_attr.
(loongarch_crc_w__w, loongarch_crcc_w__w): Unify
into ...
(loongarch__w__w): ... here.
---
 gcc/config/loongarch/loongarch.md | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 2ce7a151880..4ded1b3a117 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -4251,24 +4251,16 @@ (define_peephole2
 
 
 (define_mode_iterator QHSD [QI HI SI DI])
+(define_int_iterator CRC [UNSPEC_CRC UNSPEC_CRCC])
+(define_int_attr crc [(UNSPEC_CRC "crc") (UNSPEC_CRCC "crcc")])
 
-(define_insn "loongarch_crc_w__w"
+(define_insn "loongarch__w__w"
   [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
   (match_operand:SI 2 "register_operand" "r")]
-UNSPEC_CRC))]
+CRC))]
   ""
-  "crc.w..w\t%0,%1,%2"
-  [(set_attr "type" "unknown")
-   (set_attr "mode" "")])
-
-(define_insn "loongarch_crcc_w__w"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
-  (match_operand:SI 2 "register_operand" "r")]
-UNSPEC_CRCC))]
-  ""
-  "crcc.w..w\t%0,%1,%2"
+  ".w..w\t%0,%1,%2"
   [(set_attr "type" "unknown")
(set_attr "mode" "")])
 
-- 
2.44.0



Pushed: [GCC 13 PATCH] LoongArch: Don't default to -mno-explicit-relocs if -mno-relax

2024-02-23 Thread Xi Ruoyao
On Thu, 2024-02-22 at 19:09 +0800, chenglulu wrote:
> 
> On 2024/2/22 at 6:20 PM, Xi Ruoyao wrote:
> > To improve Binutils compatibility we've had to backport relaxation
> > support.  But if a user just updates to GCC 13.3 and sticks with
> > Binutils 2.41, there is no reason to use -mno-explicit-relocs as the
> > default because we are turning off relaxation for Binutils 2.41 (it
> > lacks conditional branch relaxation support) anyway.
> > 
> > So like GCC 14, make the default of -m[no-]explicit-relocs depend on
> > -m[no-]relax instead of HAVE_AS_MRELAX_OPTION.  Also update the doc to
> > reflect the behavior change.
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/genopts/loongarch.opt.in
> > (TARGET_EXPLICIT_RELOCS): Init to M_OPTION_NOT_SEEN.
> > * config/loongarch/loongarch.opt: Regenerate.
> > * config/loongarch/loongarch.cc
> > (loongarch_option_override_internal): Set the default of
> > TARGET_EXPLICIT_RELOCS to HAVE_AS_EXPLICIT_RELOCS
> > && !loongarch_mrelax.
> > * doc/invoke.texi (-m[no-]explicit-relocs): Update for
> > LoongArch.
> > ---
> > 
> > Ok for releases/gcc-13?
> 
> LGTM!

Pushed r13-8357.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Pushed: [PATCH] LoongArch: Don't falsely claim gold supported in toplevel configure

2024-02-23 Thread Xi Ruoyao
On Fri, 2024-02-23 at 11:37 +0800, chenglulu wrote:
> 
> On 2024/2/23 at 11:27 AM, Xi Ruoyao wrote:
> > On Fri, 2024-02-23 at 11:16 +0800, chenglulu wrote:
> > > On 2024/2/22 at 5:17 PM, Xi Ruoyao wrote:
> > > > The gold linker has never been ported to LoongArch (and it seems
> > > > unlikely to be ported in the future as the new architectures are
> > > > focusing on lld and/or mold for fast linkers).
> > > > 
> > > > ChangeLog:
> > > > 
> > > >     * configure.ac (ENABLE_GOLD): Remove loongarch*-*-* from target
> > > >     list.
> > > >     * configure: Regenerate.
> > > > ---
> > > > 
> > > > Ok for GCC trunk (to get synced into Binutils later)?
> > > I have no problem. But I have a question. Is this modification simply
> > > because we don’t support it or is there an error somewhere?
> > If a user specifies --enable-gold when building Binutils, with loongarch
> > in this list the build system will attempt to build gold and fail.  If
> > loongarch is removed from the list, the build system will ignore
> > --enable-gold.
> > 
> Okay, I understand.

Pushed r14-9149 and the Binutils maintainer will pick it up before the
next Binutils release (AFAIK).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Don't falsely claim gold supported in toplevel configure

2024-02-22 Thread Xi Ruoyao
On Fri, 2024-02-23 at 11:16 +0800, chenglulu wrote:
> 
> On 2024/2/22 at 5:17 PM, Xi Ruoyao wrote:
> > The gold linker has never been ported to LoongArch (and it seems
> > unlikely to be ported in the future as the new architectures are
> > focusing on lld and/or mold for fast linkers).
> > 
> > ChangeLog:
> > 
> >     * configure.ac (ENABLE_GOLD): Remove loongarch*-*-* from target
> >     list.
> >     * configure: Regenerate.
> > ---
> > 
> > Ok for GCC trunk (to get synced into Binutils later)?
> 
> I have no problem. But I have a question. Is this modification simply
> because we don’t support it or is there an error somewhere?

If a user specifies --enable-gold when building Binutils, with loongarch
in this list the build system will attempt to build gold and fail.  If
loongarch is removed from the list, the build system will ignore
--enable-gold.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[GCC 13 PATCH] LoongArch: Don't default to -mno-explicit-relocs if -mno-relax

2024-02-22 Thread Xi Ruoyao
To improve Binutils compatibility we've had to backport relaxation
support.  But if a user just updates to GCC 13.3 and sticks with
Binutils 2.41, there is no reason to use -mno-explicit-relocs as the
default because we are turning off relaxation for Binutils 2.41 (it
lacks conditional branch relaxation support) anyway.

So like GCC 14, make the default of -m[no-]explicit-relocs depend on
-m[no-]relax instead of HAVE_AS_MRELAX_OPTION.  Also update the doc to
reflect the behavior change.
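
Spelled out as a standalone function, the new default selection amounts
to something like the sketch below.  The helper name and its parameters
are invented for illustration, and the value of M_OPTION_NOT_SEEN is an
assumption made here so the snippet compiles on its own.

```c
/* Assumed sentinel value; the real definition lives in the LoongArch
   option headers.  */
#define M_OPTION_NOT_SEEN (-1)

/* Hypothetical helper mirroring the new default selection.  */
int
resolve_explicit_relocs (int user_value, int have_as_explicit_relocs,
			 int mrelax)
{
  if (user_value != M_OPTION_NOT_SEEN)
    return user_value;	/* An explicit -m[no-]explicit-relocs wins.  */

  return have_as_explicit_relocs && !mrelax;
}
```

So with an assembler that supports relocation operators, the default
becomes -mexplicit-relocs only when relaxation is off.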

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in
(TARGET_EXPLICIT_RELOCS): Init to M_OPTION_NOT_SEEN.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Set the default of
TARGET_EXPLICIT_RELOCS to HAVE_AS_EXPLICIT_RELOCS
&& !loongarch_mrelax.
* doc/invoke.texi (-m[no-]explicit-relocs): Update for
LoongArch.
---

Ok for releases/gcc-13?

 gcc/config/loongarch/genopts/loongarch.opt.in |  2 +-
 gcc/config/loongarch/loongarch.cc |  4 
 gcc/config/loongarch/loongarch.opt|  2 +-
 gcc/doc/invoke.texi   | 11 +--
 4 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index da6fedd153e..76acd35d39c 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -155,7 +155,7 @@ Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init
 -mmax-inline-memcpy-size=SIZE  Set the max size of memcpy to inline, default is 1024.
 
 mexplicit-relocs
-Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS & !HAVE_AS_MRELAX_OPTION)
+Target Var(TARGET_EXPLICIT_RELOCS) Init(M_OPTION_NOT_SEEN)
 Use %reloc() assembly operators.
 
 ; The code model option names for -mcmodel.
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 768e2427285..e78b81cd8fc 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6222,6 +6222,10 @@ loongarch_option_override_internal (struct gcc_options *opts)
gcc_unreachable ();
 }
 
+  if (TARGET_EXPLICIT_RELOCS == M_OPTION_NOT_SEEN)
+TARGET_EXPLICIT_RELOCS = (HAVE_AS_EXPLICIT_RELOCS
+ && !loongarch_mrelax);
+
   /* Validate the guard size.  */
   int guard_size = param_stack_clash_protection_guard_size;
 
diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt
index 59b1e06d3f2..e61fbaed2c1 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -162,7 +162,7 @@ Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init
 -mmax-inline-memcpy-size=SIZE  Set the max size of memcpy to inline, default is 1024.
 
 mexplicit-relocs
-Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS & !HAVE_AS_MRELAX_OPTION)
+Target Var(TARGET_EXPLICIT_RELOCS) Init(M_OPTION_NOT_SEEN)
 Use %reloc() assembly operators.
 
 ; The code model option names for -mcmodel.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 99657fb44d8..792ce283bb9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -25830,12 +25830,11 @@ The default code model is @code{normal}.
 @itemx -mno-explicit-relocs
 Use or do not use assembler relocation operators when dealing with symbolic
 addresses.  The alternative is to use assembler macros instead, which may
-limit optimization.  The default value for the option is determined during
-GCC build-time by detecting corresponding assembler support:
-@code{-mexplicit-relocs} if said support is present,
-@code{-mno-explicit-relocs} otherwise.  This option is mostly useful for
-debugging, or interoperation with assemblers different from the build-time
-one.
+limit instruction scheduling but allow linker relaxation.  The default
+value for the option is determined with the assembler capability detected
+during GCC build-time and the setting of @code{-mrelax}:
+@code{-mexplicit-relocs} if the assembler supports relocation operators
+but @code{-mrelax} is not enabled, @code{-mno-explicit-relocs} otherwise.
 
 @opindex mdirect-extern-access
 @item -mdirect-extern-access
-- 
2.43.2



[PATCH] LoongArch: Don't falsely claim gold supported in toplevel configure

2024-02-22 Thread Xi Ruoyao
The gold linker has never been ported to LoongArch (and it seems
unlikely to be ported in the future as the new architectures are
focusing on lld and/or mold for fast linkers).

ChangeLog:

* configure.ac (ENABLE_GOLD): Remove loongarch*-*-* from target
list.
* configure: Regenerate.
---

Ok for GCC trunk (to get synced into Binutils later)?

 configure| 2 +-
 configure.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 874966fb9f0..02b435c1163 100755
--- a/configure
+++ b/configure
@@ -3092,7 +3092,7 @@ case "${ENABLE_GOLD}" in
   # Check for target supported by gold.
   case "${target}" in
 i?86-*-* | x86_64-*-* | sparc*-*-* | powerpc*-*-* | arm*-*-* \
-| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-* | loongarch*-*-*)
+| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-*)
  configdirs="$configdirs gold"
  if test x${ENABLE_GOLD} = xdefault; then
default_ld=gold
diff --git a/configure.ac b/configure.ac
index 4f34004a072..1a19c07a27b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -364,7 +364,7 @@ case "${ENABLE_GOLD}" in
   # Check for target supported by gold.
   case "${target}" in
 i?86-*-* | x86_64-*-* | sparc*-*-* | powerpc*-*-* | arm*-*-* \
-| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-* | loongarch*-*-*)
+| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-*)
  configdirs="$configdirs gold"
  if test x${ENABLE_GOLD} = xdefault; then
default_ld=gold
-- 
2.43.2



Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread Xi Ruoyao
On Tue, 2024-02-20 at 19:50 +0800, chenglulu wrote:
> 
> On 2024/2/20 at 7:31 PM, Xi Ruoyao wrote:
> > On Tue, 2024-02-20 at 19:25 +0800, Xi Ruoyao wrote:
> > > On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:
> > > 
> > > > So I think that without worrying about performance and ensuring that
> > > > there is no problem with binutils, I think we can make the following
> > > > modifications:
> > > > 
> > > >     -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
> > > >     -   used for padding.  */
> > > >     +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding 
> > > > by
> > > >     +   default.  */
> > > >  #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
> > > >     -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
> > > >     +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> > > > 
> > > > What do you think of it?
> > > Unfortunately it will cause warnings with GAS 2.41 or earlier like
> > > 
> > > t1.s:1: Warning: expected fill pattern missing
> > > t1.s:5: Warning: expected fill pattern missing
> > > 
> > > And AFAIK these things may cause many test failures due to "excessive
> > > errors" if running the GCC test suite with these earlier GAS versions.
> > > Maybe we'll have to add some autoconf-based probing for the linker
> > > anyway?
> > Or just silence the warning by passing "--no-warn" to the assembler,
> > but I'm highly unsure whether this is really a good idea :(.
> > 
> I am not opposed to adding detection code, but I looked at this problem
> today and I think this change is the smallest change. I asked Meng
> Qinggang and he said that the warning of GAS 2.41 can be removed.

Yes, but we cannot change a released binutils-2.41 tarball and Binutils
folks don't make point releases like GCC.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread Xi Ruoyao
On Tue, 2024-02-20 at 19:25 +0800, Xi Ruoyao wrote:
> On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:
> 
> > So I think that without worrying about performance and ensuring that
> > there is no problem with binutils, I think we can make the following
> > modifications:
> > 
> >    -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
> >    -   used for padding.  */
> >    +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
> >    +   default.  */
> >     #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
> >    -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
> >    +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> > 
> > What do you think of it?
> 
> Unfortunately it will cause warnings with GAS 2.41 or earlier like
> 
> t1.s:1: Warning: expected fill pattern missing
> t1.s:5: Warning: expected fill pattern missing
> 
> And AFAIK these things may cause many test failures due to "excessive
> errors" if running the GCC test suite with these earlier GAS versions.
> Maybe we'll have to add some autoconf-based probing for the linker
> anyway?

Or just silence the warning by passing "--no-warn" to the assembler, but
I'm highly unsure whether this is really a good idea :(.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread Xi Ruoyao
On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:

> So I think that without worrying about performance and ensuring that
> there is no problem with binutils, I think we can make the following
> modifications:
> 
>    -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
>    -   used for padding.  */
>    +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
>    +   default.  */
>     #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
>    -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
>    +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> 
> What do you think of it?

Unfortunately it will cause warnings with GAS 2.41 or earlier like

t1.s:1: Warning: expected fill pattern missing
t1.s:5: Warning: expected fill pattern missing

And AFAIK these things may cause many test failures due to "excessive
errors" if running the GCC test suite with these earlier GAS versions. 
Maybe we'll have to add some autoconf-based probing for the linker
anyway?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-09 Thread Xi Ruoyao
On Fri, 2024-02-09 at 00:02 +0800, chenglulu wrote:
> 
> On 2024/2/7 at 12:23 AM, Xi Ruoyao wrote:
> > Hi Lulu,
> > 
> > I'm proposing to backport r14-4674 "LoongArch: Delete macro definition
> > ASM_OUTPUT_ALIGN_WITH_NOP." to releases/gcc-12 and releases/gcc-13.  The
> > reasons:
> > 
> > 1. Strictly speaking, the old ASM_OUTPUT_ALIGN_WITH_NOP macro may cause
> > a correctness issue.  For example, a developer may use
> > -falign-functions=16 and then use the low 4 bits of a function pointer
> > to encode some metainfo.  Then ASM_OUTPUT_ALIGN_WITH_NOP causes the
> > functions not to be really aligned to a 16-byte boundary, causing some
> > breakage.
> > 
> > 2. With Binutils-2.42, ASM_OUTPUT_ALIGN_WITH_NOP can cause illegal
> > opcodes.  For example:
> > 
> > .globl _start
> > _start:
> > .balign 32
> > nop
> > nop
> > nop
> > addi.d $a0, $r0, 1
> > .balign 16,54525952,4
> > addi.d $a0, $a0, 1
> > 
> > is assembled and linked to:
> > 
> > 0220 <_start>:
> >  220:   0340    nop
> >  224:   0340    nop
> >  228:   0340    nop
> >  22c:   02c00404    li.d    $a0, 1
> >  230:   .word   0x   # <== OOPS!
> >  234:   02c00484    addi.d  $a0, $a0, 1
> > 
> > Arguably this is a bug in GAS (it should at least error out for the
> > unsupported case where .balign 16,54525952,4 appears with -mrelax; I'd
> > prefer it to support the 3-operand .align directive even with -mrelax,
> > for reasons I've given in [1]).  But we can at least work around it by
> > removing ASM_OUTPUT_ALIGN_WITH_NOP to allow using GCC 13.3 with Binutils
> > 2.42.
> > 
> > 3. Without ASM_OUTPUT_ALIGN_WITH_NOP, GCC just outputs something like
> > ".align 5" which works as expected since Binutils-2.38.
> > 
> > 4. GCC < 14 does not have a default setting of -falign-*, so changing
> > this won't affect anyone who does not specify -falign-* explicitly.
> > 
> > [1]:https://github.com/loongson-community/discussions/issues/41#issuecomment-1925872603
> > 
> > Is it OK to backport r14-4674 into releases/gcc-12 and releases/gcc-13
> > then?
> > 
> Ok, I agree with you.
> 
> Thanks!

Oops, with Binutils-2.41 GAS will fail to assemble some conditional
branches if we do this :(.

Not sure what to do (maybe backporting both this and a simplified
version of PR112330 fix?)  Let's reconsider after the holiday...

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] testsuite: Add a test case for negating FP vectors containing zeros

2024-02-06 Thread Xi Ruoyao
On Tue, 2024-02-06 at 17:55 +0800, Xi Ruoyao wrote:
> Recently I've fixed two wrong FP vector negate implementations which
> caused wrong sign bits in zeros in targets (r14-8786 and r14-8801).  To
> prevent a similar issue from happening again, add a test case.
> 
> Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
> (with MSA), LoongArch (with LSX and LASX).
> 
> gcc/testsuite:
> 
>   * gcc.dg/vect/vect-neg-zero.c: New test.
> ---
> 
> Ok for trunk?
> 
>  gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 39 +++
>  1 file changed, 39 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c 
> b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
> new file mode 100644
> index 000..adb032f5c6a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run } */

This patch fails on Linaro CI for ARM.  I guess I need to remove this
{ dg-do run } line and let the test framework decide whether to run or
only compile the test.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-06 Thread Xi Ruoyao
Hi Lulu,

I'm proposing to backport r14-4674 "LoongArch: Delete macro definition
ASM_OUTPUT_ALIGN_WITH_NOP." to releases/gcc-12 and releases/gcc-13.  The
reasons:

1. Strictly speaking, the old ASM_OUTPUT_ALIGN_WITH_NOP macro may cause
a correctness issue.  For example, a developer may use
-falign-functions=16 and then use the low 4 bits of a function pointer
to encode some metainfo.  Then ASM_OUTPUT_ALIGN_WITH_NOP causes the
functions to not really be aligned to a 16-byte boundary, causing some
breakage.

2. With Binutils-2.42,  ASM_OUTPUT_ALIGN_WITH_NOP can cause illegal
opcodes.  For example:

.globl _start
_start:
.balign 32
nop
nop
nop
addi.d $a0, $r0, 1
.balign 16,54525952,4
addi.d $a0, $a0, 1

is assembled and linked to:

0220 <_start>:
 220:   0340    nop
 224:   0340    nop
 228:   0340    nop
 22c:   02c00404    li.d    $a0, 1
 230:   .word   0x   # <== OOPS!
 234:   02c00484    addi.d  $a0, $a0, 1

Arguably this is a bug in GAS (it should at least error out for the
unsupported case where .balign 16,54525952,4 appears with -mrelax; I'd
prefer it to support the 3-operand .align directive even with -mrelax,
for reasons I've given in [1]).  But we can at least work around it by
removing ASM_OUTPUT_ALIGN_WITH_NOP to allow using GCC 13.3 with Binutils
2.42.

3. Without ASM_OUTPUT_ALIGN_WITH_NOP, GCC just outputs something like
".align 5" which works as expected since Binutils-2.38.

4. GCC < 14 does not have a default setting of -falign-*, so changing
this won't affect anyone who does not specify -falign-* explicitly.

[1]:https://github.com/loongson-community/discussions/issues/41#issuecomment-1925872603

Is it OK to backport r14-4674 into releases/gcc-12 and releases/gcc-13
then?
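
To make reason 1 a bit more concrete, here is a minimal sketch in C of the
kind of pointer tagging I have in mind.  Everything in it is made up for
illustration (TAG_MASK, tag_fn, untag_fn, get_tag, add_one), and the
aligned attribute only stands in for -falign-functions=16 so that the
sketch does not depend on the build flags:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the low 4 bits of a 16-byte aligned address are free.  */
#define TAG_MASK ((uintptr_t) 0xf)

typedef int (*handler_fn) (int);

/* The attribute stands in for -falign-functions=16 (an assumption of
   this sketch, not what the original code base necessarily does).  */
__attribute__ ((aligned (16))) static int
add_one (int x)
{
  return x + 1;
}

/* Pack a 4-bit tag into the low bits of a function address.  */
static uintptr_t
tag_fn (handler_fn fn, unsigned tag)
{
  uintptr_t bits = (uintptr_t) fn;
  /* This is exactly the property that breaks if the function is not
     really placed on a 16-byte boundary.  */
  assert ((bits & TAG_MASK) == 0);
  return bits | ((uintptr_t) tag & TAG_MASK);
}

/* Recover the callable pointer by clearing the tag bits.  */
static handler_fn
untag_fn (uintptr_t tagged)
{
  return (handler_fn) (tagged & ~TAG_MASK);
}

/* Recover the tag itself.  */
static unsigned
get_tag (uintptr_t tagged)
{
  return (unsigned) (tagged & TAG_MASK);
}
```

If the function start ends up misaligned, the assertion in tag_fn fires (or,
without it, untag_fn silently returns a wrong address), which is the
breakage described in reason 1.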

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] testsuite: Add a test case for negating FP vectors containing zeros

2024-02-06 Thread Xi Ruoyao
Recently I've fixed two wrong FP vector negate implementations which
caused wrong sign bits in zeros in targets (r14-8786 and r14-8801).  To
prevent a similar issue from happening again, add a test case.

Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
(with MSA), LoongArch (with LSX and LASX).

gcc/testsuite:

* gcc.dg/vect/vect-neg-zero.c: New test.
---

Ok for trunk?

 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 39 +++
 1 file changed, 39 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c 
b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
new file mode 100644
index 000..adb032f5c6a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-fsigned-zeros" } */
+
+double x[4] = {-0.0, 0.0, -0.0, 0.0};
+float y[8] = {-0.0, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0};
+
+static __attribute__ ((always_inline)) inline void
+test (int factor)
+{
+  double a[4];
+  float b[8];
+
+  asm ("" ::: "memory");
+
+  for (int i = 0; i < 2 * factor; i++)
+a[i] = -x[i];
+
+  for (int i = 0; i < 4 * factor; i++)
+b[i] = -y[i];
+
+#pragma GCC novector
+  for (int i = 0; i < 2 * factor; i++)
+if (__builtin_signbit (a[i]) == __builtin_signbit (x[i]))
+  __builtin_abort ();
+
+#pragma GCC novector
+  for (int i = 0; i < 4 * factor; i++)
+if (__builtin_signbit (b[i]) == __builtin_signbit (y[i]))
+  __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test (1);
+  test (2);
+  return 0;
+}
-- 
2.43.0



Pushed: [PATCH] MIPS: Fix wrong MSA FP vector negation

2024-02-05 Thread Xi Ruoyao
On Mon, 2024-02-05 at 09:56 +0800, YunQiang Su wrote:
> On Mon, Feb 5, 2024 at 02:01, Xi Ruoyao wrote:
> > 
> > We expanded (neg x) to (minus const0 x) for MSA FP vectors; this is
> > wrong because -0.0 is not 0 - 0.0.  This causes some Python tests to
> > fail when Python is built with MSA enabled.
> > 
> > Use the bnegi.df instructions to simply reverse the sign bit instead.
> > 
> > gcc/ChangeLog:
> > 
> >  * config/mips/mips-msa.md (elmsgnbit): New define_mode_attr.
> >  (neg<mode>2): Change the mode iterator from MSA to IMSA because
> >  in FP arithmetic we cannot use (0 - x) for -x.
> >  (neg<mode>2): New define_insn to implement FP vector negation,
> >  using a bnegi instruction to negate the sign bit.
> > ---
> > 
> > Bootstrapped and regtested on mips64el-linux-gnuabi64.  Ok for trunk
> > and/or release branches?
> > 
> >   gcc/config/mips/mips-msa.md | 18 +++---
> >   1 file changed, 15 insertions(+), 3 deletions(-)
> > 
> 
> LGTM, while I guess that we also need a test case.

Pushed to trunk and release branches, with the following obvious fix:

diff --git a/gcc/config/mips/mips-msa.md b/gcc/config/mips/mips-msa.md
index 920161ed1d8..779157f2a0c 100644
--- a/gcc/config/mips/mips-msa.md
+++ b/gcc/config/mips/mips-msa.md
@@ -613,7 +613,7 @@ (define_expand "neg<mode>2"
 
 (define_insn "neg<mode>2"
   [(set (match_operand:FMSA 0 "register_operand" "=f")
-   (neg (match_operand:FMSA 1 "register_operand" "f")))]
+   (neg:FMSA (match_operand:FMSA 1 "register_operand" "f")))]
   "ISA_HAS_MSA"
   "bnegi.<msafmt>\t%w0,%w1,<elmsgnbit>"
   [(set_attr "type" "simd_bit")

I'll write a test case for gcc.dg/vect later (now I have to do
$SOME_REAL_LIFE_THING...)

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] MIPS: Fix wrong MSA FP vector negation

2024-02-04 Thread Xi Ruoyao
We expanded (neg x) to (minus const0 x) for MSA FP vectors; this is
wrong because -0.0 is not 0 - 0.0.  This causes some Python tests to
fail when Python is built with MSA enabled.

Use the bnegi.df instructions to simply reverse the sign bit instead.
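
The difference between the two approaches can be sketched for a single
element in C.  This is only an illustration, assuming IEEE 754 binary64
doubles and the default rounding mode; neg_by_subtract and neg_by_bitflip
are hypothetical names, not anything in the MIPS backend:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* What the old expander computed per element: 0 - x.  */
static double
neg_by_subtract (double x)
{
  /* volatile keeps the compiler from folding the subtraction away.  */
  volatile double zero = 0.0;
  return zero - x;
}

/* What a bit-negate instruction does per element: toggle the sign bit
   (bit 63 for a double) and leave everything else alone.  */
static double
neg_by_bitflip (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits ^= UINT64_C (1) << 63;
  memcpy (&x, &bits, sizeof bits);
  return x;
}
```

For x = 0.0 the subtraction yields +0.0 (sign bit clear), while flipping
bit 63 yields -0.0 as required; for nonzero finite values the two agree.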

gcc/ChangeLog:

* config/mips/mips-msa.md (elmsgnbit): New define_mode_attr.
(neg<mode>2): Change the mode iterator from MSA to IMSA because
in FP arithmetic we cannot use (0 - x) for -x.
(neg<mode>2): New define_insn to implement FP vector negation,
using a bnegi instruction to negate the sign bit.
---

Bootstrapped and regtested on mips64el-linux-gnuabi64.  Ok for trunk
and/or release branches?

 gcc/config/mips/mips-msa.md | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/gcc/config/mips/mips-msa.md b/gcc/config/mips/mips-msa.md
index 83d9a08e360..920161ed1d8 100644
--- a/gcc/config/mips/mips-msa.md
+++ b/gcc/config/mips/mips-msa.md
@@ -231,6 +231,10 @@ (define_mode_attr bitimm
(V4SI  "uimm5")
(V2DI  "uimm6")])
 
+;; The index of sign bit in FP vector elements.
+(define_mode_attr elmsgnbit [(V2DF "63") (V4DF "63")
+(V4SF "31") (V8SF "31")])
+
 (define_expand "vec_init"
   [(match_operand:MSA 0 "register_operand")
(match_operand:MSA 1 "")]
@@ -597,9 +601,9 @@ (define_expand "abs2"
 })
 
 (define_expand "neg<mode>2"
-  [(set (match_operand:MSA 0 "register_operand")
-   (minus:MSA (match_dup 2)
-  (match_operand:MSA 1 "register_operand")))]
+  [(set (match_operand:IMSA 0 "register_operand")
+   (minus:IMSA (match_dup 2)
+  (match_operand:IMSA 1 "register_operand")))]
   "ISA_HAS_MSA"
 {
   rtx reg = gen_reg_rtx (<MODE>mode);
@@ -607,6 +611,14 @@ (define_expand "neg<mode>2"
   operands[2] = reg;
 })
 
+(define_insn "neg<mode>2"
+  [(set (match_operand:FMSA 0 "register_operand" "=f")
+   (neg (match_operand:FMSA 1 "register_operand" "f")))]
+  "ISA_HAS_MSA"
+  "bnegi.<msafmt>\t%w0,%w1,<elmsgnbit>"
+  [(set_attr "type" "simd_bit")
+   (set_attr "mode" "<MODE>")])
+
 (define_expand "msa_ldi"
   [(match_operand:IMSA 0 "register_operand")
(match_operand 1 "const_imm10_operand")]
-- 
2.43.0



Pushed: [PATCH] LoongArch: Avoid out-of-bounds access in loongarch_symbol_insns

2024-02-04 Thread Xi Ruoyao
On Sun, 2024-02-04 at 11:19 +0800, chenglulu wrote:
> 
> On 2024/2/2 at 5:55 PM, Xi Ruoyao wrote:
> > We call loongarch_symbol_insns with mode = MAX_MACHINE_MODE sometimes.
> > But in loongarch_symbol_insns:
> > 
> >  if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))
> >    return 0;
> > 
> > And LSX_SUPPORTED_MODE_P is defined as:
> > 
> >  #define LSX_SUPPORTED_MODE_P(MODE) \
> >    (ISA_HAS_LSX \
> >     && GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG ... ...
> > 
> > GET_MODE_SIZE is expanded to a call to mode_to_bytes, which is defined:
> > 
> >  ALWAYS_INLINE poly_uint16
> >  mode_to_bytes (machine_mode mode)
> >  {
> >  #if GCC_VERSION >= 4001
> >    return (__builtin_constant_p (mode)
> >   ? mode_size_inline (mode) : mode_size[mode]);
> >  #else
> >    return mode_size[mode];
> >  #endif
> >  }
> > 
> > There is an assertion in mode_size_inline:
> > 
> >  gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);
> > 
> > Note that NUM_MACHINE_MODES = MAX_MACHINE_MODE (emitted by genmodes.cc),
> > thus if __builtin_constant_p (mode) is evaluated true (it happens when
> > GCC is bootstrapped with LTO+PGO), the assertion will be triggered and
> > cause an ICE.  OTOH if __builtin_constant_p (mode) is evaluated false,
> > mode_size[mode] is still an out-of-bounds array access (the length of
> > the mode_size array is NUM_MACHINE_MODES).
> > 
> > So we shouldn't call LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P with
> > MAX_MACHINE_MODE in loongarch_symbol_insns.  This is very similar to a
> > MIPS bug PR98491 fixed by me about 3 years ago.
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch.cc (loongarch_symbol_insns): Do not
> > use LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P if mode is
> > MAX_MACHINE_MODE.
> > ---
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> 
> LGTM!

Pushed r14-8785.

> I have a question. I see that you often add compilation options in 
> BOOT_CFLAGS.
> 
> I also want to test it. Do you have a recommended set of compilation 
> options?

When I build a compiler for my system I use
{BOOT_{C,CXX,LD}FLAGS,{C,CXX,LD}FLAGS_FOR_TARGET}="-O3 -march=la664
-mtune=la664 -pipe -fgraphite-identity -floop-nest-optimize -fipa-pta
-fdevirtualize-at-ltrans -fno-semantic-interposition -Wl,-O1
-Wl,--as-needed"

and enable PGO (make profiledbootstrap) and LTO
(--with-build-config=bootstrap-lto).

All of them except GRAPHITE (-fgraphite-identity -floop-nest-optimize)
seem "pretty safe" on the architectures I have hardware for.  GRAPHITE
causes a bootstrap failure on AArch64 with GCC 13 (PR109929) when
combined with PGO, and the real cause has not been found yet.

But when I do a test build I normally only enable the flags which may
help to catch some issues, for example when a change only affects LTO I
add --with-build-config=bootstrap-lto, when changing something related
to LASX I use -O3 -mlasx (or -O3 -march=la664) as BOOT_CFLAGS.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Pushed: [PATCH] LoongArch: Fix wrong LSX FP vector negation

2024-02-04 Thread Xi Ruoyao
On Sun, 2024-02-04 at 11:20 +0800, chenglulu wrote:
> 
> On 2024/2/3 at 4:58 PM, Xi Ruoyao wrote:
> > We expanded (neg x) to (minus const0 x) for LSX FP vectors; this is
> > wrong because -0.0 is not 0 - 0.0.  This causes some Python tests to
> > fail when Python is built with LSX enabled.
> > 
> > Use the vbitrevi.{d/w} instructions to simply reverse the sign bit
> > instead.  We are already doing this for LASX and now we can unify them
> > into simd.md.
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/lsx.md (neg<mode>2): Remove the
> > incorrect expand.
> > * config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr.
> > (elmsgnbit): Likewise.
> > (neg<mode>2): New define_insn.
> > * config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they
> > are now instantiated in simd.md.
> > ---
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> 
> LGTM!
> 
> Thanks!

Pushed r14-8786.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Pushed: [PATCH] LoongArch: Fix an ODR violation

2024-02-03 Thread Xi Ruoyao
On Fri, 2024-02-02 at 10:42 +0800, chenglulu wrote:
> LGTM!
> 
> Thanks!

Pushed r14-8773.

> On 2024/2/2 at 5:54 AM, Xi Ruoyao wrote:
> > When bootstrapping GCC 14 --with-build-config=bootstrap-lto, an ODR
> > violation is detected:
> > 
> >  ../../gcc/config/loongarch/loongarch-opts.cc:57: warning:
> >  'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr]
> >  57 | abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
> >  ../../gcc/config/loongarch/loongarch-def.cc:186: note:
> >  'abi_minimal_isa' was previously declared here
> >  186 |   abi_minimal_isa = array,
> >  ../../gcc/config/loongarch/loongarch-def.cc:186: note:
> >  code may be misoptimized unless '-fno-strict-aliasing' is used
> > 
> > Fix it by adding a proper declaration of abi_minimal_isa into
> > loongarch-def.h and remove the ODR-violating local declaration in
> > loongarch-opts.cc.
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch-def.h (abi_minimal_isa): Declare.
> > * config/loongarch/loongarch-opts.cc (abi_minimal_isa): Remove
> > the ODR-violating local declaration.
> > ---
> > 
> > Bootstrapped on loongarch64-linux-gnu.  Not fully regtested but it
> > should be an obvious fix.  Ok for trunk?
> > 
> >   gcc/config/loongarch/loongarch-def.h   | 3 +++
> >   gcc/config/loongarch/loongarch-opts.cc | 2 --
> >   2 files changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/gcc/config/loongarch/loongarch-def.h 
> > b/gcc/config/loongarch/loongarch-def.h
> > index a1237ecf1fd..2dbf006d013 100644
> > --- a/gcc/config/loongarch/loongarch-def.h
> > +++ b/gcc/config/loongarch/loongarch-def.h
> > @@ -203,5 +203,8 @@ extern loongarch_def_array > N_TUNE_TYPES>
> >     loongarch_cpu_align;
> >   extern loongarch_def_array
> >     loongarch_cpu_rtx_cost_data;
> > +extern loongarch_def_array<
> > +  loongarch_def_array<loongarch_isa, N_ABI_EXT_TYPES>,
> > +  N_ABI_BASE_TYPES> abi_minimal_isa;
> >   
> >   #endif /* LOONGARCH_DEF_H */
> > diff --git a/gcc/config/loongarch/loongarch-opts.cc 
> > b/gcc/config/loongarch/loongarch-opts.cc
> > index b87299513c9..7eeac43ed2f 100644
> > --- a/gcc/config/loongarch/loongarch-opts.cc
> > +++ b/gcc/config/loongarch/loongarch-opts.cc
> > @@ -53,8 +53,6 @@ static const int tm_multilib_list[] = { TM_MULTILIB_LIST 
> > };
> >   static int enabled_abi_types[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES] = { 0 };
> >   
> >   #define isa_required(ABI) (abi_minimal_isa[(ABI).base][(ABI).ext])
> > -extern "C" const struct loongarch_isa
> > -abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
> >   
> >   static inline int
> >   is_multilib_enabled (struct loongarch_abi abi)
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] LoongArch: Fix wrong LSX FP vector negation

2024-02-03 Thread Xi Ruoyao
We expanded (neg x) to (minus const0 x) for LSX FP vectors; this is
wrong because -0.0 is not 0 - 0.0.  This causes some Python tests to
fail when Python is built with LSX enabled.

Use the vbitrevi.{d/w} instructions to simply reverse the sign bit
instead.  We are already doing this for LASX and now we can unify them
into simd.md.

gcc/ChangeLog:

* config/loongarch/lsx.md (neg<mode>2): Remove the
incorrect expand.
* config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr.
(elmsgnbit): Likewise.
(neg<mode>2): New define_insn.
* config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they
are now instantiated in simd.md.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/lasx.md | 16 
 gcc/config/loongarch/lsx.md  | 11 ---
 gcc/config/loongarch/simd.md | 18 ++
 3 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index e2115ffb884..ac84db7f0ce 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -3028,22 +3028,6 @@ (define_insn "absv8sf2"
   [(set_attr "type" "simd_logic")
(set_attr "mode" "V8SF")])
 
-(define_insn "negv4df2"
-  [(set (match_operand:V4DF 0 "register_operand" "=f")
-   (neg:V4DF (match_operand:V4DF 1 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-  "xvbitrevi.d\t%u0,%u1,63"
-  [(set_attr "type" "simd_logic")
-   (set_attr "mode" "V4DF")])
-
-(define_insn "negv8sf2"
-  [(set (match_operand:V8SF 0 "register_operand" "=f")
-   (neg:V8SF (match_operand:V8SF 1 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-  "xvbitrevi.w\t%u0,%u1,31"
-  [(set_attr "type" "simd_logic")
-   (set_attr "mode" "V8SF")])
-
 (define_insn "xvfmadd4"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
(fma:FLASX (match_operand:FLASX 1 "register_operand" "f")
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 7002edae4d4..b9b94b9079c 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -728,17 +728,6 @@ (define_expand "neg2"
   DONE;
 })
 
-(define_expand "neg<mode>2"
-  [(set (match_operand:FLSX 0 "register_operand")
-   (neg:FLSX (match_operand:FLSX 1 "register_operand")))]
-  "ISA_HAS_LSX"
-{
-  rtx reg = gen_reg_rtx (<MODE>mode);
-  emit_move_insn (reg, CONST0_RTX (<MODE>mode));
-  emit_insn (gen_sub<mode>3 (operands[0], reg, operands[1]));
-  DONE;
-})
-
 (define_expand "lsx_vrepli"
   [(match_operand:ILSX 0 "register_operand")
(match_operand 1 "const_imm10_operand")]
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index cb0a19447a1..00ff2823a4e 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -85,12 +85,21 @@ (define_mode_attr simdfmt [(V2DF "d") (V4DF "d")
 (define_mode_attr simdifmt_for_f [(V2DF "l") (V4DF "l")
  (V4SF "w") (V8SF "w")])
 
+;; Suffix for integer mode in LSX or LASX instructions to operating FP
+;; vectors using integer vector operations.
+(define_mode_attr simdfmt_as_i [(V2DF "d") (V4DF "d")
+   (V4SF "w") (V8SF "w")])
+
 ;; Size of vector elements in bits.
 (define_mode_attr elmbits [(V2DI "64") (V4DI "64")
   (V4SI "32") (V8SI "32")
   (V8HI "16") (V16HI "16")
   (V16QI "8") (V32QI "8")])
 
+;; The index of sign bit in FP vector elements.
+(define_mode_attr elmsgnbit [(V2DF "63") (V4DF "63")
+(V4SF "31") (V8SF "31")])
+
 ;; This attribute is used to form an immediate operand constraint using
 ;; "const__operand".
 (define_mode_attr bitimm [(V16QI "uimm3") (V32QI "uimm3")
@@ -457,6 +466,15 @@ (define_expand "reduc__scal_"
   DONE;
 })
 
+;; FP negation.
+(define_insn "neg<mode>2"
+  [(set (match_operand:FVEC 0 "register_operand" "=f")
+   (neg:FVEC (match_operand:FVEC 1 "register_operand" "f")))]
+  ""
+  "<x>vbitrevi.<simdfmt_as_i>\t%<wu>0,%<wu>1,<elmsgnbit>"
+  [(set_attr "type" "simd_logic")
+   (set_attr "mode" "<MODE>")])
+
 ; The LoongArch SX Instructions.
 (include "lsx.md")
 
-- 
2.43.0



[PATCH] LoongArch: Avoid out-of-bounds access in loongarch_symbol_insns

2024-02-02 Thread Xi Ruoyao
We call loongarch_symbol_insns with mode = MAX_MACHINE_MODE sometimes.
But in loongarch_symbol_insns:

if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))
  return 0;

And LSX_SUPPORTED_MODE_P is defined as:

#define LSX_SUPPORTED_MODE_P(MODE) \
  (ISA_HAS_LSX \
   && GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG ... ...

GET_MODE_SIZE is expanded to a call to mode_to_bytes, which is defined:

ALWAYS_INLINE poly_uint16
mode_to_bytes (machine_mode mode)
{
#if GCC_VERSION >= 4001
  return (__builtin_constant_p (mode)
  ? mode_size_inline (mode) : mode_size[mode]);
#else
  return mode_size[mode];
#endif
}

There is an assertion in mode_size_inline:

gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);

Note that NUM_MACHINE_MODES = MAX_MACHINE_MODE (emitted by genmodes.cc),
thus if __builtin_constant_p (mode) is evaluated true (it happens when
GCC is bootstrapped with LTO+PGO), the assertion will be triggered and
cause an ICE.  OTOH if __builtin_constant_p (mode) is evaluated false,
mode_size[mode] is still an out-of-bounds array access (the length of
the mode_size array is NUM_MACHINE_MODES).

So we shouldn't call LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P with
MAX_MACHINE_MODE in loongarch_symbol_insns.  This is very similar to a
MIPS bug PR98491 fixed by me about 3 years ago.
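
The shape of the problem and of the fix can be sketched with a toy model in
C.  All toy_* names below are hypothetical stand-ins (for machine_mode,
mode_size[] and the *_SUPPORTED_MODE_P macros); this is not the real GCC
code, only the same pattern in miniature:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for machine_mode and mode_size[].  The last enumerator
   is a sentinel, not a real mode, and equals the table length.  */
enum toy_mode { TOY_SI, TOY_DI, TOY_V4SI, TOY_MAX_MODE };
#define TOY_NUM_MODES ((int) TOY_MAX_MODE)

static const unsigned char toy_mode_size[TOY_NUM_MODES] = { 4, 8, 16 };

/* Valid only for real modes: indexing with TOY_MAX_MODE would read one
   element past the end of the table, so assert like mode_size_inline.  */
static bool
toy_vector_mode_p (enum toy_mode mode)
{
  assert ((int) mode >= 0 && (int) mode < TOY_NUM_MODES);
  return toy_mode_size[mode] == 16;
}

/* Test for the sentinel first so that && short-circuits and the table
   lookup is never reached with an out-of-range index.  */
static int
toy_symbol_insns (enum toy_mode mode)
{
  if (mode != TOY_MAX_MODE && toy_vector_mode_p (mode))
    return 0;
  return 1;
}
```

Calling toy_symbol_insns (TOY_MAX_MODE) is safe here only because of the
guard; dropping it would either trip the assertion or read past the table,
which corresponds to the two outcomes described above.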

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_symbol_insns): Do not
use LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P if mode is
MAX_MACHINE_MODE.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 963e86d61af..6badef45d62 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2007,7 +2007,8 @@ loongarch_symbol_insns (enum loongarch_symbol_type type, 
machine_mode mode)
 {
   /* LSX LD.* and ST.* cannot support loading symbols via an immediate
  operand.  */
-  if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))
+  if (mode != MAX_MACHINE_MODE
+  && (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode)))
 return 0;
 
   switch (type)
-- 
2.43.0



[PATCH] LoongArch: Fix an ODR violation

2024-02-01 Thread Xi Ruoyao
When bootstrapping GCC 14 --with-build-config=bootstrap-lto, an ODR
violation is detected:

../../gcc/config/loongarch/loongarch-opts.cc:57: warning:
'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr]
57 | abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
../../gcc/config/loongarch/loongarch-def.cc:186: note:
'abi_minimal_isa' was previously declared here
186 |   abi_minimal_isa = array,
../../gcc/config/loongarch/loongarch-def.cc:186: note:
code may be misoptimized unless '-fno-strict-aliasing' is used

Fix it by adding a proper declaration of abi_minimal_isa into
loongarch-def.h and remove the ODR-violating local declaration in
loongarch-opts.cc.

gcc/ChangeLog:

* config/loongarch/loongarch-def.h (abi_minimal_isa): Declare.
* config/loongarch/loongarch-opts.cc (abi_minimal_isa): Remove
the ODR-violating local declaration.
---

Bootstrapped on loongarch64-linux-gnu.  Not fully regtested but it
should be an obvious fix.  Ok for trunk?

 gcc/config/loongarch/loongarch-def.h   | 3 +++
 gcc/config/loongarch/loongarch-opts.cc | 2 --
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-def.h 
b/gcc/config/loongarch/loongarch-def.h
index a1237ecf1fd..2dbf006d013 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -203,5 +203,8 @@ extern loongarch_def_array
   loongarch_cpu_align;
 extern loongarch_def_array
   loongarch_cpu_rtx_cost_data;
+extern loongarch_def_array<
+  loongarch_def_array<loongarch_isa, N_ABI_EXT_TYPES>,
+  N_ABI_BASE_TYPES> abi_minimal_isa;
 
 #endif /* LOONGARCH_DEF_H */
diff --git a/gcc/config/loongarch/loongarch-opts.cc 
b/gcc/config/loongarch/loongarch-opts.cc
index b87299513c9..7eeac43ed2f 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -53,8 +53,6 @@ static const int tm_multilib_list[] = { TM_MULTILIB_LIST };
 static int enabled_abi_types[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES] = { 0 };
 
 #define isa_required(ABI) (abi_minimal_isa[(ABI).base][(ABI).ext])
-extern "C" const struct loongarch_isa
-abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
 
 static inline int
 is_multilib_enabled (struct loongarch_abi abi)
-- 
2.43.0



Re: [PATCH] Change gcc/ira-conflicts.cc build_conflict_bit_table to use size_t/%zu

2024-02-01 Thread Xi Ruoyao
On Thu, 2024-02-01 at 14:55 +0100, Jakub Jelinek wrote:
> On Thu, Feb 01, 2024 at 01:42:03PM +, Jonathan Yong wrote:
> > On 2/1/24 13:06, Xi Ruoyao wrote:
> > > On Thu, 2024-02-01 at 14:01 +0100, Jakub Jelinek wrote:
> > > > On Thu, Feb 01, 2024 at 12:45:31PM +, Jonathan Yong wrote:
> > > > > Attached patch OK? Copied inline for review convenience.
> > > > 
> > > > No, I think e.g. AIX doesn't support the z modifier.
> > > > I don't see %zd or %zu used anywhere except in gcc/jit/ which presumably
> > > > doesn't work on AIX.
> > > > 
> > > 
> > > Should use HOST_WIDE_INT_PRINT_UNSIGNED instead of PRIu64.
> > > 
> > Updated the patch with the suggestions.

I mean if you are casting it to unsigned HOST_WIDE_INT, you should use
HOST_WIDE_INT_PRINT_UNSIGNED.  If you are casting it to size_t you
cannot use it (as Jakub has explained).

When you use printf-like things you have to keep the correspondence
between the format specifier and the argument itself.
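
A minimal sketch of that correspondence, outside of GCC: wide_uint and
WIDE_UINT_PRINT below are hypothetical stand-ins for unsigned
HOST_WIDE_INT and HOST_WIDE_INT_PRINT_UNSIGNED, and format_bytes is made up
for the example.  The point is only that the cast and the format macro
always travel together:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins: one wide unsigned type and the one format
   string that matches it.  */
typedef unsigned long long wide_uint;
#define WIDE_UINT_PRINT "%llu"

/* Format a size_t without relying on the "z" length modifier: cast the
   argument to the wide type and use the matching format macro.  */
static int
format_bytes (char *buf, size_t buflen, size_t nbytes)
{
  return snprintf (buf, buflen,
		   "allocating " WIDE_UINT_PRINT " bytes",
		   (wide_uint) nbytes);
}
```

Passing the size_t uncast with this format, or casting to the wide type
but printing with a format meant for another type, is undefined behavior
on hosts where the two types differ in width.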

> No, that is wrong.  That will break bootstrap on lots of hosts, any time
> size_t is not unsigned long (if unsigned long is 64-bit) or unsigned long
> long (if unsigned long is not 64-bit).
> That includes e.g. all targets where size_t is unsigned int, and some others
> too.
> 
>   Jakub
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] Change gcc/ira-conflicts.cc build_conflict_bit_table to use size_t/%zu

2024-02-01 Thread Xi Ruoyao
On Thu, 2024-02-01 at 14:01 +0100, Jakub Jelinek wrote:
> On Thu, Feb 01, 2024 at 12:45:31PM +, Jonathan Yong wrote:
> > Attached patch OK? Copied inline for review convenience.
> 
> No, I think e.g. AIX doesn't support the z modifier.
> I don't see %zd or %zu used anywhere except in gcc/jit/ which presumably
> doesn't work on AIX.
> 
> If you really want to avoid truncation, perhaps do something like
>   if (internal_flag_ira_verbose > 0 && ira_dump_file != NULL)
>     {
>   if (sizeof (void *) <= sizeof (long))
>   fprintf (ira_dump_file,
>"+++Allocating %lu bytes for conflict table "
>"(uncompressed size %lu)\n",
>(unsigned long) (sizeof (IRA_INT_TYPE) * allocated_words_num),
>(unsigned long) (sizeof (IRA_INT_TYPE) * object_set_words
>     * ira_objects_num));
>   else
>   fprintf (ira_dump_file,
>"+++Allocating %l" PRIu64 "bytes for conflict table "
>"(uncompressed size %" PRIu64 ")\n",

Should use HOST_WIDE_INT_PRINT_UNSIGNED instead of PRIu64.

>(unsigned HOST_WIDE_INT) (sizeof (IRA_INT_TYPE)
>      * allocated_words_num),
>(unsigned HOST_WIDE_INT) (sizeof (IRA_INT_TYPE)
>      * object_set_words
>      * ira_objects_num));
>     }

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Fix soft-float builds of libffi

2024-01-31 Thread Xi Ruoyao
On Sat, 2024-01-27 at 15:09 +0800, Yang Yujie wrote:
> This patch correspond to the upstream PR:
> https://github.com/libffi/libffi/pull/817
> 
> libffi/ChangeLog:
> 
>   * src/loongarch64/ffi.c: Avoid defining floats
>   in struct call_context if the ABI is soft-float.

You need to wait until the PR is accepted by the libffi maintainers.
Frankly I don't know what the libffi maintainers are busy with and I'm
frustrated as well (having a MIPS patch unreviewed there for a month)
but this is the procedure :(.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-27 Thread Xi Ruoyao
On Sat, 2024-01-27 at 18:02 +0800, Xi Ruoyao wrote:
> On Sat, 2024-01-27 at 11:15 +0800, chenglulu wrote:
> > 
> > On 2024/1/26 at 6:57 PM, Xi Ruoyao wrote:
> > > On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote:
> > > > On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote:
> > > > > On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:
> > > > > > v3 -> v4:
> > > > > >     1. Add macro support for TLS symbols
> > > > > >     2. Added support for loading __get_tls_addr symbol address 
> > > > > > using call36.
> > > > > >     3. Merge template got_load_tls_{ld/gd/le/ie}.
> > > > > >     4. Enable explicit reloc for extreme TLS GD/LD with 
> > > > > > -mexplicit-relocs=auto.
> > > > > I've rebased and attached the patch to fix the bad split in 
> > > > > -mexplicit-
> > > > > relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
> > > > > tested it seriously though (only tested the added and modified test
> > > > > cases).
> > > > > 
> > > > OK, I'll test the spec for correctness.
> > > I suppose this still won't work yet because Binutils is not fully fixed.
> > > GAS has been changed not to emit R_LARCH_RELAX for "la.tls.ie a0, t0,
> > > foo", but ld is still not checking if an R_LARCH_RELAX is after
> > > R_LARCH_TLS_IE_PC_{HI20,LO12} properly.  Thus an invalid "partial" TLS
> > > transition can still happen.
> > > 
> > 
> > The following situations are not handled in the patch:
> > 
> > diff --git a/gcc/config/loongarch/loongarch.cc 
> > b/gcc/config/loongarch/loongarch.cc
> > 
> > index 3fab4b64453..6336a9f696f 100644
> > --- a/gcc/config/loongarch/loongarch.cc
> > +++ b/gcc/config/loongarch/loongarch.cc
> > @@ -7472,7 +7472,13 @@ loongarch_output_mi_thunk (FILE *file, tree 
> > thunk_fndecl ATTRIBUTE_UNUSED,
> >   {
> >     if (TARGET_CMODEL_EXTREME)
> >  {
> > - emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
> > + if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
> > +   {
> > + emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
> > + emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
> > +   }
> > + else
> > +   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));

It looks like this part is unreachable: with -mcmodel=extreme
use_sibcall_p will never be true.

So I cleaned up this part and fixed an ERROR in the added test:

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 3a97ba61362..7b8c85a1606 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7481,21 +7481,24 @@ loongarch_output_mi_thunk (FILE *file, tree 
thunk_fndecl ATTRIBUTE_UNUSED,
  allowed, otherwise load the address into a register first.  */
   if (use_sibcall_p)
 {
-  if (TARGET_CMODEL_EXTREME)
-   {
- emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
- insn = emit_call_insn (gen_sibcall_internal (temp1, const0_rtx));
-   }
-  else
-   insn = emit_call_insn (gen_sibcall_internal (fnaddr, const0_rtx));
+  /* If TARGET_CMODEL_EXTREME, we cannot do a direct jump at all
+and const_call_insn_operand should have returned false.  */
+  gcc_assert (!TARGET_CMODEL_EXTREME);
+
+  insn = emit_call_insn (gen_sibcall_internal (fnaddr, const0_rtx));
   SIBLING_CALL_P (insn) = 1;
 }
   else
 {
-  if (TARGET_CMODEL_EXTREME)
+  if (!TARGET_CMODEL_EXTREME)
+   loongarch_emit_move (temp1, fnaddr);
+  else if (la_opt_explicit_relocs == EXPLICIT_RELOCS_NONE)
emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
   else
-   loongarch_emit_move (temp1, fnaddr);
+   {
+ emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
+ emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
+   }
 
   emit_jump_insn (gen_indirect_jump (temp1));
 }
diff --git 
a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c 
b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
index 27baf4886d6..35bd4570a9e 100644
--- 
a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
+++ 
b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fPIC -mexplicit-relocs=auto -mcmodel=extreme -fno-plt" } 
*/
-/* { dg-final { scan-assembler-not "la.tls.[lg]d" { target tls_n

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-27 Thread Xi Ruoyao
On Sat, 2024-01-27 at 11:15 +0800, chenglulu wrote:
> 
> On 2024/1/26 at 6:57 PM, Xi Ruoyao wrote:
> > On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote:
> > > On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote:
> > > > On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:
> > > > > v3 -> v4:
> > > > >     1. Add macro support for TLS symbols
> > > > >     2. Added support for loading __get_tls_addr symbol address using 
> > > > > call36.
> > > > >     3. Merge template got_load_tls_{ld/gd/le/ie}.
> > > > >     4. Enable explicit reloc for extreme TLS GD/LD with 
> > > > > -mexplicit-relocs=auto.
> > > > I've rebased and attached the patch to fix the bad split in -mexplicit-
> > > > relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
> > > > tested it seriously though (only tested the added and modified test
> > > > cases).
> > > > 
> > > OK, I'll test the spec for correctness.
> > I suppose this still won't work yet because Binutils is not fully fixed.
> > GAS has been changed not to emit R_LARCH_RELAX for "la.tls.ie a0, t0,
> > foo", but ld is still not checking if an R_LARCH_RELAX is after
> > R_LARCH_TLS_IE_PC_{HI20,LO12} properly.  Thus an invalid "partial" TLS
> > transition can still happen.
> > 
> 
> The following situations are not handled in the patch:
> 
> diff --git a/gcc/config/loongarch/loongarch.cc 
> b/gcc/config/loongarch/loongarch.cc
> 
> index 3fab4b64453..6336a9f696f 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -7472,7 +7472,13 @@ loongarch_output_mi_thunk (FILE *file, tree 
> thunk_fndecl ATTRIBUTE_UNUSED,
>   {
>     if (TARGET_CMODEL_EXTREME)
>  {
> - emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
> + if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
> +   {
> + emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
> + emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
> +   }
> + else
> +   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
>    insn = emit_call_insn (gen_sibcall_internal (temp1, const0_rtx));
>  }
>     else
> @@ -7482,7 +7488,15 @@ loongarch_output_mi_thunk (FILE *file, tree 
> thunk_fndecl ATTRIBUTE_UNUSED,
>     else
>   {
>     if (TARGET_CMODEL_EXTREME)
> -   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
> +   {
> + if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
> +   {
> + emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
> + emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
> +   }
> + else
> +   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
> +   }
>     else
>  loongarch_emit_move (temp1, fnaddr);

Indeed.  Considering the similarity of these two hunks, I'll separate
the logic into a static function though.  And I'll also add some test
cases for them...

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-26 Thread Xi Ruoyao
On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote:
> 
> On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote:
> > On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:
> > > v3 -> v4:
> > >    1. Add macro support for TLS symbols
> > >    2. Added support for loading __get_tls_addr symbol address using 
> > > call36.
> > >    3. Merge template got_load_tls_{ld/gd/le/ie}.
> > >    4. Enable explicit reloc for extreme TLS GD/LD with 
> > > -mexplicit-relocs=auto.
> > I've rebased and attached the patch to fix the bad split in -mexplicit-
> > relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
> > tested it seriously though (only tested the added and modified test
> > cases).
> > 
> OK, I'll test the spec for correctness.

I suppose this still won't work yet because Binutils is not fully fixed.
GAS has been changed not to emit R_LARCH_RELAX for "la.tls.ie a0, t0,
foo", but ld is still not checking if an R_LARCH_RELAX is after
R_LARCH_TLS_IE_PC_{HI20,LO12} properly.  Thus an invalid "partial" TLS
transition can still happen.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4 2/4] LoongArch: Add the macro implementation of mcmodel=extreme.

2024-01-26 Thread Xi Ruoyao
On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:
> +;; Use two registers to get the global symbol address from the got table.
> +;; la.global rd, rt, sym
> +
> +(define_insn_and_split "movdi_symbolic_off64"
> + [(set (match_operand:DI 0 "register_operand" "=r,r")
> +   (match_operand:DI 1 "symbolic_off64_or_reg_operand" "Yd,r"))
> +  (unspec:DI [(const_int 0)]
> +    UNSPEC_LOAD_SYMBOL_OFFSET64)
> +  (clobber (match_operand:DI 2 "register_operand" "=&r,r"))]
> + "TARGET_64BIT && TARGET_CMODEL_EXTREME"
> +{
> +  if (which_alternative == 1)
> +    return "#";
> +
> +  enum loongarch_symbol_type symbol_type;
> +  gcc_assert (loongarch_symbolic_constant_p (operands[1], &symbol_type));
> +
> +  switch (symbol_type)
> +    {
> +    case SYMBOL_PCREL64:
> +  return "la.local\t%0,%2,%1";
> +    case SYMBOL_GOT_DISP:
> +  return "la.global\t%0,%2,%1";
> +    case SYMBOL_TLS_IE:
> +  return "la.tls.ie\t%0,%2,%1";
> +    case SYMBOL_TLSGD:
> +  return "la.tls.gd\t%0,%2,%1";
> +    case SYMBOL_TLSLDM:
> +  return "la.tls.ld\t%0,%2,%1";
> +
> +    default:
> +  gcc_unreachable ();
> +  }
> +}
> + "&& REG_P (operands[1]) && find_reg_note (insn, REG_UNUSED, operands[2]) != 
> 0"
> + [(set (match_dup 0) (match_dup 1))]
> + ""
> + [(set_attr "mode" "DI")
> +  (set_attr "length" "5")])

Should be 20, in bytes.
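
As a rough sketch of the arithmetic, assuming the 5 in the patch was
meant as an instruction count and relying on LoongArch instructions
being a fixed 4 bytes each (the helper name below is made up for
illustration and is not part of GCC):

```c
#include <assert.h>

/* Every LoongArch instruction is 32 bits wide.  */
enum { LARCH_INSN_BYTES = 4 };

/* Hypothetical helper: convert an instruction count into the byte
   count that the "length" attribute expects.  */
static unsigned
length_attr_bytes (unsigned insn_count)
{
  return insn_count * LARCH_INSN_BYTES;
}
```

With this, length_attr_bytes (5) gives 20, the value suggested above.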

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4 1/4] LoongArch: Merge template got_load_tls_{ld/gd/le/ie}.

2024-01-26 Thread Xi Ruoyao
On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:

> +(define_insn "@load_tls"
>    [(set (match_operand:P 0 "register_operand" "=r")
>   (unspec:P
>       [(match_operand:P 1 "symbolic_operand" "")]
> -     UNSPEC_TLS_GD))]
> +     UNSPEC_TLS))]

/* snip */

> +{
> +  enum loongarch_symbol_type symbol_type;
> +  gcc_assert (loongarch_symbolic_constant_p (operands[1], &symbol_type));

/* snip */

> +  switch (symbol_type)
> +    {
> +    case SYMBOL_TLS_LE:
> +  return "la.tls.le\t%0,%1";
> +    case SYMBOL_TLS_IE:
> +  return "la.tls.ie\t%0,%1";
> +    case SYMBOL_TLSLDM:
> +  return "la.tls.ld\t%0,%1";
> +    case SYMBOL_TLSGD:
> +  return "la.tls.gd\t%0,%1";

/* snip */

> +    default:
> +  gcc_unreachable ();
> +    }
> +}
> +  [(set_attr "mode" "")
> +   (set_attr "length" "2")])

Should be 8, it's in bytes.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-26 Thread Xi Ruoyao
On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:
> v3 -> v4:
>   1. Add macro support for TLS symbols
>   2. Added support for loading __get_tls_addr symbol address using call36.
>   3. Merge template got_load_tls_{ld/gd/le/ie}.
>   4. Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.

I've rebased and attached the patch to fix the bad split in -mexplicit-
relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
tested it seriously though (only tested the added and modified test
cases).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
From 87c9eafd88ae4a4339e094af08c77e7dfc9ea700 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Fri, 5 Jan 2024 18:40:06 +0800
Subject: [PATCH] LoongArch: Don't split the instructions containing relocs for
 extreme code model

The ABI mandates that the pcalau12i/addi.d/lu32i.d/lu52i.d instructions
for addressing a symbol be adjacent.  So model them as "one large
instruction", i.e. define_insn, with two output registers.  The real
address is the sum of these two registers.

The advantage of this approach is that the RTL passes can still use
ldx/stx instructions to skip an addi.d instruction.
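
To illustrate why the sum of the two registers yields the address, here
is a simplified stand-alone model in C.  It is only a sketch under
stated assumptions: it assumes pcalau12i writes the first register and
the addi.d/lu32i.d/lu52i.d chain writes the second one, it derives the
immediates by plain arithmetic rather than by the psABI relocation
formulas, and the names two_parts, sext and la_pcrel64_model are made
up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of output registers.  */
struct two_parts { uint64_t part1, part2; };

/* Sign-extend the low BITS bits of V.  */
static int64_t
sext (uint64_t v, unsigned bits)
{
  uint64_t m = 1ull << (bits - 1);
  v &= (m << 1) - 1;
  return (int64_t) ((v ^ m) - m);
}

/* Model the four-instruction sequence executed at PC for TARGET.  */
static struct two_parts
la_pcrel64_model (uint64_t pc, uint64_t target)
{
  struct two_parts r;
  uint64_t page = pc & ~0xfffull;
  uint64_t delta = target - page;

  /* Immediate of addi.d; only its low 32 bits survive lu32i.d.  */
  uint64_t lo12 = delta & 0xfff;
  uint64_t low32 = (uint64_t) sext (lo12, 12) & 0xffffffffull;

  /* Immediate of pcalau12i, chosen so the low 32 bits cancel out.  */
  uint64_t rem = delta - low32;
  uint64_t hi20 = (rem >> 12) & 0xfffff;
  uint64_t part1_off = (uint64_t) sext (hi20 << 12, 32);

  /* What is left has zero low 32 bits; split it for lu32i.d/lu52i.d.  */
  uint64_t high = rem - part1_off;
  uint64_t lo20_64 = (high >> 32) & 0xfffff;
  uint64_t hi12_64 = high >> 52;

  /* pcalau12i part1, hi20  */
  r.part1 = page + part1_off;

  /* addi.d part2, $zero, lo12  */
  uint64_t t = (uint64_t) sext (lo12, 12);
  /* lu32i.d part2, lo20_64: bits 63..32 replaced, low 32 bits kept.  */
  t = ((uint64_t) sext (lo20_64, 20) << 32) | (t & 0xffffffffull);
  /* lu52i.d part2, part2, hi12_64: bits 63..52 replaced.  */
  t = (hi12_64 << 52) | (t & 0xfffffffffffffull);
  r.part2 = t;

  return r;
}
```

In this model part1 + part2 equals the target address (modulo 2^64) for
any pc, which is what lets a later ldx/stx take the two registers as
base and index.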

gcc/ChangeLog:

	* config/loongarch/loongarch.md (unspec): Add
	UNSPEC_LA_PCREL_64_PART1 and UNSPEC_LA_PCREL_64_PART2.
	(la_pcrel64_two_parts): New define_insn.
	* config/loongarch/loongarch.cc (loongarch_tls_symbol): Fix a
	typo in the comment.
	(loongarch_call_tls_get_addr): If -mcmodel=extreme
	-mexplicit-relocs={always,auto}, use la_pcrel64_two_parts for
	addressing the TLS symbol and __tls_get_addr.  Emit an REG_EQUAL
	note to allow CSE addressing __tls_get_addr.
	(loongarch_legitimize_tls_address): If -mcmodel=extreme
	-mexplicit-relocs={always,auto}, address TLS IE symbols with
	la_pcrel64_two_parts.
	(loongarch_split_symbol): If -mcmodel=extreme
	-mexplicit-relocs={always,auto}, address symbols with
	la_pcrel64_two_parts.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/func-call-extreme-1.c (dg-options):
	Use -O2 instead of -O0 to ensure the pcalau12i/addi/lu32i/lu52i
	instruction sequences are not reordered by the compiler.
	(NOIPA): Disallow interprocedural optimizations.
	* gcc.target/loongarch/func-call-extreme-2.c: Remove the content
	duplicated from func-call-extreme-1.c, include it instead.
	(dg-options): Likewise.
	* gcc.target/loongarch/func-call-extreme-3.c (dg-options):
	Likewise.
	* gcc.target/loongarch/func-call-extreme-4.c (dg-options):
	Likewise.
	* gcc.target/loongarch/cmodel-extreme-1.c: New test.
	* gcc.target/loongarch/cmodel-extreme-2.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 112 ++
 gcc/config/loongarch/loongarch.md |  20 
 .../gcc.target/loongarch/cmodel-extreme-1.c   |  18 +++
 .../gcc.target/loongarch/cmodel-extreme-2.c   |   7 ++
 .../loongarch/func-call-extreme-1.c   |  14 ++-
 .../loongarch/func-call-extreme-2.c   |  29 +
 .../loongarch/func-call-extreme-3.c   |   2 +-
 .../loongarch/func-call-extreme-4.c   |   2 +-
 8 files changed, 122 insertions(+), 82 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/cmodel-extreme-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/cmodel-extreme-2.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 481903147b8..e70ce80c7b3 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2737,7 +2737,7 @@ loongarch_add_offset (rtx temp, rtx reg, HOST_WIDE_INT offset)
   return plus_constant (Pmode, reg, offset);
 }
 
-/* The __tls_get_attr symbol.  */
+/* The __tls_get_addr symbol.  */
 static GTY (()) rtx loongarch_tls_symbol;
 
 /* Load an entry for a TLS access.  */
@@ -2777,20 +2777,22 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
 
   if (loongarch_explicit_relocs_p (type))
 {
-  /* Split tls symbol to high and low.  */
-  rtx high = gen_rtx_HIGH (Pmode, copy_rtx (loc));
-  high = loongarch_force_temporary (tmp, high);
-
   if (TARGET_CMODEL_EXTREME)
 	{
-	  rtx tmp1 = gen_reg_rtx (Pmode);
-	  emit_insn (gen_tls_low (Pmode, tmp1, gen_rtx_REG (Pmode, 0), loc));
-	  emit_insn (gen_lui_h_lo20 (tmp1, tmp1, loc));
-	  emit_insn (gen_lui_h_hi12 (tmp1, tmp1, loc));
-	  emit_move_insn (a0, gen_rtx_PLUS (Pmode, high, tmp1));
+	  rtx part1 = gen_reg_rtx (Pmode);
+	  rtx part2 = gen_reg_rtx (Pmode);
+
+	  emit_insn (gen_la_pcrel64_two_parts (part1, part2, loc));
+	  emit_move_insn (a0, gen_rtx_PLUS (Pmode, part1, part2));
 	}
   else
-	emit_insn (gen_tls_low (Pmode, a0, high, loc));
+	{
+	  /* Split tls symbol to high and low.  */
+	  rtx high = gen_rtx_HIGH (Pmode, copy_rtx (loc));
+
+	  high = loongarch_force_temporary (tmp, high);
+	  emit_insn (gen_tls_low (Pmode, a0, high, loc));
+	}
 }
   else
 emit_insn (loongarch_load_tls (a0, loc, type));
@@ -2870,20 +2872,30 @@ loongarc

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-24 Thread Xi Ruoyao
On Thu, 2024-01-25 at 08:48 +0800, chenglulu wrote:
> 
> On 2024/1/24 at 3:36 AM, Xi Ruoyao wrote:
> > On Mon, 2024-01-22 at 15:27 +0800, chenglulu wrote:
> > > > > The failure of this test case was because the compiler believes that 
> > > > > two
> > > > > (UNSPEC_PCREL_64_PART2 [(symbol)]) instances would always produce the
> > > > > same result, but this isn't true because the result depends on PC.  
> > > > > Thus
> > > > > (pc) needed to be included in the RTX, like:
> > > > > 
> > > > >     [(set (match_operand:DI 0 "register_operand" "=r")
> > > > >   (unspec:DI [(match_operand:DI 2 "") (pc)]
> > > > > UNSPEC_LA_PCREL_64_PART1))
> > > > >  (set (match_operand:DI 1 "register_operand" "=r")
> > > > >   (unspec:DI [(match_dup 2) (pc)] UNSPEC_LA_PCREL_64_PART2))]
> > > > > 
> > > > > With this the buggy REG_UNUSED notes were gone.  But it then prevented
> > > > > the CSE when loading the address of __tls_get_addr (i.e. if we address
> > > > > > 10 TLS LD symbols in a function it would emit 10 instances of 
> > > > > "la.global
> > > > > __tls_get_addr") so I added an REG_EQUAL note for it.  For symbols 
> > > > > other
> > > > > than __tls_get_addr such notes are added automatically by optimization
> > > > > passes.
> > > > > 
> > > > > Updated patch attached.
> > > > > 
> > > > I'm eliminating redundant la.global directives in my macro
> > > > implementation.
> > > > 
> > > > I will be testing this patch.
> > > > 
> > > > 
> > > > 
> > > > 
> > > With this patch, spec2006 can pass the test, but spec2017 621 and 654
> > > tests fail.
> > > I haven't debugged the specific cause of the problem yet.
> > Try removing the TARGET_DELEGITIMIZE_ADDRESS hook?  After eating some
> > unhealthy food in the midnight I realized the hook only
> > papers over the same issue caused spec2006 failure.  I tried a bootstrap
> > with BOOT_CFLAGS=-O2 -g -mcmodel=extreme and TARGET_DELEGITIMIZE_ADDRESS
> > commented out, and there is no more spurious "note: non-delegitimized
> > UNSPEC UNSPEC_LA_PCREL_64_PART1 (42) found in variable location" things.
> > I feel that this hook is still written in a buggy way, so maybe removing
> > it will solve the spec2017 issue.
> > 
> I found the problem. Binutils did not consider the four instructions 
> when converting the type from TLS IE to TLS LE, which caused the conversion 
> error.

Oooops.  We'd better fix this quickly, as the Binutils 2.42 release is
imminent.

Maybe we can just disable TLS linker optimization once we see an
R_LARCH_TLS_DESC64* or R_LARCH_TLS_IE64*.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] testsuite: Make pr104992.c irrelated to target vector feature [PR113418]

2024-01-24 Thread Xi Ruoyao
On Wed, 2024-01-24 at 19:08 +0800, chenxiaolong wrote:
> At 19:00 +0800 on Wednesday, 2024-01-24, Xi Ruoyao wrote:
> > On Wed, 2024-01-24 at 18:32 +0800, chenxiaolong wrote:
> > > On 20:09 +0800 on Tuesday, 2024-01-23, Xi Ruoyao wrote:
> > > > The vect_int_mod target selector is evaluated with the options in
> > > > DEFAULT_VECTCFLAGS in effect, but these options are not
> > > > automatically
> > > > passed to tests out of the vect directories.  So this test fails
> > > > on
> > > > targets where integer vector modulo operation is supported but
> > > > requiring
> > > > an option to enable, for example LoongArch.
> > > > 
> > > > In this test case, the only expected optimization not happened in
> > > > original is in corge because it needs forward propogation.  So we
> > > > can
> > > > scan the forwprop2 dump (where the vector operation is not
> > > > expanded
> > > > to
> > > > scalars yet) instead of optimized, then we don't need to consider
> > > > vect_int_mod or not.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > PR testsuite/113418
> > > > * gcc.dg/pr104992.c (dg-options): Use -fdump-tree-forwprop2
> > > > instead of -fdump-tree-optimized.
> > > > (dg-final): Scan forwprop2 dump instead of optimized, and
> > > > remove
> > > > the use of vect_int_mod.
> > > > ---
> > > > 
> > > > This fixes the test failure on loongarch64-linux-gnu, and I've
> > > > also
> > > > tested it on x86_64-linux-gnu.  Ok for trunk?
> > > > 
> > > >  gcc/testsuite/gcc.dg/pr104992.c | 5 ++---
> > > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/gcc/testsuite/gcc.dg/pr104992.c
> > > > b/gcc/testsuite/gcc.dg/pr104992.c
> > > > index 82f8c75559c..6fd513d34b2 100644
> > > > --- a/gcc/testsuite/gcc.dg/pr104992.c
> > > > +++ b/gcc/testsuite/gcc.dg/pr104992.c
> > > > @@ -1,6 +1,6 @@
> > > >  /* PR tree-optimization/104992 */
> > > >  /* { dg-do compile } */
> > > > -/* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
> > > > +/* { dg-options "-O2 -Wno-psabi -fdump-tree-forwprop2" } */
> > > >  
> > > >  #define vector __attribute__((vector_size(4*sizeof(int))))
> > > >  
> > > > @@ -54,5 +54,4 @@ __attribute__((noipa)) unsigned waldo (unsigned
> > > > x,
> > > > unsigned y, unsigned z) {
> > > >  return x / y * z == x;
> > > >  }
> > > >  
> > > > -/* { dg-final { scan-tree-dump-times " % " 9 "optimized" {
> > > > target {
> > > > ! vect_int_mod } } } } */
> > > > -/* { dg-final { scan-tree-dump-times " % " 6 "optimized" {
> > > > target
> > > > vect_int_mod } } } */
> > > > +/* { dg-final { scan-tree-dump-times " % " 6 "forwprop2" } } */
> > > 
> > > > Hello, currently vect_int_mod detection is only supported on the
> > > > ppc, amd, riscv and LoongArch architectures.  When
> > > > -fdump-tree-forwprop2 is used instead of -fdump-tree-optimized,
> > > > the check_effective_target_vect_int_mod procedure defined in the
> > > > target-supports.exp file will never be called.  It is only
> > > > called for pr104992.c; should we consider supporting other
> > > > architectures?
> > 
> > Hmm, then we should remove check_effective_target_vect_int_mod.
> > 
> > If we want to keep -fdump-tree-optimized for this test case and also
> > make it correct, we'll at least have to move it into vect/, and write
> > something like
> > 
> > { dg-final { scan-tree-dump-times " % " 9 "optimized" { target { !
> > vect_int_mod } } } }
> > { dg-final { scan-tree-dump-times " % " 6 "optimized" { target {
> > vect_int_mod && vect128 } } } }
> > { dg-final { scan-tree-dump-times " % " 7 "optimized" { target {
> > vect_int_mod && vect64 && !vect128 } } } }
> > 
> > and how about vect256 etc?  This would be very nasty and deviating
> > from
> > the original purpose of this test case (against PR104992, which is a
> > missed-optimization issue unrelated to vectors).
> > 
> Ok, let me think about how to make the pr104992.c test case more
> reasonable.

It *is* reasonable with -fdump-tree-forwprop2.  It's meant to test the
a / b * b -> a - a % b simplification, not vector operations.
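
For reference, the identity behind that simplification can be checked
with a small stand-alone sketch (the function names here are made up
for illustration and do not come from pr104992.c; b must be nonzero):

```c
#include <assert.h>

/* The form as written in source code.  */
static unsigned
div_mul (unsigned a, unsigned b)
{
  return a / b * b;
}

/* The simplified form: subtract the remainder instead.  */
static unsigned
sub_mod (unsigned a, unsigned b)
{
  return a - a % b;
}
```

Both functions round a down to a multiple of b, e.g. 17 and 5 give 15
either way, which is why the test only needs to count " % " in the dump.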

If we need a test to test vector int modulo operations we should write a
new test in vect/, like

/* ... */

for (int i = 0; i < 4; i++)
  x[i] %= y[i];

/* ... */

/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { 
vect_int_mod } } } } */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] testsuite: Make pr104992.c irrelated to target vector feature [PR113418]

2024-01-24 Thread Xi Ruoyao
On Wed, 2024-01-24 at 18:32 +0800, chenxiaolong wrote:
> On 20:09 +0800 on Tuesday, 2024-01-23, Xi Ruoyao wrote:
> > The vect_int_mod target selector is evaluated with the options in
> > DEFAULT_VECTCFLAGS in effect, but these options are not automatically
> > passed to tests out of the vect directories.  So this test fails on
> > targets where integer vector modulo operation is supported but
> > requiring
> > an option to enable, for example LoongArch.
> > 
> > In this test case, the only expected optimization not happened in
> > original is in corge because it needs forward propagation.  So we can
> > scan the forwprop2 dump (where the vector operation is not expanded
> > to
> > scalars yet) instead of optimized, then we don't need to consider
> > vect_int_mod or not.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR testsuite/113418
> > * gcc.dg/pr104992.c (dg-options): Use -fdump-tree-forwprop2
> > instead of -fdump-tree-optimized.
> > (dg-final): Scan forwprop2 dump instead of optimized, and
> > remove
> > the use of vect_int_mod.
> > ---
> > 
> > This fixes the test failure on loongarch64-linux-gnu, and I've also
> > tested it on x86_64-linux-gnu.  Ok for trunk?
> > 
> >  gcc/testsuite/gcc.dg/pr104992.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/gcc/testsuite/gcc.dg/pr104992.c
> > b/gcc/testsuite/gcc.dg/pr104992.c
> > index 82f8c75559c..6fd513d34b2 100644
> > --- a/gcc/testsuite/gcc.dg/pr104992.c
> > +++ b/gcc/testsuite/gcc.dg/pr104992.c
> > @@ -1,6 +1,6 @@
> >  /* PR tree-optimization/104992 */
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
> > +/* { dg-options "-O2 -Wno-psabi -fdump-tree-forwprop2" } */
> >  
> >  #define vector __attribute__((vector_size(4*sizeof(int))))
> >  
> > @@ -54,5 +54,4 @@ __attribute__((noipa)) unsigned waldo (unsigned x,
> > unsigned y, unsigned z) {
> >  return x / y * z == x;
> >  }
> >  
> > -/* { dg-final { scan-tree-dump-times " % " 9 "optimized" { target {
> > ! vect_int_mod } } } } */
> > -/* { dg-final { scan-tree-dump-times " % " 6 "optimized" { target
> > vect_int_mod } } } */
> > +/* { dg-final { scan-tree-dump-times " % " 6 "forwprop2" } } */
> 
> Hello, currently vect_int_mod detection is only supported on the ppc,
> amd, riscv and LoongArch architectures.  When -fdump-tree-forwprop2
> is used instead of -fdump-tree-optimized, the
> check_effective_target_vect_int_mod procedure defined in the
> target-supports.exp file will never be called.  It is only called for
> pr104992.c; should we consider supporting other architectures?

Hmm, then we should remove check_effective_target_vect_int_mod.

If we want to keep -fdump-tree-optimized for this test case and also
make it correct, we'll at least have to move it into vect/, and write
something like

{ dg-final { scan-tree-dump-times " % " 9 "optimized" { target { ! vect_int_mod 
} } } }
{ dg-final { scan-tree-dump-times " % " 6 "optimized" { target { vect_int_mod 
&& vect128 } } } }
{ dg-final { scan-tree-dump-times " % " 7 "optimized" { target { vect_int_mod 
&& vect64 && !vect128 } } } }

and how about vect256 etc?  This would be very nasty and would deviate
from the original purpose of this test case (against PR104992, which is
a missed-optimization issue unrelated to vectors).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Fix incorrect return type for frecipe/frsqrte intrinsic functions

2024-01-24 Thread Xi Ruoyao
On Wed, 2024-01-24 at 17:19 +0800, Jiahao Xu wrote:
> gcc/ChangeLog:
> 
>   * config/loongarch/larchintrin.h
>   (__frecipe_s): Update function return type.
>   (__frecipe_d): Ditto.
>   (__frsqrte_s): Ditto.
>   (__frsqrte_d): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.
> 
> diff --git a/gcc/config/loongarch/larchintrin.h 
> b/gcc/config/loongarch/larchintrin.h
> index 7692415e04d..ff2c9f460ac 100644
> --- a/gcc/config/loongarch/larchintrin.h
> +++ b/gcc/config/loongarch/larchintrin.h
> @@ -336,38 +336,38 @@ __iocsrwr_d (unsigned long int _1, unsigned int _2)
>  #ifdef __loongarch_frecipe
>  /* Assembly instruction format: fd, fj.  */
>  /* Data types in instruction templates:  SF, SF.  */
> -extern __inline void
> +extern __inline float
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  __frecipe_s (float _1)
>  {
> -  __builtin_loongarch_frecipe_s ((float) _1);
> +  return (float) __builtin_loongarch_frecipe_s ((float) _1);

I don't think the (float) conversion is needed.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-23 Thread Xi Ruoyao
On Mon, 2024-01-22 at 15:27 +0800, chenglulu wrote:
> > > The failure of this test case was because the compiler believes that two
> > > (UNSPEC_PCREL_64_PART2 [(symbol)]) instances would always produce the
> > > same result, but this isn't true because the result depends on PC.  Thus
> > > (pc) needed to be included in the RTX, like:
> > > 
> > >    [(set (match_operand:DI 0 "register_operand" "=r")
> > >  (unspec:DI [(match_operand:DI 2 "") (pc)] 
> > > UNSPEC_LA_PCREL_64_PART1))
> > >     (set (match_operand:DI 1 "register_operand" "=r")
> > >  (unspec:DI [(match_dup 2) (pc)] UNSPEC_LA_PCREL_64_PART2))]
> > > 
> > > With this the buggy REG_UNUSED notes were gone.  But it then prevented
> > > the CSE when loading the address of __tls_get_addr (i.e. if we address
> > > 10 TLS LD symbols in a function it would emit 10 instances of "la.global
> > > __tls_get_addr") so I added an REG_EQUAL note for it.  For symbols other
> > > than __tls_get_addr such notes are added automatically by optimization
> > > passes.
> > > 
> > > Updated patch attached.
> > > 
> > I'm eliminating redundant la.global directives in my macro 
> > implementation.
> > 
> > I will be testing this patch.
> > 
> > 
> > 
> > 
> With this patch, spec2006 can pass the test, but spec2017 621 and 654 
> tests fail.
> I haven't debugged the specific cause of the problem yet.

Try removing the TARGET_DELEGITIMIZE_ADDRESS hook?  After eating some
unhealthy food at midnight I realized the hook only papers over the
same issue that caused the spec2006 failure.  I tried a bootstrap
with BOOT_CFLAGS=-O2 -g -mcmodel=extreme and TARGET_DELEGITIMIZE_ADDRESS
commented out, and there are no more spurious "note: non-delegitimized
UNSPEC UNSPEC_LA_PCREL_64_PART1 (42) found in variable location" notes.
I feel that this hook is still written in a buggy way, so maybe removing
it will solve the spec2017 issue.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] testsuite: Make pr104992.c irrelated to target vector feature [PR113418]

2024-01-23 Thread Xi Ruoyao
The vect_int_mod target selector is evaluated with the options in
DEFAULT_VECTCFLAGS in effect, but these options are not automatically
passed to tests outside the vect directories.  So this test fails on
targets where the integer vector modulo operation is supported but
requires an option to enable it, for example LoongArch.

In this test case, the only expected optimization that has not happened
in the original dump is in corge, because it needs forward propagation.
So we can scan the forwprop2 dump (where the vector operation is not
expanded to scalars yet) instead of optimized; then we don't need to
consider whether vect_int_mod is supported.

gcc/testsuite/ChangeLog:

PR testsuite/113418
* gcc.dg/pr104992.c (dg-options): Use -fdump-tree-forwprop2
instead of -fdump-tree-optimized.
(dg-final): Scan forwprop2 dump instead of optimized, and remove
the use of vect_int_mod.
---

This fixes the test failure on loongarch64-linux-gnu, and I've also
tested it on x86_64-linux-gnu.  Ok for trunk?

 gcc/testsuite/gcc.dg/pr104992.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr104992.c b/gcc/testsuite/gcc.dg/pr104992.c
index 82f8c75559c..6fd513d34b2 100644
--- a/gcc/testsuite/gcc.dg/pr104992.c
+++ b/gcc/testsuite/gcc.dg/pr104992.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/104992 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
+/* { dg-options "-O2 -Wno-psabi -fdump-tree-forwprop2" } */
 
 #define vector __attribute__((vector_size(4*sizeof(int))))
 
@@ -54,5 +54,4 @@ __attribute__((noipa)) unsigned waldo (unsigned x, unsigned 
y, unsigned z) {
 return x / y * z == x;
 }
 
-/* { dg-final { scan-tree-dump-times " % " 9 "optimized" { target { ! 
vect_int_mod } } } } */
-/* { dg-final { scan-tree-dump-times " % " 6 "optimized" { target vect_int_mod 
} } } */
+/* { dg-final { scan-tree-dump-times " % " 6 "forwprop2" } } */
-- 
2.43.0



[PATCH] LoongArch: testsuite: Disable stack protector for got-load.C

2024-01-23 Thread Xi Ruoyao
When building GCC with --enable-default-ssp, the stack protector is
enabled for got-load.C, causing additional GOT loads for
__stack_chk_guard.  So mem/u will be matched more than 2 times and the
test will fail.

Disable stack protector to fix this issue.

gcc/testsuite:

* g++.target/loongarch/got-load.C (dg-options): Add
-fno-stack-protector.
---

Ok for trunk?

 gcc/testsuite/g++.target/loongarch/got-load.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.target/loongarch/got-load.C 
b/gcc/testsuite/g++.target/loongarch/got-load.C
index 20924c73942..17870176ab4 100644
--- a/gcc/testsuite/g++.target/loongarch/got-load.C
+++ b/gcc/testsuite/g++.target/loongarch/got-load.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mabi=lp64d -O2 -mexplicit-relocs -mcmodel=normal 
-fdump-rtl-expand" } */
+/* { dg-options "-mabi=lp64d -O2 -mexplicit-relocs -mcmodel=normal 
-fdump-rtl-expand -fno-stack-protector" } */
 /* { dg-final { scan-rtl-dump-times "mem/u" 2 "expand" } } */
 
 #include 
-- 
2.43.0


