PING^5: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-09-19 Thread Xi Ruoyao via Gcc-patches
Ping^5.

> > > On Thu, Aug 10, 2023 at 03:04:03PM +0200, Stefan Schulze Frielinghaus 
> > > wrote:
> > > > In the former fix in commit 41ef5a34161356817807be3a2e51fbdbe575ae85 I
> > > > completely missed the fact that the normal form of a generated constant 
> > > > for a
> > > > mode with fewer bits than in HOST_WIDE_INT is a sign extended version 
> > > > of the
> > > > actual constant.  This even holds true for unsigned constants.
> > > > 
> > > > Fixed by masking out the upper bits for the incoming constant and sign
> > > > extending the resulting unsigned constant.
> > > > 
> > > > Bootstrapped and regtested on x64 and s390x.  Ok for mainline?
> > > > 
> > > > While reading existing optimizations in combine I stumbled across two
> > > > optimizations where either my intuition about the representation of
> > > > unsigned integers via a const_int rtx is wrong, which then in turn would
> > > > probably also mean that this patch is wrong, or that the optimizations
> > > > are missed sometimes.  In other words in the following I would assume
> > > > that the upper bits are masked out:
> > > > 
> > > > diff --git a/gcc/combine.cc b/gcc/combine.cc
> > > > index 468b7fde911..80c4ff0fbaf 100644
> > > > --- a/gcc/combine.cc
> > > > +++ b/gcc/combine.cc
> > > > @@ -11923,7 +11923,7 @@ simplify_compare_const (enum rtx_code code, 
> > > > machine_mode mode,
> > > >    /* (unsigned) < 0x80000000 is equivalent to >= 0.  */
> > > >    else if (is_a <scalar_int_mode> (mode, &int_mode)
> > > >    && GET_MODE_PRECISION (int_mode) - 1 < 
> > > > HOST_BITS_PER_WIDE_INT
> > > > -  && ((unsigned HOST_WIDE_INT) const_op
> > > > +  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > > > (int_mode))
> > > >    == HOST_WIDE_INT_1U << (GET_MODE_PRECISION 
> > > > (int_mode) - 1)))
> > > >     {
> > > >   const_op = 0;
> > > > @@ -11962,7 +11962,7 @@ simplify_compare_const (enum rtx_code code, 
> > > > machine_mode mode,
> > > >    /* (unsigned) >= 0x80000000 is equivalent to < 0.  */
> > > >    else if (is_a <scalar_int_mode> (mode, &int_mode)
> > > >    && GET_MODE_PRECISION (int_mode) - 1 < 
> > > > HOST_BITS_PER_WIDE_INT
> > > > -  && ((unsigned HOST_WIDE_INT) const_op
> > > > +  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > > > (int_mode))
> > > >    == HOST_WIDE_INT_1U << (GET_MODE_PRECISION 
> > > > (int_mode) - 1)))
> > > >     {
> > > >   const_op = 0;
> > > > 
> > > > For example, while bootstrapping on x64 the optimization is missed since
> > > > an LTU comparison in QImode is done and the constant equals
> > > > 0xffffffffffffff80.
> > > > 
> > > > Sorry for inlining another patch, but I would really like to make sure
> > > > that my understanding is correct, now, before I come up with another
> > > > patch.  Thus it would be great if someone could shed some light on this.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > * combine.cc (simplify_compare_const): Properly handle unsigned
> > > > constants while narrowing comparison of memory and constants.
> > > > ---
> > > >  gcc/combine.cc | 19 ++-
> > > >  1 file changed, 10 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/gcc/combine.cc b/gcc/combine.cc
> > > > index e46d202d0a7..468b7fde911 100644
> > > > --- a/gcc/combine.cc
> > > > +++ b/gcc/combine.cc
> > > > @@ -12003,14 +12003,15 @@ simplify_compare_const (enum rtx_code code, 
> > > > machine_mode mode,
> > > >    && !MEM_VOLATILE_P (op0)
> > > >    /* The optimization makes only sense for constants which are big 
> > > > enough
> > > >  so that we have a chance to chop off something at all.  */
> > > > -  && (unsigned HOST_WIDE_INT) const_op > 0xff
> > > > -  /* Bail out, if the constant does not fit into INT_MODE.  */
> > > > -  && (unsigned HOST_WIDE_INT) const_op
> > > > -    < ((HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1) << 
> > > > 1) - 1)
> > > > +  && ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > > > (int_mode)) > 0xff
> > > >    /* Ensure that we do not overflow during normalization.  */
> > > > -  && (code != GTU || (unsigned HOST_WIDE_INT) const_op < 
> > > > HOST_WIDE_INT_M1U))
> > > > +  && (code != GTU
> > > > + || ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > > > (int_mode))
> > > > +    < HOST_WIDE_INT_M1U)
> > > > +  && trunc_int_for_mode (const_op, int_mode) == const_op)
> > > >  {
> > > > -  unsigned HOST_WIDE_INT n = (unsigned HOST_WIDE_INT) const_op;
> > > > +  unsigned HOST_WIDE_INT n
> > > > +   = (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode);
> > > >    enum rtx_code adjusted_code;
> > > >  
> > > >    /* Normalize code to either LEU or GEU.  */
> > > > @@ -12051,15 +12052,15 @@ simplify_compare_const (enum rtx_code code, 
> > > > machine_mode mode,
> > > > 
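
To make the normal-form point in the quoted description concrete, here
is a minimal standalone sketch in plain C rather than GCC internals.  It
assumes a 64-bit HOST_WIDE_INT; sign_extend_low_bits and mask_low_bits
are hypothetical helper names that only imitate what trunc_int_for_mode
and "& GET_MODE_MASK (mode)" do for a narrow integer mode such as QImode.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low BITS bits of C to 64 bits.  This imitates the
   normal form of a constant in a BITS-wide integer mode.  */
static int64_t
sign_extend_low_bits (int64_t c, unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  if (bits == 64)
    return c;
  uint64_t mask = (UINT64_C (1) << bits) - 1;
  uint64_t low = (uint64_t) c & mask;
  uint64_t sign = UINT64_C (1) << (bits - 1);
  /* Both operands fit in int64_t because BITS < 64 here, so the
     subtraction cannot overflow.  */
  return (int64_t) (low ^ sign) - (int64_t) sign;
}

/* Keep only the low BITS bits of C, i.e. the value seen by an
   unsigned comparison in a BITS-wide mode.  */
static uint64_t
mask_low_bits (int64_t c, unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  return bits == 64 ? (uint64_t) c
		    : ((uint64_t) c & ((UINT64_C (1) << bits) - 1));
}
```

With these helpers, the 8-bit constant 0x80 has the normal form
sign_extend_low_bits (0x80, 8) == -128, and mask_low_bits (-128, 8)
recovers 0x80, which is the value the unsigned comparisons need to see.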

Re: Question on -fwrapv and -fwrapv-pointer

2023-09-15 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-09-15 at 15:37 +0000, Qing Zhao wrote:
> 
> 
> > On Sep 15, 2023, at 11:29 AM, Richard Biener
> >  wrote:
> > 
> > 
> > 
> > > Am 15.09.2023 um 17:25 schrieb Qing Zhao :
> > > 
> > > 
> > > 
> > > > On Sep 15, 2023, at 8:41 AM, Arsen Arsenović 
> > > > wrote:
> > > > 
> > > > 
> > > > Qing Zhao  writes:
> > > > 
> > > > > Even though unsigned integer overflow is well defined, it
> > > > > might be
> > > > > unintentional, shall we warn user about this?
> > > > 
> > > > This would be better addressed by providing operators or
> > > > functions that
> > > > do overflow checking in the language, so that they can be
> > > > explicitly
> > > > used where overflow is unexpected.
> > > 
> > > Yes, that will be very helpful to prevent unexpected overflow in
> > > the program in general.
> > > > However, this will mainly benefit new code.
> > > > 
> > > > For existing C code, especially large applications, we still
> > > > need to identify all the places
> > > > where the overflow is unexpected, and fix them.
> > > > 
> > > > One good example is the Linux kernel.
> > > 
> > > > One could easily imagine a scenario
> > > > where overflow is not expected in some region of code but is in
> > > > the
> > > > larger application.
> > > 
> > > > Yes, that’s exactly the situation the Linux kernel faces now:
> > > > unexpected overflow and
> > > > expected wrap-around are mixed together inside one module.
> > > > It’s hard to detect the unexpected overflow in such a situation
> > > > with the current GCC.
> > 
> > But that’s hardly GCC’s fault, nor can GCC fix that in any way.  Only
> > the programmer can distinguish both cases.
> 
> Right, the compiler cannot fix this.
> But it can provide some tools to help the user detect this more
> conveniently.
> 
> Right now, GCC provides two sets of options for different types:
> 
>  A. Turn the overflow into expected wrap-around (remove UB);
>  B. Detect overflow;
> 
>              A                  B
>              remove UB          -fsanitize=…
> signed       -fwrapv            signed-integer-overflow
> pointer      -fwrapv-pointer    pointer-overflow (broken in Clang)
> 
> However, the options in A and B are mutually exclusive.  They cannot be
> mixed within a single file.
> 
> What’s requested from the kernel is:
> 
> the compiler needs to provide functionality that can mix these two
> together for a file.
> 
> i.e., apply A (convert UB to the defined behavior WRAP-AROUND) only to part
> of the program, and then add -fsanitize=*overflow to detect all other
> unexpected overflows in the program.
> 
> This is currently missing from GCC, I guess?

If overflow is really so rare, we should just enable
-fsanitize=signed-integer-overflow globally and special-case the code
paths where we want wrapping.  It's easy in 2023:

/* b + c may wrap here because ... ... */
ckd_add(&a, b, c);

Or

/* if b + c overflows, we have a severe issue, let's panic even if
   the sanitizer is disabled */
if (ckd_add(&a, b, c))
  panic("b + c overflows but it shouldn't (b = %d, c = %d)", b, c);
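
As a rough, compilable sketch of the two patterns above: ckd_add comes
from C23 <stdckdint.h>, which older toolchains may lack, so this version
assumes GCC or Clang and uses __builtin_add_overflow instead (note the
different argument order: the result pointer comes last).  add_wrapping
and add_checked are hypothetical names chosen for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Wrapping add: the overflow flag is deliberately ignored because the
   caller expects modular arithmetic here.  */
static int
add_wrapping (int b, int c)
{
  int a;
  (void) __builtin_add_overflow (b, c, &a);
  return a;
}

/* Checked add: returns true and leaves *A untouched on overflow,
   otherwise stores b + c in *A and returns false.  */
static bool
add_checked (int b, int c, int *a)
{
  int tmp;
  if (__builtin_add_overflow (b, c, &tmp))
    return true;
  *a = tmp;
  return false;
}
```

A caller that treats overflow as a bug would test the return value of
add_checked and bail out, much like the panic() call above.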

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: Question on -fwrapv and -fwrapv-pointer

2023-09-15 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-09-14 at 21:41 +0000, Qing Zhao wrote:
> > > CLANG already provided -fsanitize=unsigned-integer-overflow. GCC
> > > might need to do the same.
> > 
> > NO. There is no such thing as unsigned integer overflow. That option
> > is badly designed and the GCC community has rejected a few times now
> > having that sanitizer before. It is bad form to have a sanitizer for
> > well defined code.
> 
> Even though unsigned integer overflow is well defined, it might be
> unintentional, shall we warn user about this?

By that logic *everything* could be unintentional and would deserve a
warning.  GCC is a compiler, not an advanced AI educating the programmers.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: Question on -fwrapv and -fwrapv-pointer

2023-09-14 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-09-14 at 15:57 +0000, Qing Zhao via Gcc-patches wrote:
> Currently, GCC behaves as following:
> 
> /* True if overflow wraps around for the given integral or pointer type.  That
>    is, TYPE_MAX + 1 == TYPE_MIN.  */
> #define TYPE_OVERFLOW_WRAPS(TYPE) \
>   (POINTER_TYPE_P (TYPE)    \
>    ? flag_wrapv_pointer \
>    : (ANY_INTEGRAL_TYPE_CHECK(TYPE)->base.u.bits.unsigned_flag  \
>   || flag_wrapv))
> 
> /* True if overflow is undefined for the given integral or pointer type.
>    We may optimize on the assumption that values in the type never overflow.
> 
>    IMPORTANT NOTE: Any optimization based on TYPE_OVERFLOW_UNDEFINED
>    must issue a warning based on warn_strict_overflow.  In some cases
>    it will be appropriate to issue the warning immediately, and in
>    other cases it will be appropriate to simply set a flag and let the
>    caller decide whether a warning is appropriate or not.  */
> #define TYPE_OVERFLOW_UNDEFINED(TYPE)   \
>   (POINTER_TYPE_P (TYPE)    \
>    ? !flag_wrapv_pointer    \
>    : (!ANY_INTEGRAL_TYPE_CHECK(TYPE)->base.u.bits.unsigned_flag \
>   && !flag_wrapv && !flag_trapv))
> 
> The logic above seems to treat pointers like signed integers by default, right?

It only says that pointers cannot overflow, not that pointers are signed.

printf("%d\n", (char *)(intptr_t)-1 > (char *)(intptr_t)1);

produces 1 instead of 0.  Technically this invokes undefined
behavior and a conforming implementation can output anything.  But
consider a 32-bit bare metal target where the linker can locate a "char
x[512]" at [0x7fffff00, 0x80000100).  The standard then requires
&x[512] > &x[0], but if we do a signed comparison here we'll end up with
"&x[512] < &x[0]", which is non-conforming.
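
The boundary case can be modelled with plain integers.  This is only a
sketch under stated assumptions: a 32-bit address space, an array that
straddles the 0x80000000 boundary (start address 0x7fffff00, chosen for
illustration), and hypothetical helper names.  It compares the two
addresses once as unsigned and once as signed values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reinterpret a 32-bit address as a signed value without relying on
   implementation-defined conversions.  */
static int32_t
as_signed (uint32_t u)
{
  return u < UINT32_C (0x80000000)
	 ? (int32_t) u
	 : (int32_t) (u - UINT32_C (0x80000000)) + INT32_MIN;
}

/* "p > q" when addresses are compared as unsigned integers.  */
static bool
above_unsigned (uint32_t p, uint32_t q)
{
  return p > q;
}

/* "p > q" when addresses are (wrongly) compared as signed integers.  */
static bool
above_signed (uint32_t p, uint32_t q)
{
  return as_signed (p) > as_signed (q);
}
```

For start = 0x7fffff00 and end = start + 512, above_unsigned (end, start)
holds while above_signed (end, start) does not, which is the
inconsistency described above.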

IIUC, pointers are not integers, at all.  If we treat them as integers
in our heads we'll end up invoking undefined behavior sooner or later.
Thus the wrapping/overflowing behavior of pointers is controlled by a
different option than that of integers.
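
To see how the two quoted macros keep pointers and integers apart, here
is a small standalone model in plain C.  It is not GCC code: the struct,
the enum and the function names are hypothetical stand-ins for
flag_wrapv, flag_wrapv_pointer, flag_trapv and the tree type checks, and
it only mirrors the boolean logic of TYPE_OVERFLOW_WRAPS and
TYPE_OVERFLOW_UNDEFINED as quoted.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the command-line flags.  */
struct overflow_flags
{
  bool wrapv;		/* models flag_wrapv */
  bool wrapv_pointer;	/* models flag_wrapv_pointer */
  bool trapv;		/* models flag_trapv */
};

enum type_kind { KIND_SIGNED, KIND_UNSIGNED, KIND_POINTER };

/* Mirrors the logic of TYPE_OVERFLOW_WRAPS.  */
static bool
overflow_wraps (enum type_kind kind, const struct overflow_flags *f)
{
  if (kind == KIND_POINTER)
    return f->wrapv_pointer;
  return kind == KIND_UNSIGNED || f->wrapv;
}

/* Mirrors the logic of TYPE_OVERFLOW_UNDEFINED.  */
static bool
overflow_undefined (enum type_kind kind, const struct overflow_flags *f)
{
  if (kind == KIND_POINTER)
    return !f->wrapv_pointer;
  return kind != KIND_UNSIGNED && !f->wrapv && !f->trapv;
}
```

In this model, setting only wrapv makes signed overflow wrap but leaves
pointer overflow undefined, and the unsigned/signed distinction never
enters the pointer branch.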

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: gcc: Modify gas uleb128 support test.

2023-09-14 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-09-14 at 19:54 +0800, chenglulu wrote:
> Sorry, it's my problem. We will modify it as soon as possible.

Try this:

diff --git a/gcc/configure.ac b/gcc/configure.ac
index cb4be11facd..10027a4 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3229,10 +3229,18 @@ AC_MSG_RESULT($gcc_cv_ld_ro_rw_mix)
 
 gcc_AC_INITFINI_ARRAY
 
+# Some assemblers (GNU as for LoongArch) generate relocations for
+# leb128 symbol arithmetic for relaxation, so we need to disable relaxation
+# when probing leb128 support.
+gcc_GAS_CHECK_FEATURE([-mno-relax support],
+  gcc_cv_as_mno_relax,[-mno-relax],[.text],,
+  [check_leb128_asflags=-mno-relax])
+
 # Check if we have .[us]leb128, and support symbol arithmetic with it.
 # Older versions of GAS and some non-GNU assemblers, have a bugs handling
 # these directives, even when they appear to accept them.
-gcc_GAS_CHECK_FEATURE([.sleb128 and .uleb128], gcc_cv_as_leb128,,
+gcc_GAS_CHECK_FEATURE([.sleb128 and .uleb128], gcc_cv_as_leb128,
+[$check_leb128_asflags],
 [  .data
.uleb128 L2 - L1
 L1:



-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: gcc: Modify gas uleb128 support test.

2023-09-14 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-09-14 at 19:21 +0800, Lulu Cheng wrote:
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index 09082e8ccae..072fe1d2b48 100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -3226,6 +3226,19 @@ L2:
> .uleb128 0x8000
>  ],
>  [[
> +case "$target" in
> +  loongarch*-*-*)
> +    if test "x$gcc_cv_ld" != x; then
> +  ac_try='$gcc_cv_ld conftest.o -o conftest -e 0x0 >&5'
> +  { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
> +  (eval $ac_try) 2>&5
> +  ac_status=$?
> +  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
> +  test $ac_status = 0
> +  mv conftest conftest.o
> +    fi
> +esac

Phew.  Randomly modifying configure and pasting the modification into
configure.ac is not the correct way to change configure.ac.

ac_* are autoconf internal names so we cannot use them.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v3 4/9] LoongArch:Added support for SX vector floating-point instructions.

2023-09-10 Thread Xi Ruoyao via Gcc-patches
The subject should be "Add tests for SX vector floating-point
instructions".  The "support" has already been added.

Likewise for patches 5-9.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


PING^4: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-09-10 Thread Xi Ruoyao via Gcc-patches
Ping.

> > > On Thu, Aug 10, 2023 at 03:04:03PM +0200, Stefan Schulze Frielinghaus 
> > > wrote:
> > > > In the former fix in commit 41ef5a34161356817807be3a2e51fbdbe575ae85 I
> > > > completely missed the fact that the normal form of a generated constant 
> > > > for a
> > > > mode with fewer bits than in HOST_WIDE_INT is a sign extended version 
> > > > of the
> > > > actual constant.  This even holds true for unsigned constants.
> > > > 
> > > > Fixed by masking out the upper bits for the incoming constant and sign
> > > > extending the resulting unsigned constant.
> > > > 
> > > > Bootstrapped and regtested on x64 and s390x.  Ok for mainline?
> > > > 
> > > > While reading existing optimizations in combine I stumbled across two
> > > > optimizations where either my intuition about the representation of
> > > > unsigned integers via a const_int rtx is wrong, which then in turn would
> > > > probably also mean that this patch is wrong, or that the optimizations
> > > > are missed sometimes.  In other words in the following I would assume
> > > > that the upper bits are masked out:
> > > > 
> > > > diff --git a/gcc/combine.cc b/gcc/combine.cc
> > > > index 468b7fde911..80c4ff0fbaf 100644
> > > > --- a/gcc/combine.cc
> > > > +++ b/gcc/combine.cc
> > > > @@ -11923,7 +11923,7 @@ simplify_compare_const (enum rtx_code code, 
> > > > machine_mode mode,
> > > >    /* (unsigned) < 0x80000000 is equivalent to >= 0.  */
> > > >    else if (is_a <scalar_int_mode> (mode, &int_mode)
> > > >    && GET_MODE_PRECISION (int_mode) - 1 < 
> > > > HOST_BITS_PER_WIDE_INT
> > > > -  && ((unsigned HOST_WIDE_INT) const_op
> > > > +  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > > > (int_mode))
> > > >    == HOST_WIDE_INT_1U << (GET_MODE_PRECISION 
> > > > (int_mode) - 1)))
> > > >     {
> > > >   const_op = 0;
> > > > @@ -11962,7 +11962,7 @@ simplify_compare_const (enum rtx_code code, 
> > > > machine_mode mode,
> > > >    /* (unsigned) >= 0x80000000 is equivalent to < 0.  */
> > > >    else if (is_a <scalar_int_mode> (mode, &int_mode)
> > > >    && GET_MODE_PRECISION (int_mode) - 1 < 
> > > > HOST_BITS_PER_WIDE_INT
> > > > -  && ((unsigned HOST_WIDE_INT) const_op
> > > > +  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > > > (int_mode))
> > > >    == HOST_WIDE_INT_1U << (GET_MODE_PRECISION 
> > > > (int_mode) - 1)))
> > > >     {
> > > >   const_op = 0;
> > > > 
> > > > For example, while bootstrapping on x64 the optimization is missed since
> > > > an LTU comparison in QImode is done and the constant equals
> > > > 0xffffffffffffff80.
> > > > 
> > > > Sorry for inlining another patch, but I would really like to make sure
> > > > that my understanding is correct, now, before I come up with another
> > > > patch.  Thus it would be great if someone could shed some light on this.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > * combine.cc (simplify_compare_const): Properly handle unsigned
> > > > constants while narrowing comparison of memory and constants.
> > > > ---
> > > >  gcc/combine.cc | 19 ++-
> > > >  1 file changed, 10 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/gcc/combine.cc b/gcc/combine.cc
> > > > index e46d202d0a7..468b7fde911 100644
> > > > --- a/gcc/combine.cc
> > > > +++ b/gcc/combine.cc
> > > > @@ -12003,14 +12003,15 @@ simplify_compare_const (enum rtx_code code, 
> > > > machine_mode mode,
> > > >    && !MEM_VOLATILE_P (op0)
> > > >    /* The optimization makes only sense for constants which are big 
> > > > enough
> > > >  so that we have a chance to chop off something at all.  */
> > > > -  && (unsigned HOST_WIDE_INT) const_op > 0xff
> > > > -  /* Bail out, if the constant does not fit into INT_MODE.  */
> > > > -  && (unsigned HOST_WIDE_INT) const_op
> > > > -    < ((HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1) << 
> > > > 1) - 1)
> > > > +  && ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > > > (int_mode)) > 0xff
> > > >    /* Ensure that we do not overflow during normalization.  */
> > > > -  && (code != GTU || (unsigned HOST_WIDE_INT) const_op < 
> > > > HOST_WIDE_INT_M1U))
> > > > +  && (code != GTU
> > > > + || ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > > > (int_mode))
> > > > +    < HOST_WIDE_INT_M1U)
> > > > +  && trunc_int_for_mode (const_op, int_mode) == const_op)
> > > >  {
> > > > -  unsigned HOST_WIDE_INT n = (unsigned HOST_WIDE_INT) const_op;
> > > > +  unsigned HOST_WIDE_INT n
> > > > +   = (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode);
> > > >    enum rtx_code adjusted_code;
> > > >  
> > > >    /* Normalize code to either LEU or GEU.  */
> > > > @@ -12051,15 +12052,15 @@ simplify_compare_const (enum rtx_code code, 
> > > > machine_mode mode,
> > > > 

Re: [PATCH] LoongArch: Fix up memcpy-vec-3.c test case

2023-09-09 Thread Xi Ruoyao via Gcc-patches
On Sat, 2023-09-09 at 16:21 +0800, chenglulu wrote:
> LGTM!

Pushed r14-3821.

> 在 2023/9/9 下午4:20, Xi Ruoyao 写道:
> > The generic code will split a 16-byte copy into two 8-byte copies, so the
> > vector code wouldn't be used even with -mno-strict-align.  This
> > contradicts the purpose of this test case.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/loongarch/memcpy-vec-3.c: Increase the amount of
> > copied bytes to 32.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] LoongArch: Fix up memcpy-vec-3.c test case

2023-09-09 Thread Xi Ruoyao via Gcc-patches
The generic code will split a 16-byte copy into two 8-byte copies, so the
vector code wouldn't be used even with -mno-strict-align.  This
contradicts the purpose of this test case.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/memcpy-vec-3.c: Increase the amount of
copied bytes to 32.
---
 gcc/testsuite/gcc.target/loongarch/memcpy-vec-3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/loongarch/memcpy-vec-3.c b/gcc/testsuite/gcc.target/loongarch/memcpy-vec-3.c
index 233ed215078..db2ea510b09 100644
--- a/gcc/testsuite/gcc.target/loongarch/memcpy-vec-3.c
+++ b/gcc/testsuite/gcc.target/loongarch/memcpy-vec-3.c
@@ -3,4 +3,4 @@
 /* { dg-final { scan-assembler-not "vst" } } */
 
 extern char a[], b[];
-void test() { __builtin_memcpy(a, b, 16); }
+void test() { __builtin_memcpy(a, b, 32); }
-- 
2.42.0



Re: [PATCH v1] LoongArch: Fix bug of 'di3_fake'.

2023-09-09 Thread Xi Ruoyao via Gcc-patches
On Sat, 2023-09-09 at 15:42 +0800, Lulu Cheng wrote:
> PR 111334
> 
> gcc/ChangeLog:
> 
> * config/loongarch/loongarch.md: Fix bug of di3_fake.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/loongarch/pr111334.c: New test.

Ok.  Although I still think we should use unspec inside any_div, this
should be enough to prevent the compiler from matching di3_fake.

> ---
>  gcc/config/loongarch/loongarch.md | 14 +--
>  gcc/testsuite/gcc.target/loongarch/pr111334.c | 39 +++
>  2 files changed, 49 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/pr111334.c
> 
> diff --git a/gcc/config/loongarch/loongarch.md 
> b/gcc/config/loongarch/loongarch.md
> index 1dc6b524416..3fa32562aa6 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -72,6 +72,9 @@ (define_c_enum "unspec" [
>    UNSPEC_LUI_H_HI12
>    UNSPEC_TLS_LOW
>  
> +  ;; Fake div.w[u] mod.w[u]
> +  UNSPEC_FAKE_ANY_DIV
> +
>    UNSPEC_SIBCALL_VALUE_MULTIPLE_INTERNAL_1
>    UNSPEC_CALL_VALUE_MULTIPLE_INTERNAL_1
>  ])
> @@ -900,7 +903,7 @@ (define_expand "3"
>  (match_operand:GPR 2 "register_operand")))]
>    ""
>  {
> - if (GET_MODE (operands[0]) == SImode)
> + if (GET_MODE (operands[0]) == SImode && TARGET_64BIT)
>    {
>  rtx reg1 = gen_reg_rtx (DImode);
>  rtx reg2 = gen_reg_rtx (DImode);
> @@ -938,9 +941,12 @@ (define_insn "*3"
>  (define_insn "di3_fake"
>    [(set (match_operand:DI 0 "register_operand" "=r,,")
> (sign_extend:DI
> - (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
> - (match_operand:DI 2 "register_operand" "r,r,r"]
> -  ""
> + (unspec:SI
> +  [(subreg:SI
> +    (any_div:DI (match_operand:DI 1 "register_operand" "r,r,0")
> +    (match_operand:DI 2 "register_operand" "r,r,r")) 0)]
> + UNSPEC_FAKE_ANY_DIV)))]
> +  "TARGET_64BIT"
>  {
>    return loongarch_output_division (".w\t%0,%1,%2", operands);
>  }
> diff --git a/gcc/testsuite/gcc.target/loongarch/pr111334.c 
> b/gcc/testsuite/gcc.target/loongarch/pr111334.c
> new file mode 100644
> index 000..47366afcb74
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/pr111334.c
> @@ -0,0 +1,39 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +unsigned
> +util_next_power_of_two (unsigned x)
> +{
> +  return (1 << __builtin_clz (x - 1));
> +}
> +
> +extern int create_vec_from_array (void);
> +
> +struct ac_shader_args {
> +    struct {
> +   unsigned char offset;
> +   unsigned char size;
> +    } args[384];
> +};
> +
> +struct isel_context {
> +    const struct ac_shader_args* args;
> +    int arg_temps[384];
> +};
> +
> +
> +void
> +add_startpgm (struct isel_context* ctx, unsigned short arg_count)
> +{
> +
> +  for (unsigned i = 0, arg = 0; i < arg_count; i++)
> +    {
> +  unsigned size = ctx->args->args[i].size;
> +  unsigned reg = ctx->args->args[i].offset;
> +
> +  if (reg % ( 4 < util_next_power_of_two (size)
> +    ? 4 : util_next_power_of_two (size)))
> + ctx->arg_temps[i] = create_vec_from_array ();
> +    }
> +}
> +

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Use LSX and LASX for block move

2023-09-09 Thread Xi Ruoyao via Gcc-patches
On Sat, 2023-09-09 at 15:14 +0800, chenglulu wrote:
> 
> 在 2023/9/9 下午3:06, Xi Ruoyao 写道:
> > On Sat, 2023-09-09 at 15:04 +0800, chenglulu wrote:
> > > Hi,RuoYao:
> > > 
> > >    I think the test example memcpy-vec-3.c submitted in r14-3818 is
> > > implemented incorrectly.
> > > 
> > > The 16-byte length in this test example will cause can_move_by_pieces to
> > > return true when with '-mstrict-align', so no vector load instructions
> > > will be generated.
> > Yes, in this case we cannot use vst because we don't know if b is
> > aligned.  Thus a { scan-assembler-not "vst" } guarantees that.
> > 
> > Or am I understanding something wrongly here?
> > 
> Well, what I mean is that even if '-mno-strict-align' is used here, 
> vst/vld will not be used,
> 
> so this test example cannot test what we want to test.

Let me revise it...

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Use LSX and LASX for block move

2023-09-09 Thread Xi Ruoyao via Gcc-patches
On Sat, 2023-09-09 at 15:04 +0800, chenglulu wrote:
> Hi,RuoYao:
> 
>   I think the test example memcpy-vec-3.c submitted in r14-3818 is 
> implemented incorrectly.
> 
> The 16-byte length in this test example will cause can_move_by_pieces to 
> return true when with '-mstrict-align', so no vector load instructions
> will be generated.

Yes, in this case we cannot use vst because we don't know if b is
aligned.  Thus a { scan-assembler-not "vst" } guarantees that.

Or am I misunderstanding something here?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Pushed: [PATCH] LoongArch: Slightly simplify loongarch_block_move_straight

2023-09-09 Thread Xi Ruoyao via Gcc-patches
Pushed r14-3819.

On Sat, 2023-09-09 at 14:16 +0800, chenglulu wrote:
> 
> 在 2023/9/8 上午12:33, Xi Ruoyao 写道:
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch.cc
> > (loongarch_block_move_straight):
> > Check precondition (delta must be a power of 2) and use
> > popcount_hwi instead of a homebrew loop.
> > ---
> > 
> > I've not run a full bootstrap with this, but it should be obvious.
> > Ok for trunk?
> 
> LGTM!
> 
> Thanks!
> 
> > 
> >   gcc/config/loongarch/loongarch.cc | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/gcc/config/loongarch/loongarch.cc
> > b/gcc/config/loongarch/loongarch.cc
> > index 509ef2b97f1..845fad5a8e8 100644
> > --- a/gcc/config/loongarch/loongarch.cc
> > +++ b/gcc/config/loongarch/loongarch.cc
> > @@ -5225,9 +5225,8 @@ loongarch_block_move_straight (rtx dest, rtx
> > src, HOST_WIDE_INT length,
> >    emit two ld.d/st.d pairs, one ld.w/st.w pair, and one
> > ld.b/st.b
> >    pair.  For each load/store pair we use a dedicated register
> > to keep
> >    the pipeline as populated as possible.  */
> > -  HOST_WIDE_INT num_reg = length / delta;
> > -  for (delta_cur = delta / 2; delta_cur != 0; delta_cur /= 2)
> > -    num_reg += !!(length & delta_cur);
> > +  gcc_assert (pow2p_hwi (delta));
> > +  HOST_WIDE_INT num_reg = length / delta + popcount_hwi (length %
> > delta);
> >   
> >     /* Allocate a buffer for the temporary registers.  */
> >     regs = XALLOCAVEC (rtx, num_reg);
> 
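
A quick way to convince oneself that the replacement is equivalent is to
compare both computations outside of GCC.  The sketch below uses plain
long long instead of HOST_WIDE_INT and a hand-written popcount_ll instead
of popcount_hwi; all three function names are hypothetical, and lengths
are assumed to be non-negative.

```c
#include <assert.h>

/* Portable population count (number of set bits).  */
static int
popcount_ll (unsigned long long x)
{
  int n = 0;
  for (; x != 0; x &= x - 1)
    n++;
  return n;
}

/* The loop removed by the patch.  */
static long long
num_reg_loop (long long length, long long delta)
{
  long long num_reg = length / delta;
  for (long long delta_cur = delta / 2; delta_cur != 0; delta_cur /= 2)
    num_reg += !!(length & delta_cur);
  return num_reg;
}

/* The replacement; only valid when DELTA is a power of 2.  */
static long long
num_reg_popcount (long long length, long long delta)
{
  assert (delta > 0 && (delta & (delta - 1)) == 0);
  return length / delta + popcount_ll ((unsigned long long) (length % delta));
}
```

For example, with length 77 and delta 32 both give 5: two full 32-byte
moves plus one move for each set bit of 77 % 32 = 13 (8 + 4 + 1).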

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Pushed: [PATCH v2] LoongArch: Use LSX and LASX for block move

2023-09-09 Thread Xi Ruoyao via Gcc-patches
Pushed r14-3818 with test cases added.  The pushed patch is attached.

On Sat, 2023-09-09 at 14:10 +0800, chenglulu wrote:
> 
> 在 2023/9/8 上午12:14, Xi Ruoyao 写道:
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch.h (LARCH_MAX_MOVE_PER_INSN):
> > Define to the maximum amount of bytes able to be loaded or
> > stored with one machine instruction.
> > * config/loongarch/loongarch.cc (loongarch_mode_for_move_size):
> > New static function.
> > (loongarch_block_move_straight): Call
> > loongarch_mode_for_move_size for machine_mode to be moved.
> > (loongarch_expand_block_move): Use LARCH_MAX_MOVE_PER_INSN
> > instead of UNITS_PER_WORD.
> > ---
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu, with PR110939 patch
> > applied, the "lib_build_self_spec = %<..." line in t-linux commented out
> > (because it's silently making -mlasx in BOOT_CFLAGS ineffective, Yujie
> > is working on a proper fix), and BOOT_CFLAGS="-O3 -mlasx".  Ok for trunk?
> 
> I think test cases need to be added here.
> 
> Otherwise OK, thanks!

/* snip */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
From 35adc54b55aa199f17e2c84e382792e424b6171e Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Tue, 5 Sep 2023 21:02:38 +0800
Subject: [PATCH v2] LoongArch: Use LSX and LASX for block move

gcc/ChangeLog:

	* config/loongarch/loongarch.h (LARCH_MAX_MOVE_PER_INSN):
	Define to the maximum amount of bytes able to be loaded or
	stored with one machine instruction.
	* config/loongarch/loongarch.cc (loongarch_mode_for_move_size):
	New static function.
	(loongarch_block_move_straight): Call
	loongarch_mode_for_move_size for machine_mode to be moved.
	(loongarch_expand_block_move): Use LARCH_MAX_MOVE_PER_INSN
	instead of UNITS_PER_WORD.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/memcpy-vec-1.c: New test.
	* gcc.target/loongarch/memcpy-vec-2.c: New test.
	* gcc.target/loongarch/memcpy-vec-3.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 22 +++
 gcc/config/loongarch/loongarch.h  |  3 +++
 .../gcc.target/loongarch/memcpy-vec-1.c   | 11 ++
 .../gcc.target/loongarch/memcpy-vec-2.c   | 12 ++
 .../gcc.target/loongarch/memcpy-vec-3.c   |  6 +
 5 files changed, 50 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/memcpy-vec-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/memcpy-vec-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/memcpy-vec-3.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 6698414281e..509ef2b97f1 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5191,6 +5191,20 @@ loongarch_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
   return true;
 }
 
+static machine_mode
+loongarch_mode_for_move_size (HOST_WIDE_INT size)
+{
+  switch (size)
+{
+case 32:
+  return V32QImode;
+case 16:
+  return V16QImode;
+}
+
+  return int_mode_for_size (size * BITS_PER_UNIT, 0).require ();
+}
+
 /* Emit straight-line code to move LENGTH bytes from SRC to DEST.
Assume that the areas do not overlap.  */
 
@@ -5220,7 +5234,7 @@ loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length,
 
   for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2)
 {
-  mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require ();
+  mode = loongarch_mode_for_move_size (delta_cur);
 
   for (; offs + delta_cur <= length; offs += delta_cur, i++)
 	{
@@ -5231,7 +5245,7 @@ loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length,
 
   for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2)
 {
-  mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require ();
+  mode = loongarch_mode_for_move_size (delta_cur);
 
   for (; offs + delta_cur <= length; offs += delta_cur, i++)
 	loongarch_emit_move (adjust_address (dest, mode, offs), regs[i]);
@@ -5326,8 +5340,8 @@ loongarch_expand_block_move (rtx dest, rtx src, rtx r_length, rtx r_align)
 
   HOST_WIDE_INT align = INTVAL (r_align);
 
-  if (!TARGET_STRICT_ALIGN || align > UNITS_PER_WORD)
-align = UNITS_PER_WORD;
+  if (!TARGET_STRICT_ALIGN || align > LARCH_MAX_MOVE_PER_INSN)
+align = LARCH_MAX_MOVE_PER_INSN;
 
   if (length <= align * LARCH_MAX_MOVE_OPS_STRAIGHT)
 {
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 3fc9dc43ab1..7e391205583 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -1181,6 +1181,9 @@ typedef struct {
least twice.  */
 #define LARCH_MAX_MOVE_OPS_STRAIGHT (LARCH_MAX_MOVE_OPS_PER_LOOP_ITER * 2)
 
+#define LARCH_MAX_MOVE_PER_INSN \
+  (ISA_HAS_LASX ? 32 : (ISA_HAS_LSX ? 16 : UNITS_PER_WORD))
+
 /* The base cost of a memcpy call, for MOVE_RATIO and friends.  These
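
The effect of raising the per-instruction limit can be sketched without
any GCC machinery.  The function below is a hypothetical model, not the
real loongarch_block_move_straight: it follows the same loop shape
(start at the largest chunk, halve it whenever it no longer fits) and
simply records the chunk sizes that would be moved.  MAX_CHUNK is assumed
to be a power of 2, as 32, 16 and UNITS_PER_WORD are.

```c
#include <stddef.h>

/* Split LENGTH bytes into power-of-2 chunks no larger than MAX_CHUNK,
   largest first.  Writes the chunk sizes to OUT and returns how many
   were written (never more than CAP).  */
static size_t
split_into_chunks (long long length, long long max_chunk,
		   long long *out, size_t cap)
{
  size_t n = 0;
  long long offs = 0;
  for (long long cur = max_chunk;
       cur > 0 && offs < length && n < cap;
       cur /= 2)
    while (offs + cur <= length && n < cap)
      {
	out[n++] = cur;
	offs += cur;
      }
  return n;
}
```

In this model a 77-byte copy needs 5 moves with a 32-byte limit (32, 32,
8, 4, 1) but 11 moves with an 8-byte limit, which is the kind of saving
the patch is after when LASX is available.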

Re: [PATCH v3 1/4] LoongArch: improved target configuration interface

2023-09-09 Thread Xi Ruoyao via Gcc-patches
On Sat, 2023-09-09 at 14:26 +0800, Yang Yujie wrote:
> I remember you were against it because you think non-multilib users
> would be punished because the libdir layout changes (no toplevel).
> However this directory should be (mostly) private to each gcc instance,
> so I don't see real consequences to this unless you have a build script
> that relies on the path of libgcc.a / startfile, which can still (and
> should) be revised using $(gcc --print-multi-dir).

I guess I can live with it.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v3 1/4] LoongArch: improved target configuration interface

2023-09-08 Thread Xi Ruoyao via Gcc-patches
On Sat, 2023-09-09 at 10:46 +0800, Yang Yujie wrote:
> The next option I can think of would be MULTILIB_EXTRA_OPTS, where 
> -fmultiflags
> fit in nicely.  However, these options won't reach the toplevel builds, and
> tweaking config-ml.in for getting it there would be quite tedious and perhaps
> unreliable:

I don't think the spec tweak should affect toplevel (or default, if you
hate the concept of toplevel) library build.

When I build GCC for a specific machine I usually use:

OPT="-O3 -march=native -pipe ..."
make {STAGE1,BOOT}_CFLAGS="$OPT" {C,CXX}FLAGS_FOR_TARGET="$OPT -g"

If the spec tweak affects the toplevel library build it will eat -march=
etc. in {C,CXX}FLAGS_FOR_TARGET silently, and I don't want this.

Or at least it should not affect --disable-multilib (IMO with --disable-
multilib the spec hack should be disabled completely).  Note that for --
enable-multilib we may use --with-default-multilib=lp64d/march=native,
but (1) this is hard to remember (2) this is not usable with --disable-
multilib.

--disable-multilib *should just work*.  Why should a non-multilib user
be punished by the cost of supporting the complex multilib
configuration, esp. today most LoongArch users don't need multilib at
all?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Enable -fsched-pressure by default at -O1 and higher.

2023-09-08 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-09-08 at 16:29 +0800, Guo Jie wrote:
> Hi,
> 
> What I wanna change is "gcc/common/config/loongarch/loongarch-
> common.cc",
> 
> and the patch is automatically generated by "git gcc-commit-mklog".
> 
> Is it necessary to remove "common/"?

My bad.  I didn't realize the file had been moved to common.

Don't change it :(.

> Thanks for the review.
> 
> 
> On 2023/9/8 at 4:06 PM, Xi Ruoyao wrote:
> > On Fri, 2023-09-08 at 10:00 +0800, Guo Jie wrote:
> > > gcc/ChangeLog:
> > > 
> > >  * common/config/loongarch/loongarch-common.cc:
> > "common/" should be removed.  You can use "git gcc-verify" to figure
> > out
> > this kind of error before sending a patch in the future.
> > 
> > >  (default_options loongarch_option_optimization_table):
> > >  Default to -fsched-pressure.
> > "Default to -fsched-pressure at -O1 or above."
> > 
> > Otherwise OK.
> > 
> > > ---
> > >   gcc/common/config/loongarch/loongarch-common.cc | 1 +
> > >   1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/gcc/common/config/loongarch/loongarch-common.cc
> > > b/gcc/common/config/loongarch/loongarch-common.cc
> > > index c5ed37d27a6..b6901910b70 100644
> > > --- a/gcc/common/config/loongarch/loongarch-common.cc
> > > +++ b/gcc/common/config/loongarch/loongarch-common.cc
> > > @@ -36,6 +36,7 @@ static const struct default_options
> > > loongarch_option_optimization_table[] =
> > >     { OPT_LEVELS_ALL, OPT_fasynchronous_unwind_tables, NULL, 1 },
> > >     { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
> > >     { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
> > > +  { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
> > >     { OPT_LEVELS_NONE, 0, NULL, 0 }
> > >   };
> > >   
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Enable -fsched-pressure by default at -O1 and higher.

2023-09-08 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-09-08 at 10:00 +0800, Guo Jie wrote:
> gcc/ChangeLog:
> 
> * common/config/loongarch/loongarch-common.cc:

"common/" should be removed.  You can use "git gcc-verify" to figure out
this kind of error before sending a patch in the future.

> (default_options loongarch_option_optimization_table):
> Default to -fsched-pressure.

"Default to -fsched-pressure at -O1 or above."

Otherwise OK.

> ---
>  gcc/common/config/loongarch/loongarch-common.cc | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/common/config/loongarch/loongarch-common.cc
> b/gcc/common/config/loongarch/loongarch-common.cc
> index c5ed37d27a6..b6901910b70 100644
> --- a/gcc/common/config/loongarch/loongarch-common.cc
> +++ b/gcc/common/config/loongarch/loongarch-common.cc
> @@ -36,6 +36,7 @@ static const struct default_options
> loongarch_option_optimization_table[] =
>    { OPT_LEVELS_ALL, OPT_fasynchronous_unwind_tables, NULL, 1 },
>    { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
>    { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
> +  { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
>    { OPT_LEVELS_NONE, 0, NULL, 0 }
>  };
>  

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] LoongArch: Slightly simplify loongarch_block_move_straight

2023-09-07 Thread Xi Ruoyao via Gcc-patches
gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_block_move_straight):
Check precondition (delta must be a power of 2) and use
popcount_hwi instead of a homebrew loop.
---

I've not run a full bootstrap with this, but it should be obvious.
Ok for trunk?

 gcc/config/loongarch/loongarch.cc | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 509ef2b97f1..845fad5a8e8 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5225,9 +5225,8 @@ loongarch_block_move_straight (rtx dest, rtx src, 
HOST_WIDE_INT length,
  emit two ld.d/st.d pairs, one ld.w/st.w pair, and one ld.b/st.b
  pair.  For each load/store pair we use a dedicated register to keep
  the pipeline as populated as possible.  */
-  HOST_WIDE_INT num_reg = length / delta;
-  for (delta_cur = delta / 2; delta_cur != 0; delta_cur /= 2)
-num_reg += !!(length & delta_cur);
+  gcc_assert (pow2p_hwi (delta));
+  HOST_WIDE_INT num_reg = length / delta + popcount_hwi (length % delta);
 
   /* Allocate a buffer for the temporary registers.  */
   regs = XALLOCAVEC (rtx, num_reg);
-- 
2.42.0



[PATCH] LoongArch: Use LSX and LASX for block move

2023-09-07 Thread Xi Ruoyao via Gcc-patches
gcc/ChangeLog:

* config/loongarch/loongarch.h (LARCH_MAX_MOVE_PER_INSN):
Define to the maximum amount of bytes able to be loaded or
stored with one machine instruction.
* config/loongarch/loongarch.cc (loongarch_mode_for_move_size):
New static function.
(loongarch_block_move_straight): Call
loongarch_mode_for_move_size for machine_mode to be moved.
(loongarch_expand_block_move): Use LARCH_MAX_MOVE_PER_INSN
instead of UNITS_PER_WORD.
---

Bootstrapped and regtested on loongarch64-linux-gnu, with PR110939 patch
applied, the "lib_build_self_spec = %<..." line in t-linux commented out
(because it's silently making -mlasx in BOOT_CFLAGS ineffective, Yujie
is working on a proper fix), and BOOT_CFLAGS="-O3 -mlasx".  Ok for trunk?

 gcc/config/loongarch/loongarch.cc | 22 ++
 gcc/config/loongarch/loongarch.h  |  3 +++
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 6698414281e..509ef2b97f1 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5191,6 +5191,20 @@ loongarch_function_ok_for_sibcall (tree decl 
ATTRIBUTE_UNUSED,
   return true;
 }
 
+static machine_mode
+loongarch_mode_for_move_size (HOST_WIDE_INT size)
+{
+  switch (size)
+{
+case 32:
+  return V32QImode;
+case 16:
+  return V16QImode;
+}
+
+  return int_mode_for_size (size * BITS_PER_UNIT, 0).require ();
+}
+
 /* Emit straight-line code to move LENGTH bytes from SRC to DEST.
Assume that the areas do not overlap.  */
 
@@ -5220,7 +5234,7 @@ loongarch_block_move_straight (rtx dest, rtx src, 
HOST_WIDE_INT length,
 
   for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2)
 {
-  mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require ();
+  mode = loongarch_mode_for_move_size (delta_cur);
 
   for (; offs + delta_cur <= length; offs += delta_cur, i++)
{
@@ -5231,7 +5245,7 @@ loongarch_block_move_straight (rtx dest, rtx src, 
HOST_WIDE_INT length,
 
   for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2)
 {
-  mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require ();
+  mode = loongarch_mode_for_move_size (delta_cur);
 
   for (; offs + delta_cur <= length; offs += delta_cur, i++)
loongarch_emit_move (adjust_address (dest, mode, offs), regs[i]);
@@ -5326,8 +5340,8 @@ loongarch_expand_block_move (rtx dest, rtx src, rtx 
r_length, rtx r_align)
 
   HOST_WIDE_INT align = INTVAL (r_align);
 
-  if (!TARGET_STRICT_ALIGN || align > UNITS_PER_WORD)
-align = UNITS_PER_WORD;
+  if (!TARGET_STRICT_ALIGN || align > LARCH_MAX_MOVE_PER_INSN)
+align = LARCH_MAX_MOVE_PER_INSN;
 
   if (length <= align * LARCH_MAX_MOVE_OPS_STRAIGHT)
 {
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 3fc9dc43ab1..7e391205583 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -1181,6 +1181,9 @@ typedef struct {
least twice.  */
 #define LARCH_MAX_MOVE_OPS_STRAIGHT (LARCH_MAX_MOVE_OPS_PER_LOOP_ITER * 2)
 
+#define LARCH_MAX_MOVE_PER_INSN \
+  (ISA_HAS_LASX ? 32 : (ISA_HAS_LSX ? 16 : UNITS_PER_WORD))
+
 /* The base cost of a memcpy call, for MOVE_RATIO and friends.  These
values were determined experimentally by benchmarking with CSiBE.
 */
-- 
2.42.0



Re: [PATCH v3 1/4] LoongArch: improved target configuration interface

2023-09-07 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-09-07 at 17:47 +0800, Xi Ruoyao wrote:

/* snip */

> I've done some local experiments too.  I think we can add a
> "-mbuild-multilib" option which does nothing by itself, but in the hacked
> spec we can wrap the line in %{mbuild-multilib:...}:
> 
> %{mbuild-multilib:% %{mabi=lp64d:-march=la464 -mno-strict-align -msimd=lsx}  
> %{mabi=lp64s:-march=abi-default -mfpu=32}}
> 
> Then we can use -mbuild-multilib -mabi=lp64d for non-default multilibs
    typo, should be removed

> (or all multilibs unless --disable-multilib?).  In the document we can
> just document mbuild-multilib as "internal use only".

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v3 1/4] LoongArch: improved target configuration interface

2023-09-07 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-09-07 at 17:31 +0800, Yang Yujie wrote:
> > This is bad.  It makes BOOT_CFLAGS=-mlasx or CFLAGS_FOR_TARGET=-mlasx
> > silently ignored so we cannot test a LSX/LASX or vectorizer change with
> > them.
> > 
> > Why do we need to purge all user-specified -m options here?
> 
> Yes, that is an issue that I haven't considered.
> 
> The purge rules (self_specs) exist to clean up the driver-generated
> canonical option tuple.  These options are generated prior to the
> injection of library-building options from --with-multilib-{list,default}.
> They are dependent on the default GCC settings and may not be safely
> overridden by any injected individual options, so we choose to start
> over with a purge.
> 
> Working on a patch now, Thanks!

I've done some local experiments too.  I think we can add a
"-mbuild-multilib" option which does nothing by itself, but in the hacked
spec we can wrap the line in %{mbuild-multilib:...}:

%{mbuild-multilib:%
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v3 1/4] LoongArch: improved target configuration interface

2023-09-07 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-09-06 at 09:04 +0800, Yang Yujie wrote:
> On Tue, Sep 05, 2023 at 09:31:56PM +0800, Xi Ruoyao wrote:
> > On Thu, 2023-08-31 at 20:48 +0800, Yang Yujie wrote:
> > > * Support options for LoongArch SIMD extensions:
> > >   new configure options --with-simd={none,lsx,lasx};
> > >   new compiler option -msimd={none,lsx,lasx};
> > >   new driver options -m[no]-l[a]sx.
> > 
> > Hmm... In my build (a cross compiler configured with
> > ../gcc/configure --
> > target=loongarch64-linux-gnu --with-system-zlib) I have:
> > 
> > $ cat lasx.c
> > int x __attribute__((vector_size(32)));
> > int y __attribute__((vector_size(32)));
> > void test(void) { x += y; }
> > $ gcc/cc1 lasx.c -msimd=lasx -o- -nostdinc -mexplicit-relocs -O2
> > 
> > ... ...
> > 
> > pcalau12i   $r12,%pc_hi20(.LANCHOR0)
> > addi.d  $r12,$r12,%pc_lo12(.LANCHOR0)
> > xvld$xr0,$r12,0
> > xvld$xr1,$r12,32
> > xvadd.w $xr0,$xr0,$xr1
> > xvst$xr0,$r12,0
> > jr  $r1
> > 
> > ... ...
> > 
> > This seems perfectly fine.  But:
> > 
> > $ gcc/xgcc -B gcc lasx.c -mlasx -o- -nostdinc -mexplicit-relocs -O2
> > -S
> > 
> > ... ...
> > 
> > test:
> > .LFB0 = .
> > pcalau12i   $r12,%pc_hi20(.LANCHOR0)
> > addi.d  $r12,$r12,%pc_lo12(.LANCHOR0)
> > addi.d  $r3,$r3,-16
> > .LCFI0 = .
> > st.d$r23,$r3,8
> > .LCFI1 = .
> > ldptr.w $r7,$r12,0
> > ldptr.w $r23,$r12,32
> > ldptr.w $r6,$r12,8
> > 
> > ... ... (no SIMD instructions)
> > 
> > Is this a bug in the driver or did I miss something?
> > 
> > -- 
> > Xi Ruoyao 
> > School of Aerospace Science and Technology, Xidian University
> 
> Maybe you can try deleting gcc/specs first.
> 
> It contains a modified version of self_specs that is used for building
> the libraries, which purges all user-specified "-m" options.
> This file is automatically restored prior to "make check*".

This is bad.  It makes BOOT_CFLAGS=-mlasx or CFLAGS_FOR_TARGET=-mlasx
silently ignored so we cannot test a LSX/LASX or vectorizer change with
them.

Why do we need to purge all user-specified -m options here?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Use bstrins instruction for (a & ~mask) and (a & mask) | (b & ~mask) [PR111252]

2023-09-07 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-09-07 at 10:15 +0800, chenglulu wrote:
> 
> On 2023/9/6 at 6:58 PM, Xi Ruoyao wrote:
> > Forgot to mention: I've bootstrapped and regtested this patch on
> > loongarch64-linux-gnu (with PR110939 patch applied to unbreak the
> > bootstrapping).  Ok for trunk?
> 
> LGTM!
> 
> Thanks!

Pushed r14-3773.

> > 
> > On Wed, 2023-09-06 at 18:46 +0800, Xi Ruoyao wrote:
> > 
> > > If mask is a constant with value ((1 << N) - 1) << M we can
> > > perform this
> > > optimization.
> > > 
> > > gcc/ChangeLog:
> > > 
> > >  PR target/111252
> > >  * config/loongarch/loongarch-protos.h
> > >  (loongarch_pre_reload_split): Declare new function.
> > >  (loongarch_use_bstrins_for_ior_with_mask): Likewise.
> > >  * config/loongarch/loongarch.cc
> > >  (loongarch_pre_reload_split): Implement.
> > >  (loongarch_use_bstrins_for_ior_with_mask): Likewise.
> > >  * config/loongarch/predicates.md
> > > (ins_zero_bitmask_operand):
> > >  New predicate.
> > >  * config/loongarch/loongarch.md
> > > (bstrins__for_mask):
> > >  New define_insn_and_split.
> > >  (bstrins__for_ior_mask): Likewise.
> > >  (define_peephole2): Further optimize code sequence
> > > produced by
> > >  bstrins__for_ior_mask if possible.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >  * g++.target/loongarch/bstrins-compile.C: New test.
> > >  * g++.target/loongarch/bstrins-run.C: New test.
> > /* snip */
> > 
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2 2/4] LoongArch: Add testsuite framework for Loongson SX/ASX.

2023-09-07 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-09-07 at 15:00 +0800, Xiaolong Chen wrote:

/* snip */

> diff --git 
> a/gcc/testsuite/gcc.target/loongarch/vector/simd_correctness_check.h 
> b/gcc/testsuite/gcc.target/loongarch/vector/simd_correctness_check.h
> new file mode 100644
> index 000..7be199ee3a0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/vector/simd_correctness_check.h

Please reformat with GNU style (maybe using clang-format --style=GNU).

> @@ -0,0 +1,39 @@
> +#include 
> +#include 
> +#include 
> +
> +#define ASSERTEQ_64(line, ref, res)  
>   \
> +do{  
>   \
> +    int fail = 0;
>   \
> +    for(size_t i = 0; i < sizeof(res)/sizeof(res[0]); ++i){  
>   \
> +   long *temp_ref = [i], *temp_res = [i];
>   \
> +   if(abs(*temp_ref - *temp_res) > 0){   
>   \
> +   printf(" error: %s at line %ld , expected "#ref"[%ld]:0x%lx, got: 
> 0x%lx\n", \
> +   __FILE__, line, i, *temp_ref, *temp_res); 
>   \
> +   fail = 1; 
>   \
> +   } 
>   \
> +    }
>   \
> +    if(fail == 1) abort();   
>   \
> +}while(0) 
> +
> +#define ASSERTEQ_32(line, ref, res)  
>   \
> +do{  
>   \
> +    int fail = 0;
>   \
> +    for(size_t i = 0; i < sizeof(res)/sizeof(res[0]); ++i){  
>   \
> +   int *temp_ref = [i], *temp_res = [i]; 
>   \
> +   if(abs(*temp_ref - *temp_res) > 0){   
>   \
> +   printf(" error: %s at line %ld , expected "#ref"[%ld]:0x%x, got: 
> 0x%x\n",   \
> +  __FILE__, line, i, *temp_ref, *temp_res);  
>   \
> +   fail = 1; 
>   \
> +   } 
>   \
> +    }
>   \
> +    if(fail == 1) abort();   
>   \
> +}while(0) 
> +
> +#define ASSERTEQ_int(line, ref, res) 
>   \
> +do{  
>   \
> +    if (ref != res){ 
>   \
> +   printf(" error: %s at line %ld , expected %d, got %d\n",  
>   \
> +  __FILE__, line, ref, res); 
>   \
> +    }
>   \
> +}while(0) 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 1/4] LoongArch: Add tests of -mstrict-align option.

2023-09-06 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-09-06 at 18:43 +0800, Xiaolong Chen wrote:
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/loongarch/strict-align.c: New test.

A question: is there really a CPU model with LSX/LASX but without
unaligned access support?  If not I think we'd just reject -mstrict-
align -mlsx.

Currently Glibc assumes if LSX is available then unaligned access must
be available too.

> ---
>  gcc/testsuite/gcc.target/loongarch/strict-align.c | 13 +
>  1 file changed, 13 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/strict-align.c
> 
> diff --git a/gcc/testsuite/gcc.target/loongarch/strict-align.c
> b/gcc/testsuite/gcc.target/loongarch/strict-align.c
> new file mode 100644
> index 000..bcad2b84f68
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/strict-align.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -mstrict-align -mlasx" } */
> +/* { dg-final { scan-assembler-not "vfadd.s" } } */
> +
> +void
> +foo (float* restrict x, float* restrict y)
> +{
> +  x[0] = x[0] + y[0];
> +  x[1] = x[1] + y[1];
> +  x[2] = x[2] + y[2];
> +  x[3] = x[3] + y[3];
> +}
> +

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Use bstrins instruction for (a & ~mask) and (a & mask) | (b & ~mask) [PR111252]

2023-09-06 Thread Xi Ruoyao via Gcc-patches
Forgot to mention: I've bootstrapped and regtested this patch on
loongarch64-linux-gnu (with PR110939 patch applied to unbreak the
bootstrapping).  Ok for trunk?

On Wed, 2023-09-06 at 18:46 +0800, Xi Ruoyao wrote:

> If mask is a constant with value ((1 << N) - 1) << M we can perform this
> optimization.
> 
> gcc/ChangeLog:
> 
> PR target/111252
> * config/loongarch/loongarch-protos.h
> (loongarch_pre_reload_split): Declare new function.
> (loongarch_use_bstrins_for_ior_with_mask): Likewise.
> * config/loongarch/loongarch.cc
> (loongarch_pre_reload_split): Implement.
> (loongarch_use_bstrins_for_ior_with_mask): Likewise.
> * config/loongarch/predicates.md (ins_zero_bitmask_operand):
> New predicate.
> * config/loongarch/loongarch.md (bstrins__for_mask):
> New define_insn_and_split.
> (bstrins__for_ior_mask): Likewise.
> (define_peephole2): Further optimize code sequence produced by
> bstrins__for_ior_mask if possible.
> 
> gcc/testsuite/ChangeLog:
> 
> * g++.target/loongarch/bstrins-compile.C: New test.
> * g++.target/loongarch/bstrins-run.C: New test.

/* snip */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 4/4] LoongArch: Add tests for Loongson SX floating-point conversion instructions.

2023-09-06 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-09-06 at 18:45 +0800, Xiaolong Chen wrote:
> +  *((int*)& __m128_op0[3]) = 0x004200a0;
> +  *((int*)& __m128_op0[2]) = 0x;
> +  *((int*)& __m128_op0[1]) = 0x004200a0;
> +  *((int*)& __m128_op0[0]) = 0x0021;

These are aliasing rule violations, and they will suddenly blow up when
the GCC optimizer starts to optimize more aggressively based on the
aliasing rule.

Try not to use these (you can write a helper function to memcpy() into a
__m128).  Or use -fno-strict-aliasing in dg-options.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] LoongArch: Use bstrins instruction for (a & ~mask) and (a & mask) | (b & ~mask) [PR111252]

2023-09-06 Thread Xi Ruoyao via Gcc-patches
If mask is a constant with value ((1 << N) - 1) << M we can perform this
optimization.

gcc/ChangeLog:

PR target/111252
* config/loongarch/loongarch-protos.h
(loongarch_pre_reload_split): Declare new function.
(loongarch_use_bstrins_for_ior_with_mask): Likewise.
* config/loongarch/loongarch.cc
(loongarch_pre_reload_split): Implement.
(loongarch_use_bstrins_for_ior_with_mask): Likewise.
* config/loongarch/predicates.md (ins_zero_bitmask_operand):
New predicate.
* config/loongarch/loongarch.md (bstrins__for_mask):
New define_insn_and_split.
(bstrins__for_ior_mask): Likewise.
(define_peephole2): Further optimize code sequence produced by
bstrins__for_ior_mask if possible.

gcc/testsuite/ChangeLog:

* g++.target/loongarch/bstrins-compile.C: New test.
* g++.target/loongarch/bstrins-run.C: New test.
---
 gcc/config/loongarch/loongarch-protos.h   |  4 +-
 gcc/config/loongarch/loongarch.cc | 36 
 gcc/config/loongarch/loongarch.md | 91 +++
 gcc/config/loongarch/predicates.md|  8 ++
 .../g++.target/loongarch/bstrins-compile.C| 22 +
 .../g++.target/loongarch/bstrins-run.C| 65 +
 6 files changed, 225 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.target/loongarch/bstrins-compile.C
 create mode 100644 gcc/testsuite/g++.target/loongarch/bstrins-run.C

diff --git a/gcc/config/loongarch/loongarch-protos.h 
b/gcc/config/loongarch/loongarch-protos.h
index f4430d0d418..251011c5414 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -56,7 +56,7 @@ enum loongarch_symbol_type {
 };
 #define NUM_SYMBOL_TYPES (SYMBOL_TLSLDM + 1)
 
-/* Routines implemented in loongarch.c.  */
+/* Routines implemented in loongarch.cc.  */
 extern rtx loongarch_emit_move (rtx, rtx);
 extern HOST_WIDE_INT loongarch_initial_elimination_offset (int, int);
 extern void loongarch_expand_prologue (void);
@@ -163,6 +163,8 @@ extern const char *current_section_name (void);
 extern unsigned int current_section_flags (void);
 extern bool loongarch_use_ins_ext_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
 extern bool loongarch_check_zero_div_p (void);
+extern bool loongarch_pre_reload_split (void);
+extern int loongarch_use_bstrins_for_ior_with_mask (machine_mode, rtx *);
 
 union loongarch_gen_fn_ptrs
 {
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index aeb37f0f2f7..6698414281e 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5482,6 +5482,42 @@ loongarch_use_ins_ext_p (rtx op, HOST_WIDE_INT width, 
HOST_WIDE_INT bitpos)
   return true;
 }
 
+/* Predicate for pre-reload splitters with associated instructions,
+   which can match any time before the split1 pass (usually combine),
+   then are unconditionally split in that pass and should not be
+   matched again afterwards.  */
+
+bool loongarch_pre_reload_split (void)
+{
+  return (can_create_pseudo_p ()
+ && !(cfun->curr_properties & PROP_rtl_split_insns));
+}
+
+/* Check if we can use bstrins. for
+   op0 = (op1 & op2) | (op3 & op4)
+   where op0, op1, op3 are regs, and op2, op4 are integer constants.  */
+int
+loongarch_use_bstrins_for_ior_with_mask (machine_mode mode, rtx *op)
+{
+  unsigned HOST_WIDE_INT mask1 = UINTVAL (op[2]);
+  unsigned HOST_WIDE_INT mask2 = UINTVAL (op[4]);
+
+  if (mask1 != ~mask2 || !mask1 || !mask2)
+return 0;
+
+  /* Try to avoid a right-shift.  */
+  if (low_bitmask_len (mode, mask1) != -1)
+return -1;
+
+  if (low_bitmask_len (mode, mask2 >> (ffs_hwi (mask2) - 1)) != -1)
+return 1;
+
+  if (low_bitmask_len (mode, mask1 >> (ffs_hwi (mask1) - 1)) != -1)
+return -1;
+
+  return 0;
+}
+
 /* Print the text for PRINT_OPERAND punctation character CH to FILE.
The punctuation characters are:
 
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 2308db16902..75f641b38ee 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1322,6 +1322,97 @@ (define_insn "and3_extended"
   [(set_attr "move_type" "pick_ins")
(set_attr "mode" "")])
 
+(define_insn_and_split "*bstrins__for_mask"
+  [(set (match_operand:GPR 0 "register_operand")
+   (and:GPR (match_operand:GPR 1 "register_operand")
+(match_operand:GPR 2 "ins_zero_bitmask_operand")))]
+  ""
+  "#"
+  ""
+  [(set (match_dup 0) (match_dup 1))
+   (set (zero_extract:GPR (match_dup 0) (match_dup 2) (match_dup 3))
+   (const_int 0))]
+  {
+unsigned HOST_WIDE_INT mask = ~UINTVAL (operands[2]);
+int lo = ffs_hwi (mask) - 1;
+int len = low_bitmask_len (mode, mask >> lo);
+
+len = MIN (len, GET_MODE_BITSIZE (mode) - lo);
+operands[2] = GEN_INT (len);
+operands[3] = GEN_INT (lo);
+  })
+
+(define_insn_and_split "*bstrins__for_ior_mask"
+  

Re: [PATCH] LoongArch: Link c++ header directory in the default ABI to the toplevel.

2023-09-06 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-09-06 at 18:06 +0800, Yang Yujie wrote:
> When multilib is enabled, the c++ header directory of the default multilib
> variant needs to be linked to the toplevel since g++ does not search the
> toplevel in this case.
> 
> libstdc++-v3/ChangeLog:
> 
> * configure.host: Register t-loongarch in tmake_file.
> * config/cpu/loongarch/t-loongarch: New file.  Link c++ header
> directory in the default ABI to the toplevel.
> ---
>  libstdc++-v3/config/cpu/loongarch/t-loongarch | 12 
>  libstdc++-v3/configure.host   |  5 -
>  2 files changed, 16 insertions(+), 1 deletion(-)
>  create mode 100644 libstdc++-v3/config/cpu/loongarch/t-loongarch
> 
> diff --git a/libstdc++-v3/config/cpu/loongarch/t-loongarch
> b/libstdc++-v3/config/cpu/loongarch/t-loongarch
> new file mode 100644
> index 000..942eddeb3be
> --- /dev/null
> +++ b/libstdc++-v3/config/cpu/loongarch/t-loongarch
> @@ -0,0 +1,12 @@
> +LA_DEFAULT_MULTIDIR = $(shell $(CXX) --print-multi-directory)
> +TOPLEV_HEADERS = 
> $(DESTDIR)${gxx_include_dir}/${host_alias}/$(LA_DEFAULT_MULTIDIR)
> +
> +.PHONY: install-toplevel-link
> +install: install-toplevel-link
> +install-toplevel-link:
> +   if test x$(MULTIDO) != xtrue && \
> +  test x$(LA_DEFAULT_MULTIDIR) != x.; then \
> +   $(MKDIR_P) "$(dir $(TOPLEV_HEADERS))"; \
> +   rm -rf "$(TOPLEV_HEADERS)"; \
> +   $(LN_S) ../ "$(TOPLEV_HEADERS)"; \

From the autoconf info page:

 -- Macro: AC_PROG_LN_S
 If ‘ln -s’ works on the current file system (the operating system
 and file system support symbolic links), set the output variable
 ‘LN_S’ to ‘ln -s’; otherwise, if ‘ln’ works, set ‘LN_S’ to ‘ln’,
 and otherwise set it to ‘cp -pR’.

 If you make a link in a directory other than the current directory,
 its meaning depends on whether ‘ln’ or ‘ln -s’ is used.  To safely
 create links using ‘$(LN_S)’, either find out which form is used
 and adjust the arguments, or always invoke ‘ln’ in the directory
 where the link is to be created.

 In other words, it does not work to do:
  $(LN_S) foo /x/bar

 Instead, do:

  (cd /x && $(LN_S) foo bar)

But for this special case we cannot "cp -pR ../ $(TOPLEV_HEADERS)"
either:

$ cp ../* -pR something
cp: cannot copy a directory, '../g', into itself, 'h/g'

So I guess we'll need something like

if ln -s ../ "$(TOPLEV_HEADERS)"; then
  # OK!
  true
else
  # system does not support symlink :(
  # install another copy of toplevel headers into default multilib subdir
  TODO: 
fi

And all libstdc++ patches should Cc: libstd...@gcc.gnu.org.

> +   fi
> diff --git a/libstdc++-v3/configure.host b/libstdc++-v3/configure.host
> index 9e7c7f02dfd..9dc42ad3edb 100644
> --- a/libstdc++-v3/configure.host
> +++ b/libstdc++-v3/configure.host
> @@ -315,7 +315,10 @@ esac
>  # Set any OS-dependent and CPU-dependent bits.
>  # THIS TABLE IS SORTED.  KEEP IT THAT WAY.
>  case "${host}" in
> -  *-*-linux* | *-*-uclinux*)
> + loongarch*)
> +    tmake_file="cpu/loongarch/t-loongarch"
> +    ;;
> + *-*-linux* | *-*-uclinux*)
>  case "${host_cpu}" in
>    i[567]86)
>  abi_baseline_pair=i486-linux-gnu

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v3 1/4] LoongArch: improved target configuration interface

2023-09-05 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-31 at 20:48 +0800, Yang Yujie wrote:
> * Support options for LoongArch SIMD extensions:
>   new configure options --with-simd={none,lsx,lasx};
>   new compiler option -msimd={none,lsx,lasx};
>   new driver options -m[no]-l[a]sx.

Hmm... In my build (a cross compiler configured with ../gcc/configure --
target=loongarch64-linux-gnu --with-system-zlib) I have:

$ cat lasx.c
int x __attribute__((vector_size(32)));
int y __attribute__((vector_size(32)));
void test(void) { x += y; }
$ gcc/cc1 lasx.c -msimd=lasx -o- -nostdinc -mexplicit-relocs -O2

... ...

pcalau12i   $r12,%pc_hi20(.LANCHOR0)
addi.d  $r12,$r12,%pc_lo12(.LANCHOR0)
xvld$xr0,$r12,0
xvld$xr1,$r12,32
xvadd.w $xr0,$xr0,$xr1
xvst$xr0,$r12,0
jr  $r1

... ...

This seems perfectly fine.  But:

$ gcc/xgcc -B gcc lasx.c -mlasx -o- -nostdinc -mexplicit-relocs -O2 -S

... ...

test:
.LFB0 = .
pcalau12i   $r12,%pc_hi20(.LANCHOR0)
addi.d  $r12,$r12,%pc_lo12(.LANCHOR0)
addi.d  $r3,$r3,-16
.LCFI0 = .
st.d$r23,$r3,8
.LCFI1 = .
ldptr.w $r7,$r12,0
ldptr.w $r23,$r12,32
ldptr.w $r6,$r12,8

... ... (no SIMD instructions)

Is this a bug in the driver or did I miss something?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v3 1/4] LoongArch: improved target configuration interface

2023-09-05 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-09-05 at 20:01 +0800, chenglulu wrote:
> 
> On 2023/9/5 at 7:51 PM, Xi Ruoyao wrote:
> > On Thu, 2023-08-31 at 20:48 +0800, Yang Yujie wrote:
> > >   /* Note: optimize_size may vary across functions,
> > >  while -m[no]-memcpy imposes a global constraint.  */
> > >   #define TARGET_DO_OPTIMIZE_BLOCK_MOVE_P
> > > loongarch_do_optimize_block_move_p()
> > >   
> > > -#ifndef HAVE_AS_EXPLICIT_RELOCS
> > > -#define HAVE_AS_EXPLICIT_RELOCS 0
> > > -#endif
> > > -
> > This causes a build failure with older assembler:
> > 
> > options.cc:3040:3: error: 'HAVE_AS_EXPLICIT_RELOCS' was not declared in 
> > this scope; did you mean 'TARGET_EXPLICIT_RELOCS'?
> >   3040 |   HAVE_AS_EXPLICIT_RELOCS, /* TARGET_EXPLICIT_RELOCS */
> >    |   ^~~
> >    |   TARGET_EXPLICIT_RELOCS
> > 
> > Why was this removed?  If this is an unintentional change I'll add it
> > back.
> > 
> Sorry, this was deleted accidentally.
> 
> Thanks!

Added the 3 lines back at r14-3706.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v3 1/4] LoongArch: improved target configuration interface

2023-09-05 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-31 at 20:48 +0800, Yang Yujie wrote:
>  /* Note: optimize_size may vary across functions,
>     while -m[no]-memcpy imposes a global constraint.  */
>  #define TARGET_DO_OPTIMIZE_BLOCK_MOVE_P 
> loongarch_do_optimize_block_move_p()
>  
> -#ifndef HAVE_AS_EXPLICIT_RELOCS
> -#define HAVE_AS_EXPLICIT_RELOCS 0
> -#endif
> -

This causes a build failure with older assembler:

options.cc:3040:3: error: 'HAVE_AS_EXPLICIT_RELOCS' was not declared in this 
scope; did you mean 'TARGET_EXPLICIT_RELOCS'?
 3040 |   HAVE_AS_EXPLICIT_RELOCS, /* TARGET_EXPLICIT_RELOCS */
  |   ^~~
  |   TARGET_EXPLICIT_RELOCS

Why was this removed?  If this is an unintentional change I'll add it
back.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Support loading floating-point zero into MEM[base + index].

2023-09-01 Thread Xi Ruoyao via Gcc-patches
LGTM.

Nit: it should be "storing" floating-point zero into MEM, not "loading".

On Sat, 2023-09-02 at 12:47 +0800, Guo Jie wrote:
> gcc/ChangeLog:
> 
> * config/loongarch/loongarch.md: Support 'G' -> 'k' in
> movsf_hardfloat and movdf_hardfloat.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/loongarch/const-double-zero-stx.c: New test.
> 
> ---
>  gcc/config/loongarch/loongarch.md  | 12 ++--
>  .../loongarch/const-double-zero-stx.c  | 18 ++
>  2 files changed, 24 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/const-double-zero-stx.c
> 
> diff --git a/gcc/config/loongarch/loongarch.md 
> b/gcc/config/loongarch/loongarch.md
> index b37e070660f..6f47c23a79c 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -1915,13 +1915,13 @@ (define_expand "movsf"
>  })
>  
>  (define_insn "*movsf_hardfloat"
> -  [(set (match_operand:SF 0 "nonimmediate_operand" 
> "=f,f,f,m,f,k,m,*f,*r,*r,*r,*m")
> -   (match_operand:SF 1 "move_operand" "f,G,m,f,k,f,G,*r,*f,*G*r,*m,*r"))]
> +  [(set (match_operand:SF 0 "nonimmediate_operand" 
> "=f,f,f,m,f,k,m,k,*f,*r,*r,*r,*m")
> +   (match_operand:SF 1 "move_operand" 
> "f,G,m,f,k,f,G,G,*r,*f,*G*r,*m,*r"))]
>    "TARGET_HARD_FLOAT
>     && (register_operand (operands[0], SFmode)
>     || reg_or_0_operand (operands[1], SFmode))"
>    { return loongarch_output_move (operands[0], operands[1]); }
> -  [(set_attr "move_type" 
> "fmove,mgtf,fpload,fpstore,fpload,fpstore,store,mgtf,mftg,move,load,store")
> +  [(set_attr "move_type" 
> "fmove,mgtf,fpload,fpstore,fpload,fpstore,store,store,mgtf,mftg,move,load,store")
>     (set_attr "mode" "SF")])
>  
>  (define_insn "*movsf_softfloat"
> @@ -1946,13 +1946,13 @@ (define_expand "movdf"
>  })
>  
>  (define_insn "*movdf_hardfloat"
> -  [(set (match_operand:DF 0 "nonimmediate_operand" 
> "=f,f,f,m,f,k,m,*f,*r,*r,*r,*m")
> -   (match_operand:DF 1 "move_operand" "f,G,m,f,k,f,G,*r,*f,*r*G,*m,*r"))]
> +  [(set (match_operand:DF 0 "nonimmediate_operand" 
> "=f,f,f,m,f,k,m,k,*f,*r,*r,*r,*m")
> +   (match_operand:DF 1 "move_operand" 
> "f,G,m,f,k,f,G,G,*r,*f,*r*G,*m,*r"))]
>    "TARGET_DOUBLE_FLOAT
>     && (register_operand (operands[0], DFmode)
>     || reg_or_0_operand (operands[1], DFmode))"
>    { return loongarch_output_move (operands[0], operands[1]); }
> -  [(set_attr "move_type" 
> "fmove,mgtf,fpload,fpstore,fpload,fpstore,store,mgtf,mftg,move,load,store")
> +  [(set_attr "move_type" 
> "fmove,mgtf,fpload,fpstore,fpload,fpstore,store,store,mgtf,mftg,move,load,store")
>     (set_attr "mode" "DF")])
>  
>  (define_insn "*movdf_softfloat"
> diff --git a/gcc/testsuite/gcc.target/loongarch/const-double-zero-stx.c 
> b/gcc/testsuite/gcc.target/loongarch/const-double-zero-stx.c
> new file mode 100644
> index 000..8fb04be8ff5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/const-double-zero-stx.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-times {stx\..\t\$r0} 2 } } */
> +
> +extern float arr_f[];
> +extern double arr_d[];
> +
> +void
> +test_f (int base, int index)
> +{
> +  arr_f[base + index] = 0.0;
> +}
> +
> +void
> +test_d (int base, int index)
> +{
> +  arr_d[base + index] = 0.0;
> +}

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v6 0/4] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-31 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-31 at 17:08 +0800, Chenghui Pan wrote:
> This is an update of:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628303.html
> 
> Changes since last version of patch set:
> - "dg-skip-if"-related changes of the g++.dg/torture/vshuf* testcases are reverted.
>   (Replaced by __builtin_shuffle fix)
> - Add fix of __builtin_shuffle() for Loongson SX/ASX (implemented by adding
>   vand/xvand insn in front of shuffle operation). There's no significant performance
>   impact in current state.

I think it's the correct fix, thanks!

I'm still unsure about the "partly saved register" issue (I'll need to
resolve similar issues for "ILP32 ABI on loongarch64") but it seems GCC
just doesn't attempt to preserve any vector in a register across a
function call.

After the patches are committed I (and Xuerui, maybe) will perform a full
system rebuild with LASX enabled to see if there are subtle issues.  IMO
we still have plenty of time to fix them (if there are any) before the
GCC 14 release.

> - Rebased on the top of Yang Yujie's latest target configuration interface patch set
>   (https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628772.html).
> 
> Brief history of patch set:
> v1 -> v2:
> - Reduce usage of "unspec" in RTL template.
> - Append Support of ADDR_REG_REG in LSX and LASX.
> - Constraint docs are appended in gcc/doc/md.texi and comment block.
> - Codes related to vecarg are removed.
> - Testsuite of LSX and LASX is added in v2. (Because of the size limitation of
>   mail list, these patches are not shown)
> - Adjust the loongarch_expand_vector_init() function to reduce instruction output amount.
> - Some minor implementation changes of RTL templates.
> 
> v2 -> v3:
> - Revert vabsd/xvabsd RTL templates to unspec impl.
> - Resolve warning in gcc/config/loongarch/loongarch.cc when bootstrapping 
>   with BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx".
> - Remove redundant definitions in lasxintrin.h.
> - Refine commit info.
> 
> v3 -> v4:
> - Code simplification.
> - Testsuite patches are split from this patch set again and will be
>   submitted independently in the future.
> 
> v4 -> v5:
> - Regression test fix (pr54346.c)
> - Combine vilvh/xvilvh insn's RTL template impl.
> - Add dg-skip-if for loongarch*-*-* in vshuf test inside g++.dg/torture
>   (reverted in this version)
> 
> Lulu Cheng (4):
>   LoongArch: Add Loongson SX base instruction support.
>   LoongArch: Add Loongson SX directive builtin function support.
>   LoongArch: Add Loongson ASX base instruction support.
>   LoongArch: Add Loongson ASX directive builtin function support.
> 
>  gcc/config.gcc    |    2 +-
>  gcc/config/loongarch/constraints.md   |  131 +-
>  gcc/config/loongarch/genopts/loongarch.opt.in |    4 +
>  gcc/config/loongarch/lasx.md  | 5104 
>  gcc/config/loongarch/lasxintrin.h | 5338 +
>  gcc/config/loongarch/loongarch-builtins.cc    | 2686 -
>  gcc/config/loongarch/loongarch-ftypes.def |  666 +-
>  gcc/config/loongarch/loongarch-modes.def  |   39 +
>  gcc/config/loongarch/loongarch-protos.h   |   35 +
>  gcc/config/loongarch/loongarch.cc | 4751 ++-
>  gcc/config/loongarch/loongarch.h  |  117 +-
>  gcc/config/loongarch/loongarch.md |   56 +-
>  gcc/config/loongarch/loongarch.opt    |    4 +
>  gcc/config/loongarch/lsx.md   | 4467 ++
>  gcc/config/loongarch/lsxintrin.h  | 5181 
>  gcc/config/loongarch/predicates.md    |  333 +-
>  gcc/doc/md.texi   |   11 +
>  17 files changed, 28645 insertions(+), 280 deletions(-)
>  create mode 100644 gcc/config/loongarch/lasx.md
>  create mode 100644 gcc/config/loongarch/lasxintrin.h
>  create mode 100644 gcc/config/loongarch/lsx.md
>  create mode 100644 gcc/config/loongarch/lsxintrin.h
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v5] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-31 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-31 at 15:02 +0800, chenxiaolong wrote:
> +;; Implement __builtin_copysignf128 function.
> +
> +(define_insn_and_split "copysigntf3"
> +  [(set (match_operand:TF 0 "register_operand" "=")
> +   (unspec:TF [(match_operand:TF 1 "register_operand" "r")
> +   (match_operand:TF 2 "register_operand" "r")]
> +   UNSPEC_COPYSIGNF128))]
> +  "TARGET_64BIT"
> +  "#"
> +  "reload_completed"
> + [(const_int 0)]
> +{
> +  rtx op0_lo = gen_rtx_REG (DImode,REGNO (operands[0]) + 0);
> +  rtx op0_hi = gen_rtx_REG (DImode,REGNO (operands[0]) + 1);
> +  rtx op1_lo = gen_rtx_REG (DImode,REGNO (operands[1]) + 0);
> +  rtx op1_hi = gen_rtx_REG (DImode,REGNO (operands[1]) + 1);
> +  rtx op2_hi = gen_rtx_REG (DImode,REGNO (operands[2]) + 1);
> +
> +  if (REGNO (operands[1]) == REGNO (operands[2]))
> +    {
> +  loongarch_emit_move (operands[0], operands[1]);
> +  DONE;
> +    }
> +  else
> +    {
> +  loongarch_emit_move (op0_hi, op2_hi);
> +  loongarch_emit_move (op0_lo, op1_lo);
> +  emit_insn (gen_insvdi (op0_hi, GEN_INT (63), GEN_INT (0), op1_hi));
> +  DONE;
> +    }
> +})

Please remove this part too, for now.  I'm trying to figure out a more
generic fix, and if I fail we can add this part later.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-30 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-31 at 10:46 +0800, chenxiaolong wrote:
> +;; Implement __builtin_fabs128 function.
> +
> +(define_expand "abstf2"
> +  [(match_operand:TF 0 "register_operand")
> +   (match_operand:TF 1 "register_operand")]
> +  "TARGET_64BIT"
> +{
> +  loongarch_emit_move (operands[0], operands[1]);
> +  emit_insn (gen_abstf_local (operands[0]));
> +  DONE;
> +})
> +
> +(define_insn "abstf_local"
> +  [(set (match_operand:TF 0 "register_operand" "+r")
> +   (abs:TF (match_dup 0)))]
> +  "TARGET_64BIT"
> +{
> +  operands[0] = gen_rtx_REG (DImode, REGNO (operands[0]) + 1);
> +  return "bstrins.d\t%0,$r0,0x3f,0x3f";
> +})

This should be removed because the "generic" expand works fine:

$ cat t.c
_Float128 fabsf128 (_Float128 in)
{
  return __builtin_fabsf128 (in);
}
$ cc t.c -S -O2 -o-
fabsf128:
.LFB0 = .
.cfi_startproc
bstrpick.d  $r5,$r5,62,0
jr  $r1
.cfi_endproc

It does not work with -O0, but -O0 means "not optimized" anyway.
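
To illustrate what the single bstrpick.d above is doing, here is a small C
model.  It is only a sketch under stated assumptions: the 128-bit value is
viewed as two 64-bit words with the sign bit in bit 63 of the high word
(little-endian register pair), and `f128_words` / `f128_abs_model` are
made-up names for this example.

```c
#include <assert.h>
#include <stdint.h>

/* A binary128 value viewed as two 64-bit halves (assumed layout:
   the sign bit is bit 63 of HI).  */
struct f128_words
{
  uint64_t lo;
  uint64_t hi;
};

/* fabs only needs to clear the sign bit, i.e. keep bits 62..0 of the
   high word and leave the low word alone.  */
static struct f128_words
f128_abs_model (struct f128_words x)
{
  const uint64_t sign = UINT64_C (1) << 63;
  x.hi &= ~sign;
  assert ((x.hi & sign) == 0);
  return x;
}
```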

> +;; Implement __builtin_copysignf128 function.
> +
> +(define_insn_and_split "copysigntf3"
> +  [(set (match_operand:TF 0 "register_operand" "=")
> +   (unspec:TF [(match_operand:TF 1 "register_operand" "r")
> +   (match_operand:TF 2 "register_operand" "r")]
> +   UNSPEC_COPYSIGNF128))]
> +  "TARGET_64BIT"
> +  "#"
> +  "reload_completed"
> + [(const_int 0)]
> +{
> +  rtx op0_lo = gen_rtx_REG (DImode,REGNO (operands[0]) + 0);
> +  rtx op0_hi = gen_rtx_REG (DImode,REGNO (operands[0]) + 1);
> +  rtx op1_lo = gen_rtx_REG (DImode,REGNO (operands[1]) + 0);
> +  rtx op1_hi = gen_rtx_REG (DImode,REGNO (operands[1]) + 1);
> +  rtx op2_hi = gen_rtx_REG (DImode,REGNO (operands[2]) + 1);
> +
> +  if (REGNO (operands[1]) == REGNO (operands[2]))
> +    {
> +  loongarch_emit_move (operands[0], operands[1]);
> +  DONE;
> +    }
> +  else
> +    {
> +  loongarch_emit_move (op0_hi, op2_hi);
> +  loongarch_emit_move (op0_lo, op1_lo);
> +  emit_insn (gen_insvdi (op0_hi, GEN_INT (63), GEN_INT (0), op1_hi));
> +  DONE;
> +    }
> +})

Hmm... The generic implementation does not work:

copysignf128:
.LFB0 = .
.cfi_startproc
or  $r12,$r0,$r0
lu52i.d $r12,$r12,0x8000>>52
and $r7,$r7,$r12
bstrpick.d  $r5,$r5,62,0
or  $r5,$r5,$r7
jr  $r1
.cfi_endproc

It's sub-optimal.  But there seems to be a general issue with cases like

int test(int a, int b)
{
  return (a & ~0x10) | (b & 0x10);
}

It's compiled to:

test:
.LFB0 = .
.cfi_startproc
addi.w  $r12,$r0,-17# 0xffef
and $r12,$r12,$r4
andi$r5,$r5,16
or  $r12,$r12,$r5
slli.w  $r4,$r12,0
jr  $r1
.cfi_endproc

But the optimal implementation should be:

bstrpick.w $r4, $r4, 4, 4
bstrins.w  $r5, $r4, 4, 4
or $r5, $r4, $r0

So to me we should fix the general case instead.  Please hold this part
(you can commit the rest of the patch w/o the loongarch.md change for
now), and I'll try to fix the general case.

Created https://gcc.gnu.org/PR111252 for tracking the issue.
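
For reference, the equivalence behind the "general case" can be sketched in
plain C.  The helpers below are hypothetical models of a bit-field pick and
insert (they are not GCC or LoongArch APIs), and unsigned 32-bit values are
used instead of int to keep the shifts well defined.

```c
#include <assert.h>
#include <stdint.h>

/* The mask-and-or form from the test case above.  */
static uint32_t
merge_by_mask (uint32_t a, uint32_t b)
{
  return (a & ~UINT32_C (0x10)) | (b & UINT32_C (0x10));
}

/* Extract bits MSB..LSB of SRC, zero-extended.  */
static uint32_t
bits_pick (uint32_t src, unsigned msb, unsigned lsb)
{
  assert (lsb <= msb && msb < 32);
  unsigned width = msb - lsb + 1;
  uint32_t mask = width == 32 ? UINT32_MAX : (UINT32_C (1) << width) - 1;
  return (src >> lsb) & mask;
}

/* Replace bits MSB..LSB of DST with the low bits of SRC.  */
static uint32_t
bits_insert (uint32_t dst, uint32_t src, unsigned msb, unsigned lsb)
{
  assert (lsb <= msb && msb < 32);
  unsigned width = msb - lsb + 1;
  uint32_t mask = width == 32 ? UINT32_MAX : (UINT32_C (1) << width) - 1;
  return (dst & ~(mask << lsb)) | ((src & mask) << lsb);
}

/* The same merge expressed as "pick bit 4 of B, insert it into A".  */
static uint32_t
merge_by_insert (uint32_t a, uint32_t b)
{
  return bits_insert (a, bits_pick (b, 4, 4), 4, 4);
}
```

Both functions compute the same result; the point of the PR is that the
compiler should ideally recognize the first form as the second.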

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: RFC: Introduce -fhardened to enable security-related flags

2023-08-30 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-08-29 at 15:42 -0400, Marek Polacek via Gcc-patches wrote:
> + if (UNLIKELY (flag_hardened)
> + && (opt->code == OPT_D || opt->code == OPT_U))
> +   {
> + if (!fortify_seen_p)
> +   fortify_seen_p = !strncmp (opt->arg, "_FORTIFY_SOURCE", 15);
> + if (!cxx_assert_seen_p)
> +   cxx_assert_seen_p = !strcmp (opt->arg, "_GLIBCXX_ASSERTIONS");

It looks like there is a minor logic issue here: the first strncmp
will mistakenly match "-D_FORTIFY_SOURCE_FAKE", and the second strcmp
will not match "-D_GLIBCXX_ASSERTIONS=1".

> +   }
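
One possible way to make both checks consistent is sketched below.  This is
not the code from the patch; `defines_macro_p` is a hypothetical helper, and
it deliberately ignores function-like macro definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if ARG (the text following -D or -U) names exactly the
   macro NAME, optionally followed by "=value".  */
static bool
defines_macro_p (const char *arg, const char *name)
{
  assert (arg != NULL && name != NULL);
  size_t len = strlen (name);
  if (strncmp (arg, name, len) != 0)
    return false;
  /* The name must end here: either end of string or the '=' that
     introduces a value.  */
  return arg[len] == '\0' || arg[len] == '=';
}
```

With such a helper "_FORTIFY_SOURCE_FAKE" is rejected while
"_GLIBCXX_ASSERTIONS=1" is accepted.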

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2 3/4] LoongArch: add new configure option --with-strict-align-lib

2023-08-30 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-08-30 at 14:51 +0800, Yujie Yang wrote:
> > > LoongArch processors may not support memory accesses without natural
> > > alignments.  Building libraries with -mstrict-align may help with
> > > toolchain binary compatibility and performance on these implementations
> > > (e.g. Loongson 2K1000LA).
> > > 
> > > No significant performance degradation is observed on current mainstream
> > > LoongArch processors when the option is enabled.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * config.gcc: use -mstrict-align for building libraries
> > > if --with-strict-align-lib is given.
> > 
> > Isn't this equivalent to --with-multilib-default=mstrict-align now?
> > 
> > And I still believe the easiest way for 2K1000LA is adding -march=la264
> > support so the user can simply configure with --with-arch=la264.
> 
> Not exactly -- Options given in --with-multilib-default= will not be applied
> to multilib variants that have build options specified in 
> --with-multilib-list,
> but --with-strict-align-lib is always effective.
> 
> e.g. for the following configuration:
> 
>   --with-multilib-default=mstrict-align
>   --with-multilib-list=lp64d/la464,lp64s
> 
> The library build options would be:
> 
>   base/lp64d variant: -mabi=lp64d -march=la464 (no -mstrict-align appended)
>   base/lp64s variant: -mabi=lp64s -march=abi-default -mstrict-align
> 
> Sure, you can do it with --with-arch=la264. It's just a convenient
> switch that we can use for building generic toolchains.

If you want a generic toolchain, it should default to -mstrict-align as
well.  Otherwise it will still do unexpected things in cases like:

struct foo { char x; int y; } __attribute__ ((packed));

int get (struct foo *foo) { return foo->y; }

So it should be --with-strict-align (it should make the *compiler*
default to -mstrict-align).  But then it seems --with-arch=la264 is just
easier...

Or maybe we should add -march=la64-baseline (or another name?) as the
"bottom line" of a LA64 CPU.  Currently the definition of
-march=loongarch64 includes unaligned access and 64-bit FP support, so
IMO we should have a baseline definition if we need to support something
"below" loongarch64.

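To make the packed-struct concern concrete, here is a small sketch.  It
assumes a compiler that accepts the GNU `packed` attribute; `get_direct`
and `get_memcpy` are illustrative names.  Whether the generated code uses a
single word load or several byte loads depends on the target options (for
example -mstrict-align), which is exactly why the compiler default matters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* With the packed attribute, Y sits at offset 1 and is therefore not
   naturally aligned.  */
struct foo
{
  char x;
  int y;
} __attribute__ ((packed));

/* Direct member access: the compiler decides how to load the
   misaligned field.  */
static int
get_direct (const struct foo *foo)
{
  assert (foo != NULL);
  return foo->y;
}

/* The same read spelled as a byte copy from the field's offset.  */
static int
get_memcpy (const struct foo *foo)
{
  int v;
  assert (foo != NULL);
  memcpy (&v, (const char *) foo + offsetof (struct foo, y), sizeof v);
  return v;
}
```
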
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2 3/4] LoongArch: add new configure option --with-strict-align-lib

2023-08-29 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-08-30 at 09:58 +0800, Yang Yujie wrote:
> LoongArch processors may not support memory accesses without natural
> alignments.  Building libraries with -mstrict-align may help with
> toolchain binary compatibility and performance on these implementations
> (e.g. Loongson 2K1000LA).
> 
> No significant performance degradation is observed on current mainstream
> LoongArch processors when the option is enabled.
> 
> gcc/ChangeLog:
> 
> * config.gcc: use -mstrict-align for building libraries
> if --with-strict-align-lib is given.

Isn't this equivalent to --with-multilib-default=mstrict-align now?

And I still believe the easiest way for 2K1000LA is adding -march=la264
support so the user can simply configure with --with-arch=la264.

> ---
>  gcc/config.gcc | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 4fae672a3b7..ed70fa63268 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -4892,7 +4892,7 @@ case "${target}" in
> ;;
>  
> loongarch*-*)
> -   supported_defaults="abi arch tune fpu simd multilib-default"
> +   supported_defaults="abi arch tune fpu simd multilib-default strict-align-lib"
>  
> # Local variables
> unset \
> @@ -5089,6 +5089,17 @@ case "${target}" in
> ;;
> esac
>  
> +   # Build libraries with -mstrict-align if --with-strict-align-lib is given.
> +   case ${with_strict_align_lib} in
> +   yes) strict_align_opt="/mstrict-align" ;;
> +   ""|no)  ;;
> +   *)
> +   echo "Unknown option: --with-strict-align-lib=${with_strict_align_lib}" 1>&2
> +   exit 1
> +   ;;
> +   esac
> +
> +
> # Handle --with-multilib-default
> if echo "${with_multilib_default}" \
> | grep -E -e '[[:space:]]' -e '//' -e '/$' -e '^/' > /dev/null 2>&1; then
> @@ -5250,6 +5261,9 @@ case "${target}" in
> ;;
> esac
>  
> +   # Use mstrict-align for building libraries if --with-strict-align-lib is given.
> +   loongarch_multilib_list_make="${loongarch_multilib_list_make}${strict_align_opt}"
> +
> # Check for repeated configuration of the same multilib variant.
> if echo "${elem_abi_base}/${elem_abi_ext}" \
>  | grep -E "^(${all_abis%|})$" >/dev/null 2>&1; then

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: RFC: Top level configure: Require a minimum version 6.8 texinfo

2023-08-29 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-08-29 at 19:01 +0300, Eli Zaretskii via Gcc-patches wrote:
> > Date: Tue, 29 Aug 2023 17:45:20 +0200
> > Cc: gcc-patches@gcc.gnu.org, gdb-patc...@sourceware.org,
> >  binut...@sourceware.org
> > From: Jakub Jelinek via Gdb-patches 
> > 
> > On Tue, Aug 29, 2023 at 04:21:44PM +0100, Nick Clifton via Gcc-patches 
> > wrote:
> > >   Currently the top level configure.ac file sets the minimum required
> > >   version of texinfo to be 4.7.  I would like to propose changing this
> > >   to 6.8.
> > >   
> > >   The reason for the change is that the bfd documentation now needs at
> > >   least version 6.8 in order to build[1][2].  Given that 4.7 is now
> > >   almost 20 years old (it was released in April 2004), updating the
> > >   requirement to a newer version does seem reasonable.  On the other
> > >   hand 6.8 is quite new (it was released in March 2021), so a lot of
> > >   systems out there may not have it.
> > > 
> > >   Thoughts ?
> > 
> > I think that is too new.
> 
> It _is_ new.  But I also don't understand why Nick thinks he needs
> Texinfo 6.8.  AFAIR, makeinfo supported @node lines without explicit
> pointers since at least version 4.8.  I have on my disk the manual
> produced for Emacs 22.1, where the Texinfo sources have no pointers,
> e.g.:
> 
>   @node Abbrev Concepts
> 
> and the corresponding Info file says:
> 
>   This is ../info/emacs, produced by makeinfo version 4.8 from emacs.texi.
> 
> So I'm not sure what exactly is the feature that requires Texinfo 6.8.
> What am I missing?

FWIW I tried building Binutils-2.41 with Texinfo 6.7 and it built
successfully.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-08-29 Thread Xi Ruoyao via Gcc-patches
Hi Jeff,

Can you take a look at the patch?  It fixes a bootstrap failure on
LoongArch.  This month, 3 related Bugzilla tickets have been
created (110939, 24, 71).

On Thu, 2023-08-10 at 15:04 +0200, Stefan Schulze Frielinghaus via Gcc-patches wrote:
> In the former fix in commit 41ef5a34161356817807be3a2e51fbdbe575ae85 I
> completely missed the fact that the normal form of a generated constant for a
> mode with fewer bits than in HOST_WIDE_INT is a sign extended version of the
> actual constant.  This even holds true for unsigned constants.
> 
> Fixed by masking out the upper bits for the incoming constant and sign
> extending the resulting unsigned constant.
> 
> Bootstrapped and regtested on x64 and s390x.  Ok for mainline?
> 
> While reading existing optimizations in combine I stumbled across two
> optimizations where either my intuition about the representation of
> unsigned integers via a const_int rtx is wrong, which then in turn would
> probably also mean that this patch is wrong, or that the optimizations
> are missed sometimes.  In other words in the following I would assume
> that the upper bits are masked out:

/* removed the inlined patch to avoid confusion */

> For example, while bootstrapping on x64 the optimization is missed since
> a LTU comparison in QImode is done and the constant equals
> 0xff80.
> 
> Sorry for inlining another patch, but I would really like to make sure
> that my understanding is correct, now, before I come up with another
> patch.  Thus it would be great if someone could shed some light on this.
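
For readers following along, the "normal form" being discussed can be
modeled in a few lines of C.  This is only a sketch assuming a 64-bit
HOST_WIDE_INT; `mode_mask` and `trunc_for_mode` are simplified models
written for this example, loosely mirroring what GET_MODE_MASK and
trunc_int_for_mode do, and are not the GCC implementations.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a mode mask: the low PRECISION bits set (1..64).  */
static uint64_t
mode_mask (unsigned precision)
{
  assert (precision >= 1 && precision <= 64);
  return precision == 64 ? UINT64_MAX : (UINT64_C (1) << precision) - 1;
}

/* Model of the canonical constant for a mode: keep the low PRECISION
   bits of C and sign-extend them to 64 bits.  */
static int64_t
trunc_for_mode (int64_t c, unsigned precision)
{
  uint64_t mask = mode_mask (precision);
  uint64_t sign = UINT64_C (1) << (precision - 1);
  uint64_t low = (uint64_t) c & mask;
  /* XOR-then-subtract sign-extends the PRECISION-bit value.  */
  return (int64_t) ((low ^ sign) - sign);
}
```

So an "unsigned" 0x80 in an 8-bit mode is stored as -128, and masking with
the mode mask recovers the unsigned value 0x80 again.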
> 
> gcc/ChangeLog:
> 
> * combine.cc (simplify_compare_const): Properly handle unsigned
> constants while narrowing comparison of memory and constants.
> ---
>  gcc/combine.cc | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index e46d202d0a7..468b7fde911 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -12003,14 +12003,15 @@ simplify_compare_const (enum rtx_code code, 
> machine_mode mode,
>    && !MEM_VOLATILE_P (op0)
>    /* The optimization makes only sense for constants which are big enough
>  so that we have a chance to chop off something at all.  */
> -  && (unsigned HOST_WIDE_INT) const_op > 0xff
> -  /* Bail out, if the constant does not fit into INT_MODE.  */
> -  && (unsigned HOST_WIDE_INT) const_op
> -    < ((HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1) << 1) - 
> 1)
> +  && ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode)) > 
> 0xff
>    /* Ensure that we do not overflow during normalization.  */
> -  && (code != GTU || (unsigned HOST_WIDE_INT) const_op < 
> HOST_WIDE_INT_M1U))
> +  && (code != GTU
> + || ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode))
> +    < HOST_WIDE_INT_M1U)
> +  && trunc_int_for_mode (const_op, int_mode) == const_op)
>  {
> -  unsigned HOST_WIDE_INT n = (unsigned HOST_WIDE_INT) const_op;
> +  unsigned HOST_WIDE_INT n
> +   = (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode);
>    enum rtx_code adjusted_code;
>  
>    /* Normalize code to either LEU or GEU.  */
> @@ -12051,15 +12052,15 @@ simplify_compare_const (enum rtx_code code, 
> machine_mode mode,
> HOST_WIDE_INT_PRINT_HEX ") to (MEM %s "
> HOST_WIDE_INT_PRINT_HEX ").\n", GET_MODE_NAME (int_mode),
> GET_MODE_NAME (narrow_mode_iter), GET_RTX_NAME (code),
> -   (unsigned HOST_WIDE_INT)const_op, GET_RTX_NAME 
> (adjusted_code),
> -   n);
> +   (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode),
> +   GET_RTX_NAME (adjusted_code), n);
>     }
>   poly_int64 offset = (BYTES_BIG_ENDIAN
>    ? 0
>    : (GET_MODE_SIZE (int_mode)
>   - GET_MODE_SIZE (narrow_mode_iter)));
>   *pop0 = adjust_address_nv (op0, narrow_mode_iter, offset);
> - *pop1 = GEN_INT (n);
> + *pop1 = gen_int_mode (n, narrow_mode_iter);
>   return adjusted_code;
> }
>  }

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2] LoongArch: Enable '-free' starting at -O2.

2023-08-28 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-28 at 11:46 +0800, Lulu Cheng wrote:
> v1 -> v2:
> 1. Modify Changelog information format.
> 
> gcc/ChangeLog:
> 
> * common/config/loongarch/loongarch-common.cc:
> Enable '-free' on O2 and above.
> * doc/invoke.texi: Modify the description information
> of the '-free' compilation option and add the LoongArch
> description.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/loongarch/sign-extend.c: New test.

LGTM.

> ---
>  .../config/loongarch/loongarch-common.cc  |  1 +
>  gcc/doc/invoke.texi   |  4 +--
>  .../gcc.target/loongarch/sign-extend.c    | 25 +++
>  3 files changed, 28 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/sign-extend.c
> 
> diff --git a/gcc/common/config/loongarch/loongarch-common.cc 
> b/gcc/common/config/loongarch/loongarch-common.cc
> index fce32fa3f8d..c5ed37d27a6 100644
> --- a/gcc/common/config/loongarch/loongarch-common.cc
> +++ b/gcc/common/config/loongarch/loongarch-common.cc
> @@ -35,6 +35,7 @@ static const struct default_options 
> loongarch_option_optimization_table[] =
>  {
>    { OPT_LEVELS_ALL, OPT_fasynchronous_unwind_tables, NULL, 1 },
>    { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
> +  { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
>    { OPT_LEVELS_NONE, 0, NULL, 0 }
>  };
>  
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index a32dabf0405..16aa92b5e86 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -12639,8 +12639,8 @@ Attempt to remove redundant extension instructions.  
> This is especially
>  helpful for the x86-64 architecture, which implicitly zero-extends in 64-bit
>  registers after writing to their lower 32-bit half.
>  
> -Enabled for Alpha, AArch64, PowerPC, RISC-V, SPARC, h83000 and x86 at levels
> -@option{-O2}, @option{-O3}, @option{-Os}.
> +Enabled for Alpha, AArch64, LoongArch, PowerPC, RISC-V, SPARC, h83000 and 
> x86 at
> +levels @option{-O2}, @option{-O3}, @option{-Os}.
>  
>  @opindex fno-lifetime-dse
>  @opindex flifetime-dse
> diff --git a/gcc/testsuite/gcc.target/loongarch/sign-extend.c 
> b/gcc/testsuite/gcc.target/loongarch/sign-extend.c
> new file mode 100644
> index 000..3f339d06bbd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/sign-extend.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mabi=lp64d -O2" } */
> +/* { dg-final { scan-assembler-times "slli.w" 1 } } */
> +
> +extern int PL_savestack_ix;
> +extern int PL_regsize;
> +extern int PL_savestack_max;
> +void Perl_savestack_grow_cnt (int need);
> +extern void Perl_croak (char *);
> +
> +int
> +S_regcppush(int parenfloor)
> +{
> +  int retval = PL_savestack_ix;
> +  int paren_elems_to_push = (PL_regsize - parenfloor) * 4;
> +  int p;
> +
> +  if (paren_elems_to_push < 0)
> +    Perl_croak ("panic: paren_elems_to_push < 0");
> +
> +  if (PL_savestack_ix + (paren_elems_to_push + 6) > PL_savestack_max)
> +    Perl_savestack_grow_cnt (paren_elems_to_push + 6);
> +
> +  return retval;
> +}

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v5 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-23 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-24 at 11:40 +0800, Xi Ruoyao via Gcc-patches wrote:
> On Thu, 2023-08-24 at 11:13 +0800, Chenghui Pan wrote:
> > - Add dg-skip-if for loongarch*-*-* in vshuf test in g++.dg/torture, because
> >   vshuf/xvshuf insn's result is undefined when bit 6 or 7 of a vector element is set,
> >   and insns with this condition are generated in these testcases.
> 
> I'm almost sure this is wrong.  You need to fix the code generation so
> __builtin_shuffle will always generate something defined on LoongArch,
> instead of covering up the issue.

https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html says clearly:

   The elements of the input vectors are numbered in memory ordering of
   vec0 beginning at 0 and vec1 beginning at N. The elements of mask are
   considered modulo N in the single-operand case and modulo 2*N in the
   two-operand case.
   
So no undefined result is allowed here.  You must implement it as it's
documented.
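
As a reference for the quoted semantics, here is a scalar model of the
two-operand case.  `shuffle2_ref` is a hypothetical helper written for this
mail, not a GCC function; it only restates the documented "modulo 2*N"
rule on plain arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference for two-operand __builtin_shuffle on N-element
   vectors: each mask element is reduced modulo 2*N, indices below N
   select from VEC0 and the rest select from VEC1.  */
static void
shuffle2_ref (const int *vec0, const int *vec1, const unsigned *mask,
              int *out, size_t n)
{
  assert (n > 0);
  for (size_t i = 0; i < n; i++)
    {
      size_t idx = mask[i] % (2 * n);
      out[i] = idx < n ? vec0[idx] : vec1[idx - n];
    }
}
```

For a power-of-two N the modulo is just an AND with 2*N - 1, so masking
the selector before the shuffle instruction is enough to stay within the
defined range.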

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v5 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-23 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-24 at 11:13 +0800, Chenghui Pan wrote:
> - Add dg-skip-if for loongarch*-*-* in vshuf test in g++.dg/torture, because
>   vshuf/xvshuf insn's result is undefined when bit 6 or 7 of a vector element is set,
>   and insns with this condition are generated in these testcases.

I'm almost sure this is wrong.  You need to fix the code generation so
__builtin_shuffle will always generate something defined on LoongArch,
instead of covering up the issue.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


PING^2: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-08-23 Thread Xi Ruoyao via Gcc-patches
Ping again.

On Fri, 2023-08-18 at 13:04 +0200, Stefan Schulze Frielinghaus via Gcc-patches wrote:
> Ping.  Since this fixes bootstrap problem PR110939 for LoongArch I'm
> pinging this one earlier.
> 
> On Thu, Aug 10, 2023 at 03:04:03PM +0200, Stefan Schulze Frielinghaus wrote:
> > In the former fix in commit 41ef5a34161356817807be3a2e51fbdbe575ae85 I
> > completely missed the fact that the normal form of a generated constant for 
> > a
> > mode with fewer bits than in HOST_WIDE_INT is a sign extended version of the
> > actual constant.  This even holds true for unsigned constants.
> > 
> > Fixed by masking out the upper bits for the incoming constant and sign
> > extending the resulting unsigned constant.
> > 
> > Bootstrapped and regtested on x64 and s390x.  Ok for mainline?
> > 
> > While reading existing optimizations in combine I stumbled across two
> > optimizations where either my intuition about the representation of
> > unsigned integers via a const_int rtx is wrong, which then in turn would
> > probably also mean that this patch is wrong, or that the optimizations
> > are missed sometimes.  In other words in the following I would assume
> > that the upper bits are masked out:
> > 
> > diff --git a/gcc/combine.cc b/gcc/combine.cc
> > index 468b7fde911..80c4ff0fbaf 100644
> > --- a/gcc/combine.cc
> > +++ b/gcc/combine.cc
> > @@ -11923,7 +11923,7 @@ simplify_compare_const (enum rtx_code code, 
> > machine_mode mode,
> >    /* (unsigned) < 0x8000 is equivalent to >= 0.  */
> >    else if (is_a <scalar_int_mode> (mode, &int_mode)
> >    && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
> > -  && ((unsigned HOST_WIDE_INT) const_op
> > +  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > (int_mode))
> >    == HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 
> > 1)))
> >     {
> >   const_op = 0;
> > @@ -11962,7 +11962,7 @@ simplify_compare_const (enum rtx_code code, 
> > machine_mode mode,
> >    /* (unsigned) >= 0x8000 is equivalent to < 0.  */
> >    else if (is_a <scalar_int_mode> (mode, &int_mode)
> >    && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
> > -  && ((unsigned HOST_WIDE_INT) const_op
> > +  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > (int_mode))
> >    == HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 
> > 1)))
> >     {
> >   const_op = 0;
> > 
> > For example, while bootstrapping on x64 the optimization is missed since
> > a LTU comparison in QImode is done and the constant equals
> > 0xff80.
> > 
> > Sorry for inlining another patch, but I would really like to make sure
> > that my understanding is correct, now, before I come up with another
> > patch.  Thus it would be great if someone could shed some light on this.
> > 
> > gcc/ChangeLog:
> > 
> > * combine.cc (simplify_compare_const): Properly handle unsigned
> > constants while narrowing comparison of memory and constants.
> > ---
> >  gcc/combine.cc | 19 ++-
> >  1 file changed, 10 insertions(+), 9 deletions(-)
> > 
> > diff --git a/gcc/combine.cc b/gcc/combine.cc
> > index e46d202d0a7..468b7fde911 100644
> > --- a/gcc/combine.cc
> > +++ b/gcc/combine.cc
> > @@ -12003,14 +12003,15 @@ simplify_compare_const (enum rtx_code code, 
> > machine_mode mode,
> >    && !MEM_VOLATILE_P (op0)
> >    /* The optimization makes only sense for constants which are big 
> > enough
> >  so that we have a chance to chop off something at all.  */
> > -  && (unsigned HOST_WIDE_INT) const_op > 0xff
> > -  /* Bail out, if the constant does not fit into INT_MODE.  */
> > -  && (unsigned HOST_WIDE_INT) const_op
> > -    < ((HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1) << 1) 
> > - 1)
> > +  && ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode)) > 
> > 0xff
> >    /* Ensure that we do not overflow during normalization.  */
> > -  && (code != GTU || (unsigned HOST_WIDE_INT) const_op < 
> > HOST_WIDE_INT_M1U))
> > +  && (code != GTU
> > + || ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode))
> > +    < HOST_WIDE_INT_M1U)
> > +  && trunc_int_for_mode (const_op, int_mode) == const_op)
> >  {
> > -  unsigned HOST_WIDE_INT n = (unsigned HOST_WIDE_INT) const_op;
> > +  unsigned HOST_WIDE_INT n
> > +   = (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode);
> >    enum rtx_code adjusted_code;
> >  
> >    /* Normalize code to either LEU or GEU.  */
> > @@ -12051,15 +12052,15 @@ simplify_compare_const (enum rtx_code code, 
> > machine_mode mode,
> > HOST_WIDE_INT_PRINT_HEX ") to (MEM %s "
> > HOST_WIDE_INT_PRINT_HEX ").\n", GET_MODE_NAME (int_mode),
> > GET_MODE_NAME (narrow_mode_iter), GET_RTX_NAME (code),
> > -   (unsigned HOST_WIDE_INT)const_op, 
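
The representation issue described in the quoted patch can be reproduced
outside GCC.  The following is only a sketch, not GCC code: hwi, uhwi,
mode_mask and trunc_for_precision are hypothetical stand-ins for
HOST_WIDE_INT, unsigned HOST_WIDE_INT, GET_MODE_MASK and
trunc_int_for_mode, and a 64-bit HOST_WIDE_INT is assumed.  It shows why
a narrow-mode constant with only the top bit set does not compare equal
to the sign-bit pattern until the upper bits are masked out.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for GCC internals; these are NOT the real definitions.  */
typedef int64_t hwi;    /* plays the role of HOST_WIDE_INT */
typedef uint64_t uhwi;  /* plays the role of unsigned HOST_WIDE_INT */

/* Mask with the low PRECISION bits set (in the spirit of GET_MODE_MASK).  */
static uhwi mode_mask (unsigned precision)
{
  assert (precision >= 1 && precision <= 64);
  return precision == 64 ? ~(uhwi) 0 : ((uhwi) 1 << precision) - 1;
}

/* Sign-extend the low PRECISION bits of C, which is the normal form the
   patch description talks about (in the spirit of trunc_int_for_mode).  */
static hwi trunc_for_precision (hwi c, unsigned precision)
{
  uhwi sign = (uhwi) 1 << (precision - 1);
  uhwi v = (uhwi) c & mode_mask (precision);
  return (hwi) ((v ^ sign) - sign);
}

/* The comparison as written before the proposed change.  */
static int matches_sign_bit_unmasked (hwi const_op, unsigned precision)
{
  return (uhwi) const_op == ((uhwi) 1 << (precision - 1));
}

/* The comparison with the upper bits masked out first.  */
static int matches_sign_bit_masked (hwi const_op, unsigned precision)
{
  return ((uhwi) const_op & mode_mask (precision))
	 == ((uhwi) 1 << (precision - 1));
}
```

With an 8-bit precision, trunc_for_precision (0x80, 8) yields -128, so the
unmasked check returns 0 while the masked check returns 1.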

Re: [PATCH v1] libffi: Backport of LoongArch support for libffi.

2023-08-22 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-08-22 at 20:42 +0800, Lulu Cheng wrote:
> This is a backport of ,
> and contains modifications to commit 5a4774cd4d, as well as the LoongArch
> schema portion of commit ee22ecbd11. This is needed for libgo.
> 
> 
> libffi/ChangeLog:

Mention PR libffi/108682 in the ChangeLog here (if it's not pushed yet).

> * configure.host: Add LoongArch support.
> * Makefile.am: Likewise.
> * Makefile.in: Regenerate.
> * src/loongarch64/ffi.c: New file.
> * src/loongarch64/ffitarget.h: New file.
> * src/loongarch64/sysv.S: New file.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-20 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-17 at 15:20 +0800, Chenghui Pan wrote:
> Seems ARMv8-A only guarantees to preserve low 64-bit value of
> NEON/floating-point register value. I'm not sure that I modify the
> testcase in the right way and maybe we need more investigations. Any
> ideas or suggestion?

Sorry, the following sentence in GCC manual section 6.47.5.2 suggests my
test case is not valid:

"As with global register variables, it is recommended that you choose a
register that is normally saved and restored by function calls on your
machine, so that calls to library routines will not clobber it."

So when I use asm(name), the compiler has no obligation to guarantee
that it will ever work like a normal variable after a function call.

But I still need to verify that the compiler correctly understands that
only the low 64 bits of the vector register are saved.  I'll try to make
another test case...

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v3] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-18 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-08-18 at 15:05 +0800, Xi Ruoyao via Gcc-patches wrote:
> On Fri, 2023-08-18 at 14:58 +0800, Xi Ruoyao via Gcc-patches wrote:
> > On Fri, 2023-08-18 at 14:39 +0800, chenxiaolong wrote:
> > > On Thu, 2023-08-17 at 15:08 +, Joseph Myers wrote:
> > > > On Thu, 17 Aug 2023, Xi Ruoyao via Gcc-patches wrote:
> > > > 
> > > > > So I guess we just need
> > > > > 
> > > > > builtin_define ("__builtin_fabsq=__builtin_fabsf128");
> > > > > builtin_define ("__builtin_nanq=__builtin_nanf128");
> > > > > 
> > > > > etc. to map the "q" builtins to "f128" builtins if we really need
> > > > > the
> > > > > "q" builtins.
> > > > > 
> > > > > Joseph: the problem here is many customers of LoongArch CPUs wish
> > > > > to
> > > > > compile their old code with minimal change.  Is it acceptable to
> > > > > add
> > > > > these builtin_define's like rs6000-c.cc?  Note "a new architecture"
> > > > > does
> > > > > not mean we'll only compile post-C2x-era programs onto it.
> > > > 
> > > > The powerpc support for __float128 started in GCC 6, predating the
> > > > support 
> > > > for _FloatN type names, built-in functions etc. in GCC 7 - that's
> > > > why 
> > > > there's such backwards compatibility support there.  That name only
> > > > exists 
> > > > on a few architectures.
> > > > 
> > > > If people really want to compile code using the old __float128 names
> > > > for 
> > > > LoongArch I suppose you could have such #defines, but it would be
> > > > better 
> > > > for people to make their code use the standard names (as supported
> > > > from 
> > > > GCC 7 onwards, though only from GCC 13 in C++) and then put
> > > > backwards 
> > > > compatibility in their code for using the __float128 names if they
> > > > want to 
> > > > support the type with older GCC (GCC 6 or before for C; GCC 12 or
> > > > before 
> > > > for C++) on x86_64 / i386 / powerpc / ia64.  Such backwards
> > > > compatibility 
> > > > in user code is more likely to be relevant for C++ than for C, given
> > > > how 
> > > > the C++ support was added to GCC much more recently.  (Note: I
> > > > haven't 
> > > > checked when other compilers added support for the _Float128 name or
> > > > associated built-in functions, whether for C or for C++, which might
> > > > also 
> > > > affect when user code wants such compatibility.)
> > > > 
> > > Thank you for your valuable comments. On the LoongArch architecture,
> > > the "__float128" type is associated with float128_type_node and the "q"
> > > suffix function is mapped to the "f128" function. This allows
> > > compatibility with both "__float128" and "_Float128" types in the GCC
> > > compiler. The new code is modified as follows:
> > >   Add the following to the loongarch-builtins.c file:
> > > +lang_hooks.types.register_builtin_type (float128_type_node,
> > > "__float128");
> > >   Add the following to the loongarch-c.c file:
> > > +builtin_define ("__builtin_fabsq=__builtin_fabsf128");
> > > +builtin_define ("__builtin_copysignq=__builtin_copysignf128");
> > > +builtin_define ("__builtin_nanq=__builtin_nanf128");
> > > +builtin_define ("__builtin_nansq=__builtin_nansf128");
> > > +builtin_define ("__builtin_infq=__builtin_inff128");
> > > +builtin_define ("__builtin_huge_valq=__builtin_huge_valf128");
> > > 
> > >  The regression tests of the six functions were added without problems.
> > > However, the implementation of the __builtin_nansq() function does not
> > > get the result we want. The questions are as follows:
> > >  x86_64:
> > >     _Float128 ret=__builtin_nansf128("NAN");
> > > 
> > >     compiled to (with gcc test.c -O2 ):
> > > .cfi_offset 1, -8
> > > bl  %plt(__builtin_nansf128)
> > >     ..
> > >  LoongArch:
> > >     _Float128 ret=__builtin_nansf128("NAN");
> > >   compiled to (with gcc test.c -O2 ):
> > > .cfi_offset 1

Re: [PATCH v3] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-18 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-08-18 at 14:58 +0800, Xi Ruoyao via Gcc-patches wrote:
> On Fri, 2023-08-18 at 14:39 +0800, chenxiaolong wrote:
> > On Thu, 2023-08-17 at 15:08 +, Joseph Myers wrote:
> > > On Thu, 17 Aug 2023, Xi Ruoyao via Gcc-patches wrote:
> > > 
> > > > So I guess we just need
> > > > 
> > > > builtin_define ("__builtin_fabsq=__builtin_fabsf128");
> > > > builtin_define ("__builtin_nanq=__builtin_nanf128");
> > > > 
> > > > etc. to map the "q" builtins to "f128" builtins if we really need
> > > > the
> > > > "q" builtins.
> > > > 
> > > > Joseph: the problem here is many customers of LoongArch CPUs wish
> > > > to
> > > > compile their old code with minimal change.  Is it acceptable to
> > > > add
> > > > these builtin_define's like rs6000-c.cc?  Note "a new architecture"
> > > > does
> > > > not mean we'll only compile post-C2x-era programs onto it.
> > > 
> > > The powerpc support for __float128 started in GCC 6, predating the
> > > support 
> > > for _FloatN type names, built-in functions etc. in GCC 7 - that's
> > > why 
> > > there's such backwards compatibility support there.  That name only
> > > exists 
> > > on a few architectures.
> > > 
> > > If people really want to compile code using the old __float128 names
> > > for 
> > > LoongArch I suppose you could have such #defines, but it would be
> > > better 
> > > for people to make their code use the standard names (as supported
> > > from 
> > > GCC 7 onwards, though only from GCC 13 in C++) and then put
> > > backwards 
> > > compatibility in their code for using the __float128 names if they
> > > want to 
> > > support the type with older GCC (GCC 6 or before for C; GCC 12 or
> > > before 
> > > for C++) on x86_64 / i386 / powerpc / ia64.  Such backwards
> > > compatibility 
> > > in user code is more likely to be relevant for C++ than for C, given
> > > how 
> > > the C++ support was added to GCC much more recently.  (Note: I
> > > haven't 
> > > checked when other compilers added support for the _Float128 name or
> > > associated built-in functions, whether for C or for C++, which might
> > > also 
> > > affect when user code wants such compatibility.)
> > > 
> > Thank you for your valuable comments. On the LoongArch architecture,
> > the "__float128" type is associated with float128_type_node and the "q"
> > suffix function is mapped to the "f128" function. This allows
> > compatibility with both "__float128" and "_Float128" types in the GCC
> > compiler. The new code is modified as follows:
> >   Add the following to the loongarch-builtins.c file:
> > +lang_hooks.types.register_builtin_type (float128_type_node,
> > "__float128");
> >   Add the following to the loongarch-c.c file:
> > +builtin_define ("__builtin_fabsq=__builtin_fabsf128");
> > +builtin_define ("__builtin_copysignq=__builtin_copysignf128");
> > +builtin_define ("__builtin_nanq=__builtin_nanf128");
> > +builtin_define ("__builtin_nansq=__builtin_nansf128");
> > +builtin_define ("__builtin_infq=__builtin_inff128");
> > +builtin_define ("__builtin_huge_valq=__builtin_huge_valf128");
> > 
> >  The regression tests of the six functions were added without problems.
> > However, the implementation of the __builtin_nansq() function does not
> > get the result we want. The questions are as follows:
> >  x86_64:
> >     _Float128 ret=__builtin_nansf128("NAN");
> > 
> >     compiled to (with gcc test.c -O2 ):
> > .cfi_offset 1, -8
> > bl  %plt(__builtin_nansf128)
> >     ..
> >  LoongArch:
> >     _Float128 ret=__builtin_nansf128("NAN");
> >   compiled to (with gcc test.c -O2 ):
> > .cfi_offset 1, -8
> > bl  %plt(__builtin_nansf128)
> 
> It seems wrong.  It should be "bl %plt(nansf128)", without the
> __builtin_ prefix, so that the implementation in libm (from Glibc) is
> used.  AFAIK __builtin_nan and __builtin_nans are rarely called
> with a non-empty tagp so it's not worth inlining the implementation
> for non-empty tagp here.
> 
> The same issue happens on x86_64:
> 
> call    __builtin_nansf128@PLT
> 
> __builtin_nanf128 compiles correctly:
> 
> call    nanf128@PLT
> 
> I'll see if there is a ticket in https://gcc.gnu.org/bugzilla.  If not
> I'll create one.

Alright, Glibc does not have a "nansf128" function yet.  Actually there
is not even a "nans" function for the plain double type.  So even a plain
__builtin_nans("114") won't work either.

If we'll fix this, we need to do it in a generic, target-independent way
(i. e. fix it all at once for all targets).

So for now, and for LoongArch specific code, the proper thing to do is
to alias float128_type_node as __float128 and add the six
builtin_define's.

Please commit them to trunk if the regression tests pass.  You also need
to add LoongArch as a target supporting __float128 in extend.texi.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v3] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-18 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-08-18 at 14:39 +0800, chenxiaolong wrote:
> On Thu, 2023-08-17 at 15:08 +, Joseph Myers wrote:
> > On Thu, 17 Aug 2023, Xi Ruoyao via Gcc-patches wrote:
> > 
> > > So I guess we just need
> > > 
> > > builtin_define ("__builtin_fabsq=__builtin_fabsf128");
> > > builtin_define ("__builtin_nanq=__builtin_nanf128");
> > > 
> > > etc. to map the "q" builtins to "f128" builtins if we really need
> > > the
> > > "q" builtins.
> > > 
> > > Joseph: the problem here is many customers of LoongArch CPUs wish
> > > to
> > > compile their old code with minimal change.  Is it acceptable to
> > > add
> > > these builtin_define's like rs6000-c.cc?  Note "a new architecture"
> > > does
> > > not mean we'll only compile post-C2x-era programs onto it.
> > 
> > The powerpc support for __float128 started in GCC 6, predating the
> > support 
> > for _FloatN type names, built-in functions etc. in GCC 7 - that's
> > why 
> > there's such backwards compatibility support there.  That name only
> > exists 
> > on a few architectures.
> > 
> > If people really want to compile code using the old __float128 names
> > for 
> > LoongArch I suppose you could have such #defines, but it would be
> > better 
> > for people to make their code use the standard names (as supported
> > from 
> > GCC 7 onwards, though only from GCC 13 in C++) and then put
> > backwards 
> > compatibility in their code for using the __float128 names if they
> > want to 
> > support the type with older GCC (GCC 6 or before for C; GCC 12 or
> > before 
> > for C++) on x86_64 / i386 / powerpc / ia64.  Such backwards
> > compatibility 
> > in user code is more likely to be relevant for C++ than for C, given
> > how 
> > the C++ support was added to GCC much more recently.  (Note: I
> > haven't 
> > checked when other compilers added support for the _Float128 name or
> > associated built-in functions, whether for C or for C++, which might
> > also 
> > affect when user code wants such compatibility.)
> > 
> Thank you for your valuable comments. On the LoongArch architecture,
> the "__float128" type is associated with float128_type_node and the "q"
> suffix function is mapped to the "f128" function. This allows
> compatibility with both "__float128" and "_Float128" types in the GCC
> compiler. The new code is modified as follows:
>   Add the following to the loongarch-builtins.c file:
> +lang_hooks.types.register_builtin_type (float128_type_node,
> "__float128");
>   Add the following to the loongarch-c.c file:
> +builtin_define ("__builtin_fabsq=__builtin_fabsf128");
> +builtin_define ("__builtin_copysignq=__builtin_copysignf128");
> +builtin_define ("__builtin_nanq=__builtin_nanf128");
> +builtin_define ("__builtin_nansq=__builtin_nansf128");
> +builtin_define ("__builtin_infq=__builtin_inff128");
> +builtin_define ("__builtin_huge_valq=__builtin_huge_valf128");
> 
>  The regression tests of the six functions were added without problems.
> However, the implementation of the __builtin_nansq() function does not
> get the result we want. The questions are as follows:
>  x86_64:
>     _Float128 ret=__builtin_nansf128("NAN");
> 
>     compiled to (with gcc test.c -O2 ):
> .cfi_offset 1, -8
> bl  %plt(__builtin_nansf128)
>     ..
>  LoongArch:
>     _Float128 ret=__builtin_nansf128("NAN");
>   compiled to (with gcc test.c -O2 ):
> .cfi_offset 1, -8
> bl  %plt(__builtin_nansf128)

It seems wrong.  It should be "bl %plt(nansf128)", without the
__builtin_ prefix, so that the implementation in libm (from Glibc) is
used.  AFAIK __builtin_nan and __builtin_nans are rarely called
with a non-empty tagp so it's not worth inlining the implementation
for non-empty tagp here.

The same issue happens on x86_64:

call    __builtin_nansf128@PLT

__builtin_nanf128 compiles correctly:

call    nanf128@PLT

I'll see if there is a ticket in https://gcc.gnu.org/bugzilla.  If not
I'll create one.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v3] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-16 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-08-15 at 20:03 +, Joseph Myers wrote:
> On Tue, 15 Aug 2023, chenxiaolong wrote:
> 
> > In the implementation process, the "q" suffix function is
> >     Re-register and associate the "__float128" type with the
> >     "long double" type so that the compiler can handle the
> >     corresponding function correctly. The functions implemented
> >     include __builtin_{huge_valq infq, fabsq, copysignq, nanq,nansq}.
> >     On the LoongArch architecture, __builtin_{fabsq,copysignq} can
> >     be implemented with the instruction "bstrins.d", so that its
> >     optimization effect reaches the optimal value.
> 
> Why?  If long double has binary128 format, you shouldn't need any of these 
> functions at all; if it doesn't, just the C23 _Float128 type name and f128 
> constant suffix, and associated built-in functions defined in 
> builtins.def, should suffice (and since we now have _FloatN support for 
> C++, C++ no longer provides a reason for adding __float128 either).  
> __float128 is a legacy type name and feature and shouldn't be needed on 
> any new architectures, which can just use the standard type name from the 
> start.

For _Float128 GCC already does the correct thing:

_Float128 g(_Float128 x) { return __builtin_fabsf128(x); }

compiled to (with -O2):

g:
.LFB3 = .
.cfi_startproc
bstrpick.d  $r5,$r5,62,0
jr  $r1
.cfi_endproc
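
What the bstrpick.d in the listing above does can be written down in
portable C.  This is only an illustration, not what GCC emits or how
libgcc implements anything: struct f128_bits and the f128_* helpers are
hypothetical names, and it assumes the IEEE binary128 layout with the
sign in bit 127, i.e. bit 63 of the high 64-bit half (the half that
presumably sits in $r5 here).  Keeping bits 62..0 of that half is fabs;
copysign only has to move the same single bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the raw bits of an IEEE binary128 value:
   LO holds bits 0..63, HI holds bits 64..127 (bit 63 of HI is the sign).  */
struct f128_bits
{
  uint64_t lo;
  uint64_t hi;
};

#define F128_SIGN_BIT (UINT64_C (1) << 63)

/* fabs: clear the sign bit, i.e. keep bits 62..0 of the high half.  */
static struct f128_bits f128_fabs (struct f128_bits x)
{
  x.hi &= ~F128_SIGN_BIT;
  return x;
}

/* copysign: magnitude of X combined with the sign bit of Y.  */
static struct f128_bits f128_copysign (struct f128_bits x, struct f128_bits y)
{
  x.hi = (x.hi & ~F128_SIGN_BIT) | (y.hi & F128_SIGN_BIT);
  return x;
}

/* Self-check on +1.0 and -1.0: biased exponent 0x3fff, zero fraction.  */
static void f128_selfcheck (void)
{
  struct f128_bits plus_one = { 0, UINT64_C (0x3fff000000000000) };
  struct f128_bits minus_one = { 0, UINT64_C (0xbfff000000000000) };

  assert (f128_fabs (minus_one).hi == plus_one.hi);
  assert (f128_copysign (plus_one, minus_one).hi == minus_one.hi);
  assert (f128_copysign (minus_one, plus_one).hi == plus_one.hi);
}
```

Neither helper touches the low half, which matches the single-instruction
output shown above.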

So I guess we just need

builtin_define ("__builtin_fabsq=__builtin_fabsf128");
builtin_define ("__builtin_nanq=__builtin_nanf128");

etc. to map the "q" builtins to "f128" builtins if we really need the
"q" builtins.

Joseph: the problem here is many customers of LoongArch CPUs wish to
compile their old code with minimal change.  Is it acceptable to add
these builtin_define's like rs6000-c.cc?  Note "a new architecture" does
not mean we'll only compile post-C2x-era programs onto it.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-15 Thread Xi Ruoyao via Gcc-patches
The implementation fails to handle this test case properly:

typedef double __attribute__((vector_size(32))) v4df;

void use1(double);

__attribute__((noipa)) double use(double)
{
register double x asm("f24") = 114.514;
__asm__("" : "+f" (x));
return x;
}

void test(void)
{
register v4df x asm("f24") = {1, 2, 3, 4};
__asm__("" : "+f" (x));
use(x[1]);
use1(x[3]);
}

Here use() attempts to save and restore f24, but it uses fst.d/fld.d,
clobbering the high 192 bits of xr24.  Now test() passes a wrong value
of x[3] to use1().

Note that saving and restoring f24 with xvst/xvld in use() won't really
fix the issue because in real life use() can be in another translation
unit (or even a shared library) compiled with -mno-lsx.  So it seems we
need to tell the compiler "a function call may clobber the high bits of
a vector register even if the corresponding floating-point register is
saved".  I'm not sure how to accomplish this...
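
To make the failure mode concrete without LoongArch hardware, here is a
toy model in plain C.  It is only a sketch under loud assumptions:
struct vreg256 and both functions are hypothetical, the register is
modeled as four double lanes with lane 0 aliasing the FP register, and a
scalar write is modeled as zeroing the upper lanes (standing in for "the
high bits are clobbered").  It does not describe the exact hardware
behaviour.

```c
#include <assert.h>
#include <string.h>

/* Toy model, not real LoongArch semantics: one 256-bit vector register
   whose low 64 bits alias the floating-point register of the same number.  */
struct vreg256
{
  double lane[4];  /* lane[0] is the part visible as the FP register */
};

/* Models a callee built without vector support: it saves and restores
   only lane 0, as an fst.d/fld.d pair would.  */
static void callee_scalar_only (struct vreg256 *r)
{
  double saved_low = r->lane[0];  /* spill the low 64 bits only */

  memset (r, 0, sizeof *r);       /* scalar write: upper lanes are lost */
  r->lane[0] = 114.514;           /* the callee's own use of the register */

  r->lane[0] = saved_low;         /* reload the low 64 bits only */
}

/* Returns 1 if lane IDX of {1, 2, 3, 4} still holds EXPECTED after the
   call, 0 otherwise.  */
static int lane_survives_call (int idx, double expected)
{
  struct vreg256 r = { { 1, 2, 3, 4 } };

  assert (idx >= 0 && idx < 4);
  callee_scalar_only (&r);
  return r.lane[idx] == expected;
}
```

In this model lane 0 survives the call but lanes 1 to 3 do not, which is
the situation test() runs into when it reads x[3] after calling use().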

On Tue, 2023-08-15 at 09:05 +0800, Chenghui Pan wrote:
> This is an update of:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626194.html
> 
> This version of the patch set only introduces some small simplifications
> of the implementation. Because I overlooked the mailing list's message
> size limit, the huge testsuite patches of v2 and v3 were not shown on the
> list. So the testsuite patches are split out of this patch set again and
> will be submitted independently in the future.
> 
> Binutils-gdb introduced LSX/LASX support since 2.41 release:
> https://lists.gnu.org/archive/html/info-gnu/2023-07/msg9.html
> 
> Brief history of patch set version:
> v1 -> v2:
> - Reduce usage of "unspec" in RTL template.
> - Append Support of ADDR_REG_REG in LSX and LASX.
> - Constraint docs are appended in gcc/doc/md.texi and comment block.
> - Codes related to vecarg are removed.
> - Testsuite of LSX and LASX is added in v2. (Because of the size
> limitation of
>   mail list, these patches are not shown)
> - Adjust the loongarch_expand_vector_init() function to reduce
> instruction 
>   output amount.
> - Some minor implementation changes of RTL templates.
> 
> v2 -> v3:
> - Revert vabsd/xvabsd RTL templates to unspec impl.
> - Resolve warning in gcc/config/loongarch/loongarch.cc when
> bootstrapping 
>   with BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx".
> - Remove redundant definitions in lasxintrin.h.
> - Refine commit info.
> 
> Lulu Cheng (6):
>   LoongArch: Add Loongson SX vector directive compilation framework.
>   LoongArch: Add Loongson SX base instruction support.
>   LoongArch: Add Loongson SX directive builtin function support.
>   LoongArch: Add Loongson ASX vector directive compilation framework.
>   LoongArch: Add Loongson ASX base instruction support.
>   LoongArch: Add Loongson ASX directive builtin function support.
> 
>  gcc/config.gcc    |    2 +-
>  gcc/config/loongarch/constraints.md   |  131 +-
>  .../loongarch/genopts/loongarch-strings   |    4 +
>  gcc/config/loongarch/genopts/loongarch.opt.in |   12 +-
>  gcc/config/loongarch/lasx.md  | 5122 
>  gcc/config/loongarch/lasxintrin.h | 5338
> +
>  gcc/config/loongarch/loongarch-builtins.cc    | 2686 -
>  gcc/config/loongarch/loongarch-c.cc   |   18 +
>  gcc/config/loongarch/loongarch-def.c  |    6 +
>  gcc/config/loongarch/loongarch-def.h  |    9 +-
>  gcc/config/loongarch/loongarch-driver.cc  |   10 +
>  gcc/config/loongarch/loongarch-driver.h   |    2 +
>  gcc/config/loongarch/loongarch-ftypes.def |  666 +-
>  gcc/config/loongarch/loongarch-modes.def  |   39 +
>  gcc/config/loongarch/loongarch-opts.cc    |   89 +-
>  gcc/config/loongarch/loongarch-opts.h |    3 +
>  gcc/config/loongarch/loongarch-protos.h   |   35 +
>  gcc/config/loongarch/loongarch-str.h  |    3 +
>  gcc/config/loongarch/loongarch.cc | 4586 +-
>  gcc/config/loongarch/loongarch.h  |  117 +-
>  gcc/config/loongarch/loongarch.md |   56 +-
>  gcc/config/loongarch/loongarch.opt    |   12 +-
>  gcc/config/loongarch/lsx.md   | 4481 ++
>  gcc/config/loongarch/lsxintrin.h  | 5181 
>  gcc/config/loongarch/predicates.md    |  333 +-
>  gcc/doc/md.texi   |   11 +
>  26 files changed, 28668 insertions(+), 284 deletions(-)
>  create mode 100644 gcc/config/loongarch/lasx.md
>  create mode 100644 gcc/config/loongarch/lasxintrin.h
>  create mode 100644 gcc/config/loongarch/lsx.md
>  create mode 100644 gcc/config/loongarch/lsxintrin.h
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v3] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-15 Thread Xi Ruoyao via Gcc-patches
Please fix the code style (this is the third time I've said it and I'm
really frustrated now).  GCC is a project, not student homework, so style
matters.  And it's not so difficult to fix the style: for a new file you
can use "clang-format --style GNU -i filename.c" to do the work
automatically.

On Tue, 2023-08-15 at 18:39 +0800, chenxiaolong wrote:
> In the implementation process, the "q" suffix function is
>     Re-register and associate the "__float128" type with the
>     "long double" type so that the compiler can handle the
>     corresponding function correctly. The functions implemented
>     include __builtin_{huge_valq infq, fabsq, copysignq, nanq,nansq}.
>     On the LoongArch architecture, __builtin_{fabsq,copysignq} can
>     be implemented with the instruction "bstrins.d", so that its
>     optimization effect reaches the optimal value.
> 
> gcc/ChangeLog:
> 
> * config/loongarch/loongarch-builtins.cc (DEF_LARCH_FTYPE):
> (enum loongarch_builtin_type):Increases the type of the function.
> (FLOAT_BUILTIN_HIQ):__builtin_{huge_valq,infq}.
> (FLOAT_BUILTIN_FCQ):__builtin_{fabsq,copysignq}.
> (FLOAT_BUILTIN_NNQ):__builtin_{nanq,nansq}.
> (loongarch_init_builtins):
> (loongarch_fold_builtin):
> (loongarch_expand_builtin):
> * config/loongarch/loongarch-protos.h (loongarch_fold_builtin):
> (loongarch_c_mode_for_suffix):Add the declaration of the function.
> * config/loongarch/loongarch.cc (loongarch_c_mode_for_suffix):Add
>     the definition of the function.
> (TARGET_FOLD_BUILTIN):
> (TARGET_C_MODE_FOR_SUFFIX):
> * config/loongarch/loongarch.md (infq):Add an instruction template
>     to the machine description file to generate information such as
>     the icode used by the function and the constructor.
> ():
> (fabsq):
> (copysignq):
> 
> libgcc/ChangeLog:
> 
> * config/loongarch/t-softfp-tf:
> * config/loongarch/tf-signs.c: New file.
> ---
>  gcc/config/loongarch/loongarch-builtins.cc | 168 -
>  gcc/config/loongarch/loongarch-protos.h    |   2 +
>  gcc/config/loongarch/loongarch.cc  |  14 ++
>  gcc/config/loongarch/loongarch.md  |  69 +
>  libgcc/config/loongarch/t-softfp-tf    |   3 +
>  libgcc/config/loongarch/tf-signs.c |  59 
>  6 files changed, 313 insertions(+), 2 deletions(-)
>  create mode 100644 libgcc/config/loongarch/tf-signs.c
> 
> diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
> b/gcc/config/loongarch/loongarch-builtins.cc
> index b929f224dfa..2fb0fde0e3f 100644
> --- a/gcc/config/loongarch/loongarch-builtins.cc
> +++ b/gcc/config/loongarch/loongarch-builtins.cc
> @@ -36,6 +36,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "fold-const.h"
>  #include "expr.h"
>  #include "langhooks.h"
> +#include "calls.h"
> +#include "explow.h"
>  
>  /* Macros to create an enumeration identifier for a function prototype.  */
>  #define LARCH_FTYPE_NAME1(A, B) LARCH_##A##_FTYPE_##B
> @@ -48,9 +50,18 @@ enum loongarch_function_type
>  #define DEF_LARCH_FTYPE(NARGS, LIST) LARCH_FTYPE_NAME##NARGS LIST,
>  #include "config/loongarch/loongarch-ftypes.def"
>  #undef DEF_LARCH_FTYPE
> +  LARCH_BUILTIN_HUGE_VALQ,
> +  LARCH_BUILTIN_INFQ,
> +  LARCH_BUILTIN_FABSQ,
> +  LARCH_BUILTIN_COPYSIGNQ,
> +  LARCH_BUILTIN_NANQ,
> +  LARCH_BUILTIN_NANSQ,
>    LARCH_MAX_FTYPE_MAX
>  };
>  
> +/* Count the number of functions with "q" as the suffix.  */
> +const int MATHQ_NUMS = (int)LARCH_MAX_FTYPE_MAX - 
> (int)LARCH_BUILTIN_HUGE_VALQ;
> +
>  /* Specifies how a built-in function should be converted into rtl.  */
>  enum loongarch_builtin_type
>  {
> @@ -63,6 +74,15 @@ enum loongarch_builtin_type
>   value and the arguments are mapped to operands 0 and above.  */
>    LARCH_BUILTIN_DIRECT_NO_TARGET,
>  
> + /* The function corresponds to  __builtin_{huge_valq,infq}.  */
> +  LARCH_BUILTIN_HIQ_DIRECT,
> +
> + /* The function corresponds to  __builtin_{fabsq,copysignq}.  */
> +  LARCH_BUILTIN_FCQ_DIRECT,
> +
> +  /* Define the type of the __builtin_{nanq,nansq} function.  */
> +  LARCH_BUILTIN_NNQ_DIRECT
> +
>  };
>  
>  /* Declare an availability predicate for built-in functions that require
> @@ -136,6 +156,24 @@ AVAIL_ALL (hard_float, TARGET_HARD_FLOAT_ABI)
>    LARCH_BUILTIN (INSN, #INSN, LARCH_BUILTIN_DIRECT_NO_TARGET, \
>  FUNCTION_TYPE, AVAIL)
>  
> +/* Define an float to do funciton {huge_valq,infq}.  */
> +#define FLOAT_BUILTIN_HIQ (INSN, FUNCTION_TYPE)  \
> +    { CODE_FOR_ ## INSN, \
> +    "__builtin_" #INSN,  LARCH_BUILTIN_HIQ_DIRECT,    \
> +    FUNCTION_TYPE, loongarch_builtin_avail_default }
> +
> +/* Define an float to do funciton {fabsq,copysignq}.  */
> +#define FLOAT_BUILTIN_FCQ (INSN, FUNCTION_TYPE)  \
> +    { CODE_FOR_ ## INSN,

Re: [PATCH v1 1/6] LoongArch: a symmetric multilib subdir layout

2023-08-14 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-14 at 19:16 +0800, Xi Ruoyao wrote:
> On Mon, 2023-08-14 at 18:18 +0800, Yujie Yang wrote:
> > On Mon, Aug 14, 2023 at 03:48:53PM +0800, Xi Ruoyao wrote:
> > > On Mon, 2023-08-14 at 15:37 +0800, Yujie Yang wrote:
> > > > On Mon, Aug 14, 2023 at 01:38:40PM +0800, Xi Ruoyao wrote:
> > > > > On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote:
> > > > > 
> > > > > > However, for LoongArch, we do not want such a "toplevel" library
> > > > > > installation since the default ABI may change.  We expect all
> > > > > > multilib variants of libraries to be installed to their designated
> > > > > > ABI-specific subdirs (e.g. base/lp64d) of the GCC libdir, so that
> > > > > > the default ABI can be configured arbitrarily (with --with-abi)
> > > > > > while the gcc libdir layout stays consistent.  This could be
> > > > > > helpful for the distribution packaging of GCC libraries.
> > > > > 
> > > > > Have you tested a --disable-multilib configuration?  To me with
> > > > > --disable-multilib everything should be still in the toplevel
> > > > > directory, not any sub-directory.
> > > > 
> > > > That's a good point, sorry I missed --disable-multilib here.
> > > > 
> > > > However, you don't really need --disable-multilib since
> > > > the libraries are only built once in the default ABI configuration
> > > > as long as --with-multilib-list does not request anything more than
> > > > that.
> > > > 
> > > > Maybe we should force-enable multilib in all cases.
> > > 
> > > I really don't like this.  Why must I always remind myself "hey, this
> > > is LoongArch, there is a different directory layout" when I don't need
> > > multilib at all?
> > > 
> > 
> > AFAIK, the two main uses of the multisubdir layout are in the C++
> > header directory and the GCC libdir (where libgcc.a resides), respectively.
> > The GCC libdir is fine since they are private to a user's GCC build.
> > However, the C++ header directory is shared across the system unless
> > an alternative sysroot is chosen, so the consistency of the multilib
> > layout matters.
> 
> The C++ header directory should also be considered private to the GCC
> build.  AFAIK no distro supports "overwriting a part of the system", so
> you cannot just install a custom GCC build and overwrite the system C++
> header directory.  For a cross compiler, the C++ header directory is
> $prefix/$target_triple/include/c++/$gcc_version/$multi_dir, the C++
> header in $sysroot/usr/include/c++ (if it ever exists) will not be used
> at all.
> 
> > So theoretically, the toplevel libraries should have the same ABI under
> > the target triplet.  However, for many architectures, the
> > "--with-abi + MULTILIB_DEFAULT" scheme may cause the toplevel to be
> > configured to have different meanings.
> 
> https://gcc.gnu.org/PR104085 is an example of the issue caused by the
> different meaning.
> 
> > So I think it's also a reasonable approach that we just simply eliminate
> > the ambiguous toplevel libraries and use a symmetric layout instead.
> 
> I don't like the inconsistency among different GCC ports.  If all ports
> use the same approach I'll not object.

I came up with another idea. What if we:

1. Keep the "default" ABI libs in the toplevel directory. There is
*always* a default ABI so treating it specially is not really nonsense.
2. Create a symlink for consistency. For example, if --with-abi=lp64d
--with-multilib-list=lp64d,lp64s:

 * /usr/lib/gcc/loongarch64-linux-gnu/14.0.0 contains the lp64d
   libraries.
 * /usr/lib/gcc/loongarch64-linux-gnu/14.0.0/lp64s contains the lp64s
   libraries.
 * /usr/lib/gcc/loongarch64-linux-gnu/14.0.0/lp64d is a symlink to "."

Then we can refer to the lp64d libgcc.a with both
/usr/lib/gcc/loongarch64-linux-gnu/14.0.0/lp64d/libgcc.a, and
/usr/lib/gcc/loongarch64-linux-gnu/14.0.0/libgcc.a.

For referring to the default multilib, the non-suffixed
/usr/lib/gcc/loongarch64-linux-gnu/14.0.0 path should be used; for
referring to lp64d (no matter what the default is),
/usr/lib/gcc/loongarch64-linux-gnu/14.0.0/lp64d should be used.

The symlink can be created by the GCC building system or manually by the
distro maintainer (or gcc packager).
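
To check that a symlink to "." really makes both spellings name the same
file, here is a small POSIX sketch.  It is not part of any GCC build
machinery: same_file and demo_symlink_layout are hypothetical helpers,
the scratch directory under /tmp is an assumption, and an empty file
stands in for libgcc.a.  The real libdir is not touched.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if A and B resolve to the same inode, 0 if they do not,
   and -1 if either path cannot be examined.  */
static int same_file (const char *a, const char *b)
{
  struct stat sa, sb;

  assert (a != NULL && b != NULL);
  if (stat (a, &sa) != 0 || stat (b, &sb) != 0)
    return -1;
  return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Builds <tmp>/libgcc.a plus <tmp>/lp64d -> "." in a scratch directory
   and reports whether <tmp>/lp64d/libgcc.a is the same file.  */
static int demo_symlink_layout (void)
{
  char dir[] = "/tmp/multilib-demo-XXXXXX";
  char top[128], abi_link[128], sub[128];
  FILE *f;
  int result;

  if (mkdtemp (dir) == NULL)
    return -1;
  snprintf (top, sizeof top, "%s/libgcc.a", dir);
  snprintf (abi_link, sizeof abi_link, "%s/lp64d", dir);
  snprintf (sub, sizeof sub, "%s/lp64d/libgcc.a", dir);

  f = fopen (top, "w");  /* empty placeholder for the default-ABI library */
  if (f == NULL)
    return -1;
  fclose (f);

  if (symlink (".", abi_link) != 0)
    return -1;
  result = same_file (top, sub);

  /* Clean up the scratch directory again.  */
  unlink (abi_link);
  unlink (top);
  rmdir (dir);
  return result;
}
```

demo_symlink_layout () is expected to return 1 on a filesystem that
supports symlinks; a build system or packager would do the equivalent
with a single "ln -s . lp64d" in the GCC libdir.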

Thoughts?

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4 1/6] LoongArch: Add Loongson SX vector directive compilation framework.

2023-08-14 Thread Xi Ruoyao via Gcc-patches
I guess there is a merge conflict with Yujie's "-msimd=" patch and you
may need to collaborate to resolve it.  Maybe just add -msimd in this
series.

On Tue, 2023-08-15 at 09:05 +0800, Chenghui Pan wrote:
> From: Lulu Cheng 
> 
> gcc/ChangeLog:
> 
> * config/loongarch/genopts/loongarch-strings: Add compilation 
> framework.
> * config/loongarch/genopts/loongarch.opt.in: Ditto.
> * config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins): Ditto.
> * config/loongarch/loongarch-def.c: Ditto.
> * config/loongarch/loongarch-def.h (N_ISA_EXT_TYPES): Ditto.
> (ISA_EXT_SIMD_LSX): Ditto.
> (N_SWITCH_TYPES): Ditto.
> (SW_LSX): Ditto.
> (struct loongarch_isa): Ditto.
> * config/loongarch/loongarch-driver.cc (APPEND_SWITCH): Ditto.
> (driver_get_normalized_m_opts): Ditto.
> * config/loongarch/loongarch-driver.h (driver_get_normalized_m_opts): 
> Ditto.
> * config/loongarch/loongarch-opts.cc (loongarch_config_target): Ditto.
> (isa_str): Ditto.
> * config/loongarch/loongarch-opts.h (ISA_HAS_LSX): Ditto.
> * config/loongarch/loongarch-str.h (OPTSTR_LSX): Ditto.
> * config/loongarch/loongarch.opt: Ditto.

/* snip */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 1/6] LoongArch: a symmetric multilib subdir layout

2023-08-14 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-14 at 18:18 +0800, Yujie Yang wrote:
> On Mon, Aug 14, 2023 at 03:48:53PM +0800, Xi Ruoyao wrote:
> > On Mon, 2023-08-14 at 15:37 +0800, Yujie Yang wrote:
> > > On Mon, Aug 14, 2023 at 01:38:40PM +0800, Xi Ruoyao wrote:
> > > > On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote:
> > > > 
> > > > > However, for LoongArch, we do not want such a "toplevel" library
> > > > > installation since the default ABI may change.  We expect all
> > > > > multilib variants of libraries to be installed to their designated
> > > > > ABI-specific subdirs (e.g. base/lp64d) of the GCC libdir, so that
> > > > > the default ABI can be configured arbitrarily (with --with-abi)
> > > > > while the gcc libdir layout stays consistent.  This could be
> > > > > helpful for the distribution packaging of GCC libraries.
> > > > 
> > > > Have you tested a --disable-multilib configuration?  To me with --
> > > > disable-configuration everything should be still in the toplevel
> > > > directory, not any sub-directory.
> > > 
> > > That's a good point, sorry I missed --disable-multilib here.
> > > 
> > > However, you don't really need --disable-multilib since
> > > the libraries are only built once in the default ABI configuration
> > > as long as --with-multilib-list does not request anything more than
> > > that.
> > > 
> > > Maybe we should force-enabling multilib in all cases.
> > 
> > I really don't like this.  Why must I always remind myself "hey, this
> > is LoongArch, there is a different directory layout" when I don't need
> > multilib at all?
> > 
> 
> AFAIK, the two main uses of the multisubdir layout are in the C++
> header directory and the GCC libdir (where libgcc.a resides), respectively.
> The GCC libdir is fine since they are private to a user's GCC build.
> However, the C++ header directory is shared across the system unless
> an alternative sysroot is chosen, so the consistency of the multilib
> layout matters.

The C++ header directory should also be considered private to the GCC
build.  AFAIK no distro supports "overwriting a part of the system", so
you cannot just install a custom GCC build and overwrite the system C++
header directory.  For a cross compiler, the C++ header directory is
$prefix/$target_triple/include/c++/$gcc_version/$multi_dir, and the C++
headers in $sysroot/usr/include/c++ (if it exists at all) will not be
used.

> So theoretically, the toplevel libraries should have the same ABI under
> the target triplet.  However, for many architectures, the
> "--with-abi + MULTILIB_DEFAULT" scheme may cause the toplevel to be
> configured to have different meanings.

https://gcc.gnu.org/PR104085 is an example of the issue caused by the
different meaning.

> So I think it's also a reasonable approach that we just simply eliminate
> the ambiguous toplevel libraries and use a symmetric layout instead.

I don't like the inconsistency among different GCC ports.  If all ports
use the same approach I'll not object.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 2/6] LoongArch: improved target configuration interface

2023-08-14 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-14 at 16:57 +0800, Yujie Yang wrote:
> On Mon, Aug 14, 2023 at 04:49:11PM +0800, Xi Ruoyao wrote:
> > On Mon, 2023-08-14 at 16:44 +0800, Yujie Yang wrote:
> > > I assume we all want:
> > > 
> > >  (1) -mlasx -mlsx -> enable LSX and LASX
> > >  (2) -mlasx -mno-lsx -> disable LSX and LASX
> > >  (3) -mno-lsx -mlasx -> enable LSX and LASX
> > 
> > Yes.
> > 
> > > Unless we declare -mlsx / -mlasx as driver deferred, AFAIK there is no
> > > other way for us to know the actual order of appearance of all
> > > -m[no-]l[a]sx options on the command line.  All we know from GCC's
> > > option system would be a final on/off state of "lsx" and a final
> > > on/off state of "lasx".
> > 
> > > But x86 does this correctly:
> > 
> > $ echo __AVX__ + __AVX2__ | LANG= cpp -E -mno-avx -mavx2
> > # 0 ""
> > # 0 ""
> > # 0 ""
> > # 1 "/usr/include/stdc-predef.h" 1 3 4
> > # 0 "" 2
> > # 1 ""
> > 1 + 1
> > 
> > so there must be a way to handle this...
> > 
> > -- 
> > Xi Ruoyao 
> > School of Aerospace Science and Technology, Xidian University
> 
> Emm... What happens if you reverse the order?
> 
> $ echo __AVX__ + __AVX2__ | LANG= cpp -E -mavx2 -mno-avx
> 
> Anyways, I believe there may be other ways to implement this, but it would
> require as much effort as the current approach (or even much more).
> Especially considering the possibility of future updates -- we now have a
> framework for this sort of thing.
> 
> Meanwhile you can comfortably stay away from -msimd= and use only
> -mlsx / -mlasx. So...a matter of style maybe?

I'm OK with that, but we need to document it clearly in invoke.texi.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 2/6] LoongArch: improved target configuration interface

2023-08-14 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-14 at 16:44 +0800, Yujie Yang wrote:
> I assume we all want:
> 
>  (1) -mlasx -mlsx -> enable LSX and LASX
>  (2) -mlasx -mno-lsx -> disable LSX and LASX
>  (3) -mno-lsx -mlasx -> enable LSX and LASX

Yes.

> Unless we declare -mlsx / -mlasx as driver deferred, AFAIK there is no
> other way for us to know the actual order of appearance of all
> -m[no-]l[a]sx options on the command line.  All we know from GCC's
> option system would be a final on/off state of "lsx" and a final
> on/off state of "lasx".

But x86 does this correctly:

$ echo __AVX__ + __AVX2__ | LANG= cpp -E -mno-avx -mavx2
# 0 ""
# 0 ""
# 0 ""
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "" 2
# 1 ""
1 + 1

so there must be a way to handle this...
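
To make the three expectations listed in the quoted text concrete, here
is a small stand-alone model of the desired semantics.  It is only a
sketch: it walks the options from left to right and is not how GCC's
option machinery is implemented.  The names simd_state and resolve_simd
are invented, and the assumption that -mno-lasx leaves LSX untouched is
mine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct simd_state { bool lsx; bool lasx; };

/* Hypothetical model: process OPTS left to right so that later options
   override earlier ones.  Enabling LASX implies LSX, and disabling LSX
   also disables LASX.  */
struct simd_state
resolve_simd (const char *const *opts, size_t n)
{
  struct simd_state s = { false, false };
  for (size_t i = 0; i < n; i++)
    {
      if (strcmp (opts[i], "-mlsx") == 0)
        s.lsx = true;
      else if (strcmp (opts[i], "-mno-lsx") == 0)
        s.lsx = s.lasx = false;   /* LASX cannot stay on without LSX.  */
      else if (strcmp (opts[i], "-mlasx") == 0)
        s.lsx = s.lasx = true;    /* LASX implies LSX.  */
      else if (strcmp (opts[i], "-mno-lasx") == 0)
        s.lasx = false;           /* Assumption: LSX is left as it was.  */
    }
  return s;
}
```

The point of the model is that the result depends on option order, which
is exactly the information the quoted text says is not available from
the final on/off states alone.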

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 1/6] LoongArch: a symmetric multilib subdir layout

2023-08-14 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-14 at 15:37 +0800, Yujie Yang wrote:
> On Mon, Aug 14, 2023 at 01:38:40PM +0800, Xi Ruoyao wrote:
> > On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote:
> > 
> > > However, for LoongArch, we do not want such a "toplevel" library
> > > installation since the default ABI may change.  We expect all
> > > multilib variants of libraries to be installed to their designated
> > > ABI-specific subdirs (e.g. base/lp64d) of the GCC libdir, so that
> > > the default ABI can be configured arbitrarily (with --with-abi)
> > > while the gcc libdir layout stays consistent.  This could be
> > > helpful for the distribution packaging of GCC libraries.
> > 
> > Have you tested a --disable-multilib configuration?  To me with --
> > disable-configuration everything should be still in the toplevel
> > directory, not any sub-directory.
> 
> That's a good point, sorry I missed --disable-multilib here.
> 
> However, you don't really need --disable-multilib since
> the libraries are only built once in the default ABI configuration
> as long as --with-multilib-list does not request anything more than
> that.
> 
> Maybe we should force-enabling multilib in all cases.

I really don't like this.  Why must I always remind myself "hey, this
is LoongArch, there is a different directory layout" when I don't need
multilib at all?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 2/6] LoongArch: improved target configuration interface

2023-08-14 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-14 at 13:58 +0800, Xi Ruoyao via Gcc-patches wrote:
> On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote:
> > * Support options for LoongArch SIMD extensions:
> >   new configure options --with-simd={none,lsx,lasx};
> >   new driver options -m[no]-l[a]sx / -msimd={none,lsx,lasx}.
> 
> I suggest renaming --with-simd= to --with-ext= and accepting a
> comma-separated ISA extension list, because we have non-SIMD ISA
> extensions.  For example, "--with-ext=lasx,lbt" will make -mlasx, -mlsx
> (implied), and -mlbt the default.  I prefer "-mlasx" over "-msimd=lasx"
> because "-mlasx" is shorter anyway (if there is no real reason to make
> -mlasx and -msimd=lasx two different things).

Perhaps just "--with-feature" or "--with-loongarch-feature", then we can
even fold -mstrict-align here, like "--with-feature=lbt,strict-align".


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 2/6] LoongArch: improved target configuration interface

2023-08-13 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote:
> * Support options for LoongArch SIMD extensions:
>   new configure options --with-simd={none,lsx,lasx};
>   new driver options -m[no]-l[a]sx / -msimd={none,lsx,lasx}.

I suggest renaming --with-simd= to --with-ext= and accepting a
comma-separated ISA extension list, because we have non-SIMD ISA
extensions.  For example, "--with-ext=lasx,lbt" will make -mlasx, -mlsx
(implied), and -mlbt the default.  I prefer "-mlasx" over "-msimd=lasx"
because "-mlasx" is shorter anyway (if there is no real reason to make
-mlasx and -msimd=lasx two different things).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 1/6] LoongArch: a symmetric multilib subdir layout

2023-08-13 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-14 at 13:38 +0800, Xi Ruoyao wrote:
> 
> > However, for LoongArch, we do not want such a "toplevel" library
> > installation since the default ABI may change.  We expect all
> > multilib variants of libraries to be installed to their designated
> > ABI-specific subdirs (e.g. base/lp64d) of the GCC libdir, so that
> > the default ABI can be configured arbitrarily (with --with-abi)
> > while the gcc libdir layout stays consistent.  This could be
> > helpful for the distribution packaging of GCC libraries.
> 
> Have you tested a --disable-multilib configuration?  To me with --
> disable-configuration everything should be still in the toplevel

I mean --disable-multilib configuration, not "--disable-configuration".

> directory, not any sub-directory.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 1/6] LoongArch: a symmetric multilib subdir layout

2023-08-13 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote:

> However, for LoongArch, we do not want such a "toplevel" library
> installation since the default ABI may change.  We expect all
> multilib variants of libraries to be installed to their designated
> ABI-specific subdirs (e.g. base/lp64d) of the GCC libdir, so that
> the default ABI can be configured arbitrarily (with --with-abi)
> while the gcc libdir layout stays consistent.  This could be
> helpful for the distribution packaging of GCC libraries.

Have you tested a --disable-multilib configuration?  To me with
--disable-configuration everything should be still in the toplevel
directory, not any sub-directory.

/* snip */

> ChangeLog:
> 
>     * config-ml.in: add loongarch support.  Allow overriding

Use a tab, not 8 white spaces.  Likewise for all patches in the series.

>     toplevel multisubdir.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 2/6] LoongArch: improved target configuration interface

2023-08-13 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote:
> loongarch64)
> -   tune_pattern="loongarch64|la464"
> -   tune_default="la464"
> +   tune_pattern="native|abi-default|loongarch64|la464"

I think we can remove tune_pattern completely.  There is no reason to
limit --with-tune setting based on --with-arch setting.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 3/6] LoongArch: define preprocessing macros "__loongarch_{arch,tune}"

2023-08-13 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote:
> These are exported according to the LoongArch Toolchain Conventions[1]
> as a replacement of the obsolete "_LOONGARCH_{ARCH,TUNE}" macros,
> which are expanded to strings representing the actual architecture
> and microarchitecture of the target.
> 
> [1] currently released at https://github.com/loongson/LoongArch-Documentation
>     /blob/main/docs/LoongArch-toolchain-conventions-EN.adoc
> 
> gcc/ChangeLog:
> 
>     * gcc/config/loongarch/loongarch-c.cc: Export macros
>     "__loongarch_{arch,tune}" in the preprocessor.

Ok.  I think this can be applied anyway (regardless of other patches).
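
For what it's worth, user code could pick up the new macro and fall back
to the old one roughly like this.  This is only a sketch; LA_ARCH_NAME
and la_arch_name are invented names, and the "unknown" fallback is there
so the snippet also builds on other targets.

```c
/* Prefer the new lowercase macro, fall back to the obsolete one, and
   use a placeholder when neither is defined (e.g. non-LoongArch).  */
#if defined (__loongarch_arch)
# define LA_ARCH_NAME __loongarch_arch
#elif defined (_LOONGARCH_ARCH)
# define LA_ARCH_NAME _LOONGARCH_ARCH
#else
# define LA_ARCH_NAME "unknown"
#endif

/* Hypothetical accessor returning the architecture name as a string.  */
const char *
la_arch_name (void)
{
  return LA_ARCH_NAME;
}
```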

> ---
>  gcc/config/loongarch/loongarch-c.cc | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/gcc/config/loongarch/loongarch-c.cc 
> b/gcc/config/loongarch/loongarch-c.cc
> index 660c68f0e06..7bee037cc4a 100644
> --- a/gcc/config/loongarch/loongarch-c.cc
> +++ b/gcc/config/loongarch/loongarch-c.cc
> @@ -64,6 +64,9 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
>    LARCH_CPP_SET_PROCESSOR ("_LOONGARCH_ARCH", la_target.cpu_arch);
>    LARCH_CPP_SET_PROCESSOR ("_LOONGARCH_TUNE", la_target.cpu_tune);
>  
> +  LARCH_CPP_SET_PROCESSOR ("__loongarch_arch", la_target.cpu_arch);
> +  LARCH_CPP_SET_PROCESSOR ("__loongarch_tune", la_target.cpu_tune);
> +
>    /* Base architecture / ABI.  */
>    if (TARGET_64BIT)
>  {

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 2/6] LoongArch: improved target configuration interface

2023-08-13 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote:
> The configure script and the GCC driver are updated so that
> it is easier to customize and control GCC builds for targeting
> different LoongArch implementations.
> 
> * Support options for LoongArch SIMD extensions:
>   new configure options --with-simd={none,lsx,lasx};
>   new driver options -m[no]-l[a]sx / -msimd={none,lsx,lasx}.

What's the relationship between -mlasx and -msimd=lasx?  What will
happen if the user specifies -mlasx -msimd=none or -mlasx -msimd=lsx?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 4/6] LoongArch: use -mstrict-align by default when building libraries

2023-08-13 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote:
> LoongArch processors may not support memory accesses without natural
> alignments.  Building libraries with -mstrict-align may help with
> toolchain binary compatibility and performance on these implementations
> (e.g. Loongson 2K1000LA).

I don't think it's a good idea.  You should provide a configuration-time
option (maybe named --with-strict-align) to make -mstrict-align the
default instead, thus both the libraries and the compiled user code will
be suitable for 2K1000.

> With this patch, no significant performance degradation is observed on
> current mainstream LoongArch processors.
> 
> gcc/ChangeLog:
> 
>     * gcc/config/t-linux: add -mstrict-align via self_specs
>     when building GCC libraries.
> ---
>  gcc/config/loongarch/t-linux | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/gcc/config/loongarch/t-linux b/gcc/config/loongarch/t-linux
> index 75bb430c555..2a170d600a9 100644
> --- a/gcc/config/loongarch/t-linux
> +++ b/gcc/config/loongarch/t-linux
> @@ -35,6 +35,9 @@ gen_mlib_spec = $(if $(word 2,$1),\
>  # clean up the result of DRIVER_SELF_SPEC to avoid conflict
>  lib_build_self_spec  = %  
> +# build libraries with -mstrict-align by default
> +lib_build_self_spec += -mstrict-align
> +
>  # append user-specified build options from --with-multilib-list
>  lib_build_self_spec += $(foreach mlib,$(subst $(comma), 
> ,$(TM_MULTILIB_CONFIG)),\
> $(call gen_mlib_spec,$(subst /, ,$(mlib

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-08-11 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-10 at 15:04 +0200, Stefan Schulze Frielinghaus via
Gcc-patches wrote:
> In the former fix in commit 41ef5a34161356817807be3a2e51fbdbe575ae85 I
> completely missed the fact that the normal form of a generated constant for a
> mode with fewer bits than in HOST_WIDE_INT is a sign extended version of the
> actual constant.  This even holds true for unsigned constants.
> 
> Fixed by masking out the upper bits for the incoming constant and sign
> extending the resulting unsigned constant.
> 
> Bootstrapped and regtested on x64 and s390x.  Ok for mainline?

The patch fails to apply:

patching file gcc/combine.cc
Hunk #1 FAILED at 11923.
Hunk #2 FAILED at 11962.

It looks like some indents are tabs in the source file, but white spaces
in the patch.
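
As an aside, the representation issue described in the quoted patch
description can be reproduced outside GCC.  The sketch below assumes a
64-bit HOST_WIDE_INT and models it with int64_t; mode_mask,
sign_extend_for_mode and the two predicates are stand-ins I made up, not
GCC functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask covering the low PRECISION bits (1 <= PRECISION <= 64); a
   stand-in for GET_MODE_MASK.  */
static uint64_t
mode_mask (unsigned precision)
{
  return precision >= 64 ? UINT64_MAX : (UINT64_C (1) << precision) - 1;
}

/* Model of the normal form of a constant: the low PRECISION bits of
   VALUE, sign-extended to 64 bits.  The final conversion to int64_t
   relies on the usual two's-complement behaviour.  */
int64_t
sign_extend_for_mode (uint64_t value, unsigned precision)
{
  uint64_t sign = UINT64_C (1) << (precision - 1);
  value &= mode_mask (precision);
  return (int64_t) ((value ^ sign) - sign);
}

/* Sign-bit test without masking: misses sign-extended constants.  */
bool
is_sign_bit_unmasked (int64_t const_op, unsigned precision)
{
  return (uint64_t) const_op == UINT64_C (1) << (precision - 1);
}

/* Sign-bit test after masking out the bits above the mode.  */
bool
is_sign_bit_masked (int64_t const_op, unsigned precision)
{
  return ((uint64_t) const_op & mode_mask (precision))
         == UINT64_C (1) << (precision - 1);
}
```

For an 8-bit mode and the value 0x80, only the masked test recognises
the constant as the sign bit.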

> While reading existing optimizations in combine I stumbled across two
> optimizations where either my intuition about the representation of
> unsigned integers via a const_int rtx is wrong, which then in turn would
> probably also mean that this patch is wrong, or that the optimizations
> are missed sometimes.  In other words in the following I would assume
> that the upper bits are masked out:
> 
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 468b7fde911..80c4ff0fbaf 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -11923,7 +11923,7 @@ simplify_compare_const (enum rtx_code code, 
> machine_mode mode,
>    /* (unsigned) < 0x8000 is equivalent to >= 0.  */
>    else if (is_a <scalar_int_mode> (mode, &int_mode)
>    && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
> -  && ((unsigned HOST_WIDE_INT) const_op
> +  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> (int_mode))
>    == HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 
> 1)))
>     {
>   const_op = 0;
> @@ -11962,7 +11962,7 @@ simplify_compare_const (enum rtx_code code, 
> machine_mode mode,
>    /* (unsigned) >= 0x8000 is equivalent to < 0.  */
>    else if (is_a <scalar_int_mode> (mode, &int_mode)
>    && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
> -  && ((unsigned HOST_WIDE_INT) const_op
> +  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> (int_mode))
>    == HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 
> 1)))
>     {
>   const_op = 0;
> 
> For example, while bootstrapping on x64 the optimization is missed since
> a LTU comparison in QImode is done and the constant equals
> 0xffffffffffffff80.
> 
> Sorry for inlining another patch, but I would really like to make sure
> that my understanding is correct, now, before I come up with another
> patch.  Thus it would be great if someone could shed some light on this.
> 
> gcc/ChangeLog:
> 
> * combine.cc (simplify_compare_const): Properly handle unsigned
> constants while narrowing comparison of memory and constants.
> ---
>  gcc/combine.cc | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index e46d202d0a7..468b7fde911 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -12003,14 +12003,15 @@ simplify_compare_const (enum rtx_code code, 
> machine_mode mode,
>    && !MEM_VOLATILE_P (op0)
>    /* The optimization makes only sense for constants which are big enough
>  so that we have a chance to chop off something at all.  */
> -  && (unsigned HOST_WIDE_INT) const_op > 0xff
> -  /* Bail out, if the constant does not fit into INT_MODE.  */
> -  && (unsigned HOST_WIDE_INT) const_op
> -    < ((HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1) << 1) - 
> 1)
> +  && ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode)) > 
> 0xff
>    /* Ensure that we do not overflow during normalization.  */
> -  && (code != GTU || (unsigned HOST_WIDE_INT) const_op < 
> HOST_WIDE_INT_M1U))
> +  && (code != GTU
> + || ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode))
> +    < HOST_WIDE_INT_M1U)
> +  && trunc_int_for_mode (const_op, int_mode) == const_op)
>  {
> -  unsigned HOST_WIDE_INT n = (unsigned HOST_WIDE_INT) const_op;
> +  unsigned HOST_WIDE_INT n
> +   = (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode);
>    enum rtx_code adjusted_code;
>  
>    /* Normalize code to either LEU or GEU.  */
> @@ -12051,15 +12052,15 @@ simplify_compare_const (enum rtx_code code, 
> machine_mode mode,
> HOST_WIDE_INT_PRINT_HEX ") to (MEM %s "
> HOST_WIDE_INT_PRINT_HEX ").\n", GET_MODE_NAME (int_mode),
> GET_MODE_NAME (narrow_mode_iter), GET_RTX_NAME (code),
> -   (unsigned HOST_WIDE_INT)const_op, GET_RTX_NAME 
> (adjusted_code),
> -   n);
> +   (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode),
> +   GET_RTX_NAME (adjusted_code), n);
>     }
>   

Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-09 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-10 at 09:11 +0800, liuhongt via Gcc-patches wrote:
> Currently we have 3 different independent tunes for gather
> "use_gather,use_gather_2parts,use_gather_4parts",
> similar for scatter, there're
> "use_scatter,use_scatter_2parts,use_scatter_4parts"
> 
> The patch support 2 standardizing options to enable/disable
> vectorization for all gather/scatter instructions. The options is
> interpreted by driver to 3 tunes.
> 
> bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk?

And should we set -mno-gather as the default for GDS-affected
processors?  We'll likely apply the ucode update for them, and then the
gather instructions will be much slower.

> gcc/ChangeLog:
> 
> * config/i386/i386.h (DRIVER_SELF_SPECS): Add
> GATHER_SCATTER_DRIVER_SELF_SPECS.
> (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
> * config/i386/i386.opt (mgather): New option.
> (mscatter): Ditto.
> ---
>  gcc/config/i386/i386.h   | 12 +++-
>  gcc/config/i386/i386.opt |  8 
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ef342fcee9b..d9ac2c29bde 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
>  # define SUBTARGET_DRIVER_SELF_SPECS ""
>  #endif
>  
> -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> +  "%{mno-gather:-mtune-
> ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> +   %{mgather:-mtune-
> ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> +   %{mno-scatter:-mtune-
> ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> +   %{mscatter:-mtune-
> ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> +#endif
> +
> +#define DRIVER_SELF_SPECS \
> +  SUBTARGET_DRIVER_SELF_SPECS " " \
> +  GATHER_SCATTER_DRIVER_SELF_SPECS
>  
>  /* -march=native handling only makes sense with compiler running on
>     an x86 or x86_64 chip.  If changing this condition, also change
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index ddb7f110aa2..99948644a8d 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -424,6 +424,14 @@ mdaz-ftz
>  Target
>  Set the FTZ and DAZ Flags.
>  
> +mgather
> +Target
> +Enable vectorization for gather instruction.
> +
> +mscatter
> +Target
> +Enable vectorization for scatter instruction.
> +
>  mpreferred-stack-boundary=
>  Target RejectNegative Joined UInteger
> Var(ix86_preferred_stack_boundary_arg)
>  Attempt to keep stack aligned to this power of 2.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2 11/14] LoongArch: Mark am* instructions as LA64-only

2023-08-09 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-08-09 at 19:46 +0800, Jiajie Chen wrote:
> LoongArch32 only provides basic ll/sc instructions for atomic
> operations. Mark am* atomic instructions as 64-bit only.

I'd prefer using a different symbol, say TARGET_LOONGARCH_AM here.  Then
it would be easier to adjust the code if we have a LA32 core with am*
support in the future.  For now we can just
#define TARGET_LOONGARCH_AM TARGET_64BIT.
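
A stand-alone illustration of the indirection I mean, outside of GCC.
Here TARGET_64BIT is backed by a plain variable instead of the real
target flag, and atomic_impl / pick_atomic_impl are invented names.

```c
/* Stand-in for GCC's real TARGET_64BIT, which is derived from the
   target options; here it is just a variable so the sketch compiles.  */
static int target_64bit = 1;
#define TARGET_64BIT target_64bit

/* The dedicated symbol.  If an LA32 core with am* support appears
   later, only this definition has to change.  */
#define TARGET_LOONGARCH_AM TARGET_64BIT

enum atomic_impl { ATOMIC_LLSC, ATOMIC_AM };

/* Hypothetical helper: choose how an atomic operation is expanded.  */
enum atomic_impl
pick_atomic_impl (void)
{
  return TARGET_LOONGARCH_AM ? ATOMIC_AM : ATOMIC_LLSC;
}
```

In sync.md the insn conditions would then read "TARGET_LOONGARCH_AM"
instead of "TARGET_64BIT".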

> gcc/ChangeLog:
> 
> * config/loongarch.sync.md: Guard am* atomic insns by
> TARGET_64BIT.
> ---
>  gcc/config/loongarch/sync.md | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
> index 9924d522bcd..151b553bcc6 100644
> --- a/gcc/config/loongarch/sync.md
> +++ b/gcc/config/loongarch/sync.md
> @@ -77,7 +77,7 @@
>    [(match_operand:GPR 1 "reg_or_0_operand" "rJ")
>     (match_operand:SI 2 "const_int_operand")]  ;; model
>    UNSPEC_ATOMIC_STORE))]
> -  ""
> +  "TARGET_64BIT"
>    "amswap%A2.\t$zero,%z1,%0"
>    [(set (attr "length") (const_int 8))])
>  
> @@ -88,7 +88,7 @@
>    (match_operand:GPR 1 "reg_or_0_operand" "rJ"))
>    (match_operand:SI 2 "const_int_operand")] ;; model
>  UNSPEC_SYNC_OLD_OP))]
> -  ""
> +  "TARGET_64BIT"
>    "am%A2.\t$zero,%z1,%0"
>    [(set (attr "length") (const_int 8))])
>  
> @@ -101,7 +101,7 @@
>  (match_operand:GPR 2 "reg_or_0_operand" "rJ"))
>    (match_operand:SI 3 "const_int_operand")] ;; model
>  UNSPEC_SYNC_OLD_OP))]
> -  ""
> +  "TARGET_64BIT"
>    "am%A3.\t%0,%z2,%1"
>    [(set (attr "length") (const_int 8))])
>  
> @@ -113,7 +113,7 @@
>   UNSPEC_SYNC_EXCHANGE))
>     (set (match_dup 1)
> (match_operand:GPR 2 "register_operand" "r"))]
> -  ""
> +  "TARGET_64BIT"
>    "amswap%A3.\t%0,%z2,%1"
>    [(set (attr "length") (const_int 8))])
>  
> @@ -182,7 +182,7 @@
>    [(match_operand:QI 0 "register_operand" "") ;; bool output
>     (match_operand:QI 1 "memory_operand" "+ZB")    ;; memory
>     (match_operand:SI 2 "const_int_operand" "")]   ;; model
> -  ""
> +  "TARGET_64BIT"
>  {
>    /* We have no QImode atomics, so use the address LSBs to form a mask,
>   then use an aligned SImode atomic.  */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2 01/14] LoongArch: Introduce loongarch32 target

2023-08-09 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-08-09 at 19:46 +0800, Jiajie Chen wrote:
> +  builtin_define ("_ABILP32=3");
> +  builtin_define ("_LOONGARCH_SIM=_ABILP32");

Let's remove them.  These MIPS-style definitions are deprecated:
https://github.com/loongson/LoongArch-Documentation/pull/28.

Unfortunately, for the LP64 ABI, _ABILP64 is already part of the public
API.  I've tried to raise a deprecation warning for them, but it seems
doing so needs a major change in libcpp...  However, the ILP32 ABI is
brand new, so we should take the opportunity to drop the historical
burden.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 2/9] LoongArch: Fix default ISA setting

2023-08-08 Thread Xi Ruoyao via Gcc-patches
On Sun, 2023-08-06 at 20:49 +0800, Jiajie Chen via Gcc-patches wrote:
> When loongarch_arch_target is called, la_target has not been
> initialized, thus the macro LARCH_ACTUAL_ARCH always equals to zero.
> 
> This commit fixes by expanding the macro and reading the latest value.
> It permits -march=loongarch64 when the default target is loongarch32 and
> vice versa.
> 
> gcc/ChangeLog:
> 
> * config/loongarch/loongarch-opts.cc (loongarch_config_target):
>   Fix -march detection.

Nit: the first letter 'F' of the second line should align with '*' of
the first line, not 'c'.

/* snip */

> diff --git a/gcc/testsuite/gcc.target/loongarch/arch-3.c 
> b/gcc/testsuite/gcc.target/loongarch/arch-3.c
> new file mode 100644
> index 000..543b93883bd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/arch-3.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=loongarch64 -mabi=ilp32d" } */
> +int foo()
> +{
> +}
> +/* { dg-error "unable to implement ABI 'ilp32d' with instruction set 
> 'la64/fpu64'" "" { target *-*-* } 0 } */

This is just wrong.  It's absolutely possible to implement ilp32d with
la64/fpu64.  LoongArch *.w instructions are always 32-bit operations, no
matter on LA32 or LA64.  They are different from RISC-V where many
instructions operate on 32-bit integers on RV32 but 64-bit integers on
RV64.

If you don't want to spend time implementing it, you should use
`sorry ("%<-mabi=ilp32d%> is not implemented for la64");` instead.

Yes, I know there are some (mis)uses of TARGET_64BIT in the
config/loongarch code where TARGET_ABI_LP64 should actually be used
instead.  They are bugs preventing us from implementing
-mabi=ilp32d -march=loongarch64 and they should be fixed.  They are not
an excuse to blindly "simulate" what RISC-V has.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-07 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-08-08 at 10:24 +0800, Xi Ruoyao wrote:

And I think this way to implement these functions (using libgcc calls)
is not the best.

On 64-bit LoongArch a __float128 is stored in a pair of GPRs, so
operations like copysignq and absq can be implemented much more
efficiently by expanding them using bstrins and bstrpick instructions in
the compiler.  For example:

__float128 
absq (__float128 val)
{
  return __builtin_absq (val);
}

should be compiled to:

bstrins.d $a1, $zero, 63, 63
jr $ra

Instead of

b __fabstf2

(perhaps, unless -Os).
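
In portable C terms, the bit manipulation I have in mind looks roughly
like the sketch below.  It models the value as two 64-bit halves,
assuming the IEEE binary128 layout with the sign in bit 63 of the high
half (the half that lives in $a1 in the example above).  The struct and
function names are made up.

```c
#include <stdint.h>

/* A binary128 value viewed as two 64-bit halves.  */
struct f128_bits { uint64_t lo; uint64_t hi; };

#define F128_SIGN (UINT64_C (1) << 63)

/* Absolute value: clear the sign bit, which is the effect of
   "bstrins.d $a1, $zero, 63, 63".  */
struct f128_bits
f128_abs (struct f128_bits v)
{
  v.hi &= ~F128_SIGN;
  return v;
}

/* copysign: keep the magnitude of X and take the sign bit of Y.  */
struct f128_bits
f128_copysign (struct f128_bits x, struct f128_bits y)
{
  x.hi = (x.hi & ~F128_SIGN) | (y.hi & F128_SIGN);
  return x;
}
```

Neither operation needs to look at the low half or call into libgcc.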

> > +__float128 nanq(const char * str)

Using "nanq" as the symbol name is unacceptable as well.  Use "__nanq"
or something.  "nanq" is not reserved for the implementation, so it may
cause a conflict in the future if "nanq" finally becomes a standard
function or users define their own "nanq" function.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-07 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-08-08 at 10:09 +0800, chenxiaolong wrote:
> +/* Count the number of functions with "q" as the suffix.  */
> +const int MATHQ_NUMS=(int)LARCH_MAX_FTYPE_MAX-(int)LARCH_BUILTIN_HUGE_VALQ;

Format issue still not fixed.

> +__float128 nanq  (const char * str)
> +{
> +  union _FP_UNION_Q nan;
> +  nan.bits.frac0 = 0;
> +  nan.bits.frac1 = 0;
> +  nan.bits.exp = 0x7FFF;
> +  nan.bits.sign = 1;
> +  if (str  !=  NULL && strlen (str) > 0)
> +  return nan.flt;
> +  return 0;
> +}

I don't think the logic is correct.  __builtin_nanq("") should return a
NaN, not 0.

> +  if (str  !=  NULL && strlen (str) > 0)
> +  return nan.flt;

Indent is 2, not 4.

And we don't need to check "str != NULL" here.  Calling nan()-family
functions with a null tagp is deemed undefined behavior.

> +__float128 nansq (const char *str)
> +{
> +  union _FP_UNION_Q nan;
> +  nan.bits.frac0 = 0;
> +  nan.bits.frac1 = 0;
> +  nan.bits.exp = 0x7FFF;
> +  nan.bits.sign = 1;
> +  if (str != NULL && strlen (str) > 0)
> +  return nan.flt;
> +  return 0;
> +}

Same logic error.  And this seems exactly the same as nanq; the
duplication is definitely wrong because __builtin_nanq should return a
quiet NaN, but __builtin_nansq should return a signaling NaN.
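
To spell out the difference, here is a sketch of the two bit patterns,
assuming the IEEE 754-2008 binary128 encoding where the most significant
fraction bit is the quiet bit.  The value is modelled as two 64-bit
halves, and all names are invented for illustration.  Note that an
all-ones exponent with an all-zero fraction is an infinity, not a NaN.

```c
#include <stdbool.h>
#include <stdint.h>

/* binary128 as two halves: HI holds the sign (bit 63), the 15-bit
   exponent (bits 62..48) and the top 48 fraction bits; LO holds the
   remaining 64 fraction bits.  */
struct binary128 { uint64_t hi; uint64_t lo; };

#define EXP_MASK  UINT64_C (0x7FFF000000000000)
#define FRAC_HI   UINT64_C (0x0000FFFFFFFFFFFF)
#define QUIET_BIT UINT64_C (0x0000800000000000)

/* NaN: exponent all ones and a non-zero fraction.  */
bool
is_nan (struct binary128 v)
{
  return (v.hi & EXP_MASK) == EXP_MASK
         && ((v.hi & FRAC_HI) != 0 || v.lo != 0);
}

bool
is_quiet_nan (struct binary128 v)
{
  return is_nan (v) && (v.hi & QUIET_BIT) != 0;
}

bool
is_signaling_nan (struct binary128 v)
{
  return is_nan (v) && (v.hi & QUIET_BIT) == 0;
}

/* A quiet NaN: quiet bit set.  */
struct binary128
make_qnan (void)
{
  return (struct binary128) { EXP_MASK | QUIET_BIT, 0 };
}

/* A signaling NaN: quiet bit clear, some other fraction bit set.  */
struct binary128
make_snan (void)
{
  return (struct binary128) { EXP_MASK | (QUIET_BIT >> 1), 0 };
}
```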

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-06 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-07 at 12:01 +0800, chenxiaolong wrote:
> +/* Count the number of functions with "q" as the suffix */
> +static int MATHQ_NUMS=(int)LARCH_MAX_FTYPE_MAX-(int)LARCH_BUILTIN_HUGE_VALQ;

This is obviously not the GCC coding standard...  It should have some
white spaces:

static int MATHQ_NUMS = (int)LARCH_MAX_FTYPE_MAX - (int)LARCH_BUILTIN_HUGE_VALQ;

And I guess this variable should be declared const.

> +/* Define an float to do funciton huge_valq*/
> +#define FLOAT_BUILTIN_HUGE(INSN, FUNCTION_TYPE)   \
> +{ CODE_FOR_ ## INSN,   \
> +"__builtin_" #INSN,  LARCH_BUILTIN_HUGE_DIRECT,\
> +FUNCTION_TYPE, loongarch_builtin_avail_default }

/* snip */

> +/* Define an float to do funciton nansq*/
> +#define FLOAT_BUILTIN_NANSQ(INSN, FUNCTION_TYPE)  \
> +{ CODE_FOR_ ## INSN,   \
> +"__builtin_" #INSN,  LARCH_BUILTIN_NANSQ_DIRECT,   \
> +FUNCTION_TYPE, loongarch_builtin_avail_default }

What's the point of defining these macros when each is used only once?

> +  tree type,ftype;
> +  tree const_string_type
> + 
> =build_pointer_type(build_qualified_type(char_type_node,TYPE_QUAL_CONST));

Really bad formatting.  The GNU coding standard requires a white space
after '=', before '(', etc.  Please fix the formatting everywhere.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-07-21 at 16:58 +0300, Alexander Monakov wrote:
> 
> On Fri, 21 Jul 2023, Xi Ruoyao via Gcc-patches wrote:
> 
> > Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
> > are building GCC 14 snapshot).  The default is "fast" (if no -std=
> > option is used), which allows some contractions disallowed by the
> > standard.
> 
> Not fully, see below.
> 
> > But GCC is in C++ and I'm not sure if the C++ standard has the same
> > definition for allowed contractions as C.
> 
> It doesn't, but in GCC we should aim to provide the same semantics in C++
> as in C.
> 
> > > (Or is the severity of lack of support sufficiently different in the two 
> > > cases that this is fine -- i.e. not compile vs may trigger floating 
> > > point rounding inaccuracies?)
> > 
> > It's possible that the test itself is flaky.  Can you provide some
> > detail about how it fails?
> 
> See also PR 99903 for an earlier known issue which appears due to x87
> excess precision and so tweaking -ffp-contract wouldn't help:
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99903

Does it affect AArch64 too?

> Now that multiple platforms are hitting this, can we _please_ get rid
> of the questionable attempt to compute time in a floating-point variable
> and just use an uint64_t storing nanoseconds?

To me this is the correct thing to do.
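
A rough sketch of that idea, with made-up names (this is not the timevar.cc
code): keep timestamps as integer nanoseconds, so differences and sums are
exact and do not depend on how the compiler contracts or rounds
floating-point operations.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC UINT64_C (1000000000)

/* Hypothetical helper: combine a seconds/nanoseconds pair (such as the
   fields of a struct timespec) into one integer count of nanoseconds.  */
static uint64_t
to_ns (uint64_t sec, uint32_t nsec)
{
  assert (nsec < NS_PER_SEC);
  return sec * NS_PER_SEC + nsec;
}

/* Elapsed time between two such timestamps; exact as long as
   END >= START.  */
static uint64_t
elapsed_ns (uint64_t start, uint64_t end)
{
  assert (end >= start);
  return end - start;
}
```

A conversion to seconds would then only be needed when the value is printed.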

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-07-21 at 14:11 +0100, Matthew Malcomson wrote:
> My understanding is that this is not a hardware bug and that it's 
> specified that rounding does not happen on the multiply "sub-part" in 
> `FNMSUB`, but rounding happens on the `FMUL` that generates some input
> to it.

AFAIK the C standard only says "A floating *expression* may be
contracted".  I.e.:

double r = a * b + c;

may be compiled to use FMA because "a * b + c" is a floating point
expression.  But

double t = a * b;
double r = t + c;

may not be, because "a * b" and "t + c" are two separate floating point
expressions.

So a contraction across two functions is not allowed.  We now have
-ffp-contract=on (https://gcc.gnu.org/r14-2023) to only allow C-standard
contractions.
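
To illustrate why the distinction matters numerically, here is a small
sketch.  It uses float inputs and emulates the fused result by computing in
double (a float product is exact in double), so it does not need fma() from
libm; the function names are made up, and the emulation is only claimed to
match a real fused multiply-add for the inputs used here:

```c
#include <assert.h>

/* Two separate expressions: the product is rounded to float before the
   addition.  The volatile temporary keeps the compiler from fusing them.  */
static float
mul_then_add (float a, float b, float c)
{
  volatile float t = a * b;
  return t + c;
}

/* Stand-in for a fused multiply-add: the product of two floats is exact
   in double, and for the inputs below the addition is exact as well, so
   no intermediate rounding of the product takes place.  */
static float
fused_like (float a, float b, float c)
{
  return (float) ((double) a * (double) b + (double) c);
}

static void
contraction_example (void)
{
  float a = 1.0f + 0x1p-12f;     /* exactly representable in float */
  float c = -(1.0f + 0x1p-11f);

  /* a * a is 1 + 2^-11 + 2^-24 exactly; rounding the product to float
     (ties to even) drops the 2^-24 term.  */
  assert (mul_then_add (a, a, c) == 0.0f);
  assert (fused_like (a, a, c) == 0x1p-24f);
}
```

So whether the two operations get fused is observable in the result, which
would be consistent with a test that passes or fails depending on code
generation.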

Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
are building a GCC 14 snapshot).  The default is "fast" (if no -std=
option is used), which allows some contractions disallowed by the
standard.

But GCC is written in C++ and I'm not sure if the C++ standard has the same
definition for allowed contractions as C.

> I can look into `-ffp-contract=off` as you both have recommended.
> One question -- if we have concerns that the host compiler may not be 
> able to handle `attribute((noinline))` would we also be concerned that
> this flag may not be supported?

Only use it in BOOT_CFLAGS, i.e. 'make BOOT_CFLAGS="-O2 -g
-ffp-contract=on"' (or "off" instead of "on").  In 3-stage bootstrapping it's
only applied in stage 2 and 3, during which GCC is compiled by itself.

> (Or is the severity of lack of support sufficiently different in the two 
> cases that this is fine -- i.e. not compile vs may trigger floating 
> point rounding inaccuracies?)

It's possible that the test itself is flaky.  Can you provide some
detail about how it fails?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-07-21 at 13:11 +0100, Matthew Malcomson via Gcc-patches
wrote:
> This change ensures those operations are not fused and hence stops the test
> being flaky on that particular machine.  There is no expected change in the
> generated code.
> Bootstrap & regtest on AArch64 passes with no regressions.
> 
> gcc/ChangeLog:
> 
>   * timevar.cc (get_time): Make this noinline to avoid fusing
>   behaviour and associated test flakyness.

I don't think it's correct.  It will break bootstrapping GCC with other
ISO C++11 compilers; you need to at least guard it with #ifdef __GNUC__.
And IMO it's just hiding the real problem.
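
A guard of that kind could look roughly like the sketch below.  The macro
name and the function are hypothetical and only show the shape of the
guard; they are not taken from timevar.cc:

```c
#include <assert.h>

/* Hypothetical macro: only GNU-compatible compilers see the attribute,
   other ISO compilers get an empty expansion.  */
#ifdef __GNUC__
# define EXAMPLE_NOINLINE __attribute__ ((noinline))
#else
# define EXAMPLE_NOINLINE
#endif

/* Placeholder function standing in for the one that must not be
   inlined.  */
EXAMPLE_NOINLINE static double
scale_ticks (double ticks, double ticks_to_sec)
{
  return ticks * ticks_to_sec;
}

static void
noinline_example (void)
{
  /* 4.0 * 0.25 is exact in binary floating point.  */
  assert (scale_ticks (4.0, 0.25) == 1.0);
}
```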

We need more info about the "particular machine".  Is this a hardware bug
(i.e. the machine violates the AArch64 spec) or a GCC code generation
issue?  Or should we generally use -ffp-contract=off in BOOT_CFLAGS?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] LoongArch: Allow using --with-arch=native if host CPU is LoongArch

2023-07-20 Thread Xi Ruoyao via Gcc-patches
If the host triple and the target triple are different but the host is
LoongArch, in some cases --with-arch=native can be useful.  For example,
if we are bootstrapping a loongarch64-linux-musl toolchain on a
Glibc-based system and we don't intend to use the toolchain on other
machines, we can use

../gcc/configure --{build,host}=loongarch64-linux-gnu \
 --target=loongarch64-linux-musl --with-arch=native

Relax the check in config.gcc to allow such configurations.

gcc/ChangeLog:

* config.gcc [target=loongarch*-*-*, with_arch=native]: Allow
building cross compiler if the host CPU is LoongArch.
---

Tested on x86_64-linux-gnu (building a cross compiler targeting
LoongArch --with-arch=native still rejected) and loongarch64-linux-gnu
(building a cross compiler targeting loongarch64-linux-musl allowed).
Ok for trunk?

 gcc/config.gcc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1446eb2b3ca..146bca22a38 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4939,10 +4939,13 @@ case "${target}" in
case ${with_arch} in
"" | loongarch64 | la464) ;; # OK, append here.
native)
-   if test x${host} != x${target}; then
+   case ${host} in
+   loongarch*) ;; # OK
+   *)
echo "--with-arch=native is illegal for cross-compiler." 1>&2
exit 1
-   fi
+   ;;
+   esac
;;
"")
echo "Please set a default value for \${with_arch}" \
-- 
2.41.0



Re: [PATCH v2 0/8] Add Loongson SX/ASX instruction support to LoongArch target.

2023-07-18 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-07-18 at 19:06 +0800, Chenghui Pan wrote:
> Lulu Cheng (8):
>   LoongArch: Added Loongson SX vector directive compilation framework.
>   LoongArch: Added Loongson SX base instruction support.
>   LoongArch: Added Loongson SX directive builtin function support.
>   LoongArch: Added Loongson ASX vector directive compilation framework.
>   LoongArch: Added Loongson ASX base instruction support.
>   LoongArch: Added Loongson ASX directive builtin function support.

Let's always use "Add".

>   LoongArch: Add Loongson SX directive test cases.
>   LoongArch: Add Loongson ASX directive test cases.

Have you tested this series by bootstrapping and regtesting GCC with
BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx" and
BOOT_CFLAGS="-O3 -mlasx"?  This may catch some mistakes early.

And I'll rebuild the entire system with these GCC patches and -mlasx in
Aug (after Glibc-2.38 release) as a field test too.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2 2/2] libstdc++: use new built-in trait __is_scalar for std::is_scalar

2023-07-12 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-07-12 at 11:32 -0700, Ken Matsui via Gcc-patches wrote:
> > conditional on the front-end change being committed first of course
> 
> Does this mean we want to commit this [2/2] patch before committing
> the [1/2] patch in this case?

No, this means you should get 1/2 reviewed and committed first.

> Also, can I tweak the commit message without being approved again,
> such as attaching the benchmark result?

Yes, as long as the ChangeLog is still correct (the Git hook will reject
a push with wrong ChangeLog format anyway).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: Devirtualization of objects in array

2023-07-12 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-07-12 at 16:58 +0800, Ng YongXiang via Gcc-patches wrote:
> I'm writing to seek for a review for an issue I filed some time ago.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110057 . A proposed patch is
> attached in the bug tracker as well.

You should send the patch to gcc-patches@gcc.gnu.org for a review, see
https://gcc.gnu.org/contribute.html for the details.  Generally we
consider patches attached in bugzilla as drafts.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH pushed] testsuite: Unbreak pr110557.cc where long is 32-bit (was Re: Pushed: [PATCH v2] vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557])

2023-07-11 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-07-11 at 13:04 +0530, Prathamesh Kulkarni wrote:

/* snip */

> Hi Xi,
> Your commit:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=63ae6bc60c0f67fb2791991bf4b6e7e0a907d420,
> 
> seems to cause following regressions on arm-linux-gnueabihf:
> FAIL: g++.dg/vect/pr110557.cc  -std=c++98 (test for excess errors)
> FAIL: g++.dg/vect/pr110557.cc  -std=c++14 (test for excess errors)
> FAIL: g++.dg/vect/pr110557.cc  -std=c++17 (test for excess errors)
> FAIL: g++.dg/vect/pr110557.cc  -std=c++20 (test for excess errors)
> 
> Excess error:
> gcc/testsuite/g++.dg/vect/pr110557.cc:12:8: warning: width of
> 'Item::y' exceeds its type

Ah sorry, I didn't consider ports with 32-bit long.

The attached patch should fix the issue.  It has been tested and pushed
as r14-2427 and r13-7555.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
From 312839653b8295599c63cae90278a87af528edad Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Tue, 11 Jul 2023 15:55:54 +0800
Subject: [PATCH] testsuite: Unbreak pr110557.cc where long is 32-bit

On ports with 32-bit long, the test produced excess errors:

gcc/testsuite/g++.dg/vect/pr110557.cc:12:8: warning: width of
'Item::y' exceeds its type

Reported-by: Prathamesh Kulkarni 

gcc/testsuite/ChangeLog:

	* g++.dg/vect/pr110557.cc: Use long long instead of long for
	64-bit type.
	(test): Remove an unnecessary cast.
---
 gcc/testsuite/g++.dg/vect/pr110557.cc | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/g++.dg/vect/pr110557.cc b/gcc/testsuite/g++.dg/vect/pr110557.cc
index e1fbe1caac4..effb67e2df3 100644
--- a/gcc/testsuite/g++.dg/vect/pr110557.cc
+++ b/gcc/testsuite/g++.dg/vect/pr110557.cc
@@ -1,7 +1,9 @@
 // { dg-additional-options "-mavx" { target { avx_runtime } } }
 
-static inline long
-min (long a, long b)
+typedef long long i64;
+
+static inline i64
+min (i64 a, i64 b)
 {
   return a < b ? a : b;
 }
@@ -9,16 +11,16 @@ min (long a, long b)
 struct Item
 {
   int x : 8;
-  long y : 55;
+  i64 y : 55;
   bool z : 1;
 };
 
-__attribute__ ((noipa)) long
+__attribute__ ((noipa)) i64
 test (Item *a, int cnt)
 {
-  long size = 0;
+  i64 size = 0;
   for (int i = 0; i < cnt; i++)
-size = min ((long)a[i].y, size);
+size = min (a[i].y, size);
   return size;
 }
 
-- 
2.41.0



Pushed: [PATCH v2] vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557]

2023-07-10 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-07-10 at 10:33 +, Richard Biener wrote:
> On Fri, 7 Jul 2023, Xi Ruoyao wrote:
> 
> > If a bit-field is signed and it's wider than the output type, we
> > must
> > ensure the extracted result sign-extended.  But this was not handled
> > correctly.
> > 
> > For example:
> > 
> >     int x : 8;
> >     long y : 55;
> >     bool z : 1;
> > 
> > The vectorized extraction of y was:
> > 
> >     vect__ifc__49.29_110 =
> >   MEM  [(struct Item
> > *)vectp_a.27_108];
> >     vect_patt_38.30_112 =
> >   vect__ifc__49.29_110 & { 9223372036854775552,
> > 9223372036854775552 };
> >     vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
> >     vect_patt_40.32_114 =
> >   VIEW_CONVERT_EXPR(vect_patt_39.31_113);
> > 
> > This is obviously incorrect.  This pach has implemented it as:
> > 
> >     vect__ifc__25.16_62 =
> >   MEM  [(struct Item
> > *)vectp_a.14_60];
> >     vect_patt_31.17_63 =
> >   VIEW_CONVERT_EXPR(vect__ifc__25.16_62);
> >     vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
> >     vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;
> 
> OK.

Pushed r14-2407 and r13-7553.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH v2] vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557]

2023-07-07 Thread Xi Ruoyao via Gcc-patches
If a bit-field is signed and it's wider than the output type, we must
ensure the extracted result is sign-extended.  But this was not handled
correctly.

For example:

int x : 8;
long y : 55;
bool z : 1;

The vectorized extraction of y was:

vect__ifc__49.29_110 =
  MEM  [(struct Item *)vectp_a.27_108];
vect_patt_38.30_112 =
  vect__ifc__49.29_110 & { 9223372036854775552, 9223372036854775552 };
vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
vect_patt_40.32_114 =
  VIEW_CONVERT_EXPR(vect_patt_39.31_113);

This is obviously incorrect.  This patch has implemented it as:

vect__ifc__25.16_62 =
  MEM  [(struct Item *)vectp_a.14_60];
vect_patt_31.17_63 =
  VIEW_CONVERT_EXPR(vect__ifc__25.16_62);
vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;
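
In scalar terms, the two sequences above correspond roughly to the sketch
below.  It assumes a 64-bit container with a little-endian layout (x in bits
0-7, y in bits 8-62), and it relies on GCC's behavior for converting to a
signed type and for right-shifting negative values; the helper names are
made up:

```c
#include <assert.h>
#include <stdint.h>

/* Old sequence: mask, then logical shift.  The result is zero-extended,
   so a negative field value comes out as a large positive number.  */
static uint64_t
extract_masked (uint64_t container, unsigned bitpos, unsigned bitsize)
{
  assert (bitsize >= 1 && bitsize < 64 && bitpos + bitsize <= 64);
  uint64_t mask = ((UINT64_C (1) << bitsize) - 1) << bitpos;
  return (container & mask) >> bitpos;
}

/* New sequence: shift the field up to the top bit, then shift right
   arithmetically in a signed type so the sign bit is replicated.  */
static int64_t
extract_signed (uint64_t container, unsigned bitpos, unsigned bitsize)
{
  assert (bitsize >= 1 && bitsize < 64 && bitpos + bitsize <= 64);
  int64_t shifted = (int64_t) (container << (64 - bitsize - bitpos));
  return shifted >> (64 - bitsize);
}

static void
bitfield_example (void)
{
  /* Pack x = 4, y = -4, z = 0.  */
  uint64_t y_bits = (uint64_t) -4 & ((UINT64_C (1) << 55) - 1);
  uint64_t container = UINT64_C (4) | (y_bits << 8);

  assert (extract_signed (container, 8, 55) == -4);
  assert (extract_masked (container, 8, 55) == (UINT64_C (1) << 55) - 4);
}
```

With bitpos = 8 and bitsize = 55 the shift counts are 1 and 9, and the mask
is 9223372036854775552, matching the constants in the dumps above.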

gcc/ChangeLog:

PR tree-optimization/110557
* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern):
Ensure the output sign-extended if necessary.

gcc/testsuite/ChangeLog:

PR tree-optimization/110557
* g++.dg/vect/pr110557.cc: New test.
---

Change v1 -> v2:

- Rename two variables for readability.
- Remove a redundant useless_type_conversion_p check.
- Edit the comment for early conversion to show the rationale of
  "|| ref_sext".

Bootstrapped (with BOOT_CFLAGS="-O3 -mavx2") and regtested on
x86_64-linux-gnu.  Ok for trunk and gcc-13?

 gcc/testsuite/g++.dg/vect/pr110557.cc | 37 
 gcc/tree-vect-patterns.cc | 62 ---
 2 files changed, 83 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr110557.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr110557.cc b/gcc/testsuite/g++.dg/vect/pr110557.cc
new file mode 100644
index 000..e1fbe1caac4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr110557.cc
@@ -0,0 +1,37 @@
+// { dg-additional-options "-mavx" { target { avx_runtime } } }
+
+static inline long
+min (long a, long b)
+{
+  return a < b ? a : b;
+}
+
+struct Item
+{
+  int x : 8;
+  long y : 55;
+  bool z : 1;
+};
+
+__attribute__ ((noipa)) long
+test (Item *a, int cnt)
+{
+  long size = 0;
+  for (int i = 0; i < cnt; i++)
+size = min ((long)a[i].y, size);
+  return size;
+}
+
+int
+main ()
+{
+  struct Item items[] = {
+{ 1, -1 },
+{ 2, -2 },
+{ 3, -3 },
+{ 4, -4 },
+  };
+
+  if (test (items, 4) != -4)
+__builtin_trap ();
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 1bc36b043a0..c0832e8679f 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -2566,7 +2566,7 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
Widening with mask first, shift later:
container = (type_out) container;
masked = container & (((1 << bitsize) - 1) << bitpos);
-   result = patt2 >> masked;
+   result = masked >> bitpos;
 
Widening with shift first, mask last:
container = (type_out) container;
@@ -2578,6 +2578,15 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
result = masked >> bitpos;
result = (type_out) result;
 
+   If the bitfield is signed and it's wider than type_out, we need to
+   keep the result sign-extended:
+   container = (type) container;
+   masked = container << (prec - bitsize - bitpos);
+   result = (type_out) (masked >> (prec - bitsize));
+
+   Here type is the signed variant of the wider of type_out and the type
+   of container.
+
The shifting is always optional depending on whether bitpos != 0.
 
 */
@@ -2636,14 +2645,22 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   if (BYTES_BIG_ENDIAN)
 shift_n = prec - shift_n - mask_width;
 
+  bool ref_sext = (!TYPE_UNSIGNED (TREE_TYPE (bf_ref)) &&
+  TYPE_PRECISION (ret_type) > mask_width);
+  bool load_widen = (TYPE_PRECISION (TREE_TYPE (container)) <
+TYPE_PRECISION (ret_type));
+
   /* We move the conversion earlier if the loaded type is smaller than the
- return type to enable the use of widening loads.  */
-  if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type)
-  && !useless_type_conversion_p (TREE_TYPE (container), ret_type))
-{
-  pattern_stmt
-   = gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
-  NOP_EXPR, container);
+ return type to enable the use of widening loads.  And if we need a
+ sign extension, we need to convert the loaded value early to a signed
+ type as well.  */
+  if (ref_sext || load_widen)
+{
+  tree type = load_widen ? ret_type : container_type;
+  if (ref_sext)
+   type = gimple_signed_type (type);
+  pattern_stmt = gimple_build_assign (vect_recog_temp_ssa_var (type),
+ NOP_EXPR, container);
   container = gimple_get_lhs (pattern_stmt);
   container_type = TREE_TYPE (container);
   prec = tree_to_uhwi (TYPE_SIZE (container_type));
@@ -2671,7 

Re: [PATCH] vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557]

2023-07-07 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-07-07 at 08:15 +0200, Richard Biener wrote:

/* snip */

> > +  bool sign_ext = (!TYPE_UNSIGNED (TREE_TYPE (bf_ref)) &&
> > +  TYPE_PRECISION (ret_type) > mask_width);
> > +  bool widening = ((TYPE_PRECISION (TREE_TYPE (container)) <
> > +   TYPE_PRECISION (ret_type))
> > +  && !useless_type_conversion_p (TREE_TYPE (container),
> > + ret_type));
> 
> the !useless_type_conversion_p check isn't necessary, when TYPE_PRECISION
> isn't equal the conversion is never useless.

I'll drop it.

> I'll also note that ret_type == TREE_TYPE (bf_ref).

No, ret_type == TREE_TYPE (ret), not TREE_TYPE (bf_ref).  For something
like

struct Item
  {
int x : 30;
int y : 30;
  };

Item *p = get();
unsigned long t = p->y;

Then TREE_TYPE (ret) is unsigned long, and TREE_TYPE (bf_ref) is int. 
In this case we still need to perform the sign extension: if p->y is -1
we should have -1ul in t.  So we need to check the signedness of
TREE_TYPE (bf_ref).
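
The scalar behavior described here can be checked with a small sketch like
the following.  The bit-fields are spelled "signed int" because plain int
bit-fields have implementation-defined signedness in C, and load_y is a
made-up helper:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct Item
{
  signed int x : 30;
  signed int y : 30;
};

/* The bit-field value is first read as a (signed) int and then converted
   to unsigned long, so -1 becomes ULONG_MAX.  */
static unsigned long
load_y (const struct Item *p)
{
  assert (p != NULL);
  return p->y;
}
```

So the signedness of the bit-field decides whether the widened value must be
sign-extended, even though the destination type is unsigned.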

> Can you rename 'widening' to 'load_widen' and 'sign_ext' to 'ref_sext'?  As
> they are named it suggest they apply to the same so I originally thought
> sign_ext should be widening && !TYPE_UNSIGNED.

I'll rename them.

I'll send a v2 after testing it.
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557]

2023-07-06 Thread Xi Ruoyao via Gcc-patches
If a bit-field is signed and it's wider than the output type, we must
ensure the extracted result is sign-extended.  But this was not handled
correctly.

For example:

int x : 8;
long y : 55;
bool z : 1;

The vectorized extraction of y was:

vect__ifc__49.29_110 =
  MEM  [(struct Item *)vectp_a.27_108];
vect_patt_38.30_112 =
  vect__ifc__49.29_110 & { 9223372036854775552, 9223372036854775552 };
vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
vect_patt_40.32_114 =
  VIEW_CONVERT_EXPR(vect_patt_39.31_113);

This is obviously incorrect.  This patch has implemented it as:

vect__ifc__25.16_62 =
  MEM  [(struct Item *)vectp_a.14_60];
vect_patt_31.17_63 =
  VIEW_CONVERT_EXPR(vect__ifc__25.16_62);
vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;

gcc/ChangeLog:

PR tree-optimization/110557
* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern):
Ensure the output sign-extended if necessary.

gcc/testsuite/ChangeLog:

PR tree-optimization/110557
* g++.dg/vect/pr110557.cc: New test.
---

Bootstrapped and regtested on x86_64-linux-gnu.  Ok for trunk and gcc-13
branch?

 gcc/testsuite/g++.dg/vect/pr110557.cc | 37 +
 gcc/tree-vect-patterns.cc | 58 ---
 2 files changed, 81 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr110557.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr110557.cc b/gcc/testsuite/g++.dg/vect/pr110557.cc
new file mode 100644
index 000..e1fbe1caac4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr110557.cc
@@ -0,0 +1,37 @@
+// { dg-additional-options "-mavx" { target { avx_runtime } } }
+
+static inline long
+min (long a, long b)
+{
+  return a < b ? a : b;
+}
+
+struct Item
+{
+  int x : 8;
+  long y : 55;
+  bool z : 1;
+};
+
+__attribute__ ((noipa)) long
+test (Item *a, int cnt)
+{
+  long size = 0;
+  for (int i = 0; i < cnt; i++)
+size = min ((long)a[i].y, size);
+  return size;
+}
+
+int
+main ()
+{
+  struct Item items[] = {
+{ 1, -1 },
+{ 2, -2 },
+{ 3, -3 },
+{ 4, -4 },
+  };
+
+  if (test (items, 4) != -4)
+__builtin_trap ();
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 1bc36b043a0..20412c27ead 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -2566,7 +2566,7 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
Widening with mask first, shift later:
container = (type_out) container;
masked = container & (((1 << bitsize) - 1) << bitpos);
-   result = patt2 >> masked;
+   result = masked >> bitpos;
 
Widening with shift first, mask last:
container = (type_out) container;
@@ -2578,6 +2578,15 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
result = masked >> bitpos;
result = (type_out) result;
 
+   If the bitfield is signed and it's wider than type_out, we need to
+   keep the result sign-extended:
+   container = (type) container;
+   masked = container << (prec - bitsize - bitpos);
+   result = (type_out) (masked >> (prec - bitsize));
+
+   Here type is the signed variant of the wider of type_out and the type
+   of container.
+
The shifting is always optional depending on whether bitpos != 0.
 
 */
@@ -2636,14 +2645,22 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   if (BYTES_BIG_ENDIAN)
 shift_n = prec - shift_n - mask_width;
 
+  bool sign_ext = (!TYPE_UNSIGNED (TREE_TYPE (bf_ref)) &&
+  TYPE_PRECISION (ret_type) > mask_width);
+  bool widening = ((TYPE_PRECISION (TREE_TYPE (container)) <
+   TYPE_PRECISION (ret_type))
+  && !useless_type_conversion_p (TREE_TYPE (container),
+ ret_type));
+
   /* We move the conversion earlier if the loaded type is smaller than the
  return type to enable the use of widening loads.  */
-  if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type)
-  && !useless_type_conversion_p (TREE_TYPE (container), ret_type))
+  if (sign_ext || widening)
 {
-  pattern_stmt
-   = gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
-  NOP_EXPR, container);
+  tree type = widening ? ret_type : container_type;
+  if (sign_ext)
+   type = gimple_signed_type (type);
+  pattern_stmt = gimple_build_assign (vect_recog_temp_ssa_var (type),
+ NOP_EXPR, container);
   container = gimple_get_lhs (pattern_stmt);
   container_type = TREE_TYPE (container);
   prec = tree_to_uhwi (TYPE_SIZE (container_type));
@@ -2671,7 +2688,7 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 shift_first = true;
 
   tree result;
-  if (shift_first)
+  if (shift_first && !sign_ext)
 {
   tree shifted = container;
   if (shift_n)
@@ -2694,14 +2711,27 @@ 

Re: [PATCH v1 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-07-05 Thread Xi Ruoyao via Gcc-patches
A question: is vld/vst guaranteed to be atomic if the accessed address
is aligned?  If true we can use them to implement lock-free 128-bit
atomic load and store.  See https://gcc.gnu.org/bugzilla/PR104688 for
the background, and some people really hate using a lock for atomics.

On Fri, 2023-06-30 at 10:16 +0800, Chenghui Pan wrote:
> These patches add the Loongson SX/ASX instruction support to the
> LoongArch
> target, and can be utilized by using the new "-mlsx" and
> "-mlasx" option.
> 
> Patches are bootstrapped and tested on loongarch64-linux-gnu target.
> 
> Lulu Cheng (6):
>   LoongArch: Added Loongson SX vector directive compilation framework.
>   LoongArch: Added Loongson SX base instruction support.
>   LoongArch: Added Loongson SX directive builtin function support.
>   LoongArch: Added Loongson ASX vector directive compilation
> framework.
>   LoongArch: Added Loongson ASX base instruction support.
>   LoongArch: Added Loongson ASX directive builtin function support.
> 
>  gcc/config.gcc    |    2 +-
>  gcc/config/loongarch/constraints.md   |  128 +-
>  .../loongarch/genopts/loongarch-strings   |    4 +
>  gcc/config/loongarch/genopts/loongarch.opt.in |   16 +-
>  gcc/config/loongarch/lasx.md  | 5147 
>  gcc/config/loongarch/lasxintrin.h | 5342
> +
>  gcc/config/loongarch/loongarch-builtins.cc    | 2686 -
>  gcc/config/loongarch/loongarch-c.cc   |   18 +
>  gcc/config/loongarch/loongarch-def.c  |    6 +
>  gcc/config/loongarch/loongarch-def.h  |    9 +-
>  gcc/config/loongarch/loongarch-driver.cc  |   10 +
>  gcc/config/loongarch/loongarch-driver.h   |    2 +
>  gcc/config/loongarch/loongarch-ftypes.def |  666 +-
>  gcc/config/loongarch/loongarch-modes.def  |   39 +
>  gcc/config/loongarch/loongarch-opts.cc    |   89 +-
>  gcc/config/loongarch/loongarch-opts.h |    3 +
>  gcc/config/loongarch/loongarch-protos.h   |   35 +
>  gcc/config/loongarch/loongarch-str.h  |    3 +
>  gcc/config/loongarch/loongarch.cc | 4615 +-
>  gcc/config/loongarch/loongarch.h  |  117 +-
>  gcc/config/loongarch/loongarch.md |   56 +-
>  gcc/config/loongarch/loongarch.opt    |   16 +-
>  gcc/config/loongarch/lsx.md   | 4490 ++
>  gcc/config/loongarch/lsxintrin.h  | 5181 
>  gcc/config/loongarch/predicates.md    |  333 +-
>  25 files changed, 28723 insertions(+), 290 deletions(-)
>  create mode 100644 gcc/config/loongarch/lasx.md
>  create mode 100644 gcc/config/loongarch/lasxintrin.h
>  create mode 100644 gcc/config/loongarch/lsx.md
>  create mode 100644 gcc/config/loongarch/lsxintrin.h
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 2/6] LoongArch: Added Loongson SX base instruction support.

2023-06-30 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-06-30 at 10:16 +0800, Chenghui Pan wrote:
> +(define_c_enum "unspec" [
> +  UNSPEC_LSX_ASUB_S
> +  UNSPEC_LSX_VABSD_U
> +  UNSPEC_LSX_VAVG_S

/* ... */

To me many of them can be modeled using RTL templates, instead of an
unspec.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 2/6] LoongArch: Added Loongson SX base instruction support.

2023-06-30 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-06-30 at 10:16 +0800, Chenghui Pan wrote:
>  
> +  int use_vecarg_p = TARGET_VECARG
> +    && LSX_SUPPORTED_MODE_P (mode);
> +
>    memset (info, 0, sizeof (*info));
>    info->gpr_offset = cum->num_gprs;
>    info->fpr_offset = cum->num_fprs;
> @@ -535,7 +546,7 @@ loongarch_get_arg_info (struct loongarch_arg_info *info,
>  
>    /* Pass one- or two-element floating-point aggregates in FPRs.  */
>    if ((info->num_fprs
> -  = loongarch_pass_aggregate_num_fpr (type, fields))
> +  = loongarch_pass_aggregate_num_fpr (type, fields, use_vecarg_p))
>   && info->fpr_offset + info->num_fprs <= MAX_ARGS_IN_REGISTERS)
> switch (info->num_fprs)
>   {

No, this is breaking the ABI.  use_vecarg_p can only be set if we invent a
new ABI (it won't be LP64D anymore), or we add some special switch for
it (like x86's -msseregparm and sseregparm attribute).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-06-30 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-06-30 at 10:16 +0800, Chenghui Pan wrote:
> These patches add the Loongson SX/ASX instruction support to the
> LoongArch
> target, and can be utilized by using the new "-mlsx" and
> "-mlasx" option.
> 
> Patches are bootstrapped and tested on loongarch64-linux-gnu target.
> 
> Lulu Cheng (6):
>   LoongArch: Added Loongson SX vector directive compilation framework.
>   LoongArch: Added Loongson SX base instruction support.
>   LoongArch: Added Loongson SX directive builtin function support.
>   LoongArch: Added Loongson ASX vector directive compilation framework.
>   LoongArch: Added Loongson ASX base instruction support.
>   LoongArch: Added Loongson ASX directive builtin function support.

These seem too long to review.

Could we separate them into multiple pieces, for example:

- The first patch just adds "-mlsx" and "-mlasx" options.
- The second patch adds memory load and store instructions, and block
move & store operations using these instructions.
- The third patch adds integer vector add/subtraction/multiplication
instructions.
- The fourth patch adds integer vector division instructions (division
is "complex" so IMO it's worth a separate patch)
- ...
- The (n-1)-th patch adds remaining instructions (impossible or
difficult to be modeled with RTL templates) as UNSPECs.
- The n-th patch adds the built-ins.

> 
>  gcc/config.gcc    |    2 +-
>  gcc/config/loongarch/constraints.md   |  128 +-
>  .../loongarch/genopts/loongarch-strings   |    4 +
>  gcc/config/loongarch/genopts/loongarch.opt.in |   16 +-
>  gcc/config/loongarch/lasx.md  | 5147 
>  gcc/config/loongarch/lasxintrin.h | 5342
> +
>  gcc/config/loongarch/loongarch-builtins.cc    | 2686 -
>  gcc/config/loongarch/loongarch-c.cc   |   18 +
>  gcc/config/loongarch/loongarch-def.c  |    6 +
>  gcc/config/loongarch/loongarch-def.h  |    9 +-
>  gcc/config/loongarch/loongarch-driver.cc  |   10 +
>  gcc/config/loongarch/loongarch-driver.h   |    2 +
>  gcc/config/loongarch/loongarch-ftypes.def |  666 +-
>  gcc/config/loongarch/loongarch-modes.def  |   39 +
>  gcc/config/loongarch/loongarch-opts.cc    |   89 +-
>  gcc/config/loongarch/loongarch-opts.h |    3 +
>  gcc/config/loongarch/loongarch-protos.h   |   35 +
>  gcc/config/loongarch/loongarch-str.h  |    3 +
>  gcc/config/loongarch/loongarch.cc | 4615 +-
>  gcc/config/loongarch/loongarch.h  |  117 +-
>  gcc/config/loongarch/loongarch.md |   56 +-
>  gcc/config/loongarch/loongarch.opt    |   16 +-
>  gcc/config/loongarch/lsx.md   | 4490 ++
>  gcc/config/loongarch/lsxintrin.h  | 5181 
>  gcc/config/loongarch/predicates.md    |  333 +-
>  25 files changed, 28723 insertions(+), 290 deletions(-)
>  create mode 100644 gcc/config/loongarch/lasx.md
>  create mode 100644 gcc/config/loongarch/lasxintrin.h
>  create mode 100644 gcc/config/loongarch/lsx.md
>  create mode 100644 gcc/config/loongarch/lsxintrin.h
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] i386: add -fno-stack-protector to two tests

2023-06-29 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-06-30 at 04:08 +0800, Xi Ruoyao wrote:
> On Thu, 2023-06-29 at 16:01 -0400, Marek Polacek via Gcc-patches wrote:
> > These tests fail when the testsuite is executed with
> > -fstack-protector-strong.
> > To avoid this, this patch adds -fno-stack-protector to dg-options.
> > 
> > Tested on x86_64-pc-linux-gnu, ok for trunk?
> 
> LGTM, we've noticed these two failures in Linux From Scratch [1].  But
> this is not an approval because I'm not a maintainer.

Can we also backport this to the gcc-13 branch?  These two tests were
added during the GCC 13 cycle, so the failures could be considered
regressions.

> 
> [1]:https://www.linuxfromscratch.org/lfs/view/development/chapter08/gcc.html
> 
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/i386/pr104610.c: Use -fno-stack-protector.
> > * gcc.target/i386/pr69482-1.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr104610.c  | 2 +-
> >  gcc/testsuite/gcc.target/i386/pr69482-1.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/gcc/testsuite/gcc.target/i386/pr104610.c
> > b/gcc/testsuite/gcc.target/i386/pr104610.c
> > index fe39cbe5b8a..5173fc8898c 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr104610.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr104610.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256" } */
> > +/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256 -fno-stack-protector" } */
> >  /* { dg-final { scan-assembler-times {(?n)vptest.*ymm} 1 } } */
> >  /* { dg-final { scan-assembler-times {sete} 1 } } */
> >  /* { dg-final { scan-assembler-not {(?n)je.*L[0-9]} } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr69482-1.c
> > b/gcc/testsuite/gcc.target/i386/pr69482-1.c
> > index f192261b104..99bb6ad5a37 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr69482-1.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr69482-1.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O3" } */
> > +/* { dg-options "-O3 -fno-stack-protector" } */
> >  
> >  static inline void memset_s(void* s, int n) {
> >    volatile unsigned char * p = s;
> > 
> > base-commit: 070a6bf0bdc6761ad77ac97404c98f00a7007d54
> 
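
For context, a minimal sketch of why a distro-wide -fstack-protector-strong
default can disturb tests that scan the generated assembly.  The function
below is hypothetical (it is not taken from pr104610.c or pr69482-1.c).  With
the protector enabled, a function containing a local array normally gets a
stack canary check plus a conditional branch around a call to
__stack_chk_fail, which is the kind of extra code a scan-assembler pattern can
trip over.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical example, not one of the tests touched by the patch.
// It has local arrays, so -fstack-protector-strong would normally
// instrument it with a stack canary check.
bool same_prefix (const char *a, const char *b)
{
  char buf_a[32] = {};
  char buf_b[32] = {};
  // Copy at most 31 characters; the last byte stays zero.
  std::strncpy (buf_a, a, sizeof buf_a - 1);
  std::strncpy (buf_b, b, sizeof buf_b - 1);
  // Compare the two fixed-size buffers byte by byte.
  return std::memcmp (buf_a, buf_b, sizeof buf_a) == 0;
}
```

Compiling this once with "g++ -O2 -S" and once with -fstack-protector-strong
added, then diffing the two outputs, should show the added instructions.
Putting -fno-stack-protector into dg-options keeps the scanned output
independent of whatever default the testsuite run passes in.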

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] i386: add -fno-stack-protector to two tests

2023-06-29 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-06-29 at 16:01 -0400, Marek Polacek via Gcc-patches wrote:
> These tests fail when the testsuite is executed with
> -fstack-protector-strong.
> To avoid this, this patch adds -fno-stack-protector to dg-options.
> 
> Tested on x86_64-pc-linux-gnu, ok for trunk?

LGTM, we've noticed these two failures in Linux From Scratch [1].  But
this is not an approval because I'm not a maintainer.

[1]:https://www.linuxfromscratch.org/lfs/view/development/chapter08/gcc.html

> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/i386/pr104610.c: Use -fno-stack-protector.
> * gcc.target/i386/pr69482-1.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/i386/pr104610.c  | 2 +-
>  gcc/testsuite/gcc.target/i386/pr69482-1.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr104610.c
> b/gcc/testsuite/gcc.target/i386/pr104610.c
> index fe39cbe5b8a..5173fc8898c 100644
> --- a/gcc/testsuite/gcc.target/i386/pr104610.c
> +++ b/gcc/testsuite/gcc.target/i386/pr104610.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256" } */
> +/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256 -fno-stack-protector" } */
>  /* { dg-final { scan-assembler-times {(?n)vptest.*ymm} 1 } } */
>  /* { dg-final { scan-assembler-times {sete} 1 } } */
>  /* { dg-final { scan-assembler-not {(?n)je.*L[0-9]} } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr69482-1.c
> b/gcc/testsuite/gcc.target/i386/pr69482-1.c
> index f192261b104..99bb6ad5a37 100644
> --- a/gcc/testsuite/gcc.target/i386/pr69482-1.c
> +++ b/gcc/testsuite/gcc.target/i386/pr69482-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3" } */
> +/* { dg-options "-O3 -fno-stack-protector" } */
>  
>  static inline void memset_s(void* s, int n) {
>    volatile unsigned char * p = s;
> 
> base-commit: 070a6bf0bdc6761ad77ac97404c98f00a7007d54

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 1/2] c++: implement __is_const built-in trait

2023-06-24 Thread Xi Ruoyao via Gcc-patches
Please use [PATCH v3 1/2] next time; as it stands, it's not easy to find
the latest version of the series (I'm not sure if the number "3" is
correct).

On Sat, 2023-06-24 at 03:38 -0700, Ken Matsui via Gcc-patches wrote:
> This patch implements built-in trait for std::is_const.
> 
> gcc/cp/ChangeLog:
> 
> * cp-trait.def: Define __is_const.
> * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_CONST.
> * semantics.cc (trait_expr_value): Likewise.
> (finish_trait_expr): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> * g++.dg/ext/has-builtin-1.C: Test existence of __is_const.
> * g++.dg/ext/is_const.C: New test.
> 
> Signed-off-by: Ken Matsui 
> ---
>  gcc/cp/constraint.cc |  3 +++
>  gcc/cp/cp-trait.def  |  1 +
>  gcc/cp/semantics.cc  |  4 
>  gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
>  gcc/testsuite/g++.dg/ext/is_const.C  | 19 +++
>  5 files changed, 30 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_const.C
> 
> diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> index 8cf0f2d0974..ff4ae831def 100644
> --- a/gcc/cp/constraint.cc
> +++ b/gcc/cp/constraint.cc
> @@ -3751,6 +3751,9 @@ diagnose_trait_expr (tree expr, tree args)
>  case CPTK_IS_UNION:
>    inform (loc, "  %qT is not a union", t1);
>    break;
> +    case CPTK_IS_CONST:
> +  inform (loc, "  %qT is not a const type", t1);
> +  break;
>  case CPTK_IS_AGGREGATE:
>    inform (loc, "  %qT is not an aggregate", t1);
>    break;
> diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
> index 8b7fece0cc8..b40b475b86d 100644
> --- a/gcc/cp/cp-trait.def
> +++ b/gcc/cp/cp-trait.def
> @@ -82,6 +82,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE,
> "__is_trivially_assignable", 2)
>  DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE,
> "__is_trivially_constructible", -1)
>  DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1)
>  DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
> +DEFTRAIT_EXPR (IS_CONST, "__is_const", 1)
>  DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY,
> "__reference_constructs_from_temporary", 2)
>  DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY,
> "__reference_converts_from_temporary", 2)
>  /* FIXME Added space to avoid direct usage in GCC 13.  */
> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 8fb47fd179e..011ba8e46e1 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -12079,6 +12079,9 @@ trait_expr_value (cp_trait_kind kind, tree
> type1, tree type2)
>  case CPTK_IS_ENUM:
>    return type_code1 == ENUMERAL_TYPE;
>  
> +    case CPTK_IS_CONST:
> +  return CP_TYPE_CONST_P (type1);
> +
>  case CPTK_IS_FINAL:
>    return CLASS_TYPE_P (type1) && CLASSTYPE_FINAL (type1);
>  
> @@ -12296,6 +12299,7 @@ finish_trait_expr (location_t loc,
> cp_trait_kind kind, tree type1, tree type2)
>  case CPTK_IS_ENUM:
>  case CPTK_IS_UNION:
>  case CPTK_IS_SAME:
> +    case CPTK_IS_CONST:
>    break;
>  
>  case CPTK_IS_LAYOUT_COMPATIBLE:
> diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> index f343e153e56..965309a333a 100644
> --- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> +++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> @@ -146,3 +146,6 @@
>  #if !__has_builtin (__remove_cvref)
>  # error "__has_builtin (__remove_cvref) failed"
>  #endif
> +#if !__has_builtin (__is_const)
> +# error "__has_builtin (__is_const) failed"
> +#endif
> diff --git a/gcc/testsuite/g++.dg/ext/is_const.C
> b/gcc/testsuite/g++.dg/ext/is_const.C
> new file mode 100644
> index 000..8f2d7c2fce9
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ext/is_const.C
> @@ -0,0 +1,19 @@
> +// { dg-do compile { target c++11 } }
> +
> +#include <testsuite_tr1.h>
> +
> +using namespace __gnu_test;
> +
> +#define SA(X) static_assert((X),#X)
> +
> +// Positive tests.
> +SA(__is_const(const int));
> +SA(__is_const(const volatile int));
> +SA(__is_const(cClassType));
> +SA(__is_const(cvClassType));
> +
> +// Negative tests.
> +SA(!__is_const(int));
> +SA(!__is_const(volatile int));
> +SA(!__is_const(ClassType));
> +SA(!__is_const(vClassType));
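
As a rough illustration of how a library might consume the new trait (an
assumption on my side, not part of this patch): the sketch::is_const name and
the SKETCH_HAS_BUILTIN macro below are made up, and the partial-specialization
fallback is the usual portable definition of std::is_const.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical wrapper macro: compilers without __has_builtin report "no".
#ifdef __has_builtin
# define SKETCH_HAS_BUILTIN(x) __has_builtin(x)
#else
# define SKETCH_HAS_BUILTIN(x) 0
#endif

namespace sketch {
#if SKETCH_HAS_BUILTIN(__is_const)
// Ask the compiler directly when the built-in trait is available.
template <typename T>
struct is_const : std::integral_constant<bool, __is_const(T)> {};
#else
// Portable fallback: only a top-level const matches the specialization.
template <typename T> struct is_const : std::false_type {};
template <typename T> struct is_const<const T> : std::true_type {};
#endif
}

struct ClassType {};

// Same flavour of checks as the new test, plus pointer/reference cases.
static_assert (sketch::is_const<const int>::value, "const int");
static_assert (sketch::is_const<const volatile int>::value, "cv int");
static_assert (sketch::is_const<const ClassType>::value, "const class");
static_assert (sketch::is_const<int *const>::value, "const pointer");
static_assert (!sketch::is_const<int>::value, "int");
static_assert (!sketch::is_const<volatile int>::value, "volatile int");
static_assert (!sketch::is_const<const int *>::value, "pointer to const");
static_assert (!sketch::is_const<const int &>::value, "reference to const");
```

Both branches should agree on every assertion; the built-in only saves the
compiler from instantiating the partial specialization.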

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [pushed] wwwdocs: Add GCC Code of Conduct

2023-06-20 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-06-20 at 12:22 -0400, Jason Merrill via Gcc-patches wrote:
> diff --git a/htdocs/bugs/index.html b/htdocs/bugs/index.html
> index aaef8915..6dbe5d45 100644
> --- a/htdocs/bugs/index.html
> +++ b/htdocs/bugs/index.html
> @@ -122,6 +122,9 @@ three of which can be obtained from the output of 
> gcc -v:
>    Questions about the correctness or the expected behavior of
>    certain constructs that are not GCC extensions.  Ask them in forums
>    dedicated to the discussion of the programming language.
> +
> +  Violations of the Code of Conduct.

The link should be "../conduct.html" :).

> +
>  

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v5] MIPS: Add speculation_barrier support

2023-06-16 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-06-16 at 15:53 +0800, YunQiang Su wrote:
> Oh, sorry, I forgot about it. I have commented there now.
> I don't have permission to close this bug report. Can you help close
> it?

Change the email address of your Bugzilla account to your @gcc.gnu.org
address; then you should be able to close it.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [pushed][PATCH v3] LoongArch: Avoid non-returning indirect jumps through $ra [PR110136]

2023-06-15 Thread Xi Ruoyao via Gcc-patches
Xuerui: I guess this makes it sensible to show "ret" instead of "jirl
$zero, $ra, 0" in objdump -d output, but I don't know how to implement
it.  Do you have any ideas?

On Thu, 2023-06-15 at 16:27 +0800, Lulu Cheng wrote:
> Pushed to trunk, gcc-12 and gcc-13.
> r14-1866
> r13-7448
> r12-9698
> 
> On 2023/6/15 at 9:30 AM, Lulu Cheng wrote:
> > The micro-architecture unconditionally treats a "jr $ra" as "return
> > from subroutine", hence using "jr $ra" for a non-returning jump would
> > interfere with both subroutine return prediction and the more general
> > indirect branch prediction.
> > 
> > Therefore, a problem like PR110136 can cause a significant increase
> > in the branch misprediction rate and hurt performance. The same
> > problem exists with "indirect_jump".
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch.md: Modify the register
> > constraints for templates "tablejump" and "indirect_jump"
> > from "r" to "e".
> > 
> > Co-authored-by: Andrew Pinski 
> > ---
> > v1 -> v2:
> >    1. Modify the description.
> >    2. Modify the register constraints of the template
> > "indirect_jump".
> > v2 -> v3:
> >    1. Modify the description.
> > ---
> >   gcc/config/loongarch/loongarch.md | 8 ++--
> >   1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/gcc/config/loongarch/loongarch.md
> > b/gcc/config/loongarch/loongarch.md
> > index 816a943d155..b37e070660f 100644
> > --- a/gcc/config/loongarch/loongarch.md
> > +++ b/gcc/config/loongarch/loongarch.md
> > @@ -2895,6 +2895,10 @@ (define_insn "*jump_pic"
> >   }
> >     [(set_attr "type" "branch")])
> >   
> > +;; Micro-architecture unconditionally treats a "jr $ra" as "return from subroutine",
> > +;; non-returning indirect jumps through $ra would interfere with both subroutine
> > +;; return prediction and the more general indirect branch prediction.
> > +
> >   (define_expand "indirect_jump"
> >     [(set (pc) (match_operand 0 "register_operand"))]
> >     ""
> > @@ -2905,7 +2909,7 @@ (define_expand "indirect_jump"
> >   })
> >   
> >   (define_insn "@indirect_jump"
> > -  [(set (pc) (match_operand:P 0 "register_operand" "r"))]
> > +  [(set (pc) (match_operand:P 0 "register_operand" "e"))]
> >     ""
> >     "jr\t%0"
> >     [(set_attr "type" "jump")
> > @@ -2928,7 +2932,7 @@ (define_expand "tablejump"
> >   
> >   (define_insn "@tablejump"
> >     [(set (pc)
> > -   (match_operand:P 0 "register_operand" "r"))
> > +   (match_operand:P 0 "register_operand" "e"))
> >  (use (label_ref (match_operand 1 "" "")))]
> >     ""
> >     "jr\t%0"
> 
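
For readers wondering which source constructs reach these two patterns, a
small sketch (my own example, not from the patch or PR110136): a dense switch
whose cases do different work is a typical candidate for a jump table
("tablejump"), and the GNU labels-as-values extension produces a computed goto
("indirect_jump").  Whether the compiler really emits "jr" for either one
depends on the optimization level and on the lowering strategy it picks.

```cpp
#include <cassert>

// A dense switch with distinct work per case: GCC may lower this to a
// jump table ("tablejump"), though it is free to choose another strategy.
int apply_op (int op, int x)
{
  switch (op)
    {
    case 0: return x + 1;
    case 1: return x * 3;
    case 2: return x - 7;
    case 3: return x ^ 0x55;
    case 4: return x << 2;
    case 5: return -x;
    default: return -1;
    }
}

#if defined(__GNUC__)
// Labels-as-values (a GNU extension): "goto *target" is a computed goto,
// which corresponds to the "indirect_jump" pattern.
int dispatch (int which)
{
  static void *const targets[] = { &&first, &&second };
  goto *targets[which & 1];
first:
  return 1;
second:
  return 2;
}
#endif
```

With the "e" constraint the register allocator should no longer be allowed to
pick $ra as the jump target register for code like this, so the branch
predictor does not mistake the jump for a return.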

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Pushed: [PATCH] LoongArch: Set default alignment for functions and labels with -mtune

2023-06-15 Thread Xi Ruoyao via Gcc-patches
Pushed r14-1839.

On Thu, 2023-06-15 at 09:12 +0800, Lulu Cheng wrote:
> LGTM! Thanks!
> 
> On 2023/6/14 at 8:43 AM, Xi Ruoyao wrote:
> > The LA464 micro-architecture is sensitive to alignment of code.  The
> > Loongson team has benchmarked various combinations of function and
> > label alignment; the results [1] show that 16-byte label alignment
> > together with 32-byte function alignment gives the best results in
> > terms of SPEC score.
> > 
> > Add an -mtune-based table-driven mechanism to set the default of
> > -falign-{functions,labels}.  As LA464 is the first (and the only for
> > now) uarch supported by GCC, the same setting is also used for
> > the "generic" -mtune=loongarch64.  In the future we may use
> > different settings for LA{2,3,6}64 once we add support for them.
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch-tune.h (loongarch_align): New
> > struct.
> > * config/loongarch/loongarch-def.h (loongarch_cpu_align):
> > New
> > array.
> > * config/loongarch/loongarch-def.c (loongarch_cpu_align):
> > Define
> > the array.
> > * config/loongarch/loongarch.cc
> > (loongarch_option_override_internal): Set the value of
> > -falign-functions= if -falign-functions is enabled but no
> > value
> > is given.  Likewise for -falign-labels=.
> > ---
> >   gcc/config/loongarch/loongarch-def.c  | 12 
> >   gcc/config/loongarch/loongarch-def.h  |  1 +
> >   gcc/config/loongarch/loongarch-tune.h |  8 
> >   gcc/config/loongarch/loongarch.cc |  6 ++
> >   4 files changed, 27 insertions(+)
> > 
> > diff --git a/gcc/config/loongarch/loongarch-def.c
> > b/gcc/config/loongarch/loongarch-def.c
> > index fc4ebbefede..6729c857f7c 100644
> > --- a/gcc/config/loongarch/loongarch-def.c
> > +++ b/gcc/config/loongarch/loongarch-def.c
> > @@ -72,6 +72,18 @@ loongarch_cpu_cache[N_TUNE_TYPES] = {
> >     },
> >   };
> >   
> > +struct loongarch_align
> > +loongarch_cpu_align[N_TUNE_TYPES] = {
> > +  [CPU_LOONGARCH64] = {
> > +    .function = "32",
> > +    .label = "16",
> > +  },
> > +  [CPU_LA464] = {
> > +    .function = "32",
> > +    .label = "16",
> > +  },
> > +};
> > +
> >   /* The following properties cannot be looked up directly using
> > "cpucfg".
> >    So it is necessary to provide a default value for "unknown
> > native"
> >    tune targets (i.e. -mtune=native while PRID does not correspond
> > to
> > diff --git a/gcc/config/loongarch/loongarch-def.h
> > b/gcc/config/loongarch/loongarch-def.h
> > index 778b1409956..fb8bb88eb52 100644
> > --- a/gcc/config/loongarch/loongarch-def.h
> > +++ b/gcc/config/loongarch/loongarch-def.h
> > @@ -144,6 +144,7 @@ extern int loongarch_cpu_issue_rate[];
> >   extern int loongarch_cpu_multipass_dfa_lookahead[];
> >   
> >   extern struct loongarch_cache loongarch_cpu_cache[];
> > +extern struct loongarch_align loongarch_cpu_align[];
> >   extern struct loongarch_rtx_cost_data
> > loongarch_cpu_rtx_cost_data[];
> >   
> >   #ifdef __cplusplus
> > diff --git a/gcc/config/loongarch/loongarch-tune.h
> > b/gcc/config/loongarch/loongarch-tune.h
> > index ba31c4f08c3..5c03262daff 100644
> > --- a/gcc/config/loongarch/loongarch-tune.h
> > +++ b/gcc/config/loongarch/loongarch-tune.h
> > @@ -48,4 +48,12 @@ struct loongarch_cache {
> >   int simultaneous_prefetches; /* number of parallel prefetch */
> >   };
> >   
> > +/* Alignment for functions and labels for best performance.  For
> > new uarchs
> > +   the value should be measured via benchmarking.  See the
> > documentation for
> > +   -falign-functions and -falign-labels in invoke.texi for the
> > format.  */
> > +struct loongarch_align {
> > +  const char *function;	/* default value for -falign-functions */
> > +  const char *label;   /* default value for -falign-labels */
> > +};
> > +
> >   #endif /* LOONGARCH_TUNE_H */
> > diff --git a/gcc/config/loongarch/loongarch.cc
> > b/gcc/config/loongarch/loongarch.cc
> > index eb73d11b869..5b8b93eb24b 100644
> > --- a/gcc/config/loongarch/loongarch.cc
> > +++ b/gcc/config/loongarch/loongarch.cc
> > @@ -6249,6 +6249,12 @@ loongarch_option_override_internal (struct
> > gcc_options *opts)
> >     && !opts->x_optimize_size)
> >   opts->x_flag_prefetch_loop_arrays = 1;
> >   
> > +  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
> > +    opts->x_str_align_functions =
> > loongarch_cpu_align[LARCH_ACTUAL_TUNE].function;
> > +
> > +  if (opts->x_flag_align_labels && !opts->x_str_align_labels)
> > +    opts->x_str_align_labels =
> > loongarch_cpu_align[LARCH_ACTUAL_TUNE].label;
> > +
> >     if (TARGET_DIRECT_EXTERN_ACCESS && flag_shlib)
> >   error ("%qs cannot be used for compiling a shared library",
> >    "-mdirect-extern-access");
> 
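
The table-driven defaulting in the patch can be sketched in isolation as
follows.  All names here (tune_type, align_defaults, options,
apply_align_defaults) are hypothetical stand-ins for the GCC-internal ones;
only the shape of the logic is meant to match: a per-tune table of strings,
consulted only when the alignment flag is on and the user gave no explicit
value.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for the GCC-internal types; only the shape matters.
enum tune_type { TUNE_GENERIC, TUNE_LA464, N_TUNE_TYPES };

struct align_defaults {
  const char *function;  // default for -falign-functions
  const char *label;     // default for -falign-labels
};

// One entry per tune target, using the values from the patch.
static const align_defaults cpu_align[N_TUNE_TYPES] = {
  /* TUNE_GENERIC */ { "32", "16" },
  /* TUNE_LA464   */ { "32", "16" },
};

struct options {
  bool flag_align_functions;
  const char *str_align_functions;  // null when the user gave no value
  bool flag_align_labels;
  const char *str_align_labels;     // null when the user gave no value
};

// Fill in the per-tune default only if the flag is enabled and no
// explicit value was supplied; explicit values are left untouched.
void apply_align_defaults (options &opts, tune_type tune)
{
  if (opts.flag_align_functions && !opts.str_align_functions)
    opts.str_align_functions = cpu_align[tune].function;
  if (opts.flag_align_labels && !opts.str_align_labels)
    opts.str_align_labels = cpu_align[tune].label;
}
```

Keeping the values in a table means a future uarch only needs a new row
rather than another branch in the option-override code.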

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

