Re: [PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-10-19 Thread chenglulu



On 2023/8/20 at 4:25 PM, Xi Ruoyao wrote:
> On Thu, 2023-08-17 at 15:20 +0800, Chenghui Pan wrote:
> > It seems ARMv8-A only guarantees to preserve the low 64 bits of a
> > NEON/floating-point register. I'm not sure I modified the test case
> > correctly, and we may need more investigation. Any ideas or
> > suggestions?


Hi, Ruoyao:

The implementation of the hook loongarch_hard_regno_call_part_clobbered
treats all vector registers as caller-saved, so no data will be lost
across a function call.
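
As a rough illustration of what such a hook decides, here is a standalone C model. It is not the code from the patch: the register numbering, the constant names and the function call_part_clobbered() are assumptions made for this sketch, loosely following the shape of GCC's TARGET_HARD_REGNO_CALL_PART_CLOBBERED target hook.

```c
#include <stdbool.h>

/* Hypothetical register numbering for this sketch: f0..f31 map to 32..63. */
enum { FP_REG_FIRST = 32, FP_REG_LAST = 63 };

/* Bytes of a floating-point register that a callee preserves when it
   saves and restores the register with fst.d/fld.d. */
enum { CALLEE_PRESERVED_BYTES = 8 };

static bool is_fp_reg(unsigned regno)
{
  return regno >= FP_REG_FIRST && regno <= FP_REG_LAST;
}

/* Simplified stand-in for the hook: returns true when a call may clobber
   part of a value that is value_bytes wide and lives in register regno. */
static bool call_part_clobbered(unsigned regno, unsigned value_bytes)
{
  return is_fp_reg(regno) && value_bytes > CALLEE_PRESERVED_BYTES;
}
```

Under this model, a 32-byte value in f24 (FP_REG_FIRST + 24) is reported as partly clobbered by a call, while an 8-byte double in the same register is not, so only the wide value has to be saved by the caller.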


> Sorry, the following sentence in GCC manual section 6.47.5.2 suggests my
> test case is not valid:
>
> "As with global register variables, it is recommended that you choose a
> register that is normally saved and restored by function calls on your
> machine, so that calls to library routines will not clobber it."
>
> So when I use asm(name), the compiler has no obligation to guarantee
> that it will ever work like a normal variable after a function call.
>
> But I still need to verify that the compiler correctly understands that
> only the low 64 bits of the vector register are saved.  I'll try to make
> another test case...





Re: [PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-20 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-17 at 15:20 +0800, Chenghui Pan wrote:
> It seems ARMv8-A only guarantees to preserve the low 64 bits of a
> NEON/floating-point register. I'm not sure I modified the test case
> correctly, and we may need more investigation. Any ideas or
> suggestions?

Sorry, the following sentence in GCC manual section 6.47.5.2 suggests my
test case is not valid:

"As with global register variables, it is recommended that you choose a
register that is normally saved and restored by function calls on your
machine, so that calls to library routines will not clobber it."

So when I use asm(name), the compiler has no obligation to guarantee
that it will ever work like a normal variable after a function call.

But I still need to verify that the compiler correctly understands that
only the low 64 bits of the vector register are saved.  I'll try to make
another test case...
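
One possible shape for such a test, avoiding a register variable so that the compiler stays responsible for keeping the vector alive across the call, is sketched below. The function names touch_f24() and lanes_survive_call() are hypothetical, and the "f24" clobber is only compiled on LoongArch. Whether the vector actually ends up in a callee-saved register depends on register allocation, so a real test would probably also have to inspect the generated assembly.

```c
#include <stdbool.h>

typedef double v4df __attribute__((vector_size(32)));

/* Hypothetical callee.  On LoongArch the clobber list makes it save and
   restore f24 in its prologue and epilogue; elsewhere it is a plain call. */
__attribute__((noipa)) double touch_f24(double d)
{
#ifdef __loongarch__
  __asm__ volatile("" : : : "f24");
#endif
  return d + 1.0;
}

/* Returns true if every lane of x is intact after the call. */
__attribute__((noipa)) bool lanes_survive_call(void)
{
  v4df x = {1, 2, 3, 4};
  __asm__("" : "+m"(x));  /* keep x from being constant-folded */
  double r = touch_f24(x[1]);
  return r == 3.0 && x[0] == 1 && x[1] == 2 && x[2] == 3 && x[3] == 4;
}
```

If the compiler wrongly assumed that the whole vector register survives the call, lanes_survive_call() could return false when built with the vector extension enabled.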

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-17 Thread Chenghui Pan
Hi! I investigated this problem and modified the test case to compile
and run on aarch64 for reference, but I got a strange result (the
comments show the register values I saw while stepping through in gdb):

typedef double __attribute__((vector_size(16))) v2df;

void use1(double d) {}

__attribute__((noipa)) v2df use(double d)
{
  //reg v8's value: {1, 2}
  register v2df x asm("v8") = {5, 9};
  //reg v8's value: {5, 9}
  __asm__("" : "+w" (x));
  return x;
}

void test(void)
{
  register v2df x asm("v8") = {1, 2};
  __asm__("" : "+w" (x));
  //reg v8's value: {1, 2}
  use(x[0]);
  //reg v8's value: {1, 0}
  use1(x[1]);
}

int main(int argc, char **argv)
{
  test();
  return 0;
}

The compile command is: gcc -march=armv8-a -Og -g 1.c (gcc
8.3.0+binutils 2.31)

Disassembly of test() and use():
00400558 :
  400558:   fc1f0fe8    str d8, [sp, #-16]!
  40055c:   9000        adrp x0, 40 <_init-0x3e0>
  400560:   3dc19c08    ldr q8, [x0, #1648]
  400564:   4ea81d00    mov v0.16b, v8.16b
  400568:   fc4107e8    ldr d8, [sp], #16
  40056c:   d65f03c0    ret

00400570 :
  400570:   a9be7bfd    stp x29, x30, [sp, #-32]!
  400574:   910003fd    mov x29, sp
  400578:   fd000be8    str d8, [sp, #16]
  40057c:   9000        adrp x0, 40 <_init-0x3e0>
  400580:   3dc1a008    ldr q8, [x0, #1664]
  400584:   5e080500    mov d0, v8.d[0]
  400588:   97f4        bl  400558
  40058c:   fd400be8    ldr d8, [sp, #16]
  400590:   a8c27bfd    ldp x29, x30, [sp], #32
  400594:   d65f03c0    ret
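
The {1, 0} observed in v8 after the call can be reproduced by replaying the first listing by hand. The sketch below is a plain C model, not AArch64 code: the vreg128 type and the helper names are made up for illustration. It relies on the architectural behaviour that a scalar load into d8 writes the low 64 bits of v8 and zeroes the upper 64 bits.

```c
/* Hypothetical model of one 128-bit AArch64 vector register (v8). */
typedef struct { double lane[2]; } vreg128;

/* str dN: stores only the low 64 bits of the register. */
static void str_d(const vreg128 *reg, double *slot)
{
  *slot = reg->lane[0];
}

/* ldr dN: loads the low 64 bits and zeroes the upper 64 bits. */
static void ldr_d(vreg128 *reg, const double *slot)
{
  reg->lane[0] = *slot;
  reg->lane[1] = 0.0;
}

/* ldr qN: loads all 128 bits. */
static void ldr_q(vreg128 *reg, const vreg128 *mem)
{
  *reg = *mem;
}

/* Replays the save, body and restore of the first listing on v8. */
static void model_use(vreg128 *v8)
{
  static const vreg128 five_nine = { { 5.0, 9.0 } };
  double stack_slot;

  str_d(v8, &stack_slot);  /* str d8, [sp, #-16]! */
  ldr_q(v8, &five_nine);   /* ldr q8, [x0, #1648] */
  ldr_d(v8, &stack_slot);  /* ldr d8, [sp], #16   */
}
```

Starting from v8 = {1, 2}, model_use() leaves {1, 0}, which matches the gdb values noted in the comments of the test case.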

As the register values in the comments show, the compiled output on
aarch64 also clobbers the high part of the vector register. I searched
for documentation and found this:
https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64-bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard/Parameters-in-NEON-and-floating-point-registers

It seems ARMv8-A only guarantees to preserve the low 64 bits of a
NEON/floating-point register. I'm not sure I modified the test case
correctly, and we may need more investigation. Any ideas or
suggestions?

On Wed, 2023-08-16 at 11:27 +0800, Xi Ruoyao wrote:
> The implementation fails to handle this test case properly:
> 
> typedef double __attribute__((vector_size(32))) v4df;
> 
> void use1(double);
> 
> __attribute__((noipa)) double use(double)
> {
>   register double x asm("f24") = 114.514;
>   __asm__("" : "+f" (x));
>   return x;
> }
> 
> void test(void)
> {
>   register v4df x asm("f24") = {1, 2, 3, 4};
>   __asm__("" : "+f" (x));
>   use(x[1]);
>   use1(x[3]);
> }
> 
> Here use() attempts to save and restore f24, but it uses fst.d/fld.d,
> clobbering the high 192 bits of xr24.  Now test() passes a wrong value
> of x[3] to use1().
> 
> Note that saving and restoring f24 with xvst/xvld in use() won't really
> fix the issue because in real life use() can be in another translation
> unit (or even a shared library) compiled with -mno-lsx.  So it seems we
> need to tell the compiler "a function call may clobber the high bits of
> a vector register even if the corresponding floating-point register is
> saved".  I'm not sure how to accomplish this...
> 
> On Tue, 2023-08-15 at 09:05 +0800, Chenghui Pan wrote:
> > [...]

Re: [PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-15 Thread Xi Ruoyao via Gcc-patches
The implementation fails to handle this test case properly:

typedef double __attribute__((vector_size(32))) v4df;

void use1(double);

__attribute__((noipa)) double use(double)
{
  register double x asm("f24") = 114.514;
  __asm__("" : "+f" (x));
  return x;
}

void test(void)
{
  register v4df x asm("f24") = {1, 2, 3, 4};
  __asm__("" : "+f" (x));
  use(x[1]);
  use1(x[3]);
}

Here use() attempts to save and restore f24, but it uses fst.d/fld.d,
clobbering the high 192 bits of xr24.  Now test() passes a wrong value
of x[3] to use1().

Note that saving and restoring f24 with xvst/xvld in use() won't really
fix the issue because in real life use() can be in another translation
unit (or even a shared library) compiled with -mno-lsx.  So it seems we
need to tell the compiler "a function call may clobber the high bits of
a vector register even if the corresponding floating-point register is
saved".  I'm not sure how to accomplish this...

On Tue, 2023-08-15 at 09:05 +0800, Chenghui Pan wrote:
> [...]

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-14 Thread Chenghui Pan
This is an update of:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626194.html

This version of the patch set only introduces some small simplifications
of the implementation. Because I overlooked the mailing list's message
size limit, the huge testsuite patches of v2 and v3 did not appear on the
list. So the testsuite patches are split from this patch set again and
will be submitted independently in the future.

Binutils-gdb has supported LSX/LASX since the 2.41 release:
https://lists.gnu.org/archive/html/info-gnu/2023-07/msg9.html

Brief history of the patch set versions:
v1 -> v2:
- Reduce usage of "unspec" in RTL templates.
- Add support for ADDR_REG_REG in LSX and LASX.
- Add constraint docs to gcc/doc/md.texi and the comment block.
- Remove code related to vecarg.
- Add the LSX and LASX testsuite in v2. (Because of the mailing list's
  size limit, these patches are not shown.)
- Adjust the loongarch_expand_vector_init() function to reduce the number
  of instructions emitted.
- Some minor implementation changes to RTL templates.

v2 -> v3:
- Revert vabsd/xvabsd RTL templates to unspec impl.
- Resolve warning in gcc/config/loongarch/loongarch.cc when bootstrapping 
  with BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx".
- Remove redundant definitions in lasxintrin.h.
- Refine commit info.

Lulu Cheng (6):
  LoongArch: Add Loongson SX vector directive compilation framework.
  LoongArch: Add Loongson SX base instruction support.
  LoongArch: Add Loongson SX directive builtin function support.
  LoongArch: Add Loongson ASX vector directive compilation framework.
  LoongArch: Add Loongson ASX base instruction support.
  LoongArch: Add Loongson ASX directive builtin function support.

 gcc/config.gcc                                |    2 +-
 gcc/config/loongarch/constraints.md           |  131 +-
 .../loongarch/genopts/loongarch-strings       |    4 +
 gcc/config/loongarch/genopts/loongarch.opt.in |   12 +-
 gcc/config/loongarch/lasx.md                  | 5122
 gcc/config/loongarch/lasxintrin.h             | 5338 +
 gcc/config/loongarch/loongarch-builtins.cc    | 2686 -
 gcc/config/loongarch/loongarch-c.cc           |   18 +
 gcc/config/loongarch/loongarch-def.c          |    6 +
 gcc/config/loongarch/loongarch-def.h          |    9 +-
 gcc/config/loongarch/loongarch-driver.cc      |   10 +
 gcc/config/loongarch/loongarch-driver.h       |    2 +
 gcc/config/loongarch/loongarch-ftypes.def     |  666 +-
 gcc/config/loongarch/loongarch-modes.def      |   39 +
 gcc/config/loongarch/loongarch-opts.cc        |   89 +-
 gcc/config/loongarch/loongarch-opts.h         |    3 +
 gcc/config/loongarch/loongarch-protos.h       |   35 +
 gcc/config/loongarch/loongarch-str.h          |    3 +
 gcc/config/loongarch/loongarch.cc             | 4586 +-
 gcc/config/loongarch/loongarch.h              |  117 +-
 gcc/config/loongarch/loongarch.md             |   56 +-
 gcc/config/loongarch/loongarch.opt            |   12 +-
 gcc/config/loongarch/lsx.md                   | 4481 ++
 gcc/config/loongarch/lsxintrin.h              | 5181
 gcc/config/loongarch/predicates.md            |  333 +-
 gcc/doc/md.texi                               |   11 +
 26 files changed, 28668 insertions(+), 284 deletions(-)
 create mode 100644 gcc/config/loongarch/lasx.md
 create mode 100644 gcc/config/loongarch/lasxintrin.h
 create mode 100644 gcc/config/loongarch/lsx.md
 create mode 100644 gcc/config/loongarch/lsxintrin.h

-- 
2.36.0