Re: [PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.
在 2023/8/20 下午4:25, Xi Ruoyao 写道: On Thu, 2023-08-17 at 15:20 +0800, Chenghui Pan wrote: Seems ARMv8-A only guarantees to preserve low 64-bit value of NEON/floating-point register value. I'm not sure that I modify the testcase in the right way and maybe we need more investigations. Any ideas or suggestion? Hi, Ruoyao: The implementation of hook loongarch_hard_regno_call_part_clobbered results in all vector registers being caller saved registers. So no data will be lost during the function call. Sorry, the following sentence in GCC manual section 6.47.5.2 suggests my test case is not valid: "As with global register variables, it is recommended that you choose a register that is normally saved and restored by function calls on your machine, so that calls to library routines will not clobber it." So when I use asm(name), the compiler has no obligation to guarantee that it will ever work like a normal variable after a function call. But I still need to verify that the compiler correctly understands only the low 64 bits of the vector register is saved. I'll try to make another test case...
Re: [PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.
On Thu, 2023-08-17 at 15:20 +0800, Chenghui Pan wrote: > Seems ARMv8-A only guarantees to preserve low 64-bit value of > NEON/floating-point register value. I'm not sure that I modify the > testcase in the right way and maybe we need more investigations. Any > ideas or suggestion? Sorry, the following sentence in GCC manual section 6.47.5.2 suggests my test case is not valid: "As with global register variables, it is recommended that you choose a register that is normally saved and restored by function calls on your machine, so that calls to library routines will not clobber it." So when I use asm(name), the compiler has no obligation to guarantee that it will ever work like a normal variable after a function call. But I still need to verify that the compiler correctly understands only the low 64 bits of the vector register is saved. I'll try to make another test case... -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.
Hi! I try to investigate on this problem, and modify the testcase to compile and run on aarch64 for reference, but I get some strange result (comment shows the info that I see by stepping through by using gdb): typedef double __attribute__((vector_size(16))) v2df; void use1(double d) {} __attribute__((noipa)) v2df use(double d) { //reg v8's value: {1, 2} register v2df x asm("v8") = {5, 9}; //reg v8's value: {5, 9} __asm__("" : "+w" (x)); return x; } void test(void) { register v2df x asm("v8") = {1, 2}; __asm__("" : "+w" (x)); //reg v8's value: {1, 2} use(x[0]); //reg v8's value: {1, 0} use1(x[1]); } int main(int argc, char **argv) { test(); return 0; } The compile command is: gcc -march=armv8-a -Og -g 1.c (gcc 8.3.0+binutils 2.31) Disassembly of test() and use(): 00400558 : 400558: fc1f0fe8str d8, [sp, #-16]! 40055c: 9000adrpx0, 40 <_init-0x3e0> 400560: 3dc19c08ldr q8, [x0, #1648] 400564: 4ea81d00mov v0.16b, v8.16b 400568: fc4107e8ldr d8, [sp], #16 40056c: d65f03c0ret 00400570 : 400570: a9be7bfdstp x29, x30, [sp, #-32]! 400574: 910003fdmov x29, sp 400578: fd000be8str d8, [sp, #16] 40057c: 9000adrpx0, 40 <_init-0x3e0> 400580: 3dc1a008ldr q8, [x0, #1664] 400584: 5e080500mov d0, v8.d[0] 400588: 97f4bl 400558 40058c: fd400be8ldr d8, [sp, #16] 400590: a8c27bfdldp x29, x30, [sp], #32 400594: d65f03c0ret As the register value in the comments, The compiling output on aarch64 also clobbers the high parts of vector register. I googled for some documents and I find this: https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64 -bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard /Parameters-in-NEON-and-floating-point-registers Seems ARMv8-A only guarantees to preserve low 64-bit value of NEON/floating-point register value. I'm not sure that I modify the testcase in the right way and maybe we need more investigations. Any ideas or suggestion? On Wed, 2023-08-16 at 11:27 +0800, Xi Ruoyao wrote: > The implementation fails to handle this test case properly: > > typedef double __attribute__((vector_size(32))) v4df; > > void use1(double); > > __attribute__((noipa)) double use(double) > { > register double x asm("f24") = 114.514; > __asm__("" : "+f" (x)); > return x; > } > > void test(void) > { > register v4df x asm("f24") = {1, 2, 3, 4}; > __asm__("" : "+f" (x)); > use(x[1]); > use1(x[3]); > } > > Here use() attempts to save and restore f24, but it uses fst.d/fld.d, > clobbering the high 192 bits of xr24. Now test() passes a wrong > value > of x[3] to use1(). > > Note that saving and restoring f24 with xvst/xvld in use() won't > really > fix the issue because in real life use() can be in another > translation > unit (or even a shared library) compiled with -mno-lsx. So it seems > we > need to tell the compiler "a function call may clobber the high bits > of > a vector register even if the corresponding floating-point register > is > saved". I'm not sure how to accomplish this... > > On Tue, 2023-08-15 at 09:05 +0800, Chenghui Pan wrote: > > This is an update of: > > https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626194.html > > > > This version of patch set only introduces some small simplications > > of > > implementation. Because I missed the size limitation of mail size, > > the > > huge testsuite patches of v2 and v3 are not shown in the mail list. > > So, > > testsuite patches are splited from this patch set again and will be > > submitted > > independently in the future. > > > > Binutils-gdb introduced LSX/LASX support since 2.41 release: > > https://lists.gnu.org/archive/html/info-gnu/2023-07/msg9.html > > > > Brief history of patch set version: > > v1 -> v2: > > - Reduce usage of "unspec" in RTL template. > > - Append Support of ADDR_REG_REG in LSX and LASX. > > - Constraint docs are appended in gcc/doc/md.texi and ccomment > > block. > > - Codes related to vecarg are removed. > > - Testsuite of LSX and LASX is added in v2. (Because of the size > > limitation of > > mail list, these patches are not shown) > > - Adjust the loongarch_expand_vector_init() function to reduce > > instruction > > output amount. > > - Some minor implementation changes of RTL templates. > > > > v2 -> v3: > > - Revert vabsd/xvabsd RTL templates to unspec
Re: [PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.
The implementation fails to handle this test case properly: typedef double __attribute__((vector_size(32))) v4df; void use1(double); __attribute__((noipa)) double use(double) { register double x asm("f24") = 114.514; __asm__("" : "+f" (x)); return x; } void test(void) { register v4df x asm("f24") = {1, 2, 3, 4}; __asm__("" : "+f" (x)); use(x[1]); use1(x[3]); } Here use() attempts to save and restore f24, but it uses fst.d/fld.d, clobbering the high 192 bits of xr24. Now test() passes a wrong value of x[3] to use1(). Note that saving and restoring f24 with xvst/xvld in use() won't really fix the issue because in real life use() can be in another translation unit (or even a shared library) compiled with -mno-lsx. So it seems we need to tell the compiler "a function call may clobber the high bits of a vector register even if the corresponding floating-point register is saved". I'm not sure how to accomplish this... On Tue, 2023-08-15 at 09:05 +0800, Chenghui Pan wrote: > This is an update of: > https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626194.html > > This version of patch set only introduces some small simplications of > implementation. Because I missed the size limitation of mail size, the > huge testsuite patches of v2 and v3 are not shown in the mail list. > So, > testsuite patches are splited from this patch set again and will be > submitted > independently in the future. > > Binutils-gdb introduced LSX/LASX support since 2.41 release: > https://lists.gnu.org/archive/html/info-gnu/2023-07/msg9.html > > Brief history of patch set version: > v1 -> v2: > - Reduce usage of "unspec" in RTL template. > - Append Support of ADDR_REG_REG in LSX and LASX. > - Constraint docs are appended in gcc/doc/md.texi and ccomment block. > - Codes related to vecarg are removed. > - Testsuite of LSX and LASX is added in v2. (Because of the size > limitation of > mail list, these patches are not shown) > - Adjust the loongarch_expand_vector_init() function to reduce > instruction > output amount. > - Some minor implementation changes of RTL templates. > > v2 -> v3: > - Revert vabsd/xvabsd RTL templates to unspec impl. > - Resolve warning in gcc/config/loongarch/loongarch.cc when > bootstrapping > with BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx". > - Remove redundant definitions in lasxintrin.h. > - Refine commit info. > > Lulu Cheng (6): > LoongArch: Add Loongson SX vector directive compilation framework. > LoongArch: Add Loongson SX base instruction support. > LoongArch: Add Loongson SX directive builtin function support. > LoongArch: Add Loongson ASX vector directive compilation framework. > LoongArch: Add Loongson ASX base instruction support. > LoongArch: Add Loongson ASX directive builtin function support. > > gcc/config.gcc | 2 +- > gcc/config/loongarch/constraints.md | 131 +- > .../loongarch/genopts/loongarch-strings | 4 + > gcc/config/loongarch/genopts/loongarch.opt.in | 12 +- > gcc/config/loongarch/lasx.md | 5122 > gcc/config/loongarch/lasxintrin.h | 5338 > + > gcc/config/loongarch/loongarch-builtins.cc | 2686 - > gcc/config/loongarch/loongarch-c.cc | 18 + > gcc/config/loongarch/loongarch-def.c | 6 + > gcc/config/loongarch/loongarch-def.h | 9 +- > gcc/config/loongarch/loongarch-driver.cc | 10 + > gcc/config/loongarch/loongarch-driver.h | 2 + > gcc/config/loongarch/loongarch-ftypes.def | 666 +- > gcc/config/loongarch/loongarch-modes.def | 39 + > gcc/config/loongarch/loongarch-opts.cc | 89 +- > gcc/config/loongarch/loongarch-opts.h | 3 + > gcc/config/loongarch/loongarch-protos.h | 35 + > gcc/config/loongarch/loongarch-str.h | 3 + > gcc/config/loongarch/loongarch.cc | 4586 +- > gcc/config/loongarch/loongarch.h | 117 +- > gcc/config/loongarch/loongarch.md | 56 +- > gcc/config/loongarch/loongarch.opt | 12 +- > gcc/config/loongarch/lsx.md | 4481 ++ > gcc/config/loongarch/lsxintrin.h | 5181 > gcc/config/loongarch/predicates.md | 333 +- > gcc/doc/md.texi | 11 + > 26 files changed, 28668 insertions(+), 284 deletions(-) > create mode 100644 gcc/config/loongarch/lasx.md > create mode 100644 gcc/config/loongarch/lasxintrin.h > create mode 100644 gcc/config/loongarch/lsx.md > create mode 100644 gcc/config/loongarch/lsxintrin.h > -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.
This is an update of: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626194.html This version of patch set only introduces some small simplications of implementation. Because I missed the size limitation of mail size, the huge testsuite patches of v2 and v3 are not shown in the mail list. So, testsuite patches are splited from this patch set again and will be submitted independently in the future. Binutils-gdb introduced LSX/LASX support since 2.41 release: https://lists.gnu.org/archive/html/info-gnu/2023-07/msg9.html Brief history of patch set version: v1 -> v2: - Reduce usage of "unspec" in RTL template. - Append Support of ADDR_REG_REG in LSX and LASX. - Constraint docs are appended in gcc/doc/md.texi and ccomment block. - Codes related to vecarg are removed. - Testsuite of LSX and LASX is added in v2. (Because of the size limitation of mail list, these patches are not shown) - Adjust the loongarch_expand_vector_init() function to reduce instruction output amount. - Some minor implementation changes of RTL templates. v2 -> v3: - Revert vabsd/xvabsd RTL templates to unspec impl. - Resolve warning in gcc/config/loongarch/loongarch.cc when bootstrapping with BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx". - Remove redundant definitions in lasxintrin.h. - Refine commit info. Lulu Cheng (6): LoongArch: Add Loongson SX vector directive compilation framework. LoongArch: Add Loongson SX base instruction support. LoongArch: Add Loongson SX directive builtin function support. LoongArch: Add Loongson ASX vector directive compilation framework. LoongArch: Add Loongson ASX base instruction support. LoongArch: Add Loongson ASX directive builtin function support. gcc/config.gcc|2 +- gcc/config/loongarch/constraints.md | 131 +- .../loongarch/genopts/loongarch-strings |4 + gcc/config/loongarch/genopts/loongarch.opt.in | 12 +- gcc/config/loongarch/lasx.md | 5122 gcc/config/loongarch/lasxintrin.h | 5338 + gcc/config/loongarch/loongarch-builtins.cc| 2686 - gcc/config/loongarch/loongarch-c.cc | 18 + gcc/config/loongarch/loongarch-def.c |6 + gcc/config/loongarch/loongarch-def.h |9 +- gcc/config/loongarch/loongarch-driver.cc | 10 + gcc/config/loongarch/loongarch-driver.h |2 + gcc/config/loongarch/loongarch-ftypes.def | 666 +- gcc/config/loongarch/loongarch-modes.def | 39 + gcc/config/loongarch/loongarch-opts.cc| 89 +- gcc/config/loongarch/loongarch-opts.h |3 + gcc/config/loongarch/loongarch-protos.h | 35 + gcc/config/loongarch/loongarch-str.h |3 + gcc/config/loongarch/loongarch.cc | 4586 +- gcc/config/loongarch/loongarch.h | 117 +- gcc/config/loongarch/loongarch.md | 56 +- gcc/config/loongarch/loongarch.opt| 12 +- gcc/config/loongarch/lsx.md | 4481 ++ gcc/config/loongarch/lsxintrin.h | 5181 gcc/config/loongarch/predicates.md| 333 +- gcc/doc/md.texi | 11 + 26 files changed, 28668 insertions(+), 284 deletions(-) create mode 100644 gcc/config/loongarch/lasx.md create mode 100644 gcc/config/loongarch/lasxintrin.h create mode 100644 gcc/config/loongarch/lsx.md create mode 100644 gcc/config/loongarch/lsxintrin.h -- 2.36.0