Re: [PATCH 1/2] LoongArch: Add the macro implementation of mcmodel=extreme.

2024-01-04 Thread chenglulu
On 2024/1/4 5:05 PM, chenglulu wrote: On 2024/1/4 12:05 PM, Xi Ruoyao wrote: On Thu, 2024-01-04 at 11:58 +0800, chenglulu wrote: On 2024/1/4 11:51 AM, Xi Ruoyao wrote: On Wed, 2023-12-27 at 16:46 +0800, Lulu Cheng wrote: +(define_insn "movdi_pcrel64" + [(set (match_operand:DI 0 "r

Re:[pushed] [PATCH v2 0/4] Adjust option handling code

2024-01-09 Thread chenglulu
Pushed to r14-7085...r14-7088. On 2024/1/8 9:14 AM, Yang Yujie wrote: This patchset performs some code cleanup, and is bootstrapped and regtested on loongarch64-linux-gnu. Changes from v1 -> v2: * Replaced all TARGET_ macros from .opt. * Fixed definition of ISA_HAS_LAMCAS. Yang Yujie (4):

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-12 Thread chenglulu
On 2024/1/12 7:42 PM, Xi Ruoyao wrote: On Fri, 2024-01-12 at 09:46 +0800, chenglulu wrote: I found an issue bootstrapping GCC with -mcmodel=extreme in BOOT_CFLAGS: we need a target hook to tell the generic code UNSPEC_LA_PCREL_64_PART{1,2} are just a wrapper around symbols, or we'll see millions lines

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-13 Thread chenglulu
On 2024/1/13 9:05 PM, Xi Ruoyao wrote: On Sat, 2024-01-13 at 15:01 +0800, chenglulu wrote: On 2024/1/12 7:42 PM, Xi Ruoyao wrote: On Fri, 2024-01-12 at 09:46 +0800, chenglulu wrote: I found an issue bootstrapping GCC with -mcmodel=extreme in BOOT_CFLAGS: we need a target hook to tell the generic code

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-21 Thread chenglulu
Sorry, I've been busy with something else these two days. I don't think there's anything wrong with the code, but I need to run SPEC on it. :-) On 2023/12/21 7:56 PM, Xi Ruoyao wrote: Ping :). On Tue, 2023-12-12 at 14:47 +0800, Xi Ruoyao wrote: The problem with peephole2 is it uses a naive

Re: [PATCH] LoongArch: Added TLS Le Relax support.

2023-12-19 Thread chenglulu
On 2023/12/19 8:37 PM, Xi Ruoyao wrote: On Tue, 2023-12-19 at 19:04 +0800, Lulu Cheng wrote: +(define_insn "@add_tls_le_relax" +  [(set (match_operand:P 0 "register_operand" "=r") +   (unspec:P [(match_operand:P 1 "register_operand" "r") +  (match_operand:P 2 "register_operand"

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-22 Thread chenglulu
On 2023/11/23 3:11 PM, Xi Ruoyao wrote: On Thu, 2023-11-23 at 14:35 +0800, chenglulu wrote: Hi, I don’t quite understand this part. Is it because define_insn would be duplicated with the above implementation, so define_insn_and_split is used? Yes, but if you think duplicating the above

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-23 Thread chenglulu
On 2023/11/23 3:31 PM, chenglulu wrote: On 2023/11/23 3:11 PM, Xi Ruoyao wrote: On Thu, 2023-11-23 at 14:35 +0800, chenglulu wrote: Hi, I don’t quite understand this part. Is it because define_insn would be duplicated with the above implementation, so define_insn_and_split is used? Yes

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-23 Thread chenglulu
On 2023/11/23 5:02 PM, Xi Ruoyao wrote: On Thu, 2023-11-23 at 16:13 +0800, chenglulu wrote: The fix_truncv4sfv4si2 template is indeed called when debugging with gdb. So I think we can use define_expand here. The problem is cases where we want to combine an rint call with float-to-int conversion

Re: [PATCH v3 3/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate shift

2023-11-23 Thread chenglulu
LGTM. Thanks. On 2023/11/20 8:47 AM, Xi Ruoyao wrote: Remove unnecessary UNSPECs and make the [x]vrotr[i] instructions useful with GNU vectors and auto vectorization. gcc/ChangeLog: * config/loongarch/lsx.md (bitimm): Move to ... (UNSPEC_LSX_VROTR): Remove. (lsx_vrotr_):

Re: [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact

2023-11-23 Thread chenglulu
I tested it and it was fine. I never knew this could be used like this. Thank you! On 2023/11/20 8:47 AM, Xi Ruoyao wrote: No functional change, just a cleanup. gcc/ChangeLog: * config/loongarch/loongarch.md (lrint_allow_inexact): Remove. (2): Check if == UNSPEC_FTINT

Re: [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact

2023-11-23 Thread chenglulu
On 2023/11/23 4:58 PM, Xi Ruoyao wrote: On Thu, 2023-11-23 at 16:23 +0800, chenglulu wrote: I tested it and it was fine. I never knew this could be used like this. I remember when I wrote r13-3920 I tried this but failed. Maybe something has been improved in machine description parser

Re: [PATCH v3 2/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX muh instructions

2023-11-23 Thread chenglulu
LGTM. Thanks! On 2023/11/20 8:47 AM, Xi Ruoyao wrote: Removes unnecessary UNSPECs and make the muh instructions useful with GNU vectors or auto vectorization. gcc/ChangeLog: * config/loongarch/simd.md (muh): New code attribute mapping any_extend to smul_highpart or umul_highpart.

Re: Pushed: [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations

2023-11-28 Thread chenglulu
On 2023/11/29 3:12 PM, Xi Ruoyao wrote: On Mon, 2023-11-20 at 08:47 +0800, Xi Ruoyao wrote: The [1/5] patch is the PR112578 fix at https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637097.html. It has been changed to remove the nearbyint pattern (because nearbyint should not raise FE_INEXACT

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-22 Thread chenglulu
On 2023/11/20 8:47 AM, Xi Ruoyao wrote: The usage LSX and LASX frint/ftint instructions had some problems: 1. These instructions raises FE_INEXACT, which is not allowed with -fno-fp-int-builtin-inexact for most C2x section F.10.6 functions (the only exceptions are rint, lrint, and llrint).

Re:[pushed] [PATCH v1] LoongArch: Remove duplicate definition of CLZ_DEFINED_VALUE_AT_ZERO.

2023-12-02 Thread chenglulu
Pushed to r14-6070. On 2023/11/29 9:53 AM, Xi Ruoyao wrote: On Tue, 2023-11-28 at 15:56 +0800, Li Wei wrote: In the r14-5547 commit, C[LT]Z_DEFINED_VALUE_AT_ZERO were defined at the same time, but in fact, CLZ_DEFINED_VALUE_AT_ZERO has already been defined, so remove the duplicate definition.

Re: [pushed][PATCH v1 1/2] LoongArch: Accelerate optimization of scalar signed/unsigned popcount.

2023-12-02 Thread chenglulu
Pushed to r14-6072. On 2023/11/28 3:38 PM, Li Wei wrote: In LoongArch, the vector popcount has corresponding instructions, while the scalar does not. Currently, the scalar popcount is calculated through a loop, and the value of a non-power of two needs to be iterated several times, so the vector
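
For illustration only: the snippet does not show the loop GCC currently emits, so the sketch below assumes a common bit-clearing loop (the function name popcount_loop is hypothetical). Each iteration clears the lowest set bit, so a power of two finishes after one iteration while any other nonzero value needs one iteration per set bit, which is the cost the patch description points at.

```c
#include <stdint.h>

/* Hypothetical bit-clearing popcount loop, shown only to illustrate the
   per-set-bit iteration cost; it is not the code GCC emits. */
static int popcount_loop(uint64_t v)
{
    int count = 0;
    while (v != 0) {
        v &= v - 1;  /* clear the lowest set bit */
        count++;     /* one iteration per set bit */
    }
    return count;
}
```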

Re: [PATCH v1 1/2] LoongArch: Switch loongarch-def from C to C++ to make it possible.

2023-12-02 Thread chenglulu
On 2023/12/2 6:15 PM, Xi Ruoyao wrote: On Sat, 2023-12-02 at 16:14 +0800, Lulu Cheng wrote: /* snip */ diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc index b5836f198c0..6861642a98d 100644 --- a/gcc/config/loongarch/loongarch-opts.cc +++

Re: [pushed][PATCH v1 2/2] LoongArch: Optimize vector constant extract-{even/odd} permutation.

2023-12-02 Thread chenglulu
On 2023/11/29 5:44 PM, Xi Ruoyao wrote: On Tue, 2023-11-28 at 15:39 +0800, Li Wei wrote: For vector constant extract-{even/odd} permutation replace the default [x]vshuf instruction combination with [x]vilv{l/h} instruction, which can reduce instructions and improves performance. gcc/ChangeLog:

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-24 Thread chenglulu
On 2023/11/24 10:39 AM, Xi Ruoyao wrote: On Thu, 2023-11-23 at 18:03 +, Joseph Myers wrote: The rint functions indeed don't set errno (don't have domain or range errors, at least if you ignore the option for signaling NaNs arguments to be domain errors - which is in TS 18661-1, but not what

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-24 Thread chenglulu
On 2023/11/24 4:42 PM, Xi Ruoyao wrote: On Fri, 2023-11-24 at 16:36 +0800, chenglulu wrote: On 2023/11/24 4:26 PM, Xi Ruoyao wrote: On Fri, 2023-11-24 at 16:01 +0800, chenglulu wrote: I only saw lrint llrint in n2310 with this description: F7.12.9.5 "The lrint and llrint functions

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-24 Thread chenglulu
On 2023/11/24 4:26 PM, Xi Ruoyao wrote: On Fri, 2023-11-24 at 16:01 +0800, chenglulu wrote: I only saw lrint llrint in n2310 with this description: F7.12.9.5 "The lrint and llrint functions round their argument to the nearest integer value, rounding according to the current rounding dire

Re: [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact

2023-11-23 Thread chenglulu
On 2023/11/23 8:24 PM, Xi Ruoyao wrote: On Thu, 2023-11-23 at 17:14 +0800, chenglulu wrote: When I look at this code and compare it to our scalar implementation, it seems that our scalar implementation still lacks an "lround". Should be "lroundeven". We don't have an in

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-24 Thread chenglulu
On 2023/11/24 6:30 PM, Xi Ruoyao wrote: On Fri, 2023-11-24 at 17:46 +0800, chenglulu wrote: It's just that I'm confused that the description of rint in n2310, including Joseph's email, all say that rint will not set errno, but linux-man says "which might set errno to ERANGE" . Annex F h

Re: [PATCH v1 1/2] LoongArch: Switch loongarch-def from C to C++ to make it possible.

2023-12-04 Thread chenglulu
On 2023/12/2 9:41 PM, Xi Ruoyao wrote: On Sat, 2023-12-02 at 20:44 +0800, chenglulu wrote: @@ -657,12 +658,18 @@ abi_str (struct loongarch_abi abi)     strlen (loongarch_abi_base_strings[abi.base]));      else   { +  /* This situation has not yet occurred, so in order

Re: [pushed][PATCH v2 0/2] Delete ISA_BASE_LA64V110 related definitions.

2023-12-07 Thread chenglulu
Pushed to r14-6303 and r14-6304. On 2023/12/5 10:30 AM, Lulu Cheng wrote: 1. Rebase Xi Ruoyao's patch a to the latest commit. https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636798.html 2. remove the #if !defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS) guards in

Re:[pushed] [PATCH] LoongArch: Add support for xorsign.

2023-12-08 Thread chenglulu
Pushed to r14-6308. On 2023/11/17 5:00 PM, Jiahao Xu wrote: This patch adds support for xorsign pattern to scalar fp and vector. With the new expands, uniformly using vector bitwise logical operations to handle xorsign. On LoongArch64, floating-point registers and vector registers share the same
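
As a rough scalar model (assuming IEEE-754 binary64 and using a hypothetical helper name), xorsign(x, y) computes x * copysign(1.0, y), which reduces to XORing x with the sign bit of y. The patch description says the expansion uses vector bitwise logical operations uniformly; the sketch below only shows the bit manipulation for a single lane.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of xorsign(x, y) == x * copysign(1.0, y):
   flip the sign bit of x exactly when the sign bit of y is set. */
static double xorsign_bits(double x, double y)
{
    uint64_t ux, uy;
    memcpy(&ux, &x, sizeof ux);
    memcpy(&uy, &y, sizeof uy);
    ux ^= uy & UINT64_C(0x8000000000000000);  /* keep only y's sign bit */
    memcpy(&x, &ux, sizeof x);
    return x;
}
```

No multiplication is involved, so the result is exact for every x, including zeros, infinities and NaNs.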

Re:[pushed] [PATCH] LoongArch: Fix lsx-vshuf.c and lasx-xvshuf_b.c tests fail on LA664 [PR112611]

2023-12-08 Thread chenglulu
Pushed to r14-6316. On 2023/11/29 11:16 AM, Jiahao Xu wrote: For [x]vshuf instructions, if the index value in the selector exceeds 63, it triggers undefined behavior on LA464, but not on LA664. To ensure compatibility of these two tests on both LA464 and LA664, we have modified both tests to

Re: [pushed][PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations.

2023-12-08 Thread chenglulu
Pushed to r14-6311...r14-6315. On 2023/12/6 3:04 PM, Jiahao Xu wrote: LoongArch V1.1 adds support for approximate instructions, which are utilized along with additional Newton-Raphson steps implement single precision floating-point division, square root and reciprocal square root operations for
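
The snippet is cut off before it describes the exact sequence, so the following is only the textbook form of one Newton-Raphson refinement step, with hypothetical function names. The initial estimates r0 and y0 would presumably come from the approximate instructions (frecipe/frsqrte or their vector forms). Each step roughly squares the relative error of the estimate.

```c
/* One Newton-Raphson step refining r0 ~ 1/d:  r1 = r0 * (2 - d * r0). */
static float recip_nr_step(float d, float r0)
{
    return r0 * (2.0f - d * r0);
}

/* One Newton-Raphson step refining y0 ~ 1/sqrt(a):
   y1 = y0 * (1.5 - 0.5 * a * y0 * y0). */
static float rsqrt_nr_step(float a, float y0)
{
    return y0 * (1.5f - 0.5f * a * y0 * y0);
}

/* a / b approximated as a times a refined reciprocal of b. */
static float div_approx(float a, float b, float r0)
{
    return a * recip_nr_step(b, r0);
}

/* sqrt(a) for a > 0 approximated as a times a refined 1/sqrt(a). */
static float sqrt_approx(float a, float y0)
{
    return a * rsqrt_nr_step(a, y0);
}
```

For example, starting from the estimate 0.26 for 1/4 (4% off), one step gives 0.2496 (about 0.16% off).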

Re:[pushed] [PATCH] LoongArch: Fix ICE and use simplify_gen_subreg instead of gen_rtx_SUBREG directly.

2023-12-08 Thread chenglulu
Pushed to r14-6317. On 2023/11/29 11:18 AM, Jiahao Xu wrote: loongarch_expand_vec_cond_mask_expr generates 'subreg's of 'subreg's, which are not supported in gcc, it causes an ICE: ice.c:55:1: error: unrecognizable insn: 55 | } | ^ (insn 63 62 64 8 (set (reg:V4DI 278)

Re: [PATCH] LoongArch: Allow -mcmodel=extreme and model attribute with -mexplicit-relocs=auto

2023-12-07 Thread chenglulu
On 2023/12/7 8:20 PM, Xi Ruoyao wrote: There seems no real reason to require -mexplicit-relocs=always for -mcmodel=extreme or model attribute. As the linker does not know how to relax a 3-operand la.local or la.global pseudo instruction, just emit explicit relocs for SYMBOL_PCREL64, and under

Re: [pushed][PATCH] LoongArch: Fix runtime error in a gcc build with --with-build-config=bootstrap-ubsan

2023-11-26 Thread chenglulu
Pushed to r14-5864. On 2023/11/23 11:05 AM, Guo Jie wrote: gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_split_plus_constant): avoid left shift of negative value -0x8000. --- gcc/config/loongarch/loongarch.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

Re:[pushed] [PATCH] LoongArch: Optimize the loading of immediate numbers with the same high and low 32-bit values

2023-11-26 Thread chenglulu
Pushed to r14-5863. On 2023/11/18 2:59 PM, Guo Jie wrote: For the following immediate load operation in gcc/testsuite/gcc.target/loongarch/imm-load1.c: long long r = 0x0101010101010101; Before this patch: lu12i.w $r15,16842752>>12 ori $r15,$r15,257
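
To make the quoted "before" sequence concrete: a full 64-bit constant is built from four immediates, lu12i.w (bits 31:12), ori (bits 11:0), lu32i.d (bits 51:32) and lu52i.d (bits 63:52). The sketch below, with hypothetical helper names, extracts those fields and checks the condition the patch title describes. For 0x0101010101010101, the first two fields are 16842752>>12 = 0x01010 and 257 = 0x101, which matches the quoted output. The "after" sequence is cut off in this snippet, so how the patch reuses the low half is not shown here.

```c
#include <stdint.h>

/* Hypothetical helper: the immediate fields of the four-instruction
   sequence lu12i.w / ori / lu32i.d / lu52i.d for a 64-bit constant. */
struct imm_parts {
    uint32_t lu12i_w;  /* bits [31:12], 20 bits */
    uint32_t ori;      /* bits [11:0],  12 bits */
    uint32_t lu32i_d;  /* bits [51:32], 20 bits */
    uint32_t lu52i_d;  /* bits [63:52], 12 bits */
};

static struct imm_parts split_imm64(uint64_t v)
{
    struct imm_parts p;
    p.lu12i_w = (uint32_t)((v >> 12) & 0xfffff);
    p.ori     = (uint32_t)(v & 0xfff);
    p.lu32i_d = (uint32_t)((v >> 32) & 0xfffff);
    p.lu52i_d = (uint32_t)(v >> 52);
    return p;
}

/* The case the patch targets: identical upper and lower 32-bit halves. */
static int halves_equal(uint64_t v)
{
    return (uint32_t)(v >> 32) == (uint32_t)v;
}
```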

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-26 Thread chenglulu
On 2024/1/26 6:57 PM, Xi Ruoyao wrote: On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote: On 2024/1/26 4:49 PM, Xi Ruoyao wrote: On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote: v3 -> v4: 1. Add macro support for TLS symbols 2. Added support for loading __get_tls_addr symbol addr

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-27 Thread chenglulu
On 2024/1/27 7:11 PM, Xi Ruoyao wrote: On Sat, 2024-01-27 at 18:02 +0800, Xi Ruoyao wrote: On Sat, 2024-01-27 at 11:15 +0800, chenglulu wrote: On 2024/1/26 6:57 PM, Xi Ruoyao wrote: On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote: On 2024/1/26 4:49 PM, Xi Ruoyao wrote: On Fri, 2024-01-26 at 15:37

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-28 Thread chenglulu
On 2024/1/27 10:03 PM, chenglulu wrote: On 2024/1/27 7:11 PM, Xi Ruoyao wrote: On Sat, 2024-01-27 at 18:02 +0800, Xi Ruoyao wrote: On Sat, 2024-01-27 at 11:15 +0800, chenglulu wrote: On 2024/1/26 6:57 PM, Xi Ruoyao wrote: On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote: On 2024/1/26 4:49 PM, Xi Ruoyao

Re:[pushed] [PATCH v2] LoongArch: Adjust cost of vector_stmt that match multiply-add pattern.

2024-02-01 Thread chenglulu
Pushed to r14-8722. On 2024/1/26 4:41 PM, Li Wei wrote: We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r failed to vectorize effectively. For this reason, we adjust the cost of 128-bit vector_stmt that match the multiply-add pattern to facilitate 128-bit vectorization. The

Re: [pushed][PATCH] LoongArch: Fix incorrect return type for frecipe/frsqrte intrinsic functions

2024-02-01 Thread chenglulu
Pushed to r14-8723. On 2024/1/24 5:19 PM, Jiahao Xu wrote: gcc/ChangeLog: * config/loongarch/larchintrin.h (__frecipe_s): Update function return type. (__frecipe_d): Ditto. (__frsqrte_s): Ditto. (__frsqrte_d): Ditto. gcc/testsuite/ChangeLog: *

Re: [pushed][PATCH v5 0/5] When cmodel=extreme, add macro implementation and fix problems with explicit relos implementation.

2024-02-01 Thread chenglulu
Pushed to r14-8717...r14-8721. On 2024/1/29 4:21 PM, Lulu Cheng wrote: When cmodel=extreme, since the symbol address is obtained through four instructions, errors may occur in some cases during linking. Xi Ruoyao fixes this problem.

Re: [pushed][PATCH v2] LoongArch: Modify the address calculation logic for obtaining array element values through fp.

2024-02-01 Thread chenglulu
Pushed to r14-8716. On 2024/1/30 3:55 PM, Lulu Cheng wrote: Modify address calculation logic from (((a x C) + fp) + offset) to ((fp + offset) + a x C). Thereby modifying the register dependencies and optimizing the code. The value of C is 2, 4, or 8. The following is the assembly code before and

Re: [PATCH] LoongArch: Fix an ODR violation

2024-02-01 Thread chenglulu
LGTM! Thanks! On 2024/2/2 5:54 AM, Xi Ruoyao wrote: When bootstrapping GCC 14 --with-build-config=bootstrap-lto, an ODR violation is detected: ../../gcc/config/loongarch/loongarch-opts.cc:57: warning: 'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr] 57 |

Re: [PATCH] LoongArch: libsanitizer: Enable build lsan and tsan for loongarch64.

2024-02-01 Thread chenglulu
Ping? On 2024/1/30 10:09 AM, Lulu Cheng wrote: From: chenguoqi libsanitizer/ChangeLog: * configure.tgt: Enable tsan and lsan for loongarch64. * tsan/Makefile.am: Add tsan_rtl_loongarch64.S to EXTRA_libtsan_la_SOURCES. * tsan/Makefile.in: Regenerate. ---

Re: [PATCH] LoongArch: Fix wrong LSX FP vector negation

2024-02-03 Thread chenglulu
On 2024/2/3 4:58 PM, Xi Ruoyao wrote: We expanded (neg x) to (minus const0 x) for LSX FP vectors, this is wrong because -0.0 is not 0 - 0.0. This causes some Python tests to fail when Python is built with LSX enabled. Use the vbitrevi.{d/w} instructions to simply reverse the sign bit instead. We
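
A scalar illustration of the problem (hypothetical helper names, IEEE-754 binary64, default round-to-nearest): subtracting +0.0 from 0.0 yields +0.0, while negating +0.0 must yield -0.0, so (minus const0 x) and (neg x) differ exactly at zero. Flipping the sign bit, as vbitrevi does per lane, gives the correct result.

```c
#include <stdint.h>
#include <string.h>

/* Raw IEEE-754 bits of a double. */
static uint64_t bits_of(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* The old, wrong lowering: 0.0 - x. For x == +0.0 this gives +0.0. */
static double neg_as_minus(double x)
{
    return 0.0 - x;
}

/* Negation as a sign-bit flip, the scalar analogue of vbitrevi.d with bit
   index 63: also correct for signed zeros, infinities and NaNs. */
static double neg_by_signbit(double x)
{
    uint64_t u = bits_of(x) ^ UINT64_C(0x8000000000000000);
    memcpy(&x, &u, sizeof x);
    return x;
}
```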

Re: [PATCH] LoongArch: Avoid out-of-bounds access in loongarch_symbol_insns

2024-02-03 Thread chenglulu
On 2024/2/2 5:55 PM, Xi Ruoyao wrote: We call loongarch_symbol_insns with mode = MAX_MACHINE_MODE sometimes. But in loongarch_symbol_insns: if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode)) return 0; And LSX_SUPPORTED_MODE_P is defined as: #define

Re: [PATCH] LoongArch: libsanitizer: Enable build lsan and tsan for loongarch64.

2024-02-03 Thread chenglulu
On 2024/2/2 6:01 PM, Jakub Jelinek wrote: On Tue, Jan 30, 2024 at 10:09:51AM +0800, Lulu Cheng wrote: From: chenguoqi libsanitizer/ChangeLog: * configure.tgt: Enable tsan and lsan for loongarch64. * tsan/Makefile.am: Add tsan_rtl_loongarch64.S to EXTRA_libtsan_la_SOURCES. This

Re:[pushed] [PATCH v1] LoongArch: testsuite: Fix gcc.dg/vect/vect-reduc-mul_{1,2}.c FAIL.

2024-02-04 Thread chenglulu
Pushed to r14-8784. On 2024/2/2 9:42 AM, Li Wei wrote: This FAIL was introduced from r14-6908. The reason is that when merging constant vector permutation implementations, the 128-bit matching situation was not fully considered. In fact, the expansion of 128-bit vectors after merging only supports

Re: Pushed: [PATCH v2] LoongArch: Disable explicit reloc for TLS LD/GD with -mexplicit-relocs=auto

2024-01-23 Thread chenglulu
On 2024/1/23 4:04 PM, Xi Ruoyao wrote: On Tue, 2024-01-23 at 10:37 +0800, chenglulu wrote: LGTM! Thanks! Pushed v2 as attached. The only change is in the comment: Qinggang told me TLS LE relaxation actually *requires* explicit relocs. I think one of the reasons is also because we cannot

Re: [pushed][PATCH v1] LoongArch: doc:Combined with the content of target-supports.exp, add the attribute description related to LoongArch.

2024-01-22 Thread chenglulu
Pushed to r14-8344. On 2024/1/17 9:24 AM, chenxiaolong wrote: gcc/ChangeLog: * doc/sourcebuild.texi: Add attributes for keywords. --- gcc/doc/sourcebuild.texi | 20 1 file changed, 20 insertions(+) diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-21 Thread chenglulu
On 2024/1/19 4:51 PM, chenglulu wrote: On 2024/1/19 1:46 PM, Xi Ruoyao wrote: On Wed, 2024-01-17 at 17:57 +0800, chenglulu wrote: Virtual register 1479 will be used in insn 2744, but register 1479 was assigned the REG_UNUSED attribute in the previous instruction. The attached file is the wrong file

Re: [PATCH] LoongArch: testsuite: Disable stack protector for got-load.C

2024-01-23 Thread chenglulu
LGTM! Thanks! On 2024/1/23 7:35 PM, Xi Ruoyao wrote: When building GCC with --enable-default-ssp, the stack protector is enabled for got-load.C, causing additional GOT loads for __stack_chk_guard. So mem/u will be matched more than 2 times and the test will fail. Disable stack protector to fix

Re: [PATCH v1] LoongArch: Adjust cost of vector_stmt that match multiply-add pattern.

2024-01-26 Thread chenglulu
On 2024/1/24 5:36 PM, Li Wei wrote: We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r failed to vectorize effectively. For this reason, we adjust the cost of 128-bit vector_stmt that match the multiply-add pattern to facilitate 128-bit vectorization. The experimental

Re: [pushed][PATCH] LoongArch: Split vec_selects of bottom elements into simple move

2024-01-26 Thread chenglulu
Pushed to r14-8447. On 2024/1/16 10:23 AM, Jiahao Xu wrote: For below pattern, can be treated as a simple move because floating point and vector share a common register on loongarch64. (set (reg/v:SF 32 $f0 [orig:93 res ] [93]) (vec_select:SF (reg:V8SF 32 $f0 [115]) (parallel [

Re:[pushed] [PATCH v3] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT

2024-01-26 Thread chenglulu
Pushed to r14-8446. On 2024/1/16 10:32 AM, Jiahao Xu wrote: Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0, for a short-circuit branch, use the short-circuit operation instead of the non-short-circuit operation. SPEC2017 performance evaluation shows 1% performance improvement for fprate GEOMEAN and no

Re: [pushed][PATCH v3] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-26 Thread chenglulu
On 2024/1/26 3:32 PM, Richard Biener wrote: On Fri, Jan 26, 2024 at 7:23 AM chenxiaolong wrote: gcc/testsuite/ChangeLog: OK Pushed to r14-8445. Thank you everyone for your review! * gcc.dg/signbit-2.c: Added additional "-mlsx" compilation options. *

Re: [pushed][PATCH v1] LoongArch: Optimize implementation of single-precision floating-point approximate division.

2024-01-26 Thread chenglulu
Pushed to r14-8444. On 2024/1/24 5:44 PM, Li Wei wrote: We found that in the spec17 521.wrf program, some loop invariant code generated from single-precision floating-point approximate division calculation failed to propose a loop. This is because the pseudo-register that stores the intermediate

Re: [PATCH v4 1/4] LoongArch: Merge template got_load_tls_{ld/gd/le/ie}.

2024-01-26 Thread chenglulu
On 2024/1/26 4:52 PM, Xi Ruoyao wrote: On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote: +(define_insn "@load_tls"    [(set (match_operand:P 0 "register_operand" "=r")   (unspec:P       [(match_operand:P 1 "symbolic_operand" "")] -     UNSPEC_TLS_GD))] +    

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-26 Thread chenglulu
On 2024/1/26 4:49 PM, Xi Ruoyao wrote: On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote: v3 -> v4: 1. Add macro support for TLS symbols 2. Added support for loading __get_tls_addr symbol address using call36. 3. Merge template got_load_tls_{ld/gd/le/ie}. 4. Enable explicit reloc for

Re: [PATCH v4 1/4] LoongArch: Merge template got_load_tls_{ld/gd/le/ie}.

2024-01-26 Thread chenglulu
On 2024/1/26 4:59 PM, chenglulu wrote: On 2024/1/26 4:52 PM, Xi Ruoyao wrote: On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote: +(define_insn "@load_tls"     [(set (match_operand:P 0 "register_operand" "=r")   (unspec:P       [(match

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-24 Thread chenglulu
On 2024/1/24 3:36 AM, Xi Ruoyao wrote: On Mon, 2024-01-22 at 15:27 +0800, chenglulu wrote: The failure of this test case was because the compiler believes that two (UNSPEC_PCREL_64_PART2 [(symbol)]) instances would always produce the same result, but this isn't true because the result depends

Re: [PATCH] LoongArch: Fix incorrect return type for frecipe/frsqrte intrinsic functions

2024-01-24 Thread chenglulu
On 2024/1/24 5:58 PM, Jiahao Xu wrote: On 2024/1/24 5:48 PM, Xi Ruoyao wrote: On Wed, 2024-01-24 at 17:19 +0800, Jiahao Xu wrote: gcc/ChangeLog: * config/loongarch/larchintrin.h (__frecipe_s): Update function return type. (__frecipe_d): Ditto. (__frsqrte_s): Ditto. (__frsqrte_d):

Re: [PATCH] Loongarch: Remove vec_concatz pattern

2024-01-24 Thread chenglulu
Jiahao: Note that the LoongArch 'a' in the title needs to be capitalized. I modified this patch and incorporated it first. On 2024/1/24 5:19 PM, Jiahao Xu wrote: It is incorrect to use vld/vori to implement the vec_concatz because when the LSX instruction is used to update the value of the

Re:[pushed] [PATCH] LoongArch: Disable TLS type symbols from generating non-zero offsets.

2024-01-24 Thread chenglulu
Pushed to r14-8412. On 2024/1/23 11:54 AM, Lulu Cheng wrote: TLS gd ld and ie type symbols will generate corresponding GOT entries, so non-zero offsets cannot be generated. The address of TLS le type symbol+addend is not implemented in binutils, so non-zero offset is not generated here for the time

Re: [pushed][PATCH] Loongarch: Remove vec_concatz pattern

2024-01-24 Thread chenglulu
Pushed to r14-8414. On 2024/1/24 5:19 PM, Jiahao Xu wrote: It is incorrect to use vld/vori to implement the vec_concatz because when the LSX instruction is used to update the value of the vector register, the upper 128 bits of the vector register will not be zeroed. gcc/ChangeLog: *

Re: [PATCH] LoongArch: libsanitizer: Enable build lsan and tsan for loongarch64.

2024-02-05 Thread chenglulu
On 2024/2/2 6:01 PM, Jakub Jelinek wrote: On Tue, Jan 30, 2024 at 10:09:51AM +0800, Lulu Cheng wrote: From: chenguoqi libsanitizer/ChangeLog: * configure.tgt: Enable tsan and lsan for loongarch64. * tsan/Makefile.am: Add tsan_rtl_loongarch64.S to EXTRA_libtsan_la_SOURCES. This

Re: Pushed: [PATCH] LoongArch: Avoid out-of-bounds access in loongarch_symbol_insns

2024-02-05 Thread chenglulu
On 2024/2/5 1:01 AM, Xi Ruoyao wrote: I have a question. I see that you often add compilation options in BOOT_CFLAGS. I also want to test it. Do you have a recommended set of compilation options? When I build a compiler for my system I use {BOOT_{C,CXX,LD}FLAGS,{C,CXX,LD}FLAGS_FOR_TARGET}="-O3

Re: [PATCH v2] LoongArch: Remove redundant barrier instructions before LL-SC loops

2023-11-15 Thread chenglulu
On 2023/11/15 5:52 AM, Xi Ruoyao wrote: This is isomorphic to the LLVM changes [1-2]. On LoongArch, the LL and SC instructions has memory barrier semantics: - LL: + - SC: + But the compare and swap operation is allowed to fail, and if it fails the SC instruction is not executed, thus the

Re: [PATCH v2] LoongArch: Remove redundant barrier instructions before LL-SC loops

2023-11-15 Thread chenglulu
On 2023/11/15 7:38 PM, Xi Ruoyao wrote: Pushed r14-5486. /* snip */ * gcc.target/loongarch/cas-acquire.c: New test. This test fails with GCC 12/13 on LA664, and it indicates a correctness issue. May I backport this patch to 12/13 as well? I think we can backport. Thanks!

Re: [pushed ][PATCH v1 0/3] Add LoongarchV1.1 instructions support.

2023-11-18 Thread chenglulu
Pushed to r14-5568. On 2023/11/17 7:09 PM, Xi Ruoyao wrote: On Fri, 2023-11-17 at 16:33 +0800, Lulu Cheng wrote: Lulu Cheng (3): LoongArch: Add LA664 support. LoongArch: Implement atomic operations using LoongArch1.1 instructions. LoongArch: atomic_load and atomic_store are

Re: [PATCH v2 0/6] Add LoongArch v1.1 div32 and ld-seq-sa support

2023-11-18 Thread chenglulu
I have no problem, thanks for fixing the bug. On 2023/11/18 4:43 AM, Xi Ruoyao wrote: Supersedes https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636795.html. Requires https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636946.html. Changes: - Rebase on top of "Add LoongarchV1.1

Re: [pushed][PATCH v2] LoongArch: Add code generation support for call36 function calls.

2023-11-18 Thread chenglulu
Pushed to r14-5567. On 2023/11/16 3:27 PM, Lulu Cheng wrote: When compiling with '-mcmodel=medium', the function call is made through 'pcaddu18i+jirl' if binutils supports call36, otherwise the native implementation 'pcalau12i+jirl' is used. gcc/ChangeLog: * config.in: Regenerate.
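
For reference, pcaddu18i adds a signed 20-bit immediate shifted left by 18 to the PC, and jirl adds a signed 16-bit offset shifted left by 2 to its base register, so together they reach roughly ±128 GiB. The sketch below (hypothetical names; it shows the arithmetic only, not the relocation code in binutils) splits a 4-byte-aligned PC-relative offset into the two immediates, rounding the high part so that the low remainder fits the signed range of jirl.

```c
#include <stdint.h>

/* Floor division for a positive divisor (C's '/' truncates toward zero). */
static int64_t floor_div(int64_t a, int64_t b)
{
    int64_t q = a / b;
    if (a % b != 0 && a < 0)
        q--;
    return q;
}

/* Hypothetical split of a PC-relative call offset so that
   offset == hi20 * 2^18 + lo16 * 4, with lo16 in the signed 16-bit range. */
struct call36_parts {
    int32_t hi20; /* immediate for pcaddu18i */
    int32_t lo16; /* immediate for jirl */
};

static struct call36_parts split_call36(int64_t offset)
{
    const int64_t step = INT64_C(1) << 18;
    struct call36_parts p;
    /* Round to a multiple of 2^18 so the remainder is in [-2^17, 2^17). */
    int64_t hi = floor_div(offset + step / 2, step);
    int64_t lo = offset - hi * step;  /* a multiple of 4 if offset is */
    p.hi20 = (int32_t)hi;
    p.lo16 = (int32_t)(lo / 4);
    return p;
}
```

Rounding to the nearest multiple of 2^18, rather than truncating, is what keeps the remainder inside the signed 16-bit field of jirl after the shift by 2.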

Re:[pushed and backport] [PATCH] LoongArch: Modify MUSL_DYNAMIC_LINKER.

2023-11-19 Thread chenglulu
Pushed to r14-5601 and backported to r13-8085 and r12-9995. The patch that changed '/lib64' to '/lib' was also backported to r12 and r13 at the same time. On 2023/11/18 11:15 AM, Lulu Cheng wrote: Use no suffix at all in the musl dynamic linker name for hard float ABI. Use -sf and -sp suffixes in musl dynamic

Re: [pushed][PATCH v2] LoongArch: Add code generation support for call36 function calls.

2023-11-19 Thread chenglulu
On 2023/11/20 9:51 AM, Xi Ruoyao wrote: On Mon, 2023-11-20 at 09:09 +0800, chenglulu wrote: On 2023/11/19 1:24 AM, Xi Ruoyao wrote: On Sat, 2023-11-18 at 16:16 +0800, chenglulu wrote: Pushed to r14-5567. On 2023/11/16 3:27 PM, Lulu Cheng wrote: When compiling with '-mcmodel=medium', the function call

Re: [pushed][PATCH v2] LoongArch: Add code generation support for call36 function calls.

2023-11-19 Thread chenglulu
On 2023/11/19 1:24 AM, Xi Ruoyao wrote: On Sat, 2023-11-18 at 16:16 +0800, chenglulu wrote: Pushed to r14-5567. On 2023/11/16 3:27 PM, Lulu Cheng wrote: When compiling with '-mcmodel=medium', the function call is made through 'pcaddu18i+jirl' if binutils supports call36, otherwise the native

Re: [pushed][PATCH v2] LoongArch: Implement C[LT]Z_DEFINED_VALUE_AT_ZERO

2023-11-17 Thread chenglulu
Pushed to r14-5547. On 2023/11/17 10:38 AM, Li Wei wrote: The LoongArch has defined ctz and clz on the backend, but if we want GCC do CTZ transformation optimization in forwprop2 pass, GCC need to know the value of c[lt]z at zero, which may be beneficial for some test cases (like spec2017

Re: [PATCH] LoongArch: Handle vectorized copysign (x, -1) expansion efficiently

2023-11-17 Thread chenglulu
LGTM. Thanks. On 2023/11/14 4:07 AM, Xi Ruoyao wrote: With LSX or LASX, copysign (x[i], -1) (or any negative constant) can be vectorized using [x]vbitseti.{w/d} instructions to directly set the signbits. Inspired by Tamar Christina's "AArch64: Handle copysign (x, -1) expansion efficiently"
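
The observation can be modeled on one lane: for any constant c < 0, copysign(x, c) is just x with its sign bit forced to 1. The sketch below (hypothetical names, IEEE-754 binary32) does that with an OR, which corresponds to what [x]vbitseti.w with bit index 31 would do for every lane at once.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of copysign(x, c) for a constant c < 0:
   force the sign bit (bit 31) of x to 1. */
static float copysign_neg_const(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    u |= UINT32_C(0x80000000);
    memcpy(&x, &u, sizeof x);
    return x;
}

/* The loop shape from the patch description, with c = -1. */
static void copysign_minus_one(float *dst, const float *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = copysign_neg_const(src[i]);
}
```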

Re: [PATCH v1 1/3] LoongArch: Add LA664 support.

2023-11-17 Thread chenglulu
On 2023/11/17 8:31 PM, Xi Ruoyao wrote: On Fri, 2023-11-17 at 16:33 +0800, Lulu Cheng wrote: Define ISA_BASE_LA64V110, which represents the base instruction set defined in LoongArch1.1. Support the configure setting --with-arch=la664, and support -march=la664,-mtune=la664. gcc/ChangeLog:

Re: [PATCH 0/5] LoongArch: Initial LA664 support

2023-11-16 Thread chenglulu
On 2023/11/17 12:55 PM, Xi Ruoyao wrote: On Fri, 2023-11-17 at 10:41 +0800, chenglulu wrote: Hi, Thank you very much for the modification, but I think we need to support la664 with the configuration items of configure. I'll add it. I also defined ISA_BASE_LA64V110 to represent the LoongArch1.1

Re: [PATCH 0/5] LoongArch: Initial LA664 support

2023-11-16 Thread chenglulu
Hi, Thank you very much for the modification, but I think we need to support la664 with the configuration items of configure. I also defined ISA_BASE_LA64V110 to represent the LoongArch1.1 instruction set, what do you think? On 2023/11/16 9:18 PM, Xi Ruoyao wrote: Loongson 3A6000 processor

Re: [pushed][PATCH] LoongArch: Fix scan-assembler-times of lasx/lsx test case.

2023-11-16 Thread chenglulu
Pushed to r14-5544. On 2023/11/16 8:31 PM, Jiahao Xu wrote: These tests fail when they are first added; this patch adjusts the scan-assembler-times to fix them. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: Adjust assembler times. *

Re: [pushed][PATCH] LoongArch: Increase cost of vector aligned store/load.

2023-11-16 Thread chenglulu
Pushed to r14-5545. On 2023/11/16 4:44 PM, Jiahao Xu wrote: Based on SPEC2017 performance evaluation results, it's better to make them equal to the cost of unaligned store/load so as to avoid odd alignment peeling. gcc/ChangeLog: * config/loongarch/loongarch.cc

Re: [PATCH v1] LoongArch: Added code generation support for call36 function calls.

2023-11-14 Thread chenglulu
On 2023/11/14 5:55 PM, Xi Ruoyao wrote: On Tue, 2023-11-14 at 17:45 +0800, Lulu Cheng wrote: +  /* When function calls are made through call36, t0 register will be +implicitly modified, so '-fno-ipa-ra' needs to be set here.  */    case CMODEL_MEDIUM: + if

Re: Pushed: [PATCH v2] LoongArch: Use finer-grained DBAR hints

2023-11-14 Thread chenglulu
On 2023/11/14 4:34 PM, Xi Ruoyao wrote: On Tue, 2023-11-14 at 10:26 +0800, chenglulu wrote: Hi,  * Before calling this template, the function get_memmodel is called to process memmodel, which has a piece of code:    /* Workaround for Bugzilla 59448. GCC doesn't track consume properly

Re: [PATCH] LoongArch: Disable relaxation if the assembler don't support conditional branch relaxation [PR112330]

2023-11-14 Thread chenglulu
On 2023/11/14 4:50 PM, Xi Ruoyao wrote: Ping. I've tested this with Binutils 2.41 and 2.41.50.202311xx several times so it should be OK. On Mon, 2023-11-06 at 15:50 +0800, Xi Ruoyao wrote: /* snip */ Bootstrapped and regtested on loongarch64-linux-gnu twice: once with Binutils 2.41, another

Re: [PATCH] LoongArch: Use finer-grained DBAR hints

2023-11-13 Thread chenglulu
On 2023/11/14 7:18 AM, Xi Ruoyao wrote: /* snip */ (define_insn "mem_thread_fence_1" [(set (match_operand:BLK 0 "" "") (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER)) (match_operand:SI 1 "const_int_operand" "")] ;; model "" - "dbar\t0") + { +enum memmodel model =

Re: [PATCH v2] LoongArch: Optimize single-used address with -mexplicit-relocs=auto for fld/fst

2023-11-12 Thread chenglulu
On 2023/11/11 6:58 PM, Xi Ruoyao wrote: fld and fst have the same address mode as ld.w and st.w, so the same optimization as r14-4851 should be applied to them too. gcc/ChangeLog: * config/loongarch/loongarch.md (LD_AT_LEAST_32_BIT): New mode iterator. (ST_ANY): New mode

Re: [PATCH] LoongArch: Use simplify_gen_subreg instead of gen_rtx_SUBREG in loongarch_expand_vec_cond_mask_expr [PR112476]

2023-11-12 Thread chenglulu
On 2023/11/12 9:00 AM, Xi Ruoyao wrote: GCC internal says: 'subreg's of 'subreg's are not supported. Using 'simplify_gen_subreg' is the recommended way to avoid this problem. Unfortunately loongarch_expand_vec_cond_mask_expr might create nested subreg under certain circumstances,

Re: [PATCH] LoongArch: Optimize LSX vector shuffle on floating-point vector

2023-11-21 Thread chenglulu
On 2023/11/19 3:01 PM, Xi Ruoyao wrote: The vec_perm expander was wrongly defined. GCC internal says: Operand 3 is the “selector”. It is an integral mode vector of the same width and number of elements as mode M. With this mistake, the generic code manages to work around and it ends up creating
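To make the quoted rule concrete, here is a scalar reference model of the `vec_perm` semantics described in the GCC internals manual. It is a sketch under stated assumptions: the lane count `N = 4` and the function name `vec_perm_ref` are hypothetical, and the model is not LoongArch code. It shows the point the patch relies on. Even when the data vectors are floating point, the selector is an integer vector with the same number of lanes. The input elements are numbered 0..N-1 in operand 1 and N..2N-1 in operand 2, and each selector element is taken modulo 2N.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 4 /* hypothetical lane count, e.g. a 128-bit vector of floats */

/* Reference semantics of a vec_perm on float vectors: the selector is an
   integral vector (int32_t lanes) with the same width and number of
   elements as the float data mode. */
static void vec_perm_ref(float out[N], const float op1[N],
                         const float op2[N], const int32_t sel[N])
{
    /* out must not alias the inputs, or lanes read later would be clobbered */
    assert(out != op1 && out != op2);
    for (size_t i = 0; i < N; i++) {
        /* selector elements are interpreted modulo 2N */
        uint32_t idx = (uint32_t) sel[i] % (2 * N);
        out[i] = idx < N ? op1[idx] : op2[idx - N];
    }
}
```

With `op1 = {0, 1, 2, 3}`, `op2 = {10, 11, 12, 13}` and selector `{0, 4, 7, 9}`, the result is `{0, 10, 13, 1}`. Index 9 wraps to 1 because of the modulo-2N rule.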

Re: [PATCH] LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation

2024-03-06 Thread chenglulu
On 2024/3/7 12:05 PM, mengqinggang wrote: Hi, Thanks, this patch LGTM. I don't have a problem either. Thanks. On 2024/3/7 10:56 AM, Xi Ruoyao wrote: On Thu, 2024-03-07 at 10:43 +0800, mengqinggang wrote: Hi, Whether to add an option to control the generation of R_LARCH_RELAX, similar to as

Re: [PATCH] LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-18 Thread chenglulu
On 2024/3/18 5:34 PM, Xi Ruoyao wrote: We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named arguments and there is nothing to advance, but that is not the case for (...) functions returning by hidden reference which have one such artificial argument. This is causing

Re: [PATCH] LoongArch: Remove unused and incorrect "sge_" define_insn

2024-03-14 Thread chenglulu
On 2024/3/13 9:03 PM, Xi Ruoyao wrote: If this insn is really used, we'll have something like slti $r4,$r0,$r5 in the code. The assembler will reject it because slti wants 2 register operands and 1 immediate operand. But we've not got any bug report for this, indicating this define_insn is

Re:[pushed] [PATCH v2] LoongArch: Remove masking process for operand 3 of xvpermi.q.

2024-03-14 Thread chenglulu
Pushed to r14-9486. On 2024/3/14 9:26 AM, Chenghui Pan wrote: The behavior of non-zero unused bits in xvpermi.q instruction's third operand is undefined on LoongArch, according to our discussion (https://github.com/llvm/llvm-project/pull/83540), we think that keeping original insn operand as

Re: [pushed][PATCH v2 0/3] LoongArch: Cleanup unused/redundant codes.

2024-03-19 Thread chenglulu
Pushed to r14-9562...r14-9564. On 2024/3/15 9:30 AM, Chenghui Pan wrote: Changes from v1: Some corrections to the ChangeLog format. There are some unused/redundant definitions inside LoongArch target support codes, these patches make a simple cleanup. Regression test passed. Chenghui Pan (3):

Re: [PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-07 Thread chenglulu
On 2024/3/7 8:52 PM, Xi Ruoyao wrote: It should be better to extend the expected value before the ll/sc loop (like what LLVM does), instead of repeating the extending in each iteration. Something like: I wanted to do this at first, but it didn't work out. But then I thought about it, and there

Re: [PATCH v2] LoongArch: Add support for TLS descriptors

2024-03-07 Thread chenglulu
On 2024/3/1 5:39 PM, mengqinggang wrote: Thanks, I will try to send a new version of the patch next week. On 2024/2/29 2:08 PM, Xi Ruoyao wrote: On Thu, 2024-02-29 at 09:42 +0800, mengqinggang wrote: Generate la.tls.desc macro instruction for TLS descriptors model. la.tls.desc expand to    pcalau12i $a0,

Re:[pushed] [PATCH v1] LoongArch: testsuite:Fix problems with incorrect results in vector test cases.

2024-03-07 Thread chenglulu
Pushed to r14-9352. On 2024/3/6 4:54 PM, chenxiaolong wrote: In simd_correctness_check.h, the role of the macro ASSERTEQ_64 is to check the result of the passed vector values for the 64-bit data of each array element. It turns out that it uses the abs() function to check only the lower 32 bits of
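The abs() problem described here can be reproduced in isolation. The following is a minimal sketch, not the actual ASSERTEQ_64 macro, whose body is not shown here. The helpers `close_enough_buggy` and `close_enough_fixed` and the tolerance parameter are hypothetical. `abs()` takes an `int`, so a 64-bit difference is converted to `int` before the absolute value is computed. That conversion is implementation-defined. GCC reduces it modulo 2^32, so only the low 32 bits survive and a difference of exactly 2^32 looks like zero. Using `llabs()` on the full 64-bit difference is one possible fix.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical buggy check: the explicit (int) cast mirrors the implicit
   conversion that happens when an int64_t is passed to abs(int). With
   GCC this keeps only the low 32 bits of the difference. */
static int close_enough_buggy(int64_t a, int64_t b, int64_t tol)
{
    return abs((int) (a - b)) <= tol;
}

/* Hypothetical fixed check: llabs() operates on the full 64-bit value. */
static int close_enough_fixed(int64_t a, int64_t b, int64_t tol)
{
    assert(tol >= 0);
    return llabs((long long) (a - b)) <= tol;
}
```

For example, comparing `0x100000000` against `0` with a tolerance of `0` passes the buggy check but fails the fixed one.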

Re:[pushed] [PATCH] LoongArch: Use /lib instead of /lib64 as the library search path for MUSL.

2024-03-07 Thread chenglulu
Pushed to r14-9351. On 2024/3/6 9:19 AM, Yang Yujie wrote: gcc/ChangeLog: * config.gcc: Add a case for loongarch*-*-linux-musl*. * config/loongarch/linux.h: Disable the multilib-compatible treatment for *musl* targets. * config/loongarch/musl.h: New file. ---

Re: [PATCH] LoongArch: testsuite: Rewrite {x, }vfcmp-{d, f}.c to avoid named registers

2024-03-06 Thread chenglulu
This test case is so cleverly designed! I have no problem with it. Thank you! On 2024/3/5 9:00 PM, Xi Ruoyao wrote: Loops on named vector registers are not vectorized (see comment 11 of PR113622), so these test cases have been failing for a while. Rewrite them using check-function-bodies to remove hard

Re: [pushed][PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-08 Thread chenglulu
Pushed to r14-9407. On 2024/3/7 9:12 AM, Lulu Cheng wrote: If the hardware does not support LAMCAS, atomic_compare_and_swapsi needs to be implemented through "ll.w+sc.w". In the implementation of the instruction sequence, it is necessary to determine whether the two registers are equal. Since

Re: [pushed][PATCH] LoongArch: testsuite: Add compilation options to the regname-fp-s9.c.

2024-03-08 Thread chenglulu
Pushed to r14-9408. On 2024/3/7 9:50 AM, Lulu Cheng wrote: When the value of the macro DEFAULT_CFLAGS is set to '-ansi -pedantic-errors', the regname-s9-fp.c test will fail. To solve this problem, add the compilation options '-Wno-pedantic -std=gnu90' to this test case. gcc/testsuite/ChangeLog:
