Re: [x86 SSE] Improve handling of ternlog instructions in i386/sse.md (v3)

2024-06-07 Thread Hongtao Liu
2024-06-06 Roger Sayle > Hongtao Liu > > gcc/ChangeLog > * config/i386/i386-expand.cc (ix86_expand_args_builtin): Call > fixup_modeless_constant before testing predicates. Only call > copy_to_mode_reg on memory operands

Re: [Linaro-TCWG-CI] gcc-15-1022-gb05288d1f1e: FAIL: 1 regressions on master-thumb_m0_eabi

2024-06-06 Thread Hongtao Liu via Gcc-regression
Does r15-1050-gfcfce55c85f842ed843cbc4aabe744c6a004dead fix the failure? On Thu, Jun 6, 2024 at 10:06 PM ci_notify--- via Gcc-regression wrote: > > Dear contributor, our automatic CI has detected problems related to your > patch(es). Please find some details below. If you have any questions, >

[gcc r15-1088] Add additional option --param max-completely-peeled-insns=200 for power64*-*-*

2024-06-06 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:b24f2954dbc13d85e9fb62e05a88e9df21e4d4f4 commit r15-1088-gb24f2954dbc13d85e9fb62e05a88e9df21e4d4f4 Author: liuhongt Date: Fri Jun 7 09:29:24 2024 +0800 Add additional option --param max-completely-peeled-insns=200 for power64*-*-* gcc/testsuite/ChangeLog:

Re: [PATCH] [APX] Adjust target-support check [PR 115341]

2024-06-06 Thread Hongtao Liu
On Thu, Jun 6, 2024 at 2:39 PM Hongyu Wang wrote: > > Current target apxf check does not specify sub-features that assembler > supports, so the check with older binutils will fail at assemble stage > for new apx features like NF,CCMP or CFCMOV. Adjust the assembler check > for latest apx

[gcc r15-1050] Refine testcase for power10.

2024-06-05 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:fcfce55c85f842ed843cbc4aabe744c6a004dead commit r15-1050-gfcfce55c85f842ed843cbc4aabe744c6a004dead Author: liuhongt Date: Thu Jun 6 11:27:53 2024 +0800 Refine testcase for power10. For power10, there are 3 extra REG_EQUIV notes with (fix:SI. to avoid the

[gcc r15-1048] Adjust rtx_cost for MEM to enable more simplification

2024-06-05 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:961dd0d635217c703a38c48903981e0d60962546 commit r15-1048-g961dd0d635217c703a38c48903981e0d60962546 Author: liuhongt Date: Fri Apr 19 10:39:53 2024 +0800 Adjust rtx_cost for MEM to enable more simplification For CONST_VECTOR_DUPLICATE_P in constant_pool, it is

[gcc r15-1047] Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.

2024-06-05 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:7876cde25cbd2f026a0ae488e5263e72f8e9bfa0 commit r15-1047-g7876cde25cbd2f026a0ae488e5263e72f8e9bfa0 Author: liuhongt Date: Fri Apr 19 10:29:34 2024 +0800 Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode. When mask is (1 << (prec - imm)

Re: [V2 PATCH] Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.

2024-06-05 Thread Hongtao Liu
On Wed, Jun 5, 2024 at 10:44 PM Jeff Law wrote: > > > > On 6/4/24 10:22 PM, liuhongt wrote: > >> Can you add a testcase for this? I don't mind if it's x86 specific and > >> does a bit of asm scanning. > >> > >> Also note that the context for this patch has changed, so it won't > >> automatically

[gcc r15-1022] Don't simplify NAN/INF or out-of-range constant for FIX/UNSIGNED_FIX.

2024-06-04 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:b05288d1f1e4b632eddf8830b4369d4659f6c2ff commit r15-1022-gb05288d1f1e4b632eddf8830b4369d4659f6c2ff Author: liuhongt Date: Tue May 21 16:57:17 2024 +0800 Don't simplify NAN/INF or out-of-range constant for FIX/UNSIGNED_FIX. According to IEEE standard, for

[gcc r15-1003] Adjust testcase for -march=cascadelake

2024-06-03 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:4d207044195b97ecb27c72a7dc987eb8b86644a0 commit r15-1003-g4d207044195b97ecb27c72a7dc987eb8b86644a0 Author: liuhongt Date: Tue Jun 4 10:13:09 2024 +0800 Adjust testcase for -march=cascadelake gcc/testsuite/ChangeLog: PR target/115299

[gcc r15-984] Add some preference for floating point rtl ifcvt when sse4.1 is not available

2024-06-03 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:ac306de7d5100d3682eae2270995a9abbe19db38 commit r15-984-gac306de7d5100d3682eae2270995a9abbe19db38 Author: liuhongt Date: Fri May 31 14:38:07 2024 +0800 Add some preference for floating point rtl ifcvt when sse4.1 is not available W/o TARGET_SSE4_1, it takes

Re: [PATCH] Add AVX10.1 target_clones support

2024-06-03 Thread Hongtao Liu
On Wed, May 29, 2024 at 11:05 AM Haochen Jiang wrote: > > Hi all, > > Since AVX10 is the first major ISA introduced after AVX-512, we propose > to add target_clones support for it. > > Although AVX10.1-256 won't cover the 512-bit part of AVX512F, since > it is only for priority but not for

Re: [PATCH v3 1/8] [APX NF]: Support APX NF add

2024-06-02 Thread Hongtao Liu
On Wed, May 29, 2024 at 1:11 PM Kong, Lingling wrote: > > Hi, compared with v2, these patches restored the original lea pattern position > and addressed hongtao's comment. > > The APX NF (no flags) feature suppresses the update of status flags > for arithmetic operations. Ok for the patch

Re: [PATCH 3/3] [APX CCMP] Support ccmp for float compare

2024-05-30 Thread Hongtao Liu
On Wed, May 15, 2024 at 4:21 PM Hongyu Wang wrote: > > The ccmp insn itself doesn't support fp compare, but x86 has fp comi > insn that changes EFLAGS which can be the scc input to ccmp. Allow > scalar fp compare in ix86_gen_ccmp_first except ORDERED/UNORDERED > compare which cannot be identified

Re: [PATCH 1/3] [APX CCMP] Support APX CCMP

2024-05-30 Thread Hongtao Liu
On Wed, May 15, 2024 at 4:24 PM Hongyu Wang wrote: > > APX CCMP feature implements conditional compare which executes compare > when EFLAGS matches certain condition. > > CCMP introduces default flags value (dfv), when conditional compare does > not execute, it will directly set the flags

Re: [PATCH] i386: Optimize EQ/NE comparison between avx512 kmask and -1.

2024-05-30 Thread Hongtao Liu
On Tue, May 28, 2024 at 4:00 PM Hu, Lin1 wrote: > > Hi all, > > This patch aims to achieve EQ/NE comparison between avx512 kmask and -1 > by using kxortest with checking CF. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,-m64}. Ok for trunk? Ok. > > BRs, > Lin > > gcc/ChangeLog: > >
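Assuming the instruction meant here is KORTEST, the trick works because KORTEST ORs its two mask operands and sets CF when the result is all ones (and ZF when it is zero), so "kortestw k, k" answers "k == -1" in one instruction. A simplified scalar model for 16-bit masks, as a sketch; the helper names are made up and only CF and ZF are modeled:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of KORTESTW: only CF and ZF are modeled. */
static bool kortestw_cf (uint16_t k1, uint16_t k2)
{
  return (uint16_t) (k1 | k2) == 0xFFFF;   /* CF: OR of the masks is all ones */
}

static bool kortestw_zf (uint16_t k1, uint16_t k2)
{
  return (uint16_t) (k1 | k2) == 0;        /* ZF: OR of the masks is zero */
}

/* k == -1 (all 16 lanes set) maps to "kortestw k, k" followed by a CF test. */
static bool kmask_eq_minus_one (uint16_t k)
{
  return kortestw_cf (k, k);
}
```

NE against -1 is the same test with the carry condition inverted.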

Re: Question about generating vpmovzxbd instruction without using the interfaces in immintrin.h

2024-05-30 Thread Hongtao Liu via Gcc
On Fri, May 31, 2024 at 10:58 AM Hanke Zhang via Gcc wrote: > > Hi, > I've recently been trying to hand-write code to trigger automatic > vectorization optimizations in GCC on Intel x86 machines (without > using the interfaces in immintrin.h), but I'm running into a problem > where I can't seem
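For reference, the kind of plain C loop that the vectorizer can turn into a zero-extending load such as vpmovzxbd is a widening copy from unsigned bytes to 32-bit ints. This is a sketch: whether vpmovzxbd is actually emitted depends on the GCC version and flags (for example -O3 -mavx2), which is an assumption here rather than something stated in the snippet.

```c
#include <stddef.h>
#include <stdint.h>

/* Widen each byte to a 32-bit int. With -O3 and AVX2 enabled, GCC may
   vectorize this loop using vpmovzxbd; that is not guaranteed. */
void widen_u8_to_i32 (int32_t *restrict dst, const uint8_t *restrict src,
                      size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i];   /* implicit zero extension from uint8_t */
}
```

Compiling with "gcc -O3 -mavx2 -S" and reading the generated assembly is one way to see what the vectorizer chose.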

[gcc r15-932] Rename double_u with __double_u to avoid polluting the namespace.

2024-05-30 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:3a873c0a7bc8183de95a6103b507101a25eed413 commit r15-932-g3a873c0a7bc8183de95a6103b507101a25eed413 Author: liuhongt Date: Thu May 30 14:15:48 2024 +0800 Rename double_u with __double_u to avoid polluting the namespace. gcc/ChangeLog: *

Re: [PATCH 3/3 v2] vect: support direct conversion under x86-64-v3.

2024-05-30 Thread Hongtao Liu
On Wed, May 29, 2024 at 5:00 PM Hu, Lin1 wrote: > > According to hongtao's suggestion, I support some trunc in mmx.md under > x86-64-v3, and optimize ix86_expand_trunc_with_avx2_noavx512f. Ok. > > BRs, > Lin > > gcc/ChangeLog: > > PR 107432 > * config/i386/i386-expand.cc

[gcc r15-920] Support vcond_mask_qiqi and friends.

2024-05-30 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:b6c6d5abf0d31c936f50f8f9073c5e335b9e24b7 commit r15-920-gb6c6d5abf0d31c936f50f8f9073c5e335b9e24b7 Author: liuhongt Date: Wed Feb 28 11:17:10 2024 +0800 Support vcond_mask_qiqi and friends. gcc/ChangeLog: * config/i386/sse.md (vcond_mask_):

[gcc r15-919] Don't reduce estimated unrolled size for innermost loop.

2024-05-29 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:ef27b91b62c3aa8841c02665dffa8914c742fd37 commit r15-919-gef27b91b62c3aa8841c02665dffa8914c742fd37 Author: liuhongt Date: Tue Feb 27 15:34:57 2024 +0800 Don't reduce estimated unrolled size for innermost loop. For the innermost loop, after completely loop

Re: [PATCH 2/3 v2] vect: Support v4hi -> v4qi.

2024-05-29 Thread Hongtao Liu
On Wed, May 29, 2024 at 4:56 PM Hu, Lin1 wrote: > > Exclude add TARGET_MMX_WITH_SSE, I merge two patterns. Ok. > > BRs, > Lin > > gcc/ChangeLog: > > PR target/107432 > * config/i386/mmx.md > (VI2_32_64): New mode iterator. > (mmxhalfmode): New mode atter. > (mmxhalfmodelower):

Re: [PATCH] i386: Fix ix86_option override after change [PR 113719]

2024-05-29 Thread Hongtao Liu
On Thu, May 16, 2024 at 5:15 PM Hongyu Wang wrote: > > Richard Biener wrote on Thu, May 16, 2024 at 15:05: > > > > > On Thu, May 16, 2024 at 8:25 AM Hongyu Wang wrote: > > > > > > Hi, > > > > > > In ix86_override_options_after_change, calls to ix86_default_align > > > and ix86_recompute_optlev_based_flags

[gcc r15-882] Reduce cost of MEM (A + imm).

2024-05-28 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:1d6199e5f8c1c08083eeb0279f71333234fe14ad commit r15-882-g1d6199e5f8c1c08083eeb0279f71333234fe14ad Author: liuhongt Date: Mon Feb 19 13:57:24 2024 +0800 Reduce cost of MEM (A + imm). For MEM, rtx_cost iterates each subrtx, and adds up the costs, so for

[gcc r15-857] Fix predicate mismatch between vfcmaddcph's define_insn and define_expand.

2024-05-27 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:c65002347e595cda8b15e59e734d209283faf2b6 commit r15-857-gc65002347e595cda8b15e59e734d209283faf2b6 Author: liuhongt Date: Tue May 28 10:32:12 2024 +0800 Fix predicate mismatch between vfcmaddcph's define_insn and define_expand. When I applied Roger's patch

Re: [x86 SSE] Improve handling of ternlog instructions in i386/sse.md (v2)

2024-05-27 Thread Hongtao Liu
On Mon, May 27, 2024 at 2:48 PM Hongtao Liu wrote: > > On Sat, May 18, 2024 at 4:10 AM Roger Sayle > wrote: > > > > > > Hi Hongtao, > > Many thanks for the review, bug fixes and suggestions for improvements. > > This revised version of the pa

Re: [x86 SSE] Improve handling of ternlog instructions in i386/sse.md (v2)

2024-05-27 Thread Hongtao Liu
On Sat, May 18, 2024 at 4:10 AM Roger Sayle wrote: > > > Hi Hongtao, > Many thanks for the review, bug fixes and suggestions for improvements. > This revised version of the patch, implements all of your corrections. In > theory > the "ternlog idx" should guarantee that some operands are

Re: [x86 SSE] Improve handling of ternlog instructions in i386/sse.md (v2)

2024-05-27 Thread Hongtao Liu
On Tue, May 21, 2024 at 5:46 AM Alexander Monakov wrote: > > > Hello! > > I looked at ternlog a bit last year, so I'd like to offer some drive-by > comments. If you want to tackle them in a follow-up patch, or leave for > someone else to handle, please let me know. > > On Fri, 17 May 2024, Roger

Re: [PATCH 2/3] vect: Support v4hi -> v4qi.

2024-05-26 Thread Hongtao Liu
On Thu, May 23, 2024 at 2:38 PM Hu, Lin1 wrote: > > gcc/ChangeLog: > > PR target/107432 > * config/i386/mmx.md (truncv4hiv4qi2): New define_insn. > > gcc/testsuite/ChangeLog: > > PR target/107432 > * gcc.target/i386/pr107432-6.c: Add test. > --- > gcc/config/i386/mmx.md

Re: [PATCH 0/2] Align tight loops to solve cross cacheline issue

2024-05-26 Thread Hongtao Liu
On Mon, May 20, 2024 at 11:15 AM Hongtao Liu wrote: > > On Wed, May 15, 2024 at 11:30 AM Jiang, Haochen > wrote: > > > > Also cc Honza and Richard since we touched generic tune. > > > > Thx, > > Haochen > > > > > -Original Message-

[PATCH] x86: Fix Logical Shift Issue in expand_vec_perm_psrlw_psllw_por [PR115146]

2024-05-26 Thread Hongtao Liu
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652231.html Ok for this. -- BR, Hongtao

[gcc r15-814] Fix typo in the testcase.

2024-05-24 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:51f4b47c4f4f61fe31a7bd1fa80e08c2438d76a8 commit r15-814-g51f4b47c4f4f61fe31a7bd1fa80e08c2438d76a8 Author: liuhongt Date: Fri May 24 09:49:08 2024 +0800 Fix typo in the testcase. gcc/testsuite/ChangeLog: PR target/114148 *

Re: [PATCH 1/2] Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.

2024-05-23 Thread Hongtao Liu
CC for review. On Tue, May 21, 2024 at 1:12 PM liuhongt wrote: > > When mask is ((1 << (prec - imm)) - 1), which is used to clear upper bits > of A, then it can be simplified to LSHIFTRT. > > i.e. Simplify > (and:v8hi > (ashiftrt:v8hi A 8) > (const_vector 0xff x8)) > to > (lshiftrt:v8hi A 8) > >
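The identity can be checked lane by lane. For 16-bit lanes, an arithmetic shift right by imm followed by an AND with (1 << (16 - imm)) - 1 clears exactly the copies of the sign bit, which are the bits a logical shift leaves as zero. A sketch of an exhaustive check over all int16 values; it assumes GCC's behavior of `>>` on negative signed values (arithmetic shift), which ISO C leaves implementation-defined:

```c
#include <stdbool.h>
#include <stdint.h>

/* (and (ashiftrt a imm) mask) on one 16-bit lane. */
static uint16_t and_ashiftrt (int16_t a, unsigned imm)
{
  uint16_t mask = (uint16_t) ((1u << (16 - imm)) - 1);
  return (uint16_t) ((uint16_t) (a >> imm) & mask);
}

/* (lshiftrt a imm) on one 16-bit lane. */
static uint16_t lshiftrt (int16_t a, unsigned imm)
{
  return (uint16_t) ((uint16_t) a >> imm);
}

/* Exhaustively compare both forms for one shift count in 0..15. */
static bool simplification_holds (unsigned imm)
{
  for (int32_t v = INT16_MIN; v <= INT16_MAX; v++)
    if (and_ashiftrt ((int16_t) v, imm) != lshiftrt ((int16_t) v, imm))
      return false;
  return true;
}
```

With imm = 8 the mask is 0xff, matching the (const_vector 0xff x8) case above.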

Re: [PATCH 3/3] vect: support direct conversion under x86-64-v3.

2024-05-23 Thread Hongtao Liu
On Thu, May 23, 2024 at 3:17 PM Hu, Lin1 wrote: > > > -Original Message- > > From: Hongtao Liu > > Sent: Thursday, May 23, 2024 2:42 PM > > To: Hu, Lin1 > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; > > ubiz...@gmail.com; rguent...@suse.de >

Re: [PATCH 3/3] vect: support direct conversion under x86-64-v3.

2024-05-23 Thread Hongtao Liu
On Thu, May 23, 2024 at 2:38 PM Hu, Lin1 wrote: > > gcc/ChangeLog: > > PR 107432 > * config/i386/i386-expand.cc (ix86_expand_trunc_with_avx2_noavx512f): > New function to generate a series of suitable insns. > * config/i386/i386-protos.h

Re: [PATCH] Don't simplify NAN/INF or out-of-range constant for FIX/UNSIGNED_FIX.

2024-05-22 Thread Hongtao Liu
On Wed, May 22, 2024 at 3:59 PM Jakub Jelinek wrote: > > On Wed, May 22, 2024 at 09:46:41AM +0200, Richard Biener wrote: > > On Wed, May 22, 2024 at 3:58 AM liuhongt wrote: > > > > > > According to IEEE standard, for conversions from floating point to > > > integer. When a NaN or infinite
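In C terms, converting a NaN, an infinity, or a value whose truncated integer part does not fit the target type is undefined, so a constant folder has no single correct result to produce. A minimal sketch of the kind of guard this implies for double-to-int32 and double-to-uint32 folds; the helper names are hypothetical and this is not GCC's simplify-rtx code:

```c
#include <stdbool.h>

/* True when (int32_t) x is well defined: x truncates toward zero into
   [-2^31, 2^31 - 1]. NaN fails every comparison, and infinities fall
   outside the open interval, so both are rejected. */
static bool can_fold_fix_si (double x)
{
  return x > -2147483649.0 && x < 2147483648.0;
}

/* Same for (uint32_t) x: the truncated value must lie in [0, 2^32 - 1]. */
static bool can_fold_unsigned_fix_si (double x)
{
  return x > -1.0 && x < 4294967296.0;
}
```

Note that -0.5 is foldable for the unsigned case, because its integer part is 0.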

Re: [V2 PATCH] Don't reduce estimated unrolled size for innermost loop at cunrolli.

2024-05-22 Thread Hongtao Liu
On Wed, May 22, 2024 at 1:07 PM liuhongt wrote: > > >> Hard to find a default value satisfying all testcases. > >> some require loop unroll with 7 insns increment, some don't want loop > >> unroll w/ 5 insn increment. > >> The original 2/3 reduction happened to meet all those testcases(or the >

Re: [PATCH v2] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-21 Thread Hongtao Liu
On Tue, May 21, 2024 at 3:14 PM Haochen Jiang wrote: > > Hi all, > > This is the v2 patch to fix PR115069. The new testcase has passed. > > Changes in v2: > - Added a testcase. > - Change the comment for the early exit. > > Thx, > Haochen > > Since vpermq is really slow, we should avoid using

Re: [PATCH] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-21 Thread Hongtao Liu
On Tue, May 21, 2024 at 2:16 PM Haochen Jiang wrote: > > Hi all, > > Since vpermq is really slow, we should avoid using it when it is > the only instruction could be used for ix86_expand_vecop_qihi2. > > Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk? Please add a testcase for

[gcc r15-717] Use pblendw instead of pand to clear upper 16 bits.

2024-05-20 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:0ebaffccb294d90184ad78367de66b6307de3ac0 commit r15-717-g0ebaffccb294d90184ad78367de66b6307de3ac0 Author: liuhongt Date: Fri Mar 22 14:40:00 2024 +0800 Use pblendw instead of pand to clear upper 16 bits. For vec_pack_truncv8si/v4si w/o AVX512,
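Clearing the upper 16 bits of each 32-bit lane can be done either with pand against 0x0000ffff or with pblendw (SSE4.1) taking the odd-numbered 16-bit words from a zero vector, which needs no mask constant. A scalar model comparing both on a 128-bit vector seen as eight little-endian 16-bit words; the immediate 0xAA and the helper names are illustrative assumptions:

```c
#include <stdbool.h>
#include <stdint.h>

/* PBLENDW model: word j comes from b when bit j of imm is set, else from a. */
static void pblendw (uint16_t out[8], const uint16_t a[8],
                     const uint16_t b[8], uint8_t imm)
{
  for (int j = 0; j < 8; j++)
    out[j] = ((imm >> j) & 1) ? b[j] : a[j];
}

/* Compare "pand with 0x0000ffff per dword" to "pblendw with zero, imm 0xAA". */
static bool blend_matches_and (const uint16_t v[8])
{
  static const uint16_t zero[8] = {0};
  uint16_t blended[8];
  pblendw (blended, v, zero, 0xAA);   /* odd words = upper halves of dwords */
  for (int i = 0; i < 4; i++)
    {
      uint32_t dword = (uint32_t) v[2 * i] | ((uint32_t) v[2 * i + 1] << 16);
      uint32_t anded = dword & 0x0000FFFFu;
      uint32_t from_blend = (uint32_t) blended[2 * i]
                            | ((uint32_t) blended[2 * i + 1] << 16);
      if (anded != from_blend)
        return false;
    }
  return true;
}
```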

Re: [PATCH] Don't reduce estimated unrolled size for innermost loop.

2024-05-20 Thread Hongtao Liu
On Wed, May 15, 2024 at 5:24 PM Richard Biener wrote: > > On Wed, May 15, 2024 at 4:15 AM Hongtao Liu wrote: > > > > On Mon, May 13, 2024 at 3:40 PM Richard Biener > > wrote: > > > > > > On Mon, May 13, 2024 at 4:29 AM liuhongt wrote: > > >

Re: [PATCH 0/2] Align tight loops to solve cross cacheline issue

2024-05-19 Thread Hongtao Liu
On Wed, May 15, 2024 at 11:30 AM Jiang, Haochen wrote: > > Also cc Honza and Richard since we touched generic tune. > > Thx, > Haochen > > > -Original Message- > > From: Haochen Jiang > > Sent: Wednesday, May 15, 2024 11:04 AM > > To: gcc-patch

Re: [PATCH] i386: Rename sat_plusminus expanders to standard names [PR11260]

2024-05-19 Thread Hongtao Liu
On Fri, May 17, 2024 at 3:55 PM Uros Bizjak wrote: > > Rename _3 expander to a standard ssadd, > usadd, sssub and ussub name to enable corresponding optab expansion. > > Also add named expander for MMX modes. LGTM. > > PR middle-end/112600 > > gcc/ChangeLog: > > * config/i386/mmx.md (3):

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Hongtao Liu
> > > Sorry to chime in, for x86 backend, we defined usdot_prodv16hi, and > 2-way dot_prod operations can be generated > This is the link https://godbolt.org/z/hcWr64vx3, x86 defines udot_prodv16qi/udot_prod8hi and both 2-way and 4-way dot_prod instructions are generated -- BR, Hongtao

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Hongtao Liu
On Thu, May 16, 2024 at 10:40 PM Victor Do Nascimento wrote: > > From: Victor Do Nascimento > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct > optabs for dealing with vectorizable dot product code sequences. The > consequence of using a direct optab for this is that
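The source pattern behind these optabs is a widening multiply-accumulate reduction. The sketch shows such a loop together with a scalar model of one 4-way unsigned dot-product step, where each 32-bit accumulator lane adds four u8*u8 products; summing the lanes recovers the scalar result. The names are illustrative, and whether GCC maps the loop onto a dot-product instruction depends on the target and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduction the vectorizer can recognize as an unsigned dot product. */
uint32_t dot_u8 (const uint8_t *a, const uint8_t *b, size_t n)
{
  uint32_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += (uint32_t) a[i] * b[i];
  return sum;
}

/* Model of one 4-way step on 16 bytes: lane l accumulates the four
   products a[4l..4l+3] * b[4l..4l+3]. */
void udot4_step (uint32_t acc[4], const uint8_t a[16], const uint8_t b[16])
{
  for (int lane = 0; lane < 4; lane++)
    for (int k = 0; k < 4; k++)
      acc[lane] += (uint32_t) a[4 * lane + k] * b[4 * lane + k];
}
```

A 2-way variant would instead pair 16-bit inputs, two products per 32-bit lane.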

[gcc r15-530] Set d.one_operand_p to true when TARGET_SSSE3 in ix86_expand_vecop_qihi_partial.

2024-05-15 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:090714e6cf8029f4ff8883dce687200024adbaeb commit r15-530-g090714e6cf8029f4ff8883dce687200024adbaeb Author: liuhongt Date: Wed May 15 10:56:24 2024 +0800 Set d.one_operand_p to true when TARGET_SSSE3 in ix86_expand_vecop_qihi_partial. pshufb is available

[gcc r15-529] Optimize ashift >> 7 to vpcmpgtb for vector int8.

2024-05-15 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:0cc0956b3bb8bcbc9196075b9073a227d799e042 commit r15-529-g0cc0956b3bb8bcbc9196075b9073a227d799e042 Author: liuhongt Date: Tue May 14 18:39:54 2024 +0800 Optimize ashift >> 7 to vpcmpgtb for vector int8. Since there is no corresponding instruction, the shift
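The identity behind this: for a signed byte x, x >> 7 (arithmetic) is -1 when x is negative and 0 otherwise, which is exactly the per-byte result of pcmpgtb with a zero first operand (0 > x). SSE/AVX have no byte-granular arithmetic shift, hence the comparison. An exhaustive scalar check, assuming GCC's arithmetic `>>` on negative values:

```c
#include <stdbool.h>
#include <stdint.h>

/* One lane of an int8 arithmetic shift right by 7. */
static int8_t sar7 (int8_t x)
{
  return (int8_t) (x >> 7);
}

/* One lane of pcmpgtb: all ones when a > b (signed), else zero. */
static int8_t pcmpgtb_lane (int8_t a, int8_t b)
{
  return a > b ? -1 : 0;
}

/* Check sar7 (x) == pcmpgtb (0, x) for every int8 value. */
static bool sar7_is_pcmpgtb_against_zero (void)
{
  for (int v = INT8_MIN; v <= INT8_MAX; v++)
    if (sar7 ((int8_t) v) != pcmpgtb_lane (0, (int8_t) v))
      return false;
  return true;
}
```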

Re: [PATCH] Don't reduce estimated unrolled size for innermost loop.

2024-05-15 Thread Hongtao Liu
C -std=gnu++14 LP64 note (test for > > > > g++warnings, line 56) > > > > g++: g++.dg/warn/Warray-bounds-20.C -std=gnu++14 note (test for > > > > g++warnings, line 66) > > > > g++: g++.dg/warn/Warray-bounds-20.C -std=gnu++17 LP64 note (test for > > > > g++warnings, line 56) > > > > g++:

[gcc r15-499] x86: Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-14 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:a71f90c5a7ae2942083921033cb23dcd63e70525 commit r15-499-ga71f90c5a7ae2942083921033cb23dcd63e70525 Author: Levy Hsu Date: Thu May 9 16:50:56 2024 +0800 x86: Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563] Hi All
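A byte permutation that swaps the two bytes of every 16-bit word ({1,0,3,2,...,15,14} on V16QI) is one case a three-instruction shift sequence can cover: psrlw by 8, psllw by 8, then por. A scalar sketch checking that equivalence on a little-endian 128-bit vector; it is illustrative only and does not reproduce the selection logic in ix86_expand_vec_perm_const_1. The right shift has to be logical: an arithmetic shift would smear ones into the result whenever a high byte is 0x80 or above.

```c
#include <stdint.h>

/* Reference: permute bytes {1,0,3,2,...,15,14}. */
static void perm_swap_pairs (uint8_t out[16], const uint8_t in[16])
{
  for (int i = 0; i < 16; i += 2)
    {
      out[i] = in[i + 1];
      out[i + 1] = in[i];
    }
}

/* psrlw $8 / psllw $8 / por, modeled on eight 16-bit lanes. */
static void shift_or (uint8_t out[16], const uint8_t in[16])
{
  for (int i = 0; i < 8; i++)
    {
      uint16_t w = (uint16_t) (in[2 * i] | (in[2 * i + 1] << 8));
      uint16_t r = (uint16_t) ((w >> 8) | (w << 8));   /* logical shifts */
      out[2 * i] = (uint8_t) (r & 0xFF);
      out[2 * i + 1] = (uint8_t) (r >> 8);
    }
}
```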

Re: [PATCH] Don't reduce estimated unrolled size for innermost loop.

2024-05-14 Thread Hongtao Liu
On Mon, May 13, 2024 at 3:40 PM Richard Biener wrote: > > On Mon, May 13, 2024 at 4:29 AM liuhongt wrote: > > > > As the testcase in the PR shows, O3 cunrolli may prevent vectorization for the > > innermost loop and increase register pressure. > > The patch removes the 1/3 reduction of unr_insn for

Re: [x86 SSE] Improve handling of ternlog instructions in i386/sse.md

2024-05-14 Thread Hongtao Liu
On Mon, May 13, 2024 at 5:57 AM Roger Sayle wrote: > > > This patch improves the way that the x86 backend recognizes and > expands AVX512's bitwise ternary logic (vpternlog) instructions. I like the patch. 1 file changed, 25 insertions(+), 1 deletion(-) gcc/config/i386/i386-expand.cc | 26
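For context, VPTERNLOG evaluates an arbitrary three-input bitwise function: at each bit position the bits of the three operands form a 3-bit index (first operand as bit 2, second as bit 1, third as bit 0) that selects one bit of the 8-bit immediate. The immediate for a given function is therefore the function applied to the constants 0xF0, 0xCC and 0xAA. A scalar sketch on 32-bit words; the helper name is made up:

```c
#include <stdint.h>

/* Bitwise ternary logic: imm8 is the truth table indexed by (a, b, c). */
static uint32_t ternlog32 (uint32_t a, uint32_t b, uint32_t c, uint8_t imm)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    {
      unsigned idx = (((a >> i) & 1u) << 2) | (((b >> i) & 1u) << 1)
                     | ((c >> i) & 1u);
      r |= (uint32_t) ((imm >> idx) & 1u) << i;
    }
  return r;
}

/* Immediates derived from f (0xF0, 0xCC, 0xAA):
     a ^ b ^ c              -> 0x96
     a ? b : c (bitwise)    -> 0xCA
     majority (a, b, c)     -> 0xE8  */
```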

Re: [PATCH v2 3/3] pretty-print: Don't translate escape sequences to windows console API

2024-05-12 Thread LIU Hao
s, translate + * them as needed. + */ nitpicking: This should probably be + * them as needed. */ CC'ing Jonathan Yong. This series of patches look good to me. -- Best regards, LIU Hao

Re: [x86 PATCH] Improve V[48]QI shifts on AVX512

2024-05-10 Thread Hongtao Liu
ld also fix this mem operand > issue. I hope to submit it for review this weekend. I opened a PR for that. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021 > > Thanks again, > Roger > > > From: Hongtao Liu > > On Fri, May 10, 2024 at 6:26 AM Roger Sayle > > w

Re: [x86 PATCH] Improve V[48]QI shifts on AVX512

2024-05-09 Thread Hongtao Liu
On Fri, May 10, 2024 at 6:26 AM Roger Sayle wrote: > > > The following one-line patch improves the code generated for V8QI and V4QI > shifts when AVX512BW and AVX512VL functionality is available. + /* With AVX512 it's cheaper to do vpmovsxbw/op/vpmovwb. */ + && !(TARGET_AVX512BW &&

Re: [PATCH 1/3] diagnostics: Enable escape sequence processing on windows consoles

2024-05-09 Thread LIU Hao
ode()` be called only when `handle` is valid? I think you may initialize `isconsole` to `false`; then only if the handle is valid, should it be set accordingly; and this function just returns `isconsole`. The other two patches look good to me. -- Best regards, LIU Hao

Re: [PATCH] i386: Fix some intrinsics without alignment requirements.

2024-05-08 Thread Hongtao Liu
On Wed, May 8, 2024 at 10:13 AM Hu, Lin1 wrote: > > Hi all, > > This patch aims to fix some intrinsics that have no alignment requirement > but raised runtime errors. > > Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? Ok. > > BRs, > Lin > > gcc/ChangeLog: > > PR

Re: [PATCH] i386: fix ix86_hardreg_mov_ok with lra_in_progress

2024-05-07 Thread Hongtao Liu
On Mon, May 6, 2024 at 3:40 PM Kong, Lingling wrote: > > Hi, > Originally eliminate_regs_in_insn will transform > (parallel [ > (set (reg:QI 130) > (plus:QI (subreg:QI (reg:DI 19 frame) 0) > (const_int 96))) > (clobber (reg:CC 17 flag))]) {*addqi_1} > to > (set (reg:QI 130) >

[gcc r15-234] Optimize 64-bit vector permutation with punpcklqdq + 128-bit vector pshuf.

2024-05-07 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:a9f642783853b60bb0a59562b8ab3ed10ec01641 commit r15-234-ga9f642783853b60bb0a59562b8ab3ed10ec01641 Author: liuhongt Date: Wed Dec 20 11:54:43 2023 +0800 Optimize 64-bit vector permutation with punpcklqdq + 128-bit vector pshuf. gcc/ChangeLog:

[gcc r15-236] Extend usdot_prodv*qi with vpmaddwd when AVXVNNI/AVX512VNNI is not available.

2024-05-07 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:8b974f54393ab2d2d16a0051a68c155455a92aad commit r15-236-g8b974f54393ab2d2d16a0051a68c155455a92aad Author: liuhongt Date: Mon Jan 8 15:13:41 2024 +0800 Extend usdot_prodv*qi with vpmaddwd when AVXVNNI/AVX512VNNI is not available. gcc/ChangeLog:
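Without VNNI, an unsigned-by-signed byte dot product can be built from widening plus vpmaddwd, which multiplies pairs of signed 16-bit elements and adds adjacent products into 32-bit lanes. Since u8 and s8 values fit in int16 after zero- and sign-extension respectively, the products are exact. A scalar sketch of one 32-bit accumulator lane; this is one possible decomposition under these assumptions, not necessarily the exact sequence the commit emits:

```c
#include <stdint.h>

/* One 32-bit lane of pmaddwd: two signed 16x16 products, summed. */
static int32_t pmaddwd_lane (int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
  return (int32_t) a0 * b0 + (int32_t) a1 * b1;
}

/* Reference: four u8 * s8 products accumulated into one int32 lane. */
static int32_t usdot4_ref (const uint8_t u[4], const int8_t s[4])
{
  int32_t sum = 0;
  for (int k = 0; k < 4; k++)
    sum += (int32_t) u[k] * s[k];
  return sum;
}

/* Same lane after widening (zero-extend u, sign-extend s): two pmaddwd
   lanes whose results are then added. */
static int32_t usdot4_via_pmaddwd (const uint8_t u[4], const int8_t s[4])
{
  return pmaddwd_lane ((int16_t) u[0], (int16_t) u[1], s[0], s[1])
         + pmaddwd_lane ((int16_t) u[2], (int16_t) u[3], s[2], s[3]);
}
```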

[gcc r15-235] Support dot_prod optabs for 64-bit vector.

2024-05-07 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:fa911365490a7ca308878517a4af6189ffba7ed6 commit r15-235-gfa911365490a7ca308878517a4af6189ffba7ed6 Author: liuhongt Date: Wed Dec 20 11:43:25 2023 +0800 Support dot_prod optabs for 64-bit vector. gcc/ChangeLog: PR target/113079 *

Re: [PATCH] x86: Fix cmov cost model issue [PR109549]

2024-05-05 Thread Hongtao Liu
CC uros. On Mon, May 6, 2024 at 11:03 AM Kong, Lingling wrote: > > Hi, > (if_then_else:SI (eq (reg:CCZ 17 flags) > (const_int 0 [0])) > (reg/v:SI 101 [ e ]) > (reg:SI 102)) > The cost is 8 for the rtx, the cost for > (eq (reg:CCZ 17 flags) (const_int 0 [0])) is 4, but this is

[gcc r15-167] Update libbid according to the latest Intel Decimal Floating-Point Math Library.

2024-05-05 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:affd77d3fe7bfb525b3fb23316d164e847ed02d1 commit r15-167-gaffd77d3fe7bfb525b3fb23316d164e847ed02d1 Author: liuhongt Date: Wed Mar 27 08:20:13 2024 +0800 Update libbid according to the latest Intel Decimal Floating-Point Math Library. The Intel Decimal

Re: [PATCH] Don't assert for IFN_COND_{MIN, MAX} in vect_transform_reduction

2024-04-30 Thread Hongtao Liu
On Tue, Apr 30, 2024 at 3:38 PM Jakub Jelinek wrote: > > On Tue, Apr 30, 2024 at 09:30:00AM +0200, Richard Biener wrote: > > On Mon, Apr 29, 2024 at 5:30 PM H.J. Lu wrote: > > > > > > On Mon, Apr 29, 2024 at 6:47 AM liuhongt wrote: > > > > > > > > The Fortran standard does not specify what the

[gcc r15-22] Adjust alternative *k to ?k for avx512 mask in zero_extend patterns

2024-04-28 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:c19a674d03847b900919b97d0957c8ae5164f8f1 commit r15-22-gc19a674d03847b900919b97d0957c8ae5164f8f1 Author: liuhongt Date: Tue Apr 16 08:37:22 2024 +0800 Adjust alternative *k to ?k for avx512 mask in zero_extend patterns So when both source operand and dest

Re: [PATCH] Fix rust on *-w64-mingw32

2024-04-27 Thread LIU Hao via Gcc
Attached is an alternative patch to functionalize `load_macros_array`. It allows GCC to build on x86_64-w64-mingw32. Not tested though, as I know no Rust. As before, please edit the patch at your disposal. -- Best regards, LIU Hao diff --git a/gcc/rust/checks/errors/borrowck/rust-borrow

Re: [PATCH] Fix rust on *-w64-mingw32

2024-04-27 Thread LIU Hao
Attached is an alternative patch to functionalize `load_macros_array`. It allows GCC to build on x86_64-w64-mingw32. Not tested though, as I know no Rust. As before, please edit the patch at your disposal. -- Best regards, LIU Hao diff --git a/gcc/rust/checks/errors/borrowck/rust-borrow

Re: [PATCH] Fix rust on *-w64-mingw32

2024-04-26 Thread LIU Hao via Gcc
Rust. -- Best regards, LIU Hao

[PATCH] Fix rust on *-w64-mingw32

2024-04-25 Thread LIU Hao via Gcc
Hello, Attached is a patch for fixing build issues on *-w64-mingw32. Please check and update at your leisure. 'gcc/system.h' contains a macro called `mkdir()` and there is no need to invoke `_mkdir()` within a conditional block. -- Best regards, LIU Hao diff --git a/gcc/rust/checks/errors

Re: [PATCH] i386: Fix behavior for both using AVX10.1-256 in options and function attribute

2024-04-24 Thread Hongtao Liu
On Wed, Apr 24, 2024 at 1:46 PM Haochen Jiang wrote: > > Hi all, > > When we are using -mavx10.1-256 in command line and avx10.1-256 in > target attribute together, zmm should never be generated. But current > GCC will generate zmm since it wrongly enables EVEX512 for non-explicitly > set AVX512.

Re: [PATCH] x86: Allow TImode offsettable memory only with 8-bit constant

2024-04-14 Thread Hongtao Liu
On Sat, Apr 13, 2024 at 6:42 AM H.J. Lu wrote: > > The x86 instruction size limit is 15 bytes. If a NDD instruction has > a segment prefix byte, a 4-byte opcode prefix, a MODRM byte, a SIB byte, > a 4-byte displacement and a 4-byte immediate, adding an address size > prefix will exceed the size

RE: [PATCH] asan, v3: Fix up handling of > 32 byte aligned variables with -fsanitize=address -fstack-protector* [PR110027]

2024-04-11 Thread Liu, Hongtao
> -Original Message- > From: Jakub Jelinek > Sent: Thursday, April 11, 2024 4:39 PM > To: Richard Biener ; Jeff Law ; > Liu, Hongtao > Cc: gcc-patches@gcc.gnu.org > Subject: [PATCH] asan, v3: Fix up handling of > 32 byte aligned variables > with -

Re: [PATCH] Prohibit SHA/KEYLOCKER usage of EGPR when APX enabled

2024-04-09 Thread Hongtao Liu
On Tue, Apr 9, 2024 at 3:05 PM Hongyu Wang wrote: > > The latest APX spec announced removal of SHA/KEYLOCKER evex promotion [1], > which means the SHA/KEYLOCKER insn does not support EGPR when APX > enabled. Update the corresponding constraints to their EGPR-disabled > counterparts. > >

Re: [PATCH] i386, v2: Fix aes/vaes patterns [PR114576]

2024-04-09 Thread Hongtao Liu
On Tue, Apr 9, 2024 at 5:18 PM Jakub Jelinek wrote: > > On Tue, Apr 09, 2024 at 11:23:40AM +0800, Hongtao Liu wrote: > > I think we can merge alternative 2 with 3 to > > * return TARGET_AES ? \"vaesenc\t{%2, %1, %0|%0, %1, %2}\" : \"

Re: [PATCH] i386: Fix aes/vaes patterns [PR114576]

2024-04-08 Thread Hongtao Liu
On Thu, Apr 4, 2024 at 4:42 PM Jakub Jelinek wrote: > > On Wed, Apr 19, 2023 at 02:40:59AM +, Jiang, Haochen via Gcc-patches > wrote: > > > > (define_insn "aesenc" > > > > - [(set (match_operand:V2DI 0 "register_operand" "=x,x") > > > > - (unspec:V2DI [(match_operand:V2DI 1

Re: [PATCH v2] x86: Define __APX_INLINE_ASM_USE_GPR32__

2024-04-08 Thread Hongtao Liu
On Tue, Apr 9, 2024 at 9:58 AM H.J. Lu wrote: > > Define __APX_INLINE_ASM_USE_GPR32__ for -mapx-inline-asm-use-gpr32. > When __APX_INLINE_ASM_USE_GPR32__ is defined, inline asm statements > should contain only instructions compatible with r16-r31. Ok. > > gcc/ > > PR target/114587 >
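The macro lets code containing hand-written inline asm check at compile time whether -mapx-inline-asm-use-gpr32 is in effect. A minimal sketch; on compilers that do not define the macro, the fallback branch is taken:

```c
/* Reports, at compile time, whether inline asm in this translation unit
   may reference the APX extended GPRs r16-r31. */
static const char *inline_asm_gpr_policy (void)
{
#ifdef __APX_INLINE_ASM_USE_GPR32__
  return "r16-r31 allowed in inline asm";
#else
  return "only legacy GPRs in inline asm";
#endif
}
```

In real code the two branches would select different asm templates rather than strings.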

Re: [PATCH] x86: Define macros for APX options

2024-04-08 Thread Hongtao Liu
On Mon, Apr 8, 2024 at 11:44 PM H.J. Lu wrote: > > Define following macros for APX options: > > 1. __APX_EGPR__: -mapx-features=egpr. > 2. __APX_PUSH2POP2__: -mapx-features=push2pop2. > 3. __APX_NDD__: -mapx-features=ndd. > 4. __APX_PPX__: -mapx-features=ppx. For -mapx-features=, we haven't

Re: [PATCH] sanitizer: [PR110027] Align asan_vec[0] to MAX (alignb, ASAN_RED_ZONE_SIZE)

2024-03-25 Thread Hongtao Liu
On Tue, Mar 26, 2024 at 11:26 AM Hongtao Liu wrote: > > On Mon, Mar 25, 2024 at 8:51 PM Jakub Jelinek wrote: > > > > On Tue, Mar 12, 2024 at 07:57:59PM +0800, liuhongt wrote: > > > if alignb > ASAN_RED_ZONE_SIZE and offset[0] is not a multiple of > > > alig

Re: [PATCH] sanitizer: [PR110027] Align asan_vec[0] to MAX (alignb, ASAN_RED_ZONE_SIZE)

2024-03-25 Thread Hongtao Liu
On Mon, Mar 25, 2024 at 8:51 PM Jakub Jelinek wrote: > > On Tue, Mar 12, 2024 at 07:57:59PM +0800, liuhongt wrote: > > if alignb > ASAN_RED_ZONE_SIZE and offset[0] is not a multiple of > > alignb, (base_align_bias - base_offset) may not be aligned to alignb, which > > caused a segmentation fault. > > > >
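The MAX (alignb, ASAN_RED_ZONE_SIZE) in the subject amounts to rounding an offset up to the larger of the two alignments. The arithmetic as a sketch, with hypothetical helper names, assuming a 32-byte red zone (GCC's ASAN_RED_ZONE_SIZE) and power-of-two alignments:

```c
#include <stdint.h>

/* Assumption: 32-byte red zone, matching GCC's ASAN_RED_ZONE_SIZE. */
#define RED_ZONE_SIZE 32u

/* Round off up to a multiple of align (align must be a power of two). */
static uint64_t align_up (uint64_t off, uint64_t align)
{
  return (off + align - 1) & ~(align - 1);
}

/* Hypothetical helper: start offset for a variable with alignment alignb. */
static uint64_t asan_var_start (uint64_t off, uint64_t alignb)
{
  uint64_t align = alignb > RED_ZONE_SIZE ? alignb : RED_ZONE_SIZE;
  return align_up (off, align);
}
```

For example, offset 96 with alignb = 64 becomes 128, whereas rounding only to the 32-byte red zone would leave it at 96, which is not 64-byte aligned.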

[gcc r13-8488] Move pr114396.c from gcc.target/i386 to gcc.c-torture/execute.

2024-03-21 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:e6a3d1f5bcfd954b614155d96c97bde8ac230e2e commit r13-8488-ge6a3d1f5bcfd954b614155d96c97bde8ac230e2e Author: liuhongt Date: Fri Mar 22 10:09:43 2024 +0800 Move pr114396.c from gcc.target/i386 to gcc.c-torture/execute. Also fixed a typo in the testcase.

[gcc r14-9603] Move pr114396.c from gcc.target/i386 to gcc.c-torture/execute.

2024-03-21 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:9a6c7aa1b011b77fcd9b19f7b8d7ff0fc823cdb2 commit r14-9603-g9a6c7aa1b011b77fcd9b19f7b8d7ff0fc823cdb2 Author: liuhongt Date: Fri Mar 22 10:09:43 2024 +0800 Move pr114396.c from gcc.target/i386 to gcc.c-torture/execute. Also fixed a typo in the testcase.

[gcc r13-8475] Fix runtime error for nonlinear iv vectorization(step_mult).

2024-03-21 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:199b021a38f30b681e0dbecd2d0296beabd50b13 commit r13-8475-g199b021a38f30b681e0dbecd2d0296beabd50b13 Author: liuhongt Date: Thu Mar 21 13:15:23 2024 +0800 Fix runtime error for nonlinear iv vectorization(step_mult). wi::from_mpz doesn't take a sign argument,

[gcc r14-9591] Fix runtime error for nonlinear iv vectorization(step_mult).

2024-03-21 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:ac2f8c2a367151fc0410f904339c475a953cffc8 commit r14-9591-gac2f8c2a367151fc0410f904339c475a953cffc8 Author: liuhongt Date: Thu Mar 21 13:15:23 2024 +0800 Fix runtime error for nonlinear iv vectorization(step_mult). wi::from_mpz doesn't take a sign argument,

Re: Builtin for consulting value analysis (better ffs() code gen)

2024-03-21 Thread LIU Hao via Gcc
nch below clearly eliminates the dependency.
    }
    else
    {
        // The architects say this is safe even for 0.
        res = -1;
        asm("bsf %1, %0" : "+r"(res) : "rm"(x));
    }
    return res + 1;
}
-- Best regards, LIU Hao
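For comparison, a portable formulation without inline asm: ffs(x) is 0 for x == 0 and otherwise one plus the number of trailing zero bits. __builtin_ctz is undefined for 0, hence the explicit check; both GCC and Clang provide __builtin_ctz and __builtin_ffs. Unlike the asm above, this sketch does not rely on BSF leaving its destination unchanged for a zero source.

```c
/* ffs via count-trailing-zeros; the zero case is handled explicitly
   because __builtin_ctz (0) is undefined. */
static int my_ffs (unsigned int x)
{
  return x ? __builtin_ctz (x) + 1 : 0;
}
```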

[gcc r14-9588] Document -fexcess-precision=16.

2024-03-20 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:415091f09096a0ebba1fdcd4af8c2fda24cfd411 commit r14-9588-g415091f09096a0ebba1fdcd4af8c2fda24cfd411 Author: liuhongt Date: Mon Mar 18 18:53:59 2024 +0800 Document -fexcess-precision=16. gcc/ChangeLog: PR middle-end/114347 *

Re: [PATCH] testsuite: add the case to cover vectorization of A[(i+x)*stride] [PR114322]

2024-03-20 Thread Hao Liu OS
> So - OK with using { target vect_int } instead. Sure, it's much better to be target independent. Refactored and committed in r14-9569-g4c276896 Thanks, - Hao From: Richard Biener Sent: Wednesday, March 20, 2024 16:21 To: Hao Liu OS Cc: GCC-patc

[gcc r14-9569] testsuite: add the case to cover the vectorization of A[(i+x)*stride] [PR114322]

2024-03-20 Thread Hao Liu via Gcc-cvs
https://gcc.gnu.org/g:4c276896d646c2dbc8047fd81d6e65f8c5ecf01d commit r14-9569-g4c276896d646c2dbc8047fd81d6e65f8c5ecf01d Author: Hao Liu Date: Wed Mar 20 17:37:01 2024 +0800 testsuite: add the case to cover the vectorization of A[(i+x)*stride] [PR114322] This issue has been

[PATCH] testsuite: add the case to cover vectorization of A[(i+x)*stride] [PR114322]

2024-03-20 Thread Hao Liu OS
Hi Richard, As mentioned in the comments of PR114322 (which has been fixed by PR114151 r14-9540-ge0e9499a), this patch is to cover the case. Bootstrapped and regression tested on aarch64-linux-gnu, OK for trunk? gcc/testsuite/ChangeLog: PR tree-optimization/114322 *
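To make the access pattern concrete, the loop below is a hypothetical example of the shape named in the subject, `A[(i+x)*stride]`, where `x` and `stride` are loop-invariant. It is not the committed testcase. `scale_strided` is an invented name, and `int` elements are used in the spirit of the `vect_int` target selector mentioned in the review.

```c
/* Hypothetical loop with the A[(i + x) * stride] access shape:
   the index is an affine function of i with invariant x and stride. */
void scale_strided(int *A, int n, int x, int stride)
{
    for (int i = 0; i < n; i++)
        A[(i + x) * stride] *= 2;
}
```

With `n = 3`, `x = 1`, `stride = 3`, the loop touches `A[3]`, `A[6]` and `A[9]`.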

Re: [PATCH] Document -fexcess-precision=16.

2024-03-18 Thread Hongtao Liu
On Tue, Mar 19, 2024 at 12:16 AM Joseph Myers wrote: > > On Mon, 18 Mar 2024, liuhongt wrote: > > > +If @option{-fexcess-precision=16} is specified, casts and assignments of > > +@code{_Float16} and @code{bfloat16_t} cause value to be rounded to their > > +semantic types if they're supported by

Re: [PATCH] i386 [stv]: Handle REG_EH_REGION note [pr111822].

2024-03-18 Thread Hongtao Liu
On Mon, Mar 18, 2024 at 6:59 PM Uros Bizjak wrote: > > On Mon, Mar 18, 2024 at 11:52 AM liuhongt wrote: > > > > Commit r14-9459-g618e34d56cc38e only handles > > general_scalar_chain::convert_op. The patch also handles > > timode_scalar_chain::convert_op to avoid potential similar bug. > > > >

[gcc r14-9512] Add missing hf/bf patterns.

2024-03-17 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:942d470a5a4fb1baeff943127a81b441dffaa543 commit r14-9512-g942d470a5a4fb1baeff943127a81b441dffaa543 Author: liuhongt Date: Fri Mar 15 10:59:10 2024 +0800 Add missing hf/bf patterns. It will be used by copysignm3/xorsignm3/lroundmn2 expanders.

Re: [PATCH] vect: Use xor to invert oversized vector masks

2024-03-14 Thread Hongtao Liu
On Thu, Mar 14, 2024 at 11:42 PM Andrew Stubbs wrote: > > Don't enable excess lanes when inverting vector bit-masks smaller than the > integer mode. This is yet another case of wrong-code due to mishandling > of oversized bitmasks. > > This issue shows up in vect/tsvc/vect-tsvc-s278.c and >
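The bug class described here can be shown with plain integers. Suppose a mask for 4 lanes is held in an 8-bit integer. A bitwise NOT also sets the 4 excess bits, while XOR with the all-lanes mask flips only the real lanes. The sketch below is a scalar analogy under that assumption. The function names are hypothetical, and this is not the vectorizer code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar analogy: a mask for nlanes lanes stored in an 8-bit integer.
   Bits at positions >= nlanes are "excess" and must stay zero. */

/* Naive inversion: ~ flips all 8 bits, so excess lanes become set. */
uint8_t invert_mask_not(uint8_t mask)
{
    return (uint8_t)~mask;
}

/* Inversion by XOR with the all-lanes mask: only the low nlanes bits flip. */
uint8_t invert_mask_xor(uint8_t mask, unsigned nlanes)
{
    assert(nlanes >= 1 && nlanes <= 8);
    uint8_t lanes = (uint8_t)((1u << nlanes) - 1u);
    return (uint8_t)(mask ^ lanes);
}
```

For the 4-lane mask `0x05`, `invert_mask_not` yields `0xFA` (excess lanes enabled), whereas `invert_mask_xor(0x05, 4)` yields `0x0A`.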

Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Hongtao Liu
On Thu, Mar 14, 2024 at 10:46 PM Uros Bizjak wrote: > > On Thu, Mar 14, 2024 at 8:42 AM Uros Bizjak wrote: > > > > On Thu, Mar 14, 2024 at 8:32 AM Hongtao Liu wrote: > > > > > > On Thu, Mar 14, 2024 at 3:22 PM Uros Bizjak wrote: > > > > >

[gcc r12-10214] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:a861f940efffae2782c559cd04df2d2740cd28bd commit r12-10214-ga861f940efffae2782c559cd04df2d2740cd28bd Author: liuhongt Date: Wed Mar 13 10:40:01 2024 +0800 i386[stv]: Handle REG_EH_REGION note When we split (insn 37 36 38 10 (set (reg:DI 104 [ _18 ])

[gcc r13-8438] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:bdbcfbfcf591381f0faf95c881e3772b56d0a404 commit r13-8438-gbdbcfbfcf591381f0faf95c881e3772b56d0a404 Author: liuhongt Date: Wed Mar 13 10:40:01 2024 +0800 i386[stv]: Handle REG_EH_REGION note When we split (insn 37 36 38 10 (set (reg:DI 104 [ _18 ])

[gcc r14-9459] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:618e34d56cc38e9c3ae95a413228068e53ed76bb commit r14-9459-g618e34d56cc38e9c3ae95a413228068e53ed76bb Author: liuhongt Date: Wed Mar 13 10:40:01 2024 +0800 i386[stv]: Handle REG_EH_REGION note When we split (insn 37 36 38 10 (set (reg:DI 104 [ _18 ])

Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Hongtao Liu
On Thu, Mar 14, 2024 at 3:22 PM Uros Bizjak wrote: > > On Thu, Mar 14, 2024 at 2:33 AM liuhongt wrote: > > > > When we split > > (insn 37 36 38 10 (set (reg:DI 104 [ _18 ]) > > (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 MEM[(struct > > SQRefCounted

Re: [PATCH] sanitizer: [PR110027] Align asan_vec[0] to MAX (alignb, ASAN_RED_ZONE_SIZE)

2024-03-12 Thread Hongtao Liu
On Tue, Mar 12, 2024 at 8:00 PM liuhongt wrote: > > if alignb > ASAN_RED_ZONE_SIZE and offset[0] is not a multiple of > alignb, (base_align_bias - base_offset) may not be aligned to alignb, and > caused a segmentation fault. > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > Ok for trunk and

Re: [PATCH v2 00/13] Add aarch64-w64-mingw32 target

2024-03-06 Thread LIU Hao
May I suggest you keep the mcf thread model for aarch64-w64-mingw32? I asked Martin Storsjö to test it on a physical Windows 11 on ARM machine with Clang and all tests passed. I think it should work once the GCC support is complete. -- Best regards, LIU Hao

Re: [PATCH] i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

2024-03-04 Thread Hongtao Liu
On Thu, Feb 29, 2024 at 2:20 PM Hongtao Liu wrote: > > On Wed, Feb 28, 2024 at 4:54 PM Jakub Jelinek wrote: > > > > Hi! > > > > Adding Hongtao and Honza into the loop as the ones who acked the original > > patch. > > > > The no_callee_saved_regist

Re: [PATCH] i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

2024-02-28 Thread Hongtao Liu
On Wed, Feb 28, 2024 at 4:54 PM Jakub Jelinek wrote: > > Hi! > > Adding Hongtao and Honza into the loop as the ones who acked the original > patch. > > The no_callee_saved_registers by default for noreturn functions change can > break in-process backtrace(3) or backtraces from debugger or other
