Re: [PATCH] Don't assert for IFN_COND_{MIN, MAX} in vect_transform_reduction

2024-04-30 Thread Hongtao Liu
On Tue, Apr 30, 2024 at 3:38 PM Jakub Jelinek wrote: > > On Tue, Apr 30, 2024 at 09:30:00AM +0200, Richard Biener wrote: > > On Mon, Apr 29, 2024 at 5:30 PM H.J. Lu wrote: > > > > > > On Mon, Apr 29, 2024 at 6:47 AM liuhongt wrote: > > > > > > > > The Fortran standard does not specify what the

Re: [PATCH] i386: Fix behavior for both using AVX10.1-256 in options and function attribute

2024-04-24 Thread Hongtao Liu
On Wed, Apr 24, 2024 at 1:46 PM Haochen Jiang wrote: > > Hi all, > > When we are using -mavx10.1-256 in command line and avx10.1-256 in > target attribute together, zmm should never be generated. But current > GCC will generate zmm since it wrongly enables EVEX512 for non-explicitly > set AVX512.

Re: [PATCH] x86: Allow TImode offsettable memory only with 8-bit constant

2024-04-14 Thread Hongtao Liu
On Sat, Apr 13, 2024 at 6:42 AM H.J. Lu wrote: > > The x86 instruction size limit is 15 bytes. If a NDD instruction has > a segment prefix byte, a 4-byte opcode prefix, a MODRM byte, a SIB byte, > a 4-byte displacement and a 4-byte immediate, adding an address size > prefix will exceed the size

Re: [PATCH] Prohibit SHA/KEYLOCKER usage of EGPR when APX enabled

2024-04-09 Thread Hongtao Liu
On Tue, Apr 9, 2024 at 3:05 PM Hongyu Wang wrote: > > The latest APX spec announced removal of SHA/KEYLOCKER evex promotion [1], > which means the SHA/KEYLOCKER insn does not support EGPR when APX > enabled. Update the corresponding constraints to their EGPR-disabled > counterparts. > >

Re: [PATCH] i386, v2: Fix aes/vaes patterns [PR114576]

2024-04-09 Thread Hongtao Liu
On Tue, Apr 9, 2024 at 5:18 PM Jakub Jelinek wrote: > > On Tue, Apr 09, 2024 at 11:23:40AM +0800, Hongtao Liu wrote: > > I think we can merge alternative 2 with 3 to > > * return TARGET_AES ? \"vaesenc\t{%2, %1, %0|%0, %1, %2}"\" : > > \"

Re: [PATCH] i386: Fix aes/vaes patterns [PR114576]

2024-04-08 Thread Hongtao Liu
On Thu, Apr 4, 2024 at 4:42 PM Jakub Jelinek wrote: > > On Wed, Apr 19, 2023 at 02:40:59AM +, Jiang, Haochen via Gcc-patches > wrote: > > > > (define_insn "aesenc" > > > > - [(set (match_operand:V2DI 0 "register_operand" "=x,x") > > > > - (unspec:V2DI [(match_operand:V2DI 1

Re: [PATCH v2] x86: Define __APX_INLINE_ASM_USE_GPR32__

2024-04-08 Thread Hongtao Liu
On Tue, Apr 9, 2024 at 9:58 AM H.J. Lu wrote: > > Define __APX_INLINE_ASM_USE_GPR32__ for -mapx-inline-asm-use-gpr32. > When __APX_INLINE_ASM_USE_GPR32__ is defined, inline asm statements > should contain only instructions compatible with r16-r31. Ok. > > gcc/ > > PR target/114587 >

Re: [PATCH] x86: Define macros for APX options

2024-04-08 Thread Hongtao Liu
On Mon, Apr 8, 2024 at 11:44 PM H.J. Lu wrote: > > Define following macros for APX options: > > 1. __APX_EGPR__: -mapx-features=egpr. > 2. __APX_PUSH2POP2__: -mapx-features=push2pop2. > 3. __APX_NDD__: -mapx-features=ndd. > 4. __APX_PPX__: -mapx-features=ppx. For -mapx-features=, we haven't

Re: [PATCH] sanitizer: [PR110027] Align asan_vec[0] to MAX (alignb, ASAN_RED_ZONE_SIZE)

2024-03-25 Thread Hongtao Liu
On Tue, Mar 26, 2024 at 11:26 AM Hongtao Liu wrote: > > On Mon, Mar 25, 2024 at 8:51 PM Jakub Jelinek wrote: > > > > On Tue, Mar 12, 2024 at 07:57:59PM +0800, liuhongt wrote: > > > if alignb > ASAN_RED_ZONE_SIZE and offset[0] is not multiple of > > > alig

Re: [PATCH] sanitizer: [PR110027] Align asan_vec[0] to MAX (alignb, ASAN_RED_ZONE_SIZE)

2024-03-25 Thread Hongtao Liu
On Mon, Mar 25, 2024 at 8:51 PM Jakub Jelinek wrote: > > On Tue, Mar 12, 2024 at 07:57:59PM +0800, liuhongt wrote: > > if alignb > ASAN_RED_ZONE_SIZE and offset[0] is not multiple of > > alignb. (base_align_bias - base_offset) may not be aligned to alignb, and > > caused segment fault. > > > >

Re: [PATCH] Document -fexcess-precision=16.

2024-03-18 Thread Hongtao Liu
On Tue, Mar 19, 2024 at 12:16 AM Joseph Myers wrote: > > On Mon, 18 Mar 2024, liuhongt wrote: > > > +If @option{-fexcess-precision=16} is specified, casts and assignments of > > +@code{_Float16} and @code{bfloat16_t} cause value to be rounded to their > > +semantic types if they're supported by

Re: [PATCH] i386 [stv]: Handle REG_EH_REGION note [pr111822].

2024-03-18 Thread Hongtao Liu
On Mon, Mar 18, 2024 at 6:59 PM Uros Bizjak wrote: > > On Mon, Mar 18, 2024 at 11:52 AM liuhongt wrote: > > > > Commit r14-9459-g618e34d56cc38e only handles > > general_scalar_chain::convert_op. The patch also handles > > timode_scalar_chain::convert_op to avoid potential similar bug. > > > >

Re: [PATCH] vect: Use xor to invert oversized vector masks

2024-03-14 Thread Hongtao Liu
On Thu, Mar 14, 2024 at 11:42 PM Andrew Stubbs wrote: > > Don't enable excess lanes when inverting vector bit-masks smaller than the > integer mode. This is yet another case of wrong-code due to mishandling > of oversized bitmasks. > > This issue shows up in vect/tsvc/vect-tsvc-s278.c and >

Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Hongtao Liu
On Thu, Mar 14, 2024 at 10:46 PM Uros Bizjak wrote: > > On Thu, Mar 14, 2024 at 8:42 AM Uros Bizjak wrote: > > > > On Thu, Mar 14, 2024 at 8:32 AM Hongtao Liu wrote: > > > > > > On Thu, Mar 14, 2024 at 3:22 PM Uros Bizjak wrote: > > > > > >

Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Hongtao Liu
On Thu, Mar 14, 2024 at 3:22 PM Uros Bizjak wrote: > > On Thu, Mar 14, 2024 at 2:33 AM liuhongt wrote: > > > > When we split > > (insn 37 36 38 10 (set (reg:DI 104 [ _18 ]) > > (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 MEM[(struct > > SQRefCounted

Re: [PATCH] sanitizer: [PR110027] Align asan_vec[0] to MAX (alignb, ASAN_RED_ZONE_SIZE)

2024-03-12 Thread Hongtao Liu
On Tue, Mar 12, 2024 at 8:00 PM liuhongt wrote: > > if alignb > ASAN_RED_ZONE_SIZE and offset[0] is not multiple of > alignb. (base_align_bias - base_offset) may not be aligned to alignb, and > caused segment fault. > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > Ok for trunk and

Re: [PATCH] i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

2024-03-04 Thread Hongtao Liu
On Thu, Feb 29, 2024 at 2:20 PM Hongtao Liu wrote: > > On Wed, Feb 28, 2024 at 4:54 PM Jakub Jelinek wrote: > > > > Hi! > > > > Adding Hongtao and Honza into the loop as the ones who acked the original > > patch. > > > > The no_callee_saved_regist

Re: [PATCH] i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

2024-02-28 Thread Hongtao Liu
On Wed, Feb 28, 2024 at 4:54 PM Jakub Jelinek wrote: > > Hi! > > Adding Hongtao and Honza into the loop as the ones who acked the original > patch. > > The no_callee_saved_registers by default for noreturn functions change can > break in-process backtrace(3) or backtraces from debugger or other

Re: [r14-9173 Regression] FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr" on Linux/x86_64

2024-02-26 Thread Hongtao Liu
On Tue, Feb 27, 2024 at 3:44 PM Richard Biener wrote: > > On Tue, 27 Feb 2024, haochen.jiang wrote: > > > On Linux/x86_64, > > > > af66ad89e8169f44db723813662917cf4cbb78fc is the first bad commit > > commit af66ad89e8169f44db723813662917cf4cbb78fc > > Author: Richard Biener > > Date: Fri Feb

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-26 Thread Hongtao Liu
On Mon, Feb 26, 2024 at 6:30 PM H.J. Lu wrote: > > On Sun, Feb 25, 2024 at 8:25 PM H.J. Lu wrote: > > > > On Sun, Feb 25, 2024 at 7:03 PM Hongtao Liu wrote: > > > > > > On Mon, Feb 26, 2024 at 10:37 AM H.J. Lu wrote: > > > > > >

Re: [PATCH v1] RTL: Bugfix ICE after allow vector type in DSE

2024-02-25 Thread Hongtao Liu
MODE_NATURAL_SIZE (imode); > > Pan > > -Original Message- > From: Hongtao Liu > Sent: Monday, February 26, 2024 11:41 AM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > richard.guent...@gmail.com; Wang, Yanzhang ; > rda

Re: [PATCH v1] RTL: Bugfix ICE after allow vector type in DSE

2024-02-25 Thread Hongtao Liu
On Mon, Feb 26, 2024 at 11:26 AM wrote: > > From: Pan Li > > We allowed vector type for get_stored_val when read is less than or > equal to store in previous. Unfortunately, we missed to adjust the > validate_subreg part accordingly. For vector type, we don't need to > restrict the mode size

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread Hongtao Liu
On Mon, Feb 26, 2024 at 10:37 AM H.J. Lu wrote: > > On Sun, Feb 25, 2024 at 6:03 PM Hongtao Liu wrote: > > > > On Mon, Feb 26, 2024 at 5:11 AM H.J. Lu wrote: > > > > > > ldtilecfg and sttilecfg take a 512-byte memory block. With > > > _tile_loadconf

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread Hongtao Liu
On Mon, Feb 26, 2024 at 5:11 AM H.J. Lu wrote: > > ldtilecfg and sttilecfg take a 512-byte memory block. With > _tile_loadconfig implemented as > > extern __inline void > __attribute__((__gnu_inline__, __always_inline__, __artificial__)) > _tile_loadconfig (const void *__config) > { > __asm__

Re: PING: [PATCH] x86-64: Check R_X86_64_CODE_6_GOTTPOFF support

2024-02-22 Thread Hongtao Liu
On Thu, Feb 22, 2024 at 10:33 PM H.J. Lu wrote: > > On Sun, Feb 18, 2024 at 8:02 AM H.J. Lu wrote: > > > > If assembler and linker supports > > > > add %reg1, name@gottpoff(%rip), %reg2 > > > > with R_X86_64_CODE_6_GOTTPOFF, we can generate it instead of > > > > mov name@gottpoff(%rip), %reg2 >

Re: [PATCH] x86-64: Generate push2/pop2 only if the incoming stack is 16-byte aligned

2024-02-17 Thread Hongtao Liu
On Wed, Feb 14, 2024 at 5:33 AM H.J. Lu wrote: > > Since push2/pop2 requires 16-byte stack alignment, don't generate them > if the incoming stack isn't 16-byte aligned. Ok. > > gcc/ > > PR target/113912 > * config/i386/i386.cc (ix86_can_use_push2pop2): New. >

Re: [PATCH] x86: Update constraints for APX NDD instructions

2024-02-07 Thread Hongtao Liu
On Tue, Feb 6, 2024 at 11:49 AM H.J. Lu wrote: > > 1. The only supported TLS code sequence with ADD is > > addq foo@gottpoff(%rip),%reg > > Change je constraint to a memory operand in APX NDD ADD pattern with > register source operand. > > 2. The instruction length of APX NDD instructions

Re: [x86 PATCH] PR target/106060: Improved SSE vector constant materialization.

2024-01-25 Thread Hongtao Liu
s been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline (in stage 1)? Ok, thanks for handling this. > > > 2024-01-25 Roger Sayle > Hongtao Liu > >

Re: [PATCH v3 0/2] x86: Don't save callee-saved registers if not needed

2024-01-24 Thread Hongtao Liu
On Tue, Jan 23, 2024 at 11:00 PM H.J. Lu wrote: > > Changes in v3: > > 1. Rebase against commit 02e68389494 > 2. Don't add call_no_callee_saved_registers to machine_function since > all callee-saved registers are properly clobbered by callee with > no_callee_saved_registers attribute. > The patch

Re: [PATCH] i386: Modify testcases failed under -DDEBUG

2024-01-24 Thread Hongtao Liu
On Mon, Jan 22, 2024 at 10:31 AM Haochen Jiang wrote: > > Hi all, > > Recently, I happened to run i386.exp under -DDEBUG and found some fail. > > This patch aims to fix that. Ok for trunk? OK. > > Thx, > Haochen > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/adx-check.h: Include

Re: [PATCH 1/2] x86: Add no_callee_saved_registers function attribute

2024-01-21 Thread Hongtao Liu
On Sat, Jan 20, 2024 at 10:30 PM H.J. Lu wrote: > > When an interrupt handler is implemented by an assembly stub which does: > > 1. Save all registers. > 2. Call a C function. > 3. Restore all registers. > 4. Return from interrupt. > > it is completely unnecessary to save and restore any

Re: [PATCH] hwasan: Check if Intel LAM_U57 is enabled

2024-01-17 Thread Hongtao Liu
On Wed, Jan 10, 2024 at 12:47 AM H.J. Lu wrote: > > When -fsanitize=hwaddress is used, libhwasan will try to enable LAM_U57 > in the startup code. Update the target check to enable hwaddress tests > if LAM_U57 is enabled. Also compile hwaddress tests with -mlam=u57 on > x86-64 since hwasan

Re: [x86 PATCH] PR target/106060: Improved SSE vector constant materialization.

2024-01-16 Thread Hongtao Liu
On Wed, Jan 17, 2024 at 5:59 AM Roger Sayle wrote: > > > I thought I'd just missed the bug fixing season of stage3, but there > appears to be a little latitude in early stage4 (for vector patches), so > I'll post this now. > > This patch resolves PR target/106060 by providing efficient methods for >

Re: [PATCH] Update documents for fcf-protection=

2024-01-11 Thread Hongtao Liu
On Thu, Jan 11, 2024 at 12:06 AM H.J. Lu wrote: > > On Tue, Jan 9, 2024 at 6:02 PM liuhongt wrote: > > > > After r14-2692-g1c6231c05bdcca, the option is defined as EnumSet and > > -fcf-protection=branch won't unset any others bits since they're in > > different groups. So to override

Re: [PATCH] i386: Add AVX10.1 related macros

2024-01-11 Thread Hongtao Liu
On Fri, Jan 12, 2024 at 10:55 AM Jiang, Haochen wrote: > > > -Original Message- > > From: Richard Biener > > Sent: Thursday, January 11, 2024 4:19 PM > > To: Liu, Hongtao > > Cc: Jiang, Haochen ; gcc-patches@gcc.gnu.org; > > ubiz...@gmail.com; bur...@net-b.de; san...@codesourcery.com >

Re: [PATCH] i386: [APX] Document inline asm behavior and new switch for APX

2024-01-10 Thread Hongtao Liu
On Thu, Jan 11, 2024 at 7:06 AM Andi Kleen wrote: > > Hongtao Liu writes: > >> > >> +@opindex mapx-inline-asm-use-gpr32 > >> +@item -mapx-inline-asm-use-gpr32 > >> +When APX_F enabled, EGPR usage was by default disabled to prevent > >> +u

Re: [PATCH] i386: [APX] Document inline asm behavior and new switch for APX

2024-01-10 Thread Hongtao Liu
On Tue, Jan 9, 2024 at 3:09 PM Hongyu Wang wrote: > > Hi, > > For APX, the inline asm behavior was not mentioned in any document > before. Add description for it. > > Ok for trunk? > > gcc/ChangeLog: > > * config/i386/i386.opt: Adjust document. > * doc/invoke.texi: Add description

Re: [PATCH] i386: [APX] Add missing document for APX

2024-01-07 Thread Hongtao Liu
On Mon, Jan 8, 2024 at 11:09 AM Hongyu Wang wrote: > > Hi, > > The supported sub-features for APX was missing in option document and > target attribute section. Add those missing ones. > > Ok for trunk? Ok. > > gcc/ChangeLog: > > * config/i386/i386.opt: Add supported sub-features. >

Re: Disable FMADD in chains for Zen4 and generic

2024-01-07 Thread Hongtao Liu
On Thu, Dec 14, 2023 at 12:03 AM Jan Hubicka wrote: > > > > The difference is that Cores understand the fact that fmadd does not need > > > all three parameters to start computation, while Zen cores don't. > > > > > > Since this seems noticeable win on zen and not loss on Core it seems like >

Re: [x86_64 PATCH] PR target/112992: Optimize mode for broadcast of constants.

2024-01-07 Thread Hongtao Liu
.c: Likewise. > * gcc.target/i386/pr100865-5b.c: Likewise. > * gcc.target/i386/pr100865-9a.c: Likewise. > * gcc.target/i386/pr100865-9b.c: Likewise. > * gcc.target/i386/pr102021.c: Likewise. > * gcc.target/i386/pr90773-17.c: Likewise. > > Thanks in a

Re: [x86_64 PATCH] PR target/112992: Optimize mode for broadcast of constants.

2024-01-01 Thread Hongtao Liu
On Fri, Dec 22, 2023 at 6:25 PM Roger Sayle wrote: > > > This patch resolves the second part of PR target/112992, building upon > Hongtao Liu's solution to the first part. > > The issue addressed by this patch is that when initializing vectors by > broadcasting integer constants, the compiler has

Re: [PATCH] i386: Allow 64 bit mask register for -mno-evex512

2023-12-19 Thread Hongtao Liu
On Fri, Dec 15, 2023 at 10:34 AM Haochen Jiang wrote: > > Hi all, > > There is a recent change in AVX10 documentation which allows 64 bit mask > register instructions in AVX10-256, the documentation comes following: > > Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification

Re: [PATCH] i386: Sync move_max/store_max with prefer-vector-width [PR112824]

2023-12-14 Thread Hongtao Liu
On Thu, Dec 14, 2023 at 3:54 PM Hongyu Wang wrote: > > Hi, > > Currently move_max follows the tuning feature first, but ideally it > should sync with prefer-vector-width when it is explicitly set to keep > vector move and operation with same vector size. > > Bootstrapped/regtested on

Re: [PATCH] i386: Remove RAO-INT from Grand Ridge

2023-12-14 Thread Hongtao Liu
On Thu, Dec 14, 2023 at 10:55 AM Haochen Jiang wrote: > > Hi all, > > According to ISE050 published at the end of September, RAO-INT will not > be in Grand Ridge anymore. This patch aims to remove it. > > The documentation comes following: > > https://cdrdv2.intel.com/v1/dl/getContent/671368 > >

Re: [PATCH] [ICE] Support vpcmov for V4HF/V4BF/V2HF/V2BF under TARGET_XOP.

2023-12-13 Thread Hongtao Liu
On Wed, Dec 13, 2023 at 7:59 PM Jakub Jelinek wrote: > > On Fri, Dec 08, 2023 at 03:12:00PM +0800, liuhongt wrote: > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ready push to trunk. > > > > gcc/ChangeLog: > > > > PR target/112904 > > * config/i386/mmx.md

Re: [PATCH] i386: Fix ICE on __builtin_ia32_pabsd128 without lhs [PR112962]

2023-12-13 Thread Hongtao Liu
On Wed, Dec 13, 2023 at 4:44 PM Jakub Jelinek wrote: > > Hi! > > The following patch fixes ICE on the testcase in similar way to how > other folded builtins are handled in ix86_gimple_fold_builtin when > they don't have a lhs; these builtins are const or pure, so normally > DCE would remove them

Re: Disable FMADD in chains for Zen4 and generic

2023-12-12 Thread Hongtao Liu
On Tue, Dec 12, 2023 at 10:38 PM Jan Hubicka wrote: > > Hi, > this patch disables use of FMA in matrix multiplication loop for generic (for > x86-64-v3) and zen4. I tested this on zen4 and Xeon Gold 6212U. > > For Intel this is neutral both on the matrix multiplication microbenchmark >

Re: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2 on Linux/x86_64

2023-12-11 Thread Hongtao Liu
On Tue, Dec 12, 2023 at 1:47 PM Jiang, Haochen via Gcc-regression wrote: > > > -Original Message- > > From: Jiang, Haochen > > Sent: Tuesday, December 12, 2023 9:11 AM > > To: Andrew Pinski (QUIC) ; haochen.jiang > > ; gcc-regress...@gcc.gnu.org; gcc- > > patc...@gcc.gnu.org > > Subject:

Re: [PATCH] Don't assume it's AVX_U128_CLEAN after call_insn whose abi.mode_clobber(V4DImode) deosn't contains all SSE_REGS.

2023-12-11 Thread Hongtao Liu
On Fri, Dec 8, 2023 at 10:17 AM liuhongt wrote: > > If the function doesn't clobber any sse registers or only clobbers the > 128-bit part, then vzeroupper isn't issued before the function exit. > The status is not CLEAN but ANY after the function. > > Also for sibling_call, it's safe to issue an

Re: [PATCH] i386: Fix missed APX_NDD check for shift/rotate expanders [PR 112943]

2023-12-11 Thread Hongtao Liu
On Mon, Dec 11, 2023 at 8:39 PM Hongyu Wang wrote: > > > > +__int128 u128_2 = (9223372036854775808 << 4) * foo0_u8_0; /* { > > > dg-warning "integer constant is so large that it is unsigned" "so large" > > > } */ > > > > Just you can use (9223372036854775807LL + (__int128) 1) instead of >

Re: [v3 PATCH] Simplify vector ((VCE (a cmp b ? -1 : 0)) < 0) ? c : d to just (VCE ((a cmp b) ? (VCE c) : (VCE d))).

2023-12-11 Thread Hongtao Liu
On Mon, Dec 11, 2023 at 4:14 PM Richard Biener wrote: > > On Mon, Dec 11, 2023 at 7:51 AM liuhongt wrote: > > > > > since you are looking at TYPE_PRECISION below you want > > > VECTOR_INTEGER_TYPE_P here as well? The alternative > > > would be to compare TYPE_SIZE. > > > > > > Some of the

Re: [PATCH] i386: Mark Xeon Phi ISAs as deprecated

2023-12-07 Thread Hongtao Liu
On Wed, Dec 6, 2023 at 3:52 PM Richard Biener wrote: > > On Wed, Dec 6, 2023 at 3:33 AM Jiang, Haochen wrote: > > > > > -Original Message- > > > From: Jiang, Haochen > > > Sent: Friday, December 1, 2023 4:51 PM > > > To: Richard Biener > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; >

Re: [V2 PATCH] Simplify vector ((VCE (a cmp b ? -1 : 0)) < 0) ? c : d to just (VCE ((a cmp b) ? (VCE c) : (VCE d))).

2023-12-07 Thread Hongtao Liu
ping. On Thu, Nov 16, 2023 at 6:49 PM liuhongt wrote: > > Update in V2: > 1) Add some comments before the pattern. > 2) Remove ? from view_convert. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > When I'm working on PR112443, I notice there's some

Re: [PATCH v3 00/16] Support Intel APX NDD

2023-12-06 Thread Hongtao Liu
On Wed, Dec 6, 2023 at 8:11 PM Uros Bizjak wrote: > > On Wed, Dec 6, 2023 at 9:08 AM Hongyu Wang wrote: > > > > Hi, > > > > Following up the discussion of V2 patches in > > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639368.html, > > this patch series add early clobber for all TImode

Re: [PATCH] Don't vectorize when vector stmts are only vec_contruct and stores

2023-12-05 Thread Hongtao Liu
On Mon, Dec 4, 2023 at 10:10 PM Richard Biener wrote: > > On Mon, Dec 4, 2023 at 6:32 AM liuhongt wrote: > > > > .i.e. for below cases. > >a[0] = b1; > >a[1] = b2; > >.. > >a[n] = bn; > > > > There're extra dependences when constructing the vector, but not for > > scalar store.

Re: [PATCH] i386: Move vzeroupper pass from after reload pass to after postreload_cse [PR112760]

2023-12-05 Thread Hongtao Liu
On Wed, Dec 6, 2023 at 6:23 AM Jakub Jelinek wrote: > > Hi! > > Regardless of the outcome of the REG_UNUSED discussions, I think > it is a good idea to move the vzeroupper pass one pass later. > As can be seen in the multiple PRs and as postreload.cc documents, > reload/LRA is known to create

Re: [PATCH] Take register pressure into account for vec_construct/scalar_to_vec when the components are not loaded from memory.

2023-12-04 Thread Hongtao Liu
On Mon, Dec 4, 2023 at 3:51 PM Uros Bizjak wrote: > > On Mon, Dec 4, 2023 at 8:11 AM Hongtao Liu wrote: > > > > On Fri, Dec 1, 2023 at 10:26 PM Richard Biener > > wrote: > > > > > > On Fri, Dec 1, 2023 at 3:39 AM liuhongt wrote: > > >

Re: [PATCH v2 00/17] Support Intel APX NDD

2023-12-04 Thread Hongtao Liu
On Tue, Dec 5, 2023 at 10:32 AM Hongyu Wang wrote: > > Hi, > > APX NDD patches have been posted at > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636604.html > > Thanks to Hongtao's review, the V2 patch adds support of zext semantic with > memory input as NDD by default clear upper bits

Re: [PATCH] Take register pressure into account for vec_construct/scalar_to_vec when the components are not loaded from memory.

2023-12-03 Thread Hongtao Liu
On Fri, Dec 1, 2023 at 10:26 PM Richard Biener wrote: > > On Fri, Dec 1, 2023 at 3:39 AM liuhongt wrote: > > > > > Hmm, I would suggest you put reg_needed into the class and accumulate > > > over all vec_construct, with your patch you pessimize a single v32qi > > > over two separate v16qi for

Re: [PATCH] Set AVOID_256FMA_CHAINS TO m_GENERIC as it's generally good to new platforms

2023-11-30 Thread Hongtao Liu
Any comments? On Wed, Nov 22, 2023 at 12:17 PM liuhongt wrote: > > From: "Zhang, Annita" > > Avoid_fma_chain was enabled in m_SAPPHIRERAPIDS, m_ALDERLAKE and > m_CORE_HYBRID. It can also be enabled in m_GENERIC to improve the > performance of -march=x86-64-v3/v4 with -mtune=generic set by >

Re: [PATCH] Take register pressure into account for vec_construct when the components are not loaded from memory.

2023-11-29 Thread Hongtao Liu
On Wed, Nov 29, 2023 at 3:47 PM Richard Biener wrote: > > On Tue, Nov 28, 2023 at 8:54 AM liuhongt wrote: > > > > For vec_construct, the components must be live at the same time if > > they're not loaded from memory, when the number of those components > > exceeds available registers, spill

Re: [PATCH] i386: Fix CPUID of USER_MSR.

2023-11-28 Thread Hongtao Liu
On Wed, Nov 29, 2023 at 9:23 AM Hu, Lin1 wrote: > > Hi, all > > This patch aims to fix the wrong CPUID of USER_MSR, its correct CPUID is > (0x7, 0x1).EDX[15], But I set it as (0x7, 0x0).EDX[15]. And the patch modified > testcase for give the user a better example. > > It has been bootstrapped and

Re: [PATCH] [i386] Fix push2pop2 test fail on non-linux target [PR112729]

2023-11-28 Thread Hongtao Liu
On Tue, Nov 28, 2023 at 9:51 PM Hongyu Wang wrote: > > Hi, > > On linux x86-64, -fomit-frame-pointer was by default enabled so the > push2pop2 tests cfi scans are based on it. On other target with > -fno-omit-frame-pointer the cfi scan will be wrong as the frame pointer > is pushed at first. Add

Re: [PATCH] i386: Fix AVX512 and AVX10 option issues

2023-11-23 Thread Hongtao Liu
On Thu, Nov 23, 2023 at 2:10 PM Haochen Jiang wrote: > > Hi all, > > This patch should be able to fix the current issue mentioned in PR112643. > > Also, I fixed some legacy issues in code related to AVX512/AVX10. > > Ok for trunk? Ok > > Thx, > Haochen > > gcc/ChangeLog: > > PR

Re: [PATCH] [APX PUSH2POP2] Adjust operand order for PUSH2POP2

2023-11-21 Thread Hongtao Liu
On Wed, Nov 22, 2023 at 11:31 AM Hongyu Wang wrote: > > Hi, > > The push2/pop2 operand order does not match the binutils implementation > for AT syntax that it will first push operands[2] then operands[1]. > Correct it by reverse operand order for AT syntax. > > Bootstrapped/regtested on

Re: [PATCH] [APX PPX] Support Intel APX PPX

2023-11-20 Thread Hongtao Liu
. > > Yes, such change also worked and no cfa adjustment required then, > thanks for the suggestion. > Updated patch with just 1 new UNSPEC and removed cfa handling. LGTM. > > Hongtao Liu wrote on Mon, Nov 20, 2023 at 14:46: > > > > On Fri, Nov 17, 2023 at 3:26 PM Hongyu Wang wrote:

Re: [PATCH] [APX PPX] Support Intel APX PPX

2023-11-19 Thread Hongtao Liu
On Fri, Nov 17, 2023 at 3:26 PM Hongyu Wang wrote: > > Intel APX PPX feature has been released in [1]. > > PPX stands for Push-Pop Acceleration. PUSH/PUSH2 and its corresponding POP > can be marked with a 1-bit hint to indicate that the POP reads the > value written by the PUSH from the stack.

Re: [PATCH] Initial support for AVX10.1

2023-11-19 Thread Hongtao Liu
On Fri, Nov 10, 2023 at 9:42 AM Haochen Jiang wrote: > > gcc/ChangeLog: > > * common/config/i386/cpuinfo.h (get_available_features): > Add avx10_set and version and detect avx10.1. > (cpu_indicator_init): Handle avx10.1-512. > * common/config/i386/i386-common.cc >

Re: [PATCH] [i386] APX: Fix EGPR usage in several patterns.

2023-11-15 Thread Hongtao Liu
On Wed, Nov 15, 2023 at 5:43 PM Hongyu Wang wrote: > > Hi, > > For vextract/insert{if}128 they cannot adopt EGPR in their memory operand, all > related pattern should be adjusted to disable EGPR usage on them. > Also fix a wrong gpr16 attr for insertps. > > Bootstrapped/regtested on

Re: [PATCH] x86: Make testcase apx-spill_to_egprs-1.c more robust

2023-11-14 Thread Hongtao Liu
On Tue, Nov 14, 2023 at 5:01 PM Lehua Ding wrote: > > Hi, > > This little patch adjust the assert in apx-spill_to_egprs-1.c testcase. > The -mapxf compilation option allows more registers to be used, which in > turn eliminates the need for local variables to be stored in stack memory. >

Re: [RFC] Intel AVX10.1 Compiler Design and Support

2023-11-13 Thread Hongtao Liu
On Mon, Nov 13, 2023 at 7:25 PM Richard Biener wrote: > > On Mon, Nov 13, 2023 at 7:58 AM Hongtao Liu wrote: > > > > On Fri, Nov 10, 2023 at 6:15 PM Richard Biener > > wrote: > > > > > > On Fri, Nov 10, 2023 at 2:42 AM Haochen J

Re: [PATCH] Avoid generate vblendps with ymm16+

2023-11-13 Thread Hongtao Liu
On Mon, Nov 13, 2023 at 4:45 PM Jakub Jelinek wrote: > > On Mon, Nov 13, 2023 at 02:27:35PM +0800, Hongtao Liu wrote: > > > 1) if it isn't better to use separate alternative instead of > > >x86_evex_reg_mentioned_p, like in the patch below > > vblendps doesn't

Re: [V2 PATCH] Handle bitop with INTEGER_CST in analyze_and_compute_bitop_with_inv_effect.

2023-11-12 Thread Hongtao Liu
On Fri, Nov 10, 2023 at 5:12 PM Richard Biener wrote: > > On Wed, Nov 8, 2023 at 9:22 AM Hongtao Liu wrote: > > > > On Wed, Nov 8, 2023 at 3:53 PM Richard Biener > > wrote: > > > > > > On Wed, Nov 8, 2023 at 2:18 AM Hongtao Liu wrote: > > >

Re: [PATCH] Simplify vector ((VCE?(a cmp b ? -1 : 0)) < 0) ? c : d to just VCE:((a cmp b) ? (VCE c) : (VCE d)).

2023-11-12 Thread Hongtao Liu
On Fri, Nov 10, 2023 at 2:14 PM liuhongt wrote: > > When I'm working on PR112443, I notice there's some misoptimizations: > after we fold _mm{,256}_blendv_epi8/pd/ps into gimple, the backend > fails to combine it back to v{,p}blendv{v,ps,pd} since the pattern is > too complicated, so I think

Re: [RFC] Intel AVX10.1 Compiler Design and Support

2023-11-12 Thread Hongtao Liu
On Fri, Nov 10, 2023 at 6:15 PM Richard Biener wrote: > > On Fri, Nov 10, 2023 at 2:42 AM Haochen Jiang wrote: > > > > Hi all, > > > > This RFC patch aims to add AVX10.1 options. After we added -m[no-]evex512 > > support, it makes a lot easier to add them comparing to the August version. > >

Re: [PATCH] Avoid generate vblendps with ymm16+

2023-11-12 Thread Hongtao Liu
On Sat, Nov 11, 2023 at 4:11 AM Jakub Jelinek wrote: > > On Thu, Nov 09, 2023 at 03:27:11PM +0800, Hongtao Liu wrote: > > On Thu, Nov 9, 2023 at 3:15 PM Hu, Lin1 wrote: > > > > > > This patch aims to avoid generate vblendps with ymm16+, And have > > > boo

Re: [PATCH] Simplify vector ((VCE?(a cmp b ? -1 : 0)) < 0) ? c : d to just (VCE:a cmp VCE:b) ? c : d.

2023-11-09 Thread Hongtao Liu
On Fri, Nov 10, 2023 at 10:11 AM Andrew Pinski wrote: > > On Thu, Nov 9, 2023 at 5:52 PM liuhongt wrote: > > > > When I'm working on PR112443, I notice there's some misoptimizations: after > > we > > fold _mm{,256}_blendv_epi8/pd/ps into gimple, the backend fails to combine > > it > > back to

Re: [PATCH] Avoid generate vblendps with ymm16+

2023-11-08 Thread Hongtao Liu
On Thu, Nov 9, 2023 at 3:15 PM Hu, Lin1 wrote: > > This patch aims to avoid generate vblendps with ymm16+, And have > bootstrapped and tested on x86_64-pc-linux-gnu{-m32,-m64}. Ok for trunk? > > gcc/ChangeLog: > > PR target/112435 > * config/i386/sse.md: Adding constraints to

Re: [V2 PATCH] Handle bitop with INTEGER_CST in analyze_and_compute_bitop_with_inv_effect.

2023-11-08 Thread Hongtao Liu
On Wed, Nov 8, 2023 at 3:53 PM Richard Biener wrote: > > On Wed, Nov 8, 2023 at 2:18 AM Hongtao Liu wrote: > > > > On Tue, Nov 7, 2023 at 10:34 PM Richard Biener > > wrote: > > > > > > On Tue, Nov 7, 2023 at 2:03 PM Hongtao Liu wrote: > > >

Re: [PATCH] [i386] APX: Fix ICE due to movti postreload splitter [PR112394]

2023-11-07 Thread Hongtao Liu
On Tue, Nov 7, 2023 at 3:33 PM Hongyu Wang wrote: > > Hi, > > When APX EGPR enabled, the TImode move pattern *movti_internal allows > move between gpr and sse reg using constraint pair ("r","Yd"). Then a > post-reload splitter transform such move to vec_extractv2di, while under > -msse4.1

Re: [V2 PATCH] Handle bitop with INTEGER_CST in analyze_and_compute_bitop_with_inv_effect.

2023-11-07 Thread Hongtao Liu
On Tue, Nov 7, 2023 at 10:34 PM Richard Biener wrote: > > On Tue, Nov 7, 2023 at 2:03 PM Hongtao Liu wrote: > > > > On Tue, Nov 7, 2023 at 4:10 PM Richard Biener > > wrote: > > > > > > On Tue, Nov 7, 2023 at 7:08 AM liuhongt wrote: > > >

Re: [V2 PATCH] Handle bitop with INTEGER_CST in analyze_and_compute_bitop_with_inv_effect.

2023-11-07 Thread Hongtao Liu
On Tue, Nov 7, 2023 at 4:10 PM Richard Biener wrote: > > On Tue, Nov 7, 2023 at 7:08 AM liuhongt wrote: > > > > analyze_and_compute_bitop_with_inv_effect assumes the first operand is > > loop invariant which is not the case when it's INTEGER_CST. > > > > Bootstrapped and regtested on

Re: [PATCH] i386: Fix isa attribute for TI/TF andnot mode

2023-11-06 Thread Hongtao Liu
On Tue, Nov 7, 2023 at 10:27 AM Haochen Jiang wrote: > > Hi all, > > This patch aims to fix the wrong isa attribute which caused regression > on PR111907. > > Regtested on x86_64-pc-linux-gnu. Ok for trunk? > > Thx, > Haochen > > gcc/ChangeLog: > > PR target/111907 > *

Re: [PATCH 5/5] x86: yet more PR target/100711-like splitting

2023-11-06 Thread Hongtao Liu
On Mon, Nov 6, 2023 at 7:10 PM Jan Beulich wrote: > > On 25.06.2023 08:41, Hongtao Liu wrote: > > On Sun, Jun 25, 2023 at 2:35 PM Hongtao Liu wrote: > >> > >> On Sun, Jun 25, 2023 at 2:25 PM Jan Beulich wrote: > >>> > >>> On 25.06.2023 07:1

Re: [RFC, RFA PATCH] i386: Handle multiple address register classes

2023-11-03 Thread Hongtao Liu
On Fri, Nov 3, 2023 at 6:34 PM Uros Bizjak wrote: > > The patch generalizes address register class handling to allow multiple > address register classes. For APX EGPR targets, some instructions can't be > encoded with REX2 prefix, so it is necessary to limit address register > class to avoid

Re: [PATCH 0/4] Fix no-evex512 function attribute

2023-10-31 Thread Hongtao Liu
On Tue, Oct 31, 2023 at 2:39 PM Haochen Jiang wrote: > > Hi all, > > These four patches are going to fix no-evex512 function attribute. The detail > of the issue comes following: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111889 > > My proposal for this problem is to also push "no-evex512"

Re: [PATCH] Fix incorrect option mask and avx512cd target push

2023-10-30 Thread Hongtao Liu
On Mon, Oct 30, 2023 at 3:47 PM Haochen Jiang wrote: > > Hi all, > > This patch fixed two obvious bugs in current evex512 implementation. > > Also, I moved AVX512CD+AVX512VL part out of the AVX512VL to avoid > accidental handle miss in avx512cd in the future. > > Ok for trunk? Ok. > > BRs, >

Re: [PATCH] Improve memcmpeq for 512-bit vector with vpcmpeq + kortest.

2023-10-27 Thread Hongtao Liu
On Fri, Oct 27, 2023 at 3:21 PM Hongtao Liu wrote: > > On Fri, Oct 27, 2023 at 2:49 PM Richard Biener > wrote: > > > > > > > > > On 27.10.2023 at 07:50, liuhongt wrote: > > > > > > When 2 vectors are equal, kmask is allones

Re: [PATCH] Improve memcmpeq for 512-bit vector with vpcmpeq + kortest.

2023-10-27 Thread Hongtao Liu
On Fri, Oct 27, 2023 at 2:49 PM Richard Biener wrote: > > > > > On 27.10.2023 at 07:50, liuhongt wrote: > > > > When 2 vectors are equal, kmask is allones and kortest will set CF, > > else CF will be cleared. > > > > So CF bit can be used to check for the result of the comparison. > > > >
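
As an illustration of the point quoted above, here is a minimal portable sketch in plain C that models the idea rather than using the real instructions. The helper names (cmpeq_mask64, kortest_cf, memcmpeq64, self_check) are hypothetical and do not come from the patch; the only behaviour assumed is that a byte-wise equality compare of two 64-byte blocks yields a 64-bit mask, and that kortest sets CF exactly when the OR of its two mask operands is all ones.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a 512-bit byte-wise equality compare: bit i of the
   result is set when a[i] == b[i].  (Hypothetical helper.)  */
static uint64_t cmpeq_mask64(const unsigned char *a, const unsigned char *b)
{
    uint64_t mask = 0;
    for (int i = 0; i < 64; i++)
        if (a[i] == b[i])
            mask |= (uint64_t)1 << i;
    return mask;
}

/* Model of the carry flag after a 64-bit kortest: CF is set
   exactly when the OR of both mask operands is all ones.  */
static int kortest_cf(uint64_t k1, uint64_t k2)
{
    return (k1 | k2) == UINT64_MAX;
}

/* Equality-only compare of two 64-byte blocks: returns 0 when they
   are equal and nonzero otherwise, decided from the modelled CF.  */
static int memcmpeq64(const void *a, const void *b)
{
    uint64_t k = cmpeq_mask64(a, b);
    return !kortest_cf(k, k);
}

/* Small self-check that exercises the equal and unequal cases.  */
static void self_check(void)
{
    unsigned char x[64], y[64];
    memset(x, 0x5a, sizeof x);
    memcpy(y, x, sizeof y);
    assert(memcmpeq64(x, y) == 0);   /* all bytes equal: CF set */
    y[17] ^= 1;
    assert(memcmpeq64(x, y) != 0);   /* one byte differs: CF clear */
}
```

Calling self_check() from a main function runs both cases. The point of the design is that a single flag test on the all-ones mask replaces an explicit comparison of the mask against a constant.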

Re: [PATCH] i386: Fix undefined masks in vpopcnt tests

2023-10-24 Thread Hongtao Liu
On Tue, Oct 24, 2023 at 6:10 PM Richard Sandiford wrote: > > The files changed in this patch had tests for masked and unmasked > popcnt. However, the mask inputs to the masked forms were undefined, > and would be set to zero by init_regs. Any combine-like pass that > ran after init_regs could

Re: [PATCH] Support vec_cmpmn/vcondmn for v2hf/v4hf.

2023-10-23 Thread Hongtao Liu
On Tue, Oct 24, 2023 at 1:23 PM Hongtao Liu wrote: > > On Tue, Oct 24, 2023 at 10:53 AM Hongtao Liu wrote: > > > > On Mon, Oct 23, 2023 at 8:35 PM Richard Biener > > wrote: > > > > > > On Mon, Oct 23, 2023 at 10:48 AM liuhongt wrote: > > >

Re: [PATCH] Support vec_cmpmn/vcondmn for v2hf/v4hf.

2023-10-23 Thread Hongtao Liu
On Tue, Oct 24, 2023 at 10:53 AM Hongtao Liu wrote: > > On Mon, Oct 23, 2023 at 8:35 PM Richard Biener > wrote: > > > > On Mon, Oct 23, 2023 at 10:48 AM liuhongt wrote: > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > >

Re: [PATCH] Support vec_cmpmn/vcondmn for v2hf/v4hf.

2023-10-23 Thread Hongtao Liu
On Mon, Oct 23, 2023 at 8:35 PM Richard Biener wrote: > > On Mon, Oct 23, 2023 at 10:48 AM liuhongt wrote: > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ready push to trunk. > > vcond and vcondeq shouldn't be necessary if there's > vcond_mask and vcmp support which is the

Re: [PATCH] x86: Correct ISA enabled for clients since Arrow Lake

2023-10-19 Thread Hongtao Liu
On Wed, Oct 18, 2023 at 4:10 PM Haochen Jiang wrote: > > Hi all, > > I just found that since ISAs enabled on Sierra Forest changed, clients since > Arrow Lake will wrongly enable ENQCMD according to the current code. > > To avoid messing up again in the future, I changed the dependency on how

Re: [PATCH] Avoid compile time hog on vect_peel_nonlinear_iv_init for nonlinear induction vec_step_op_mul when iteration count is too big. There's loop in vect_peel_nonlinear_iv_init to

2023-10-18 Thread Hongtao Liu
On Wed, Oct 18, 2023 at 4:33 PM liuhongt wrote: > Cut from subject... There's a loop in vect_peel_nonlinear_iv_init to get init_expr * pow (step_expr, skip_niters). When skip_niters is too big, compile time hogs. To avoid that, optimize init_expr * pow (step_expr, skip_niters) to init_expr <<
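
The preview is cut off after "init_expr <<", so the exact form used by the patch is not shown here. The sketch below is only an illustration under stated assumptions: the step is a power of two, and the arithmetic is unsigned 32-bit with wraparound. Under those assumptions, multiplying by the step skip_niters times is the same as one left shift by log2(step) * skip_niters. The function names (mul_pow_loop, mul_pow_shift, self_check) are hypothetical and are not GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* Reference version: init * step^n by repeated multiplication.
   Takes n iterations; unsigned arithmetic wraps modulo 2^32.  */
static uint32_t mul_pow_loop(uint32_t init, uint32_t step, uint64_t n)
{
    uint32_t r = init;
    for (uint64_t i = 0; i < n; i++)
        r *= step;
    return r;
}

/* Closed form for step == 1u << log2_step (log2_step < 32 assumed):
   init * step^n == init << (log2_step * n) modulo 2^32.
   A total shift of 32 or more leaves no low bits, so the result is 0.  */
static uint32_t mul_pow_shift(uint32_t init, unsigned log2_step, uint64_t n)
{
    if (log2_step == 0)
        return init;                 /* step == 1: value unchanged */
    if (n >= 32)
        return 0;                    /* log2_step >= 1, so shift >= 32 */
    uint64_t shift = (uint64_t)log2_step * n;
    if (shift >= 32)
        return 0;
    return init << shift;
}

/* Small self-check comparing the loop and the closed form.  */
static void self_check(void)
{
    assert(mul_pow_loop(3, 4, 5) == mul_pow_shift(3, 2, 5));
    assert(mul_pow_loop(3, 4, 20) == mul_pow_shift(3, 2, 20));
    assert(mul_pow_loop(5, 2, 31) == mul_pow_shift(5, 1, 31));
}
```

The closed form runs in constant time no matter how large the iteration count is, which is the property the message is after; the loop version is kept only as a reference to check against.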

Re: [PATCH 0/3] Add Intel new cpu archs

2023-10-17 Thread Hongtao Liu
On Mon, Oct 16, 2023 at 2:25 PM Haochen Jiang wrote: > > Hi all, > > The patches aim to add new cpu archs Clear Water Forest and > Panther Lake. Here comes the documentation: > > https://cdrdv2.intel.com/v1/dl/getContent/671368 > > Also in the patches, I refactored how we detect cpu according to

Re: [PATCH] Disparage slightly for the alternative which move DFmode between SSE_REGS and GENERAL_REGS.

2023-10-12 Thread Hongtao Liu
On Thu, Jul 6, 2023 at 1:53 PM Uros Bizjak via Gcc-patches wrote: > > On Thu, Jul 6, 2023 at 3:14 AM liuhongt wrote: > > > > For testcase > > > > void __cond_swap(double* __x, double* __y) { > > bool __r = (*__x < *__y); > > auto __tmp = __r ? *__x : *__y; > > *__y = __r ? *__y : *__x; > >

Re: [PATCH] [APX] Support Intel APX PUSH2POP2

2023-10-11 Thread Hongtao Liu
On Tue, Oct 10, 2023 at 2:51 PM Hongyu Wang wrote: > > From: "Mo, Zewei" > > Hi, > > Intel APX PUSH2POP2 feature has been released in [1]. > > This feature requires stack to be aligned at 16byte, therefore in > prologue/epilogue, a standalone push/pop will be emitted before any > push2/pop2 if

Re: [PATCH] [i386] APX EGPR: fix missing pattern that prohibits egpr

2023-10-08 Thread Hongtao Liu
On Mon, Oct 9, 2023 at 10:05 AM Hongyu Wang wrote: > > For vec_concatv2di, m constraint in alternative 0 and 1 could result in > egpr allocated on operand 2 under -mapxf. Should use jm instead. > > Bootstrapped/regtested on x86-64-linux-gnu. > > Ok for trunk? Ok. > > gcc/ChangeLog: > > *

Re: [PATCH 03/13] [APX_EGPR] Initial support for APX_F

2023-10-06 Thread Hongtao Liu
> (apx_egpr): Likewise. > (apx_push2pop2): Likewise. > (apx_ndd): Likewise. > (apx_all): Likewise. > * doc/invoke.texi: Document mapxf. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/apx-1.c: New test. > > Co-aut
