Re: [PATCH] i386: Fix up _mm_min_ss etc. handling of zeros and NaNs [PR116738]

2024-09-19 Thread Uros Bizjak
rth it to implement insn patterns with generic RTXes instead of unspecs. Maybe some future improvement to generic RTX simplification will be able to handle them. > > 2024-09-19 Uros Bizjak > Jakub Jelinek > > PR target/116738 > * config/i386/subst

Re: [PATCH] x86-64: Don't use temp for argument in a TImode register

2024-09-15 Thread Uros Bizjak
On Sat, Sep 14, 2024 at 12:58 PM H.J. Lu wrote: > > On Sun, Sep 8, 2024 at 12:10 AM Uros Bizjak wrote: > > > > On Fri, Sep 6, 2024 at 2:24 PM H.J. Lu wrote: > > > > > > Don't use temp for a PARALLEL BLKmode argument of an EXPR_LIST expression > >

[committed] i386: Implement SAT_ADD for signed vector integers

2024-09-12 Thread Uros Bizjak
Enable V4QI, V2QI and V2HI mode signed saturated arithmetic insn patterns and add a couple of testcases to test for PADDSB and PADDSW instructions. PR target/112600 gcc/ChangeLog: * config/i386/mmx.md (3): Rename from *3. gcc/testsuite/ChangeLog: * gcc.target/i386/pr112600-3a.c

[committed]: i386: Use offsettable address constraint for double-word memory operands

2024-09-09 Thread Uros Bizjak
Double-word memory operands are accessed as their high and low parts, so the memory location has to be offsettable. Use "o" constraint instead of "m" for double-word memory operands. gcc/ChangeLog: * config/i386/i386.md (*insvdi_lowpart_1): Use "o" constraint instead of "m" for double-wo

Re: [PATCH] x86-64: Don't use temp for argument in a TImode register

2024-09-08 Thread Uros Bizjak
On Fri, Sep 6, 2024 at 2:24 PM H.J. Lu wrote: > > Don't use temp for a PARALLEL BLKmode argument of an EXPR_LIST expression > in a TImode register. Otherwise, the TImode variable will be put in > the GPR save area which guarantees only 8-byte alignment. > > gcc/ > > PR target/116621 >

Re: [x86_64 PATCH] Support read-modify-write memory operands in STV.

2024-08-31 Thread Uros Bizjak
On Sat, Aug 31, 2024 at 3:28 PM Roger Sayle wrote: > > > Hi Uros, > > As requested this patch is split out from my previous submission. > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659450.html > This patch enables STV when the first operand of a TImode binary > logic operand (AND, IOR o

Re: [PATCH] [x86] Check avx upper register for parallel.

2024-08-29 Thread Uros Bizjak
On Fri, Aug 30, 2024 at 6:49 AM liuhongt wrote: > > > Can the above loop be a part of ix86_check_avx_upper_register, so this > > function would scan the full RTX for avx upper register? > Changed, also adjust ix86_check_avx_upper_stores and ix86_avx_u128_mode_needed > to either inline the old ix86

Re: [PATCH] [x86] Check avx upper register for parallel.

2024-08-29 Thread Uros Bizjak
On Thu, Aug 29, 2024 at 9:33 AM liuhongt wrote: > > For function arguments/return, when it's BLK mode, it's put in a > parallel with an expr_list, and the expr_list contains the real mode > and registers. > Current ix86_check_avx_upper_register only checked for SSE_REG_P, and > failed to handle th

Re: [x86_64 PATCH] Update STV's gains for TImode arithmetic right shifts on AVX2.

2024-08-25 Thread Uros Bizjak
On Sat, Aug 24, 2024 at 5:11 PM Roger Sayle wrote: > > This patch tweaks timode_scalar_chain::compute_convert_gain to better > reflect the expansion of V1TImode arithmetic right shifts by the i386 > backend. The comment "see ix86_expand_v1ti_ashiftrt" appears after > "case ASHIFTRT" in c

Re: [PATCH] testsuite: i386: Fix g++.target/i386/pr116275-2.C on Solaris/x86

2024-08-20 Thread Uros Bizjak
On Tue, Aug 20, 2024 at 3:06 PM Rainer Orth wrote: > > The new g++.target/i386/pr116275-2.C test FAILs on 32-bit Solaris/x86: > > FAIL: g++.target/i386/pr116275-2.C scan-assembler vpslld > > This happens because Solaris defaults to -mstackrealign, disabling -mstv. > > Fixed by disabling the for

Re: [PATCH] Align predicates for operands[1] between mov and *mov_internal.

2024-08-20 Thread Uros Bizjak
On Tue, Aug 20, 2024 at 12:25 PM liuhongt wrote: > > From [1] > > > It's not obvious to me why movv16qi requires a nonimmediate_operand > > > source, especially since ix86_expand_vector_mode does have code to > > > cope with constant operand[1]s. emit_move_insn_1 doesn't check the > > > predicate

Re: [PATCH v2] [x86] Movement between GENERAL_REGS and SSE_REGS for TImode doesn't need secondary reload.

2024-08-15 Thread Uros Bizjak
On Thu, Aug 15, 2024 at 9:27 AM liuhongt wrote: > > It results in 2 failures for x86_64-pc-linux-gnu{\ > -march=cascadelake}; > > gcc: gcc.target/i386/extendditi3-1.c scan-assembler cqt?o > gcc: gcc.target/i386/pr113560.c scan-assembler-times \tmulq 1 > > For pr113560.c, now GCC generates mulx ins

Re: [x86_64 PATCH] Support wide immediate constants in STV.

2024-08-15 Thread Uros Bizjak
On Thu, Aug 15, 2024 at 11:34 AM Roger Sayle wrote: > > > As requested this patch is split out from my earlier submission. > This patch provides more accurate costs/gains for (wide) immediate > constants in STV, suitably adjusting the costs/gains when the highpart > and lowpart words are the same.

Re: [x86 PATCH] Improve split of *extendv2di2_highpart_stv_noavx512vl.

2024-08-15 Thread Uros Bizjak
with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? > > > 2024-08-15 Roger Sayle > Uros Bizjak > > gcc/ChangeLog > * config/i386/i386.md (*extendv2di2_highpart

Re: [PATCH] [x86] Movement between GENERAL_REGS and SSE_REGS for TImode doesn't need secondary reload.

2024-08-13 Thread Uros Bizjak
On Wed, Aug 14, 2024 at 3:28 AM liuhongt wrote: > > It results in 2 failures for x86_64-pc-linux-gnu{\ > -march=cascadelake}; > > gcc: gcc.target/i386/extendditi3-1.c scan-assembler cqt?o > gcc: gcc.target/i386/pr113560.c scan-assembler-times \tmulq 1 > > For pr113560.c, now GCC generates mulx ins

Re: [PATCH] [x86] Movement between GENERAL_REGS and SSE_REGS for TImode doesn't need secondary reload.

2024-08-13 Thread Uros Bizjak
On Wed, Aug 14, 2024 at 3:28 AM liuhongt wrote: > > It results in 2 failures for x86_64-pc-linux-gnu{\ > -march=cascadelake}; > > gcc: gcc.target/i386/extendditi3-1.c scan-assembler cqt?o > gcc: gcc.target/i386/pr113560.c scan-assembler-times \tmulq 1 > > For pr113560.c, now GCC generates mulx ins

Re: [x86 PATCH] PR target/116275: Handle STV of *extenddi2_doubleword_highpart

2024-08-11 Thread Uros Bizjak
On Sun, Aug 11, 2024 at 12:16 PM Roger Sayle wrote: > > > This patch resolves PR target/116275, a recent ICE-on-valid regression on > -m32 caused by my recent change to enable STV of DImode arithmetic right > shift on non-AVX512VL targets. The oversight is that the i386 backend > contains an *ext

Re: [PATCH] i386: Fix up __builtin_ia32_b{extr{, i}_u{32, 64}, zhi_{s, d}i} folding [PR116287]

2024-08-09 Thread Uros Bizjak
On Fri, Aug 9, 2024 at 9:29 AM Jakub Jelinek wrote: > > Hi! > > The GENERIC folding of these builtins have cases where it folds to a > constant regardless of the value of the first operand. If so, we need > to use omit_one_operand to avoid throwing away side-effects in the first > operand if any.

Re: [x86 PATCH] Tweak ix86_mode_can_transfer_bits to restore bootstrap on RHEL.

2024-08-08 Thread Uros Bizjak
On Thu, Aug 8, 2024 at 10:28 AM Roger Sayle wrote: > > > This minor patch, very similar to one posted and approved previously at > https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657229.html is > required to restore builds on systems using gcc 4.8 as a host compiler. > Using the enumeration co

Re: [x86_64 PATCH] Support memory destinations and wide immediate constants in STV.

2024-08-06 Thread Uros Bizjak
On Mon, Aug 5, 2024 at 5:50 PM Roger Sayle wrote: > > > Hi Uros, > Very many thanks for the quick review and approval. Here's another. > > This patch implements two improvements/refinements to the i386 backend's > Scalar-To-Vector (STV) pass. The first is to support memory destinations > in bina

Re: [x86_64 PATCH] Refactor V2DI arithmetic right shift expansion for STV.

2024-08-05 Thread Uros Bizjak
On Mon, Aug 5, 2024 at 12:22 PM Roger Sayle wrote: > > > This patch refactors ashrv2di RTL expansion into a function so that it may > be reused by a pre-reload splitter, such that DImode right shifts may be > considered candidates during the Scalar-To-Vector (STV) pass. Currently > DImode arithme

Re: [PATCH] Fix mismatch between constraint and predicate for ashl3_doubleword.

2024-08-01 Thread Uros Bizjak
On Tue, Jul 30, 2024 at 5:05 AM liuhongt wrote: > > (insn 98 94 387 2 (parallel [ > (set (reg:TI 337 [ _32 ]) > (ashift:TI (reg:TI 329) > (reg:QI 521))) > (clobber (reg:CC 17 flags)) > ]) "test.c":11:13 953 {ashlti3_doubleword} >

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak
On Wed, Jul 31, 2024 at 11:33 AM Richard Biener wrote: > > > > > > OK. Richard, can you please mention the above in the comment why > > > > > > XFmode is rejected in the hook? > > > > > > > > > > > > Later, we can perhaps benchmark XFmode move vs. generic memory copy > > > > > > to > > > > > > g

Re: [PATCH 2/3] [x86] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak
On Wed, Jul 31, 2024 at 3:40 PM Richard Biener wrote: > > The following implements the hook, excluding x87 modes for scalar > and complex float modes. > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > OK this way? > > Thanks, > Richard. > > * i386.cc (TARGET_MODE_CAN_TRANSFER_BI

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak
On Wed, Jul 31, 2024 at 11:33 AM Richard Biener wrote: > > On Wed, 31 Jul 2024, Uros Bizjak wrote: > > > On Wed, Jul 31, 2024 at 10:48 AM Richard Biener wrote: > > > > > > On Wed, 31 Jul 2024, Uros Bizjak wrote: > > > > > > >

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak
On Wed, Jul 31, 2024 at 10:48 AM Richard Biener wrote: > > On Wed, 31 Jul 2024, Uros Bizjak wrote: > > > On Wed, Jul 31, 2024 at 10:24 AM Jakub Jelinek wrote: > > > > > > On Wed, Jul 31, 2024 at 10:11:44AM +0200, Uros Bizjak wrote: > > > > OK. Ric

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak
On Wed, Jul 31, 2024 at 10:24 AM Jakub Jelinek wrote: > > On Wed, Jul 31, 2024 at 10:11:44AM +0200, Uros Bizjak wrote: > > OK. Richard, can you please mention the above in the comment why > > XFmode is rejected in the hook? > > > > Later, we can perhaps benchmark

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak
On Wed, Jul 31, 2024 at 10:02 AM Hongtao Liu wrote: > > > > > > On Tue, 30 Jul 2024, Richard Biener wrote: > > > > > > > > > > > > > > Oh, and please add a small comment why we don't use XFmode here. > > > > > > > > > > > > > > Will do. > > > > > > > > > > > > > > /* Do not enable XFmode,

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak
On Wed, Jul 31, 2024 at 9:11 AM Hongtao Liu wrote: > > On Wed, Jul 31, 2024 at 1:06 AM Uros Bizjak wrote: > > > > On Tue, Jul 30, 2024 at 3:00 PM Richard Biener wrote: > > > > > > On Tue, 30 Jul 2024, Alexander Monakov wrote: > > > > > >

[committed] i386/testsuite: Add testcase for fixed PR [PR51492]

2024-07-30 Thread Uros Bizjak
PR target/51492 gcc/testsuite/ChangeLog: * gcc.target/i386/pr51492.c: New test. Tested on x86_64-linux-gnu {,-m32}. Uros. diff --git a/gcc/testsuite/gcc.target/i386/pr51492.c b/gcc/testsuite/gcc.target/i386/pr51492.c new file mode 100644 index 000..0892e0c79a7 --- /dev/null +++

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-30 Thread Uros Bizjak
On Tue, Jul 30, 2024 at 3:00 PM Richard Biener wrote: > > On Tue, 30 Jul 2024, Alexander Monakov wrote: > > > > > On Tue, 30 Jul 2024, Richard Biener wrote: > > > > > > Oh, and please add a small comment why we don't use XFmode here. > > > > > > Will do. > > > > > > /* Do not enable XFmode

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-30 Thread Uros Bizjak
On Tue, Jul 30, 2024 at 1:07 PM Uros Bizjak wrote: > > On Tue, Jul 30, 2024 at 12:18 PM Richard Biener wrote: > > > > The following implements the hook, excluding x87 modes for scalar > > and complex float modes. > > > > Bootstrapped and tested on

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-30 Thread Uros Bizjak
On Tue, Jul 30, 2024 at 12:18 PM Richard Biener wrote: > > The following implements the hook, excluding x87 modes for scalar > and complex float modes. > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > OK? > > Thanks, > Richard. > > * i386.cc (TARGET_MODE_CAN_TRANSFER_BITS): Def

Re: [PATCH v2] i386: Change prefetchi output template

2024-07-22 Thread Uros Bizjak
On Tue, Jul 23, 2024 at 4:59 AM Haochen Jiang wrote: > > Hi all, > > I tested with %a and it works. Therefore I suppose it is a better solution. > > Bootstrapped and regtested on x86-64-pc-linux-gnu. Ok for trunk and backport > to GCC 13 and 14? OK, also for backports. Thanks, Uros. > > Thx, >

Re: [PATCH] Relax ix86_hardreg_mov_ok after split1.

2024-07-22 Thread Uros Bizjak
On Tue, Jul 23, 2024 at 3:08 AM liuhongt wrote: > > ix86_hardreg_mov_ok is added by r11-5066-gbe39636d9f68c4 > > >The solution proposed here is to have the x86 backend/recog prevent > >early RTL passes composing instructions (that set likely_spilled hard > >registers) that they (combin

[committed] libatomic: Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688]

2024-07-18 Thread Uros Bizjak
From: mayshao PR target/104688 libatomic/ChangeLog: * config/x86/init.c (__libat_feat1_init): Don't clear bit_AVX on ZHAOXIN CPUs. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Uros. diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c index 261

[committed] libatomic: Improve cpuid usage in __libat_feat1_init

2024-07-18 Thread Uros Bizjak
Check the result of __get_cpuid and process FEAT1_REGISTER only when __get_cpuid returns success. Use __cpuid instead of nested __get_cpuid. libatomic/ChangeLog: * config/x86/init.c (__libat_feat1_init): Check the result of __get_cpuid and process FEAT1_REGISTER only when __get_cpuid

Re: [PATCH v2] [libatomic]: Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688]

2024-07-18 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 2:07 PM Jakub Jelinek wrote: > > On Thu, Jul 18, 2024 at 01:57:11PM +0200, Uros Bizjak wrote: > > Attached patch illustrates the proposed improvement with nested cpuid > > calls. Bootstrapped and tested with libatomic testsuite. > > > > Jaku

Re: [PATCH v2] [libatomic]: Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688]

2024-07-18 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 10:31 AM Uros Bizjak wrote: > > On Thu, Jul 18, 2024 at 10:21 AM Jakub Jelinek wrote: > > > > On Thu, Jul 18, 2024 at 10:12:46AM +0200, Uros Bizjak wrote: > > > On Thu, Jul 18, 2024 at 9:50 AM Jakub Jelinek wrote: > > > > > >

Re: [PATCH v2] [libatomic]: Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688]

2024-07-18 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 10:21 AM Jakub Jelinek wrote: > > On Thu, Jul 18, 2024 at 10:12:46AM +0200, Uros Bizjak wrote: > > On Thu, Jul 18, 2024 at 9:50 AM Jakub Jelinek wrote: > > > > > > On Thu, Jul 18, 2024 at 09:34:14AM +0200, Uros Bizjak wrote: > >

Re: [PATCH v2] [libatomic]: Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688]

2024-07-18 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 9:50 AM Jakub Jelinek wrote: > > On Thu, Jul 18, 2024 at 09:34:14AM +0200, Uros Bizjak wrote: > > > > + unsigned int ecx2 = 0, family = 0; > > > > No need to initialize these two variables. > > The function ignores __get_cpuid resu

Re: [PATCH v2] [libatomic]: Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688]

2024-07-18 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 9:29 AM Jakub Jelinek wrote: > > On Thu, Jul 18, 2024 at 03:23:05PM +0800, MayShao-oc wrote: > > From: mayshao > > > > Hi Jakub: > > > > Thanks for your review. We should just amend this to handle Zhaoxin. > > > > Bootstrapped/regtested on x86_64. > > > > Ok for t

Re: [PATCH v2] i386: Fix testcases generating invalid asm

2024-07-17 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 8:52 AM Haochen Jiang wrote: > > Hi all, > > I revised the patch according to the comment. > > Ok for trunk? > > Thx, > Haochen > > --- > > Changes in v2: Add suffix for mov to make the test more robust. > > --- > > For compile test, we should generate valid asm except for

Re: [PATCH v2] [x86][avx512] Optimize maskstore when mask is 0 or -1 in UNSPEC_MASKMOV

2024-07-17 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 3:35 AM liuhongt wrote: > > > Also, in case the insn is deleted, do: > > > > emit_note (NOTE_INSN_DELETED); > > > > DONE; > > > > instead of leaving (const_int 0) in the stream. > > > > So, the above insn preparation statements should read: > > > > --cut here-- > > if (cons

Re: [PATCH] i386: Fix testcases generating invalid asm

2024-07-17 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 3:46 AM Haochen Jiang wrote: > > Hi all, > > For compile test, we should generate valid asm except for special purposes. > Fix the compile test that generates invalid asm. > > Regtested on x86-64-pc-linux-gnu. Ok for trunk? > > Thx, > Haochen > > gcc/testsuite/ChangeLog: >

[committed] alpha: Fix duplicate !tlsgd!62 assemble error [PR115526]

2024-07-17 Thread Uros Bizjak
Add missing "cannot_copy" attribute to instructions that have to stay in 1-1 correspondence with another insn. PR target/115526 gcc/ChangeLog: * config/alpha/alpha.md (movdi_er_high_g): Add cannot_copy attribute. (movdi_er_tlsgd): Ditto. (movdi_er_tlsldm): Ditto. (call_value_

Re: [PATCH] [x86][avx512] Optimize maskstore when mask is 0 or -1 in UNSPEC_MASKMOV

2024-07-17 Thread Uros Bizjak
On Wed, Jul 17, 2024 at 8:54 AM Liu, Hongtao wrote: > > > > > -Original Message- > > From: Uros Bizjak > > Sent: Wednesday, July 17, 2024 2:52 PM > > To: Liu, Hongtao > > Cc: gcc-patches@gcc.gnu.org; crazy...@gmail.com; hjl.to...@gmail.com > >

Re: [PATCH] [x86][avx512] Optimize maskstore when mask is 0 or -1 in UNSPEC_MASKMOV

2024-07-16 Thread Uros Bizjak
On Wed, Jul 17, 2024 at 3:27 AM liuhongt wrote: > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ready push to trunk. > > gcc/ChangeLog: > > PR target/115843 > * config/i386/predicates.md (const0_or_m1_operand): New > predicate. > * config/i386/sse.md

Re: [x86 PATCH] Tweak i386-expand.cc to restore bootstrap on RHEL.

2024-07-14 Thread Uros Bizjak
On Sun, Jul 14, 2024 at 3:42 PM Roger Sayle wrote: > > > This is a minor change to restore bootstrap on systems using gcc 4.8 > as a host compiler. The fatal error is: > > In file included from gcc/gcc/coretypes.h:471:0, > from gcc/gcc/config/i386/i386-expand.cc:23: > gcc/gcc/con

Re: [r15-1936 Regression] FAIL: gcc.target/i386/avx512vl-vpmovuswb-2.c execution test on Linux/x86_64

2024-07-10 Thread Uros Bizjak
On Wed, Jul 10, 2024 at 3:42 PM haochen.jiang wrote: > > On Linux/x86_64, > > 80e446e829d818dc19daa6e671b9626e93ee4949 is the first bad commit > commit 80e446e829d818dc19daa6e671b9626e93ee4949 > Author: Pan Li > Date: Fri Jul 5 20:36:35 2024 +0800 > > Match: Support form 2 for the .SAT_TRUN

[committed] i386: Swap compare operands in ustrunc patterns

2024-07-10 Thread Uros Bizjak
A last minute change led to a wrong operand order in the compare insn. gcc/ChangeLog: * config/i386/i386.md (ustruncdi2): Swap compare operands. (ustruncsi2): Ditto. (ustrunchiqi2): Ditto. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Uros. diff --git a/gcc/config

[PATCH] middle-end: Fix stalled swapped condition code value [PR115836]

2024-07-10 Thread Uros Bizjak
emit_store_flag_1 calculates scode (swapped condition code) at the beginning of the function from the value of the code variable. However, the code variable may change before the scode usage site, resulting in an invalid, stale scode value. Move the calculation of the scode value to just before its only usage site to avo
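The failure mode described in this preview can be illustrated outside GCC. What follows is only a sketch under stated assumptions: the enum, swap_cmp, canonicalize and both swapped_* functions are hypothetical names made up for illustration and are not the code in emit_store_flag_1. It shows why a value derived from a variable should be computed after the last point where that variable can change.

```c
/* Hypothetical comparison codes; these are NOT GCC's rtx codes. */
enum cmp_code { CMP_LT, CMP_GT, CMP_LE, CMP_GE };

/* Swapped condition: "a OP b" is equivalent to "b swap(OP) a". */
static enum cmp_code swap_cmp(enum cmp_code code)
{
  switch (code) {
    case CMP_LT: return CMP_GT;
    case CMP_GT: return CMP_LT;
    case CMP_LE: return CMP_GE;
    default:     return CMP_LE;  /* CMP_GE */
  }
}

/* Hypothetical stand-in for any step that rewrites the comparison,
   e.g. turning "x <= c" into "x < c + 1". */
static enum cmp_code canonicalize(enum cmp_code code)
{
  return code == CMP_LE ? CMP_LT : code;
}

/* Problematic ordering: the swapped code is derived before the
   comparison code is rewritten, so it describes the old code. */
enum cmp_code swapped_early(enum cmp_code code)
{
  enum cmp_code scode = swap_cmp(code);  /* computed too early */
  code = canonicalize(code);             /* code may change here */
  (void) code;
  return scode;
}

/* Safer ordering: derive the swapped code right before it is used. */
enum cmp_code swapped_late(enum cmp_code code)
{
  code = canonicalize(code);
  return swap_cmp(code);
}
```

With these made-up definitions, swapped_early (CMP_LE) still reports the swap of CMP_LE, while swapped_late (CMP_LE) reports the swap of the rewritten CMP_LT.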

Re: [PATCH] [alpha] adjust MEM alignment for block move [PR115459] (was: Re: [PATCH v2] [PR100106] Reject unaligned subregs when strict alignment is required)

2024-07-10 Thread Uros Bizjak
On Thu, Jun 13, 2024 at 9:37 AM Alexandre Oliva wrote: > > Hello, Maciej, > > On Jun 12, 2024, "Maciej W. Rozycki" wrote: > > > This has regressed building the `alpha-linux-gnu' target, in libada, as > > from commit d6b756447cd5 including GCC 14 and up to current GCC 15 trunk: > > > | Error dete

[committed] i386: Implement .SAT_TRUNC for unsigned integers

2024-07-09 Thread Uros Bizjak
The following testcase:
unsigned short foo (unsigned int x)
{
  _Bool overflow = x > (unsigned int)(unsigned short)(-1);
  return ((unsigned short)x | (unsigned short)-overflow);
}
currently compiles (-O2) to:
foo:
        xorl    %eax, %eax
        cmpl    $65535, %edi
        seta    %al
        negl    %eax
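To make the semantics of the quoted testcase easier to check, here is a small sketch that places it next to a plain clamping version. The names sat_trunc and sat_trunc_ref are hypothetical (the message calls the function foo), and the example assumes the usual 16-bit unsigned short.

```c
#include <limits.h>

/* Branchless form, same expression as the testcase quoted above:
   when x does not fit, -overflow is all ones and the OR saturates. */
unsigned short sat_trunc(unsigned int x)
{
  _Bool overflow = x > (unsigned int)(unsigned short)(-1);
  return ((unsigned short)x | (unsigned short)-overflow);
}

/* Reference form: clamp to the largest unsigned short value. */
unsigned short sat_trunc_ref(unsigned int x)
{
  return x > USHRT_MAX ? USHRT_MAX : (unsigned short)x;
}
```

Both functions should agree for every input; values up to USHRT_MAX pass through unchanged and anything larger becomes USHRT_MAX.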

Re: [PATCH] i386: Correct AVX10 CPUID emulation

2024-07-09 Thread Uros Bizjak
On Tue, Jul 9, 2024 at 10:38 AM Haochen Jiang wrote: > > Hi all, > > The AVX10 documentation specifies an ecx value of 0 for the AVX10 version and > vector size under the 0x24 subleaf. Although for ecx=1, the bits are all > reserved for now, we still need to specify ecx as 0 to avoid a dirty > value in ecx. > >

[committed] i386: Promote {QI, HI}mode x86_movcc_0_m1_neg to SImode

2024-07-08 Thread Uros Bizjak
Promote HImode x86_movcc_0_m1_neg insn to SImode to avoid redundant prefixes. Also promote QImode insn when TARGET_PROMOTE_QImode is set. This is similar to promotable_binary_operator splitter, where we promote the result to SImode. Also correct insn condition for splitters to SImode of NEG and NO

Re: [PATCH v2] i386: Refactor ssedoublemode

2024-07-05 Thread Uros Bizjak
On Fri, Jul 5, 2024 at 9:07 AM Hu, Lin1 wrote: > > I Modified the changelog and comments. > > ssedoublemode's double should mean double type, like SI -> DI. > And we need to refactor some patterns with instead of > . > > BRs, > Lin > > gcc/ChangeLog: > > * config/i386/sse.md (ssedoublemod

Re: [PATCH] i386: Refactor ssedoublemode

2024-07-04 Thread Uros Bizjak
On Fri, Jul 5, 2024 at 7:48 AM Hu, Lin1 wrote: > > Hi, all > > ssedoublemode's double should mean double type, like SI -> DI. > And we need to refactor some patterns with instead of > . > > Bootstrapped and regtested on x86-64-linux-gnu, OK for trunk? > > BRs, > Lin > > gcc/ChangeLog: > >

Re: [x86 PATCH] Add additional variant of bswaphisi2_lowpart peephole2.

2024-07-01 Thread Uros Bizjak
On Mon, Jul 1, 2024 at 3:20 PM Roger Sayle wrote: > > > This patch adds an additional variation of the peephole2 used to convert > bswaphisi2_lowpart into rotlhi3_1_slp, which converts xchgb %ah,%al into > rotw if the flags register isn't live. The motivating example is: > > void ext(int x); > vo

Re: [x86 PATCH]: Additional peephole2 to use lea in round-up integer division.

2024-06-30 Thread Uros Bizjak
On Sun, Jun 30, 2024 at 9:09 PM Roger Sayle wrote: > > > Hi Uros, > > On Sat, Jun 29, 2024 at 6:21 PM Roger Sayle > > wrote: > > > A common idiom for implementing an integer division that rounds > > > upwards is to write (x + y - 1) / y. Conveniently on x86, the two > > > additions to form the n

Re: [x86 PATCH]: Additional peephole2 to use lea in round-up integer division.

2024-06-30 Thread Uros Bizjak
On Sat, Jun 29, 2024 at 6:21 PM Roger Sayle wrote: > > > A common idiom for implementing an integer division that rounds upwards is > to write (x + y - 1) / y. Conveniently on x86, the two additions to form > the numerator can be performed by a single lea instruction, and indeed gcc > currently g
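The round-up idiom mentioned in the quoted message, written out as a sketch. The function names are hypothetical, and the second variant is an alternative added here for comparison, not something taken from the patch.

```c
/* Round-up division with the idiom from the quoted message.
   Assumes y != 0 and that x + y - 1 does not wrap around. */
unsigned div_round_up(unsigned x, unsigned y)
{
  return (x + y - 1) / y;
}

/* Alternative that cannot wrap: truncating quotient plus one if
   there is a remainder.  Still assumes y != 0. */
unsigned div_round_up_nowrap(unsigned x, unsigned y)
{
  return x / y + (x % y != 0);
}
```

The first form is the one whose numerator x + y - 1 can be formed by a single lea, as the message says; the second costs an extra remainder check but stays correct when x is close to the maximum value.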

[PATCH] i386: Cleanup tmp variable usage in ix86_expand_move

2024-06-28 Thread Uros Bizjak
Remove extra assignment, extra temp variable and variable shadowing. No functional changes intended. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_move): Remove extra assignment to tmp variable, reuse tmp variable instead of declaring new temporary variable and remove tmp

Re: [PATCH] i386: Fix regression after refactoring legitimize_pe_coff_symbol, ix86_GOT_alias_set and PE_COFF_LEGITIMIZE_EXTERN_DECL

2024-06-28 Thread Uros Bizjak
On Fri, Jun 28, 2024 at 1:41 PM Evgeny Karpov wrote: > > Thursday, June 27, 2024 8:13 PM > Uros Bizjak wrote: > > > > > So, there is no problem having #endif just after else. > > > > Anyway, it's your call, this is not a hill I'm willing to die on.

Re: [PATCH 3/3] [x86] Enable flate-combine.

2024-06-27 Thread Uros Bizjak
On Fri, Jun 28, 2024 at 7:29 AM liuhongt wrote: > > Move pass_stv2 and pass_rpad after pre_reload pass_late_combine, also > define target_insn_cost to prevent post_reload pass_late_combine from > reverting the optimization done in pass_rpad. > > Adjust testcases since pass_late_combine generates better

Re: [PATCH 2/3] Extend lshifrtsi3_1_zext to ?k alternative.

2024-06-27 Thread Uros Bizjak
On Fri, Jun 28, 2024 at 7:29 AM liuhongt wrote: > > late_combine will combine lshift + zero into *lshifrtsi3_1_zext, which > causes an extra mov between gpr and kmask; add ?k to the pattern. > > gcc/ChangeLog: > > PR target/115610 > * config/i386/i386.md (<*insnsi3_zext): Add alternativ

Re: [x86 PATCH] Handle sign_extend like zero_extend in *concatditi3_[346]

2024-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2024 at 9:40 PM Roger Sayle wrote: > > > This patch generalizes some of the patterns in i386.md that recognize > double word concatenation, so they handle sign_extend the same way that > they handle zero_extend in appropriate contexts. > > As a motivating example consider the follo

Re: [PATCH] i386: Fix regression after refactoring legitimize_pe_coff_symbol, ix86_GOT_alias_set and PE_COFF_LEGITIMIZE_EXTERN_DECL

2024-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2024 at 12:50 PM Evgeny Karpov wrote: > > Thursday, June 27, 2024 10:39 AM > Uros Bizjak wrote: > > > > diff --git a/gcc/config/i386/i386-expand.cc > > > b/gcc/config/i386/i386-expand.cc > > > index 5dfa7d49f58..20adb42e17b 100644 >

Re: [PATCH] libgccjit: Add support for machine-dependent builtins

2024-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2024 at 12:49 AM David Malcolm wrote: > > On Thu, 2023-11-23 at 17:17 -0500, Antoni Boucher wrote: > > Hi. > > I did split the patch and sent one for the bfloat16 support and > > another > > one for the vector support. > > > > Here's the updated patch for the machine-dependent buil

Re: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-27 Thread Uros Bizjak
EN_STORE (vectp_out.12_75, 32B, { -1, ... }, _81, 0, > vect_patt_49.11_73); > vectp_op_1.7_69 = vectp_op_1.7_68 + ivtmp_67; > vectp_out.12_76 = vectp_out.12_75 + ivtmp_74; > ivtmp_80 = ivtmp_79 - _81; > > riscv64-unknown-elf-gcc (GCC) 15.0.0 20240627 (experimental) >

Re: [PATCH] i386: Fix regression after refactoring legitimize_pe_coff_symbol, ix86_GOT_alias_set and PE_COFF_LEGITIMIZE_EXTERN_DECL

2024-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2024 at 9:16 AM Evgeny Karpov wrote: > > Thank you for reporting the issues and discussing the root causes. > It helped in preparing the patch. > > This patch fixes 3 bugs reported after merging > the "Add DLL import/export implementation to AArch64" series. > https://gcc.gnu.org/p

Re: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-26 Thread Uros Bizjak
On Mon, Jun 24, 2024 at 3:55 PM wrote: > > From: Pan Li > > The zip benchmark of coremark-pro have one SAT_SUB like pattern but > truncated as below: > > void test (uint16_t *x, unsigned b, unsigned n) > { > unsigned a = 0; > register uint16_t *p = x; > > do { > a = *--p; > *p = (ui

Re: [PATCH V2] Fix wrong cost of MEM when addr is a lea.

2024-06-26 Thread Uros Bizjak
On Thu, Jun 27, 2024 at 5:57 AM liuhongt wrote: > > > But rtx_cost invokes targetm.rtx_cost which allows to avoid that > > recursive processing at any level. You're dealing with MEM [addr] > > here, so why's rtx_cost (addr, Pmode, MEM, 0, speed) not always > > the best way to deal with this? Sin

Re: [PATCH] i386: Fix some ISA bit test in option_override

2024-06-19 Thread Uros Bizjak
On Thu, Jun 20, 2024 at 3:16 AM Hongyu Wang wrote: > > Hi, > > This patch adjusts several new feature checks in ix86_option_override_internal > that directly use TARGET_* instead of TARGET_*_P (opts->ix86_isa_flags), > which caused command-line options to override the target attribute ISA flags. > > Bootstrapped

Re: [PATCH] [x86_64]: Zhaoxin shijidadao enablement

2024-06-19 Thread Uros Bizjak
On Tue, Jun 18, 2024 at 9:21 AM mayshao-oc wrote: > > > > On 5/28/24 14:15, Uros Bizjak wrote: > > > > > > > > On Mon, May 27, 2024 at 10:33 AM MayShao wrote: > >> > >> From: mayshao > >> > >> Hi all: > >>

Re: [PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-12 Thread Uros Bizjak
On Thu, Jun 13, 2024 at 3:44 AM Hongyu Wang wrote: > > Thanks for the advice, updated patch in attachment. > > Bootstrapped/regtested on x86-64-pc-linux-gnu. Ok for trunk? > > On Wed, Jun 12, 2024 at 18:12, Uros Bizjak wrote: > > > > On Wed, Jun 12, 2024 at 12:00 PM Uros Bizjak

Re: [PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-12 Thread Uros Bizjak
On Wed, Jun 12, 2024 at 12:00 PM Uros Bizjak wrote: > > On Wed, Jun 12, 2024 at 5:12 AM Hongyu Wang wrote: > > > > Hi, > > > > For CTEST, we don't have conditional AND so there's no optimization > > opportunity to write a new ctest pattern. Emit ct

Re: [PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-12 Thread Uros Bizjak
On Wed, Jun 12, 2024 at 5:12 AM Hongyu Wang wrote: > > Hi, > > For CTEST, we don't have conditional AND so there's no optimization > opportunity to write a new ctest pattern. Emit ctest when ccmp did > comparison to const 0 to save bytes. > > Bootstrapped & regtested under x86-64-pc-linux-gnu. > >

Re: [PATCH] rust: Do not link with libdl and libpthread unconditionally

2024-06-12 Thread Uros Bizjak
On Tue, Jun 11, 2024 at 11:21 AM Arthur Cohen wrote: > > Thanks Richi! > > Tested again and pushed on trunk. This patch introduced a couple of errors during ./configure: checking for library containing dlopen... none required checking for library containing pthread_create... none required /git/

[committed] i386: Use CMOV in .SAT_{ADD|SUB} expansion for TARGET_CMOV [PR112600]

2024-06-11 Thread Uros Bizjak
For TARGET_CMOV targets emit insn sequence involving conditional move.
.SAT_ADD:
        addl    %esi, %edi
        movl    $-1, %eax
        cmovnc  %edi, %eax
        ret
.SAT_SUB:
        subl    %esi, %edi
        movl    $0, %eax
        cmovnc  %edi, %eax
        ret
PR target/112600 gc

[committed] i386: Implement .SAT_SUB for unsigned scalar integers [PR112600]

2024-06-09 Thread Uros Bizjak
The following testcase:
unsigned sub_sat (unsigned x, unsigned y)
{
  unsigned res;
  res = x - y;
  res &= -(x >= y);
  return res;
}
currently compiles (-O2) to:
sub_sat:
        movl    %edi, %edx
        xorl    %eax, %eax
        subl    %esi, %edx
        cmpl    %esi, %edi
        setnb
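A short sketch of why the mask trick in the quoted testcase gives a saturating subtraction. The function sub_sat is taken from the message; sub_sat_ref is a hypothetical reference version added here only for comparison.

```c
/* Branchless form from the quoted testcase: (x >= y) is 1 or 0, so
   -(x >= y) is all ones or zero, and the AND either keeps x - y or
   clears it. */
unsigned sub_sat(unsigned x, unsigned y)
{
  unsigned res;
  res = x - y;
  res &= -(x >= y);
  return res;
}

/* Reference form: the difference, clamped at zero. */
unsigned sub_sat_ref(unsigned x, unsigned y)
{
  return x >= y ? x - y : 0;
}
```

When x < y the subtraction wraps, but the mask is zero, so the wrapped value never escapes.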

Re: [committed] i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]

2024-06-08 Thread Uros Bizjak
On Sat, Jun 8, 2024 at 2:09 PM Gerald Pfeifer wrote: > > On Sat, 8 Jun 2024, Uros Bizjak wrote: > > gcc/ChangeLog: > > > > * config/i386/i386.md (usadd3): New expander. > > (x86_movcc_0_m1_neg): Use SWI mode iterator. > > When you write "committed

[committed] i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]

2024-06-08 Thread Uros Bizjak
The following testcase:
unsigned add_sat(unsigned x, unsigned y)
{
  unsigned z;
  return __builtin_add_overflow(x, y, &z) ? -1u : z;
}
currently compiles (-O2) to:
add_sat:
        addl    %esi, %edi
        jc      .L3
        movl    %edi, %eax
        ret
.L3:
        orl     $-1, %eax
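For readers who want to try the quoted testcase, here is a sketch that pairs it with a version that does not rely on the builtin. The function add_sat is from the message; add_sat_portable is a hypothetical name for a common alternative formulation, and whether the compiler recognises it as .SAT_ADD is not something this listing states.

```c
#include <limits.h>

/* Form from the quoted testcase: the GCC/Clang builtin reports
   whether x + y wrapped; on overflow the result saturates. */
unsigned add_sat(unsigned x, unsigned y)
{
  unsigned z;
  return __builtin_add_overflow(x, y, &z) ? -1u : z;
}

/* Alternative without the builtin: an unsigned sum that wrapped is
   smaller than either operand. */
unsigned add_sat_portable(unsigned x, unsigned y)
{
  unsigned sum = x + y;
  return sum < x ? UINT_MAX : sum;
}
```

Both versions return UINT_MAX whenever the mathematical sum does not fit in an unsigned int.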

Re: [PATCH v2 2/6] Extract ix86 dllimport implementation to mingw

2024-06-07 Thread Uros Bizjak
On Fri, Jun 7, 2024 at 11:48 AM Evgeny Karpov wrote: > > This patch extracts the ix86 implementation for expanding a SYMBOL > into its corresponding dllimport, far-address, or refptr symbol. > It will be reused in the aarch64-w64-mingw32 target. > The implementation is copied as is from i386/i386.

Re: [x86 PATCH] PR target/115351: RTX costs for *concatditi3 and *insvti_highpart.

2024-06-07 Thread Uros Bizjak
On Fri, Jun 7, 2024 at 11:21 AM Roger Sayle wrote: > > > This patch addresses PR target/115351, which is a code quality regression > on x86 when passing floating point complex numbers. The ABI considers > these arguments to have TImode, requiring interunit moves to place the > FP values (which ar

[committed] testsuite/i386: Add vector sat_sub testcases [PR112600]

2024-06-06 Thread Uros Bizjak
PR middle-end/112600 gcc/testsuite/ChangeLog: * gcc.target/i386/pr112600-2a.c: New test. * gcc.target/i386/pr112600-2b.c: New test. Tested on x86_64-linux-gnu {,-m32}. Uros. diff --git a/gcc/testsuite/gcc.target/i386/pr112600-2a.c b/gcc/testsuite/gcc.target/i386/pr112600-2a.c new f

Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
tting .SAT_SUB via __builtin_sub_overflow (and in similar way for saturated add). Uros. > > Pan > > -Original Message- > From: Uros Bizjak > Sent: Wednesday, June 5, 2024 4:46 PM > To: Li, Pan2 > Cc: Richard Biener ; gcc-patches@gcc.gnu.org; > juzhe.zh...@rivai.ai;

Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
On Wed, Jun 5, 2024 at 10:38 AM Li, Pan2 wrote: > > > I see. x86 doesn't have scalar saturating instructions, so the scalar > > version indeed can't be converted. > > > I will amend x86 testcases after the vector part of your patch is committed. > > Thanks for the confirmation. Just curious, the .

Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
On Wed, Jun 5, 2024 at 10:22 AM Li, Pan2 wrote: > > > Is the above testcase correct? You need "(x + y)" as the first term. > > Thanks for comments, should be copy issue here, you can take SAT_SUB (x, y) > => (x - y) & (-(TYPE)(x >= y)) or below template for reference. > > +#define DEF_SAT_U_SUB_F
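The formula quoted above, SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y)), can be instantiated per type with a macro. The sketch below is only an illustration of that formula: the macro DEFINE_SAT_U_SUB and the function names are hypothetical and are not the template from the (truncated) message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: define an unsigned saturating subtraction for
   type T using (x - y) & (-(T)(x >= y)).  The outer cast narrows the
   result back to T after integer promotion.  */
#define DEFINE_SAT_U_SUB(T, NAME)            \
  T NAME (T x, T y)                          \
  {                                          \
    return (T) ((x - y) & (-(T) (x >= y)));  \
  }

DEFINE_SAT_U_SUB (uint8_t, sat_u_sub_u8)
DEFINE_SAT_U_SUB (uint64_t, sat_u_sub_u64)

/* Hypothetical helper: a few spot checks for both instantiations.  */
void sat_u_sub_self_check (void)
{
  assert (sat_u_sub_u8 (5, 3) == 2);
  assert (sat_u_sub_u8 (3, 5) == 0);
  assert (sat_u_sub_u64 (UINT64_MAX, 1) == UINT64_MAX - 1);
  assert (sat_u_sub_u64 (0, UINT64_MAX) == 0);
}
```

For types narrower than int the operands are promoted before the subtraction, so the mask is computed in int; the result is still correct because the mask is either zero or all ones.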

Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
On Wed, Jun 5, 2024 at 9:38 AM Li, Pan2 wrote: > > Thanks Richard, will commit after the rebased pass the regression test. > > Pan > > -Original Message- > From: Richard Biener > Sent: Wednesday, June 5, 2024 3:19 PM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kit

Re: [PATCH v1 0/6] Add DLL import/export implementation to AArch64

2024-06-04 Thread Uros Bizjak
On Tue, Jun 4, 2024 at 10:10 PM Evgeny Karpov wrote: > > Richard and Uros, could you please review the changes for v2? LGTM for the generic x86 part, OS-specific part (cygming) should also be reviewed by OS port maintainer (CC'd). Thanks, Uros. > Additionally, we have detected an issue with GCC

[committed] i386: Force operand 1 of bswapsi2 to a register for !TARGET_BSWAP [PR115321]

2024-06-03 Thread Uros Bizjak
PR target/115321 gcc/ChangeLog: * config/i386/i386.md (bswapsi2): Force operand 1 to a register also for !TARGET_BSWAP. gcc/testsuite/ChangeLog: * gcc.target/i386/pr115321.c: New test. Bootstrapped and regression tested on x86_64-linux-gnu {,m32}. Uros. diff --git a/gcc/config

Re: [PATCH] [x86] Add some preference for floating point rtl ifcvt when sse4.1 is not available

2024-06-03 Thread Uros Bizjak
On Mon, Jun 3, 2024 at 5:11 AM liuhongt wrote: > > W/o TARGET_SSE4_1, it takes 3 instructions (pand, pandn and por) for > movdfcc/movsfcc, and could possibly fail cost comparison. Increase > branch cost could hurt performance for other modes, so specially add > some preference for floating point i

Re: [PATCH 39/52] i386: New hook implementation ix86_c_mode_for_floating_type

2024-06-03 Thread Uros Bizjak
On Mon, Jun 3, 2024 at 5:02 AM Kewen Lin wrote: > > This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE > defines in i386 port, and add new port specific hook > implementation ix86_c_mode_for_floating_type. > > gcc/ChangeLog: > > * config/i386/i386.cc (ix86_c_mode_for_floating_type):

[committed] alpha: Fix invalid RTX in divmodsi insn patterns [PR115297]

2024-05-31 Thread Uros Bizjak
any_divmod instructions are modelled with invalid RTX: [(set (match_operand:DI 0 "register_operand" "=c") (sign_extend:DI (match_operator:SI 3 "divmod_operator" [(match_operand:DI 1 "register_operand" "a") (match_operand:DI 2 "register_ope

[committed] i386: Rewrite bswaphi2 handling [PR115102]

2024-05-30 Thread Uros Bizjak
Introduce *bswaphi2 instruction pattern and enable bswaphi2 expander also for non-movbe targets. The testcase: unsigned short bswap8 (unsigned short val) { return ((val & 0xff00) >> 8) | ((val & 0xff) << 8); } now expands through bswaphi2 named expander. Rewrite bswaphi_lowpart insn pattern a
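To make the intent of the testcase above concrete, this sketch checks the shift-and-mask idiom against __builtin_bswap16 over every 16-bit value. It assumes a GCC- or Clang-compatible compiler and a 16-bit unsigned short; bswap16_ref and bswap_self_check are hypothetical names used only for illustration.

```c
#include <assert.h>

/* The testcase: swap the two bytes of a 16-bit value.  */
unsigned short bswap8 (unsigned short val)
{
  return ((val & 0xff00) >> 8) | ((val & 0xff) << 8);
}

/* Hypothetical reference using the compiler builtin.  */
unsigned short bswap16_ref (unsigned short val)
{
  return __builtin_bswap16 (val);
}

/* Hypothetical helper: exhaustive comparison over 0..0xffff.  */
void bswap_self_check (void)
{
  for (unsigned v = 0; v <= 0xffff; v++)
    assert (bswap8 ((unsigned short) v) == bswap16_ref ((unsigned short) v));
}
```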

[committed] i386: Improve access to _Atomic DImode location via XMM regs for SSE4.1 x86_32 targets

2024-05-28 Thread Uros Bizjak
Use MOVD/PEXTRD and MOVD/PINSRD insn sequences to move DImode value between XMM and GPR register sets for SSE4.1 x86_32 targets in order to avoid spilling the value to stack. The load from _Atomic location a improves from: movq a, %xmm0 movq %xmm0, (%esp) movl (%esp), %eax

Re: [PATCH V2] Reduce cost of MEM (A + imm).

2024-05-28 Thread Uros Bizjak
On Tue, May 28, 2024 at 12:48 PM liuhongt wrote: > > > IMO, there is no need for CONST_INT_P condition, we should also allow > > symbol_ref, label_ref and const (all allowed by > > x86_64_immediate_operand predicate), these all decay to an immediate > > value. > > Changed. > > Bootstrapped and reg

Re: [PATCH] [x86_64]: Zhaoxin shijidadao enablement

2024-05-27 Thread Uros Bizjak
On Mon, May 27, 2024 at 10:33 AM MayShao wrote: > > From: mayshao > > Hi all: > This patch enables -march/-mtune=shijidadao, costs and tunings are set > according to the characteristics of the processor. > > Bootstrapped /regtested X86_64. > > Ok for trunk? OK. Thanks, Uros. > BR

Re: [PATCH] Reduce cost of MEM (A + imm).

2024-05-27 Thread Uros Bizjak
On Tue, May 28, 2024 at 4:48 AM liuhongt wrote: > > For MEM, rtx_cost iterates each subrtx, and adds up the costs, > so for MEM (reg) and MEM (reg + 4), the former costs 5, > the latter costs 9, it is not accurate for x86. Ideally > address_cost should be used, but it reduce cost too much. > So cu

Re: [PATCH v1 2/6] Extract ix86 dllimport implementation to mingw

2024-05-23 Thread Uros Bizjak
On Thu, May 23, 2024 at 7:53 PM Evgeny Karpov wrote: > > > Thursday, May 23, 2024 10:35 AM > Uros Bizjak wrote: > > > Richard Sandiford wrote: > > > > > > > This looks good to me apart from a couple of very minor comments > > > > below, bu
