Re: [PATCH 05/17] [APX NDD] Support APX NDD for adc insns

2023-12-05 Thread Uros Bizjak
On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang wrote: > > From: Kong Lingling > > Legacy adc patterns are commonly adopted to TImode add, when extending TImode > add to NDD version, operands[0] and operands[1] can be different, so extra > move > should be emitted if those patterns have optimization

Re: [PATCH 03/17] [APX NDD] Support APX NDD for optimization patterns of add

2023-12-05 Thread Uros Bizjak
On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang wrote: > > From: Kong Lingling > > gcc/ChangeLog: > > * config/i386/i386.md: (addsi_1_zext): Add new alternatives for > NDD and adjust output templates. > (*add_2): Likewise. > (*addsi_2_zext): Likewise. > (*add_3)

Re: [PATCH 02/17] [APX NDD] Restrict TImode register usage when NDD enabled

2023-12-05 Thread Uros Bizjak
On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang wrote: > > Under APX NDD, previous TImode allocation will have issue that it was > originally allocated using continuous pair, like rax:rdi, rdi:rdx. > > This will cause issue for all TImode NDD patterns. For NDD we will not > assume the arithmetic operat

Re: [PATCH] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2023-12-04 Thread Uros Bizjak
On Wed, Nov 29, 2023 at 1:25 PM Richard Biener wrote: > > On Wed, Nov 29, 2023 at 10:35 AM Uros Bizjak wrote: > > > > The compiler, configured with --enable-checking=yes,rtl,extra ICEs with: > > > > internal compiler error: RTL check: expected elt 0 type 'e'

Re: [PATCH] Take register pressure into account for vec_construct/scalar_to_vec when the components are not loaded from memory.

2023-12-03 Thread Uros Bizjak
On Mon, Dec 4, 2023 at 8:11 AM Hongtao Liu wrote: > > On Fri, Dec 1, 2023 at 10:26 PM Richard Biener > wrote: > > > > On Fri, Dec 1, 2023 at 3:39 AM liuhongt wrote: > > > > > > > Hmm, I would suggest you put reg_needed into the class and accumulate > > > > over all vec_construct, with your patch

Re: [PATCH] i386: Fix rtl checking ICE in ix86_elim_entry_set_got [PR112837]

2023-12-03 Thread Uros Bizjak
On Mon, Dec 4, 2023 at 8:41 AM Jakub Jelinek wrote: > > Hi! > > The following testcase ICEs with RTL checking, because it sets if > XINT (SET_SRC (set), 1) is UNSPEC_SET_GOT without checking if SET_SRC (set) > is actually an UNSPEC, so any time we see any other insn with PARALLEL > and a SET in it

Re: [PATCH] i386: Fix up signbit2 expander [PR112816]

2023-12-03 Thread Uros Bizjak
On Mon, Dec 4, 2023 at 8:35 AM Jakub Jelinek wrote: > > Hi! > > The following testcase ICEs, because the signbit2 expander uses an > explicit SUBREG in the pattern around match_operand with register_operand > predicate. If we are unlucky enough that expansion tries to expand it > with some SUBREG

Re: [PATCH] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2023-11-30 Thread Uros Bizjak
On Thu, Nov 30, 2023 at 9:21 AM Segher Boessenkool wrote: > > Hi! > > On Wed, Nov 29, 2023 at 02:20:03PM +0100, Uros Bizjak wrote: > > On Wed, Nov 29, 2023 at 1:25 PM Richard Biener > > wrote: > > > On Wed, Nov 29, 2023 at 10:35 AM Uros Bizjak wrote: > > I

Re: [PATCH] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2023-11-29 Thread Uros Bizjak
On Wed, Nov 29, 2023 at 1:25 PM Richard Biener wrote: > > On Wed, Nov 29, 2023 at 10:35 AM Uros Bizjak wrote: > > > > The compiler, configured with --enable-checking=yes,rtl,extra ICEs with: > > > > internal compiler error: RTL check: expected elt 0 type 'e'

[PATCH] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2023-11-29 Thread Uros Bizjak
The compiler, configured with --enable-checking=yes,rtl,extra ICEs with: internal compiler error: RTL check: expected elt 0 type 'e' or 'u', have 'E' (rtx unspec) in try_combine, at combine.cc:3237 This is 3236 /* Just replace the CC reg with a new mode. */ 3237 SUBST

[committed] i386: Improve cmpstrnqi_1 insn pattern [PR112494]

2023-11-28 Thread Uros Bizjak
REPZ CMPSB instruction does not update FLAGS register when %ecx register equals zero. Improve cmpstrnqi_1 insn pattern to set FLAGS_REG to its previous value instead of (const_int 0) when operand 2 equals zero. PR target/112494 gcc/ChangeLog: * config/i386/i386.md (cmpstrnqi_1): Set FLA

Re: [PATCH] i386: Fix up *jcc_bt*_mask{,_1} [PR111408]

2023-11-25 Thread Uros Bizjak
On Sat, Nov 25, 2023 at 8:19 AM Jakub Jelinek wrote: > > Hi! > > The following testcase is miscompiled in GCC 14 because the > *jcc_bt_mask and *jcc_bt_mask_1 patterns have just > one argument in (match_operator 0 "bt_comparison_operator" [...]) > but as bt_comparison_operator is eq,ne, we need tw

[committed] i386: Fix ICE with -fsplit-stack -mcmodel=large [PR112686]

2023-11-24 Thread Uros Bizjak
For -mcmodel=large, we have to load function address to a register. PR target/112686 gcc/ChangeLog: * config/i386/i386.cc (ix86_expand_split_stack_prologue): Load function address to a register for ix86_cmodel == CM_LARGE. gcc/testsuite/ChangeLog: * gcc.target/i386/pr112686.c:

Re: [PATCH] i386: Fix ICE during cbranchv16qi4 expansion [PR112681]

2023-11-24 Thread Uros Bizjak
On Fri, Nov 24, 2023 at 9:31 AM Jakub Jelinek wrote: > > Hi! > > The following testcase ICEs, because cbranchv16qi4 expansion calls > ix86_expand_branch with op1 being a pre-AVX unaligned memory and > ix86_expand_branch emits a xorv16qi3 instruction without making sure > the operand predicates are

[committed] i386: Wrong code with __builtin_parityl [PR112672]

2023-11-23 Thread Uros Bizjak
gen_parityhi2_cmp instruction clobbers its input operand, so use a temporary register in the call to gen_parityhi2_cmp. PR target/112672 gcc/ChangeLog: * config/i386/i386.md (parityhi2): Use temporary register in the call to gen_parityhi2_cmp. gcc/testsuite/ChangeLog: * gcc.tar

Re: [RFC PATCH] i386: Fix ICE with -mforce-indirect-call and -fsplit-stack [PR89316]

2023-11-23 Thread Uros Bizjak
On Mon, Nov 20, 2023 at 5:33 PM Uros Bizjak wrote: > > With the above two options, use a temporary register regno (as returned > from split_stack_prologue_scratch_regno) as an indirect call scratch > register to hold __morestack function address. On 64-bit targets, two > temporar

[RFC PATCH] i386: Fix ICE with -mforce-indirect-call and -fsplit-stack [PR89316]

2023-11-20 Thread Uros Bizjak
With the above two options, use a temporary register regno (as returned from split_stack_prologue_scratch_regno) as an indirect call scratch register to hold __morestack function address. On 64-bit targets, two temporary registers are always available, so load the function address in %r11 and call

[committed] i386: Optimize QImode insn with high input registers

2023-11-16 Thread Uros Bizjak
Sometimes the compiler emits the following code with qi_ext_0: shrl $8, %eax addb %bh, %al Patch introduces new low part QImode insn patterns with both of their input arguments extracted from high register. This invalid insn is split after reload to a move from the high reg

[committed] i386: Fix invalid RTX in split2 pass [PR112567]

2023-11-16 Thread Uros Bizjak
Also fix some indentation inconsistencies. PR target/112567 gcc/ChangeLog: * config/i386/i386.md (*qi_ext_1_slp): Fix generation of invalid RTX in split pattern. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Uros. diff --git a/gcc/config/i386/i386.md b/gcc/confi

Re: [PATCH] i386: Fix mov imm,%rax; mov %rdi,%rdx; mulx %rax -> mov imm,%rdx; mulx %rdi peephole2 [PR112526]

2023-11-15 Thread Uros Bizjak
On Thu, Nov 16, 2023 at 8:16 AM Jakub Jelinek wrote: > > Hi! > > The following testcase is miscompiled on x86_64 since PR110551 r14-4968 > commit. That commit added 2 peephole2s, one for > mov imm,%rXX; mov %rYY,%rax; mulq %rXX -> mov imm,%rax; mulq %rYY > which I believe is ok, and another one f

Re: [committed] i386: Return CCmode from ix86_cc_mode for unknown RTX code [PR112494]

2023-11-15 Thread Uros Bizjak
On Tue, Nov 14, 2023 at 6:51 PM Jakub Jelinek wrote: > > On Mon, Nov 13, 2023 at 10:49:23PM +0100, Uros Bizjak wrote: > > Combine wants to combine following instructions into an insn that can > > perform both an (arithmetic) operation and set the condition code. During >

[committed] i386: Optimize strict_low_part QImode insn with high input registers

2023-11-15 Thread Uros Bizjak
Following testcase: struct S1 { unsigned char val; unsigned char pad1; unsigned short pad2; }; struct S2 { unsigned char pad1; unsigned char val; unsigned short pad2; }; struct S1 test_add (struct S1 a, struct S2 b, struct S2 c) { a.val = b.val + c.val; return a; } compiles wit

[committed] i386: Fix strict_low_part QImode insn with high input register patterns [PR112540]

2023-11-15 Thread Uros Bizjak
PR target/112540 gcc/ChangeLog: * config/i386/i386.md (*addqi_ext_1_slp): Correct operand numbers in split pattern. Replace !Q constraint of operand 1 with !qm. Add insn constraint. (*subqi_ext_1_slp): Ditto. (*qi_ext_1_slp): Ditto. Bootstrapped and regression tested on

Re: [committed] i386: Generate strict_low_part QImode insn with high input register

2023-11-15 Thread Uros Bizjak
On Tue, Nov 14, 2023 at 6:37 PM Uros Bizjak wrote: > PR target/78904 > > gcc/ChangeLog: > > * config/i386/i386.md (*addqi_ext_1_slp): > New define_insn_and_split pattern. > (*subqi_ext_1_slp): Ditto. > (*qi_ext_1_slp): Ditto. > > gcc/testsuite/C

[committed] i386: Generate strict_low_part QImode insn with high input register

2023-11-14 Thread Uros Bizjak
Following testcase: struct S1 { unsigned char val; unsigned char pad1; unsigned short pad2; }; struct S2 { unsigned char pad1; unsigned char val; unsigned short pad2; }; struct S1 test_and (struct S1 a, struct S2 b) { a.val &= b.val; return a; } compiles with -O2 to: movl

Re: [PATCH] i386: Fix up 3_doubleword_lowpart [PR112523]

2023-11-14 Thread Uros Bizjak
On Tue, Nov 14, 2023 at 1:07 PM Jakub Jelinek wrote: > > Hi! > > On Sun, Nov 12, 2023 at 09:03:42PM -, Roger Sayle wrote: > > This patch improves register pressure during reload, inspired by PR 97756. > > Normally, a double-word right-shift by a constant produces a double-word > > result, the

[committed] i386: Rewrite pushfl2 and popfl1 as unspecs

2023-11-13 Thread Uros Bizjak
Flags reg is valid only with CC mode. gcc/ChangeLog: * config/i386/i386-expand.h (gen_pushfl): New prototype. (gen_popfl): Ditto. * config/i386/i386-expand.cc (ix86_expand_builtin) [case IX86_BUILTIN_READ_FLAGS]: Use gen_pushfl. [case IX86_BUILTIN_WRITE_FLAGS]: Use gen_popfl.

[committed] i386: Return CCmode from ix86_cc_mode for unknown RTX code [PR112494]

2023-11-13 Thread Uros Bizjak
Combine wants to combine following instructions into an insn that can perform both an (arithmetic) operation and set the condition code. During the conversion a new RTX is created, and combine passes the RTX code of the innermost RTX expression of the CC use insn in which CC reg is used to SELECT_

Re: [x86 PATCH] Improve reg pressure of double-word right-shift then truncate.

2023-11-12 Thread Uros Bizjak
On Sun, Nov 12, 2023 at 10:03 PM Roger Sayle wrote: > > > This patch improves register pressure during reload, inspired by PR 97756. > Normally, a double-word right-shift by a constant produces a double-word > result, the highpart of which is dead when followed by a truncation. > The dead code cal

[committed] i386: Remove *stack_protect_set_4s__di alternative that will never match

2023-11-12 Thread Uros Bizjak
The relevant peephole2 will never generate alternative (=m,=&a,0,m) because operand 1 is not dead before the peephole2 pattern. gcc/ChangeLog: * config/i386/i386.md (*stack_protect_set_4s__di): Remove alternative 0. Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}. Uros. diff -

[committed] i386: Clear stack protector scratch with zero/sign-extend instruction

2023-11-10 Thread Uros Bizjak
Use unrelated register initializations using zero/sign-extend instructions to clear stack protector scratch register. Handle only SI -> DImode extensions for 64-bit targets, as this is the only extension that triggers the peephole in a non-negligible number. Also use explicit check for word_mode

[committed] i386: Improve stack protector patterns and peephole2s even more

2023-11-09 Thread Uros Bizjak
Improve stack protector patterns and peephole2s even more: a. Use unrelated register clears with integer mode size <= word mode size to clear stack protector scratch register. b. Use unrelated register initializations in front of stack protector sequence to clear stack protector scratch reg

Re: [PATCH v2] i386 PIE: accept @GOTOFF in load/store multi base address

2023-11-09 Thread Uros Bizjak
On Wed, Nov 8, 2023 at 5:37 PM Alexandre Oliva wrote: > > Ping? > https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598872.html > > Looking at the code generated for sse2-{load,store}-multi.c with PIE, > I realized we could use UNSPEC_GOTOFF as a base address, and that this > would enable the te

[committed] i386: Apply LRA reload workaround to insns with high registers [PR82524]

2023-11-08 Thread Uros Bizjak
LRA is not able to reload zero_extracted in-out operand with matched input operand in the same way as strict_low_part in-out operand. The patch applies the strict_low_part workaround, where we allow LRA to generate an instruction with non-matched input operand, which is split post reload to the in

[committed] i386: Make flags_reg_operand a special predicate

2023-11-07 Thread Uros Bizjak
There is no need to check the mode in flags_reg_operand predicate. The mode in flags setting instructions is checked with ix86_match_ccmode. The patch avoids "warning: operand X missing mode?" warnings with VOIDmode flags_reg_operand predicate. gcc/ChangeLog: * config/i386/predicates.md ("fl

[committed] i386: Use "addr" attribute to limit address regclass to non-REX regs

2023-11-06 Thread Uros Bizjak
Use "addr" attribute with "gpr8" value to limit address register class to non-REX registers in instructions with high registers, where REX registers can not be used in the address. gcc/ChangeLog: * config/i386/constraints.md (Bc): Remove constraint. (Bn): Rewrite to use x86_extended_reg_m

[committed] i386: Add LEGACY_INDEX_REG register class.

2023-11-05 Thread Uros Bizjak
Also rename LEGACY_REGS to LEGACY_GENERAL_REGS. gcc/ChangeLog: * config/i386/i386.h (enum reg_class): Add LEGACY_INDEX_REGS. Rename LEGACY_REGS to LEGACY_GENERAL_REGS. (REG_CLASS_NAMES): Ditto. (REG_CLASS_CONTENTS): Ditto. * config/i386/constraints.md ("R"): Update for rename.

[COMMITTED]: i386: Handle multiple address register classes

2023-11-03 Thread Uros Bizjak
The patch generalizes address register class handling to allow multiple register classes. For APX EGPR targets, some instructions do not support GPR32 registers, so it is necessary to limit address register set to avoid them. The same situation happens for instructions with high registers, where

Re: [RFC, RFA PATCH] i386: Handle multiple address register classes

2023-11-03 Thread Uros Bizjak
wrote on Fri, Nov 3, 2023 at 20:50: > > > > On Fri, Nov 3, 2023 at 6:34 PM Uros Bizjak wrote: > > > > > > The patch generalizes address register class handling to allow multiple > > > address register classes. For APX EGPR targets, some instructions can't > >

[RFC, RFA PATCH] i386: Handle multiple address register classes

2023-11-03 Thread Uros Bizjak
The patch generalizes address register class handling to allow multiple address register classes. For APX EGPR targets, some instructions can't be encoded with REX2 prefix, so it is necessary to limit address register class to avoid REX2 registers. The same situation happens for instructions with

[COMMITTED] i386: Move stack protector patterns above mov $0 -> xor peephole

2023-11-02 Thread Uros Bizjak
Move stack protector patterns above mov $0,%reg -> xor %reg,%reg so the latter won't interfere with stack protector peephole2s. gcc/ChangeLog: * config/i386/i386.md: Move stack protector patterns above mov $0,%reg -> xor %reg,%reg peephole2 pattern. Bootstrapped and regression tested on

Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation using peephole2.

2023-11-01 Thread Uros Bizjak
On Wed, Nov 1, 2023 at 1:58 PM Roger Sayle wrote: > > > Hi Uros, > > > From: Uros Bizjak > > Sent: 01 November 2023 10:05 > > Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation > > using peephole2. > > > > On Mon

Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation using peephole2.

2023-11-01 Thread Uros Bizjak
On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle wrote: > > > This patch is a follow-up to my previous PR target/110551 patch, this > time to address the additional move after mulx, seen on TARGET_BMI2 > architectures (such as -march=haswell). The complication here is > that the flexible multiple-set

[PUSHED] i386: Improve stack protector patterns and peephole2s

2023-11-01 Thread Uros Bizjak
Improve stack protector patterns and peephole2s to substitute stack protector scratch register clear with unrelated subsequent register initialization in several ways: a. Explicitly generate scratch register as named pseudo. This allows optimizers to eventually reuse the zero value in the registe

Re: [PATCH] [x86_64]: Zhaoxin yongfeng enablement

2023-10-30 Thread Uros Bizjak
On Mon, Oct 30, 2023 at 10:08 AM Mayshao-oc wrote: > > >On Fri, Oct 27, 2023 at 12:20 PM mayshao wrote: > >> > >> On 2023/10/26 17:34, Uros Bizjak wrote: > >> > On Wed, Oct 25, 2023 at 8:43 AM mayshao wrote: > >> >> > >> >>

Re: [PATCH] Testsuite, i386: Fix test by passing -march

2023-10-30 Thread Uros Bizjak
On Mon, Oct 30, 2023 at 12:53 PM FX Coudert wrote: > > Hi, > > The newly introduced test gcc.target/i386/pr111698.c currently fails on > Darwin, where the default arch is core2. > Andrew suggested in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112287 to > pass a recent value to -march, and I ca

Re: [PATCH] [x86_64]: Zhaoxin yongfeng enablement

2023-10-27 Thread Uros Bizjak
On Fri, Oct 27, 2023 at 12:20 PM mayshao wrote: > > On 2023/10/26 17:34, Uros Bizjak wrote: > > On Wed, Oct 25, 2023 at 8:43 AM mayshao wrote: > >> > >> Hi all: > >> This patch enables -march/-mtune=yongfeng, costs and tunings are set > >> a

Re: [PATCH] [x86_64]: Zhaoxin yongfeng enablement

2023-10-26 Thread Uros Bizjak
On Wed, Oct 25, 2023 at 8:43 AM mayshao wrote: > > Hi all: > This patch enables -march/-mtune=yongfeng, costs and tunings are set > according to the characteristics of the processor. We add a new md file to > describe yongfeng processor. > > Bootstrapped /regtested X86_64. > > Ok for

Re: [x86 PATCH] PR target/110511: Fix reg allocation for widening multiplications.

2023-10-26 Thread Uros Bizjak
sses before reload check both predicates and > constraints. > > My original patch fixes PR 110511, using the same peephole2 idiom as already > used elsewhere in i386.md. Ok for mainline? Thanks for the explanation. The patch is OK. > > -Original Message- > > From: U

[committed] i386: Narrow test instructions with immediate operands [PR111698]

2023-10-25 Thread Uros Bizjak
i386: Narrow test instructions with immediate operands [PR111698] Narrow test instructions with immediate operand that test memory location for zero. E.g. testl $0x00aa0000, mem can be converted to testb $0xaa, mem+2. Reject targets where reading (possibly unaligned) part of memory location after
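
The narrowing described in the snippet above can be illustrated in C. This is only a sketch of the idea, not the code from the patch: it assumes a little-endian layout (as on x86) and a mask of 0x00aa0000, whose only set bits fall in the byte at offset 2. The helper names store_le32, wide_test_nonzero, narrow_test_nonzero and check_narrowing are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value as four bytes in little-endian order (the x86 layout). */
void store_le32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value & 0xff);
    out[1] = (unsigned char)((value >> 8) & 0xff);
    out[2] = (unsigned char)((value >> 16) & 0xff);
    out[3] = (unsigned char)((value >> 24) & 0xff);
}

/* Wide form: models "testl $0x00aa0000, mem" followed by a non-zero check. */
int wide_test_nonzero(const unsigned char mem[4])
{
    uint32_t word = (uint32_t)mem[0]
                  | ((uint32_t)mem[1] << 8)
                  | ((uint32_t)mem[2] << 16)
                  | ((uint32_t)mem[3] << 24);
    return (word & 0x00aa0000u) != 0;
}

/* Narrow form: models "testb $0xaa, mem+2"; only one byte is read. */
int narrow_test_nonzero(const unsigned char mem[4])
{
    return (mem[2] & 0xaa) != 0;
}

/* Check that both forms agree for a given 32-bit memory content. */
void check_narrowing(uint32_t value)
{
    unsigned char mem[4];
    store_le32(value, mem);
    assert(wide_test_nonzero(mem) == narrow_test_nonzero(mem));
}
```

Both forms give the same answer because every set bit of the mask lies inside a single byte; the narrow form only touches that byte, which is presumably why the patch has to consider targets where a partial read of the location is a problem.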

Re: [x86 PATCH] Fine tune STV register conversion costs for -Os.

2023-10-24 Thread Uros Bizjak
On Mon, Oct 23, 2023 at 4:47 PM Roger Sayle wrote: > > > The eagle-eyed may have spotted that my recent testcases for DImode shifts > on x86_64 included -mno-stv in the dg-options. This is because the > Scalar-To-Vector (STV) pass currently transforms these shifts to use > SSE vector operations,

Re: [PATCH] i386: Avoid paradoxical subreg dests in vector zero_extend

2023-10-24 Thread Uros Bizjak
On Tue, Oct 24, 2023 at 12:08 PM Richard Sandiford wrote: > > For the V2HI -> V2SI zero extension in: > > typedef unsigned short v2hi __attribute__((vector_size(4))); > typedef unsigned int v2si __attribute__((vector_size(8))); > v2si f (v2hi x) { return (v2si) {x[0], x[1]}; } > > ix86_expan

Re: [PATCH] [x86] Remove unused mmx_pinsrw.

2023-10-20 Thread Uros Bizjak
On Fri, Oct 20, 2023 at 8:54 AM liuhongt wrote: > > While working on enabling more 32/64-bit vectorization for _Float16, > I noticed there's 1 redundant define_expand; the patch removes the expander. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: >

Re: [x86 PATCH] PR target/110511: Fix reg allocation for widening multiplications.

2023-10-19 Thread Uros Bizjak
On Tue, Oct 17, 2023 at 9:05 PM Roger Sayle wrote: > > > This patch contains clean-ups of the widening multiplication patterns in > i386.md, and provides variants of the existing highpart multiplication > peephole2 transformations (that tidy up register allocation after > reload), and thereby fixe

Re: [x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md

2023-10-18 Thread Uros Bizjak
On Tue, Oct 17, 2023 at 7:54 PM Roger Sayle wrote: > > > Hi Uros, > Thanks for the speedy review. > > > From: Uros Bizjak > > Sent: 17 October 2023 17:38 > > > > On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle > > wrote: > > > > > >

Re: [x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md

2023-10-17 Thread Uros Bizjak
On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle wrote: > > > This patch is the backend piece of a solution to PRs 101955 and 106245, > that adds a define_insn_and_split to the i386 backend, to perform sign > extension of a single (least significant) bit using AND $1 then NEG. > > Previously, (x<<31)>>
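
The identity behind this split, sign extension of the least significant bit written either as a shift pair or as AND $1 then NEG, can be checked with a small C sketch. It is an illustration only, not taken from the patch. The function names are hypothetical, and the shift form relies on behaviour that ISO C leaves implementation-defined but GCC documents: modulo conversion from unsigned to signed and an arithmetic right shift of negative values.

```c
#include <stdint.h>

/* Shift form: (x << 31) >> 31 on a 32-bit value.
   The left shift is done on an unsigned copy to avoid signed overflow;
   the conversion back to int32_t and the arithmetic right shift follow
   GCC's documented implementation-defined behaviour. */
int32_t sext_low_bit_shift(int32_t x)
{
    return (int32_t)((uint32_t)x << 31) >> 31;
}

/* AND/NEG form: isolate bit 0, then negate, giving 0 or -1. */
int32_t sext_low_bit_and_neg(int32_t x)
{
    return -(x & 1);
}
```

Both functions return -1 when bit 0 of x is set and 0 otherwise, so the two-instruction AND/NEG sequence can stand in for the two shifts.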

Re: [PATCH v5] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-10-16 Thread Uros Bizjak
On Mon, Oct 16, 2023 at 9:58 PM Fangrui Song wrote: > > On Mon, Oct 16, 2023 at 12:10 PM Uros Bizjak wrote: > > > > On Mon, Oct 16, 2023 at 8:24 PM Fangrui Song wrote: > > > > > > On 2023-10-16, Uros Bizjak wrote: > > > >On Tue, Aug 1, 2023 at 9:5

Re: [PATCH v5] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-10-16 Thread Uros Bizjak
On Mon, Oct 16, 2023 at 8:24 PM Fangrui Song wrote: > > On 2023-10-16, Uros Bizjak wrote: > >On Tue, Aug 1, 2023 at 9:51 PM Fangrui Song wrote: > >> > >> When using -mcmodel=medium, large data objects larger than the > >> -mlarge-data-threshold thresh

Re: [PATCH v4] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-10-16 Thread Uros Bizjak
On Tue, Aug 1, 2023 at 9:51 PM Fangrui Song wrote: > > When using -mcmodel=medium, large data objects larger than the > -mlarge-data-threshold threshold are placed into large data sections > (.lrodata, .ldata, .lbss and some variants). GNU ld and ld.lld 17 place > .l* sections into separate outpu

Re: [X86 PATCH] Implement doubleword right shifts by 1 bit using s[ha]r+rcr.

2023-10-09 Thread Uros Bizjak
On Fri, Oct 6, 2023 at 3:59 PM Roger Sayle wrote: > > > Grr! I've done it again. ENOPATCH. > > > -Original Message- > > From: Roger Sayle > > Sent: 06 October 2023 14:58 > > To: 'gcc-patches@gcc.gnu.org' > > Cc: 'Uros Bizja

[COMMITTED] i386: Improve memory copy from named address space [PR111657]

2023-10-05 Thread Uros Bizjak
The stringop strategy selection algorithm falls back to a libcall strategy when it exhausts its pool of available strategies. The memory area copy function (memcpy) is not available from the system library for non-default address spaces, so the compiler emits the most trivial byte-at-a-time copy l

Re: [X86 PATCH] Implement doubleword shift left by 1 bit using add+adc.

2023-10-05 Thread Uros Bizjak
On Thu, Oct 5, 2023 at 1:45 PM Roger Sayle wrote: > > Doh! ENOPATCH. > > > -Original Message- > > From: Roger Sayle > > Sent: 05 October 2023 12:44 > > To: 'gcc-patches@gcc.gnu.org' > > Cc: 'Uros Bizjak' > > Subject: [X8

Re: [X86 PATCH] Split lea into shorter left shift by 2 or 3 bits with -Oz.

2023-10-05 Thread Uros Bizjak
On Thu, Oct 5, 2023 at 11:06 AM Roger Sayle wrote: > > > This patch avoids long lea instructions for performing x<<2 and x<<3 > by splitting them into shorter sal and move (or xchg) instructions. > Because this increases the number of instructions, but reduces the > total size, it's suitable for -O

[committed] i386: Handle CONST_WIDE_INT in output_pic_addr_const [PR111340]

2023-09-11 Thread Uros Bizjak via Gcc-patches
PR target/111340 gcc/ChangeLog: * config/i386/i386.cc (output_pic_addr_const): Handle CONST_WIDE_INT. Call output_addr_const for CASE_CONST_SCALAR_INT. gcc/testsuite/ChangeLog: * gcc.target/i386/pr111340.c: New test. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32

Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-06 Thread Uros Bizjak via Gcc-patches
On Wed, Sep 6, 2023 at 9:43 PM Vladimir Makarov wrote: > > > On 9/1/23 05:07, Hongyu Wang wrote: > > Uros Bizjak via Gcc-patches wrote on Thu, Aug 31, 2023 at 18:16: > >> On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang wrote: > >>> From: Kong Lingling > >>> >

Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-09-04 Thread Uros Bizjak via Gcc-patches
On Mon, Sep 4, 2023 at 2:28 AM Hongtao Liu wrote: > > > > > > > I think there should be some constraint which explicitly has all > > > > > > > the 32 > > > > > > > GPRs, like there is one for just all 16 GPRs (h), so that > > > > > > > regardless of > > > > > > > -mapx-inline-asm-use-gpr32 one

Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-09-01 Thread Uros Bizjak via Gcc-patches
On Fri, Sep 1, 2023 at 12:36 PM Hongtao Liu wrote: > > On Fri, Sep 1, 2023 at 5:38 PM Uros Bizjak via Gcc-patches > wrote: > > > > On Fri, Sep 1, 2023 at 11:10 AM Hongyu Wang wrote: > > > > > > Uros Bizjak via Gcc-patches wrote on Thu, Aug 31, 2023 at > > > 18:

Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-09-01 Thread Uros Bizjak via Gcc-patches
On Fri, Sep 1, 2023 at 11:10 AM Hongyu Wang wrote: > > Uros Bizjak via Gcc-patches wrote on Thu, Aug 31, 2023 at 18:01: > > > > On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches > > wrote: > > > > > > On Thu, Aug 31, 2023 at 04:20:17PM +0800

Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-08-31 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang wrote: > > From: Kong Lingling > > Current reload infrastructure does not support selective base_reg_class > for backend insn. Add insn argument to base_reg_class for > lra/reload usage. I don't think this is the correct approach. Ideally, a memory co

Re: [PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5)

2023-08-31 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang wrote: > > From: Kong Lingling > > These legacy insn in opcode map0/1 only support GPR16, > and do not have vex/evex counterpart, directly adjust constraints and > add gpr32 attr to patterns. > > insn list: > 1. xsave/xsave64, xrstor/xrstor64 > 2. xsav

Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-08-31 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches wrote: > > On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote: > > From: Kong Lingling > > > > In inline asm, we do not know if the insn can use EGPR, so disable EGPR > > usage by default from mapping the comm

[PATCH] fortran: Rename TRUE/FALSE to true/false in *.cc files

2023-08-25 Thread Uros Bizjak via Gcc-patches
gcc/fortran/ChangeLog: * match.cc (gfc_match_equivalence): Rename TRUE/FALSE to true/false. * module.cc (check_access): Ditto. * primary.cc (match_real_constant): Ditto. * trans-array.cc (gfc_trans_allocate_array_storage): Ditto. (get_array_ctor_strlen): Ditto. * trans-comm

[committed] treewide: Rename TRUE/FALSE to true/false in *.cc files

2023-08-25 Thread Uros Bizjak via Gcc-patches
gcc/c-family/ChangeLog: * c-format.cc (read_any_format_width): Rename TRUE/FALSE to true/false. gcc/ChangeLog: * caller-save.cc (new_saved_hard_reg): Rename TRUE/FALSE to true/false. (setup_save_areas): Ditto. * gcc.cc (set_collect_gcc_options): Ditto. (driver::build_

[committed] i386: Optimize pinsrq of 0 with index 1 into movq [PR94866]

2023-08-24 Thread Uros Bizjak via Gcc-patches
Add new pattern involving vec_merge RTX that is produced by combine from the combination of sse4_1_pinsrq and *movdi_internal: 7: r86:DI=0 8: r85:V2DI=vec_merge(vec_duplicate(r86:DI),r87:V2DI,0x2) REG_DEAD r87:V2DI REG_DEAD r86:DI Successfully matched this instruction: (set (re

Re: [PATCH 6/12] i386: Enable _BitInt on x86-64 [PR102989]

2023-08-23 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 9, 2023 at 8:19 PM Jakub Jelinek wrote: > > Hi! > > The following patch enables _BitInt support on x86-64, the only > target which has _BitInt specified in psABI. > > 2023-08-09 Jakub Jelinek > > PR c/102989 > * config/i386/i386.cc (classify_argument): Handle BITINT_

[committed] i386: Fix register spill failure with concat RTX [PR111010]

2023-08-23 Thread Uros Bizjak via Gcc-patches
Disable (=&r,m,m) alternative for 32-bit targets. The combination of two memory operands (possibly with complex addressing mode), early clobbered output, frame pointer and PIC registers uses too many registers on a register constrained 32-bit target. Also merge two similar patterns using DWIH mode

[committed] i386: Micro-optimize ix86_expand_sse_extend

2023-08-20 Thread Uros Bizjak via Gcc-patches
Partial vector src is forced to a register as ops[1], we can use it instead of SRC in the call to ix86_expand_sse_cmp. This change avoids forcing operand[1] to a register in sign/zero-extend expanders. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_sse_extend): Use ops[1] inste

[committed]: i386: Use PUNPCKL?? to implement vector extend and zero_extend for TARGET_SSE2 [PR111023]

2023-08-18 Thread Uros Bizjak via Gcc-patches
Implement vector extend and zero_extend functionality for TARGET_SSE2 using PUNPCKL?? family of instructions. The code for e.g. zero-extend from V2SI to V2DImode improves from: movd %xmm0, %edx pshufd $85, %xmm0, %xmm0 movd %xmm0, %eax movq %rdx, (%rdi)

Re: [PATCH] Generate vmovapd instead of vmovsd for moving DFmode between SSE_REGS.

2023-08-14 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 14, 2023 at 4:46 AM liuhongt via Gcc-patches wrote: > > vmovapd can enable register renaming and have same code size as > vmovsd. Similar for vmovsh vs vmovaps, vmovaps is 1 byte less than > vmovsh. > > When TARGET_AVX512VL is not available, still generate > vmovsd/vmovss/vmovsh to avo

Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-10 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 10, 2023 at 9:40 AM Richard Biener wrote: > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote: > > > > Currently we have 3 different independent tunes for gather > > "use_gather,use_gather_2parts,use_gather_4parts", > > similar for scatter, there're > > "use_scatter,use_scatter_2parts,

Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-09 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote: > > Currently we have 3 different independent tunes for gather > "use_gather,use_gather_2parts,use_gather_4parts", > similar for scatter, there're > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > The patch support 2 standardizing options to

Re: [PATCH] i386: Do not sanitize upper part of V2HFmode and V4HFmode reg with -fno-trapping-math [PR110832]

2023-08-09 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 10, 2023 at 2:49 AM liuhongt wrote: > > Also add ix86_partial_vec_fp_math to the condition of V2HF/V4HF named > patterns in order to avoid generation of partial vector V8HFmode > trapping instructions. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} > Ok for trunk? > > gcc/

Re: [PATCH] i386: Clear upper bits of XMM register for V4HFmode/V2HFmode operations [PR110762]

2023-08-09 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 7, 2023 at 1:20 PM Richard Biener wrote: > > Please also note the RFC patch [1] that relaxes clears for V2SFmode > > with -fno-trapping-math. The patched compiler will then emit the same > > code as clang does for -O2. Which raises another question - should gcc > > default to -fno-tra

Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-08 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 9, 2023 at 8:38 AM Uros Bizjak wrote: > > On Wed, Aug 9, 2023 at 8:37 AM Liu, Hongtao wrote: > > > > > > > > > -Original Message- > > > From: Uros Bizjak > > > Sent: Wednesday, August 9, 2023 2:33 PM > > > To: Liu,

Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-08 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 9, 2023 at 8:37 AM Liu, Hongtao wrote: > > > > > -Original Message- > > From: Uros Bizjak > > Sent: Wednesday, August 9, 2023 2:33 PM > > To: Liu, Hongtao > > Cc: gcc-patches@gcc.gnu.org > > Subject: Re: [PATCH V2] [X86] Work

Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-08 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 9, 2023 at 3:48 AM liuhongt wrote: > > > Please rather do it in a more self-descriptive way, as proposed in the > > attached patch. You won't need a comment then. > > > > Adjusted in V2 patch. > > Don't access leaf 7 subleaf 1 unless subleaf 0 says it is > supported via EAX. > > Intel

Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-08 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 9, 2023 at 3:48 AM liuhongt wrote: > > > Please rather do it in a more self-descriptive way, as proposed in the > > attached patch. You won't need a comment then. > > > > Adjusted in V2 patch. > > Don't access leaf 7 subleaf 1 unless subleaf 0 says it is > supported via EAX. > > Intel

[committed] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

2023-08-08 Thread Uros Bizjak via Gcc-patches
Also introduce -m[no-]partial-vector-fp-math option to disable trapping V2SF named patterns in order to avoid generation of partial vector V4SFmode trapping instructions. The new option is enabled by default, because even with sanitization, a small but consistent speed up of 2 to 3% with Polyhedro

Re: [PATCH] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-08 Thread Uros Bizjak via Gcc-patches
On Tue, Aug 8, 2023 at 9:58 AM liuhongt wrote: > > Don't access leaf 7 subleaf 1 unless subleaf 0 says it is > supported via EAX. > > Intel documentation says invalid subleaves return 0. We had been > relying on that behavior instead of checking the max subleaf number. > > It appears that some Sand

Re: [RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

2023-08-08 Thread Uros Bizjak via Gcc-patches
On Tue, Aug 8, 2023 at 12:08 PM Richard Biener wrote: > > > > > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF > > > > > > named patterns in order to avoid generation of partial vector > > > > > > V4SFmode > > > > > > trapping instructions. > > > > > > > > > > > > The new

Re: [RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

2023-08-08 Thread Uros Bizjak via Gcc-patches
On Tue, Aug 8, 2023 at 10:07 AM Richard Biener wrote: > > On Mon, 7 Aug 2023, Uros Bizjak wrote: > > > On Mon, Jul 31, 2023 at 11:40 AM Richard Biener wrote: > > > > > > On Sun, 30 Jul 2023, Uros Bizjak wrote: > > > > > > > Also introduce

Re: [RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

2023-08-07 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 31, 2023 at 11:40 AM Richard Biener wrote: > > On Sun, 30 Jul 2023, Uros Bizjak wrote: > > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF > > named patterns in order to avoid generation of partial vector V4SFmode > > trapping in

Re: PR target/107671: Make more use of btl/btq on x86_64.

2023-08-07 Thread Uros Bizjak via Gcc-patches
ith and without --target_board=unix{-m32} > with no new failures. Ok for mainline? > > > 2023-08-07 Roger Sayle > Uros Bizjak > > gcc/ChangeLog > PR target/107671 > * config/i386/i386.md (*bt_setc_mask): Allow the > shift count to

Re: [PATCH] i386: Clear upper bits of XMM register for V4HFmode/V2HFmode operations [PR110762]

2023-08-07 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 7, 2023 at 10:57 AM liuhongt wrote: > > Similar like r14-2786-gade30fad6669e5, the patch is for V4HF/V2HFmode. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > PR target/110762 > * config/i386/mmx.md (3): Changed from

Re: [x86 PATCH] Split SUBREGs of SSE vector registers into vec_select insns.

2023-08-03 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 3, 2023 at 9:10 AM Roger Sayle wrote: > > > This patch is the final piece in the series to improve the ABI issues > affecting PR 88873. The previous patches tackled inserting DFmode > values into V2DFmode registers, by introducing insvti_{low,high}part > patterns. This patch improves

Re: [x86 PATCH] PR target/110792: Early clobber issues with rot32di2_doubleword.

2023-08-02 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 3, 2023 at 12:18 AM Roger Sayle wrote: > > > This patch is a conservative fix for PR target/110792, a wrong-code > regression affecting doubleword rotations by BITS_PER_WORD, which > effectively swaps the highpart and lowpart words, when the source to be > rotated resides in memory. Th

Re: [PATCH] Optimize vlddqu + inserti128 to vbroadcasti128

2023-08-01 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 2, 2023 at 3:33 AM liuhongt wrote: > > In [1], I propose a patch to generate vmovdqu for all vlddqu intrinsics > after AVX2, it's rejected as > > The instruction is reachable only as __builtin_ia32_lddqu* (aka > > _mm_lddqu_si*), so it was chosen by the programmer for a reason. I > > t

Re: [RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

2023-07-31 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 31, 2023 at 11:40 AM Richard Biener wrote: > > On Sun, 30 Jul 2023, Uros Bizjak wrote: > > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF > > named patterns in order to avoid generation of partial vector V4SFmode > > trapping in

[RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

2023-07-30 Thread Uros Bizjak via Gcc-patches
Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF named patterns in order to avoid generation of partial vector V4SFmode trapping instructions. The new option is enabled by default, because even with sanitization, a small but consistent speed up of 2 to 3% with Polyhedron capaci

[committed] testsuite: Fix gfortran.dg/ieee/comparisons_3.F90 testsuite failures

2023-07-26 Thread Uros Bizjak via Gcc-patches
The testcase should use dg-additional-options instead of dg-options to not overwrite default compile flags that include path for finding the IEEE modules. gcc/testsuite/ChangeLog: * gfortran.dg/ieee/comparisons_3.F90: Use dg-additional-options instead of dg-options. Tested on x86_64-linu
