Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-15 Thread Uros Bizjak via Gcc-patches
On Sat, Jan 15, 2022 at 5:39 PM Hongyu Wang wrote: > > Thanks for the suggestion, here is the updated patch that survived > bootstrap/regtest. LGTM for me, but please get the final approval from Hongtao. Thanks, Uros. > > Please note reg_mentioned_p in the above condition. This function > > ret

Re: [PATCH] widening_mul, i386, v2: Improve spaceship expansion on x86 [PR103973]

2022-01-15 Thread Uros Bizjak via Gcc-patches
On Sat, Jan 15, 2022 at 12:23 PM Jakub Jelinek wrote: > > On Sat, Jan 15, 2022 at 11:42:55AM +0100, Uros Bizjak wrote: > > Yes, that would be nice. XFmode is used for long double, and not obsolete. > > Ok, that seems to work. Compared to the incremental patch I've posted, I > also had to add hand

Re: [PATCH] widening_mul, i386: Improve spaceship expansion on x86 [PR103973]

2022-01-15 Thread Uros Bizjak via Gcc-patches
On Sat, Jan 15, 2022 at 10:56 AM Jakub Jelinek wrote: > > On Sat, Jan 15, 2022 at 09:29:05AM +0100, Uros Bizjak wrote: > > > --- gcc/config/i386/i386.md.jj 2022-01-14 11:51:34.432384170 +0100 > > > +++ gcc/config/i386/i386.md 2022-01-14 18:22:41.140906449 +0100 > > > @@ -23886,6 +23886,18 @@

Re: [PATCH] widening_mul, i386: Improve spaceship expansion on x86 [PR103973]

2022-01-15 Thread Uros Bizjak via Gcc-patches
On Fri, Jan 14, 2022 at 11:56 PM Jakub Jelinek wrote: > > Hi! > > C++20: > #include <compare> > auto cmp4way(double a, double b) > { > return a <=> b; > } > expands to: > ucomisd %xmm1, %xmm0 > jp .L8 > movl $0, %eax > jne .L8 > .L2: > ret > .p2

Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-14 Thread Uros Bizjak via Gcc-patches
On Fri, Jan 14, 2022 at 2:44 PM Hongyu Wang wrote: > > > Are there any technical obstacles to introduce subst to > > define_{,insn_and_}split? > > gccint says: define_subst can be used only in define_insn and > define_expand, it cannot be used in other expressions (e.g. in > define_insn_and_split)

[PATCH] i386: Mark some of strict_low_part insn constraints earlyclobbered

2022-01-14 Thread Uros Bizjak via Gcc-patches
While it is practically impossible for the input registers to be matched with the in-out register, it is better to mark the output operand of the split alternative as earlyclobbered, since we do write early to the output operand when the insn is split. 2022-01-14 Uroš Bizjak gcc/ChangeLog: * config/i386/i38

[PATCH] libstdc++: Fix 22_locale/numpunct/members/char/3.cc execution test

2022-01-14 Thread Uros Bizjak via Gcc-patches
The test fails on Fedora 33+ because nl_NL locale got thousands separator defined. Use one of ar_SA, bg_BG, bs_BA, pt_PT or plain C locale instead. 2022-01-14 Uroš Bizjak libstdc++-v3/ChangeLog: * testsuite/22_locale/numpunct/members/char/3.cc (test02): Use pt_PT locale instead

Re: [PATCH] x86_64: Improvements to arithmetic right shifts of V1TImode values.

2022-01-14 Thread Uros Bizjak via Gcc-patches
On Fri, Jan 14, 2022 at 10:00 AM Roger Sayle wrote: > > > Hi Uros, > Here's a revised version of this patch incorporating your suggestion of using > force_reg instead of emit_move_insn to a pseudo allocated by gen_reg_rtx. > I also took the opportunity to transition the rest of the function (and

Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-14 Thread Uros Bizjak via Gcc-patches
On Fri, Jan 14, 2022 at 7:11 AM Hongyu Wang wrote: > > > > No, the approach is wrong. You have to solve output clearing on RTL > > > level, please look at how e.g. tzcnt false dep is solved: > > > > Actually we have considered such approach before, but we found we need > > to break original define

Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-14 Thread Uros Bizjak via Gcc-patches
On Fri, Jan 14, 2022 at 6:46 AM Hongyu Wang wrote: > > > No, the approach is wrong. You have to solve output clearing on RTL > > level, please look at how e.g. tzcnt false dep is solved: > > Actually we have considered such approach before, but we found we need > to break original define_insn to r

[PATCH] i386: Introduce V2QImode vectorized shifts [PR103861]

2022-01-13 Thread Uros Bizjak via Gcc-patches
Add V2QImode shift operations and split them to synthesized double HI/LO QImode operations with integer registers. Also robustify arithmetic split patterns. 2022-01-13 Uroš Bizjak gcc/ChangeLog: PR target/103861 * config/i386/i386.md (*ashlqi_ext_2): New insn pattern. (*qi_ext_2)

[PATCH] i386: Cleanup V2QI arithmetic instructions

2022-01-13 Thread Uros Bizjak via Gcc-patches
2022-01-13 Uroš Bizjak gcc/ChangeLog: * config/i386/mmx.md (negv2qi): Disparage GPR alternative a bit. Disable for TARGET_PARTIAL_REG_STALL unless optimizing for size. (negv2qi splitters): Use lowpart_subreg instead of gen_lowpart to create subreg. (v2qi3): Disparage GPR al

[PATCH] i386: Add 16-bit vector modes to xop_pcmov [PR104003]

2022-01-13 Thread Uros Bizjak via Gcc-patches
2022-01-13 Uroš Bizjak gcc/ChangeLog: PR target/104003 * config/i386/mmx.md (*xop_pcmov_): Use VI_16_32 mode iterator. gcc/testsuite/ChangeLog: PR target/104003 * g++.target/i386/pr103861-1-sse4.C: New test. * g++.target/i386/pr103861-1-xop.C: Ditto. Bootstrapped and reg

Re: [PATCH] [i386] Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & mask).

2022-01-12 Thread Uros Bizjak via Gcc-patches
On Thu, Jan 13, 2022 at 2:53 AM Jiang, Haochen wrote: > > Hi Uros, > > Has fixed that format issue with this new patch. Ok for trunk? The patch was already approved in my previous message, so no need to re-approve it. I'm sure you are able to move one brace to a new position without another revie

Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-12 Thread Uros Bizjak via Gcc-patches
On Thu, Jan 13, 2022 at 8:28 AM Hongyu Wang wrote: > > From: wwwhhhyyy > > Hi, > > For GoldenCove micro-architecture, force insert zero-idiom in asm > template to break false dependency of dest register for several insns. > > The related insns are: > > VPERM/D/Q/PS/PD > VRANGEPD/PS/SD/SS > VGETMA

[PATCH] testsuite: Compile g++.dg/vect/slp-pr98855.cc only for x86 targets [PR103935]

2022-01-12 Thread Uros Bizjak via Gcc-patches
The testcase is x86 specific, other targets have different costs defined. 2022-01-12 Uroš Bizjak gcc/testsuite/ChangeLog: PR target/103935 * g++.dg/vect/slp-pr98855.cc: Compile only for x86 targets. Tested on x86_64-linux-gnu {,-m32}. Pushed to master. Uros. diff --git a/gcc/testsu

Re: [PATCH] [i386] Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & mask).

2022-01-12 Thread Uros Bizjak via Gcc-patches
On Wed, Jan 12, 2022 at 9:11 AM Haochen Jiang wrote: > > Hi all, > > This patch targets PR94790, which changes the instruction selection under the > following circumstance. > > Regtested on x86_64-pc-linux-gnu. Ok for trunk? Please also test with -m32, e.g.: make -j 12 -k check RUNTESTFLAGS="--t
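The two forms named in the subject line compute the same bitwise select: where a mask bit is 1 the result takes the bit from b, otherwise from a. A minimal C sketch of that equivalence follows; the function names are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

/* First form from the subject: a ^ ((a ^ b) & mask). */
static uint32_t select_xor_form(uint32_t a, uint32_t b, uint32_t mask)
{
    return a ^ ((a ^ b) & mask);
}

/* Second form from the subject: (~mask & a) | (b & mask). */
static uint32_t select_andn_form(uint32_t a, uint32_t b, uint32_t mask)
{
    return (~mask & a) | (b & mask);
}
```

Both functions should agree for every input; which one is cheaper depends on the instructions the target offers.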

Re: [PATCH] x86_64: Improvements to arithmetic right shifts of V1TImode values.

2022-01-12 Thread Uros Bizjak via Gcc-patches
On Tue, Jan 11, 2022 at 2:26 PM Roger Sayle wrote: > > > This patch to the i386 backend's ix86_expand_v1ti_ashiftrt provides > improved (shorter) implementations of V1TI mode arithmetic right shifts > for constant amounts between 111 and 126 bits. The significance of > this range is that this fun

[PATCH] i386: Add CC clobber and splits for 32-bit vector mode logic insns [PR100673, PR103861]

2022-01-12 Thread Uros Bizjak via Gcc-patches
Add CC clobber to 32-bit vector mode logic insns to allow variants with general-purpose registers. Also improve ix86_sse_movcc to emit insn with CC clobber for narrow vector modes in order to re-enable conditional moves for 16-bit and 32-bit narrow vector modes with -msse2. 2022-01-12 Uroš Bizja

[PATCH] i386: Introduce V2QImode vector cmove for -msse4.1 [PR103861]

2022-01-11 Thread Uros Bizjak via Gcc-patches
This patch also moves V2HI and V4QImode vector conditional moves to SSE4.1 targets. Vector cmoves are implemented with SSE logic functions without -msse4.1, and they are hardly worthwhile for narrow vector modes. More important, we would like to keep vector logic functions for GPR registers, and th

[PATCH] i386: Introduce V2QImode vector compares [PR103861]

2022-01-10 Thread Uros Bizjak via Gcc-patches
Add V2QImode vector compares with SSE registers. 2022-01-10 Uroš Bizjak gcc/ChangeLog: PR target/103861 * config/i386/i386-expand.c (ix86_expand_int_sse_cmp): Handle V2QImode. * config/i386/mmx.md (3): Use VI1_16_32 mode iterator. (*eq3): Ditto. (*gt3): Ditto.

[PATCH] tree-optimization/103948 - detect vector vec_cmp in expand_vector_condition

2022-01-10 Thread Uros Bizjak via Gcc-patches
Currently, expand_vector_condition detects only vcondMN and vconduMN named RTX patterns. Teach it to also consider vec_cmpMN and vec_cmpuMN RTX patterns when all ones vector is returned for true and all zeros vector is returned for false. Patch by Richard, I tested it on the patched x86 target an
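The "all ones for true, all zeros for false" convention mentioned above is also what a lane-wise comparison yields in GNU C's vector extension. A small sketch, assuming a compiler that supports `__attribute__((vector_size))`; the type and function names are made up for illustration.

```c
#include <stdint.h>

/* Four 32-bit lanes in a 16-byte vector (GNU C vector extension). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Lane-wise a < b: each result lane is -1 (all ones) when the
   comparison holds and 0 when it does not. */
static v4si lanes_less_than(v4si a, v4si b)
{
    return a < b;
}
```

For example, comparing {1, 5, 3, 7} against {2, 4, 3, 9} gives {-1, 0, 0, -1}.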

Re: [PATCH] x86_64: Ignore zero width bitfields in ABI and issue -Wpsabi warning about C zero width bitfield ABI changes [PR102024]

2022-01-10 Thread Uros Bizjak via Gcc-patches
On Mon, Jan 10, 2022 at 3:23 PM Michael Matz wrote: > > Hello, > > On Mon, 20 Dec 2021, Uros Bizjak wrote: > > > > Thanks. > > > I see nobody commented on Micha's post there. > > > > > > Here is a patch that implements it in GCC, i.e. C++ doesn't change ABI > > > (at least > > > not from the past

Re: [PATCH] x86_64: Improve (interunit) moves from TImode to V1TImode.

2022-01-08 Thread Uros Bizjak via Gcc-patches
On Thu, Jan 6, 2022 at 7:00 PM Roger Sayle wrote: > > > > This patch improves the code generated when moving a 128-bit value > > in TImode, represented by two 64-bit registers, to V1TImode, which > > is a single SSE register. > > > > Currently, the simple move: > > typedef unsigned __int128 uv1ti

[PATCH] i386: Robustify V2QI and V4QI move patterns

2022-01-07 Thread Uros Bizjak via Gcc-patches
Add sse2 isa attribute where needed and remove where not needed. 2022-01-07 Uroš Bizjak gcc/ChangeLog: * config/i386/mmx.md (*move_internal): Add isa attribute. (*movv2qi_internal): Remove sse2 requirement for alternatives 4,5. Bootstrapped and regression tested on x86_64-linux-gnu {,

Re: [PATCH] x86: Generate INT3 for __builtin_eh_return

2022-01-06 Thread Uros Bizjak via Gcc-patches
On Thu, Jan 6, 2022 at 7:58 PM H.J. Lu wrote: > > Generate INT3 after indirect jmp in exception return for -fcf-protection > with -mharden-sls=indirect-jmp. > > gcc/ > > PR target/103925 > * config/i386/i386.c (ix86_output_indirect_function_return): > Generate INT3 after in

Re: [PATCH] x86: Rename -harden-sls=indirect-branch to -harden-sls=indirect-jmp

2022-01-06 Thread Uros Bizjak via Gcc-patches
On Thu, Jan 6, 2022 at 7:57 PM H.J. Lu wrote: > > Indirect branch also includes indirect call instructions. Rename > -harden-sls=indirect-branch to -harden-sls=indirect-jmp to match its > intended behavior. > > PR target/102952 > * config/i386/i386-opts.h (harden_sls): Replace >

[PATCH] i386: Improve HImode interunit moves

2022-01-06 Thread Uros Bizjak via Gcc-patches
Currently, the compiler moves HImode values between GPR and XMM registers with: %vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0} %vpextrw\t{$0, %1, %k0|%k0, %1, 0} but it could use slightly faster and shorter: %vmovd\t{%k1, %0|%0, %k1} %vmovd\t{%1, %k0|%k0, %1} 2022-01-06 Uroš Bizjak gc

Re: [PATCH] [i386] Optimize V16HF vector insert to element 0 for AVX2.

2022-01-06 Thread Uros Bizjak via Gcc-patches
On Thu, Jan 6, 2022 at 10:22 AM liuhongt via Gcc-patches wrote: > > Also remove mode attribute blendsuf, use ssemodesuf instead. > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ready to push to trunk. > > gcc/ChangeLog: > > PR target/103753 > * config/i386/i386-expand

[PATCH] i386: Introduce V2QImode minmax, abs and uavgv2hi3_ceil [PR103861]

2022-01-05 Thread Uros Bizjak via Gcc-patches
Add V2QImode minmax, abs and uavgv2qi3_ceil operations with SSE registers. 2022-01-05 Uroš Bizjak gcc/ChangeLog: PR target/103861 * config/i386/mmx.md (VI_16_32): New mode iterator. (VI1_16_32): Ditto. (mmxvecsize): Handle V2QI mode. (3): Rename from v4qi3. Use VI1_16_

[PATCH] i386: Fix type of one_cmplv2qi2 alternatives 1,2 [PR103915]

2022-01-05 Thread Uros Bizjak via Gcc-patches
2022-01-05 Uroš Bizjak gcc/ChangeLog: PR target/103915 * config/i386/mmx.md (one_cmplv2qi2): Change alternatives 1,2 type from sselog to sselog1. gcc/testsuite/ChangeLog: PR target/103915 * gcc.target/i386/pr103915.c: New test. Bootstrapped and regression tested on x86_

[PATCH] i386: Fix expand_vec_perm_pshufb for narrow modes [PR103905]

2022-01-05 Thread Uros Bizjak via Gcc-patches
2022-01-05 Uroš Bizjak gcc/ChangeLog: PR target/103905 * config/i386/i386-expand.c (expand_vec_perm_pshufb): Fix number of narrow mode remapped elements for !one_operand_p case. gcc/testsuite/ChangeLog: PR target/103905 * gcc.target/i386/pr103905.c: New test. Bootstrappe

[PATCH] i386: Introduce V2QImode vectorized logic [PR103861]

2022-01-04 Thread Uros Bizjak via Gcc-patches
Add V2QImode logic operations with SSE and GP registers and split them to V4QImode SSE instructions or SImode GP instructions. The patch also fixes PR target/103900. 2022-01-04 Uroš Bizjak gcc/ChangeLog: PR target/103861 * config/i386/mmx.md (one_cmplv2qi3): New insn pattern. (on

Re: [PATCH] x86: Update model value for Alderlake and Rocketlake

2022-01-04 Thread Uros Bizjak via Gcc-patches
On Tue, Jan 4, 2022 at 6:20 AM Cui,Lili wrote: > > Hi Uros, > > This patch is to update model value for Alderlake and Rocketlake. > > Bootstrap is ok, and no regressions for i386/x86-64 testsuite. > > OK for master? > > gcc/ChangeLog > > * common/config/i386/cpuinfo.h (get_intel_cpu): Add

[PATCH] i386: Always enable mov patterns [PR103894]

2022-01-03 Thread Uros Bizjak via Gcc-patches
Middle end tries to generate V4QImode moves to implement V2QImode inserts and calls emit_move_multi_word when V4QImode moves are unavailable, as is the case with 32-bit vector moves, constrained with TARGET_SSE2. However, this triggers gcc_assert (mode_size >= UNITS_PER_WORD); in emit_move_mu

[PATCH] i386: Introduce V2QImode vectorized arithmetic [PR103861]

2022-01-02 Thread Uros Bizjak via Gcc-patches
On Thu, Dec 30, 2021 at 3:45 PM Uros Bizjak wrote: > > This patch adds basic V2QImode infrastructure and V2QImode arithmetic > operations (plus, minus and neg). The patched compiler can emit SSE > vectorized QImode operations (e.g. PADDB) with partial QImode vector, > and also synthesized double

[PATCH] testsuite: XFAIL some Wstringop-overflow tests ...

2021-12-31 Thread Uros Bizjak via Gcc-patches
... for targets that support vectorization of 2-byte char stores with unaligned address at plain O2. 2021-12-31 Uroš Bizjak gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_vect_slp_store_usage): Handle TEST_V2QI_2. (check_effective_target_vect_slp_v2qi_store_unalign): Ne

[RFC PATCH] i386: Introduce V2QImode vectorized arithmetic [PR103861]

2021-12-30 Thread Uros Bizjak via Gcc-patches
This patch adds basic V2QImode infrastructure and V2QImode arithmetic operations (plus, minus and neg). The patched compiler can emit SSE vectorized QImode operations (e.g. PADDB) with partial QImode vector, and also synthesized double HI/LO QImode operations with integer registers. The testcase:
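For readers unfamiliar with V2QImode, it corresponds to a vector of two 8-bit elements. The sketch below is not the testcase from the patch (which is cut off in this preview); it only illustrates, with hypothetical names, the kind of source-level operation that such a mode covers in GNU C.

```c
/* Two 8-bit lanes in a 2-byte vector (GNU C vector extension). */
typedef signed char v2qi __attribute__((vector_size(2)));

/* Lane-wise addition of two 2-element byte vectors. */
static v2qi add_v2qi(v2qi a, v2qi b)
{
    return a + b;
}
```

Whether this becomes a PADDB on an SSE register or a pair of byte operations on integer registers is up to the compiler and the selected target options.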

[PATCH] i386: Robustify some expanders w.r.t. paradoxical SUBREGs

2021-12-29 Thread Uros Bizjak via Gcc-patches
lowpart_subreg might fail in some cases when trying to create paradoxical SUBREGs. Use force_reg on input operand, use new temporary output operand and emit move into the destination afterwards. Also, replace simplify_gen_subreg (Mx, op, My, 0) with equivalent lowpart_subreg (Mx, op, My). 2021-1

Re: [PATCH v3] i386: Check AX input in any_mul_highpart peepholes

2021-12-26 Thread Uros Bizjak via Gcc-patches
On Sat, Dec 25, 2021 at 3:28 PM H.J. Lu wrote: > > When applying peephole optimization to transform > > mov imm, %reg0 > mov %reg1, %AX_REG > imul %reg0 > > to > > mov imm, %AX_REG > imul %reg1 > > disable peephole optimization if reg1 == AX_REG. > > gcc/ >

[PATCH] i386: Add V2SFmode DIV insn pattern [PR95046, PR103797]

2021-12-24 Thread Uros Bizjak via Gcc-patches
Use V4SFmode "DIVPS X,Y" with [y0, y1, 1.0f, 1.0f] as a divisor to avoid division by zero. 2021-12-24 Uroš Bizjak gcc/ChangeLog: PR target/95046 PR target/103797 * config/i386/mmx.md (divv2sf3): New instruction pattern. gcc/testsuite/ChangeLog: PR target/95046 PR target/
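The padding idea can be modelled in scalar C: the two live lanes are divided as usual, while the two unused lanes of the divisor are filled with 1.0f so that the four-lane divide never divides by zero there. This is only an illustrative model with hypothetical names; the value of the unused numerator lanes (zero here) is an assumption.

```c
/* Scalar model of a 2-lane float divide carried out as a 4-lane divide. */
static void div_v2sf_model(const float x[2], const float y[2], float out[2])
{
    /* Upper numerator lanes are assumed to be zero in this model. */
    float num[4] = { x[0], x[1], 0.0f, 0.0f };
    /* Upper divisor lanes are padded with 1.0f to avoid division by zero. */
    float den[4] = { y[0], y[1], 1.0f, 1.0f };
    float quot[4];

    for (int i = 0; i < 4; i++)
        quot[i] = num[i] / den[i];

    /* Only the two low lanes are meaningful. */
    out[0] = quot[0];
    out[1] = quot[1];
}
```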

Re: [PATCH v2] ix86: Don't use the 'm' constraint for x86_64_general_operand

2021-12-24 Thread Uros Bizjak via Gcc-patches
On Thu, Dec 23, 2021 at 3:42 PM H.J. Lu wrote: > > On Mon, Dec 20, 2021 at 2:22 PM H.J. Lu wrote: > > > > On Mon, Dec 20, 2021 at 12:38 PM Jakub Jelinek wrote: > > > > > > On Mon, Dec 20, 2021 at 11:44:08AM -0800, H.J. Lu wrote: > > > > The problem is in > > > > > > > > (define_memory_constraint

Re: [PATCH] i386: Require TARGET_64BIT for any_mul_highpart peephole

2021-12-24 Thread Uros Bizjak via Gcc-patches
On Thu, Dec 23, 2021 at 11:21 PM H.J. Lu via Gcc-patches wrote: > > Restore i686 bootstrap by requiring TARGET_64BIT for any_mul_highpart > peephole. > > PR bootstrap/103785 > * config/i386/i386.md: Require TARGET_64BIT for any_mul_highpart > peephole. I don't think this i

Re: [PATCH take #3] PR target/103773: Fix wrong-code with -Oz from pop to memory.

2021-12-23 Thread Uros Bizjak via Gcc-patches
On Thu, Dec 23, 2021 at 10:35 AM Roger Sayle wrote: > > Hi Uros, > > A huge thanks for the list of suggested improvements to the -Oz related > patches. > I've combined them altogether in the submission below, which makes sense now > that everything is implemented using peephole2. The implementat

Re: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.

2021-12-22 Thread Uros Bizjak via Gcc-patches
On Wed, Dec 22, 2021 at 11:26 AM Uros Bizjak wrote: > > On Wed, Dec 22, 2021 at 10:26 AM Roger Sayle > wrote: > > > > > > Hi Uros, > > Would you consider the following variant that disables this optimization > > when a > > red zone is used by the current function? You're right that cfun's > >

Re: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.

2021-12-22 Thread Uros Bizjak via Gcc-patches
On Wed, Dec 22, 2021 at 10:26 AM Roger Sayle wrote: > > > Hi Uros, > Would you consider the following variant that disables this optimization when > a > red zone is used by the current function? You're right that cfun's > red_zone_size is > recalculated dynamically, but ix86_red_zone_used shoul

Re: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.

2021-12-22 Thread Uros Bizjak via Gcc-patches
On Wed, Dec 22, 2021 at 9:10 AM Uros Bizjak wrote: > > On Tue, Dec 21, 2021 at 1:27 PM Roger Sayle > wrote: > > > > > > My apologies for the inconvenience. The new support for -Oz using > > push/pop for small integer constants on x86_64 is only a win/correct > > for loading registers. Fixed by

Re: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.

2021-12-22 Thread Uros Bizjak via Gcc-patches
On Tue, Dec 21, 2021 at 1:27 PM Roger Sayle wrote: > > > My apologies for the inconvenience. The new support for -Oz using > push/pop for small integer constants on x86_64 is only a win/correct > for loading registers. Fixed by adding !MEM_P tests in the appropriate > locations. > > This patch h

Re: [PATCH] x86: Shrink writing 0/-1 to memory using and/or with -Oz.

2021-12-22 Thread Uros Bizjak via Gcc-patches
On Tue, Dec 21, 2021 at 4:08 PM Roger Sayle wrote: > > > This is the second part of my fix to PR target/103773 where -Oz shouldn't > use push/pop on x86 to shrink writing small integer constants to memory. > Instead clang uses "andl $0, mem" for writing zero, and "orl $-1, mem" > when writing -1 t
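The arithmetic behind the two idioms quoted above is simple: AND with 0 always leaves 0, and OR with -1 always leaves all ones, regardless of the previous memory contents. A C sketch with hypothetical helper names; note that an optimizing compiler is free to turn these back into plain stores, so this shows the equivalence only, not the generated code.

```c
#include <stdint.h>

/* Same effect as storing 0: x & 0 == 0 for any x. */
static void store_zero_via_and(uint32_t *p)
{
    *p &= 0u;
}

/* Same effect as storing -1: x | 0xffffffff == 0xffffffff for any x. */
static void store_minus_one_via_or(uint32_t *p)
{
    *p |= (uint32_t)-1;
}
```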

Re: [PATCH] [i386]Add missing BMI function to align with clang

2021-12-20 Thread Uros Bizjak via Gcc-patches
On Tue, Dec 21, 2021 at 6:22 AM Haochen Jiang wrote: > > Hi all, > > This patch adds missing BMI function _tzcnt_u16, _andn_u32, _andn_u64 to > align with clang. > > Regtested on x86_64-pc-linux-gnu. Ok for trunk? > > BRs, > Haochen > > gcc/ChangeLog: > > * config/i386/bmiintrin.h (_tzcnt
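As a reference for what these intrinsics compute, here are portable C models. The names ending in `_model` are hypothetical; the semantics assumed are those of the ANDN and TZCNT instructions (ANDN inverts its first operand, and TZCNT returns the operand width when the input is zero).

```c
#include <stdint.h>

/* Model of _andn_u32(a, b): bitwise AND of the inverted first operand
   with the second operand. */
static uint32_t andn_u32_model(uint32_t a, uint32_t b)
{
    return ~a & b;
}

/* Model of _tzcnt_u16(x): number of trailing zero bits, 16 for x == 0. */
static uint16_t tzcnt_u16_model(uint16_t x)
{
    uint16_t n = 0;

    if (x == 0)
        return 16;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}
```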

[PATCH] i386: Fix _pinsr and its splitters [PR103772]

2021-12-20 Thread Uros Bizjak via Gcc-patches
The clever trick to duplicate the value of the input operand into itself proved not so clever after all. The splitter should not clobber the input operand in any case, since the register can hold the value outside the HImode lowpart when accessed as subreg. Use the standard earlyclobber approach

Re: [PATCH take #2] x86_64: Improve code expanded for highpart multiplications.

2021-12-20 Thread Uros Bizjak via Gcc-patches
On Mon, Dec 20, 2021 at 12:26 PM Roger Sayle wrote: > > > Hi Uros, > Many thanks for the review. Here's a revised patch incorporating your > suggestion to use a single define_insn with a mode iterator instead of two > new near identical define_insns for SImode and DImode. I initially tried > SWI

Re: [PATCH] x86_64: Ignore zero width bitfields in ABI and issue -Wpsabi warning about C zero width bitfield ABI changes [PR102024]

2021-12-20 Thread Uros Bizjak via Gcc-patches
On Wed, Dec 15, 2021 at 3:50 PM Jakub Jelinek wrote: > > On Mon, Nov 29, 2021 at 05:25:30AM -0700, H.J. Lu wrote: > > > I'd like to ping this patch, but perhaps first it would be nice to discuss > > > it in the x86-64 psABI group. > > > The current psABI doesn't seem to mention zero sized bitfield

Re: [PATCH] ix86: Don't match the 'm' constraint on x86_64_general_operand

2021-12-20 Thread Uros Bizjak via Gcc-patches
On Sun, Dec 19, 2021 at 9:06 PM H.J. Lu wrote: > > x86_64_general_operand is different from general_operand for 64-bit > target. To avoid LRA selecting a memory operand which doesn't satisfy > x86_64_general_operand for 64-bit target: > > 1. Add a 'BM' constraint which is similar to the 'm' const

[PATCH] i386: Enable VxHF vector modes lower ABI levels [PR103571]

2021-12-16 Thread Uros Bizjak via Gcc-patches
Enable VxHF vector modes for SSE2, AVX and AVX512F ABIs. 2021-12-16 Uroš Bizjak gcc/ChangeLog: PR target/103571 * config/i386/i386.h (VALID_AVX256_REG_MODE): Add V16HFmode. (VALID_AVX256_REG_OR_OI_VHF_MODE): Replace with ... (VALID_AVX256_REG_OR_OI_MODE): ... this. Remove V16

Re: [PATCH] i386, fab: Optimize __atomic_{add,sub,and,or,xor}_fetch (x, y, z) {==,!=,<,<=,>,>=} 0 [PR98737]

2021-12-15 Thread Uros Bizjak via Gcc-patches
On Wed, Dec 15, 2021 at 10:23 AM Jakub Jelinek wrote: > > On Wed, Jan 27, 2021 at 12:27:13PM +0100, Ulrich Drepper via Gcc-patches > wrote: > > On 1/27/21 11:37 AM, Jakub Jelinek wrote: > > > Would equality comparison against 0 handle the most common cases. > > > > > > The user can write it as >
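A typical source-level instance of the pattern in the subject line is a reference-count release, where the result of an atomic op-and-fetch is compared against zero. A minimal sketch using the GCC `__atomic` builtins; the function name and the choice of memory order are illustrative assumptions.

```c
/* Drop one reference; returns nonzero when this call released the last
   one, i.e. when the decremented value is 0. */
static int release_ref(int *refcount)
{
    return __atomic_sub_fetch(refcount, 1, __ATOMIC_ACQ_REL) == 0;
}
```

Only the comparison result is used here, which is the shape of code the thread is about.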

Re: [PATCH] x86: PR target/103611: Splitter for DST:DI = (HI:SI<<32)|LO:SI.

2021-12-15 Thread Uros Bizjak via Gcc-patches
On Mon, Dec 13, 2021 at 3:10 PM Roger Sayle wrote: > > > A common idiom is to create a DImode value from the "concat" of two SImode > values, using "(long long)hi << 32 | (long long)lo", where the operation > may be ior, xor or plus. On x86, with -m32, the high and low parts of > a DImode registe
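The reason ior, xor and plus are interchangeable in this idiom is that the shifted high part and the zero-extended low part have no set bits in common. A C sketch with hypothetical names; it uses unsigned 64-bit types rather than the `long long` casts quoted above, to keep the shift well defined for all inputs.

```c
#include <stdint.h>

/* hi occupies bits 63..32, lo occupies bits 31..0, so the two operands
   never overlap and |, ^ and + all give the same result. */
static uint64_t concat_ior(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

static uint64_t concat_xor(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) ^ (uint64_t)lo;
}

static uint64_t concat_plus(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) + (uint64_t)lo;
}
```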

[PATCH] i386: Implement VxHF vector set/insert/extract with lower ABI levels

2021-12-14 Thread Uros Bizjak via Gcc-patches
This is a preparation patch that moves VxHF vector set/insert/extract expansions from AVX512FP16 ABI to lower ABIs. There are no functional changes for -mavx512fp16 and a follow-up patch is needed to actually enable VxHF vector modes for lower ABIs. 2021-12-14 Uroš Bizjak gcc/ChangeLog:

Re: [PATCH] PR target/103611: Avoid generating orb $0, %ah on x86.

2021-12-13 Thread Uros Bizjak via Gcc-patches
On Mon, Dec 13, 2021 at 1:09 PM Roger Sayle wrote: > > > I'll post my proposed fix for PR target/103611 shortly, but this patch > fixes another missed optimization opportunity revealed by that PR. > Occasionally, reload materializes integer constants during register > allocation sometimes resultin

Re: [PATCH] x86_64: Improve code expanded for highpart multiplications.

2021-12-13 Thread Uros Bizjak via Gcc-patches
On Fri, Dec 10, 2021 at 12:58 PM Roger Sayle wrote: > > > While working on a middle-end patch to more aggressively use highpart > multiplications on targets that support them, I noticed that the RTL > expanded by the x86 backend interacts poorly with register allocation > leading to suboptimal cod
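For context, a highpart multiplication returns the upper half of the double-width product. A portable C sketch for the unsigned 32-bit case follows; the function name is hypothetical, and the 64-bit case discussed in the thread would need a 128-bit intermediate instead.

```c
#include <stdint.h>

/* Upper 32 bits of the 64-bit product of two unsigned 32-bit values. */
static uint32_t umul_highpart32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}
```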

Re: [PATCH] x86: Update -mtune=tremont

2021-12-09 Thread Uros Bizjak via Gcc-patches
On Thu, Dec 9, 2021 at 7:59 AM Cui,Lili wrote: > > Hi Uros, > > This patch is to update mtune for tremont. > > Bootstrap is ok, and no regressions for i386/x86-64 testsuite. > > OK for master? OK. Thanks, Uros. > > > Silvermont has a special handle in add_stmt_cost function, because it has in >

Re: [PATCH] [i386]Add combine splitter to transform vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0

2021-12-07 Thread Uros Bizjak via Gcc-patches
On Tue, Dec 7, 2021 at 3:10 AM Haochen Jiang via Gcc-patches wrote: > > This patch adds combine splitter to transform vpcmpeqd/vpxor/vblendvps to > vblendvps for ~op0. > > OK for trunk? > > BRs, > Haochen > > gcc/ChangeLog: > > PR target/100738 > * config/i386/sse.md > (*_blendv_

Re: [PATCH] x86: Check FUNCTION_DECL before calling cgraph_node::get

2021-12-07 Thread Uros Bizjak via Gcc-patches
On Tue, Dec 7, 2021 at 2:15 PM H.J. Lu wrote: > > gcc/ > > PR target/103594 > * config/i386/i386.c (ix86_call_use_plt_p): Check FUNCTION_DECL > before calling cgraph_node::get. > > gcc/testsuite/ > > PR target/103594 > * gcc.dg/pr103594.c: New test. OK. Th

Re: [PATCH] [i386] Prefer INT_SSE_REGS for SSE_FLOAT_MODE_P in preferred_reload_class.

2021-12-06 Thread Uros Bizjak via Gcc-patches
On Mon, Dec 6, 2021 at 4:41 AM liuhongt via Gcc-patches wrote: > > When moves between integer and sse registers are cheap. > > 2021-12-06 Hongtao Liu > Uroš Bizjak > gcc/ChangeLog: > > PR target/95740 > * config/i386/i386.c (ix86_preferred_reload_class): Allow >

Re: PING^1 [PATCH] x86: Add -mmove-max=bits and -mstore-max=bits

2021-12-03 Thread Uros Bizjak via Gcc-patches
On Fri, Dec 3, 2021 at 2:24 PM H.J. Lu wrote: > > On Thu, Nov 25, 2021 at 2:47 PM H.J. Lu wrote: > > > > Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move > > and store, independent of -mprefer-vector-width=bits: > > > > 1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX

Re: [PATCH] [i386] Prefer INT_SSE_REGS for SSE_FLOAT_MODE_P in preferred_reload_class.

2021-12-03 Thread Uros Bizjak via Gcc-patches
On Fri, Dec 3, 2021 at 7:19 AM liuhongt wrote: > > Hi: > > Please also consider TARGET_INTER_UNIT_MOVES_TO_VEC and > > TARGET_INTER_UNIT_MOVES_FROM_VEC. > Here's updated patch. > > Also honor TARGET_INTER_UNIT_MOVES_TO/FROM_VEC and in > preferred_{,out_}reload_class. > > Bootstrapped and regtested

Re: [PATCH] [i386] Prefer INT_SSE_REGS for SSE_FLOAT_MODE_P in preferred_reload_class.

2021-12-02 Thread Uros Bizjak via Gcc-patches
On Thu, Dec 2, 2021 at 9:36 AM Hongtao Liu wrote: > > On Thu, Dec 2, 2021 at 4:27 PM liuhongt wrote: > > > > The patch helps reload to choose GENERAL_REGS alternatives for > > SSE_FLOAT_MODE and enabled optimization like > > > > - vmovd %xmm0, -4(%rsp) > > - movl $1, %eax > >

[PATCH] i386: Improve V8HI and V8HF inserts [PR102811]

2021-12-01 Thread Uros Bizjak via Gcc-patches
Introduce vec_set_0 pattern for V8HI and V8HF modes to implement scalar element 0 inserts from a GP register, SSE register or memory. Also add V8HI and V8HF AVX2 (x,x,x) alternative to PINSR insn pattern, which is split after reload to a sequence of PBROADCASTW and PBLENDW. The V8HF inserts fr

Re: [PATCH] x86: Speed up target attribute handling by using a cache

2021-12-01 Thread Uros Bizjak via Gcc-patches
On Mon, Nov 22, 2021 at 10:36 AM Jakub Jelinek wrote: > > Hi! > > The target attribute handling is very expensive and for the common case > from x86intrin.h where many functions get implicitly the same target > attribute, we can speed up compilation a lot by caching it. > > The following patches b

Re: [PATCH] [i386] Fix ICE in ix86_attr_length_immediate_default.

2021-11-30 Thread Uros Bizjak via Gcc-patches
On Tue, Nov 30, 2021 at 10:43 AM liuhongt wrote: > > ix86_attr_length_immediate_default assumes TYPE ishift only has 1 > constant operand, > but *x86_64_shld_1/*x86_shld_1/*x86_64_shrd_1/*x86_shrd_1 has 2, with > condition: INTVAL (operands[3]) == 32 - INTVAL (operands[2]) or > INTVAL (operands[3]

Re: [PATCH] Fix regression introduced by r12-5536.

2021-11-29 Thread Uros Bizjak via Gcc-patches
On Mon, Nov 29, 2021 at 10:48 AM Hongtao Liu wrote: > > On Mon, Nov 29, 2021 at 3:53 PM Uros Bizjak wrote: > > > > On Mon, Nov 29, 2021 at 2:32 AM liuhongt wrote: > > > > > > There're several failures reported in [1]: > > > 1. unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)" >

Re: [PATCH take #2] x86_64: PR target/100711: Splitters for pandn

2021-11-29 Thread Uros Bizjak via Gcc-patches
On Mon, Nov 29, 2021 at 7:14 PM Roger Sayle wrote: > > > Hi Uros, > Many thanks for the review. Here's a revised version of the patch > incorporating all of your suggestions. This has been (re)tested on > x86_64-pc-linux-gnu with make bootstrap and make -k check, > both with and without --target

Re: [PATCH] Optimize _Float16 usage for non AVX512FP16.

2021-11-28 Thread Uros Bizjak via Gcc-patches
On Mon, Nov 29, 2021 at 8:46 AM liuhongt wrote: > > As discussed in PR, this patch does optimizations: > 1. No memory is needed to move HI/HFmode between GPR and SSE registers > under TARGET_SSE2 and above, pinsrw/pextrw are used for them w/o > AVX512FP16. > 2. Use gen_sse2_pinsrph/gen_vec_setv4sf_0

Re: [PATCH] Fix regression introduced by r12-5536.

2021-11-28 Thread Uros Bizjak via Gcc-patches
On Mon, Nov 29, 2021 at 2:32 AM liuhongt wrote: > > There're several failures reported in [1]: > 1. unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)" > %vpextrw should be used in output templates. > 2. ICE in get_attr_memory for movhi_internal since some alternatives > are marked

Re: [PATCH] x86_64: PR target/100711: Splitters for pandn

2021-11-28 Thread Uros Bizjak via Gcc-patches
On Sun, Nov 28, 2021 at 2:25 PM Roger Sayle wrote: > > > This patch addresses PR target/100711 by introducing define_split > patterns so that not/broadcast/pand may be simplified (by combine) > to broadcast/pandn. This introduces two splitters one for optimizing > pandn on TARGET_SSE for V4SI and

Re: [PATCH] x86_64: Improved V1TImode rotations by non-constant amounts.

2021-11-28 Thread Uros Bizjak via Gcc-patches
On Sun, Nov 28, 2021 at 3:02 PM Roger Sayle wrote: > > > This patch builds on the recent improvements to TImode rotations (and > Jakub's fixes to shldq/shrdq patterns). Now that expanding a TImode > rotation can never fail, it is safe to allow general_operand constraints > on the QImode shift amo

Re: [PATCH] x86: Fix up x86_{,64_}sh{l,r}d patterns [PR103431]

2021-11-27 Thread Uros Bizjak via Gcc-patches
On Sat, Nov 27, 2021 at 10:04 AM Jakub Jelinek wrote: > > Hi! > > The following testcase is miscompiled because the x86_{,64_}sh{l,r}d > patterns don't properly describe what the instructions do. One thing > is left out, in particular that there is initial count &= 63 for > sh{l,r}dq and initial
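The count masking mentioned above can be made concrete with a software model of the 64-bit double shift left. This is a sketch with a hypothetical function name, based on the documented behaviour of the instruction (count taken modulo 64, destination unchanged for a zero count); it is not code from the patch.

```c
#include <stdint.h>

/* Model of shldq: shift dst left by count, filling the vacated low bits
   from the high bits of src. The count is first reduced modulo 64. */
static uint64_t shld64_model(uint64_t dst, uint64_t src, unsigned count)
{
    count &= 63;
    if (count == 0)
        return dst;            /* also avoids an undefined shift by 64 */
    return (dst << count) | (src >> (64 - count));
}
```

A pattern that leaves out the initial masking would describe a different result for counts of 64 and above, which is the kind of mismatch the thread discusses.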

Re: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-24 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 24, 2021 at 9:44 AM Kong, Lingling wrote: > > Hi, > > vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with > -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c. > Cleared before conversion, updated movhi_internal and > ix86_can_change_mode_c

Re: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-24 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 24, 2021 at 9:06 AM Kong, Lingling wrote: > > Hi Uros, > > > BTW: When playing with my patch, I introduced (define_insn > > "*vec_set_0" ...) to optimize scalar load to a vector. Does > > ix86_expand_vector_set work OK without this pattern? > > Yes, ix86_expand_vector_set could work

Re: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-23 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 24, 2021 at 7:25 AM Kong, Lingling via Gcc-patches wrote: > > Hi, > > vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with > -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c. > And cleared before conversion, updated movhi_internal and > ix

Re: [PATCH] i386: Fix up handling of target attribute [PR101180]

2021-11-21 Thread Uros Bizjak via Gcc-patches
On Sat, Nov 20, 2021 at 9:20 AM Jakub Jelinek wrote: > > Hi! > > As shown in the testcase below, if a function has multiple target attributes > (rather than a single one with one or more arguments) or if a function > gets one target attribute on one declaration and another one on another > declara

Re: [PATCH] Don't allow mask/sse/mmx mov in TLS code sequences.

2021-11-18 Thread Uros Bizjak via Gcc-patches
On Fri, Nov 19, 2021 at 8:50 AM Uros Bizjak wrote: > > On Fri, Nov 19, 2021 at 2:14 AM liuhongt wrote: > > > > >Why is the above declared as a special memory constraint? Also the > > Change to define_memory_constraint since it's ok for > > reload can make them match by converting the operand to t

Re: [PATCH] Don't allow mask/sse/mmx mov in TLS code sequences.

2021-11-18 Thread Uros Bizjak via Gcc-patches
On Fri, Nov 19, 2021 at 2:14 AM liuhongt wrote: > > >Why is the above declared as a special memory constraint? Also the > Change to define_memory_constraint since it's ok for > reload can make them match by converting the operand to the form > ‘(mem (reg X))’.where X is a base register (from the r

Re: [PATCH v3] x86: Add -mindirect-branch-cs-prefix

2021-11-18 Thread Uros Bizjak via Gcc-patches
On Thu, Nov 18, 2021 at 3:24 PM H.J. Lu wrote: > > On Thu, Nov 18, 2021 at 12:25 AM Uros Bizjak wrote: > > > > On Wed, Nov 17, 2021 at 2:47 PM H.J. Lu wrote: > > > > > > Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk > > > via r8-r15 registers when converting indirect

Re: [PATCH v2] x86: Add -mindirect-branch-cs-prefix

2021-11-18 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 17, 2021 at 2:47 PM H.J. Lu wrote: > > Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk > via r8-r15 registers when converting indirect call and jump to increase > the instruction length to 6, allowing the non-thunk form to be inlined. > > gcc/ > > PR t

Re: [PATCH] i386: Fix wrong codegen for -mrelax-cmpxchg-loop

2021-11-18 Thread Uros Bizjak via Gcc-patches
On Thu, Nov 18, 2021 at 8:37 AM Hongyu Wang wrote: > > Hi Uros, > > For -mrelax-cmpxchg-loop introduced by PR 103069/r12-5265, it would > produce infinite loop. The correct code should be > > .L84: > movl(%rdi), %ecx > movl%eax, %edx > orl %esi, %edx > c
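
For readers unfamiliar with the idea behind a relaxed compare-and-swap loop, here is a rough C11 sketch of an atomic fetch-or that does an early load and compare before attempting the exchange. It is an illustration only, written under the assumption that the loop must refresh its expected value whenever the early load sees a change (otherwise it could spin forever); it is not the code generated by the patch, and `fetch_or_relaxed_loop` is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomic fetch-or built from a compare-and-swap loop with an early load.
   Returns the value that was stored in *p before the OR was applied. */
uint32_t fetch_or_relaxed_loop(_Atomic uint32_t *p, uint32_t mask)
{
    uint32_t expected = atomic_load_explicit(p, memory_order_relaxed);

    for (;;) {
        /* Early load: skip the expensive exchange if the value moved. */
        uint32_t current = atomic_load_explicit(p, memory_order_relaxed);
        if (current != expected) {
            /* Refresh the expected value before retrying; generated code
               would typically execute a pause instruction here. */
            expected = current;
            continue;
        }
        /* On failure, expected is updated to the value currently in *p. */
        if (atomic_compare_exchange_weak(p, &expected, expected | mask))
            return expected;
    }
}
```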

Re: [PATCH] Don't allow mask/sse/mmx mov in TLS code sequences.

2021-11-18 Thread Uros Bizjak via Gcc-patches
On Thu, Nov 18, 2021 at 8:18 AM liuhongt wrote: > > As change in assembler, refer to [1], this patch disallow mask/sse/mmx > mov in TLS code sequences which require integer MOV instructions. > > [1] > https://sourceware.org/git/?p=binutils-gdb.git;a=patch;h=d7e3e627027fcf37d63e284144fe27ff4eba36b

Re: [PATCH v2] x86: Remove "%!" before ret

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 17, 2021 at 9:33 PM H.J. Lu wrote: > > On Wed, Nov 17, 2021 at 11:46 AM Uros Bizjak wrote: > > > > On Wed, Nov 17, 2021 at 8:44 PM H.J. Lu wrote: > > > > > > Before MPX was removed, "%!" was mapped to > > > > > > case '!': > > > if (ix86_bnd_prefixed_insn_p (current

Re: [PATCH v3] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 17, 2021 at 9:02 PM H.J. Lu wrote: > > On Wed, Nov 17, 2021 at 7:53 AM Uros Bizjak wrote: > > > > On Wed, Nov 17, 2021 at 4:35 PM H.J. Lu wrote: > > > > > > Add -mharden-sls= to mitigate against straight line speculation (SLS) > > > for function return and indirect branch by adding a

[PATCH] i386: Redefine indirect_thunks_used as HARD_REG_SET.

2021-11-17 Thread Uros Bizjak via Gcc-patches
Change indirect_thunks_used to HARD_REG_SET to avoid recalculations of correct register numbers and allow usage of SET/TEST_HARD_REG_BIT accessors. 2021-11-17 Uroš Bizjak gcc/ChangeLog: * config/i386/i386.c (indirect_thunks_used): Redefine as HARD_REG_SET. (ix86_code_end): Use TEST_HA
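
The benefit of a register-set type with set/test accessors can be seen in a standalone model. The sketch below is not GCC's HARD_REG_SET implementation; `model_reg_set`, `model_set_bit`, `model_test_bit` and `MODEL_NUM_REGS` are hypothetical names chosen for illustration. The point is that callers index by register number directly instead of recomputing a bit position in a plain integer mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register count for the model; not GCC's actual value. */
#define MODEL_NUM_REGS 128

/* A fixed-size bitset indexed by register number. */
typedef struct {
    uint64_t bits[(MODEL_NUM_REGS + 63) / 64];
} model_reg_set;

/* Mark register regno as used (regno must be < MODEL_NUM_REGS). */
static inline void model_set_bit(model_reg_set *s, unsigned regno)
{
    s->bits[regno / 64] |= UINT64_C(1) << (regno % 64);
}

/* Return true if register regno is marked as used. */
static inline bool model_test_bit(const model_reg_set *s, unsigned regno)
{
    return (s->bits[regno / 64] >> (regno % 64)) & 1;
}
```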

Re: [PATCH] x86: Remove "%!" before ret

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 17, 2021 at 8:44 PM H.J. Lu wrote: > > Before MPX was removed, "%!" was mapped to > > case '!': > if (ix86_bnd_prefixed_insn_p (current_output_insn)) > fputs ("bnd ", file); > return; > > After CET was added and MPX was removed, "%!" was mapped t

[PATCH] i386: Introduce LEGACY_SSE_REGNO_P predicate

2021-11-17 Thread Uros Bizjak via Gcc-patches
Introduce LEGACY_SSE_REGNO_P predicate to simplify a couple of places. No functional changes. 2021-11-17 Uroš Bizjak gcc/ChangeLog: * config/i386/i386.h (LEGACY_SSE_REGNO_P): New predicate. (SSE_REGNO_P): Use LEGACY_SSE_REGNO_P predicate. * config/i386/i386.c (zero_all_vector_reg

Re: [PATCH v2] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 17, 2021 at 4:35 PM H.J. Lu wrote: > > Add -mharden-sls= to mitigate against straight line speculation (SLS) > for function return and indirect branch by adding an INT3 instruction > after function return and indirect branch. > > gcc/ > > PR target/102952 > * config/i38

Re: [PATCH] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 17, 2021 at 2:46 PM H.J. Lu wrote: > > On Wed, Nov 17, 2021 at 1:05 AM Uros Bizjak wrote: > > > > On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches > > wrote: > > > > > > Add -mharden-sls= to mitigate against straight line speculation (SLS) > > > for function return and indirec

Re: [PATCH] x86: Add -mindirect-branch-cs-prefix

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Tue, Nov 16, 2021 at 7:51 PM H.J. Lu via Gcc-patches wrote: > > Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk > via r8-r15 registers when converting indirect call and jump to increase > the instruction length to 6, allowing the non-thunk form to be inlined. > > gcc/

Re: [PATCH] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches wrote: > > Add -mharden-sls= to mitigate against straight line speculation (SLS) > for function return and indirect branch by adding an INT3 instruction > after function return and indirect branch. > > gcc/ > > PR target/102952 >

Re: [PATCH 12/15] i386: Fix non-robust split condition in define_insn_and_split

2021-11-16 Thread Uros Bizjak via Gcc-patches
On Thu, Nov 11, 2021 at 12:25 PM Kewen Lin wrote: > > This patch is to fix some non-robust split conditions in some > define_insn_and_splits, to make each of them applied on top of > the corresponding condition for define_insn part, otherwise the > splitting could perform unexpectedly. > > gcc/Cha

Re: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-16 Thread Uros Bizjak via Gcc-patches
On Tue, Nov 16, 2021 at 9:15 AM Kong, Lingling via Gcc-patches wrote: > > Hi, > > vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with > -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c. > > OK for master? No, this is the wrong approach. There can be i

Re: [PATCH] x86_64: Avoid rorx rotation instructions with -Os

2021-11-16 Thread Uros Bizjak via Gcc-patches
On Mon, Nov 15, 2021 at 2:54 PM Roger Sayle wrote: > > > This patch teaches the i386 backend to avoid using BMI2's rorx > instructions when optimizing for size. The benefits are shown > with the following example: > > unsigned int ror1(unsigned int x) { return (x >> 1) | (x << 31); } > unsigned i
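
The rotation idiom quoted in the message above can be tried in isolation. The following is a minimal sketch: `ror1` follows the quoted example (with `uint32_t` used to make the 32-bit width explicit), while `rotr32` is a hypothetical generic helper that is not part of the patch. Which instruction a compiler selects for these functions depends on the target flags and optimization level, so nothing about the generated code is asserted here.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by one bit, as in the quoted example. */
uint32_t ror1(uint32_t x)
{
    return (x >> 1) | (x << 31);
}

/* Hypothetical generic helper: rotate right by n bits, 0 <= n < 32.
   The "& 31" on the left-shift amount avoids an undefined shift by 32
   when n is 0. */
uint32_t rotr32(uint32_t x, unsigned n)
{
    assert(n < 32);
    return (x >> n) | (x << ((32 - n) & 31));
}
```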

Re: [PATCH] x86: Require TARGET_HIMODE_MATH for HImode atomic bit expanders

2021-11-15 Thread Uros Bizjak via Gcc-patches
On Mon, Nov 15, 2021 at 9:01 AM Jakub Jelinek wrote: > > On Fri, Nov 12, 2021 at 04:34:27PM +0100, Jakub Jelinek via Gcc-patches wrote: > > Why? When one uses 16-bit atomics, no matter what he does there will be > > some HImode math (at least the atomic instruction). And the rest can be > > deal
