On Sat, Jan 15, 2022 at 5:39 PM Hongyu Wang wrote:
>
> Thanks for the suggestion, here is the updated patch that survived
> bootstrap/regtest.
LGTM for me, but please get the final approval from Hongtao.
Thanks,
Uros.
> > Please note reg_mentioned_p in the above condition. This function
> > ret
On Sat, Jan 15, 2022 at 12:23 PM Jakub Jelinek wrote:
>
> On Sat, Jan 15, 2022 at 11:42:55AM +0100, Uros Bizjak wrote:
> > Yes, that would be nice. XFmode is used for long double, and not obsolete.
>
> Ok, that seems to work. Compared to the incremental patch I've posted, I
> also had to add hand
On Sat, Jan 15, 2022 at 10:56 AM Jakub Jelinek wrote:
>
> On Sat, Jan 15, 2022 at 09:29:05AM +0100, Uros Bizjak wrote:
> > > --- gcc/config/i386/i386.md.jj 2022-01-14 11:51:34.432384170 +0100
> > > +++ gcc/config/i386/i386.md 2022-01-14 18:22:41.140906449 +0100
> > > @@ -23886,6 +23886,18 @@
On Fri, Jan 14, 2022 at 11:56 PM Jakub Jelinek wrote:
>
> Hi!
>
> C++20:
> #include <compare>
> auto cmp4way(double a, double b)
> {
> return a <=> b;
> }
> expands to:
> ucomisd %xmm1, %xmm0
> jp .L8
> movl $0, %eax
> jne .L8
> .L2:
> ret
> .p2
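For reference, the four possible outcomes of `a <=> b` on doubles (less, equivalent, greater, unordered) can be mirrored in plain C. This is a minimal sketch: the integer encoding below is an assumption chosen for illustration and is not libstdc++'s internal representation of std::partial_ordering. The unordered case is tested first, which corresponds to the `jp` right after `ucomisd` above, since an unordered compare sets PF.

```c
#include <math.h>

/* Hypothetical encoding of the four outcomes of a <=> b on doubles.  */
enum cmp4 { CMP_LESS = -1, CMP_EQUIV = 0, CMP_GREATER = 1, CMP_UNORDERED = 2 };

/* Four-way comparison, checking the unordered (NaN) case first,
   like the ucomisd/jp sequence quoted above.  */
static enum cmp4 cmp4way(double a, double b)
{
  if (isnan(a) || isnan(b))
    return CMP_UNORDERED;
  if (a == b)
    return CMP_EQUIV;          /* also covers 0.0 vs -0.0 */
  return a < b ? CMP_LESS : CMP_GREATER;
}
```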
On Fri, Jan 14, 2022 at 2:44 PM Hongyu Wang wrote:
>
> > Are there any technical obstacles to introduce subst to
> > define_{,insn_and_}split?
>
> gccint says: define_subst can be used only in define_insn and
> define_expand, it cannot be used in other expressions (e.g. in
> define_insn_and_split)
While it is practically impossible for the input registers to be matched
with the in-out register, it is better to mark the output operand of the
split alternative as earlyclobbered: we write to the output operand early
when the insn is split.
2022-01-14 Uroš Bizjak
gcc/ChangeLog:
* config/i386/i38
The test fails on Fedora 33+ because the nl_NL locale now defines a thousands
separator. Use one of ar_SA, bg_BG, bs_BA, pt_PT
or the plain C locale instead.
2022-01-14 Uroš Bizjak
libstdc++-v3/ChangeLog:
* testsuite/22_locale/numpunct/members/char/3.cc (test02):
Use pt_PT locale instead
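A quick way to see which locales are affected is to query the numeric locale's thousands separator with localeconv(). In the C locale, thousands_sep is an empty string by definition, which is why plain C is always a safe fallback. A minimal sketch; the helper name has_thousands_sep is hypothetical:

```c
#include <locale.h>

/* Returns 1 if locale NAME defines a non-empty thousands separator,
   0 if it defines none, and -1 if NAME is not available.  */
static int has_thousands_sep(const char *name)
{
  if (setlocale(LC_NUMERIC, name) == NULL)
    return -1;

  /* Read the field before the next setlocale call, which may
     invalidate the structure returned by localeconv().  */
  int result = localeconv()->thousands_sep[0] != '\0';

  setlocale(LC_NUMERIC, "C");   /* restore the default numeric locale */
  return result;
}
```

A return value of -1 means the locale is not installed, which a testsuite would have to treat as unsupported rather than as a failure.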
On Fri, Jan 14, 2022 at 10:00 AM Roger Sayle wrote:
>
>
> Hi Uros,
> Here's a revised version of this patch incorporating your suggestion of using
> force_reg instead of emit_move_insn to a pseudo allocated by gen_reg_rtx.
> I also took the opportunity to transition the rest of the function (and
On Fri, Jan 14, 2022 at 7:11 AM Hongyu Wang wrote:
>
> > > No, the approach is wrong. You have to solve output clearing on RTL
> > > level, please look at how e.g. tzcnt false dep is solved:
> >
> > Actually we have considered such approach before, but we found we need
> > to break original define
On Fri, Jan 14, 2022 at 6:46 AM Hongyu Wang wrote:
>
> > No, the approach is wrong. You have to solve output clearing on RTL
> > level, please look at how e.g. tzcnt false dep is solved:
>
> Actually we have considered such approach before, but we found we need
> to break original define_insn to r
Add V2QImode shift operations and split them to synthesized
double HI/LO QImode operations with integer registers.
Also robustify arithmetic split patterns.
2022-01-13 Uroš Bizjak
gcc/ChangeLog:
PR target/103861
* config/i386/i386.md (*ashlqi_ext_2): New insn pattern.
(*qi_ext_2)
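The point of splitting into separate QImode operations is that bits shifted out of one byte lane must not leak into the other, as they would with a single 16-bit shift. A minimal C model, assuming a V2QImode value held in a 16-bit integer with element 0 in the low byte (the helper name ashl_v2qi is hypothetical). On x86 the high byte is directly addressable (%ah and friends), which, as far as I can tell, is what the *qi_ext patterns describe.

```c
#include <stdint.h>

/* Left-shift each byte lane of a two-byte vector independently.
   COUNT is assumed to be in the range 0..7.  */
static uint16_t ashl_v2qi(uint16_t v, unsigned count)
{
  uint8_t lo = (uint8_t)(v & 0xff);   /* element 0: low byte */
  uint8_t hi = (uint8_t)(v >> 8);     /* element 1: high byte */

  /* Bits shifted out of a lane are dropped, not carried over.  */
  lo = (uint8_t)(lo << count);
  hi = (uint8_t)(hi << count);
  return (uint16_t)(lo | (hi << 8));
}
```

For example, with 0x8181 shifted by 1 each lane yields 0x02, giving 0x0202, while a plain 16-bit shift gives 0x0302 because the top bit of the low byte moves into the high byte.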
2022-01-13 Uroš Bizjak
gcc/ChangeLog:
* config/i386/mmx.md (negv2qi): Disparage GPR alternative a bit.
Disable for TARGET_PARTIAL_REG_STALL unless optimizing for size.
(negv2qi splitters): Use lowpart_subreg instead of
gen_lowpart to create subreg.
(v2qi3): Disparage GPR al
2022-01-13 Uroš Bizjak
gcc/ChangeLog:
PR target/104003
* config/i386/mmx.md (*xop_pcmov_): Use VI_16_32 mode iterator.
gcc/testsuite/ChangeLog:
PR target/104003
* g++.target/i386/pr103861-1-sse4.C: New test.
* g++.target/i386/pr103861-1-xop.C: Ditto.
Bootstrapped and reg
On Thu, Jan 13, 2022 at 2:53 AM Jiang, Haochen wrote:
>
> Hi Uros,
>
> Has fixed that format issue with this new patch. Ok for trunk?
The patch was already approved in my previous message, so no need to
re-approve it. I'm sure you are able to move one brace to a new
position without another revie
On Thu, Jan 13, 2022 at 8:28 AM Hongyu Wang wrote:
>
> From: wwwhhhyyy
>
> Hi,
>
> For GoldenCove micro-architecture, force insert zero-idiom in asm
> template to break false dependency of dest register for several insns.
>
> The related insns are:
>
> VPERM/D/Q/PS/PD
> VRANGEPD/PS/SD/SS
> VGETMA
The testcase is x86-specific; other targets have different costs defined.
2022-01-12 Uroš Bizjak
gcc/testsuite/ChangeLog:
PR target/103935
* g++.dg/vect/slp-pr98855.cc: Compile only for x86 targets.
Tested on x86_64-linux-gnu {,-m32}.
Pushed to master.
Uros.
diff --git a/gcc/testsu
On Wed, Jan 12, 2022 at 9:11 AM Haochen Jiang wrote:
>
> Hi all,
>
> This patch targets PR94790, which changes the instruction selection under the
> following circumstance.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
Please also test with -m32, e.g.:
make -j 12 -k check RUNTESTFLAGS="--t
On Tue, Jan 11, 2022 at 2:26 PM Roger Sayle wrote:
>
>
> This patch to the i386 backend's ix86_expand_v1ti_ashiftrt provides
> improved (shorter) implementations of V1TI mode arithmetic right shifts
> for constant amounts between 111 and 126 bits. The significance of
> this range is that this fun
Add CC clobber to 32-bit vector mode logic insns to allow variants with
general-purpose registers. Also improve ix86_sse_movcc to emit insn with
CC clobber for narrow vector modes in order to re-enable conditional moves
for 16-bit and 32-bit narrow vector modes with -msse2.
2022-01-12 Uroš Bizja
This patch also moves V2HI and V4QImode vector conditional moves
to SSE4.1 targets. Vector cmoves are implemented with SSE logic functions
without -msse4.1, and they are hardly worthwhile for narrow vector modes.
More importantly, we would like to keep vector logic functions for GPR
registers, and th
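The fallback that makes vector cmoves unattractive without -msse4.1 is the classic mask select: each mask lane is all ones or all zeros, and the result is (mask & a) | (~mask & b), i.e. an AND, an ANDN and an OR instead of a single blend. A minimal sketch on a four-byte vector packed into a 32-bit integer (the packing and the name select_v4qi are assumptions for illustration):

```c
#include <stdint.h>

/* Take bytes from A where MASK is all ones and from B where MASK is
   all zeros: the AND/ANDN/OR sequence used when no blend exists.  */
static uint32_t select_v4qi(uint32_t mask, uint32_t a, uint32_t b)
{
  return (mask & a) | (~mask & b);
}
```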
Add V2QImode vector compares with SSE registers.
2022-01-10 Uroš Bizjak
gcc/ChangeLog:
PR target/103861
* config/i386/i386-expand.c (ix86_expand_int_sse_cmp):
Handle V2QImode.
* config/i386/mmx.md (3):
Use VI1_16_32 mode iterator.
(*eq3): Ditto.
(*gt3): Ditto.
Currently, expand_vector_condition detects only vcondMN and vconduMN
named RTX patterns. Teach it to also consider vec_cmpMN and vec_cmpuMN
RTX patterns when all ones vector is returned for true and all zeros vector
is returned for false.
Patch by Richard, I tested it on the patched x86 target an
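The equivalence being exploited: a vec_cmp pattern already yields an all-ones lane for true and an all-zeros lane for false, so a vcond whose true and false arms are exactly those constants computes the same result. A lane-by-lane C model (the function names are hypothetical and are not GCC internals):

```c
#include <stdint.h>
#include <string.h>

enum { NLANES = 4 };

/* vec_cmp style: lane i of R is all ones (-1) where A[i] > B[i], else 0.  */
static void vec_cmp_gt(int8_t r[NLANES], const int8_t a[NLANES],
                       const int8_t b[NLANES])
{
  for (int i = 0; i < NLANES; i++)
    r[i] = (int8_t)-(a[i] > b[i]);
}

/* vcond style: lane i of R is T[i] where A[i] > B[i], else F[i].  */
static void vcond_gt(int8_t r[NLANES], const int8_t a[NLANES],
                     const int8_t b[NLANES], const int8_t t[NLANES],
                     const int8_t f[NLANES])
{
  for (int i = 0; i < NLANES; i++)
    r[i] = a[i] > b[i] ? t[i] : f[i];
}

/* Returns 1 if vcond with T = all ones and F = all zeros matches vec_cmp.  */
static int vcond_is_vec_cmp(const int8_t a[NLANES], const int8_t b[NLANES])
{
  static const int8_t ones[NLANES] = { -1, -1, -1, -1 };
  static const int8_t zeros[NLANES] = { 0 };
  int8_t r1[NLANES], r2[NLANES];

  vec_cmp_gt(r1, a, b);
  vcond_gt(r2, a, b, ones, zeros);
  return memcmp(r1, r2, sizeof r1) == 0;
}
```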
On Mon, Jan 10, 2022 at 3:23 PM Michael Matz wrote:
>
> Hello,
>
> On Mon, 20 Dec 2021, Uros Bizjak wrote:
>
> > > Thanks.
> > > I see nobody commented on Micha's post there.
> > >
> > > Here is a patch that implements it in GCC, i.e. C++ doesn't change ABI
> > > (at least
> > > not from the past
On Thu, Jan 6, 2022 at 7:00 PM Roger Sayle wrote:
>
>
> This patch improves the code generated when moving a 128-bit value
> in TImode, represented by two 64-bit registers, to V1TImode, which
> is a single SSE register.
>
> Currently, the simple move:
>
> typedef unsigned __int128 uv1ti
Add sse2 isa attribute where needed and remove where not needed.
2022-01-07 Uroš Bizjak
gcc/ChangeLog:
* config/i386/mmx.md (*move_internal): Add isa attribute.
(*movv2qi_internal): Remove sse2 requirement for alternatives 4,5.
Bootstrapped and regression tested on x86_64-linux-gnu {,
On Thu, Jan 6, 2022 at 7:58 PM H.J. Lu wrote:
>
> Generate INT3 after indirect jmp in exception return for -fcf-protection
> with -mharden-sls=indirect-jmp.
>
> gcc/
>
> PR target/103925
> * config/i386/i386.c (ix86_output_indirect_function_return):
> Generate INT3 after in
On Thu, Jan 6, 2022 at 7:57 PM H.J. Lu wrote:
>
> Indirect branch also includes indirect call instructions. Rename
> -mharden-sls=indirect-branch to -mharden-sls=indirect-jmp to match its
> intended behavior.
>
> PR target/102952
> * config/i386/i386-opts.h (harden_sls): Replace
>
Currently, the compiler moves HImode values between GPR and XMM registers with:
%vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0}
%vpextrw\t{$0, %1, %k0|%k0, %1, 0}
but it could use slightly faster and shorter:
%vmovd\t{%k1, %0|%0, %k1}
%vmovd\t{%1, %k0|%k0, %1}
2022-01-06 Uroš Bizjak
gc
On Thu, Jan 6, 2022 at 10:22 AM liuhongt via Gcc-patches
wrote:
>
> Also remove mode attribute blendsuf, use ssemodesuf instead.
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ready to push to trunk.
>
> gcc/ChangeLog:
>
> PR target/103753
> * config/i386/i386-expand
Add V2QImode minmax, abs and uavgv2qi3_ceil operations with SSE registers.
2022-01-05 Uroš Bizjak
gcc/ChangeLog:
PR target/103861
* config/i386/mmx.md (VI_16_32): New mode iterator.
(VI1_16_32): Ditto.
(mmxvecsize): Handle V2QI mode.
(3): Rename from v4qi3.
Use VI1_16_
2022-01-05 Uroš Bizjak
gcc/ChangeLog:
PR target/103915
* config/i386/mmx.md (one_cmplv2qi2): Change
alternatives 1,2 type from sselog to sselog1.
gcc/testsuite/ChangeLog:
PR target/103915
* gcc.target/i386/pr103915.c: New test.
Bootstrapped and regression tested on x86_
2022-01-05 Uroš Bizjak
gcc/ChangeLog:
PR target/103905
* config/i386/i386-expand.c (expand_vec_perm_pshufb): Fix number of
narrow mode remapped elements for !one_operand_p case.
gcc/testsuite/ChangeLog:
PR target/103905
* gcc.target/i386/pr103905.c: New test.
Bootstrappe
Add V2QImode logic operations with SSE and GP registers and split
them to V4QImode SSE instructions or SImode GP instructions.
The patch also fixes PR target/103900.
2022-01-04 Uroš Bizjak
gcc/ChangeLog:
PR target/103861
* config/i386/mmx.md (one_cmplv2qi3): New insn pattern.
(on
On Tue, Jan 4, 2022 at 6:20 AM Cui,Lili wrote:
>
> Hi Uros,
>
> This patch is to update model value for Alderlake and Rocketlake.
>
> Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
>
> OK for master?
>
> gcc/ChangeLog
>
> * common/config/i386/cpuinfo.h (get_intel_cpu): Add
The middle end tries to generate V4QImode moves to implement V2QImode inserts
and calls emit_move_multi_word when V4QImode moves are unavailable, as is
the case with 32-bit vector moves, constrained with TARGET_SSE2.
However, this triggers
gcc_assert (mode_size >= UNITS_PER_WORD);
in emit_move_mu
On Thu, Dec 30, 2021 at 3:45 PM Uros Bizjak wrote:
>
> This patch adds basic V2QImode infrastructure and V2QImode arithmetic
> operations (plus, minus and neg). The patched compiler can emit SSE
> vectorized QImode operations (e.g. PADDB) with partial QImode vector,
> and also synthesized double
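Lane-wise arithmetic on a V2QImode value in a GPR has the same concern as lane-wise shifts: a carry out of the low byte must not propagate into the high byte, which is why the integer-register variant is synthesized as separate HI/LO QImode operations. A minimal C model, assuming element 0 in the low byte (helper names are hypothetical):

```c
#include <stdint.h>

/* Lane-wise add: each byte wraps on its own; the low byte's carry is
   discarded instead of flowing into the high byte.  */
static uint16_t add_v2qi(uint16_t a, uint16_t b)
{
  uint8_t lo = (uint8_t)((a & 0xff) + (b & 0xff));
  uint8_t hi = (uint8_t)((a >> 8) + (b >> 8));
  return (uint16_t)(lo | (hi << 8));
}

/* Lane-wise negation: two independent 8-bit two's-complement negations.  */
static uint16_t neg_v2qi(uint16_t a)
{
  uint8_t lo = (uint8_t)-(a & 0xff);
  uint8_t hi = (uint8_t)-(a >> 8);
  return (uint16_t)(lo | (hi << 8));
}
```

For example, adding 0x0101 to 0x01ff gives 0x0200 lane-wise, while a plain 16-bit add gives 0x0300.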
... for targets that support vectorization of 2-byte char stores
with unaligned address at plain O2.
2021-12-31 Uroš Bizjak
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_vect_slp_store_usage):
Handle TEST_V2QI_2.
(check_effective_target_vect_slp_v2qi_store_unalign): Ne
This patch adds basic V2QImode infrastructure and V2QImode arithmetic
operations (plus, minus and neg). The patched compiler can emit SSE
vectorized QImode operations (e.g. PADDB) with partial QImode vector,
and also synthesized double HI/LO QImode operations with integer registers.
The testcase:
lowpart_subreg might fail in some cases when trying to create paradoxical
SUBREGs. Use force_reg on input operand, use new temporary output operand
and emit move into the destination afterwards.
Also, replace simplify_gen_subreg (Mx, op, My, 0)
with equivalent lowpart_subreg (Mx, op, My).
2021-1
On Sat, Dec 25, 2021 at 3:28 PM H.J. Lu wrote:
>
> When applying peephole optimization to transform
>
> mov imm, %reg0
> mov %reg1, %AX_REG
> imul %reg0
>
> to
>
> mov imm, %AX_REG
> imul %reg1
>
> disable peephole optimization if reg1 == AX_REG.
>
> gcc/
>
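Why reg1 == AX_REG must be excluded can be checked with a toy register-file model. When reg1 is AX, the original `mov %reg1, %AX` is a no-op, but the rewritten sequence overwrites AX with the immediate before reading it, so the product becomes imm * imm. A minimal sketch; only the low 32 bits of the product are modeled (real one-operand imul also writes the high half to DX), and all names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

enum { AX = 0, NREGS = 4 };   /* toy register file; index 0 plays AX_REG */

/* mov imm, %reg0 ; mov %reg1, %AX ; imul %reg0  (low half kept in AX) */
static uint32_t original_seq(uint32_t r[NREGS], int reg0, int reg1, uint32_t imm)
{
  r[reg0] = imm;
  r[AX] = r[reg1];
  r[AX] = r[AX] * r[reg0];
  return r[AX];
}

/* mov imm, %AX ; imul %reg1  (the peephole output) */
static uint32_t transformed_seq(uint32_t r[NREGS], int reg1, uint32_t imm)
{
  r[AX] = imm;
  r[AX] = r[AX] * r[reg1];
  return r[AX];
}

/* Run both sequences on copies of INIT and report whether AX agrees.  */
static int sequences_agree(const uint32_t init[NREGS], int reg0, int reg1,
                           uint32_t imm)
{
  uint32_t a[NREGS], b[NREGS];

  memcpy(a, init, sizeof a);
  memcpy(b, init, sizeof b);
  return original_seq(a, reg0, reg1, imm) == transformed_seq(b, reg1, imm);
}
```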
Use V4SFmode "DIVPS X,Y" with [y0, y1, 1.0f, 1.0f] as a divisor
to avoid division by zero.
2021-12-24 Uroš Bizjak
gcc/ChangeLog:
PR target/95046
PR target/103797
* config/i386/mmx.md (divv2sf3): New instruction pattern.
gcc/testsuite/ChangeLog:
PR target/95046
PR target/
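In plain terms: DIVPS always divides four lanes, so the two unused upper lanes of the divisor are filled with 1.0f and can never raise a spurious division by zero or operate on garbage. A scalar C model of the lane layout, with a loop standing in for DIVPS (the type and function names are hypothetical):

```c
/* Two-lane float vector (V2SFmode), modeled as a struct.  */
typedef struct { float v[2]; } v2sf;

static v2sf divv2sf_via_v4sf(v2sf x, v2sf y)
{
  float num[4] = { x.v[0], x.v[1], 0.0f, 0.0f };  /* upper lanes: don't care, zero here */
  float den[4] = { y.v[0], y.v[1], 1.0f, 1.0f };  /* divisor [y0, y1, 1.0f, 1.0f] */
  float q[4];

  for (int i = 0; i < 4; i++)   /* stands in for one DIVPS */
    q[i] = num[i] / den[i];

  return (v2sf){ { q[0], q[1] } };
}
```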
On Thu, Dec 23, 2021 at 3:42 PM H.J. Lu wrote:
>
> On Mon, Dec 20, 2021 at 2:22 PM H.J. Lu wrote:
> >
> > On Mon, Dec 20, 2021 at 12:38 PM Jakub Jelinek wrote:
> > >
> > > On Mon, Dec 20, 2021 at 11:44:08AM -0800, H.J. Lu wrote:
> > > > The problem is in
> > > >
> > > > (define_memory_constraint
On Thu, Dec 23, 2021 at 11:21 PM H.J. Lu via Gcc-patches
wrote:
>
> Restore i686 bootstrap by requiring TARGET_64BIT for any_mul_highpart
> peephole.
>
> PR bootstrap/103785
> * config/i386/i386.md: Require TARGET_64BIT for any_mul_highpart
> peephole.
I don't think this i
On Thu, Dec 23, 2021 at 10:35 AM Roger Sayle wrote:
>
> Hi Uros,
>
> A huge thanks for the list of suggested improvements to the -Oz related
> patches. I've combined them all together in the submission below, which
> makes sense now that everything is implemented using peephole2. The implementat
On Wed, Dec 22, 2021 at 11:26 AM Uros Bizjak wrote:
>
> On Wed, Dec 22, 2021 at 10:26 AM Roger Sayle
> wrote:
> >
> >
> > Hi Uros,
> > Would you consider the following variant that disables this optimization
> > when a red zone is used by the current function? You're right that cfun's
> >
On Wed, Dec 22, 2021 at 10:26 AM Roger Sayle wrote:
>
>
> Hi Uros,
> Would you consider the following variant that disables this optimization when
> a red zone is used by the current function? You're right that cfun's
> red_zone_size is recalculated dynamically, but ix86_red_zone_used shoul
On Wed, Dec 22, 2021 at 9:10 AM Uros Bizjak wrote:
>
> On Tue, Dec 21, 2021 at 1:27 PM Roger Sayle
> wrote:
> >
> >
> > My apologies for the inconvenience. The new support for -Oz using
> > push/pop for small integer constants on x86_64 is only a win/correct
> > for loading registers. Fixed by
On Tue, Dec 21, 2021 at 1:27 PM Roger Sayle wrote:
>
>
> My apologies for the inconvenience. The new support for -Oz using
> push/pop for small integer constants on x86_64 is only a win/correct
> for loading registers. Fixed by adding !MEM_P tests in the appropriate
> locations.
>
> This patch h
On Tue, Dec 21, 2021 at 4:08 PM Roger Sayle wrote:
>
>
> This is the second part of my fix to PR target/103773 where -Oz shouldn't
> use push/pop on x86 to shrink writing small integer constants to memory.
> Instead clang uses "andl $0, mem" for writing zero, and "orl $-1, mem"
> when writing -1 t
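These read-modify-write forms can replace a store because the result does not depend on the old memory contents: x & 0 == 0 and x | -1 == -1 for every x. The size win comes from the immediate: as far as I know, a 32-bit MOV to memory only has a 32-bit immediate form, while AND and OR accept a sign-extended 8-bit immediate. The costs are an extra memory read and a flags clobber. A trivial C check of the identities (function names are hypothetical):

```c
#include <stdint.h>

/* Value left in memory by "andl $0, mem", given its old contents.  */
static uint32_t after_andl_zero(uint32_t old) { return old & 0u; }

/* Value left in memory by "orl $-1, mem", given its old contents.  */
static uint32_t after_orl_minus1(uint32_t old) { return old | (uint32_t)-1; }
```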
On Tue, Dec 21, 2021 at 6:22 AM Haochen Jiang wrote:
>
> Hi all,
>
> This patch adds the missing BMI functions _tzcnt_u16, _andn_u32, _andn_u64 to
> align with clang.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
> * config/i386/bmiintrin.h (_tzcnt
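For readers unfamiliar with these intrinsics, their documented semantics can be written in portable C: _andn_u32/_andn_u64 compute ~a & b, and _tzcnt_u16 counts trailing zero bits, returning the operand width (16) for a zero input. A minimal sketch; the ref_* names are hypothetical reference helpers, not part of bmiintrin.h:

```c
#include <stdint.h>

/* ANDN: complement the first operand, then AND with the second.  */
static uint32_t ref_andn_u32(uint32_t a, uint32_t b) { return ~a & b; }
static uint64_t ref_andn_u64(uint64_t a, uint64_t b) { return ~a & b; }

/* TZCNT, 16-bit: count trailing zero bits; a zero input yields the
   operand width, 16 (unlike BSF, whose result is undefined there).  */
static uint16_t ref_tzcnt_u16(uint16_t x)
{
  uint16_t n = 0;

  if (x == 0)
    return 16;
  while ((x & 1u) == 0) {
    x >>= 1;
    n++;
  }
  return n;
}
```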
The clever trick to duplicate the value of the input operand into itself
proved not so clever after all. The splitter should not clobber the input
operand in any case, since the register can hold the value outside the HImode
lowpart when accessed as subreg. Use the standard earlyclobber approach
On Mon, Dec 20, 2021 at 12:26 PM Roger Sayle wrote:
>
>
> Hi Uros,
> Many thanks for the review. Here's a revised patch incorporating your
> suggestion to use a single define_insn with a mode iterator instead of two
> new near identical define_insns for SImode and DImode. I initially tried
> SWI
On Wed, Dec 15, 2021 at 3:50 PM Jakub Jelinek wrote:
>
> On Mon, Nov 29, 2021 at 05:25:30AM -0700, H.J. Lu wrote:
> > > I'd like to ping this patch, but perhaps first it would be nice to discuss
> > > it in the x86-64 psABI group.
> > > The current psABI doesn't seem to mention zero sized bitfield
On Sun, Dec 19, 2021 at 9:06 PM H.J. Lu wrote:
>
> x86_64_general_operand is different from general_operand for 64-bit
> target. To avoid LRA selecting a memory operand which doesn't satisfy
> x86_64_general_operand for 64-bit target:
>
> 1. Add a 'BM' constraint which is similar to the 'm' const
Enable VxHF vector modes for SSE2, AVX and AVX512F ABIs.
2021-12-16 Uroš Bizjak
gcc/ChangeLog:
PR target/103571
* config/i386/i386.h (VALID_AVX256_REG_MODE): Add V16HFmode.
(VALID_AVX256_REG_OR_OI_VHF_MODE): Replace with ...
(VALID_AVX256_REG_OR_OI_MODE): ... this. Remove V16
On Wed, Dec 15, 2021 at 10:23 AM Jakub Jelinek wrote:
>
> On Wed, Jan 27, 2021 at 12:27:13PM +0100, Ulrich Drepper via Gcc-patches
> wrote:
> > On 1/27/21 11:37 AM, Jakub Jelinek wrote:
> > > Would equality comparison against 0 handle the most common cases?
> > >
> > > The user can write it as
>
On Mon, Dec 13, 2021 at 3:10 PM Roger Sayle wrote:
>
>
> A common idiom is to create a DImode value from the "concat" of two SImode
> values, using "(long long)hi << 32 | (long long)lo", where the operation
> may be ior, xor or plus. On x86, with -m32, the high and low parts of
> a DImode registe
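Why ior, xor and plus are interchangeable here: when both halves are unsigned 32-bit values, the zero-extended low half occupies only bits 0..31 and the shifted high half only bits 32..63, so no bit positions overlap and no carries occur. A minimal sketch under that assumption (function names are hypothetical):

```c
#include <stdint.h>

/* Three spellings of the same 64-bit concat of two unsigned halves.  */
static uint64_t concat_ior(uint32_t hi, uint32_t lo)  { return (uint64_t)hi << 32 | (uint64_t)lo; }
static uint64_t concat_xor(uint32_t hi, uint32_t lo)  { return (uint64_t)hi << 32 ^ (uint64_t)lo; }
static uint64_t concat_plus(uint32_t hi, uint32_t lo) { return ((uint64_t)hi << 32) + (uint64_t)lo; }
```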
This is a preparation patch that moves VxHF vector set/insert/extract
expansions from AVX512FP16 ABI to lower ABIs. There are no functional
changes for -mavx512fp16 and a follow-up patch is needed to actually
enable VxHF vector modes for lower ABIs.
2021-12-14 Uroš Bizjak
gcc/ChangeLog:
On Mon, Dec 13, 2021 at 1:09 PM Roger Sayle wrote:
>
>
> I'll post my proposed fix for PR target/103611 shortly, but this patch
> fixes another missed optimization opportunity revealed by that PR.
> Occasionally, reload materializes integer constants during register
> allocation sometimes resultin
On Fri, Dec 10, 2021 at 12:58 PM Roger Sayle wrote:
>
>
> While working on a middle-end patch to more aggressively use highpart
> multiplications on targets that support them, I noticed that the RTL
> expanded by the x86 backend interacts poorly with register allocation
> leading to suboptimal cod
On Thu, Dec 9, 2021 at 7:59 AM Cui,Lili wrote:
>
> Hi Uros,
>
> This patch is to update mtune for tremont.
>
> Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
>
> OK for master?
OK.
Thanks,
Uros.
>
>
> Silvermont has special handling in the add_stmt_cost function, because it has in
>
On Tue, Dec 7, 2021 at 3:10 AM Haochen Jiang via Gcc-patches
wrote:
>
> This patch adds combine splitter to transform vpcmpeqd/vpxor/vblendvps to
> vblendvps for ~op0.
>
> OK for trunk?
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
> PR target/100738
> * config/i386/sse.md
> (*_blendv_
On Tue, Dec 7, 2021 at 2:15 PM H.J. Lu wrote:
>
> gcc/
>
> PR target/103594
> * config/i386/i386.c (ix86_call_use_plt_p): Check FUNCTION_DECL
> before calling cgraph_node::get.
>
> gcc/testsuite/
>
> PR target/103594
> * gcc.dg/pr103594.c: New test.
OK.
Th
On Mon, Dec 6, 2021 at 4:41 AM liuhongt via Gcc-patches
wrote:
>
> When moves between integer and sse registers are cheap.
>
> 2021-12-06 Hongtao Liu
> Uroš Bizjak
> gcc/ChangeLog:
>
> PR target/95740
> * config/i386/i386.c (ix86_preferred_reload_class): Allow
>
On Fri, Dec 3, 2021 at 2:24 PM H.J. Lu wrote:
>
> On Thu, Nov 25, 2021 at 2:47 PM H.J. Lu wrote:
> >
> > Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move
> > and store, independent of -mprefer-vector-width=bits:
> >
> > 1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX
On Fri, Dec 3, 2021 at 7:19 AM liuhongt wrote:
>
> Hi:
> > Please also consider TARGET_INTER_UNIT_MOVES_TO_VEC and
> > TARGET_INTER_UNIT_MOVES_FROM_VEC.
> Here's updated patch.
>
> Also honor TARGET_INTER_UNIT_MOVES_TO/FROM_VEC and in
> preferred_{,out_}reload_class.
>
> Bootstrapped and regtested
On Thu, Dec 2, 2021 at 9:36 AM Hongtao Liu wrote:
>
> On Thu, Dec 2, 2021 at 4:27 PM liuhongt wrote:
> >
> > The patch helps reload to choose GENERAL_REGS alternatives for
> > SSE_FLOAT_MODE and enabled optimization like
> >
> > - vmovd %xmm0, -4(%rsp)
> > - movl $1, %eax
> >
Introduce vec_set_0 pattern for V8HI and V8HF modes to implement scalar
element 0 inserts from a GP register, SSE register or memory. Also
add V8HI and V8HF AVX2 (x,x,x) alternative to PINSR insn pattern, which is
split after reload to a sequence of PBROADCASTW and PBLENDW.
The V8HF inserts fr
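A lane-level model of the AVX2 split for the (x,x,x) alternative: broadcast the scalar to all eight 16-bit lanes, then blend only lane 0 of the broadcast into the original vector. A minimal sketch; the struct type, the function name and the blend-mask value in this model are hypothetical, and the two loops stand in for PBROADCASTW and PBLENDW:

```c
#include <stdint.h>

typedef struct { uint16_t e[8]; } v8hi;   /* hypothetical V8HImode model */

static v8hi vec_set_0_v8hi(v8hi vec, uint16_t scalar)
{
  v8hi bcast, out;
  const unsigned blend_mask = 0x01;   /* bit i set: lane i taken from bcast */

  for (int i = 0; i < 8; i++)         /* PBROADCASTW */
    bcast.e[i] = scalar;
  for (int i = 0; i < 8; i++)         /* PBLENDW */
    out.e[i] = ((blend_mask >> i) & 1u) ? bcast.e[i] : vec.e[i];
  return out;
}
```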
On Mon, Nov 22, 2021 at 10:36 AM Jakub Jelinek wrote:
>
> Hi!
>
> The target attribute handling is very expensive and for the common case
> from x86intrin.h where many functions get implicitly the same target
> attribute, we can speed up compilation a lot by caching it.
>
> The following patches b
On Tue, Nov 30, 2021 at 10:43 AM liuhongt wrote:
>
> ix86_attr_length_immediate_default assumes TYPE ishift only has 1
> constant operand,
> but *x86_64_shld_1/*x86_shld_1/*x86_64_shrd_1/*x86_shrd_1 has 2, with
> condition: INTVAL (operands[3]) == 32 - INTVAL (operands[2]) or
> INTVAL (operands[3]
On Mon, Nov 29, 2021 at 10:48 AM Hongtao Liu wrote:
>
> On Mon, Nov 29, 2021 at 3:53 PM Uros Bizjak wrote:
> >
> > On Mon, Nov 29, 2021 at 2:32 AM liuhongt wrote:
> > >
> > > There are several failures reported in [1]:
> > > 1. unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)"
>
On Mon, Nov 29, 2021 at 7:14 PM Roger Sayle wrote:
>
>
> Hi Uros,
> Many thanks for the review. Here's a revised version of the patch
> incorporating all of your suggestions. This has been (re)tested on
> x86_64-pc-linux-gnu with make bootstrap and make -k check,
> both with and without --target
On Mon, Nov 29, 2021 at 8:46 AM liuhongt wrote:
>
> As discussed in the PR, this patch does the following optimizations:
> 1. No memory is needed to move HI/HFmode between GPR and SSE registers
> under TARGET_SSE2 and above, pinsrw/pextrw are used for them w/o
> AVX512FP16.
> 2. Use gen_sse2_pinsrph/gen_vec_setv4sf_0
On Mon, Nov 29, 2021 at 2:32 AM liuhongt wrote:
>
> There are several failures reported in [1]:
> 1. unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)"
> %vpextrw should be used in output templates.
> 2. ICE in get_attr_memory for movhi_internal since some alternatives
> are marked
On Sun, Nov 28, 2021 at 2:25 PM Roger Sayle wrote:
>
>
> This patch addresses PR target/100711 by introducing define_split
> patterns so that not/broadcast/pand may be simplified (by combine)
> to broadcast/pandn. This introduces two splitters one for optimizing
> pandn on TARGET_SSE for V4SI and
On Sun, Nov 28, 2021 at 3:02 PM Roger Sayle wrote:
>
>
> This patch builds on the recent improvements to TImode rotations (and
> Jakub's fixes to shldq/shrdq patterns). Now that expanding a TImode
> rotation can never fail, it is safe to allow general_operand constraints
> on the QImode shift amo
On Sat, Nov 27, 2021 at 10:04 AM Jakub Jelinek wrote:
>
> Hi!
>
> The following testcase is miscompiled because the x86_{,64_}sh{l,r}d
> patterns don't properly describe what the instructions do. One thing
> is left out, in particular that there is initial count &= 63 for
> sh{l,r}dq and initial
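The missing piece of semantics, written as a C reference model of the 64-bit SHLD described above: the count is masked with 63 first, and a masked count of zero leaves the destination unchanged (which also sidesteps C's undefined shift by 64). A minimal sketch; ref_shldq is a hypothetical name and flags are not modeled:

```c
#include <stdint.h>

static uint64_t ref_shldq(uint64_t dst, uint64_t src, unsigned count)
{
  count &= 63;                 /* the count is masked to 6 bits first */
  if (count == 0)
    return dst;                /* no-op; also avoids a shift by 64 in C */
  /* Shift DST left and fill the vacated low bits from the top of SRC.  */
  return (dst << count) | (src >> (64 - count));
}
```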
On Wed, Nov 24, 2021 at 9:44 AM Kong, Lingling wrote:
>
> Hi,
>
> vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with
> -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c.
> Cleared before conversion, updated movhi_internal and
> ix86_can_change_mode_c
On Wed, Nov 24, 2021 at 9:06 AM Kong, Lingling wrote:
>
> Hi Uros,
>
> > BTW: When playing with my patch, I introduced (define_insn
> > "*vec_set_0" ...) to optimize scalar load to a vector. Does
> > ix86_expand_vector_set work OK without this pattern?
>
> Yes, ix86_expand_vector_set could work
On Wed, Nov 24, 2021 at 7:25 AM Kong, Lingling via Gcc-patches
wrote:
>
> Hi,
>
> vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with
> -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c.
> And cleared before conversion, updated movhi_internal and
> ix
On Sat, Nov 20, 2021 at 9:20 AM Jakub Jelinek wrote:
>
> Hi!
>
> As shown in the testcase below, if a function has multiple target attributes
> (rather than a single one with one or more arguments) or if a function
> gets one target attribute on one declaration and another one on another
> declara
On Fri, Nov 19, 2021 at 8:50 AM Uros Bizjak wrote:
>
> On Fri, Nov 19, 2021 at 2:14 AM liuhongt wrote:
> >
> > >Why is the above declared as a special memory constraint? Also the
> > Changed to define_memory_constraint since reload can make them match
> > by converting the operand to t
On Fri, Nov 19, 2021 at 2:14 AM liuhongt wrote:
>
> >Why is the above declared as a special memory constraint? Also the
> Changed to define_memory_constraint since reload can make them match
> by converting the operand to the form
> ‘(mem (reg X))’, where X is a base register (from the r
On Thu, Nov 18, 2021 at 3:24 PM H.J. Lu wrote:
>
> On Thu, Nov 18, 2021 at 12:25 AM Uros Bizjak wrote:
> >
> > On Wed, Nov 17, 2021 at 2:47 PM H.J. Lu wrote:
> > >
> > > Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk
> > > via r8-r15 registers when converting indirect
On Wed, Nov 17, 2021 at 2:47 PM H.J. Lu wrote:
>
> Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk
> via r8-r15 registers when converting indirect call and jump to increase
> the instruction length to 6, allowing the non-thunk form to be inlined.
>
> gcc/
>
> PR t
On Thu, Nov 18, 2021 at 8:37 AM Hongyu Wang wrote:
>
> Hi Uros,
>
> For -mrelax-cmpxchg-loop introduced by PR 103069/r12-5265, it would
> produce infinite loop. The correct code should be
>
> .L84:
> movl (%rdi), %ecx
> movl %eax, %edx
> orl %esi, %edx
> c
On Thu, Nov 18, 2021 at 8:18 AM liuhongt wrote:
>
> Following a change in the assembler (see [1]), this patch disallows mask/sse/mmx
> mov in TLS code sequences which require integer MOV instructions.
>
> [1]
> https://sourceware.org/git/?p=binutils-gdb.git;a=patch;h=d7e3e627027fcf37d63e284144fe27ff4eba36b
On Wed, Nov 17, 2021 at 9:33 PM H.J. Lu wrote:
>
> On Wed, Nov 17, 2021 at 11:46 AM Uros Bizjak wrote:
> >
> > On Wed, Nov 17, 2021 at 8:44 PM H.J. Lu wrote:
> > >
> > > Before MPX was removed, "%!" was mapped to
> > >
> > > case '!':
> > > if (ix86_bnd_prefixed_insn_p (current
On Wed, Nov 17, 2021 at 9:02 PM H.J. Lu wrote:
>
> On Wed, Nov 17, 2021 at 7:53 AM Uros Bizjak wrote:
> >
> > On Wed, Nov 17, 2021 at 4:35 PM H.J. Lu wrote:
> > >
> > > Add -mharden-sls= to mitigate against straight line speculation (SLS)
> > > for function return and indirect branch by adding a
Change indirect_thunks_used to HARD_REG_SET to avoid recalculations
of correct register numbers and allow usage of SET/TEST_HARD_REG_BIT
accessors.
2021-11-17 Uroš Bizjak
gcc/ChangeLog:
* config/i386/i386.c (indirect_thunks_used): Redefine as HARD_REG_SET.
(ix86_code_end): Use TEST_HA
On Wed, Nov 17, 2021 at 8:44 PM H.J. Lu wrote:
>
> Before MPX was removed, "%!" was mapped to
>
> case '!':
> if (ix86_bnd_prefixed_insn_p (current_output_insn))
> fputs ("bnd ", file);
> return;
>
> After CET was added and MPX was removed, "%!" was mapped t
Introduce LEGACY_SSE_REGNO_P predicate to simplify a couple of places.
No functional changes.
2021-11-17 Uroš Bizjak
gcc/ChangeLog:
* config/i386/i386.h (LEGACY_SSE_REGNO_P): New predicate.
(SSE_REGNO_P): Use LEGACY_SSE_REGNO_P predicate.
* config/i386/i386.c (zero_all_vector_reg
On Wed, Nov 17, 2021 at 4:35 PM H.J. Lu wrote:
>
> Add -mharden-sls= to mitigate against straight line speculation (SLS)
> for function return and indirect branch by adding an INT3 instruction
> after function return and indirect branch.
>
> gcc/
>
> PR target/102952
> * config/i38
On Wed, Nov 17, 2021 at 2:46 PM H.J. Lu wrote:
>
> On Wed, Nov 17, 2021 at 1:05 AM Uros Bizjak wrote:
> >
> > On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches
> > wrote:
> > >
> > > Add -mharden-sls= to mitigate against straight line speculation (SLS)
> > > for function return and indirec
On Tue, Nov 16, 2021 at 7:51 PM H.J. Lu via Gcc-patches
wrote:
>
> Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk
> via r8-r15 registers when converting indirect call and jump to increase
> the instruction length to 6, allowing the non-thunk form to be inlined.
>
> gcc/
On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches
wrote:
>
> Add -mharden-sls= to mitigate against straight line speculation (SLS)
> for function return and indirect branch by adding an INT3 instruction
> after function return and indirect branch.
>
> gcc/
>
> PR target/102952
>
On Thu, Nov 11, 2021 at 12:25 PM Kewen Lin wrote:
>
> This patch fixes some non-robust split conditions in some
> define_insn_and_splits, so that each of them is applied on top of
> the corresponding condition for the define_insn part; otherwise the
> splitting could happen unexpectedly.
>
> gcc/Cha
On Tue, Nov 16, 2021 at 9:15 AM Kong, Lingling via Gcc-patches
wrote:
>
> Hi,
>
> vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with
> -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c.
>
> OK for master?
No, this is the wrong approach. There can be i
On Mon, Nov 15, 2021 at 2:54 PM Roger Sayle wrote:
>
>
> This patch teaches the i386 backend to avoid using BMI2's rorx
> instructions when optimizing for size. The benefits are shown
> with the following example:
>
> unsigned int ror1(unsigned int x) { return (x >> 1) | (x << 31); }
> unsigned i
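ror1 above is the fixed-count case. For a variable count, a common rotate idiom that avoids undefined behaviour masks both shift counts, so a count of 0 never turns into an out-of-range shift by 32; compilers commonly recognize this form as a rotate. A minimal sketch (ror32 is a hypothetical name):

```c
#include <stdint.h>

/* Rotate right by N; masking keeps both shift counts in 0..31.  */
static uint32_t ror32(uint32_t x, unsigned n)
{
  return (x >> (n & 31)) | (x << (-n & 31));
}
```

With n == 1 this computes the same value as ror1 in the quoted example.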
On Mon, Nov 15, 2021 at 9:01 AM Jakub Jelinek wrote:
>
> On Fri, Nov 12, 2021 at 04:34:27PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > Why? When one uses 16-bit atomics, no matter what he does there will be
> > some HImode math (at least the atomic instruction). And the rest can be
> > deal