On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang wrote:
>
> From: Kong Lingling
>
> Legacy adc patterns are commonly adopted to TImode add, when extending TImode
> add to NDD version, operands[0] and operands[1] can be different, so extra
> move should be emitted if those patterns have optimization
On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang wrote:
>
> From: Kong Lingling
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (addsi_1_zext): Add new alternatives for
> NDD and adjust output templates.
> (*add_2): Likewise.
> (*addsi_2_zext): Likewise.
> (*add_3)
On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang wrote:
>
> Under APX NDD, previous TImode allocation will have issue that it was
> originally allocated using continuous pair, like rax:rdi, rdi:rdx.
>
> This will cause issue for all TImode NDD patterns. For NDD we will not
> assume the arithmetic operat
On Wed, Nov 29, 2023 at 1:25 PM Richard Biener
wrote:
>
> On Wed, Nov 29, 2023 at 10:35 AM Uros Bizjak wrote:
> >
> > The compiler, configured with --enable-checking=yes,rtl,extra ICEs with:
> >
> > internal compiler error: RTL check: expected elt 0 type 'e
On Mon, Dec 4, 2023 at 8:11 AM Hongtao Liu wrote:
>
> On Fri, Dec 1, 2023 at 10:26 PM Richard Biener
> wrote:
> >
> > On Fri, Dec 1, 2023 at 3:39 AM liuhongt wrote:
> > >
> > > > Hmm, I would suggest you put reg_needed into the class and accumulate
> > > > over all vec_construct, with your patch
On Mon, Dec 4, 2023 at 8:41 AM Jakub Jelinek wrote:
>
> Hi!
>
> The following testcase ICEs with RTL checking, because it sets if
> XINT (SET_SRC (set), 1) is UNSPEC_SET_GOT without checking if SET_SRC (set)
> is actually an UNSPEC, so any time we see any other insn with PARALLEL
> and a SET in it
On Mon, Dec 4, 2023 at 8:35 AM Jakub Jelinek wrote:
>
> Hi!
>
> The following testcase ICEs, because the signbit2 expander uses an
> explicit SUBREG in the pattern around match_operand with register_operand
> predicate. If we are unlucky enough that expansion tries to expand it
> with some SUBREG
On Thu, Nov 30, 2023 at 9:21 AM Segher Boessenkool
wrote:
>
> Hi!
>
> On Wed, Nov 29, 2023 at 02:20:03PM +0100, Uros Bizjak wrote:
> > On Wed, Nov 29, 2023 at 1:25 PM Richard Biener
> > wrote:
> > > On Wed, Nov 29, 2023 at 10:35 AM Uros Bizjak wrote:
> > I
On Wed, Nov 29, 2023 at 1:25 PM Richard Biener
wrote:
>
> On Wed, Nov 29, 2023 at 10:35 AM Uros Bizjak wrote:
> >
> > The compiler, configured with --enable-checking=yes,rtl,extra ICEs with:
> >
> > internal compiler error: RTL check: expected elt 0 type 'e
The compiler, configured with --enable-checking=yes,rtl,extra ICEs with:
internal compiler error: RTL check: expected elt 0 type 'e' or 'u',
have 'E' (rtx unspec) in try_combine, at combine.cc:3237
This is
3236 /* Just replace the CC reg with a new mode. */
3237 SUBST
REPZ CMPSB instruction does not update FLAGS register when %ecx register
equals zero. Improve cmpstrnqi_1 insn pattern to set FLAGS_REG to its
previous value instead of (const_int 0) when operand 2 equals zero.
PR target/112494
gcc/ChangeLog:
* config/i386/i386.md (cmpstrnqi_1): Set FLA
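The behaviour described above can be modelled in plain C. This is a minimal sketch, not GCC code: the `struct flags` type (tracking only ZF and CF) and the helper `repz_cmpsb_model` are hypothetical names introduced for illustration. It shows why a zero count must carry the incoming flags through unchanged instead of producing a fixed value.

```c
#include <stddef.h>

/* Hypothetical two-bit model of FLAGS: only ZF and CF are tracked.  */
struct flags { int zf; int cf; };

/* Model of REPZ CMPSB: compare up to COUNT bytes and stop at the first
   mismatch.  When COUNT is zero the loop body never runs, so the
   incoming flags are returned unchanged.  */
struct flags
repz_cmpsb_model (const unsigned char *a, const unsigned char *b,
                  size_t count, struct flags in)
{
  struct flags out = in;
  while (count-- > 0)
    {
      out.zf = (*a == *b);   /* bytes equal */
      out.cf = (*a < *b);    /* unsigned borrow from *a - *b */
      if (!out.zf)
        break;               /* REPZ stops once ZF is clear */
      a++, b++;
    }
  return out;
}
```

Calling the model with a count of zero returns exactly the flags passed in, which is the case the description says the pattern previously represented as (const_int 0).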
On Sat, Nov 25, 2023 at 8:19 AM Jakub Jelinek wrote:
>
> Hi!
>
> The following testcase is miscompiled in GCC 14 because the
> *jcc_bt_mask and *jcc_bt_mask_1 patterns have just
> one argument in (match_operator 0 "bt_comparison_operator" [...])
> but as bt_comparison_operator is eq,ne, we need tw
For -mcmodel=large, we have to load function address to a register.
PR target/112686
gcc/ChangeLog:
* config/i386/i386.cc (ix86_expand_split_stack_prologue): Load
function address to a register for ix86_cmodel == CM_LARGE.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr112686.c:
On Fri, Nov 24, 2023 at 9:31 AM Jakub Jelinek wrote:
>
> Hi!
>
> The following testcase ICEs, because cbranchv16qi4 expansion calls
> ix86_expand_branch with op1 being a pre-AVX unaligned memory and
> ix86_expand_branch emits a xorv16qi3 instruction without making sure
> the operand predicates are
gen_parityhi2_cmp instruction clobbers its input operand, so use
a temporary register in the call to gen_parityhi2_cmp.
PR target/112672
gcc/ChangeLog:
* config/i386/i386.md (parityhi2):
Use temporary register in the call to gen_parityhi2_cmp.
gcc/testsuite/ChangeLog:
* gcc.tar
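The hazard described above (a helper that clobbers its input) can be illustrated in C. This is only a sketch under assumptions: `parity16_clobbering` is a hypothetical stand-in that folds the high byte into the low byte in place, which is assumed here to resemble what the parity compare does; it is not the GCC expander itself. The wrapper `parity16` shows the fix of handing the helper a temporary copy.

```c
#include <stdint.h>

/* Hypothetical stand-in for a clobbering parity helper: XORs the high
   byte into the low byte of *X in place, then returns 1 if the number
   of set bits in the original value was odd.  */
int
parity16_clobbering (uint16_t *x)
{
  *x = (uint16_t) ((*x & 0xff00u) | ((*x >> 8) ^ (*x & 0xffu)));
  unsigned b = *x & 0xffu;
  b ^= b >> 4;
  b ^= b >> 2;
  b ^= b >> 1;
  return b & 1;
}

/* Safe wrapper: the temporary absorbs the clobber, so the caller's
   value is left untouched.  */
int
parity16 (uint16_t x)
{
  uint16_t tmp = x;
  return parity16_clobbering (&tmp);
}
```

Passing the original object directly to `parity16_clobbering` would change it, which is the kind of wrong code a temporary register avoids.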
On Mon, Nov 20, 2023 at 5:33 PM Uros Bizjak wrote:
>
> With the above two options, use a temporary register regno (as returned
> from split_stack_prologue_scratch_regno) as an indirect call scratch
> register to hold __morestack function address. On 64-bit targets, two
> temporar
With the above two options, use a temporary register regno (as returned
from split_stack_prologue_scratch_regno) as an indirect call scratch
register to hold __morestack function address. On 64-bit targets, two
temporary registers are always available, so load the function address in
%r11 and call
Sometimes the compiler emits the following code with qi_ext_0:
shrl $8, %eax
addb %bh, %al
Patch introduces new low part QImode insn patterns with both of
their input arguments extracted from high register. This invalid
insn is split after reload to a move from the high reg
Also fix some indentation inconsistencies.
PR target/112567
gcc/ChangeLog:
* config/i386/i386.md (*qi_ext_1_slp):
Fix generation of invalid RTX in split pattern.
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/confi
On Thu, Nov 16, 2023 at 8:16 AM Jakub Jelinek wrote:
>
> Hi!
>
> The following testcase is miscompiled on x86_64 since PR110551 r14-4968
> commit. That commit added 2 peephole2s, one for
> mov imm,%rXX; mov %rYY,%rax; mulq %rXX -> mov imm,%rax; mulq %rYY
> which I believe is ok, and another one f
On Tue, Nov 14, 2023 at 6:51 PM Jakub Jelinek wrote:
>
> On Mon, Nov 13, 2023 at 10:49:23PM +0100, Uros Bizjak wrote:
> > Combine wants to combine following instructions into an insn that can
> > perform both an (arithmetic) operation and set the condition code. During
> >
Following testcase:
struct S1
{
unsigned char val;
unsigned char pad1;
unsigned short pad2;
};
struct S2
{
unsigned char pad1;
unsigned char val;
unsigned short pad2;
};
struct S1 test_add (struct S1 a, struct S2 b, struct S2 c)
{
a.val = b.val + c.val;
return a;
}
compiles wit
PR target/112540
gcc/ChangeLog:
* config/i386/i386.md (*addqi_ext_1_slp):
Correct operand numbers in split pattern. Replace !Q constraint
of operand 1 with !qm. Add insn constraint.
(*subqi_ext_1_slp): Ditto.
(*qi_ext_1_slp): Ditto.
Bootstrapped and regression tested on
On Tue, Nov 14, 2023 at 6:37 PM Uros Bizjak wrote:
> PR target/78904
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (*addqi_ext_1_slp):
> New define_insn_and_split pattern.
> (*subqi_ext_1_slp): Ditto.
> (*qi_ext_1_slp): Ditto.
>
> gcc/testsuite/C
Following testcase:
struct S1
{
unsigned char val;
unsigned char pad1;
unsigned short pad2;
};
struct S2
{
unsigned char pad1;
unsigned char val;
unsigned short pad2;
};
struct S1 test_and (struct S1 a, struct S2 b)
{
a.val &= b.val;
return a;
}
compiles with -O2 to:
movl
On Tue, Nov 14, 2023 at 1:07 PM Jakub Jelinek wrote:
>
> Hi!
>
> On Sun, Nov 12, 2023 at 09:03:42PM -, Roger Sayle wrote:
> > This patch improves register pressure during reload, inspired by PR 97756.
> > Normally, a double-word right-shift by a constant produces a double-word
> > result, the
Flags reg is valid only with CC mode.
gcc/ChangeLog:
* config/i386/i386-expand.h (gen_pushfl): New prototype.
(gen_popfl): Ditto.
* config/i386/i386-expand.cc (ix86_expand_builtin)
[case IX86_BUILTIN_READ_FLAGS]: Use gen_pushfl.
[case IX86_BUILTIN_WRITE_FLAGS]: Use gen_popfl.
Combine wants to combine following instructions into an insn that can
perform both an (arithmetic) operation and set the condition code. During
the conversion a new RTX is created, and combine passes the RTX code of the
innermost RTX expression of the CC use insn in which CC reg is used to
SELECT_
On Sun, Nov 12, 2023 at 10:03 PM Roger Sayle wrote:
>
>
> This patch improves register pressure during reload, inspired by PR 97756.
> Normally, a double-word right-shift by a constant produces a double-word
> result, the highpart of which is dead when followed by a truncation.
> The dead code cal
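The observation quoted above, that only one word of a double-word right shift survives a following truncation, can be checked with a small C sketch. It uses 32-bit words and a 64-bit double word; the function names are hypothetical, and this illustrates the arithmetic only, not the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Reference form: build the double word, shift it, then truncate.  */
uint32_t
shift_then_truncate (uint32_t hi, uint32_t lo, unsigned count)
{
  assert (count > 0 && count < 32);
  uint64_t dw = ((uint64_t) hi << 32) | lo;
  return (uint32_t) (dw >> count);
}

/* Single-word form: only the low word of the shifted result is
   computed; the high word of the result is never materialized.  */
uint32_t
low_word_only (uint32_t hi, uint32_t lo, unsigned count)
{
  assert (count > 0 && count < 32);
  return (lo >> count) | (hi << (32 - count));
}
```

Both functions return the same value for any shift count between 1 and 31, so the high part of the double-word result is dead work.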
The relevant peephole2 will never generate alternative (=m,=&a,0,m) because
operand 1 is not dead before the peephole2 pattern.
gcc/ChangeLog:
* config/i386/i386.md (*stack_protect_set_4s__di):
Remove alternative 0.
Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.
Uros.
diff -
Use unrelated register initializations using zero/sign-extend instructions
to clear stack protector scratch register.
Handle only SI -> DImode extensions for 64-bit targets, as this is the
only extension that triggers the peephole in a non-negligible number.
Also use explicit check for word_mode
Improve stack protector patterns and peephole2s even more:
a. Use unrelated register clears with integer mode size <= word
mode size to clear stack protector scratch register.
b. Use unrelated register initializations in front of stack
protector sequence to clear stack protector scratch reg
On Wed, Nov 8, 2023 at 5:37 PM Alexandre Oliva wrote:
>
> Ping?
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598872.html
>
> Looking at the code generated for sse2-{load,store}-multi.c with PIE,
> I realized we could use UNSPEC_GOTOFF as a base address, and that this
> would enable the te
LRA is not able to reload zero_extracted in-out operand with matched input
operand in the same way as strict_low_part in-out operand. The patch
applies the strict_low_part workaround, where we allow LRA to generate
an instruction with non-matched input operand, which is split post reload
to the in
There is no need to check the mode in flags_reg_operand predicate. The
mode in flags setting instructions is checked with ix86_match_ccmode.
The patch avoids "warning: operand X missing mode?" warnings with
VOIDmode flags_reg_operand predicate.
gcc/ChangeLog:
* config/i386/predicates.md ("fl
Use "addr" attribute with "gpr8" value to limit address register class
to non-REX registers in instructions with high registers, where REX
registers can not be used in the address.
gcc/ChangeLog:
* config/i386/constraints.md (Bc): Remove constraint.
(Bn): Rewrite to use x86_extended_reg_m
Also rename LEGACY_REGS to LEGACY_GENERAL_REGS.
gcc/ChangeLog:
* config/i386/i386.h (enum reg_class): Add LEGACY_INDEX_REGS.
Rename LEGACY_REGS to LEGACY_GENERAL_REGS.
(REG_CLASS_NAMES): Ditto.
(REG_CLASS_CONTENTS): Ditto.
* config/i386/constraints.md ("R"): Update for rename.
The patch generalizes address register class handling to allow multiple
register classes. For APX EGPR targets, some instructions do not support
GPR32 registers, so it is necessary to limit address register set to
avoid them. The same situation happens for instructions with high registers,
where
wrote on Fri, Nov 3, 2023 at 20:50:
> >
> > On Fri, Nov 3, 2023 at 6:34 PM Uros Bizjak wrote:
> > >
> > > The patch generalizes address register class handling to allow multiple
> > > address register classes. For APX EGPR targets, some instructions can't
> >
The patch generalizes address register class handling to allow multiple
address register classes. For APX EGPR targets, some instructions can't be
encoded with REX2 prefix, so it is necessary to limit address register
class to avoid REX2 registers. The same situation happens for instructions
with
Move stack protector patterns above mov $0,%reg -> xor %reg,%reg
so the latter won't interfere with stack protector peephole2s.
gcc/ChangeLog:
* config/i386/i386.md: Move stack protector patterns
above mov $0,%reg -> xor %reg,%reg peephole2 pattern.
Bootstrapped and regression tested on
On Wed, Nov 1, 2023 at 1:58 PM Roger Sayle wrote:
>
>
> Hi Uros,
>
> > From: Uros Bizjak
> > Sent: 01 November 2023 10:05
> > Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation
> > using peephole2.
> >
> > On Mon
On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle wrote:
>
>
> This patch is a follow-up to my previous PR target/110551 patch, this
> time to address the additional move after mulx, seen on TARGET_BMI2
> architectures (such as -march=haswell). The complication here is
> that the flexible multiple-set
Improve stack protector patterns and peephole2s to substitute stack
protector scratch register clear with unrelated subsequent register
initialization in several ways:
a. Explicitly generate scratch register as named pseudo. This allows
optimizers to eventually reuse the zero value in the registe
On Mon, Oct 30, 2023 at 10:08 AM Mayshao-oc wrote:
>
> >On Fri, Oct 27, 2023 at 12:20 PM mayshao wrote:
> >>
> >> On 2023/10/26 17:34, Uros Bizjak wrote:
> >> > On Wed, Oct 25, 2023 at 8:43 AM mayshao wrote:
> >> >>
> >> >>
On Mon, Oct 30, 2023 at 12:53 PM FX Coudert wrote:
>
> Hi,
>
> The newly introduced test gcc.target/i386/pr111698.c currently fails on
> Darwin, where the default arch is core2.
> Andrew suggested in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112287 to
> pass a recent value to -march, and I ca
On Fri, Oct 27, 2023 at 12:20 PM mayshao wrote:
>
> On 2023/10/26 17:34, Uros Bizjak wrote:
> > On Wed, Oct 25, 2023 at 8:43 AM mayshao wrote:
> >>
> >> Hi all:
> >> This patch enables -march/-mtune=yongfeng, costs and tunings are set
> >> a
On Wed, Oct 25, 2023 at 8:43 AM mayshao wrote:
>
> Hi all:
> This patch enables -march/-mtune=yongfeng, costs and tunings are set
> according to the characteristics of the processor. We add a new md file to
> describe yongfeng processor.
>
> Bootstrapped /regtested X86_64.
>
> Ok for
sses before reload check both predicates and
> constraints.
>
> My original patch fixes PR 110511, using the same peephole2 idiom as already
> used elsewhere in i386.md. Ok for mainline?
Thanks for the explanation. The patch is OK.
> > -Original Message-
> > From: U
i386: Narrow test instructions with immediate operands [PR111698]
Narrow test instructions with immediate operand that test memory location
for zero. E.g. testl $0x00aa0000, mem can be converted to testb $0xaa, mem+2.
Reject targets where reading (possibly unaligned) part of memory location
after
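The narrowing above can be sketched in C. The example assumes the mask selects bits 16 to 23 (0x00aa0000), which is what the mem+2 byte address implies on a little-endian target such as x86; the function names are hypothetical. The wide form assembles the 32-bit value byte by byte in little-endian order so that the sketch behaves the same on any host.

```c
#include <stdint.h>

/* Wide form: load 32 bits (little-endian, as on x86) and test them
   against the mask, as testl would.  Returns 1 if the result is zero.  */
int
wide_test_is_zero (const unsigned char *mem)
{
  uint32_t v = (uint32_t) mem[0]
               | (uint32_t) mem[1] << 8
               | (uint32_t) mem[2] << 16
               | (uint32_t) mem[3] << 24;
  return (v & 0x00aa0000u) == 0;
}

/* Narrow form: bits 16..23 of the little-endian word live in the byte
   at offset 2, so a byte-sized test of mem+2 gives the same answer.  */
int
narrow_test_is_zero (const unsigned char *mem)
{
  return (mem[2] & 0xaau) == 0;
}
```

The two functions agree for every 4-byte input, but the narrow form reads only one byte of the location.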
On Mon, Oct 23, 2023 at 4:47 PM Roger Sayle wrote:
>
>
> The eagle-eyed may have spotted that my recent testcases for DImode shifts
> on x86_64 included -mno-stv in the dg-options. This is because the
> Scalar-To-Vector (STV) pass currently transforms these shifts to use
> SSE vector operations,
On Tue, Oct 24, 2023 at 12:08 PM Richard Sandiford
wrote:
>
> For the V2HI -> V2SI zero extension in:
>
> typedef unsigned short v2hi __attribute__((vector_size(4)));
> typedef unsigned int v2si __attribute__((vector_size(8)));
> v2si f (v2hi x) { return (v2si) {x[0], x[1]}; }
>
> ix86_expan
On Fri, Oct 20, 2023 at 8:54 AM liuhongt wrote:
>
> While working on enabling more 32/64-bit vectorization for _Float16,
> I noticed there is one redundant define_expand; the patch removes the expander.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
On Tue, Oct 17, 2023 at 9:05 PM Roger Sayle wrote:
>
>
> This patch contains clean-ups of the widening multiplication patterns in
> i386.md, and provides variants of the existing highpart multiplication
> peephole2 transformations (that tidy up register allocation after
> reload), and thereby fixe
On Tue, Oct 17, 2023 at 7:54 PM Roger Sayle wrote:
>
>
> Hi Uros,
> Thanks for the speedy review.
>
> > From: Uros Bizjak
> > Sent: 17 October 2023 17:38
> >
> > On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle
> > wrote:
> > >
> > >
>
On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle wrote:
>
>
> This patch is the backend piece of a solution to PRs 101955 and 106245,
> that adds a define_insn_and_split to the i386 backend, to perform sign
> extension of a single (least significant) bit using AND $1 then NEG.
>
> Previously, (x<<31)>>
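The equivalence behind the AND $1 then NEG sequence can be written out in C. This is a minimal sketch with hypothetical function names; the shift form relies on GCC's documented behaviour for converting out-of-range values to a signed type and for right-shifting negative values (arithmetic shift), both of which are implementation-defined in ISO C.

```c
#include <stdint.h>

/* Shift form: move bit 0 up to the sign bit, then shift it back down
   arithmetically so it is replicated into every bit.  The left shift is
   done on an unsigned value to avoid undefined behaviour.  */
int32_t
sext_bit_shifts (int32_t x)
{
  return (int32_t) ((uint32_t) x << 31) >> 31;
}

/* AND/NEG form: isolate bit 0, then negate; 1 becomes -1 (all ones)
   and 0 stays 0.  */
int32_t
sext_bit_and_neg (int32_t x)
{
  return -(x & 1);
}
```

Both functions return -1 when the least significant bit of x is set and 0 otherwise.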
On Mon, Oct 16, 2023 at 9:58 PM Fangrui Song wrote:
>
> On Mon, Oct 16, 2023 at 12:10 PM Uros Bizjak wrote:
> >
> > On Mon, Oct 16, 2023 at 8:24 PM Fangrui Song wrote:
> > >
> > > On 2023-10-16, Uros Bizjak wrote:
> > > >On Tue, Aug 1, 2023 at 9:5
On Mon, Oct 16, 2023 at 8:24 PM Fangrui Song wrote:
>
> On 2023-10-16, Uros Bizjak wrote:
> >On Tue, Aug 1, 2023 at 9:51 PM Fangrui Song wrote:
> >>
> >> When using -mcmodel=medium, large data objects larger than the
> >> -mlarge-data-threshold thresh
On Tue, Aug 1, 2023 at 9:51 PM Fangrui Song wrote:
>
> When using -mcmodel=medium, large data objects larger than the
> -mlarge-data-threshold threshold are placed into large data sections
> (.lrodata, .ldata, .lbss and some variants). GNU ld and ld.lld 17 place
> .l* sections into separate outpu
On Fri, Oct 6, 2023 at 3:59 PM Roger Sayle wrote:
>
>
> Grr! I've done it again. ENOPATCH.
>
> > -Original Message-
> > From: Roger Sayle
> > Sent: 06 October 2023 14:58
> > To: 'gcc-patches@gcc.gnu.org'
> > Cc: 'Uros Bizja
The stringop strategy selection algorithm falls back to a libcall strategy
when it exhausts its pool of available strategies. The memory area copy
function (memcpy) is not available from the system library for non-default
address spaces, so the compiler emits the most trivial byte-at-a-time
copy l
On Thu, Oct 5, 2023 at 1:45 PM Roger Sayle wrote:
>
> Doh! ENOPATCH.
>
> > -Original Message-
> > From: Roger Sayle
> > Sent: 05 October 2023 12:44
> > To: 'gcc-patches@gcc.gnu.org'
> > Cc: 'Uros Bizjak'
> > Subject: [X8
On Thu, Oct 5, 2023 at 11:06 AM Roger Sayle wrote:
>
>
> This patch avoids long lea instructions for performing x<<2 and x<<3
> by splitting them into shorter sal and move (or xchg instructions).
> Because this increases the number of instructions, but reduces the
> total size, it's suitable for -O
PR target/111340
gcc/ChangeLog:
* config/i386/i386.cc (output_pic_addr_const): Handle CONST_WIDE_INT.
Call output_addr_const for CASE_CONST_SCALAR_INT.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr111340.c: New test.
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32
On Wed, Sep 6, 2023 at 9:43 PM Vladimir Makarov wrote:
>
>
> On 9/1/23 05:07, Hongyu Wang wrote:
> > Uros Bizjak via Gcc-patches wrote on Thu, Aug 31, 2023 at 18:16:
> >> On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang wrote:
> >>> From: Kong Lingling
> >>>
>
On Mon, Sep 4, 2023 at 2:28 AM Hongtao Liu wrote:
> > > > > > > I think there should be some constraint which explicitly has all
> > > > > > > the 32
> > > > > > > GPRs, like there is one for just all 16 GPRs (h), so that
> > > > > > > regardless of
> > > > > > > -mapx-inline-asm-use-gpr32 one
On Fri, Sep 1, 2023 at 12:36 PM Hongtao Liu wrote:
>
> On Fri, Sep 1, 2023 at 5:38 PM Uros Bizjak via Gcc-patches
> wrote:
> >
> > On Fri, Sep 1, 2023 at 11:10 AM Hongyu Wang wrote:
> > >
> > > Uros Bizjak via Gcc-patches wrote on Thu, Aug 31, 2023 at
> > > 18:
On Fri, Sep 1, 2023 at 11:10 AM Hongyu Wang wrote:
>
> Uros Bizjak via Gcc-patches wrote on Thu, Aug 31, 2023 at 18:01:
> >
> > On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches
> > wrote:
> > >
> > > On Thu, Aug 31, 2023 at 04:20:17PM +0800
On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang wrote:
>
> From: Kong Lingling
>
> Current reload infrastructure does not support selective base_reg_class
> for backend insn. Add insn argument to base_reg_class for
> lra/reload usage.
I don't think this is the correct approach. Ideally, a memory
co
On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang wrote:
>
> From: Kong Lingling
>
> These legacy insn in opcode map0/1 only support GPR16,
> and do not have vex/evex counterpart, directly adjust constraints and
> add gpr32 attr to patterns.
>
> insn list:
> 1. xsave/xsave64, xrstor/xrstor64
> 2. xsav
On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches
wrote:
>
> On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote:
> > From: Kong Lingling
> >
> > In inline asm, we do not know if the insn can use EGPR, so disable EGPR
> > usage by default from mapping the comm
gcc/fortran/ChangeLog:
* match.cc (gfc_match_equivalence): Rename TRUE/FALSE to true/false.
* module.cc (check_access): Ditto.
* primary.cc (match_real_constant): Ditto.
* trans-array.cc (gfc_trans_allocate_array_storage): Ditto.
(get_array_ctor_strlen): Ditto.
* trans-comm
gcc/c-family/ChangeLog:
* c-format.cc (read_any_format_width):
Rename TRUE/FALSE to true/false.
gcc/ChangeLog:
* caller-save.cc (new_saved_hard_reg):
Rename TRUE/FALSE to true/false.
(setup_save_areas): Ditto.
* gcc.cc (set_collect_gcc_options): Ditto.
(driver::build_
Add new pattern involving vec_merge RTX that is produced by combine from the
combination of sse4_1_pinsrq and *movdi_internal:
7: r86:DI=0
8: r85:V2DI=vec_merge(vec_duplicate(r86:DI),r87:V2DI,0x2)
REG_DEAD r87:V2DI
REG_DEAD r86:DI
Successfully matched this instruction:
(set (re
On Wed, Aug 9, 2023 at 8:19 PM Jakub Jelinek wrote:
>
> Hi!
>
> The following patch enables _BitInt support on x86-64, the only
> target which has _BitInt specified in psABI.
>
> 2023-08-09 Jakub Jelinek
>
> PR c/102989
> * config/i386/i386.cc (classify_argument): Handle BITINT_
Disable (=&r,m,m) alternative for 32-bit targets. The combination of two
memory operands (possibly with complex addressing mode), early clobbered
output, frame pointer and PIC registers uses too many registers on
a register constrained 32-bit target.
Also merge two similar patterns using DWIH mode
Partial vector src is forced to a register as ops[1], we can use it
instead of SRC in the call to ix86_expand_sse_cmp. This change avoids
forcing operand[1] to a register in sign/zero-extend expanders.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_sse_extend): Use ops[1]
inste
Implement vector extend and zero_extend functionality for TARGET_SSE2 using
PUNPCKL?? family of instructions. The code for e.g. zero-extend from V2SI to
V2DImode improves from:
movd %xmm0, %edx
pshufd $85, %xmm0, %xmm0
movd %xmm0, %eax
movq %rdx, (%rdi)
On Mon, Aug 14, 2023 at 4:46 AM liuhongt via Gcc-patches
wrote:
>
> vmovapd can enable register renaming and have same code size as
> vmovsd. Similar for vmovsh vs vmovaps, vmovaps is 1 byte less than
> vmovsh.
>
> When TARGET_AVX512VL is not available, still generate
> vmovsd/vmovss/vmovsh to avo
On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
wrote:
>
> On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote:
> >
> > Currently we have 3 different independent tunes for gather
> > "use_gather,use_gather_2parts,use_gather_4parts",
> > similar for scatter, there're
> > "use_scatter,use_scatter_2parts,
On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote:
>
> Currently we have 3 different independent tunes for gather
> "use_gather,use_gather_2parts,use_gather_4parts",
> similar for scatter, there're
> "use_scatter,use_scatter_2parts,use_scatter_4parts"
>
> The patch support 2 standardizing options to
On Thu, Aug 10, 2023 at 2:49 AM liuhongt wrote:
>
> Also add ix86_partial_vec_fp_math to the condition of V2HF/V4HF named
> patterns in order to avoid generation of partial vector V8HFmode
> trapping instructions.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> Ok for trunk?
>
> gcc/
On Mon, Aug 7, 2023 at 1:20 PM Richard Biener
wrote:
> > Please also note the RFC patch [1] that relaxes clears for V2SFmode
> > with -fno-trapping-math. The patched compiler will then emit the same
> > code as clang does for -O2. Which raises another question - should gcc
> > default to -fno-tra
On Wed, Aug 9, 2023 at 8:38 AM Uros Bizjak wrote:
>
> On Wed, Aug 9, 2023 at 8:37 AM Liu, Hongtao wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Uros Bizjak
> > > Sent: Wednesday, August 9, 2023 2:33 PM
> > > To: Liu,
On Wed, Aug 9, 2023 at 8:37 AM Liu, Hongtao wrote:
>
>
>
> > -Original Message-
> > From: Uros Bizjak
> > Sent: Wednesday, August 9, 2023 2:33 PM
> > To: Liu, Hongtao
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH V2] [X86] Work
On Wed, Aug 9, 2023 at 3:48 AM liuhongt wrote:
>
> > Please rather do it in a more self-descriptive way, as proposed in the
> > attached patch. You won't need a comment then.
> >
>
> Adjusted in V2 patch.
>
> Don't access leaf 7 subleaf 1 unless subleaf 0 says it is
> supported via EAX.
>
> Intel
Also introduce -m[no-]partial-vector-fp-math option to disable trapping
V2SF named patterns in order to avoid generation of partial vector V4SFmode
trapping instructions.
The new option is enabled by default, because even with sanitization,
a small but consistent speed up of 2 to 3% with Polyhedro
On Tue, Aug 8, 2023 at 9:58 AM liuhongt wrote:
>
> Don't access leaf 7 subleaf 1 unless subleaf 0 says it is
> supported via EAX.
>
> Intel documentation says invalid subleaves return 0. We had been
> relying on that behavior instead of checking the max subleaf number.
>
> It appears that some Sand
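The guard described above can be sketched in C. To keep the example portable and testable, the CPUID instruction is abstracted behind a callback; `struct cpuid_regs`, `cpuid_fn`, `query_leaf7_subleaf1` and the fake CPU are all hypothetical names for illustration, not the code from the patch. The sketch also assumes leaf 7 itself is already known to be available.

```c
#include <stdint.h>

struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Hypothetical callback standing in for the CPUID instruction.  */
typedef struct cpuid_regs (*cpuid_fn) (uint32_t leaf, uint32_t subleaf);

/* Query leaf 7 subleaf 1 only when subleaf 0 reports, in EAX, that
   subleaf 1 exists; otherwise return all zeros instead of trusting
   whatever the CPU returns for an invalid subleaf.  */
struct cpuid_regs
query_leaf7_subleaf1 (cpuid_fn cpuid)
{
  struct cpuid_regs zero = { 0, 0, 0, 0 };
  struct cpuid_regs sub0 = cpuid (7, 0);
  uint32_t max_subleaf = sub0.eax;
  if (max_subleaf < 1)
    return zero;
  return cpuid (7, 1);
}

/* Fake CPU for demonstration: reports max subleaf 0 and returns
   garbage (not zeros) for any other query.  */
struct cpuid_regs
fake_cpu_without_subleaf1 (uint32_t leaf, uint32_t subleaf)
{
  struct cpuid_regs r = { 0, 0, 0, 0 };
  if (leaf == 7 && subleaf == 0)
    r.ebx = 0x00000020u;   /* arbitrary feature bits; EAX stays 0 */
  else
    r.eax = r.ebx = r.ecx = r.edx = 0xdeadbeefu;
  return r;
}
```

With the fake CPU, `query_leaf7_subleaf1` returns zeros, whereas an unguarded query of subleaf 1 would have picked up the garbage as feature bits.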
On Tue, Aug 8, 2023 at 12:08 PM Richard Biener wrote:
> > > > > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > > > > > named patterns in order to avoid generation of partial vector
> > > > > > V4SFmode
> > > > > > trapping instructions.
> > > > > >
> > > > > > The new
On Tue, Aug 8, 2023 at 10:07 AM Richard Biener wrote:
>
> On Mon, 7 Aug 2023, Uros Bizjak wrote:
>
> > On Mon, Jul 31, 2023 at 11:40 AM Richard Biener wrote:
> > >
> > > On Sun, 30 Jul 2023, Uros Bizjak wrote:
> > >
> > > > Also introduce
On Mon, Jul 31, 2023 at 11:40 AM Richard Biener wrote:
>
> On Sun, 30 Jul 2023, Uros Bizjak wrote:
>
> > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > named patterns in order to avoid generation of partial vector V4SFmode
> > trapping in
ith and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
>
>
> 2023-08-07 Roger Sayle
> Uros Bizjak
>
> gcc/ChangeLog
> PR target/107671
> * config/i386/i386.md (*bt_setc_mask): Allow the
> shift count to
On Mon, Aug 7, 2023 at 10:57 AM liuhongt wrote:
>
> Similar to r14-2786-gade30fad6669e5, the patch is for V4HF/V2HFmode.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/110762
> * config/i386/mmx.md (3): Changed from
On Thu, Aug 3, 2023 at 9:10 AM Roger Sayle wrote:
>
>
> This patch is the final piece in the series to improve the ABI issues
> affecting PR 88873. The previous patches tackled inserting DFmode
> values into V2DFmode registers, by introducing insvti_{low,high}part
> patterns. This patch improves
On Thu, Aug 3, 2023 at 12:18 AM Roger Sayle wrote:
>
>
> This patch is a conservative fix for PR target/110792, a wrong-code
> regression affecting doubleword rotations by BITS_PER_WORD, which
> effectively swaps the highpart and lowpart words, when the source to be
> rotated resides in memory. Th
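The statement that a doubleword rotation by BITS_PER_WORD swaps the high and low words can be checked with a short C sketch. It models a doubleword as a 64-bit value made of two 32-bit words; the function names are hypothetical and the code illustrates the identity only, not the fix.

```c
#include <stdint.h>

/* Generic rotate-left of a 64-bit doubleword.  The count is reduced
   modulo 64 and a zero count is handled separately to avoid an
   out-of-range shift.  */
uint64_t
rotl64 (uint64_t x, unsigned n)
{
  n &= 63;
  return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Word swap: exchange the 32-bit high and low parts.  */
uint64_t
swap_words (uint64_t x)
{
  uint32_t hi = (uint32_t) (x >> 32);
  uint32_t lo = (uint32_t) x;
  return ((uint64_t) lo << 32) | hi;
}
```

For a count of 32 (the word size in this model) `rotl64` and `swap_words` produce the same result, so the rotation needs no shifts at all, only two word moves.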
On Wed, Aug 2, 2023 at 3:33 AM liuhongt wrote:
>
> In [1], I propose a patch to generate vmovdqu for all vlddqu intrinsics
> after AVX2, it's rejected as
> > The instruction is reachable only as __builtin_ia32_lddqu* (aka
> > _mm_lddqu_si*), so it was chosen by the programmer for a reason. I
> > t
On Mon, Jul 31, 2023 at 11:40 AM Richard Biener wrote:
>
> On Sun, 30 Jul 2023, Uros Bizjak wrote:
>
> > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > named patterns in order to avoid generation of partial vector V4SFmode
> > trapping in
Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
named patterns in order to avoid generation of partial vector V4SFmode
trapping instructions.
The new option is enabled by default, because even with sanitization,
a small but consistent speed up of 2 to 3% with Polyhedron capaci
The testcase should use dg-additional-options instead of dg-options to
not overwrite default compile flags that include path for finding
the IEEE modules.
gcc/testsuite/ChangeLog:
* gfortran.dg/ieee/comparisons_3.F90: Use dg-additional-options
instead of dg-options.
Tested on x86_64-linu