Li; Zamyatin, Igor
> Subject: [PATCH] disable use_vector_fp_converts for m_CORE_ALL
>
> For the following testcase 1.c, on westmere and sandybridge, performance with
> the option -mtune=^use_vector_fp_converts is better (improves from 3.46s to
> 2.83s). It means cvtss2sd is often b
> Sent: Thursday, September 12, 2013 2:51 AM
>> To: GCC Patches
>> Cc: David Li; Zamyatin, Igor
>> Subject: [PATCH] disable use_vector_fp_converts for m_CORE_ALL
>>
>> For the following testcase 1.c, on westmere and sandybridge, performance
>> with the option -mtun
Ccing Uros. Changes in i386.md could be related to the fix for PR57954.
Thanks,
Igor
-Original Message-
From: Wei Mi [mailto:w...@google.com]
Sent: Thursday, September 12, 2013 2:51 AM
To: GCC Patches
Cc: David Li; Zamyatin, Igor
Subject: [PATCH] disable use_vector_fp_converts for
Ping.
> -Original Message-
> From: Wei Mi [mailto:w...@google.com]
> Sent: Thursday, September 12, 2013 2:51 AM
> To: GCC Patches
> Cc: David Li; Zamyatin, Igor
> Subject: [PATCH] disable use_vector_fp_converts for m_CORE_ALL
>
> For the following testcase 1.c, on
For the following testcase 1.c, on westmere and sandybridge,
performance with the option -mtune=^use_vector_fp_converts is better
(improves from 3.46s to 2.83s). It means cvtss2sd is often better than
unpcklps+cvtps2pd on recent x86 platforms.
1.c:
float total = 0.2;
int k = 5;
int main() {
int
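The testcase above is cut off in this excerpt. As a stand-in, here is a minimal sketch (not the original 1.c; the function name `sum_as_double` is hypothetical) of a loop whose hot path is a scalar float-to-double conversion, the operation for which the thread compares cvtss2sd with unpcklps+cvtps2pd. Which sequence a given GCC emits depends on its version and tuning, so treat this as something to compile with `-S` and inspect, for example with and without `-mtune-ctrl=^use_vector_fp_converts`, rather than as a guaranteed reproducer.

```c
#include <stddef.h>

/* Hypothetical kernel, not the original 1.c: each iteration widens one
   float to double, which is the scalar conversion discussed in the thread. */
double
sum_as_double (const float *v, size_t n)
{
  double acc = 0.0;
  for (size_t i = 0; i < n; i++)
    acc += (double) v[i];   /* float -> double conversion */
  return acc;
}
```

Timing a long-running loop around such a kernel on the target CPU, once per tuning setting, is one way to check whether the conversion choice matters there.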
> Hi Wei Mi,
>
> Have you checked in your patch?
>
> --
> H.J.
No, I haven't. Honza wants me to wait for his testing on AMD hardware.
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01603.html
Re: [PATCH 3/4] [PATCH 3/4] x86: Properly handle
> > USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS
> >
> > On Wed, Sep 15, 2021 at 10:10 AM wrote:
> > >
> > > From: "H.J. Lu"
> > >
> > > Check TARGET_USE_VECTOR_FP_CONVERTS or
> >
> > Hi Wei Mi,
> >
> > Have you checked in your patch?
> >
> > --
> > H.J.
>
> No, I haven't. Honza wants me to wait for his testing on AMD hardware.
> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01603.html
I only wanted to separate it from the changes in generic so the regular testers
can pick it
> -Original Message-
> From: Uros Bizjak
> Sent: Thursday, September 16, 2021 2:28 PM
> To: Cui, Lili
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; H. J. Lu
>
> Subject: Re: [PATCH 3/4] [PATCH 3/4] x86: Properly handle
> USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONV
uite/gcc.target/i386/pr101900-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake -mfpmath=sse
-mtune-ctrl=use_vector_fp_converts" } */
+
+extern float f;
+extern double d;
+extern int i;
+
+void
+foo (void)
+{
+ d = f;
+ f = i;
+}
+
+/* { dg-final { scan-
On Tue, Oct 1, 2013 at 3:50 PM, Jan Hubicka wrote:
>> > Hi Wei Mi,
>> >
>> > Have you checked in your patch?
>> >
>> > --
>> > H.J.
>>
>> No, I haven't. Honza wants me to wait for his testing on AMD hardware.
>> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01603.html
> I only wanted to separate it
est to not use this:
> >>
> >> Assembly/Compiler Coding Rule 33. (M impact, H generality)
> >> INC and DEC instructions should be replaced with ADD or SUB instructions,
> >> because ADD and SUB overwrite all flags, whereas INC and DEC do not,
> >> ther
> diff --git a/gcc/testsuite/gcc.target/i386/pr101900-1.c
> b/gcc/testsuite/gcc.target/i386/pr101900-1.c
> new file mode 100644
> index 000..0a45f8e340a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101900-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
>
>> > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00884.html
>
> This patch seems reasonable. (in fact I have pretty much the same in my tree)
> use_vector_fp_converts is actually trying to solve the same problem in AMD
> hardware - you need to type the whole register when convert
rly handle USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS
x86: Add TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY
gcc/common/config/i386/i386-common.c | 2 +-
gcc/config/i386/i386-features.c | 23 +++-
gcc/config/i386/i386-options.c| 2 +-
gcc/config/i
/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -386,6 +386,10 @@ DEF_TUNE (X86_TUNE_USE_VECTOR_FP_CONVERTS,
"use_vector_fp_converts",
from integer to FP. */
DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS, "use_vector_converts", m_AMDFAM10)
+/* X86_TUNE_SLOW_SHUFB
On Fri, Sep 17, 2021 at 08:35:57AM +0200, Uros Bizjak via Gcc-patches wrote:
> > > On Wed, Sep 15, 2021 at 10:10 AM wrote:
> > > >
> > > > From: "H.J. Lu"
> > > >
> > > > Check TARGET_USE_VECTOR_FP_CONVERTS or
> > > TARGET_USE_VECTOR_CONVERTS when
> > > > handling avx_partial_xmm_update attribute
On Sat, Sep 18, 2021 at 7:50 AM Jakub Jelinek via Gcc-patches
wrote:
>
> On Fri, Sep 17, 2021 at 08:35:57AM +0200, Uros Bizjak via Gcc-patches wrote:
> > > > On Wed, Sep 15, 2021 at 10:10 AM wrote:
> > > > >
> > > > > From: "H.J. Lu"
> > > > >
> > > > > Check TARGET_USE_VECTOR_FP_CONVERTS or
> >
.
Other change dropped is use_vector_fp_converts that seems to improve
Core performance.
I benchmarked the patch on SPEC2k and earlier it was benchmarked on 2k6
and the performance difference seems in noise. It causes about 0.3% code
size reduction. Main motivation for the patch is to drop some
SUB overwrite all flags, whereas INC and DEC do not, therefore
> creating false dependencies on earlier instructions that set the flags.
>
> Other change dropped is use_vector_fp_converts that seems to improve
> Core performance.
I did not see this in your patch, but Wei has this t
e 33. (M impact, H generality)
>> INC and DEC instructions should be replaced with ADD or SUB instructions,
>> because ADD and SUB overwrite all flags, whereas INC and DEC do not,
>> therefore
>> creating false dependencies on earlier instructions that set the flags.
>>
>>
ule")
+DEF_TUNE (X86_TUNE_USE_BT, "use_bt")
+DEF_TUNE (X86_TUNE_USE_INCDEC, "use_incdec")
+DEF_TUNE (X86_TUNE_PAD_RETURNS, "pad_returns")
+DEF_TUNE (X86_TUNE_PAD_SHORT_FUNCTION, "pad_short_function")
+DEF_TUNE (X86_TUNE_EXT_80387_CONSTANTS, "ext_80387_cons
ix86_tune_features[X86_TUNE_VECTORIZE_DOUBLE]
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 4ae5f70..3d395b0 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -193,10 +193,24 @@ DEF_TUNE (X86_TUNE_USE_VECTOR_FP_CONVERTS,
"us
ix86_tune_features[X86_TUNE_FUSE_ALU_AND_BRANCH]
#define TARGET_OPT_AGU ix86_tune_features[X86_TUNE_OPT_AGU]
#define TARGET_VECTORIZE_DOUBLE \
ix86_tune_features[X86_TUNE_VECTORIZE_DOUBLE]
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 4ae5f70..3d395b0 10
TARGET_VECTORIZE_DOUBLE \
ix86_tune_features[X86_TUNE_VECTORIZE_DOUBLE]
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 4ae5f70..3d395b0 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -193,10 +193,24 @@ DEF_TUNE (X86_TUNE_USE_VECTOR_FP_CONVERTS,
_AND_BRANCH_32)
+#define TARGET_FUSE_CMP_AND_BRANCH_SOFLAGS \
+ ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS]
+#define TARGET_FUSE_ALU_AND_BRANCH \
+ ix86_tune_features[X86_TUNE_FUSE_ALU_AND_BRANCH]
#define TARGET_OPT_AGU ix86_tune_features[X86_TUNE_OPT_AGU]
#define
> ix86_tune_features[X86_TUNE_USE_VECTOR_FP_CONVERTS]
> #define TARGET_USE_VECTOR_CONVERTS \
> ix86_tune_features[X86_TUNE_USE_VECTOR_CONVERTS]
> +#define TARGET_FUSE_CMP_AND_BRANCH_32 \
> + ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
> +#define TARG
F_TUNE (X86_TUNE_EPILOGUE_USING_MOVE, "epilogue_using_move")
-DEF_TUNE (X86_TUNE_SHIFT1, "shift1")
-DEF_TUNE (X86_TUNE_USE_FFREEP, "use_ffreep")
-DEF_TUNE (X86_TUNE_INTER_UNIT_MOVES_TO_VEC, "inter_unit_moves_to_vec")
-DEF_TUNE (X86_TUNE_INTER_UNIT_MOVES_FROM_VEC, "
"sse_split_regs")
-DEF_TUNE (X86_TUNE_SSE_TYPELESS_STORES, "sse_typeless_stores")
-DEF_TUNE (X86_TUNE_SSE_LOAD0_BY_PXOR, "sse_load0_by_pxor")
-DEF_TUNE (X86_TUNE_MEMORY_MISMATCH_STALL, "memory_mismatch_stall")
-DEF_TUNE (X86_TUNE_PROLOGUE_USING_MOVE, "pr
6_TUNE_MOVE_M1_VIA_OR: On pentiums, it is faster to load -1 via OR
than a MOV. */
DEF_TUNE (X86_TUNE_MOVE_M1_VIA_OR, "move_m1_via_or", m_PENT)
+
/* X86_TUNE_NOT_UNPAIRABLE: NOT is not pairable on Pentium, while XOR is,
but one byte longer. */
DEF_TUNE (X86_TUNE_NOT_UNPAIRA
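The DEF_TUNE entries quoted in these patches follow the X-macro pattern: one list of entries is expanded several times with different definitions of the macro. A minimal sketch of that pattern follows. It assumes a small inline list (`TUNE_LIST`) in place of including x86-tune.def, uses the two-argument DEF_TUNE form seen in the diff, and all other names (`tune_index`, `tune_names`, `tune_features`, `tune_set`) are hypothetical rather than GCC's own.

```c
#include <string.h>

/* Hypothetical stand-in for a .def file: one entry per tuning flag. */
#define TUNE_LIST(DEF_TUNE) \
  DEF_TUNE (TUNE_USE_BT, "use_bt") \
  DEF_TUNE (TUNE_USE_INCDEC, "use_incdec") \
  DEF_TUNE (TUNE_USE_VECTOR_FP_CONVERTS, "use_vector_fp_converts")

/* First expansion: build the enum of indices. */
#define AS_ENUM(id, name) id,
enum tune_index { TUNE_LIST (AS_ENUM) TUNE_LAST };
#undef AS_ENUM

/* Second expansion: build the parallel table of option names. */
#define AS_NAME(id, name) name,
static const char *const tune_names[TUNE_LAST] = { TUNE_LIST (AS_NAME) };
#undef AS_NAME

/* One on/off byte per flag, indexed by enum tune_index. */
unsigned char tune_features[TUNE_LAST];

/* Set a flag by name; returns 0 on success, -1 if the name is unknown. */
int
tune_set (const char *name, int on)
{
  for (int i = 0; i < TUNE_LAST; i++)
    if (strcmp (tune_names[i], name) == 0)
      {
        tune_features[i] = on ? 1 : 0;
        return 0;
      }
  return -1;
}
```

With this layout, adding a flag means adding one line to the list, and the enum and the name table stay in sync automatically. A macro in the style of `#define TARGET_USE_INCDEC tune_features[TUNE_USE_INCDEC]` would then read a flag, similar in shape to the TARGET_* defines quoted in the diff.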