https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
Bug 89071 depends on bug 87007, which changed state.
Bug 87007 Summary: [8 Regression] 10% slowdown with -march=skylake-avx512
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87007
What|Removed |Added
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #22 from Peter Cordes ---
Nice, that's exactly the kind of thing I suggested in bug 80571. If this covers
* vsqrtss/sd (mem),%merge_into, %xmm
* vpcmpeqd %same,%same, %dest    # false dep on KNL / Silvermont
* vcmptrueps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #21 from hjl at gcc dot gnu.org ---
Author: hjl
Date: Fri Feb 22 15:54:08 2019
New Revision: 269119
URL: https://gcc.gnu.org/viewcvs?rev=269119&root=gcc&view=rev
Log:
i386: Add pass_remove_partial_avx_dependency
With -mavx, for
$ cat foo.i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
Uroš Bizjak changed:
What       |Removed     |Added
Status     |ASSIGNED    |RESOLVED
Resolution |---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
Uroš Bizjak changed:
What             |Removed |Added
Target Milestone |---     |9.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #19 from H.J. Lu ---
(In reply to Uroš Bizjak from comment #18)
> The only remaining question is on cvtsd2ss mem->xmm, where ICC goes with the
> same strategy as with other non-conversion SSE unops:
>
> vmovsd d(%rip), %xmm0
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #18 from Uroš Bizjak ---
The only remaining question is on cvtsd2ss mem->xmm, where ICC goes with the
same strategy as with other non-conversion SSE unops:
vmovsd d(%rip), %xmm0
vcvtsd2ss %xmm0, %xmm0, %xmm0
but with
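
For reference, a minimal sketch of C source that exercises this case, assuming a global double named `d` (the name is taken from the `d(%rip)` operand above; the function name `narrow_d` is hypothetical, and the thread's own test file is not shown in this listing). The narrowing conversion reads its input from memory, which is the mem->xmm form under discussion. Whether a compiler folds the load into the conversion or loads into a register first is a code-generation choice.

```c
/* Global double read from memory; the name matches the d(%rip) operand
   quoted above. */
double d;

/* Hypothetical example function: the double -> float narrowing here is the
   operation that the scalar cvtsd2ss conversion performs on x86 with SSE2. */
float narrow_d(void)
{
    return (float)d;
}
```

Compiling this with -O2 -mavx -S and inspecting the output is one way to see which of the two strategies a given compiler picks.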
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #17 from uros at gcc dot gnu.org ---
Author: uros
Date: Sun Feb 3 16:48:41 2019
New Revision: 268496
URL: https://gcc.gnu.org/viewcvs?rev=268496&root=gcc&view=rev
Log:
PR target/89071
* config/i386/i386.md (*sqrt2_sse): Add
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #16 from Uroš Bizjak ---
(In reply to Peter Cordes from comment #15)
> (In reply to Uroš Bizjak from comment #13)
> > I assume that memory inputs are not problematic for SSE/AVX {R,}SQRT, RCP
> > and ROUND instructions. Contrary to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #15 from Peter Cordes ---
(In reply to Uroš Bizjak from comment #13)
> I assume that memory inputs are not problematic for SSE/AVX {R,}SQRT, RCP
> and ROUND instructions. Contrary to CVTSI2S{S,D}, CVTSS2SD and CVTSD2SS, we
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
Uroš Bizjak changed:
What             |Removed     |Added
Status           |UNCONFIRMED |ASSIGNED
Last reconfirmed |
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #13 from Uroš Bizjak ---
I assume that memory inputs are not problematic for SSE/AVX {R,}SQRT, RCP and
ROUND instructions. Contrary to CVTSI2S{S,D}, CVTSS2SD and CVTSD2SS, we
currently don't emit XOR clear in front of these
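
The reason an XOR clear helps can be sketched with a small C model, under the assumption that a register is just four float lanes. The names `xmm_model`, `cvtsd2ss_model` and `xor_clear_model` are hypothetical and only illustrate the architectural behaviour: the scalar conversion writes the low lane and copies the remaining lanes from its merge operand, so its result depends on whatever last wrote that register.

```c
/* Hypothetical model of a 128-bit XMM register as four 32-bit float lanes. */
typedef struct { float lane[4]; } xmm_model;

/* Model of the three-operand form "vcvtsd2ss src, merge, dest" (AT&T order):
   lane 0 receives the converted value, lanes 1..3 are copied from merge.
   The copy is what makes the result depend on the old register contents. */
static xmm_model cvtsd2ss_model(xmm_model merge, double src)
{
    xmm_model dest = merge;
    dest.lane[0] = (float)src;
    return dest;
}

/* Model of "vxorps r, r, r": an all-zero value that does not depend on
   the previous contents of r. */
static xmm_model xor_clear_model(void)
{
    xmm_model zero = {{0.0f, 0.0f, 0.0f, 0.0f}};
    return zero;
}
```

Calling `cvtsd2ss_model(xor_clear_model(), x)` corresponds to the vxorps followed by vcvtsd2ss sequence quoted elsewhere in this thread: the merged lanes come from a freshly zeroed register rather than from an unrelated earlier result. The model says nothing about timing, only about which inputs the result depends on.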
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #12 from Uroš Bizjak ---
(In reply to Peter Cordes from comment #10)
> It also bizarrely uses it for VMOVSS, which gcc should only emit if it
> actually wants to merge (right?). *If* this part of the patch isn't a bug
>
> -
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #11 from uros at gcc dot gnu.org ---
Author: uros
Date: Thu Jan 31 20:06:42 2019
New Revision: 268427
URL: https://gcc.gnu.org/viewcvs?rev=268427&root=gcc&view=rev
Log:
PR target/89071
* config/i386/i386.md (*extendsfdf2):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #10 from Peter Cordes ---
(In reply to Uroš Bizjak from comment #9)
> There was a similar patch for sqrt [1]. I think the approach is
> straightforward and could be applied to other reg->reg scalar insns as
> well, independently
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #9 from Uroš Bizjak ---
There was a similar patch for sqrt [1]. I think the approach is
straightforward and could be applied to other reg->reg scalar insns as well,
independently of the PR87007 patch.
[1]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #8 from Peter Cordes ---
Created attachment 45544
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45544&action=edit
testloop-cvtss2sd.asm
(In reply to H.J. Lu from comment #7)
> I fixed the assembly code and ran it on different AVX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #7 from H.J. Lu ---
I fixed the assembly code and ran it on different AVX machines.
I got similar results:
./test
sse      : 28346518
sse_clear: 28046302
avx      : 28214775
avx2     : 28251195
avx_clear: 28092687
avx_clear:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #6 from Peter Cordes ---
(In reply to Peter Cordes from comment #5)
> But whatever the effect is, it's totally unrelated to what you were *trying*
> to test. :/
After adding a `ret` to each AVX function, all 5 are basically the same
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #5 from Peter Cordes ---
(In reply to H.J. Lu from comment #4)
> (In reply to Peter Cordes from comment #2)
> > Can you show some
> > asm where this performs better?
>
> Please try cvtsd2ss branch at:
>
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #4 from H.J. Lu ---
(In reply to Peter Cordes from comment #2)
> (In reply to H.J. Lu from comment #1)
> > But
> >
> > vxorps %xmm0, %xmm0, %xmm0
> > vcvtsd2ss %xmm1, %xmm0, %xmm0
> >
> > are faster than both.
>
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #3 from Peter Cordes ---
(In reply to H.J. Lu from comment #1)
> I have a patch for PR 87007:
>
> https://gcc.gnu.org/ml/gcc-patches/2019-01/msg00298.html
>
> which inserts a vxorps at the last possible position. vxorps
> will be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #2 from Peter Cordes ---
(In reply to H.J. Lu from comment #1)
> But
>
> vxorps %xmm0, %xmm0, %xmm0
> vcvtsd2ss %xmm1, %xmm0, %xmm0
>
> are faster than both.
On Skylake-client (i7-6700k), I can't reproduce this
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
H.J. Lu changed:
What       |Removed |Added
Depends on |        |87007
--- Comment #1 from H.J. Lu ---