https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93039
Andrew Pinski changed:
What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |---
--- Comment #5 from Alexander Monakov ---
Ah, in that sense. The extra load is problematic in cold code, where it is
likely to be a TLB miss. In hot code, the load does not depend on any previous
computations, so it does not lengthen dependency chains.
--- Comment #4 from rguenther at suse dot de ---
On January 8, 2020 4:34:40 PM GMT+01:00, "amonakov at gcc dot gnu.org" wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93039
>
>--- Comment #3 from Alexander Monakov ---
>> The question is
--- Comment #3 from Alexander Monakov ---
> The question is for which CPUs is it actually faster to use SSE?
In the context of chains where the source and the destination need to be SSE
registers, pretty much all CPUs? Inter-unit moves
Richard Biener changed:
What             |Removed     |Added
Status           |UNCONFIRMED |NEW
Last reconfirmed |
Marc Glisse changed:
What   |Removed |Added
Target |        |x86_64-*-*
--- Comment #1 from Marc