[Bug target/56676] unnecesary splitted load when using avx2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98172
   Target Milestone|---                         |11.0
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Andrew Pinski ---
Changed the generic tuning by r11-7115-gb80fefd626460f (PR 98172) so fixed.
[Bug target/56676] unnecesary splitted load when using avx2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676

--- Comment #6 from Andrew Pinski ---
GCC 11 produces:
```
_Z3fooPiS_:
.LFB0:
        .cfi_startproc
        vmovdqu (%rdi), %ymm2
        vmovdqu 32(%rdi), %ymm3
        vpmulld (%rsi), %ymm2, %ymm1
        vpmulld 32(%rsi), %ymm3, %ymm0
        vpaddd  %ymm0, %ymm1, %ymm1
        vmovdqu 64(%rdi), %ymm4
        vpmulld 64(%rsi), %ymm4, %ymm0
        vpaddd  %ymm1, %ymm0, %ymm0
        vmovdqu 96(%rdi), %ymm1
        vpmulld 96(%rsi), %ymm1, %ymm1
        vpaddd  %ymm0, %ymm1, %ymm1
        vextracti128 $0x1, %ymm1, %xmm0
        vpaddd  %xmm1, %xmm0, %xmm0
        vpsrldq $8, %xmm0, %xmm1
        vpaddd  %xmm1, %xmm0, %xmm0
        vpsrldq $4, %xmm0, %xmm1
        vpaddd  %xmm1, %xmm0, %xmm0
        vmovd   %xmm0, %eax
        vzeroupper
        ret
```

While GCC 10 produces:
```
_Z3fooPiS_:
.LFB0:
        .cfi_startproc
        vmovdqu (%rdi), %xmm3
        vmovdqu (%rsi), %xmm4
        vinserti128 $0x1, 16(%rdi), %ymm3, %ymm1
        vinserti128 $0x1, 16(%rsi), %ymm4, %ymm0
        vmovdqu 32(%rdi), %xmm5
        vmovdqu 32(%rsi), %xmm6
        vpmulld %ymm1, %ymm0, %ymm0
        vmovdqu 64(%rdi), %xmm7
        vmovdqu 64(%rsi), %xmm3
        vinserti128 $0x1, 48(%rdi), %ymm5, %ymm2
        vinserti128 $0x1, 48(%rsi), %ymm6, %ymm1
        vmovdqu 96(%rsi), %xmm4
        vmovdqu 96(%rdi), %xmm5
        vpmulld %ymm2, %ymm1, %ymm1
        vinserti128 $0x1, 80(%rdi), %ymm7, %ymm2
        vpaddd  %ymm1, %ymm0, %ymm0
        vinserti128 $0x1, 80(%rsi), %ymm3, %ymm1
        vpmulld %ymm2, %ymm1, %ymm1
        vinserti128 $0x1, 112(%rsi), %ymm4, %ymm2
        vpaddd  %ymm0, %ymm1, %ymm0
        vinserti128 $0x1, 112(%rdi), %ymm5, %ymm1
        vpmulld %ymm2, %ymm1, %ymm1
        vpaddd  %ymm0, %ymm1, %ymm1
        vmovdqa %xmm1, %xmm0
        vextracti128 $0x1, %ymm1, %xmm1
        vpaddd  %xmm1, %xmm0, %xmm0
        vpsrldq $8, %xmm0, %xmm1
        vpaddd  %xmm1, %xmm0, %xmm0
        vpsrldq $4, %xmm0, %xmm1
        vpaddd  %xmm1, %xmm0, %xmm0
        vmovd   %xmm0, %eax
        vzeroupper
        ret
```
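For readers without the original test case at hand, here is a minimal sketch of a source function that is consistent with the assembly above. It is a reconstruction, not the reporter's code: the symbol `_Z3fooPiS_` demangles to `foo(int*, int*)`, and the four 32-byte loads per pointer followed by `vpmulld`/`vpaddd` and a horizontal reduction suggest a 32-element integer dot product. The trip count of 32 and the loop body are assumptions inferred from that listing.

```cpp
// Hypothetical reconstruction of the kind of function discussed in this PR.
// Assumption: a fixed trip count of 32 ints (4 x 32-byte vectors per input),
// inferred from the offsets 0, 32, 64 and 96 in the assembly listings.
int foo(int *a, int *b)
{
    int sum = 0;
    // A vectorizer targeting AVX2 can turn this into 256-bit multiplies and
    // adds; whether each 32-byte load is issued whole or as two 16-byte
    // halves depends on the tuning in effect.
    for (int i = 0; i < 32; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Compiling a function like this with optimization and AVX2 enabled, once with GCC 10 and once with GCC 11, is one way to compare the two load strategies; the exact instruction order may differ from the listings in this comment.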
[Bug target/56676] unnecesary splitted load when using avx2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676

Vedran Miletic changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rivanvx at gmail dot com

--- Comment #5 from Vedran Miletic ---
Confirmed still affecting GCC 6.2.1. Similar C++ example:

#include <numeric>
#include <vector>

float f(std::vector<float>& A, std::vector<float>& B)
{
  return std::inner_product(A.begin(), A.end(), B.begin(), 0.f);
}
[Bug target/56676] unnecesary splitted load when using avx2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676

--- Comment #1 from Richard Biener rguenth at gcc dot gnu.org 2013-03-21 13:30:42 UTC ---
I believe we split unaligned loads by default because that's faster for generic tuning.
[Bug target/56676] unnecesary splitted load when using avx2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676

--- Comment #2 from Ondrej Bilka neleai at seznam dot cz 2013-03-21 14:53:26 UTC ---
On Thu, Mar 21, 2013 at 01:30:42PM +, rguenth at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676
>
> --- Comment #1 from Richard Biener rguenth at gcc dot gnu.org 2013-03-21 13:30:42 UTC ---
> I believe we split unaligned loads by default because that's faster for generic tuning.

I used avx2 which is far from generic. Now only haswell supports it.
Documentation says it supports 2 32byte loads per cycle. Unless 32 byte loads
have bigger latency they will be more effective.
[Bug target/56676] unnecesary splitted load when using avx2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676

--- Comment #3 from Richard Biener rguenth at gcc dot gnu.org 2013-03-21 15:11:14 UTC ---
Well, while true we don't adjust tuning based on that. Use -march=core-avx2 instead.
[Bug target/56676] unnecesary splitted load when using avx2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676

Igor Zamyatin izamyatin at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |izamyatin at gmail dot com

--- Comment #4 from Igor Zamyatin izamyatin at gmail dot com 2013-03-21 15:18:24 UTC ---
We (at Intel) used to try to remove splitting for avx2 but saw no reasonable gains in general.