[Bug target/95125] Unoptimal code for vectorized conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #11 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #10)
> float_double and fix_double don't produce the best code yet.

That is because the loop vectorizer can only use one vector size. BB
vectorization supports different vector sizes in the same instance, so
compiling with "-O2 -ftree-slp-vectorize -march=skylake-avx512 -funroll-loops"
produces optimal code. This is related to PR101097.
[Bug target/95125] Unoptimal code for vectorized conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #10 from Andrew Pinski ---
float_double and fix_double don't produce the best code yet.
[Bug target/95125] Unoptimal code for vectorized conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #9 from CVS Commits ---
The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:94c0409717bf8bf783963c1d50bb8f4a4732dce7

commit r11-596-g94c0409717bf8bf783963c1d50bb8f4a4732dce7
Author: liuhongt
Date:   Sat May 23 15:30:58 2020 +0800

    Add missing expander for vector float_extend and float_truncate.

    2020-05-25  Hongtao Liu

    gcc/ChangeLog
            PR target/95125
            * config/i386/sse.md (sf2dfmode_lower): New mode attribute.
            (trunc2): New expander.
            (extend2): Ditto.

    gcc/testsuite/ChangeLog
            * gcc.target/i386/pr95125-avx.c: New test.
            * gcc.target/i386/pr95125-avx512f.c: Ditto.
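
For readers who want to see the kind of source the commit above is aimed at, here is a minimal sketch of float-to-double (float_extend) and double-to-float (float_truncate) conversions over four-element arrays. It is not the content of the committed tests pr95125-avx.c or pr95125-avx512f.c, which the thread does not show. The array names, the function names, and the helper roundtrip_ok are assumptions made for illustration only, and whether a given GCC version vectorizes these loops depends on the options used.

```c
/* Assumed declarations for this sketch; not taken from the thread.  */
float f[4];
double d[4];

/* float -> double: the float_extend direction.  */
void
extend_all (void)
{
  for (int n = 0; n < 4; n++)
    d[n] = f[n];
}

/* double -> float: the float_truncate direction.  */
void
truncate_all (void)
{
  for (int n = 0; n < 4; n++)
    f[n] = (float) d[n];
}

/* Hypothetical helper: returns 1 if a round trip preserves the values.
   The inputs are exactly representable in both float and double.  */
int
roundtrip_ok (void)
{
  for (int n = 0; n < 4; n++)
    f[n] = 0.5f * n + 1.0f;

  extend_all ();
  for (int n = 0; n < 4; n++)
    {
      if (d[n] != 0.5 * n + 1.0)
        return 0;
      f[n] = 0.0f;
    }

  truncate_all ();
  for (int n = 0; n < 4; n++)
    if (f[n] != 0.5f * n + 1.0f)
      return 0;
  return 1;
}
```

Calling roundtrip_ok () from a small main and inspecting the assembly of extend_all and truncate_all (for example with -S) is one way to check what a particular compiler build generates.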
[Bug target/95125] Unoptimal code for vectorized conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #8 from rsandifo at gcc dot gnu.org ---
(In reply to Richard Biener from comment #7)
> (In reply to Uroš Bizjak from comment #6)
> > (In reply to Hongtao.liu from comment #5)
> > > (In reply to Uroš Bizjak from comment #3)
> > > > It turns out that a bunch of patterns have to be renamed (and testcases
> > > > added).
> > > >
> > > > Easyhack, waiting for someone to show some love to conversion patterns
> > > > in sse.md.
> > >
> > > expander for floatv4siv4df2, fix_truncv4dfv4si2 already exists.
> > >
> > > if change **float_double fix_double** to
> > > ---
> > > void
> > > float_double (void)
> > > {
> > >   d[0] = i[0];
> > >   d[1] = i[1];
> > >   d[2] = i[2];
> > >   d[3] = i[3];
> > > }
> >
> > Hm, the above is vectorized, but the equivalent:
> >
> > void
> > float_double (void)
> > {
> >   for (int n = 0; n < 4; n++)
> >     d[n] = i[n];
> > }
> >
> > is not?
>
> Yes, we're committing to a too high VF here, likely because we pick the
> "wrong" vector mode too early.  We could eventually fix this up in
> the early vectype analysis.

It might be worth investigating VECT_COMPARE_COSTS, which weighs the cost
of different VFs against each other and is how SVE copes with this.
I guess the danger is that it might interfere with -mprefer-* options
(although the first VF listed by autovectorize_vector_modes wins in a tie).
[Bug target/95125] Unoptimal code for vectorized conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #7 from Richard Biener ---
(In reply to Uroš Bizjak from comment #6)
> (In reply to Hongtao.liu from comment #5)
> > (In reply to Uroš Bizjak from comment #3)
> > > It turns out that a bunch of patterns have to be renamed (and testcases
> > > added).
> > >
> > > Easyhack, waiting for someone to show some love to conversion patterns in
> > > sse.md.
> >
> > expander for floatv4siv4df2, fix_truncv4dfv4si2 already exists.
> >
> > if change **float_double fix_double** to
> > ---
> > void
> > float_double (void)
> > {
> >   d[0] = i[0];
> >   d[1] = i[1];
> >   d[2] = i[2];
> >   d[3] = i[3];
> > }
>
> Hm, the above is vectorized, but the equivalent:
>
> void
> float_double (void)
> {
>   for (int n = 0; n < 4; n++)
>     d[n] = i[n];
> }
>
> is not?

Yes, we're committing to a too high VF here, likely because we pick the
"wrong" vector mode too early.  We could eventually fix this up in
the early vectype analysis.
[Bug target/95125] Unoptimal code for vectorized conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #6 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #5)
> (In reply to Uroš Bizjak from comment #3)
> > It turns out that a bunch of patterns have to be renamed (and testcases
> > added).
> >
> > Easyhack, waiting for someone to show some love to conversion patterns in
> > sse.md.
>
> expander for floatv4siv4df2, fix_truncv4dfv4si2 already exists.
>
> if change **float_double fix_double** to
> ---
> void
> float_double (void)
> {
>   d[0] = i[0];
>   d[1] = i[1];
>   d[2] = i[2];
>   d[3] = i[3];
> }

Hm, the above is vectorized, but the equivalent:

void
float_double (void)
{
  for (int n = 0; n < 4; n++)
    d[n] = i[n];
}

is not?
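
To compare the two forms discussed in this comment side by side, the following is a minimal sketch that places both in one translation unit. The declarations of i and d are an assumption: the thread never shows them, but the assembly quoted in comment #5 (vcvtdq2pd reading i, vcvttpd2dqy reading d) suggests int and double arrays. Both functions are called float_double in the thread; they are renamed here so they can coexist, and forms_agree is a hypothetical helper.

```c
/* Assumed declarations; the thread does not show them.  */
int i[4];
double d[4];

/* Straight-line form quoted from comment #5 (renamed for this sketch).  */
void
float_double_straight (void)
{
  d[0] = i[0];
  d[1] = i[1];
  d[2] = i[2];
  d[3] = i[3];
}

/* Loop form from comment #6 (renamed for this sketch).  */
void
float_double_loop (void)
{
  for (int n = 0; n < 4; n++)
    d[n] = i[n];
}

/* Hypothetical helper: returns 1 if both forms store identical values.  */
int
forms_agree (void)
{
  double expected[4];

  for (int n = 0; n < 4; n++)
    i[n] = 3 * n - 4;

  /* Record the straight-line result, then clear d.  */
  float_double_straight ();
  for (int n = 0; n < 4; n++)
    {
      expected[n] = d[n];
      d[n] = 0.0;
    }

  /* The loop form must reproduce the same values.  */
  float_double_loop ();
  for (int n = 0; n < 4; n++)
    if (d[n] != expected[n] || d[n] != (double) i[n])
      return 0;
  return 1;
}
```

The two functions are semantically identical, so any difference in the generated assembly comes from how the vectorizer handles each form (SLP for the straight-line code, the loop vectorizer for the loop), which is the point raised above.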
[Bug target/95125] Unoptimal code for vectorized conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #5 from Hongtao.liu ---
(In reply to Uroš Bizjak from comment #3)
> It turns out that a bunch of patterns have to be renamed (and testcases
> added).
>
> Easyhack, waiting for someone to show some love to conversion patterns in
> sse.md.

Expanders for floatv4siv4df2 and fix_truncv4dfv4si2 already exist.

If **float_double** and **fix_double** are changed to
---
void
float_double (void)
{
  d[0] = i[0];
  d[1] = i[1];
  d[2] = i[2];
  d[3] = i[3];
}

void
fix_double (void)
{
  i[0] = d[0];
  i[1] = d[1];
  i[2] = d[2];
  i[3] = d[3];
}

GCC successfully generates
---
float_double():
        vcvtdq2pd       i(%rip), %ymm0
        vmovapd %ymm0, d(%rip)
        vzeroupper
        ret
fix_double():
        vcvttpd2dqy     d(%rip), %xmm0
        vmovdqa %xmm0, i(%rip)
        ret
l: -
[Bug target/95125] Unoptimal code for vectorized conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #4 from Hongtao.liu ---
(In reply to Uroš Bizjak from comment #3)
> It turns out that a bunch of patterns have to be renamed (and testcases
> added).
>
> Easyhack, waiting for someone to show some love to conversion patterns in
> sse.md.

I'll take a look.
[Bug target/95125] Unoptimal code for vectorized conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #3 from Uroš Bizjak ---
It turns out that a bunch of patterns have to be renamed (and testcases added).

Easyhack, waiting for someone to show some love to conversion patterns in
sse.md.
[Bug target/95125] Unoptimal code for vectorized conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #2 from Uroš Bizjak ---
(In reply to Richard Biener from comment #1)
> ISTR I filed a duplicate 10 years ago or so. The issue is the vectorizer
> could not handle V4DFmode -> V4SFmode conversions.
>
> Could, because for SVE we added the capability but this requires
> additional instruction patterns (IIRC I filed a bug about this last
> year). Yep. PR92658 it is.

Oh... yes. And it is even assigned to me. And there is a patch... ;)

Anyway, I was surprised, since my soon-to-be committed v2sf-v2df conversion
patch was able to fully vectorize a similar testcase involving double[2] and
float[2], while code involving [4] compiled to the mess below.
[Bug target/95125] Unoptimal code for vectorized conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |11.0
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-05-14
             Target|                            |x86_64-*-* i?86-*-*
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Richard Biener ---
ISTR I filed a duplicate 10 years ago or so. The issue is the vectorizer
could not handle V4DFmode -> V4SFmode conversions.

Could, because for SVE we added the capability but this requires
additional instruction patterns (IIRC I filed a bug about this last
year). Yep. PR92658 it is.