[Bug target/95125] Unoptimal code for vectorized conversions

2021-08-02 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #11 from Hongtao.liu  ---
(In reply to Andrew Pinski from comment #10)
> float_double and fix_double don't produce the best code yet.

This is because the loop vectorizer can only use one vector size per loop.
BB vectorization supports different vector sizes in the same instance, so
"-O2 -ftree-slp-vectorize -march=skylake-avx512 -funroll-loops" produces
optimal code. This is related to PR101097.
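
For reference, a minimal self-contained sketch of the loop form discussed here. The array declarations are an assumption: the thread never quotes them, but the conversions elsewhere in the thread imply int and double elements. The self_test helper is hypothetical and only checks the scalar semantics.

```c
#include <assert.h>

/* Assumed declarations, not quoted from the bug report.  */
int i[4];
double d[4];

/* Loop form: this is what the loop vectorizer sees, and it commits to a
   single vector size for the whole loop.  */
void
float_double (void)
{
  for (int n = 0; n < 4; n++)
    d[n] = i[n];
}

void
fix_double (void)
{
  for (int n = 0; n < 4; n++)
    i[n] = d[n];
}

/* Hypothetical helper: verifies the results only, it says nothing about
   which instructions the compiler selected.  */
void
self_test (void)
{
  for (int n = 0; n < 4; n++)
    i[n] = n - 1;
  float_double ();
  for (int n = 0; n < 4; n++)
    assert (d[n] == (double) (n - 1));

  d[0] = 2.75; d[1] = -2.75; d[2] = 0.5; d[3] = 7.0;
  fix_double ();
  assert (i[0] == 2 && i[1] == -2 && i[2] == 0 && i[3] == 7);
}
```

Compiling this file with the options quoted above and inspecting the output of -S is one way to see whether the conversions ended up vectorized.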

[Bug target/95125] Unoptimal code for vectorized conversions

2021-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #10 from Andrew Pinski  ---
float_double and fix_double don't produce the best code yet.

[Bug target/95125] Unoptimal code for vectorized conversions

2020-05-24 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #9 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:94c0409717bf8bf783963c1d50bb8f4a4732dce7

commit r11-596-g94c0409717bf8bf783963c1d50bb8f4a4732dce7
Author: liuhongt 
Date:   Sat May 23 15:30:58 2020 +0800

Add missing expander for vector float_extend and float_truncate.

2020-05-25  Hongtao Liu  

gcc/ChangeLog
PR target/95125
* config/i386/sse.md (sf2dfmode_lower): New mode attribute.
(trunc2) New expander.
(extend2): Ditto.

gcc/testsuite/ChangeLog
* gcc.target/i386/pr95125-avx.c: New test.
* gcc.target/i386/pr95125-avx512f.c: Ditto.

[Bug target/95125] Unoptimal code for vectorized conversions

2020-05-22 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #8 from rsandifo at gcc dot gnu.org ---
(In reply to Richard Biener from comment #7)
> (In reply to Uroš Bizjak from comment #6)
> > (In reply to Hongtao.liu from comment #5)
> > > (In reply to Uroš Bizjak from comment #3)
> > > > It turns out that a bunch of patterns have to be renamed (and testcases
> > > > added).
> > > > 
> > > > Easyhack, waiting for someone to show some love to conversion patterns in
> > > > sse.md.
> > > 
> > > expander for floatv4siv4df2, fix_truncv4dfv4si2 already exists.
> > > 
> > > if change **float_double fix_double** to
> > > ---
> > > void
> > > float_double (void)
> > > {
> > > d[0] = i[0];
> > > d[1] = i[1];
> > > d[2] = i[2];
> > > d[3] = i[3];
> > > }
> > 
> > Hm, the above is vectorized, but the equivalent:
> > 
> > void
> > float_double (void)
> > {
> >   for (int n = 0; n < 4; n++)
> >     d[n] = i[n];
> > }
> > 
> > is not?
> 
> Yes, we're committing to a too high VF here, likely because we pick the
> "wrong" vector mode too early.  We could eventually fix this up in
> the early vectype analysis.

It might be worth investigating VECT_COMPARE_COSTS, which weighs
the cost of different VFs against each other and is how SVE copes
with this.  I guess the danger is that it might interfere with
-mprefer-* options (although the first VF listed by
autovectorize_vector_modes wins in a tie).

[Bug target/95125] Unoptimal code for vectorized conversions

2020-05-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

Richard Biener  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org

--- Comment #7 from Richard Biener  ---
(In reply to Uroš Bizjak from comment #6)
> (In reply to Hongtao.liu from comment #5)
> > (In reply to Uroš Bizjak from comment #3)
> > > It turns out that a bunch of patterns have to be renamed (and testcases
> > > added).
> > > 
> > > Easyhack, waiting for someone to show some love to conversion patterns in
> > > sse.md.
> > 
> > expander for floatv4siv4df2, fix_truncv4dfv4si2 already exists.
> > 
> > if change **float_double fix_double** to
> > ---
> > void
> > float_double (void)
> > {
> > d[0] = i[0];
> > d[1] = i[1];
> > d[2] = i[2];
> > d[3] = i[3];
> > }
> 
> Hm, the above is vectorized, but the equivalent:
> 
> void
> float_double (void)
> {
>   for (int n = 0; n < 4; n++)
>     d[n] = i[n];
> }
> 
> is not?

Yes, we're committing to a too high VF here, likely because we pick the
"wrong" vector mode too early.  We could eventually fix this up in
the early vectype analysis.
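
The "too high VF" remark can be illustrated with some lane arithmetic. This is a simplified model and an assumption on my part, not GCC's actual code: it takes the vectorization factor for a given vector size to be the lane count of the narrowest element type in the loop. The helper names (lanes, model_vf, vf_self_test) are hypothetical.

```c
#include <assert.h>

/* Number of elements of ELEM_BITS bits in a vector of VEC_BITS bits.  */
unsigned
lanes (unsigned vec_bits, unsigned elem_bits)
{
  return vec_bits / elem_bits;
}

/* Simplified model (assumption): the narrowest element type in the loop
   determines the vectorization factor for a given vector size.  */
unsigned
model_vf (unsigned vec_bits, unsigned narrowest_elem_bits)
{
  return lanes (vec_bits, narrowest_elem_bits);
}

void
vf_self_test (void)
{
  /* 256-bit vectors: 8 ints per vector, so the model gives VF = 8, which
     exceeds the 4 iterations of the loop.  */
  assert (model_vf (256, 32) == 8);
  assert (lanes (256, 64) == 4);

  /* 128-bit vectors: VF = 4 matches the trip count, but the 4 doubles
     then span two vectors.  */
  assert (model_vf (128, 32) == 4);
  assert (lanes (128, 64) == 2);
}
```

Under this model, the ideal choice for the 4-element case mixes a 128-bit int vector with a 256-bit double vector, which a single-vector-size scheme cannot express.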

[Bug target/95125] Unoptimal code for vectorized conversions

2020-05-22 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

Uroš Bizjak  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #6 from Uroš Bizjak  ---
(In reply to Hongtao.liu from comment #5)
> (In reply to Uroš Bizjak from comment #3)
> > It turns out that a bunch of patterns have to be renamed (and testcases
> > added).
> > 
> > Easyhack, waiting for someone to show some love to conversion patterns in
> > sse.md.
> 
> expander for floatv4siv4df2, fix_truncv4dfv4si2 already exists.
> 
> if change **float_double fix_double** to
> ---
> void
> float_double (void)
> {
> d[0] = i[0];
> d[1] = i[1];
> d[2] = i[2];
> d[3] = i[3];
> }

Hm, the above is vectorized, but the equivalent:

void
float_double (void)
{
  for (int n = 0; n < 4; n++)
    d[n] = i[n];
}

is not?

[Bug target/95125] Unoptimal code for vectorized conversions

2020-05-22 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #5 from Hongtao.liu  ---
(In reply to Uroš Bizjak from comment #3)
> It turns out that a bunch of patterns have to be renamed (and testcases
> added).
> 
> Easyhack, waiting for someone to show some love to conversion patterns in
> sse.md.

expander for floatv4siv4df2, fix_truncv4dfv4si2 already exists.

If float_double and fix_double are changed to
---
void
float_double (void)
{
d[0] = i[0];
d[1] = i[1];
d[2] = i[2];
d[3] = i[3];
}

void
fix_double (void)
{
i[0] = d[0];
i[1] = d[1];
i[2] = d[2];
i[3] = d[3];
}


it successfully generates

---
float_double():
vcvtdq2pd   i(%rip), %ymm0
vmovapd %ymm0, d(%rip)
vzeroupper
ret
fix_double():
vcvttpd2dqy d(%rip), %xmm0
vmovdqa %xmm0, i(%rip)
ret
l:
---
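
A note on the fix_double output: the instruction chosen is the truncating conversion (vcvttpd2dq), which matches the C rule that a double-to-int conversion discards the fractional part. The sketch below only demonstrates that C rule; the helper names to_int and truncation_self_test are hypothetical.

```c
#include <assert.h>

/* C converts double to int by truncating toward zero, regardless of the
   current floating-point rounding mode.  */
int
to_int (double x)
{
  return (int) x;
}

void
truncation_self_test (void)
{
  assert (to_int (2.75) == 2);
  assert (to_int (-2.75) == -2);   /* toward zero, not toward -infinity */
  assert (to_int (0.999) == 0);
}
```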

[Bug target/95125] Unoptimal code for vectorized conversions

2020-05-21 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #4 from Hongtao.liu  ---
(In reply to Uroš Bizjak from comment #3)
> It turns out that a bunch of patterns have to be renamed (and testcases
> added).
> 
> Easyhack, waiting for someone to show some love to conversion patterns in
> sse.md.

I'll take a look.

[Bug target/95125] Unoptimal code for vectorized conversions

2020-05-14 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #3 from Uroš Bizjak  ---
It turns out that a bunch of patterns have to be renamed (and testcases added).

Easyhack, waiting for someone to show some love to conversion patterns in
sse.md.

[Bug target/95125] Unoptimal code for vectorized conversions

2020-05-14 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #2 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #1)
> ISTR I filed a duplicate 10 years ago or so.  The issue is the vectorizer
> could not handle V4DFmode -> V4SFmode conversions.
> 
> Could, because for SVE we added the capability but this requires
> additional instruction patterns (IIRC I filed a bug about this last
> year).  Yep.  PR92658 it is.

Oh... yes. And it is even assigned to me. And there is a patch... ;)

Anyway, I was surprised, since my soon-to-be committed v2sf-v2df conversion
patch was able to fully vectorize a similar testcase involving double[2] and
float[2], while code involving [4] compiled to the mess below.

[Bug target/95125] Unoptimal code for vectorized conversions

2020-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

Richard Biener  changed:

   What|Removed |Added

Version|unknown |11.0
 Ever confirmed|0   |1
   Last reconfirmed||2020-05-14
 Target||x86_64-*-* i?86-*-*
   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW

--- Comment #1 from Richard Biener  ---
ISTR I filed a duplicate 10 years ago or so.  The issue is the vectorizer
could not handle V4DFmode -> V4SFmode conversions.

Could, because for SVE we added the capability but this requires
additional instruction patterns (IIRC I filed a bug about this last
year).  Yep.  PR92658 it is.
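
For readers without the original report at hand, a hypothetical testcase of the kind described here might look as follows. The thread does not quote the original, so the array names, sizes, and function names are assumptions chosen for illustration.

```c
#include <assert.h>

/* Hypothetical testcase; not quoted from the bug report.  */
double dd[4];
float ff[4];

/* double -> float: with 256-bit vectors this is the V4DFmode -> V4SFmode
   conversion mentioned above.  */
void
trunc_double_float (void)
{
  for (int n = 0; n < 4; n++)
    ff[n] = dd[n];
}

/* float -> double: the matching extension.  */
void
extend_float_double (void)
{
  for (int n = 0; n < 4; n++)
    dd[n] = ff[n];
}

void
conv_self_test (void)
{
  /* All values are exactly representable in float, so the round trip
     is exact.  */
  dd[0] = 1.0; dd[1] = -0.5; dd[2] = 1024.25; dd[3] = 0.0;
  trunc_double_float ();
  extend_float_double ();
  assert (dd[0] == 1.0 && dd[1] == -0.5 && dd[2] == 1024.25 && dd[3] == 0.0);
}
```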