[Bug middle-end/108410] x264 averaging loop not optimized well for avx512

2024-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410 Richard Biener changed: Assignee from rguenth at gcc dot gnu.org to unassigned at gcc dot gnu.org

2024-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410 --- Comment #10 from Richard Biener --- So this is now fixed if you use --param vect-partial-vector-usage=2; there is at the moment no way to get masking/not masking costed against each other. In theory vect_analyze_loop_costing and
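To make the fully masked form (the `vect-partial-vector-usage=2` case) concrete, here is a minimal scalar C emulation of what such a loop computes. It assumes an x264-style rounding average of two byte rows and 64 byte lanes per 512-bit vector; the kernel, the name `avg_fully_masked` and the `VF` constant are illustrative assumptions, not the PR's testcase or GCC's output.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed vectorization factor: 64 byte lanes in one 512-bit vector. */
#define VF 64

/* Assumed x264-style rounding average of two pixels. */
static inline uint8_t avg_u8(uint8_t x, uint8_t y)
{
    return (uint8_t)((x + y + 1) >> 1);
}

/* Scalar emulation of a fully masked loop: every iteration, including
   the final partial one, executes the same 64-lane body under a mask,
   so no epilogue loop is needed. */
void avg_fully_masked(uint8_t *restrict dst, const uint8_t *restrict a,
                      const uint8_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i += VF) {
        bool mask[VF];
        /* Lane l is active iff element i + l exists. */
        for (size_t l = 0; l < VF; l++)
            mask[l] = i + l < n;
        /* Masked body: inactive lanes neither load nor store. */
        for (size_t l = 0; l < VF; l++)
            if (mask[l])
                dst[i + l] = avg_u8(a[i + l], b[i + l]);
    }
}
```

Because the last partial vector runs the same body, there is no separate epilogue; the open question in the comment is that GCC cannot yet cost this form against the unmasked loop plus epilogue.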

2023-06-14 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410 Richard Biener changed: Assignee from unassigned at gcc dot gnu.org to rguenth at gcc dot gnu.org

2023-06-13 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410 --- Comment #9 from rguenther at suse dot de --- On Tue, 13 Jun 2023, crazylht at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410 > > --- Comment #8 from Hongtao.liu --- > > > Can x86 do this? We'd want to apply

2023-06-12 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410 --- Comment #8 from Hongtao.liu --- > Can x86 do this? We'd want to apply this to a scalar, so move ivtmp > to xmm, apply pack_usat or as you say below, the non-existing us_trunc > and then broadcast. I see, we don't have a scalar version.
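A sketch of the scalar operation under discussion, assuming `us_trunc` means unsigned saturating truncation (here from 32 to 8 bits). Since x86 has no scalar instruction for it, the helper is written in plain C; all function names are hypothetical. It also shows why saturation matters for the lane mask: a plain `(unsigned char)` conversion wraps 256 to 0, which would switch every lane off.

```c
#include <stdint.h>

/* Hypothetical scalar model of an unsigned saturating truncation
   ("us_trunc") from 32 to 8 bits: out-of-range values clamp to 255
   instead of wrapping. */
static inline uint8_t us_trunc_u32_u8(uint32_t x)
{
    return x > UINT8_MAX ? (uint8_t)UINT8_MAX : (uint8_t)x;
}

/* Plain truncation, as done by a bare (unsigned char) conversion. */
static inline uint8_t trunc_u32_u8(uint32_t x)
{
    return (uint8_t)x;
}

/* Number of lanes a 64-lane byte compare {0, ..., 63} < broadcast(c)
   would enable. */
static unsigned active_lanes(uint8_t c)
{
    unsigned count = 0;
    for (unsigned lane = 0; lane < 64; lane++)
        count += lane < c;
    return count;
}
```

With saturation, any remaining count of 64 or more still enables all 64 lanes, while plain truncation of 256 enables none.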

2023-06-12 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410 --- Comment #7 from rguenther at suse dot de --- On Mon, 12 Jun 2023, crazylht at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410 > > --- Comment #6 from Hongtao.liu --- > > > and the key thing to optimize is > >

2023-06-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410 --- Comment #6 from Hongtao.liu --- > and the key thing to optimize is > > ivtmp_78 = ivtmp_77 + 4294967232; // -64 > _79 = MIN_EXPR ; > _80 = (unsigned char) _79; > _81 = {_80, _80, _80, _80, _80, _80, _80, _80, _80, _80, _80, _80,
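A scalar C model of the quoted GIMPLE sequence, under the assumption that the elided `MIN_EXPR` operands are the remaining-element counter and 255 (a saturation bound for `unsigned char`). Bit `l` of the returned `uint64_t` stands in for bit `l` of an AVX512 mask register produced by comparing the lane indices {0, ..., 63} with the broadcast value. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the quoted sequence for a 64-lane byte vector.
   Assumption: the elided MIN_EXPR clamps the remaining count to 255. */
static uint64_t lane_mask(uint32_t remaining)
{
    /* _79 = MIN_EXPR <...>;  _80 = (unsigned char) _79;  */
    uint8_t c = (uint8_t)(remaining < 255u ? remaining : 255u);
    /* _81 = {_80, _80, ...} broadcast, then compared with the lane
       indices {0, 1, ..., 63}; bit l models mask-register bit l. */
    uint64_t mask = 0;
    for (unsigned lane = 0; lane < 64; lane++)
        if (lane < c)
            mask |= UINT64_C(1) << lane;
    return mask;
}

/* ivtmp_78 = ivtmp_77 + 4294967232: adding 2^32 - 64 in 32-bit
   unsigned arithmetic is the same as subtracting 64. */
static uint32_t next_remaining(uint32_t ivtmp)
{
    return ivtmp + 4294967232u;
}
```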

2023-06-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410 --- Comment #5 from Richard Biener --- Btw, for the case we can use the same mask compare type as we use as type for the IV (so we know we can represent all required values) we can elide the saturation. So for example void foo (double *
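A sketch of the case described here, assuming a loop over `double` with 8 lanes per 512-bit vector and a 64-bit unsigned IV. When the lane-index compare uses that same 64-bit type, every remaining count is representable, so the compare needs no clamp before it. The loop body (scaling by `s`) and all names are hypothetical, since the `foo` example in the comment is cut off.

```c
#include <stdint.h>

/* Assumed: 8 double lanes per 512-bit vector, 64-bit unsigned IV. */
#define DLANES 8

/* The lane-index compare uses the IV's own 64-bit type, so every
   remaining count is representable and no MIN_EXPR clamp is needed. */
static uint8_t lane_mask_same_type(uint64_t remaining)
{
    uint8_t mask = 0;
    for (uint64_t lane = 0; lane < DLANES; lane++)
        if (lane < remaining)
            mask |= (uint8_t)(1u << lane);
    return mask;
}

/* Hypothetical fully masked loop over doubles using that mask. */
static void scale_fully_masked(double *x, uint64_t n, double s)
{
    for (uint64_t i = 0; i < n; i += DLANES) {
        uint8_t m = lane_mask_same_type(n - i);
        for (unsigned l = 0; l < DLANES; l++)
            if (m & (1u << l))
                x[i + l] *= s;
    }
}
```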

2023-06-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410 --- Comment #4 from Richard Biener --- Adding fully masked AVX512 and AVX512 with a masked epilog data:

size  scalar    128    256    512   512e   512f
   1    9.42  11.32   9.35  11.17  15.13  16.89
   2    5.72   6.53   6.66

2023-01-18 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410 --- Comment #3 from Richard Biener --- the naive "bad" code-gen produces

size  512-masked
   2       12.19
   4        6.09
   6        4.06
   8        3.04
  12        2.03
  14        1.52
  16        1.21
  20        1.01
  24        0.87
  32        0.76
  34        0.71
  38

2023-01-18 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410 --- Comment #2 from Richard Biener --- The naive masked epilogue (--param vect-partial-vector-usage=1 and support for whilesiult as in a prototype I have) then looks like

        leal    -1(%rdx), %eax
        cmpl    $62, %eax
        jbe
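For comparison with the fully masked form, a scalar C emulation of the masked-epilogue strategy (`vect-partial-vector-usage=1`): an unmasked main loop over full 64-byte vectors, then at most one masked iteration for the 0 to 63 leftover elements instead of a scalar remainder loop. The averaging kernel and the names are assumptions. As for the quoted prologue, `leal`/`cmpl $62`/`jbe` branches when `%edx - 1 <= 62` as an unsigned compare, i.e. for `%edx` in 1..63; where that branch goes is cut off in the snippet.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed vectorization factor: 64 byte lanes in one 512-bit vector. */
#define VF 64

/* Assumed x264-style rounding average of two pixels. */
static inline uint8_t avg_u8(uint8_t x, uint8_t y)
{
    return (uint8_t)((x + y + 1) >> 1);
}

/* Scalar emulation of an unmasked main loop plus one masked epilogue. */
void avg_masked_epilogue(uint8_t *restrict dst, const uint8_t *restrict a,
                         const uint8_t *restrict b, size_t n)
{
    size_t i = 0;
    /* Main loop: only full vectors, so no mask is needed. */
    for (; i + VF <= n; i += VF)
        for (size_t l = 0; l < VF; l++)
            dst[i + l] = avg_u8(a[i + l], b[i + l]);
    /* Masked epilogue: a single iteration covers the 0..63 leftovers;
       when rem is 0 every lane is masked off. */
    size_t rem = n - i;
    for (size_t l = 0; l < VF; l++)
        if (l < rem) /* lane mask: lane index below remaining count */
            dst[i + l] = avg_u8(a[i + l], b[i + l]);
}
```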

2023-01-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410 Richard Biener changed: Blocks: added 53947. Last reconfirmed: