Re: [FFmpeg-devel] [PATCH v2] x86/tx_float: implement inverse MDCT AVX2 assembly

2022-09-02 Thread Henrik Gramner
On Fri, Sep 2, 2022 at 7:55 AM Lynne wrote:
> +movd xmm4, strided
> +neg t2d
> +movd xmm5, t2d
> +SPLATD xmm4
> +SPLATD xmm5
> +vperm2f128 m4, m4, m4, 0x00 ; +stride splatted
> +vperm2f128 m5, m5, m5, 0x00 ; -stride splatted

movd xm4, strided
pxor m5, m5

Re: [FFmpeg-devel] [PATCH v2] x86/tx_float: implement inverse MDCT AVX2 assembly

2022-09-01 Thread Lynne
Sep 2, 2022, 07:49 by d...@lynne.ee:
> Version 2 notes: halved the amount of loads and loops for the
> pre-transform loop by exploiting the symmetry.
>
> This commit implements an iMDCT in pure assembly.
>
> This is capable of processing any mod-8 transforms, rather than just
> power of two, but

[FFmpeg-devel] [PATCH v2] x86/tx_float: implement inverse MDCT AVX2 assembly

2022-09-01 Thread Lynne
Version 2 notes: halved the amount of loads and loops for the pre-transform loop by exploiting the symmetry.

This commit implements an iMDCT in pure assembly.

This is capable of processing any mod-8 transforms, rather than just power of two, but since power of two is all we have assembly for