On Fri, Sep 2, 2022 at 7:55 AM Lynne wrote:
> +movd xmm4, strided
> +neg t2d
> +movd xmm5, t2d
> +SPLATD xmm4
> +SPLATD xmm5
> +vperm2f128 m4, m4, m4, 0x00 ; +stride splatted
> +vperm2f128 m5, m5, m5, 0x00 ; -stride splatted
movd xm4, strided
pxor m5, m5
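
For anyone following the quoted hunk: going only by its comments, the sequence ends with +stride in every 32-bit lane of m4 and -stride in every lane of m5. Below is a rough scalar C model of that end result, not of any particular instruction sequence. It assumes 256-bit registers (eight dword lanes); the names LANES, splat_d and splat_stride_pair are made up for illustration and are not part of FFmpeg or x86inc.

```c
#include <stdint.h>

/* Assumption: one 256-bit register viewed as eight 32-bit lanes. */
#define LANES 8

/* Hypothetical helper: copy one 32-bit value into every lane,
 * which is the net effect the hunk's "splatted" comments describe. */
static void splat_d(int32_t dst[LANES], int32_t value)
{
    for (int i = 0; i < LANES; i++)
        dst[i] = value;
}

/* Hypothetical helper: model the final contents of m4 (+stride)
 * and m5 (-stride) as annotated in the quoted patch. */
static void splat_stride_pair(int32_t pos[LANES], int32_t neg[LANES],
                              int32_t stride)
{
    splat_d(pos, stride);   /* m4: +stride in all lanes */
    splat_d(neg, -stride);  /* m5: -stride in all lanes */
}
```

Calling splat_stride_pair(pos, neg, 24) fills pos with 24 and neg with -24 in all eight lanes. Any instruction sequence that leaves the two registers in that state is equivalent as far as this model goes.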
Sep 2, 2022, 07:49 by d...@lynne.ee:
Version 2 notes: halved the number of loads and loops for the
pre-transform loop by exploiting the symmetry.
This commit implements an iMDCT in pure assembly.
This is capable of processing any mod-8 transforms, rather than just
power of two, but since power of two is all we have assembly for