On Sat, Mar 2, 2024 at 10:13 PM Kieran Kunhya wrote:
> SPLATB_LOAD m0, r0+r1*0-1, m2
> SPLATB_LOAD m1, r0+r1*1-1, m2
This adds an extra unnecessary shuffle in the SSE2 code as it splats
to a full register. The easiest way of fixing it would probably be to
unroll the macro and manually
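
For illustration, a rough C intrinsics sketch of the extra-shuffle point above. This is not the actual SPLATB_LOAD macro; the function names are hypothetical. It assumes the SSE2 fallback splats a byte with punpcklbw + pshuflw + punpcklqdq. Under that assumption the final punpcklqdq only fills the upper 8 bytes of the register, which an 8-pixel row never stores.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical helper: broadcast b to all 16 bytes (three shuffles in SSE2). */
static __m128i splat_byte_full(uint8_t b)
{
    __m128i v = _mm_cvtsi32_si128(b);   /* b in byte 0, rest zero */
    v = _mm_unpacklo_epi8(v, v);        /* punpcklbw: bytes 0..1 = b */
    v = _mm_shufflelo_epi16(v, 0);      /* pshuflw: bytes 0..7 = b */
    return _mm_unpacklo_epi64(v, v);    /* punpcklqdq: bytes 8..15 = b */
}

/* Hypothetical helper: broadcast b to the low 8 bytes only (two shuffles). */
static __m128i splat_byte_low8(uint8_t b)
{
    __m128i v = _mm_cvtsi32_si128(b);
    v = _mm_unpacklo_epi8(v, v);
    return _mm_shufflelo_epi16(v, 0);   /* upper 8 bytes stay zero */
}

/* Fill one 8-pixel row with its left neighbour, i.e. the pixel at row[-1]. */
static void pred_row_horizontal(uint8_t *row)
{
    /* movq store: only the low 8 bytes of the register are written. */
    _mm_storel_epi64((__m128i *)row, splat_byte_low8(row[-1]));
}
```

Since the row store is 8 bytes wide, the low-half splat is sufficient here, and the third shuffle in splat_byte_full does no useful work for this function.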
On Sat, 2 Mar 2024 at 21:18, Andreas Rheinhardt <andreas.rheinha...@outlook.com> wrote:
> Kieran Kunhya:
> > $subj
> >
> > Old:
> > pred8x8_horizontal_8_c: 6.8
> > pred8x8_horizontal_8_mmxext: 8.6
> > pred8x8_horizontal_8_ssse3: 4.8
> >
> > New:
> > pred8x8_horizontal_8_c: 9.2
> >
On 3/2/2024 6:20 PM, Andreas Rheinhardt wrote:
> Kieran Kunhya:
> > $subj
> >
> > Old:
> > pred8x8_horizontal_8_c: 6.8
> > pred8x8_horizontal_8_mmxext: 8.6
> > pred8x8_horizontal_8_ssse3: 4.8
> >
> > New:
> > pred8x8_horizontal_8_c: 9.2
> > pred8x8_horizontal_8_sse2: 12.2
> > pred8x8_horizontal_8_ssse3: 4.9
> >
> You do realize that the
Kieran Kunhya:
> $subj
>
> Old:
> pred8x8_horizontal_8_c: 6.8
> pred8x8_horizontal_8_mmxext: 8.6
> pred8x8_horizontal_8_ssse3: 4.8
>
> New:
> pred8x8_horizontal_8_c: 9.2
> pred8x8_horizontal_8_sse2: 12.2
> pred8x8_horizontal_8_ssse3: 4.9
>
You do realize that the SSE2 version is worse than the
$subj
Old:
pred8x8_horizontal_8_c: 6.8
pred8x8_horizontal_8_mmxext: 8.6
pred8x8_horizontal_8_ssse3: 4.8
New:
pred8x8_horizontal_8_c: 9.2
pred8x8_horizontal_8_sse2: 12.2
pred8x8_horizontal_8_ssse3: 4.9
0001-libavcodec-h264pred-Remove-pred8x8_horizontal_8_mmxe.patch