Here are some extra implementations that extend Christophe's work.

The first one (SSE) is only for x86_32 targets as x86_64 guarantees SSE2 is 
available.

The second patch is an AVX implementation using ymm registers.
In my tests it was about 30 cycles faster than SSE2 on a Sandy Bridge CPU.

I don't have proper numbers for the third patch (FMA3) since I could only
test on an AMD rig, where functions using ymm registers tend to have subpar
performance.
It still beat the AVX version by a decent margin, though, so Haswell should
see a nice boost with it.
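
To make the selection order I have in mind concrete, here is a rough sketch
in C. It is not the init code from dcadsp_init.c: the CPU_* flag values and
pick_synth_filter() are made up for illustration, and it assumes an SSE2
version already exists as the baseline. Each later check overrides the
earlier one, so the most capable variant the CPU reports wins, and the SSE
variant is only considered on 32-bit builds.

```c
/* Hypothetical feature bits for illustration; not libav's AV_CPU_FLAG_* values. */
enum {
    CPU_SSE  = 1 << 0,
    CPU_SSE2 = 1 << 1,
    CPU_AVX  = 1 << 2,
    CPU_FMA3 = 1 << 3,
};

/*
 * Return the name of the synth_filter variant a dispatcher would pick.
 * Checks run from least to most capable, so a later match replaces an
 * earlier one.
 */
static const char *pick_synth_filter(int flags, int is_x86_64)
{
    const char *best = "c";      /* plain C fallback */

    /* SSE only matters on x86_32: x86_64 always has SSE2. */
    if (!is_x86_64 && (flags & CPU_SSE))
        best = "sse";
    if (flags & CPU_SSE2)
        best = "sse2";
    if (flags & CPU_AVX)
        best = "avx";
    if (flags & CPU_FMA3)
        best = "fma3";

    return best;
}
```

For example, pick_synth_filter(CPU_SSE, 0) gives "sse", while a CPU that
reports every flag gets "fma3". The sketch deliberately ignores the AMD case
mentioned above, where ymm functions are available but slow.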

I could add an FMA4 version using xmm registers, which would benefit AMD users 
unlike these AVX/FMA3 ymm ones. Thoughts?

James Almer (3):
  x86/synth_filter: add synth_filter_fma3
  x86/synth_filter: add synth_filter_sse
  x86/synth_filter: add synth_filter_avx

 libavcodec/x86/dcadsp.asm    | 109 ++++++++++++++++++++++++++++---------------
 libavcodec/x86/dcadsp_init.c |  52 ++++++++++++++-------
 2 files changed, 107 insertions(+), 54 deletions(-)

-- 
1.8.3.2
