Re: [FFmpeg-devel] [PATCH 6/6] x86/hevcdsp: add ff_hevc_sao_edge_filter_{10, 12}_{sse2, avx2}

2015-02-04 Thread Mickaël Raulet
LGTM. Mickael 2015-02-04 13:51 GMT+01:00 Christophe Gisquet : > Hi, > > 2015-02-04 4:55 GMT+01:00 James Almer : > > > -DECLARE_ALIGNED(16, const xmm_reg, ff_pw_1)= { > 0x0001000100010001ULL, 0x0001000100010001ULL }; > > -DECLARE_ALIGNED(16, const xmm_reg, ff_pw_2)= { > 0x0002000200020

Re: [FFmpeg-devel] [PATCH 6/6] x86/hevcdsp: add ff_hevc_sao_edge_filter_{10, 12}_{sse2, avx2}

2015-02-04 Thread Christophe Gisquet
Hi, 2015-02-04 4:55 GMT+01:00 James Almer : > -DECLARE_ALIGNED(16, const xmm_reg, ff_pw_1)= { 0x0001000100010001ULL, > 0x0001000100010001ULL }; > -DECLARE_ALIGNED(16, const xmm_reg, ff_pw_2)= { 0x0002000200020002ULL, > 0x0002000200020002ULL }; > +DECLARE_ALIGNED(32, const ymm_reg, ff

[FFmpeg-devel] [PATCH 6/6] x86/hevcdsp: add ff_hevc_sao_edge_filter_{10, 12}_{sse2, avx2}

2015-02-03 Thread James Almer
Original x86 intrinsics code by Pierre-Edouard Lepere. Yasm port by James Almer. Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U Width 32 342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips 29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 s