Re: [FFmpeg-devel] [PATCH] avfilter/avf_showcqt: cqt_calc optimization on x86

2016-06-08 Thread Muhammad Faiz
On Wed, Jun 8, 2016 at 11:34 PM, James Almer wrote: > On 6/7/2016 6:18 AM, Muhammad Faiz wrote: +sub lend, 2 >> +lea dstq, [dstq + 16] >>> > >>> > Use add >>> > >> +lea coeffsq, [coeffsq + 2*Coeffs.sizeof] >>> > >>> > Same, assuming sizeof is an i

Re: [FFmpeg-devel] [PATCH] avfilter/avf_showcqt: cqt_calc optimization on x86

2016-06-08 Thread James Almer
On 6/7/2016 6:18 AM, Muhammad Faiz wrote: >>> +sub lend, 2 >>> >> +lea dstq, [dstq + 16] >> > >> > Use add >> > >>> >> +lea coeffsq, [coeffsq + 2*Coeffs.sizeof] >> > >> > Same, assuming sizeof is an immediate. >> > > This is optimization to separate sub and jnz w

Re: [FFmpeg-devel] [PATCH] avfilter/avf_showcqt: cqt_calc optimization on x86

2016-06-08 Thread Muhammad Faiz
On Tue, Jun 7, 2016 at 4:18 PM, Muhammad Faiz wrote: > On Tue, Jun 7, 2016 at 10:36 AM, James Almer wrote: >> On 6/4/2016 4:36 AM, Muhammad Faiz wrote: >>> benchmark on x86_64 >>> cqt_time: >>> plain = 3.292 s >>> SSE = 1.640 s >>> SSE3 = 1.631 s >>> AVX = 1.395 s >>> FMA3 = 1.271 s >>> FMA

Re: [FFmpeg-devel] [PATCH] avfilter/avf_showcqt: cqt_calc optimization on x86

2016-06-08 Thread Muhammad Faiz
On Tue, Jun 7, 2016 at 2:51 PM, Muhammad Faiz wrote: > On Tue, Jun 7, 2016 at 9:49 AM, Michael Niedermayer > wrote: >> On Tue, Jun 07, 2016 at 08:07:45AM +0700, Muhammad Faiz wrote: >>> On Sat, Jun 4, 2016 at 2:36 PM, Muhammad Faiz wrote: >>> > benchmark on x86_64 >>> > cqt_time: >>> > plain = 3

Re: [FFmpeg-devel] [PATCH] avfilter/avf_showcqt: cqt_calc optimization on x86

2016-06-07 Thread Muhammad Faiz
On Tue, Jun 7, 2016 at 4:18 PM, Muhammad Faiz wrote: > On Tue, Jun 7, 2016 at 10:36 AM, James Almer wrote: >> On 6/4/2016 4:36 AM, Muhammad Faiz wrote: >>> benchmark on x86_64 >>> cqt_time: >>> plain = 3.292 s >>> SSE = 1.640 s >>> SSE3 = 1.631 s >>> AVX = 1.395 s >>> FMA3 = 1.271 s >>> FMA

Re: [FFmpeg-devel] [PATCH] avfilter/avf_showcqt: cqt_calc optimization on x86

2016-06-07 Thread Nicolas George
On Decadi 20 Prairial, year CCXXIV, Muhammad Faiz wrote: > mkfifo in0.y4m > mkfifo in1.y4m > $build_path/ffmpeg $3 -i "$1" -filter_complex "showcqt, format=$2, > format=yuv444p|yuv422p|yuv420p" -f yuv4mpegpipe -y in0.y4m > 2>&1 ffmpeg $3 -i "$1" -filter_complex "showcqt, format=$2, > format=yuv4

Re: [FFmpeg-devel] [PATCH] avfilter/avf_showcqt: cqt_calc optimization on x86

2016-06-07 Thread Muhammad Faiz
On Tue, Jun 7, 2016 at 10:36 AM, James Almer wrote: > On 6/4/2016 4:36 AM, Muhammad Faiz wrote: >> benchmark on x86_64 >> cqt_time: >> plain = 3.292 s >> SSE = 1.640 s >> SSE3 = 1.631 s >> AVX = 1.395 s >> FMA3 = 1.271 s >> FMA4 = not available > > Try using the START_TIMER and STOP_TIMER m

Re: [FFmpeg-devel] [PATCH] avfilter/avf_showcqt: cqt_calc optimization on x86

2016-06-07 Thread Muhammad Faiz
On Tue, Jun 7, 2016 at 9:49 AM, Michael Niedermayer wrote: > On Tue, Jun 07, 2016 at 08:07:45AM +0700, Muhammad Faiz wrote: >> On Sat, Jun 4, 2016 at 2:36 PM, Muhammad Faiz wrote: >> > benchmark on x86_64 >> > cqt_time: >> > plain = 3.292 s >> > SSE = 1.640 s >> > SSE3 = 1.631 s >> > AVX = 1

Re: [FFmpeg-devel] [PATCH] avfilter/avf_showcqt: cqt_calc optimization on x86

2016-06-06 Thread James Almer
On 6/4/2016 4:36 AM, Muhammad Faiz wrote: > benchmark on x86_64 > cqt_time: > plain = 3.292 s > SSE = 1.640 s > SSE3 = 1.631 s > AVX = 1.395 s > FMA3 = 1.271 s > FMA4 = not available Try using the START_TIMER and STOP_TIMER macros to wrap the s->cqt_calc call in libavfilter/avf_showcqt.c It

Re: [FFmpeg-devel] [PATCH] avfilter/avf_showcqt: cqt_calc optimization on x86

2016-06-06 Thread James Almer
On 6/6/2016 11:49 PM, Michael Niedermayer wrote: > On Tue, Jun 07, 2016 at 08:07:45AM +0700, Muhammad Faiz wrote: >> On Sat, Jun 4, 2016 at 2:36 PM, Muhammad Faiz wrote: >>> benchmark on x86_64 >>> cqt_time: >>> plain = 3.292 s >>> SSE = 1.640 s >>> SSE3 = 1.631 s >>> AVX = 1.395 s >>> FMA3

Re: [FFmpeg-devel] [PATCH] avfilter/avf_showcqt: cqt_calc optimization on x86

2016-06-06 Thread Michael Niedermayer
On Tue, Jun 07, 2016 at 08:07:45AM +0700, Muhammad Faiz wrote: > On Sat, Jun 4, 2016 at 2:36 PM, Muhammad Faiz wrote: > > benchmark on x86_64 > > cqt_time: > > plain = 3.292 s > > SSE = 1.640 s > > SSE3 = 1.631 s > > AVX = 1.395 s > > FMA3 = 1.271 s > > FMA4 = not available > > > > untested

Re: [FFmpeg-devel] [PATCH] avfilter/avf_showcqt: cqt_calc optimization on x86

2016-06-06 Thread Muhammad Faiz
On Sat, Jun 4, 2016 at 2:36 PM, Muhammad Faiz wrote: > benchmark on x86_64 > cqt_time: > plain = 3.292 s > SSE = 1.640 s > SSE3 = 1.631 s > AVX = 1.395 s > FMA3 = 1.271 s > FMA4 = not available > > untested on x86_32 > > Signed-off-by: Muhammad Faiz > --- > libavfilter/avf_showcqt.c

[FFmpeg-devel] [PATCH] avfilter/avf_showcqt: cqt_calc optimization on x86

2016-06-04 Thread Muhammad Faiz
benchmark on x86_64 cqt_time: plain = 3.292 s SSE = 1.640 s SSE3 = 1.631 s AVX = 1.395 s FMA3 = 1.271 s FMA4 = not available untested on x86_32 Signed-off-by: Muhammad Faiz --- libavfilter/avf_showcqt.c | 7 ++ libavfilter/avf_showcqt.h | 3 + libavfilter/x86/Makefi
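
The cqt_time figures quoted throughout this thread are easier to compare as speedups over the plain C path. A minimal sketch, assuming only the five timings listed in the original post; the `speedups` helper and the dictionary name are illustrative and not part of the patch:

```python
# Timings (seconds) copied from the cqt_time benchmark in the original post.
# FMA4 is omitted because the post reports it as "not available".
cqt_time = {
    "plain": 3.292,
    "SSE": 1.640,
    "SSE3": 1.631,
    "AVX": 1.395,
    "FMA3": 1.271,
}


def speedups(times, baseline="plain"):
    """Return baseline_time / variant_time for every non-baseline entry."""
    base = times[baseline]
    return {name: base / t for name, t in times.items() if name != baseline}


# Print one line per SIMD variant, slowest to fastest.
for name, ratio in speedups(cqt_time).items():
    print(f"{name:5s} {ratio:.2f}x faster than plain")
```

On these numbers the SSE path is roughly 2.0x faster than plain and the FMA3 path roughly 2.6x. These are whole-run wall-clock ratios, which is one reason the replies suggest timing the s->cqt_calc call directly with START_TIMER and STOP_TIMER.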