On Wed, Jun 8, 2016 at 11:34 PM, James Almer wrote:
> On 6/7/2016 6:18 AM, Muhammad Faiz wrote:
>>>> +sub lend, 2
>>>> +lea dstq, [dstq + 16]
>>>
>>> Use add
>>>
>>>> +lea coeffsq, [coeffsq + 2*Coeffs.sizeof]
>>>
>>> Same, assuming sizeof is an i
On 6/7/2016 6:18 AM, Muhammad Faiz wrote:
>>> +sub lend, 2
>>> +lea dstq, [dstq + 16]
>>
>> Use add
>>
>>> +lea coeffsq, [coeffsq + 2*Coeffs.sizeof]
>>
>> Same, assuming sizeof is an immediate.
>>
> This is optimization to separate sub and jnz w
On Tue, Jun 7, 2016 at 4:18 PM, Muhammad Faiz wrote:
> On Tue, Jun 7, 2016 at 10:36 AM, James Almer wrote:
>> On 6/4/2016 4:36 AM, Muhammad Faiz wrote:
>>> benchmark on x86_64
>>> cqt_time:
>>> plain = 3.292 s
>>> SSE = 1.640 s
>>> SSE3 = 1.631 s
>>> AVX = 1.395 s
>>> FMA3 = 1.271 s
>>> FMA
On Tue, Jun 7, 2016 at 2:51 PM, Muhammad Faiz wrote:
> On Tue, Jun 7, 2016 at 9:49 AM, Michael Niedermayer wrote:
>> On Tue, Jun 07, 2016 at 08:07:45AM +0700, Muhammad Faiz wrote:
>>> On Sat, Jun 4, 2016 at 2:36 PM, Muhammad Faiz wrote:
>>> > benchmark on x86_64
>>> > cqt_time:
>>> > plain = 3
On Décadi 20 Prairial, year CCXXIV, Muhammad Faiz wrote:
> mkfifo in0.y4m
> mkfifo in1.y4m
> $build_path/ffmpeg $3 -i "$1" -filter_complex "showcqt, format=$2,
> format=yuv444p|yuv422p|yuv420p" -f yuv4mpegpipe -y in0.y4m
> 2>&1 ffmpeg $3 -i "$1" -filter_complex "showcqt, format=$2,
> format=yuv4
On Tue, Jun 7, 2016 at 10:36 AM, James Almer wrote:
> On 6/4/2016 4:36 AM, Muhammad Faiz wrote:
>> benchmark on x86_64
>> cqt_time:
>> plain = 3.292 s
>> SSE = 1.640 s
>> SSE3 = 1.631 s
>> AVX = 1.395 s
>> FMA3 = 1.271 s
>> FMA4 = not available
>
> Try using the START_TIMER and STOP_TIMER m
On Tue, Jun 7, 2016 at 9:49 AM, Michael Niedermayer wrote:
> On Tue, Jun 07, 2016 at 08:07:45AM +0700, Muhammad Faiz wrote:
>> On Sat, Jun 4, 2016 at 2:36 PM, Muhammad Faiz wrote:
>> > benchmark on x86_64
>> > cqt_time:
>> > plain = 3.292 s
>> > SSE = 1.640 s
>> > SSE3 = 1.631 s
>> > AVX = 1
On 6/4/2016 4:36 AM, Muhammad Faiz wrote:
> benchmark on x86_64
> cqt_time:
> plain = 3.292 s
> SSE = 1.640 s
> SSE3 = 1.631 s
> AVX = 1.395 s
> FMA3 = 1.271 s
> FMA4 = not available
Try using the START_TIMER and STOP_TIMER macros to wrap the s->cqt_calc
call in libavfilter/avf_showcqt.c
It
On 6/6/2016 11:49 PM, Michael Niedermayer wrote:
> On Tue, Jun 07, 2016 at 08:07:45AM +0700, Muhammad Faiz wrote:
>> On Sat, Jun 4, 2016 at 2:36 PM, Muhammad Faiz wrote:
>>> benchmark on x86_64
>>> cqt_time:
>>> plain = 3.292 s
>>> SSE = 1.640 s
>>> SSE3 = 1.631 s
>>> AVX = 1.395 s
>>> FMA3
On Tue, Jun 07, 2016 at 08:07:45AM +0700, Muhammad Faiz wrote:
> On Sat, Jun 4, 2016 at 2:36 PM, Muhammad Faiz wrote:
> > benchmark on x86_64
> > cqt_time:
> > plain = 3.292 s
> > SSE = 1.640 s
> > SSE3 = 1.631 s
> > AVX = 1.395 s
> > FMA3 = 1.271 s
> > FMA4 = not available
> >
> > untested
On Sat, Jun 4, 2016 at 2:36 PM, Muhammad Faiz wrote:
> benchmark on x86_64
> cqt_time:
> plain = 3.292 s
> SSE = 1.640 s
> SSE3 = 1.631 s
> AVX = 1.395 s
> FMA3 = 1.271 s
> FMA4 = not available
>
> untested on x86_32
>
> Signed-off-by: Muhammad Faiz
> ---
> libavfilter/avf_showcqt.c
benchmark on x86_64
cqt_time:
plain = 3.292 s
SSE = 1.640 s
SSE3 = 1.631 s
AVX = 1.395 s
FMA3 = 1.271 s
FMA4 = not available
untested on x86_32
Signed-off-by: Muhammad Faiz
---
libavfilter/avf_showcqt.c | 7 ++
libavfilter/avf_showcqt.h | 3 +
libavfilter/x86/Makefi