Re: [FFmpeg-devel] [PATCH] x86/vf_blend: Add SSE2 optimization for divide

2016-02-28 Thread Timothy Gu
On Sun, Feb 14, 2016 at 03:45:11PM +0100, Henrik Gramner wrote:
> You could try doing 8 or 16 bytes per iteration instead of 4, it might
> be faster depending on how good your cpu is at OOE.

As discussed on IRC, no observable difference has been observed with such changes, mainly because the

Re: [FFmpeg-devel] [PATCH] x86/vf_blend: Add SSE2 optimization for divide

2016-02-14 Thread Henrik Gramner
You could try doing 8 or 16 bytes per iteration instead of 4, it might be faster depending on how good your cpu is at OOE.

Re: [FFmpeg-devel] [PATCH] x86/vf_blend: Add SSE2 optimization for divide

2016-02-14 Thread Paul B Mahol
On 2/14/16, Timothy Gu wrote:
> On Sat, Feb 13, 2016 at 07:21:25PM -0800, Timothy Gu wrote:
>> ---
>>  libavfilter/x86/vf_blend.asm    | 30 ++
>>  libavfilter/x86/vf_blend_init.c |  2 ++
>>  2 files changed, 32 insertions(+)
>
> Locally added

Re: [FFmpeg-devel] [PATCH] x86/vf_blend: Add SSE2 optimization for divide

2016-02-13 Thread Timothy Gu
On Sat, Feb 13, 2016 at 07:21:25PM -0800, Timothy Gu wrote:
> ---
>  libavfilter/x86/vf_blend.asm    | 30 ++
>  libavfilter/x86/vf_blend_init.c |  2 ++
>  2 files changed, 32 insertions(+)

Locally added commit message: 4.5x faster than C float version with

[FFmpeg-devel] [PATCH] x86/vf_blend: Add SSE2 optimization for divide

2016-02-13 Thread Timothy Gu
---
 libavfilter/x86/vf_blend.asm    | 30 ++
 libavfilter/x86/vf_blend_init.c |  2 ++
 2 files changed, 32 insertions(+)

diff --git a/libavfilter/x86/vf_blend.asm b/libavfilter/x86/vf_blend.asm
index a5ea74c..303ea3a 100644
--- a/libavfilter/x86/vf_blend.asm
+++