Re: [libav-devel] [PATCH 02/11] x86: dcadsp: implement int8x8_fmul_int32

2014-02-07 Thread Christophe Gisquet
2014-02-07 Janne Grunau janne-li...@jannau.net: Do you have someone who's keen to review x86 asm? Actually I got a review from Loren, who pointed out the following improvements/fixes: - the SSE yasm function was using an SSE2 instruction, which is corrected - the macro parameter was the number of SSE

Re: [libav-devel] [PATCH 02/11] x86: dcadsp: implement int8x8_fmul_int32

2014-02-07 Thread Christophe Gisquet
Hi, here's an updated version, also taking into account the changes suggested for the first patch of the series.
--
Christophe
From 87983deb56aa52c2cdcfbf248dd76bccb97d694a Mon Sep 17 00:00:00 2001
From: Christophe Gisquet christophe.gisq...@gmail.com
Date: Fri, 11 May 2012 11:25:30 +0200
Subject:

Re: [libav-devel] [PATCH 02/11] x86: dcadsp: implement int8x8_fmul_int32

2014-02-07 Thread Janne Grunau
On 2014-02-07 21:57:08 +0100, Christophe Gisquet wrote:
From 87983deb56aa52c2cdcfbf248dd76bccb97d694a Mon Sep 17 00:00:00 2001
From: Christophe Gisquet christophe.gisq...@gmail.com
Date: Fri, 11 May 2012 11:25:30 +0200
Subject: [PATCH 02/10] x86: dcadsp: implement int8x8_fmul_int32
For the

Re: [libav-devel] [PATCH 02/11] x86: dcadsp: implement int8x8_fmul_int32

2014-02-06 Thread Janne Grunau
On 2014-02-06 00:40:51 +, Christophe Gisquet wrote:
For the callable function (as opposed to the inline one):
        C   SSE  SSE2  SSE4
Win32:  47   42    29    26
Win64:  30   33    25    23
The SSE version is neither compiled nor set for 64bits.
Are those CPU cycles?
When the proper

Re: [libav-devel] [PATCH 02/11] x86: dcadsp: implement int8x8_fmul_int32

2014-02-06 Thread Christophe Gisquet
2014-02-06 Janne Grunau janne-li...@jannau.net:
On 2014-02-06 00:40:51 +, Christophe Gisquet wrote:
For the callable function (as opposed to the inline one):
        C   SSE  SSE2  SSE4
Win32:  47   42    29    26
Win64:  30   33    25    23
The SSE version is neither compiled nor set for

Re: [libav-devel] [PATCH 02/11] x86: dcadsp: implement int8x8_fmul_int32

2014-02-06 Thread Diego Biurrun
On Thu, Feb 06, 2014 at 12:40:51AM +, Christophe Gisquet wrote:
--- /dev/null
+++ b/libavcodec/x86/dca.h
@@ -0,0 +1,56 @@
+/*
+ * Copyright (c) 2012 Christophe Gisquet christophe.gisq...@gmail.com
Happy new year?
+#if HAVE_SSE2_INLINE
+# include "libavutil/x86/asm.h"
+# include

Re: [libav-devel] [PATCH 02/11] x86: dcadsp: implement int8x8_fmul_int32

2014-02-06 Thread Janne Grunau
On 2014-02-06 16:08:29 +0100, Christophe Gisquet wrote: 2014-02-06 Janne Grunau janne-li...@jannau.net: On 2014-02-06 00:40:51 +, Christophe Gisquet wrote: Yes, as long as this header is included before dcadsp.h. This will be rewritten anyway, following your proposal. just because
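An include-order requirement like the one mentioned above (the x86 header having to come before dcadsp.h) commonly comes from a macro-guard override: the arch header claims a name, and the generic header only supplies a fallback if the name is still free. The C sketch below illustrates that general pattern only; it is not taken from the patch. `DEMO_HAVE_ARCH_INLINE`, `demo_fmul`, `demo_fmul_arch` and `demo_fmul_c` are hypothetical names, and the "arch" body is plain C standing in for real inline asm.

```c
#include <stdint.h>

/* Part 1: what a hypothetical arch-specific header might contain.
 * DEMO_HAVE_ARCH_INLINE is a made-up stand-in for a HAVE_SSE2_INLINE-style
 * configure macro. */
#ifdef DEMO_HAVE_ARCH_INLINE
static inline void demo_fmul_arch(float *dst, const int8_t *src, int scale)
{
    /* Placeholder body: a real arch header would use inline asm here. */
    const float fscale = scale / 16.0f;   /* scaling factor is an assumption */
    for (int i = 0; i < 8; i++)
        dst[i] = src[i] * fscale;
}
#define demo_fmul demo_fmul_arch          /* claim the name */
#endif

/* Part 2: what the generic header might contain. The fallback is only
 * defined if no earlier header has claimed the name, which is why the
 * arch header has to be included first. */
#ifndef demo_fmul
static inline void demo_fmul_c(float *dst, const int8_t *src, int scale)
{
    const float fscale = scale / 16.0f;   /* scaling factor is an assumption */
    for (int i = 0; i < 8; i++)
        dst[i] = src[i] * fscale;
}
#define demo_fmul demo_fmul_c
#endif
```

Callers always write `demo_fmul(dst, src, scale)`; which body they get depends only on whether the arch header was seen first. With the includes in the opposite order, the `#define` in the arch part would replace the fallback that the generic part had already selected, usually with a redefinition warning.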

Re: [libav-devel] [PATCH 02/11] x86: dcadsp: implement int8x8_fmul_int32

2014-02-06 Thread Janne Grunau
On 2014-02-06 16:21:49 +0100, Diego Biurrun wrote:
On Thu, Feb 06, 2014 at 12:40:51AM +, Christophe Gisquet wrote:
--- /dev/null
+++ b/libavcodec/x86/dca.h
@@ -0,0 +1,56 @@
+/*
+ * Copyright (c) 2012 Christophe Gisquet christophe.gisq...@gmail.com
Happy new year?
+#if

Re: [libav-devel] [PATCH 02/11] x86: dcadsp: implement int8x8_fmul_int32

2014-02-06 Thread Christophe Gisquet
2014-02-06 Janne Grunau janne-li...@jannau.net: The function is very short, so the function call overhead becomes significant. 34 vs. 39 cycles on a Cortex-A9, i.e. the inline version is over 10% faster. Yes, ARM also does the same, with reason. Same overhead (10%?) probably the same for
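The "over 10%" figure can be checked from the two cycle counts quoted above. The C sketch below assumes 34 cycles is the inline version and 39 cycles the callable one; the helper names are hypothetical and exist only for this arithmetic.

```c
/* Fraction of the callable version's cycles that inlining saves. */
static double cycles_saved(double callable, double inlined)
{
    return (callable - inlined) / callable;
}

/* Extra time the callable version costs, relative to the inlined one. */
static double call_overhead(double callable, double inlined)
{
    return (callable - inlined) / inlined;
}
```

With the quoted numbers, `cycles_saved(39, 34)` is 5/39, about 12.8%, and `call_overhead(39, 34)` is 5/34, about 14.7%. Either way of expressing it is consistent with "over 10%".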

[libav-devel] [PATCH 02/11] x86: dcadsp: implement int8x8_fmul_int32

2014-02-05 Thread Christophe Gisquet
For the callable function (as opposed to the inline one):
        C   SSE  SSE2  SSE4
Win32:  47   42    29    26
Win64:  30   33    25    23
The SSE version is neither compiled nor set for 64bits. When the proper compile macros are set (e.g. ARCH_X86_64 or HAVE_SSEx), the macro reverts to use the
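For readers unfamiliar with the routine being benchmarked, here is a minimal scalar sketch of what a function named int8x8_fmul_int32 plausibly computes: eight signed 8-bit inputs, each multiplied by an integer scale and stored as float. The `_ref` suffix, the exact signature and the division of the scale by 16 are assumptions made for illustration, not details confirmed by the snippet above.

```c
#include <stdint.h>

/* Hypothetical scalar reference: converts 8 int8 samples to float and
 * scales them. The 1/16 factor is an assumption, not taken from the patch. */
static void int8x8_fmul_int32_ref(float *dst, const int8_t *src, int scale)
{
    const float fscale = scale / 16.0f;
    for (int i = 0; i < 8; i++)
        dst[i] = src[i] * fscale;
}
```

A loop this small is only a handful of instructions once vectorized, which is why the call overhead discussed elsewhere in the thread is large relative to the work done.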