2014-02-07 Janne Grunau janne-li...@jannau.net:
Do you have someone who's keen to review x86 asm?
Actually I got a review from Loren, who pointed out the following
improvements/fixes:
- the SSE yasm function was using an SSE2 instruction, which is now corrected
- the macro parameter was the number of SSE
Hi,
here's an updated version also taking into account the changes
suggested for first patch of the series.
--
Christophe
From 87983deb56aa52c2cdcfbf248dd76bccb97d694a Mon Sep 17 00:00:00 2001
From: Christophe Gisquet christophe.gisq...@gmail.com
Date: Fri, 11 May 2012 11:25:30 +0200
Subject:
On 2014-02-07 21:57:08 +0100, Christophe Gisquet wrote:
From 87983deb56aa52c2cdcfbf248dd76bccb97d694a Mon Sep 17 00:00:00 2001
From: Christophe Gisquet christophe.gisq...@gmail.com
Date: Fri, 11 May 2012 11:25:30 +0200
Subject: [PATCH 02/10] x86: dcadsp: implement int8x8_fmul_int32
For the
On 2014-02-06 00:40:51 +, Christophe Gisquet wrote:
For the callable function (as opposed to the inline one):
        C   SSE  SSE2  SSE4
Win32: 47    42    29    26
Win64: 30    33    25    23
The SSE version is neither compiled nor set for 64 bits.
Are those CPU cycles?
When the proper
2014-02-06 Janne Grunau janne-li...@jannau.net:
On 2014-02-06 00:40:51 +, Christophe Gisquet wrote:
For the callable function (as opposed to the inline one):
        C   SSE  SSE2  SSE4
Win32: 47    42    29    26
Win64: 30    33    25    23
The SSE version is neither compiled nor set for
On Thu, Feb 06, 2014 at 12:40:51AM +, Christophe Gisquet wrote:
--- /dev/null
+++ b/libavcodec/x86/dca.h
@@ -0,0 +1,56 @@
+/*
+ * Copyright (c) 2012 Christophe Gisquet christophe.gisq...@gmail.com
Happy new year?
+#if HAVE_SSE2_INLINE
+# include "libavutil/x86/asm.h"
+# include
On 2014-02-06 16:08:29 +0100, Christophe Gisquet wrote:
2014-02-06 Janne Grunau janne-li...@jannau.net:
On 2014-02-06 00:40:51 +, Christophe Gisquet wrote:
Yes, as long as this header is included before dcadsp.h.
This will be rewritten anyway, following your proposal.
just because
On 2014-02-06 16:21:49 +0100, Diego Biurrun wrote:
On Thu, Feb 06, 2014 at 12:40:51AM +, Christophe Gisquet wrote:
--- /dev/null
+++ b/libavcodec/x86/dca.h
@@ -0,0 +1,56 @@
+/*
+ * Copyright (c) 2012 Christophe Gisquet christophe.gisq...@gmail.com
Happy new year?
+#if
2014-02-06 Janne Grunau janne-li...@jannau.net:
The function is very short so the function call overhead becomes
significant. 34 vs. 39 cycles on a cortex-a9, i.e. the inline version
is over 10% faster.
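For reference, a minimal C sketch of the two call paths being compared. It assumes int8x8_fmul_int32 scales eight int8 values by scale/16 and stores them as floats; that semantics, the SketchDSPContext struct and the helper names are illustrative assumptions, not code taken from the patch. With the numbers quoted above, the callable version costs 5 cycles more out of 39, so the call itself accounts for roughly 13% of the total.

```c
#include <assert.h>
#include <stdint.h>

/* Inline path: the compiler can fold this into the caller's loop.
 * Assumed semantics: dst[i] = src[i] * (scale / 16) for 8 elements. */
static inline void int8x8_fmul_int32_inline(float *dst, const int8_t *src,
                                            int scale)
{
    float fscale = scale / 16.0f;
    for (int i = 0; i < 8; i++)
        dst[i] = src[i] * fscale;
}

/* Callable path: same arithmetic, but reached through a function pointer,
 * which is what lets an asm version be selected at run time. */
static void int8x8_fmul_int32_c(float *dst, const int8_t *src, int scale)
{
    int8x8_fmul_int32_inline(dst, src, scale);
}

/* Hypothetical stand-in for a DSP context holding the function pointer. */
typedef struct SketchDSPContext {
    void (*int8x8_fmul_int32)(float *dst, const int8_t *src, int scale);
} SketchDSPContext;

/* Runs both paths on the same input and checks that they agree. */
void sketch_check_paths(void)
{
    static const int8_t src[8] = { -8, -4, -2, -1, 0, 1, 2, 4 };
    float a[8], b[8];
    SketchDSPContext ctx = { int8x8_fmul_int32_c };

    int8x8_fmul_int32_inline(a, src, 32);   /* direct, inlinable call */
    ctx.int8x8_fmul_int32(b, src, 32);      /* indirect call via pointer */

    for (int i = 0; i < 8; i++)
        assert(a[i] == b[i] && a[i] == src[i] * 2.0f);
}
```

The indirect call is what makes run-time selection of an SSE2 or SSE4 version possible; the price is the call overhead on a function that only handles eight values.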
Yes, ARM also does the same, with reason.
Same overhead (10%?) probably the same for
For the callable function (as opposed to the inline one):
        C   SSE  SSE2  SSE4
Win32: 47    42    29    26
Win64: 30    33    25    23
The SSE version is neither compiled nor set for 64 bits.
When the proper compile macros are set (e.g. ARCH_X86_64 or HAVE_SSEx),
the macro reverts to use the