On Sat, May 21, 2011 at 6:22 PM, Matt Turner matts...@gmail.com wrote:
In the mean time, I've pushed my patches to the avx-optimizations
branch of git://anongit.freedesktop.org/~mattst88/pixman. I also added
an AVX path for composite_src_x888_.
Attached are the cairo-perf-trace results
On Sat, Jul 23, 2011 at 10:28 PM, Siarhei Siamashka
siarhei.siamas...@gmail.com wrote:
The 'test1' function does not look good because it uses ARM
instructions to read data one byte at a time and combine it. Function
'test2' looks a bit better because it now uses WALIGNR, but this is
still not
This will make upcoming ARM usage of pixman-mmx.c unambiguous.
Signed-off-by: Matt Turner matts...@gmail.com
---
configure.ac|8
pixman/Makefile.am |2 +-
pixman/Makefile.win32 |2 +-
pixman/pixman-cpu.c |6 +++---
pixman/pixman-mmx.c |4
Adding iwmmxt inline assembly doesn't help performance, so don't bother.
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 13 +++--
1 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 6a68080..7673675
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 112 ---
1 files changed, 79 insertions(+), 33 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 7673675..471c51d 100644
--- a/pixman/pixman-mmx.c
Signed-off-by: Matt Turner matts...@gmail.com
---
I left the gcc check at 4.6 in the hopes that my gcc patch will be
accepted for some gcc-4.6 release. If it doesn't, the check for 4.6
shouldn't be harmful, because the intrinsic I chose to test is known
to cause an unpatched gcc-4.6 to assert
On Wed, Aug 31, 2011 at 8:12 AM, Soeren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
I've been trying to figure out if the ARM iwmmxt inline assembly makes
any difference at all. I think the conclusion is that it does not.
Updated code is here:
http
On Wed, Aug 31, 2011 at 7:37 PM, Soeren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
Never does using inline assembly seem to make any sort of meaningful
difference over simply compiling pixman-mmx.c for ARM/iwmmxt. I tried
checking the alignment in the 'wip
Signed-off-by: Matt Turner matts...@gmail.com
---
test/lowlevel-blt-bench.c |1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c
index 099e434..bdafb35 100644
--- a/test/lowlevel-blt-bench.c
+++ b/test/lowlevel-blt-bench.c
b8r8g8 is apparently no longer supported sometime since this code was
commented.
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c |4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 697ec4c..f835a14
gcc isn't able to see that w is no greater than 1, so it generates
unnecessary loop instructions with while (w).
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 10 +-
1 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman
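The point about `while (w)` can be shown with a small sketch. This is not the pixman code: `add_sat_u8` and `add_row` are hypothetical names, and the two-elements-per-iteration main loop is an assumption chosen so that at most one element remains for the tail.

```c
#include <stdint.h>

/* Hypothetical helper: saturating add of two 8-bit values. */
static uint8_t add_sat_u8 (uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned) a + b;
    return (uint8_t) (sum > 255 ? 255 : sum);
}

/* Hypothetical row routine: the main loop consumes two elements at a time. */
static void add_row (uint8_t *dst, const uint8_t *src, int w)
{
    while (w >= 2)
    {
        dst[0] = add_sat_u8 (dst[0], src[0]);
        dst[1] = add_sat_u8 (dst[1], src[1]);
        dst += 2;
        src += 2;
        w -= 2;
    }

    /* Here w is 0 or 1. Writing `if` instead of `while` states that
     * directly, so no loop counter or pointer update is needed. */
    if (w)
        *dst = add_sat_u8 (*dst, *src);
}
```

With `if (w)` the tail is a single conditional store, so the compiler does not have to prove the loop bound itself.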
On Fri, Sep 23, 2011 at 1:21 AM, Taekyun Kim podai...@gmail.com wrote:
Though it would be optimized out by the compiler,
how about removing also w-- and dst++?
Good idea. Updated patch sent.
Matt
___
Pixman mailing list
Pixman@lists.freedesktop.org
Following this email is a series of eight patches which add
the ability to compile pixman's pixman-mmx.c on ARM in order
to use the iwMMXt SIMD instruction set. The purpose of this
work is to improve the compositing performance of the OLPC
XO 1.75.
Care has been taken to ensure that each commit
This will make upcoming ARM usage of pixman-mmx.c unambiguous.
Signed-off-by: Matt Turner matts...@gmail.com
---
configure.ac|8
pixman/Makefile.am |2 +-
pixman/Makefile.win32 |2 +-
pixman/pixman-cpu.c |6 +++---
pixman/pixman-mmx.c |4
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c |8
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index eca6d25..8782d89 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -1784,7 +1784,7
-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 185 +++---
1 files changed, 129 insertions(+), 56 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 8782d89..0317b9a 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman
after NEON, since we expect the NEON
optimizations to be more capable and faster than iwmmxt.
Signed-off-by: Matt Turner matts...@gmail.com
---
configure.ac| 48 +++
pixman/Makefile.am | 12 +++
pixman/pixman-cpu.c
On Mon, Sep 26, 2011 at 4:09 AM, Søren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
b8r8g8 is apparently no longer supported sometime since this code was
commented.
All three patches look good to me.
If you were benchmarking the over_x888_8_() operation
on these platforms.
So, just #error out in the test if the __arm__ preprocessor directive
isn't defined.
Fixes https://bugs.gentoo.org/show_bug.cgi?id=385179
Signed-off-by: Matt Turner matts...@gmail.com
---
configure.ac |3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git
On Thu, Oct 6, 2011 at 7:44 AM, Toms Miks barvins.t...@gmail.com wrote:
accepting -march=iwmmxt on x86 is a bug in GCC
-march just tells gcc for what processor it has to generate the code. It
does not matter what processor the computer actually has. I mean, it's ok to
compile code for ARM on
On Thu, Oct 6, 2011 at 7:15 AM, Søren Sandmann sandm...@cs.au.dk wrote:
#if defined(__GNUC__) && (__GNUC__ < 4 || (__GNUC__ == 3 && __GNUC_MINOR__ < 6))
#error Need GCC >= 4.6 for IWMMXT intrinsics
#endif
where __GNUC__ == 3 should be __GNUC__ == 4?
Yes, looks like a copy-n-pasto from
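To make the question about `__GNUC__ == 3` concrete, here is a minimal sketch of the comparison. The function names and the `HYPOTHETICAL_GCC_TOO_OLD` macro are illustrative and do not come from configure.ac. It shows why the major number inside the inner clause has to match the required major version.

```c
/* Hypothetical helper: nonzero if major.minor is older than need_major.need_minor. */
static int version_older_than (int major, int minor, int need_major, int need_minor)
{
    return major < need_major || (major == need_major && minor < need_minor);
}

/* The check as quoted, with 3 in the inner clause. The inner clause can
 * never add anything, because major == 3 already satisfies major < 4. */
static int quoted_check_rejects (int major, int minor)
{
    return major < 4 || (major == 3 && minor < 6);
}

/* Preprocessor form of the intended test (GCC older than 4.6). */
#if defined(__GNUC__) && (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 6))
#define HYPOTHETICAL_GCC_TOO_OLD 1
#else
#define HYPOTHETICAL_GCC_TOO_OLD 0
#endif
```

With the quoted form, GCC 4.0 through 4.5 pass the check even though they are older than 4.6. With `== 4` they are rejected.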
On Thu, Oct 6, 2011 at 12:14 PM, Matt Turner matts...@gmail.com wrote:
On Thu, Oct 6, 2011 at 7:44 AM, Toms Miks barvins.t...@gmail.com wrote:
accepting -march=iwmmxt on x86 is a bug in GCC
-march just tells gcc for what processor it has to generate the code. It
does not matter what processor
Although not part of the original MMX instruction set, both SSE and
AMD's Extended 3DNow! provide the pshufw instruction.
ARM iwMMXt also has an equivalent instruction, as do the Loongson
Multimedia Instructions.
We can simplify the expand_alpha, expand_alpha_rev, and invert_colors
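For readers unfamiliar with pshufw, a portable model of the shuffle may help. This is a sketch, not the patch: `shuffle_u16x4` and `expand_alpha_model` are hypothetical names, and placing alpha in the top 16-bit lane of an unpacked pixel is an assumption about the data layout.

```c
#include <stdint.h>

/* Portable model of a pshufw-style shuffle on four 16-bit lanes:
 * lane i of the result is lane ((imm >> (2 * i)) & 3) of src. */
static uint64_t shuffle_u16x4 (uint64_t src, unsigned imm)
{
    uint64_t result = 0;
    int i;

    for (i = 0; i < 4; i++)
    {
        unsigned sel = (imm >> (2 * i)) & 3;
        uint64_t lane = (src >> (16 * sel)) & 0xffff;

        result |= lane << (16 * i);
    }
    return result;
}

/* Broadcast lane 3 (assumed to hold alpha) into all four lanes.
 * 0xff is the immediate that selects lane 3 for every output lane. */
static uint64_t expand_alpha_model (uint64_t pixel)
{
    return shuffle_u16x4 (pixel, 0xff);
}
```

One shuffle replaces the sequence of shifts and ORs otherwise needed for the same broadcast, which is consistent with the smaller code the commit message reports.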
Missed in commit 57fd8c37.
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c |4 ++--
pixman/pixman-sse2.c |2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 937ce8f..82f6b54 100644
--- a/pixman
731d .libs/libpixman_mmx_la-pixman-mmx.o
arm
   text    data     bss     dec     hex filename
  31632    1792       0   33424    8290 .libs/libpixman_iwmmxt_la-pixman-mmx.o
  30176    1792       0   31968    7ce0 .libs/libpixman_iwmmxt_la-pixman-mmx.o
Signed-off-by: Matt Turner matts...@gmail.com
The pshufw x86 instruction is part of Extended 3DNow! and SSE1. The
equivalent ARM wshufh instruction was available from the first iwMMXt
instruction set.
This instruction is already used in the SSE2 code.
Reduces code size by ~9%.
amd64
   text    data     bss     dec     hex filename
29925
Missed in commit 57fd8c37.
Signed-off-by: Matt Turner matts...@gmail.com
---
v2: found a few more. I really think that's the last of them.
pixman/pixman-mmx.c | 14 +++---
pixman/pixman-sse2.c |2 +-
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/pixman/pixman
On Sun, Feb 19, 2012 at 4:59 PM, Søren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
The current runtime test checked that MMX extensions were available
before executing code in pixman-mmx.c, even though no MMX extensions
were used.
The new --{enable,disable
On Sun, Feb 19, 2012 at 5:23 PM, Søren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
Are there any interesting chips without MMX extensions? Pentium MMX
definitely is not interesting. If there aren't, I'd rather just get rid
of the ifdefs and unconditionally require
On Sun, Feb 19, 2012 at 6:15 PM, Søren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
On Sun, Feb 19, 2012 at 5:23 PM, Søren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
Are there any interesting chips without MMX extensions? Pentium MMX
731d .libs/libpixman_mmx_la-pixman-mmx.o
arm
   text    data     bss     dec     hex filename
  31632    1792       0   33424    8290 .libs/libpixman_iwmmxt_la-pixman-mmx.o
  30176    1792       0   31968    7ce0 .libs/libpixman_iwmmxt_la-pixman-mmx.o
Signed-off-by: Matt Turner matts...@gmail.com
On Sun, Feb 19, 2012 at 6:41 PM, Søren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
+/* We have to compile with -msse to use xmmintrin.h, but that causes SSE
+ * instructions to be generated that we don't want. Just duplicate the
+ * functions we want to use
Signed-off-by: Matt Turner matts...@gmail.com
---
configure.ac |3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/configure.ac b/configure.ac
index 4f8a0c5..ae092e5 100644
--- a/configure.ac
+++ b/configure.ac
@@ -297,6 +297,9 @@ error Need GCC >= 3.4 for MMX intrinsics
687f .libs/libpixman_mmx_la-pixman-mmx.o
arm
   text    data     bss     dec     hex filename
  30176    1792       0   31968    7ce0 .libs/libpixman_iwmmxt_la-pixman-mmx.o
  27384    1792       0   29176    71f8 .libs/libpixman_iwmmxt_la-pixman-mmx.o
Signed-off-by: Matt Turner matts...@gmail.com
before: over_x888_8_ = L1: 7.63 L2: 7.72 M: 6.44 ( 19.17%) HT:
6.24 VT: 6.11 R: 5.87 RT: 4.61 ( 51Kops/s)
after : over_x888_8_ = L1: 11.88 L2: 11.11 M: 8.70 ( 26.01%) HT:
8.15 VT: 8.07 R: 7.76 RT: 5.62 ( 61Kops/s)
Signed-off-by: Matt Turner matts
needs SSE or 3DNowA
- we can't use -msse, since it'll cause more SSE instructions to be
generated that we don't want
- there is no -m3dnowa flag (anymore?)
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 20 ++--
1 files changed
needs SSE or 3DNowA
- we can't use -msse, since it'll cause more SSE instructions to be
generated that we don't want
- there is no -m3dnowa flag (anymore?)
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 55
On Tue, Feb 21, 2012 at 12:24 PM, Søren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
v4: use inline assembly since the intrinsic needs SSE or 3DNowA
- we can't use -msse, since it'll cause more SSE instructions to be
generated that we don't want
On Tue, Feb 21, 2012 at 1:05 PM, Siarhei Siamashka
siarhei.siamas...@gmail.com wrote:
On Tue, Feb 21, 2012 at 4:59 PM, Nemanja Lukic nlu...@mips.com wrote:
Per previous code review:
Run time detection is still there (per Siarhei's comments), uses
/proc/cpuinfo,
but now properly detects
On Wed, Feb 22, 2012 at 2:27 AM, Jeremy Huddleston
jerem...@freedesktop.org wrote:
I just got my tinderbox back up today and noticed this build regression in
pixman. I haven't looked into it yet, but git-log blames Matt ;)
http://tinderbox.x.org/builds/2012-02-22-0001/logs/pixman/#build
)
over_x888_8_ = L1: 263.69 L2: 260.84 M:247.48 ( 15.65%)
HT:197.37 VT:166.21 R:144.30 RT: 72.07 ( 859Kops/s)
And for good measure, SSE2:
over_x888_8_ = L1: 747.44 L2: 735.23 M:635.67 ( 40.42%)
HT:308.51 VT:230.03 R:198.13 RT: 86.94 (1004Kops/s)
Both patches are Reviewed-by: Matt
Signed-off-by: Matt Turner matts...@gmail.com
---
.gitignore | 23 +++
1 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/.gitignore b/.gitignore
index 1584064..60b5bb4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -26,25 +26,48 @@ stamp-h?
config.h
config.h.in
The load/store functions act as a boundary between the integer
and vector registers.
Consider code like
uint32_t d = *dst;
__m64 vdest = load(d);
The program loads 4 bytes of data into an integer register and then
transfers it to the vector register, when it could have
Allows us to tune how we store data from the vector registers.
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 93 ++-
1 files changed, 47 insertions(+), 46 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman
Allows us to tune how we load data into the vector registers.
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 270 +++
1 files changed, 141 insertions(+), 129 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman
For unaligned loads.
This will be squash-merged with the previous patch.
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 13 ++---
1 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index fe091a2..bd44f63
Signed-off-by: Matt Turner matts...@gmail.com
---
test/lowlevel-blt-bench.c |1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c
index e990f5f..95513ba 100644
--- a/test/lowlevel-blt-bench.c
+++ b/test/lowlevel-blt-bench.c
On Thu, Feb 23, 2012 at 5:49 PM, Søren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
The load/store functions act as a boundary between the integer
and vector registers.
Consider code like
uint32_t d = *dst;
__m64 vdest = load(d
to recognize that it can
load to the vector register directly.
This patch is necessary for the Loongson optimizations when __m64 is
typedef'd as double.
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 42 +-
1 files changed, 21
Signed-off-by: Matt Turner matts...@gmail.com
---
configure.ac | 45 +
pixman/Makefile.am | 12 +++
pixman/loongson-mmintrin.h | 218
pixman/pixman-cpu.c|4 +-
pixman/pixman-mmx.c| 40
Loongson:
add__ = L1: 68.73 L2: 55.09 M: 25.39 ( 68.18%) HT: 25.28 VT:
22.42 R: 20.74 RT: 13.26 ( 131Kops/s)
add__ = L1: 159.19 L2: 114.10 M: 30.74 ( 77.91%) HT: 27.63 VT:
24.99 R: 24.61 RT: 14.49 ( 141Kops/s)
Signed-off-by: Matt Turner matts...@gmail.com
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 10 +++---
1 files changed, 3 insertions(+), 7 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index a2af1b6..8b55b32 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -2158,7 +2158,7
To be used by the next commit.
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 17 +
1 files changed, 17 insertions(+), 0 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 8b55b32..63edf18 100644
--- a/pixman/pixman-mmx.c
+++ b
Signed-off-by: Matt Turner matts...@gmail.com
---
test/lowlevel-blt-bench.c |2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c
index 95513ba..8a39a46 100644
--- a/test/lowlevel-blt-bench.c
+++ b/test/lowlevel-blt-bench.c
.
Reviewed-by: Matt Turner matts...@gmail.com
and narrowing down bugs.
The current list of implementations that can be disabled:
fast
mmx
sse2
arm-simd
arm-neon
mips-dspr2
vmx
And arm-iwmmxt.
Looks good otherwise.
Reviewed-by: Matt Turner matts...@gmail.com
The general and noop implementations can't
On Tue, Feb 28, 2012 at 7:47 AM, Nemanja Lukic nlu...@mips.com wrote:
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz
Referent (before):
cairo-perf-trace:
[ # ] backend test min(s) median(s) stddev. count
[ # ]
On Tue, Feb 28, 2012 at 1:20 PM, Lukic, Nemanja nlu...@mips.com wrote:
Good point.
Only problem there is that address on which we are storing might not be
4-byte aligned (since we are doing memset on array of uint16_t).
But *dest can be aligned (with simple check) before the main loop, and
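The alignment step described here can be sketched as follows. `fill_u16` is a hypothetical name, not the function under review, and using `memcpy` for the 32-bit store is a choice made here to keep the sketch free of aliasing problems.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fill: write `value` into n uint16_t slots, using 32-bit
 * stores once the destination is 4-byte aligned. */
static void fill_u16 (uint16_t *dest, uint16_t value, size_t n)
{
    /* Both halves are equal, so the pair is endian-independent. */
    uint32_t pair = ((uint32_t) value << 16) | value;

    /* Head: a uint16_t pointer is at least 2-byte aligned, so a single
     * 16-bit store is enough to reach 4-byte alignment. */
    if (n > 0 && ((uintptr_t) dest & 3) != 0)
    {
        *dest++ = value;
        n--;
    }

    /* Main loop: two elements per aligned 32-bit store. */
    while (n >= 2)
    {
        memcpy (dest, &pair, sizeof pair);
        dest += 2;
        n -= 2;
    }

    /* Tail: at most one element is left. */
    if (n)
        *dest = value;
}
```

Compilers typically turn the fixed-size `memcpy` into one store instruction. Code that needs a specific instruction would replace it with the platform's own store.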
On Tue, Feb 28, 2012 at 8:57 PM, Søren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
+/* vectors are stored in 64-bit floating-point registers */
+typedef double __m64;
[...]
@@ -114,11 +118,14 @@ _mm_shuffle_pi16 (__m64 __A, int8_t const __N)
* uint64_t
On Tue, Feb 28, 2012 at 9:28 PM, Søren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
diff --git a/pixman/pixman-cpu.c b/pixman/pixman-cpu.c
index 92942b2..1fc9faa 100644
--- a/pixman/pixman-cpu.c
+++ b/pixman/pixman-cpu.c
@@ -690,7 +690,9
On Tue, Feb 28, 2012 at 9:25 PM, Søren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
To be used by the next commit.
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 17 +
1 files changed, 17 insertions(+), 0 deletions
On Thu, Mar 8, 2012 at 5:41 PM, Jeremy Huddleston jerem...@apple.com wrote:
Signed-off-by: Jeremy Huddleston jerem...@apple.com
---
configure.ac | 9 +
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/configure.ac b/configure.ac
index c3c711c..1ca3c02 100644
---
Signed-off-by: Matt Turner matts...@gmail.com
---
configure.ac |4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/configure.ac b/configure.ac
index a920be2..29f881b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -609,7 +609,7 @@ AC_MSG_CHECKING(whether to use MIPS DSPr2
On Wed, Mar 14, 2012 at 5:29 PM, Søren Sandmann sandm...@cs.au.dk wrote:
From: Søren Sandmann Pedersen s...@redhat.com
In 32 bit mode the =A constraint refers to the register pair
edx:eax, but according to GCC developers this is not the case in 64
bit mode, where it refers to rax.
Hence,
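A minimal sketch of the portable alternative follows. It assumes the instruction in question returns its result in edx:eax and uses rdtsc as the example. `read_tsc` and `combine_edx_eax` are hypothetical names, and the non-x86 fallback exists only so the sketch compiles elsewhere.

```c
#include <stdint.h>

/* Combine the halves that the instruction leaves in edx (high) and eax (low). */
static uint64_t combine_edx_eax (uint32_t edx, uint32_t eax)
{
    return ((uint64_t) edx << 32) | eax;
}

static uint64_t read_tsc (void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;

    /* "=a" and "=d" name eax and edx explicitly, so the meaning is the
     * same in 32-bit and 64-bit mode, unlike "=A". */
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return combine_edx_eax (hi, lo);
#else
    return 0; /* no rdtsc on this target; the sketch returns a dummy value */
#endif
}
```

Splitting the result into two 32-bit outputs costs one shift and one OR, and avoids relying on how the compiler interprets the register-pair constraint.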
Yes, and the reason that this is broken is exactly why we have to use
K to tell the compiler that it's an immediate value.
Reviewed-by: Matt Turner matts...@gmail.com
Please commit this.
On Thu, Mar 8, 2012 at 12:41 PM, Jeremy Huddleston jerem...@apple.com wrote:
This code was pretty much copied from a similar commit that I made to
xorg-server in April.
cf: xorg/xserver: bb4d145bd25e2aee988b100ecf1105ea3b6a40b8
Signed-off-by: Jeremy Huddleston jerem...@apple.com
---
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 16 +++-
1 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index f9efd73..7acec6f 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -368,10
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 16 +++-
1 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 7acec6f..137a214 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -349,9
to recognize that it can
load to the vector register directly.
This patch is necessary for the Loongson optimizations when __m64 is
typedef'd as double.
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 54 +-
1 files changed, 27
Signed-off-by: Matt Turner matts...@gmail.com
---
configure.ac | 54 +++
pixman/Makefile.am | 12 +++
pixman/loongson-mmintrin.h | 218
pixman/pixman-cpu.c| 37 ++--
pixman/pixman-mmx.c| 40
Loongson:
add__ = L1: 68.73 L2: 55.09 M: 25.39 ( 68.18%) HT: 25.28 VT:
22.42 R: 20.74 RT: 13.26 ( 131Kops/s)
add__ = L1: 159.19 L2: 114.10 M: 30.74 ( 77.91%) HT: 27.63 VT:
24.99 R: 24.61 RT: 14.49 ( 141Kops/s)
Signed-off-by: Matt Turner matts...@gmail.com
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 10 +++---
1 files changed, 3 insertions(+), 7 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 8cfb281..41c655a 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -2162,7 +2162,7
To be used by the next commit.
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 29 +
1 files changed, 29 insertions(+), 0 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 41c655a..0cc8935 100644
--- a/pixman/pixman
%) HT: 37.88 VT: 41.18 R:
36.14 RT: 15.52 ( 124Kops/s)
in_8_8 = L1: 74.93 L2: 63.00 M: 46.19 ( 27.49%) HT: 33.81 VT: 48.70 R:
44.17 RT: 24.56 ( 152Kops/s)
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 28 +++-
1 files changed, 15
( 44.64%) HT: 33.55 VT: 33.55
R: 28.57 RT: 13.05 ( 103Kops/s)
in_n_8_8 = L1: 75.71 L2: 70.41 M: 49.80 ( 44.99%) HT: 34.87 VT: 34.84
R: 27.77 RT: 13.87 ( 110Kops/s)
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 10 --
1 files changed, 8 insertions(+), 2
%) HT: 52.84 VT: 48.04 R:
44.53 RT: 18.19 ( 131Kops/s)
add_8_8 = L1: 285.81 L2: 217.86 M:102.16 ( 60.34%) HT: 56.68 VT: 53.97 R:
47.76 RT: 19.64 ( 143Kops/s)
Signed-off-by: Matt Turner matts...@gmail.com
---
pixman/pixman-mmx.c | 30 --
1 files changed, 20
Signed-off-by: Matt Turner matts...@gmail.com
---
Looks like an oversight, but maybe there was some reason it wasn't enabled?
pixman/pixman-mmx.c |1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 9d1f6af..4ac9863 100644
On Tue, Apr 3, 2012 at 1:30 PM, Nemanja Lukic nlu...@mips.com wrote:
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz
Referent (before):
lowlevel-blt-bench:
over_n_8_ = L1: 10.71 L2: 10.11 M: 8.70 ( 34.57%) HT: 7.82
VT:
On Tue, Apr 10, 2012 at 1:50 AM, Jeremy Huddleston
jerem...@freedesktop.org wrote:
Newer clangs support the K constraint? Fixed in response to its use
in pixman? :)
And to close the loop on this, the fix has landed in clang trunk and will be
in 3.1:
The pinsrh instruction is analogous to MMX EXT's pinsrw, except like
other Loongson vector instructions it cannot access the general purpose
registers. In the cases of other Loongson vector instructions, this is a
headache, but it is actually a good thing here. Since the instruction is
different
Same story as pinsrh in the previous commit.
   text    data     bss     dec     hex filename
  25336    1952       0   27288    6a98 .libs/libpixman_loongson_mmi_la-pixman-mmx.o
  25072    1952       0   27024    6990 .libs/libpixman_loongson_mmi_la-pixman-mmx.o
-dsll: 95
+dsll: 70
-dsrl: 135
+dsrl: 105
oprofile of xfce4-terminal-a1
2105359.0407 libpixman-1.so.0.25.3 fetch_scanline_a8
1448026.0054 libpixman-1.so.0.25.3 mmx_fetch_a8
Loongson:
add_8_8_8 = L1: 17.98 L2: 17.28 M: 14.28 ( 19.79%) HT: 11.11 VT:
10.38 R: 9.97 RT: 5.14 ( 55Kops/s)
add_8_8_8 =
Loongson:
add_x888_x888 = L1: 29.36 L2: 27.81 M: 14.05 ( 38.74%) HT: 12.45 VT:
11.78 R: 11.52 RT: 7.23 ( 75Kops/s)
add_x888_x888 = L1: 36.06 L2: 34.55 M: 14.81 ( 41.03%) HT: 14.01 VT:
13.41 R: 13.06 RT: 9.06 ( 90Kops/s)
src_x888_8_x888 = L1: 21.92 L2: 20.15 M:
Uses the pmadd technique described in
http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf
Loongson:
src__0565 = L1: 106.13 L2: 83.57 M: 33.46 ( 68.90%) HT: 30.29 VT:
27.67 R: 26.11 RT: 15.06 ( 135Kops/s)
src__0565 = L1: 122.10 L2: 117.53 M:
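As an illustration of the multiply-add idea, here is a scalar model for one 32-bit pixel. It is a sketch under assumptions: the masks and multipliers are a reconstruction for illustration, not constants taken from the patch, and `madd16_model`, `pack_565_madd`, and `pack_565_shift` are hypothetical names. The multiply-add acts as two different shifts applied at once, moving red and blue into place in a single operation.

```c
#include <stdint.h>

/* Model of a 16-bit multiply-add on one 32-bit element:
 * high half times mul_hi plus low half times mul_lo.
 * All operands here are small and non-negative. */
static uint32_t madd16_model (uint32_t x, uint16_t mul_hi, uint16_t mul_lo)
{
    return (x >> 16) * (uint32_t) mul_hi + (x & 0xffff) * (uint32_t) mul_lo;
}

/* Pack x8r8g8b8 into r5g6b5 using the multiply-add model. */
static uint16_t pack_565_madd (uint32_t p)
{
    /* Red's top 5 bits move to bits 16..20, blue's to bits 5..9. */
    uint32_t rb = madd16_model (p & 0x00f800f8, 0x2000, 0x0004);
    /* Green's top 6 bits already sit at bits 10..15. */
    uint32_t g = p & 0x0000fc00;

    return (uint16_t) ((rb | g) >> 5);
}

/* Reference: the same packing with plain shifts and masks. */
static uint16_t pack_565_shift (uint32_t p)
{
    return (uint16_t) (((p >> 8) & 0xf800) | ((p >> 5) & 0x07e0) | ((p >> 3) & 0x001f));
}
```

In a vector implementation the same masks and multipliers would be repeated across lanes so that several pixels are packed per instruction.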
On Thu, Apr 19, 2012 at 5:40 PM, Matt Turner matts...@gmail.com wrote:
Uses the pmadd technique described in
http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf
+static force_inline __m64
+pack_4xpacked565 (__m64 a, __m64 b)
+{
+ __m64 rb0
On Fri, Apr 20, 2012 at 3:43 PM, Matt Turner matts...@gmail.com wrote:
On Thu, Apr 19, 2012 at 5:40 PM, Matt Turner matts...@gmail.com wrote:
Uses the pmadd technique described in
http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf
+static force_inline __m64
I started porting my src__0565 MMX function to SSE2, and in the
process started thinking about using SSE3+. The useful instructions
added post SSE2 that I see are
SSE3: lddqu - for unaligned loads across cache lines
SSSE3: palignr - for unaligned loads (but requires software
If not compiling with -march=iwmmxt, the configure test will still pass,
thinking that the __builtin_arm_* intrinsic is a function instead of
generating a single instruction. Since no linking is done, the configure
test doesn't catch this, and we get linking errors in the build.
---
configure.ac
On Tue, May 15, 2012 at 4:36 PM, Matt Turner matts...@gmail.com wrote:
+#error IWMMXT not enabled (with -march=iwmmxt)
Missing closing
On Tue, May 15, 2012 at 5:37 PM, Siarhei Siamashka
siarhei.siamas...@gmail.com wrote:
I still need to add improvement for that packing/unpacking of the RGBA
pixels after bilinear/before OVER operation, but I don't expect big
improvement there (it is just a couple of instructions).
It's not
On Thu, May 17, 2012 at 5:40 PM, Søren Sandmann sandm...@cs.au.dk wrote:
Søren Sandmann sandm...@cs.au.dk writes:
Given a pixel with only the red component of these values, the results
are off-by-one.
0x03 - 0x19 (0x18)
0x07 - 0x3A (0x39)
0x18 - 0xC5 (0xC6)
0x1C - 0xE6 (0xE7)
(Same for
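The snippet does not say which computation produced which column. One reading that matches all four pairs is that the parenthesized values come from bit replication and the others from rounding v * 255 / 31 to the nearest integer. The sketch below implements both under that assumption, and the function names are hypothetical.

```c
#include <stdint.h>

/* Bit replication: copy the top 3 bits of the 5-bit value into the low bits. */
static uint8_t expand5_replicate (uint8_t v5)
{
    return (uint8_t) ((v5 << 3) | (v5 >> 2));
}

/* Nearest-integer scaling of a 5-bit value to 8 bits: round (v5 * 255 / 31).
 * Adding 15 before the integer division rounds to nearest for a divisor of 31. */
static uint8_t expand5_rounded (uint8_t v5)
{
    return (uint8_t) ((v5 * 255 + 15) / 31);
}
```

For 0x03 the two functions give 0x18 and 0x19 respectively, the same off-by-one difference that the list above shows.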
If not compiling with -march=iwmmxt, the configure test will still pass,
thinking that the __builtin_arm_* intrinsic is a function instead of
generating a single instruction. Since no linking is done, the configure
test doesn't catch this, and we get linking errors in the build.
---
configure.ac
Loongson:
add_0565_0565 = L1: 14.39 L2: 13.98 M: 11.28 ( 15.22%) HT: 10.11 VT:
9.74 R: 9.39 RT: 6.05 ( 67Kops/s)
add_0565_0565 = L1: 15.37 L2: 14.91 M: 11.83 ( 16.06%) HT: 10.53 VT:
10.15 R: 9.74 RT: 6.19 ( 68Kops/s)
ARM/iwMMXt:
add_0565_0565 = L1: 11.12 L2: 10.40
I'll need this code for header and tail alignment loops in MMX, so I
might as well implement a fast path here.
---
pixman/pixman-fast-path.c | 44
1 files changed, 44 insertions(+), 0 deletions(-)
diff --git a/pixman/pixman-fast-path.c
Loongson:
add_0565_0565 = L1: 15.37 L2: 14.91 M: 11.83 ( 16.06%) HT: 10.53 VT:
10.15 R: 9.74 RT: 6.19 ( 68Kops/s)
add_0565_0565 = L1: 45.06 L2: 46.71 M: 27.45 ( 38.00%) HT: 23.76 VT:
22.84 R: 18.96 RT: 9.79 ( 104Kops/s)
ARM/iwMMXt:
add_0565_0565 = L1: 12.87 L2: 11.58
On Fri, May 18, 2012 at 2:41 PM, Matt Turner matts...@gmail.com wrote:
If not compiling with -march=iwmmxt, the configure test will still pass,
thinking that the __builtin_arm_* intrinsic is a function instead of
generating a single instruction. Since no linking is done, the configure
test
Loongson:
over_reverse_n_ = L1: 16.04 L2: 15.35 M: 10.20 ( 27.96%) HT: 10.95
VT: 10.45 R: 9.18 RT: 6.99 ( 76Kops/s)
over_reverse_n_ = L1: 27.40 L2: 26.67 M: 16.97 ( 45.78%) HT: 16.66
VT: 15.38 R: 14.15 RT: 9.44 ( 97Kops/s)
image poppler
On Sat, May 19, 2012 at 9:45 AM, Ingmar Runge ing...@irsoft.de wrote:
From: Ingmar Runge ing...@irsoft.de
---
pixman/pixman-mmx.c | 9 +++--
1 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 01a2bc9..eb02d1a 100644
---
On Sat, May 19, 2012 at 12:34 PM, Søren Sandmann sandm...@cs.au.dk wrote:
Matt Turner matts...@gmail.com writes:
On Fri, May 18, 2012 at 2:41 PM, Matt Turner matts...@gmail.com wrote:
If not compiling with -march=iwmmxt, the configure test will still pass,
thinking that the __builtin_arm_
Fixes spurious test failures.
---
pixman/pixman-mmx.c |2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index bb125bf..ab70275 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -2232,6 +2232,8 @@