Re: [Pixman] [RFC] AVX codepaths

2011-05-21 Thread Matt Turner
On Sat, May 21, 2011 at 6:22 PM, Matt Turner matts...@gmail.com wrote: In the mean time, I've pushed my patches to the avx-optimizations branch of git://anongit.freedesktop.org/~mattst88/pixman. I also added an AVX path for composite_src_x888_. Attached are the cairo-perf-trace results

Re: [Pixman] [PATCH 2/3] mmx: fix unaligned accesses

2011-07-31 Thread Matt Turner
On Sat, Jul 23, 2011 at 10:28 PM, Siarhei Siamashka siarhei.siamas...@gmail.com wrote: The 'test1' function does not look good because it uses ARM instructions to read data one byte at a time and combine it. Function 'test2' looks a bit better because it now uses WALIGNR, but this is still not

[Pixman] [PATCH 1/4] mmx: rename USE_MMX to USE_X86_MMX

2011-08-25 Thread Matt Turner
This will make upcoming ARM usage of pixman-mmx.c unambiguous. Signed-off-by: Matt Turner matts...@gmail.com --- configure.ac|8 pixman/Makefile.am |2 +- pixman/Makefile.win32 |2 +- pixman/pixman-cpu.c |6 +++--- pixman/pixman-mmx.c |4

[Pixman] [PATCH 2/4] mmx: prepare pixman-mmx.c to be compiled for ARM/iwmmxt

2011-08-25 Thread Matt Turner
Adding iwmmxt inline assembly doesn't help performance, so don't bother. Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 13 +++-- 1 files changed, 11 insertions(+), 2 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index 6a68080..7673675

[Pixman] [PATCH 3/4] mmx: fix unaligned accesses

2011-08-25 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 112 --- 1 files changed, 79 insertions(+), 33 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index 7673675..471c51d 100644 --- a/pixman/pixman-mmx.c

[Pixman] [PATCH 4/4] mmx: compile on ARM for iwmmxt optimizations

2011-08-25 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- I left the gcc check at 4.6 in the hopes that my gcc patch will be accepted for some gcc-4.6 release. If it doesn't, the check for 4.6 shouldn't be harmful, because the intrinsic I chose to test is known to cause an unpatched gcc-4.6 to assert

Re: [Pixman] ARM iwmmxt patches

2011-08-31 Thread Matt Turner
On Wed, Aug 31, 2011 at 8:12 AM, Soeren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: I've been trying to figure out if the ARM iwmmxt inline assembly makes any difference at all. I think the conclusion is that it does not. Updated code is here: http

Re: [Pixman] ARM iwmmxt patches

2011-08-31 Thread Matt Turner
On Wed, Aug 31, 2011 at 7:37 PM, Soeren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: Never does using inline assembly seem to make any sort of meaningful difference over simply compiling pixman-mmx.c for ARM/iwmmxt. I tried checking the alignment in the 'wip

[Pixman] [PATCH] lowlevel-blt: add over_x888_8_8888

2011-09-22 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- test/lowlevel-blt-bench.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c index 099e434..bdafb35 100644 --- a/test/lowlevel-blt-bench.c +++ b/test/lowlevel-blt-bench.c

[Pixman] [PATCH] mmx: fix formats in commented code

2011-09-22 Thread Matt Turner
b8r8g8 is apparently no longer supported sometime since this code was commented. Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index 697ec4c..f835a14

[Pixman] [PATCH] mmx: convert while (w) to if (w) when possible

2011-09-22 Thread Matt Turner
gcc isn't able to see that w is no greater than 1, so it generates unnecessary loop instructions with while (w). Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman
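
A minimal sketch of the pattern this patch describes, assuming a main loop that consumes two pixels per iteration so that at most one pixel remains afterwards. The function names are hypothetical and are not taken from pixman-mmx.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from pixman): saturating add of two bytes. */
static uint8_t add_sat_u8 (uint8_t a, uint8_t b)
{
    unsigned s = (unsigned) a + b;
    return (uint8_t) (s > 255 ? 255 : s);
}

/* Hypothetical compositing loop: dst[i] = saturate (dst[i] + src[i]). */
static void add_8_8_sketch (uint8_t *dst, const uint8_t *src, size_t w)
{
    /* Main loop: handles pixels in pairs, so at most one is left over. */
    while (w >= 2)
    {
        dst[0] = add_sat_u8 (dst[0], src[0]);
        dst[1] = add_sat_u8 (dst[1], src[1]);
        dst += 2;
        src += 2;
        w -= 2;
    }

    /* Tail: w is 0 or 1 here. Writing `if` instead of `while` states that
     * directly, and the trailing w-- / dst++ updates can be dropped too. */
    if (w)
        *dst = add_sat_u8 (*dst, *src);
}
```

The behaviour is identical to a `while (w)` tail; the only difference is that the compiler no longer has to prove the loop runs at most once.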

Re: [Pixman] [PATCH] mmx: convert while (w) to if (w) when possible

2011-09-23 Thread Matt Turner
On Fri, Sep 23, 2011 at 1:21 AM, Taekyun Kim podai...@gmail.com wrote: Though it would be optimized out by the compiler, how about removing also w-- and dst++? Good idea. Updated patch sent. Matt ___ Pixman mailing list Pixman@lists.freedesktop.org

[Pixman] [PATCH 0/8]: ARM/iwmmxt optimizations

2011-09-23 Thread Matt Turner
Following this email is a series of eight patches which add the ability to compile pixman's pixman-mmx.c on ARM in order to use the iwMMXt SIMD instruction set. The purpose of this work is to improve the compositing performance of the OLPC XO 1.75. Care has been taken to ensure that each commit

[Pixman] [PATCH 1/8] mmx: rename USE_MMX to USE_X86_MMX

2011-09-23 Thread Matt Turner
This will make upcoming ARM usage of pixman-mmx.c unambiguous. Signed-off-by: Matt Turner matts...@gmail.com --- configure.ac|8 pixman/Makefile.am |2 +- pixman/Makefile.win32 |2 +- pixman/pixman-cpu.c |6 +++--- pixman/pixman-mmx.c |4

[Pixman] [PATCH 2/8] mmx: wrap x86/MMX inline assembly in ifdef USE_X86_MMX

2011-09-23 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index eca6d25..8782d89 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -1784,7 +1784,7

[Pixman] [PATCH 3/8] mmx: fix unaligned accesses

2011-09-23 Thread Matt Turner
-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 185 +++--- 1 files changed, 129 insertions(+), 56 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index 8782d89..0317b9a 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman

[Pixman] [PATCH 5/8] mmx: compile on ARM for iwmmxt optimizations

2011-09-23 Thread Matt Turner
after NEON, since we expect the NEON optimizations to be more capable and faster than iwmmxt. Signed-off-by: Matt Turner matts...@gmail.com --- configure.ac| 48 +++ pixman/Makefile.am | 12 +++ pixman/pixman-cpu.c

Re: [Pixman] [PATCH] mmx: fix formats in commented code

2011-09-26 Thread Matt Turner
On Mon, Sep 26, 2011 at 4:09 AM, Søren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: b8r8g8 is apparently no longer supported sometime since this code was commented. All three patches look good to me. If you were benchmarking the over_x888_8_8888() operation

[Pixman] [PATCH] Make sure iwMMXt is only detected on ARM

2011-10-05 Thread Matt Turner
on these platforms. So, just #error out in the test if the __arm__ preprocessor macro isn't defined. Fixes https://bugs.gentoo.org/show_bug.cgi?id=385179 Signed-off-by: Matt Turner matts...@gmail.com --- configure.ac |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git

Re: [Pixman] [PATCH] Make sure iwMMXt is only detected on ARM

2011-10-06 Thread Matt Turner
On Thu, Oct 6, 2011 at 7:44 AM, Toms Miks barvins.t...@gmail.com wrote: accepting -march=iwmmxt on x86 is a bug in GCC -march just tells gcc for what processor it has to generate the code. It does not matter what processor the computer actually has. I mean, it's ok to compile code for ARM on

Re: [Pixman] [PATCH] Make sure iwMMXt is only detected on ARM

2011-10-06 Thread Matt Turner
On Thu, Oct 6, 2011 at 7:15 AM, Søren Sandmann sandm...@cs.au.dk wrote:
    #if defined(__GNUC__) && (__GNUC__ < 4 || (__GNUC__ == 3 && __GNUC_MINOR__ < 6))
    #error Need GCC >= 4.6 for IWMMXT intrinsics
    #endif
where __GNUC__ == 3 should be __GNUC__ == 4? Yes, looks like a copy-n-pasto from
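
To make the reported typo concrete, here is a small sketch that models the version comparison as two plain C functions. It assumes the operators dropped by the archive were `&&` and `<`; the function names are hypothetical.

```c
/* Models the check as quoted in the thread (with __GNUC__ == 3).
 * Returns 1 when the compiler version would be rejected. */
static int too_old_as_quoted (int major, int minor)
{
    return major < 4 || (major == 3 && minor < 6);
}

/* Models the check with the suggested fix (__GNUC__ == 4). */
static int too_old_fixed (int major, int minor)
{
    return major < 4 || (major == 4 && minor < 6);
}
```

With the quoted form the second clause can never add anything, because any 3.x compiler is already caught by `major < 4`; the effect is that GCC 4.0 through 4.5 slip past a check that is meant to require 4.6.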

Re: [Pixman] [PATCH] Make sure iwMMXt is only detected on ARM

2011-10-06 Thread Matt Turner
On Thu, Oct 6, 2011 at 12:14 PM, Matt Turner matts...@gmail.com wrote: On Thu, Oct 6, 2011 at 7:44 AM, Toms Miks barvins.t...@gmail.com wrote: accepting -march=iwmmxt on x86 is a bug in GCC -march just tells gcc for what processor it has to generate the code. It does not matter what processor

[Pixman] [RFC PATCH] mmx: Use shuffle instruction when available

2012-02-12 Thread Matt Turner
Although not part of the original MMX instruction set, both SSE and AMD's Extended 3DNow! provide the pshufw instruction. ARM iwMMXt also has an equivalent instruction, as do the Loongson Multimedia Instructions. We can simplify the expand_alpha, expand_alpha_rev, and invert_colors
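
As a rough illustration of why a 16-bit shuffle simplifies these helpers, the sketch below models pshufw on a 64-bit integer holding four 16-bit lanes. It is a portable scalar model, not the MMX code; the `_sketch` names and the lane layout (alpha in the highest lane) are assumptions.

```c
#include <stdint.h>

/* Scalar model of pshufw / _mm_shuffle_pi16: result lane i takes the
 * source lane selected by bits [2i+1:2i] of imm8. */
static uint64_t shuffle_pi16_model (uint64_t v, unsigned imm8)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < 4; i++)
    {
        unsigned sel = (imm8 >> (2 * i)) & 3;
        uint64_t lane = (v >> (16 * sel)) & 0xffff;
        r |= lane << (16 * i);
    }
    return r;
}

/* Broadcast lane 3 (assumed to hold alpha) to all lanes. */
static uint64_t expand_alpha_sketch (uint64_t pixel)
{
    return shuffle_pi16_model (pixel, 0xff);   /* selectors 3,3,3,3 */
}

/* Broadcast lane 0 (alpha when the channel order is reversed). */
static uint64_t expand_alpha_rev_sketch (uint64_t pixel)
{
    return shuffle_pi16_model (pixel, 0x00);   /* selectors 0,0,0,0 */
}

/* Swap lanes 0 and 2 (red and blue), keep lanes 1 and 3. */
static uint64_t invert_colors_sketch (uint64_t pixel)
{
    return shuffle_pi16_model (pixel, (3u << 6) | (0u << 4) | (1u << 2) | 2u);
}
```

Each helper becomes a single shuffle, where plain MMX would need a sequence of shifts and ORs to move the lanes around.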

[Pixman] [PATCH] Convert while (w) to if (w) when possible

2012-02-17 Thread Matt Turner
Missed in commit 57fd8c37. Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c |4 ++-- pixman/pixman-sse2.c |2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index 937ce8f..82f6b54 100644 --- a/pixman

[Pixman] [PATCH 2/3] mmx: Use _mm_mulhi_pu16 when available

2012-02-19 Thread Matt Turner
731d .libs/libpixman_mmx_la-pixman-mmx.o
arm
   text    data     bss     dec     hex filename
  31632    1792       0   33424    8290 .libs/libpixman_iwmmxt_la-pixman-mmx.o
  30176    1792       0   31968    7ce0 .libs/libpixman_iwmmxt_la-pixman-mmx.o
Signed-off-by: Matt Turner matts...@gmail.com
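
The excerpt does not show where the unsigned high multiply is used. One plausible use, offered here purely as an assumption, is the divide-by-255 step of an 8-bit channel multiply: multiplying by 0x0101 and keeping the high 16 bits gives the same result as the usual shift-and-add sequence. The sketch below models a single 16-bit lane in portable C; all names are hypothetical.

```c
#include <stdint.h>

/* Scalar model of one lane of pmulhuw / _mm_mulhi_pu16:
 * high 16 bits of an unsigned 16x16 multiply. */
static uint16_t mulhi_u16_model (uint16_t a, uint16_t b)
{
    return (uint16_t) (((uint32_t) a * b) >> 16);
}

/* round (a * b / 255) using the shift-and-add form. */
static uint8_t mul_un8_shift (uint8_t a, uint8_t b)
{
    uint16_t t = (uint16_t) (a * b + 0x80);
    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Same value using a single high multiply by 0x0101. */
static uint8_t mul_un8_mulhi (uint8_t a, uint8_t b)
{
    uint16_t t = (uint16_t) (a * b + 0x80);
    return (uint8_t) mulhi_u16_model (t, 0x0101);
}

/* Counts the (a, b) pairs for which either form differs from exact
 * rounding of a*b/255. */
static int count_mismatches (void)
{
    int bad = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
        {
            int exact = (2 * a * b + 255) / 510;
            if (mul_un8_shift ((uint8_t) a, (uint8_t) b) != exact ||
                mul_un8_mulhi ((uint8_t) a, (uint8_t) b) != exact)
                bad++;
        }
    return bad;
}
```

If this is indeed the use, the saving comes from replacing a shift, an add, and another shift with one multiply per vector, which would be consistent with the smaller text size reported in the patch.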

[Pixman] [PATCH 3/3] mmx: Use _mm_shuffle_pi16 when available

2012-02-19 Thread Matt Turner
The pshufw x86 instruction is part of Extended 3DNow! and SSE1. The equivalent ARM wshufh instruction was available from the first iwMMXt instruction set. This instruction is already used in the SSE2 code. Reduces code size by ~9%. amd64 text data bss dec hex filename 29925

[Pixman] [PATCH v2] Convert while (w) to if (w) when possible

2012-02-19 Thread Matt Turner
Missed in commit 57fd8c37. Signed-off-by: Matt Turner matts...@gmail.com --- v2: found a few more. I really think that's the last of them. pixman/pixman-mmx.c | 14 +++--- pixman/pixman-sse2.c |2 +- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/pixman/pixman

Re: [Pixman] [PATCH 1/3] autoconf: add MMX EXT support check

2012-02-19 Thread Matt Turner
On Sun, Feb 19, 2012 at 4:59 PM, Søren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: The current runtime test checked that MMX extensions were available before executing code in pixman-mmx.c, even though no MMX extensions were used. The new --{enable,disable

Re: [Pixman] [PATCH 1/3] autoconf: add MMX EXT support check

2012-02-19 Thread Matt Turner
On Sun, Feb 19, 2012 at 5:23 PM, Søren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: Are there any interesting chips without MMX extensions? Pentium MMX definitely is not interesting. If there aren't, I'd rather just get rid of the ifdefs and unconditionally require

Re: [Pixman] [PATCH 1/3] autoconf: add MMX EXT support check

2012-02-19 Thread Matt Turner
On Sun, Feb 19, 2012 at 6:15 PM, Søren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: On Sun, Feb 19, 2012 at 5:23 PM, Søren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: Are there any interesting chips without MMX extensions? Pentium MMX

[Pixman] [PATCH v2 1/2] mmx: Use _mm_mulhi_pu16

2012-02-19 Thread Matt Turner
731d .libs/libpixman_mmx_la-pixman-mmx.o
arm
   text    data     bss     dec     hex filename
  31632    1792       0   33424    8290 .libs/libpixman_iwmmxt_la-pixman-mmx.o
  30176    1792       0   31968    7ce0 .libs/libpixman_iwmmxt_la-pixman-mmx.o
Signed-off-by: Matt Turner matts...@gmail.com

Re: [Pixman] [PATCH v2 1/2] mmx: Use _mm_mulhi_pu16

2012-02-19 Thread Matt Turner
On Sun, Feb 19, 2012 at 6:41 PM, Søren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: +/* We have to compile with -msse to use xmmintrin.h, but that causes SSE + * instructions to be generated that we don't want. Just duplicate the + * functions we want to use

[Pixman] [PATCH v3 1/3] autoconf: test MMX extension instructions

2012-02-20 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- configure.ac |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/configure.ac b/configure.ac index 4f8a0c5..ae092e5 100644 --- a/configure.ac +++ b/configure.ac @@ -297,6 +297,9 @@ error Need GCC >= 3.4 for MMX intrinsics

[Pixman] [PATCH v3 3/3] mmx: Use _mm_shuffle_pi16

2012-02-20 Thread Matt Turner
687f .libs/libpixman_mmx_la-pixman-mmx.o
arm
   text    data     bss     dec     hex filename
  30176    1792       0   31968    7ce0 .libs/libpixman_iwmmxt_la-pixman-mmx.o
  27384    1792       0   29176    71f8 .libs/libpixman_iwmmxt_la-pixman-mmx.o
Signed-off-by: Matt Turner matts...@gmail.com

[Pixman] [PATCH] mmx: enable over_x888_8_8888 on ARM/iwMMXt

2012-02-20 Thread Matt Turner
before: over_x888_8_8888 = L1: 7.63 L2: 7.72 M: 6.44 ( 19.17%) HT: 6.24 VT: 6.11 R: 5.87 RT: 4.61 ( 51Kops/s)
after : over_x888_8_8888 = L1: 11.88 L2: 11.11 M: 8.70 ( 26.01%) HT: 8.15 VT: 8.07 R: 7.76 RT: 5.62 ( 61Kops/s)
Signed-off-by: Matt Turner matts

[Pixman] [PATCH v4 1/2] mmx: Use _mm_mulhi_pu16

2012-02-21 Thread Matt Turner
needs SSE or 3DNowA - we can't use -msse, since it'll cause more SSE instructions to be generated that we don't want - there is no -m3dnowa flag (anymore?) Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 20 ++-- 1 files changed

[Pixman] [PATCH v4 2/2] mmx: Use _mm_shuffle_pi16

2012-02-21 Thread Matt Turner
needs SSE or 3DNowA - we can't use -msse, since it'll cause more SSE instructions to be generated that we don't want - there is no -m3dnowa flag (anymore?) Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 55

Re: [Pixman] [PATCH v4 2/2] mmx: Use _mm_shuffle_pi16

2012-02-21 Thread Matt Turner
On Tue, Feb 21, 2012 at 12:24 PM, Søren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: v4: use inline assembly since the intrinsic needs SSE or 3DNowA       - we can't use -msse, since it'll cause more SSE instructions to be         generated that we don't want

Re: [Pixman] Basic infrastructure for MIPS architecture and initial set of SRC routines.

2012-02-21 Thread Matt Turner
On Tue, Feb 21, 2012 at 1:05 PM, Siarhei Siamashka siarhei.siamas...@gmail.com wrote: On Tue, Feb 21, 2012 at 4:59 PM, Nemanja Lukic nlu...@mips.com wrote: Per previous code review: Run time detection is still there (per Siarhei's comments), uses /proc/cpuinfo, but now properly detects

Re: [Pixman] mmx build regression

2012-02-22 Thread Matt Turner
On Wed, Feb 22, 2012 at 2:27 AM, Jeremy Huddleston jerem...@freedesktop.org wrote: I just got my tinderbox back up today and noticed this build regression in pixman.  I haven't looked into it yet, but git-log blames Matt ;) http://tinderbox.x.org/builds/2012-02-22-0001/logs/pixman/#build

Re: [Pixman] [PATCH 1/2] mmx: Enable over_x888_8_8888() for x86 as well.

2012-02-22 Thread Matt Turner
) over_x888_8_8888 = L1: 263.69 L2: 260.84 M:247.48 ( 15.65%) HT:197.37 VT:166.21 R:144.30 RT: 72.07 ( 859Kops/s)
And for good measure, SSE2:
over_x888_8_8888 = L1: 747.44 L2: 735.23 M:635.67 ( 40.42%) HT:308.51 VT:230.03 R:198.13 RT: 86.94 (1004Kops/s)
Both patches are Reviewed-by: Matt

[Pixman] [PATCH] Update .gitignore with more demos and tests

2012-02-22 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- .gitignore | 23 +++ 1 files changed, 23 insertions(+), 0 deletions(-) diff --git a/.gitignore b/.gitignore index 1584064..60b5bb4 100644 --- a/.gitignore +++ b/.gitignore @@ -26,25 +26,48 @@ stamp-h? config.h config.h.in

[Pixman] mmx: Improving the load8888/store8888 functions

2012-02-22 Thread Matt Turner
The load/store functions act as a boundary between the integer and vector registers. Consider code like uint32_t d = *dst; __m64 vdest = load(d); The program loads 4 bytes of data into an integer register and then transfers it to the vector register, when it could have
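
A portable sketch of the interface change being discussed: the load helper takes a pointer to the pixel rather than its value, and the store helper takes the destination pointer. A `uint64_t` stands in for `__m64` here, and the `_sketch` names are hypothetical; in the real code the point is that the implementation can then choose a load or store that goes straight to or from a vector register.

```c
#include <stdint.h>

typedef uint64_t vec4x16;   /* stand-in for __m64: four 16-bit lanes */

/* Load one a8r8g8b8 pixel from memory and widen each byte to 16 bits. */
static vec4x16 load8888_sketch (const uint32_t *p)
{
    uint32_t v = *p;
    return  (vec4x16) (v & 0xff)
         | ((vec4x16) ((v >> 8) & 0xff) << 16)
         | ((vec4x16) ((v >> 16) & 0xff) << 32)
         | ((vec4x16) ((v >> 24) & 0xff) << 48);
}

/* Narrow four 16-bit lanes back to bytes and store them. Lanes are
 * treated as signed and clamped to 0..255, as packuswb would do. */
static void store8888_sketch (uint32_t *dest, vec4x16 v)
{
    uint32_t out = 0;
    for (unsigned i = 0; i < 4; i++)
    {
        uint32_t lane = (uint32_t) ((v >> (16 * i)) & 0xffff);
        if (lane & 0x8000)
            lane = 0;          /* negative as int16_t: clamp to 0 */
        else if (lane > 0xff)
            lane = 0xff;       /* too large: clamp to 255 */
        out |= lane << (8 * i);
    }
    *dest = out;
}
```

Callers then write `load8888_sketch (dst)` instead of first reading `*dst` into an integer variable, which leaves the choice of load instruction to the helper.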

[Pixman] [PATCH 1/3] mmx: make store8888 take uint32_t *dest as argument

2012-02-22 Thread Matt Turner
Allows us to tune how we store data from the vector registers. Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 93 ++- 1 files changed, 47 insertions(+), 46 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman

[Pixman] [PATCH 2/3] mmx: make load8888 take a pointer to data instead of the data itself

2012-02-22 Thread Matt Turner
Allows us to tune how we load data into the vector registers. Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 270 +++ 1 files changed, 141 insertions(+), 129 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman

[Pixman] [PATCH 3/3] mmx: define and use load8888u function

2012-02-22 Thread Matt Turner
For unaligned loads. This will be squash-merged with the previous patch. Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 13 ++--- 1 files changed, 10 insertions(+), 3 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index fe091a2..bd44f63

[Pixman] [PATCH 2/2] lowlevel-blt: add over_x888_n_8888

2012-02-23 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- test/lowlevel-blt-bench.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c index e990f5f..95513ba 100644 --- a/test/lowlevel-blt-bench.c +++ b/test/lowlevel-blt-bench.c

Re: [Pixman] mmx: Improving the load8888/store8888 functions

2012-02-24 Thread Matt Turner
On Thu, Feb 23, 2012 at 5:49 PM, Søren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: The load/store functions act as a boundary between the integer and vector registers. Consider code like     uint32_t d = *dst;     __m64 vdest = load(d

[Pixman] [PATCH 1/6] mmx: make ldq_u take __m64* directly

2012-02-24 Thread Matt Turner
to recognize that it can load to the vector register directly. This patch is necessary for the Loongson optimizations when __m64 is typedef'd as double. Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 42 +- 1 files changed, 21

[Pixman] [PATCH 2/6] mmx: compile on MIPS for Loongson MMI optimizations

2012-02-24 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- configure.ac | 45 + pixman/Makefile.am | 12 +++ pixman/loongson-mmintrin.h | 218 pixman/pixman-cpu.c|4 +- pixman/pixman-mmx.c| 40

[Pixman] [PATCH 3/6] mmx: remove unnecessary uint64_t-__m64 conversions

2012-02-24 Thread Matt Turner
Loongson: add__ = L1: 68.73 L2: 55.09 M: 25.39 ( 68.18%) HT: 25.28 VT: 22.42 R: 20.74 RT: 13.26 ( 131Kops/s) add__ = L1: 159.19 L2: 114.10 M: 30.74 ( 77.91%) HT: 27.63 VT: 24.99 R: 24.61 RT: 14.49 ( 141Kops/s) Signed-off-by: Matt Turner matts...@gmail.com

[Pixman] [PATCH 4/6] mmx: simplify srcsrcsrcsrc calculation in over_n_8_0565

2012-02-24 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 10 +++--- 1 files changed, 3 insertions(+), 7 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index a2af1b6..8b55b32 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -2158,7 +2158,7

[Pixman] [PATCH 5/6] mmx: introduce is_equal and is_zero functions

2012-02-24 Thread Matt Turner
To be used by the next commit. Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 17 + 1 files changed, 17 insertions(+), 0 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index 8b55b32..63edf18 100644 --- a/pixman/pixman-mmx.c +++ b

[Pixman] [PATCH] lowlevel-blt-bench: add in_8_8 and in_n_8_8

2012-02-24 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- test/lowlevel-blt-bench.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c index 95513ba..8a39a46 100644 --- a/test/lowlevel-blt-bench.c +++ b/test/lowlevel-blt-bench.c

Re: [Pixman] [PATCH 1/1] Disable implementations mentioned in the PIXMAN_DISABLE environment variable.

2012-02-26 Thread Matt Turner
. Reviewed-by: Matt Turner matts...@gmail.com

Re: [Pixman] [PATCH] Disable implementations mentioned in the PIXMAN_DISABLE environment variable.

2012-02-26 Thread Matt Turner
and narrowing down bugs. The current list of implementations that can be disabled:    fast    mmx    sse2    arm-simd    arm-neon    mips-dspr2    vmx And arm-iwmmxt. Looks good otherwise. Reviewed-by: Matt Turner matts...@gmail.com The general and noop implementations can't
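
A small sketch of how a check against such a variable could look. It assumes the value is a space-separated list of implementation names (for example `PIXMAN_DISABLE="sse2 mmx"`); the function name and the exact matching rule are hypothetical and not taken from the patch under review.

```c
#include <string.h>

/* Returns 1 if `name` appears as a whole space-separated word in `list`.
 * `list` would typically be the result of getenv ("PIXMAN_DISABLE"). */
static int is_disabled_sketch (const char *list, const char *name)
{
    size_t n;

    if (!list || !name)
        return 0;

    n = strlen (name);
    if (n == 0)
        return 0;

    while (*list)
    {
        const char *end;

        while (*list == ' ')            /* skip separators */
            list++;

        end = list;
        while (*end && *end != ' ')     /* find the end of this word */
            end++;

        if ((size_t) (end - list) == n && memcmp (list, name, n) == 0)
            return 1;

        list = end;
    }

    return 0;
}
```

Matching whole words matters because some names share prefixes or substrings: disabling `arm-neon` should not also disable a hypothetical `neon`, and `sse2` should not match `sse`.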

Re: [Pixman] [PATCH] MIPS: DSPr2: Added mips_dspr2_blt and mips_dspr2_fill routines.

2012-02-28 Thread Matt Turner
On Tue, Feb 28, 2012 at 7:47 AM, Nemanja Lukic nlu...@mips.com wrote: From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz Referent (before): cairo-perf-trace: [ # ]  backend                         test   min(s) median(s) stddev. count [ # ]    

Re: [Pixman] [PATCH] MIPS: DSPr2: Added mips_dspr2_blt and mips_dspr2_fill routines.

2012-02-28 Thread Matt Turner
On Tue, Feb 28, 2012 at 1:20 PM, Lukic, Nemanja nlu...@mips.com wrote: Good point. Only problem there is that address on which we are storing might not be 4-byte aligned (since we are doing memset on array of uint16_t). But *dest can be aligned (with simple check) before the main loop, and
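
The approach sketched in this reply (align the destination first, then use wider stores in the main loop) could look roughly like the following in portable C. This is an illustration under assumptions, not the DSPr2 assembly from the patch; the function name is hypothetical and `memcpy` stands in for a 32-bit store instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill n 16-bit pixels with `value`, using 32-bit stores where possible. */
static void fill16_sketch (uint16_t *dst, size_t n, uint16_t value)
{
    /* Both halves are identical, so byte order does not matter. */
    uint32_t pair = ((uint32_t) value << 16) | value;

    /* Head: one 16-bit store is enough to reach 4-byte alignment,
     * since a uint16_t pointer is already 2-byte aligned. */
    if (n && ((uintptr_t) dst & 3))
    {
        *dst++ = value;
        n--;
    }

    /* Main loop: dst is now 4-byte aligned, two pixels per store. */
    while (n >= 2)
    {
        memcpy (dst, &pair, sizeof pair);
        dst += 2;
        n -= 2;
    }

    /* Tail: at most one pixel left. */
    if (n)
        *dst = value;
}
```

The alignment check runs once per scanline, so its cost is negligible next to the halved number of stores in the main loop.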

Re: [Pixman] [PATCH 2/6] mmx: compile on MIPS for Loongson MMI optimizations

2012-02-28 Thread Matt Turner
On Tue, Feb 28, 2012 at 8:57 PM, Søren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: +/* vectors are stored in 64-bit floating-point registers */ +typedef double __m64; [...] @@ -114,11 +118,14 @@ _mm_shuffle_pi16 (__m64 __A, int8_t const __N)   * uint64_t

Re: [Pixman] [PATCH 2/6] mmx: compile on MIPS for Loongson MMI optimizations

2012-02-28 Thread Matt Turner
On Tue, Feb 28, 2012 at 9:28 PM, Søren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: diff --git a/pixman/pixman-cpu.c b/pixman/pixman-cpu.c index 92942b2..1fc9faa 100644 --- a/pixman/pixman-cpu.c +++ b/pixman/pixman-cpu.c @@ -690,7 +690,9

Re: [Pixman] [PATCH 5/6] mmx: introduce is_equal and is_zero functions

2012-02-28 Thread Matt Turner
On Tue, Feb 28, 2012 at 9:25 PM, Søren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: To be used by the next commit. Signed-off-by: Matt Turner matts...@gmail.com ---  pixman/pixman-mmx.c |   17 +  1 files changed, 17 insertions(+), 0 deletions

Re: [Pixman] [PATCH 2/4] Disable MMX when incompatible clang is being used.

2012-03-08 Thread Matt Turner
On Thu, Mar 8, 2012 at 5:41 PM, Jeremy Huddleston jerem...@apple.com wrote: Signed-off-by: Jeremy Huddleston jerem...@apple.com ---  configure.ac |    9 +  1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/configure.ac b/configure.ac index c3c711c..1ca3c02 100644 ---

[Pixman] [PATCH] Use AC_LANG_SOURCE for DSPr2 configure program

2012-03-14 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- configure.ac |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/configure.ac b/configure.ac index a920be2..29f881b 100644 --- a/configure.ac +++ b/configure.ac @@ -609,7 +609,7 @@ AC_MSG_CHECKING(whether to use MIPS DSPr2

Re: [Pixman] [PATCH] Use =a and =d constraints for rdtsc inline assembly

2012-03-14 Thread Matt Turner
On Wed, Mar 14, 2012 at 5:29 PM, Søren Sandmann sandm...@cs.au.dk wrote: From: Søren Sandmann Pedersen s...@redhat.com In 32 bit mode the =A constraint refers to the register pair edx:eax, but according to GCC developers this is not the case in 64 bit mode, where it refers to rax. Hence,
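
A minimal sketch of what the subject line describes: reading the two halves of the counter through separate `=a` and `=d` outputs and combining them by hand, which behaves the same in 32-bit and 64-bit mode. The function names are hypothetical, and the non-x86 fallback that returns 0 is only there so the sketch compiles everywhere.

```c
#include <stdint.h>

/* Combine the low (eax) and high (edx) halves into one 64-bit value. */
static uint64_t combine_halves (uint32_t lo, uint32_t hi)
{
    return ((uint64_t) hi << 32) | lo;
}

static uint64_t read_tsc_sketch (void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;

    /* "=a" binds lo to eax and "=d" binds hi to edx, so the result does
     * not depend on how "=A" is interpreted in 64-bit mode. */
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return combine_halves (lo, hi);
#else
    return 0;   /* placeholder: no rdtsc on this target */
#endif
}
```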

Re: [Pixman] [PATCH] Fix a false-negative in MMX check

2012-03-14 Thread Matt Turner
Yes, and the reason that this is broken is exactly why we have to use K to tell the compiler that it's an immediate value. Reviewed-by: Matt Turner matts...@gmail.com Please commit this.

Re: [Pixman] [PATCH 4/4] Expand TLS support beyond __thread to __declspec(thread)

2012-03-14 Thread Matt Turner
On Thu, Mar 8, 2012 at 12:41 PM, Jeremy Huddleston jerem...@apple.com wrote: This code was pretty much copied from a similar commit that I made to xorg-server in April. cf: xorg/xserver: bb4d145bd25e2aee988b100ecf1105ea3b6a40b8 Signed-off-by: Jeremy Huddleston jerem...@apple.com ---

[Pixman] [PATCH 01/11] mmx: add store function and use it in add_8888_8888

2012-03-14 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 16 +++- 1 files changed, 11 insertions(+), 5 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index f9efd73..7acec6f 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -368,10

[Pixman] [PATCH 02/11] mmx: add load function and use it in add_8888_8888

2012-03-14 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 16 +++- 1 files changed, 11 insertions(+), 5 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index 7acec6f..137a214 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -349,9

[Pixman] [PATCH 03/11] mmx: make ldq_u take __m64* directly

2012-03-14 Thread Matt Turner
to recognize that it can load to the vector register directly. This patch is necessary for the Loongson optimizations when __m64 is typedef'd as double. Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 54 +- 1 files changed, 27

[Pixman] [PATCH 04/11] mmx: compile on MIPS for Loongson MMI optimizations

2012-03-14 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- configure.ac | 54 +++ pixman/Makefile.am | 12 +++ pixman/loongson-mmintrin.h | 218 pixman/pixman-cpu.c| 37 ++-- pixman/pixman-mmx.c| 40

[Pixman] [PATCH 05/11] mmx: remove unnecessary uint64_t-__m64 conversions

2012-03-14 Thread Matt Turner
Loongson: add__ = L1: 68.73 L2: 55.09 M: 25.39 ( 68.18%) HT: 25.28 VT: 22.42 R: 20.74 RT: 13.26 ( 131Kops/s) add__ = L1: 159.19 L2: 114.10 M: 30.74 ( 77.91%) HT: 27.63 VT: 24.99 R: 24.61 RT: 14.49 ( 141Kops/s) Signed-off-by: Matt Turner matts...@gmail.com

[Pixman] [PATCH 06/11] mmx: simplify srcsrcsrcsrc calculation in over_n_8_0565

2012-03-14 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 10 +++--- 1 files changed, 3 insertions(+), 7 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index 8cfb281..41c655a 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -2162,7 +2162,7

[Pixman] [PATCH 07/11] mmx: introduce is_equal and is_zero functions

2012-03-14 Thread Matt Turner
To be used by the next commit. Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 29 + 1 files changed, 29 insertions(+), 0 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index 41c655a..0cc8935 100644 --- a/pixman/pixman

[Pixman] [PATCH 09/11] mmx: optimize in_8_8

2012-03-14 Thread Matt Turner
%) HT: 37.88 VT: 41.18 R: 36.14 RT: 15.52 ( 124Kops/s) in_8_8 = L1: 74.93 L2: 63.00 M: 46.19 ( 27.49%) HT: 33.81 VT: 48.70 R: 44.17 RT: 24.56 ( 152Kops/s) Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 28 +++- 1 files changed, 15

[Pixman] [PATCH 10/11] mmx: optimize in_n_8_8

2012-03-14 Thread Matt Turner
( 44.64%) HT: 33.55 VT: 33.55 R: 28.57 RT: 13.05 ( 103Kops/s) in_n_8_8 = L1: 75.71 L2: 70.41 M: 49.80 ( 44.99%) HT: 34.87 VT: 34.84 R: 27.77 RT: 13.87 ( 110Kops/s) Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 10 -- 1 files changed, 8 insertions(+), 2

[Pixman] [PATCH 11/11] mmx: optimize add_8_8

2012-03-14 Thread Matt Turner
%) HT: 52.84 VT: 48.04 R: 44.53 RT: 18.19 ( 131Kops/s) add_8_8 = L1: 285.81 L2: 217.86 M:102.16 ( 60.34%) HT: 56.68 VT: 53.97 R: 47.76 RT: 19.64 ( 143Kops/s) Signed-off-by: Matt Turner matts...@gmail.com --- pixman/pixman-mmx.c | 30 -- 1 files changed, 20

[Pixman] [PATCH] mmx: enable over_n_0565 for b5g6r5

2012-03-15 Thread Matt Turner
Signed-off-by: Matt Turner matts...@gmail.com --- Looks like an oversight, but maybe there was some reason it wasn't enabled? pixman/pixman-mmx.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index 9d1f6af..4ac9863 100644

Re: [Pixman] [PATCH] MIPS: DSPr2: Added over_n_8_8888 and over_n_8_0565 fast paths.

2012-04-03 Thread Matt Turner
On Tue, Apr 3, 2012 at 1:30 PM, Nemanja Lukic nlu...@mips.com wrote: From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz Referent (before): lowlevel-blt-bench: over_n_8_8888 = L1: 10.71 L2: 10.11 M: 8.70 ( 34.57%) HT: 7.82 VT:

Re: [Pixman] xmmintrin.h (was [PATCH 2/4] Disable MMX when incompatible clang is being used.)

2012-04-10 Thread Matt Turner
On Tue, Apr 10, 2012 at 1:50 AM, Jeremy Huddleston jerem...@freedesktop.org wrote: Newer clangs support the K constraint? Fixed in response to its use in pixman? :) And to close the loop on this, the fix has landed in clang trunk and will be in 3.1:

[Pixman] [PATCH 1/2] mmx: Use Loongson pinsrh instruction in pack_565

2012-04-17 Thread Matt Turner
The pinsrh instruction is analogous to MMX EXT's pinsrw, except like other Loongson vector instructions it cannot access the general purpose registers. In the cases of other Loongson vector instructions, this is a headache, but it is actually a good thing here. Since the instruction is different

[Pixman] [PATCH 2/2] mmx: Use Loongson pextrh instruction in expand565

2012-04-17 Thread Matt Turner
Same story as pinsrh in the previous commit.
   text    data     bss     dec     hex filename
  25336    1952       0   27288    6a98 .libs/libpixman_loongson_mmi_la-pixman-mmx.o
  25072    1952       0   27024    6990 .libs/libpixman_loongson_mmi_la-pixman-mmx.o
-dsll: 95 +dsll: 70
-dsrl: 135 +dsrl: 105

[Pixman] [PATCH 2/3] mmx: add a8 fetcher

2012-04-18 Thread Matt Turner
oprofile of xfce4-terminal-a1 2105359.0407 libpixman-1.so.0.25.3fetch_scanline_a8 1448026.0054 libpixman-1.so.0.25.3mmx_fetch_a8 Loongson: add_8_8_8 = L1: 17.98 L2: 17.28 M: 14.28 ( 19.79%) HT: 11.11 VT: 10.38 R: 9.97 RT: 5.14 ( 55Kops/s) add_8_8_8 =

[Pixman] [PATCH 3/3] mmx: add x8f8g8b8 fetcher

2012-04-18 Thread Matt Turner
Loongson: add_x888_x888 = L1: 29.36 L2: 27.81 M: 14.05 ( 38.74%) HT: 12.45 VT: 11.78 R: 11.52 RT: 7.23 ( 75Kops/s) add_x888_x888 = L1: 36.06 L2: 34.55 M: 14.81 ( 41.03%) HT: 14.01 VT: 13.41 R: 13.06 RT: 9.06 ( 90Kops/s) src_x888_8_x888 = L1: 21.92 L2: 20.15 M:

[Pixman] [PATCH] mmx: add src_8888_0565

2012-04-19 Thread Matt Turner
Uses the pmadd technique described in http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf
Loongson:
src_8888_0565 = L1: 106.13 L2: 83.57 M: 33.46 ( 68.90%) HT: 30.29 VT: 27.67 R: 26.11 RT: 15.06 ( 135Kops/s)
src_8888_0565 = L1: 122.10 L2: 117.53 M:
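
The idea behind a multiply-add based conversion can be sketched for a single pixel in portable C. The masks and multipliers below are derived here for illustration and are assumptions, not constants copied from the patch or from the linked note; `pmaddwd_lane` models one 32-bit lane of the pmaddwd instruction.

```c
#include <stdint.h>

/* Model of one 32-bit lane of pmaddwd: multiply the two signed 16-bit
 * halves pairwise and add the products. */
static int32_t pmaddwd_lane (uint32_t a, uint32_t b)
{
    int16_t a_lo = (int16_t) (a & 0xffff), a_hi = (int16_t) (a >> 16);
    int16_t b_lo = (int16_t) (b & 0xffff), b_hi = (int16_t) (b >> 16);
    return (int32_t) a_lo * b_lo + (int32_t) a_hi * b_hi;
}

/* x8r8g8b8 -> r5g6b5 using one multiply-add to position red and blue. */
static uint16_t pack_565_pmadd (uint32_t p)
{
    uint32_t rb = p & 0x00f800f8;   /* top 5 bits of red and of blue */
    uint32_t g  = p & 0x0000fc00;   /* top 6 bits of green, at bit 10 */

    /* red * 0x2000 lands at bit 16, blue * 4 lands at bit 5 */
    uint32_t t = (uint32_t) pmaddwd_lane (rb, 0x20000004);

    /* merge green, then shift everything down into 5-6-5 position */
    return (uint16_t) ((t | g) >> 5);
}

/* Straightforward shift-and-mask reference. */
static uint16_t pack_565_ref (uint32_t p)
{
    return (uint16_t) (((p >> 8) & 0xf800) | ((p >> 5) & 0x07e0) | ((p >> 3) & 0x001f));
}

/* Compares both versions over a sample of pixel values. */
static int count_565_mismatches (void)
{
    int bad = 0;
    for (uint32_t i = 0; i < 0x10000; i++)
    {
        uint32_t p = i * 0x10001u + (i << 7);   /* arbitrary spread of bits */
        if (pack_565_pmadd (p) != pack_565_ref (p))
            bad++;
    }
    return bad;
}
```

The appeal of the multiply-add is that it moves red and blue to their final relative positions in one instruction per pair of pixels, leaving only a mask, an OR, and a shift.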

Re: [Pixman] [PATCH] mmx: add src_8888_0565

2012-04-20 Thread Matt Turner
On Thu, Apr 19, 2012 at 5:40 PM, Matt Turner matts...@gmail.com wrote: Uses the pmadd technique described in http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf +static force_inline __m64 +pack_4xpacked565 (__m64 a, __m64 b) +{ +    __m64 rb0

Re: [Pixman] [PATCH] mmx: add src_8888_0565

2012-04-20 Thread Matt Turner
On Fri, Apr 20, 2012 at 3:43 PM, Matt Turner matts...@gmail.com wrote: On Thu, Apr 19, 2012 at 5:40 PM, Matt Turner matts...@gmail.com wrote: Uses the pmadd technique described in http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf +static force_inline __m64

[Pixman] [PATCH] sse2: Using MMX and SSE 4.1

2012-05-02 Thread Matt Turner
I started porting my src_8888_0565 MMX function to SSE2, and in the process started thinking about using SSE3+. The useful instructions added post SSE2 that I see are SSE3: lddqu - for unaligned loads across cache lines SSSE3: palignr - for unaligned loads (but requires software

[Pixman] [PATCH] configure.ac: Fail the ARM/iwMMXt test if not compiling with -march=iwmmxt

2012-05-15 Thread Matt Turner
If not compiling with -march=iwmmxt, the configure test will still pass, thinking that the __builtin_arm_* intrinsic is a function instead of generating a single instruction. Since no linking is done, the configure test doesn't catch this, and we get linking errors in the build. --- configure.ac

Re: [Pixman] [PATCH] configure.ac: Fail the ARM/iwMMXt test if not compiling with -march=iwmmxt

2012-05-15 Thread Matt Turner
On Tue, May 15, 2012 at 4:36 PM, Matt Turner matts...@gmail.com wrote: +#error IWMMXT not enabled (with -march=iwmmxt) Missing closing

Re: [Pixman] [PATCH 2/2] MIPS: DSPr2: Added bilinear over_8888_8_8888 fast path.

2012-05-15 Thread Matt Turner
On Tue, May 15, 2012 at 5:37 PM, Siarhei Siamashka siarhei.siamas...@gmail.com wrote: I still need to add improvement for that packing/unpacking of the RGBA pixels after bilinear/before OVER operation, but I don't expect big improvement there (it is just a couple of instructions). It's not

Re: [Pixman] [RFC] mmx: add and use expand_4xpacked565

2012-05-18 Thread Matt Turner
On Thu, May 17, 2012 at 5:40 PM, Søren Sandmann sandm...@cs.au.dk wrote: Søren Sandmann sandm...@cs.au.dk writes: Given a pixel with only the red component of these values, the results are off-by-one. 0x03 - 0x19 (0x18) 0x07 - 0x3A (0x39) 0x18 - 0xC5 (0xC6) 0x1C - 0xE6 (0xE7) (Same for
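The off-by-one pairs quoted above are consistent with comparing two common ways of widening a 5-bit channel to 8 bits: replicating the top bits into the low bits, and scaling by 255/31 with rounding. The sketch below shows both. The function names are hypothetical, and the truncated snippet does not say which column came from which code path, so the mapping to the actual patch is an assumption.

```c
#include <stdint.h>

/* Widen a 5-bit channel by copying its top 3 bits into the low 3 bits. */
static uint8_t
expand5_replicate (uint8_t c5)
{
    return (uint8_t) ((c5 << 3) | (c5 >> 2));
}

/* Widen a 5-bit channel by scaling to 0..255, rounded to nearest. */
static uint8_t
expand5_rounded (uint8_t c5)
{
    return (uint8_t) ((c5 * 255 + 15) / 31);
}
```

For the inputs 0x03, 0x07, 0x18 and 0x1C the two functions differ by exactly one, matching the values listed in the message; for 0x00 and 0x1F both agree.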

[Pixman] [PATCH 1/5] configure.ac: Fail the ARM/iwMMXt test if not compiling with -march=iwmmxt

2012-05-18 Thread Matt Turner
If not compiling with -march=iwmmxt, the configure test will still pass, thinking that the __builtin_arm_* intrinsic is a function instead of generating a single instruction. Since no linking is done, the configure test doesn't catch this, and we get linking errors in the build. --- configure.ac

[Pixman] [PATCH 2/5] mmx: add and use expand_4xpacked565 function

2012-05-18 Thread Matt Turner
Loongson: add_0565_0565 = L1: 14.39 L2: 13.98 M: 11.28 ( 15.22%) HT: 10.11 VT: 9.74 R: 9.39 RT: 6.05 ( 67Kops/s) add_0565_0565 = L1: 15.37 L2: 14.91 M: 11.83 ( 16.06%) HT: 10.53 VT: 10.15 R: 9.74 RT: 6.19 ( 68Kops/s) ARM/iwMMXt: add_0565_0565 = L1: 11.12 L2: 10.40

[Pixman] [PATCH 4/5] fast: add add_0565_0565 function

2012-05-18 Thread Matt Turner
I'll need this code for header and tail alignment loops in MMX, so I might as well implement a fast path here. --- pixman/pixman-fast-path.c | 44 1 files changed, 44 insertions(+), 0 deletions(-) diff --git a/pixman/pixman-fast-path.c
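As a rough illustration of what a scalar ADD on r5g6b5 pixels involves, here is a minimal sketch that saturates each channel directly in 565 space. It is not the code from the patch: the function names are hypothetical, and pixman's real fast path may instead convert through 8888, in which case individual results could differ slightly from this version.

```c
#include <stdint.h>

/* Saturating per-channel add of two r5g6b5 pixels (hypothetical helper). */
static uint16_t
add_0565_pixel (uint16_t s, uint16_t d)
{
    uint32_t r = (uint32_t) (s >> 11) + (uint32_t) (d >> 11);
    uint32_t g = (uint32_t) ((s >> 5) & 0x3f) + (uint32_t) ((d >> 5) & 0x3f);
    uint32_t b = (uint32_t) (s & 0x1f) + (uint32_t) (d & 0x1f);

    /* clamp each channel to its maximum instead of letting it wrap */
    if (r > 0x1f) r = 0x1f;
    if (g > 0x3f) g = 0x3f;
    if (b > 0x1f) b = 0x1f;

    return (uint16_t) ((r << 11) | (g << 5) | b);
}

/* Applies the add to one scanline of w pixels: dst = src ADD dst. */
static void
add_0565_row (uint16_t *dst, const uint16_t *src, int w)
{
    while (w-- > 0)
    {
        *dst = add_0565_pixel (*src, *dst);
        src++;
        dst++;
    }
}
```

A SIMD path would typically use a loop like add_0565_row only for the few leading and trailing pixels needed to reach alignment, which is the use the message describes.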

[Pixman] [PATCH 5/5] mmx: add add_0565_0565

2012-05-18 Thread Matt Turner
Loongson: add_0565_0565 = L1: 15.37 L2: 14.91 M: 11.83 ( 16.06%) HT: 10.53 VT: 10.15 R: 9.74 RT: 6.19 ( 68Kops/s) add_0565_0565 = L1: 45.06 L2: 46.71 M: 27.45 ( 38.00%) HT: 23.76 VT: 22.84 R: 18.96 RT: 9.79 ( 104Kops/s) ARM/iwMMXt: add_0565_0565 = L1: 12.87 L2: 11.58

Re: [Pixman] [PATCH 1/5] configure.ac: Fail the ARM/iwMMXt test if not compiling with -march=iwmmxt

2012-05-18 Thread Matt Turner
On Fri, May 18, 2012 at 2:41 PM, Matt Turner matts...@gmail.com wrote: If not compiling with -march=iwmmxt, the configure test will still pass, thinking that the __builtin_arm_* intrinsic is a function instead of generating a single instruction. Since no linking is done, the configure test

[Pixman] [PATCH 6/5] mmx: add over_reverse_n_8888

2012-05-18 Thread Matt Turner
Loongson: over_reverse_n_8888 = L1: 16.04 L2: 15.35 M: 10.20 ( 27.96%) HT: 10.95 VT: 10.45 R: 9.18 RT: 6.99 ( 76Kops/s) over_reverse_n_8888 = L1: 27.40 L2: 26.67 M: 16.97 ( 45.78%) HT: 16.66 VT: 15.38 R: 14.15 RT: 9.44 ( 97Kops/s) image poppler

Re: [Pixman] [PATCH] Fix MSVC compilation (only up to three SSE intrinsics supported in function declaration)

2012-05-19 Thread Matt Turner
On Sat, May 19, 2012 at 9:45 AM, Ingmar Runge ing...@irsoft.de wrote: From: Ingmar Runge ing...@irsoft.de ---  pixman/pixman-mmx.c |    9 +++--  1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index 01a2bc9..eb02d1a 100644 ---

Re: [Pixman] [PATCH 1/5] configure.ac: Fail the ARM/iwMMXt test if not compiling with -march=iwmmxt

2012-05-19 Thread Matt Turner
On Sat, May 19, 2012 at 12:34 PM, Søren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: On Fri, May 18, 2012 at 2:41 PM, Matt Turner matts...@gmail.com wrote: If not compiling with -march=iwmmxt, the configure test will still pass, thinking that the __builtin_arm_

[Pixman] [PATCH] mmx: add missing _mm_empty call to mmx_composite_src_x888_0565

2012-05-27 Thread Matt Turner
Fixes spurious test failures. --- pixman/pixman-mmx.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index bb125bf..ab70275 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -2232,6 +2232,8 @@
