Re: [Pixman] [RFC] Performance reporting capabilities for pixman?

2010-10-20 Thread Siarhei Siamashka
On Wednesday 20 October 2010 13:40:26 Maarten Bosmans wrote: 2010/10/20 Siarhei Siamashka siarhei.siamas...@gmail.com: Here is a work-in-progress branch with the initial variant slow path reporting code: http://cgit.freedesktop.org/~siamashka/pixman/log/?h=perfstat-wip I tried to compile

Re: [Pixman] [RFC] Performance reporting capabilities for pixman?

2010-10-19 Thread Siarhei Siamashka
On Tuesday 28 September 2010 07:01:55 Soeren Sandmann wrote: Siarhei Siamashka siarhei.siamas...@gmail.com writes: Let's face it, I think the end users rarely report performance problems in pixman unless the performance is really notoriously bad (and not ignored because it's software

Re: [Pixman] [PATCH] test: Change composite so that it tests randomly generated images

2010-10-08 Thread Siarhei Siamashka
On Tuesday 05 October 2010 20:47:46 Soeren Sandmann wrote: Siarhei Siamashka siarhei.siamas...@gmail.com writes: On Sunday 07 March 2010, Søren Sandmann wrote: Previously this test would try to exhaustively test all combinations of formats and operators, which meant that it would take

[Pixman] [PATCH] ARM: restore fallback to ARMv6 implementation from NEON in the delegate chain

2010-10-04 Thread Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com After fast path cache introduction, the overhead of having this fallback is insignificant. On the other hand, some of the ARM assembly optimizations (for example nearest neighbor scaling) do not need NEON. --- pixman/pixman-arm-neon.c |8

[Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-21 Thread Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com Without this fix, setting PAD repeat on a source image prevents the use of any nonscaled standard fast paths, affecting performance a lot. But as long as no pixels outside the source image boundaries are touched by the compositing operation, all

[Pixman] [PATCH 3/6] Introduce a fake PIXMAN_REPEAT_COVER constant

2010-09-17 Thread Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com We need to implement a true PIXMAN_REPEAT_NONE support later (padding the source with zero pixels). So it's better not to use PIXMAN_REPEAT_NONE for handling FAST_PATH_SAMPLES_COVER_CLIP special case. --- pixman/pixman-fast-path.c | 10

[Pixman] [PATCH 5/6] NONE repeat support for fast scaling with nearest filter

2010-09-17 Thread Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com Implemented very similar to PAD repeat. And gcc also seems to be able to completely eliminate the code responsible for left and right padding pixels for OVER operation with NONE repeat. --- pixman/pixman-fast-path.c |5 + pixman/pixman

[Pixman] [PATCH 6/6] SSE2 optimization for scaled over_8888_8888 operation with nearest filter

2010-09-17 Thread Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com This is the first demo implementation; it should be possible to generalize it later to cover more operations with fewer lines of code. It should also be possible to introduce the use of '__builtin_constant_p' gcc builtin function

Re: [Pixman] [PATCH 0/6] Improvements for the nearest scaling, second try

2010-09-17 Thread Siarhei Siamashka
On Friday 17 September 2010 22:45:56 Siarhei Siamashka wrote: This is the second revision of the nearest neighbour scaling patchset posted earlier: http://lists.freedesktop.org/archives/pixman/2010-September/000477.html The changes since the last submission now include: - the move

Re: [Pixman] [PATCH] Faster C variant of over_n_8_8888 fast path

2010-09-15 Thread Siarhei Siamashka
or don't want to do this? At least this may be also handy for debugging or developing new code when some complex fast path function is only partially implemented initially. -- Best regards, Siarhei Siamashka signature.asc Description: This is a digitally signed message part

Re: [Pixman] [cairo] [PATCH] Added MIPS32R2 and MIPS DSP ASE optimized functions

2010-09-15 Thread Siarhei Siamashka
. There is another add_8000_8000 operation which typically accompanies over_n_8_ when dealing with fonts, it makes sense to optimize it too. -- Best regards, Siarhei Siamashka

Re: [Pixman] [cairo] [PATCH] Added MIPS32R2 and MIPS DSP ASE optimized functions

2010-09-15 Thread Siarhei Siamashka
even not having the big endian hardware. -- Best regards, Siarhei Siamashka signature.asc Description: This is a digitally signed message part. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman

Re: [Pixman] [PATCH] Add a lowlevel blitter benchmark

2010-09-15 Thread Siarhei Siamashka
to TESTPROGRAMS. TESTPROGRAMS are run on 'make check' invocation and are testing correctness. Adding a benchmarking program there is really not a good idea. -- Best regards, Siarhei Siamashka

Re: [Pixman] [PATCH] Faster C variant of over_n_8_8888 fast path

2010-09-14 Thread Siarhei Siamashka
On Tuesday 14 September 2010 08:53:37 Soeren Sandmann wrote: Siarhei Siamashka siarhei.siamas...@gmail.com writes: +/* A variant of 'over', which works faster for non-additive blending on the + * platforms which do not have special instructions for saturated addition + */ +static

Re: [Pixman] [PATCH] Add a lowlevel blitter test

2010-09-14 Thread Siarhei Siamashka
. That's much appreciated. -- Best regards, Siarhei Siamashka

Re: [Pixman] [PATCH] Add a lowlevel blitter test

2010-09-14 Thread Siarhei Siamashka
to be done for gettimeofday(). -- Best regards, Siarhei Siamashka

Re: [Pixman] [cairo] [PATCH] Added MIPS32R2 and MIPS DSP ASE optimized functions

2010-09-13 Thread Siarhei Siamashka
optimizations as a requirement for approving patches (the all or nothing approach). But I guess, optimistically, getting more performance is in your best interests anyway. I just hope that my reply was somewhat useful. -- Best regards, Siarhei Siamashka

[Pixman] [PATCH] Faster C variant of over_n_8_8888 fast path

2010-09-11 Thread Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com The main loop is split into handling 3 cases: - opaque source - translucent source without additive blending - translucent source with additive blending When using a normal premultiplied alpha format (by converting to it from non-premultiplied
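The three cases named in this snippet can be illustrated with a per-pixel sketch of the OVER operator. This is not the code from the patch: it assumes a premultiplied a8r8g8b8 pixel layout, ignores the 8-bit mask, processes one channel at a time for clarity, and the names `mul_un8`, `is_additive` and `over_sketch` are hypothetical. "Additive" is taken here to mean that some colour channel exceeds alpha, which is the only situation in which the final addition can overflow and needs saturation.

```c
#include <assert.h>
#include <stdint.h>

/* round(c * a / 255) for 8-bit values, using the usual shift-based trick. */
static uint8_t mul_un8(uint8_t c, uint8_t a)
{
    uint32_t t = (uint32_t)c * a + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Nonzero if some colour channel exceeds alpha (assumed meaning of
 * "additive blending" in this sketch). */
static int is_additive(uint32_t src)
{
    uint32_t a = src >> 24;
    return ((src >> 16) & 0xff) > a ||
           ((src >> 8) & 0xff) > a ||
           (src & 0xff) > a;
}

/* Hypothetical per-pixel OVER: result = src + dst * (255 - alpha) / 255. */
uint32_t over_sketch(uint32_t src, uint32_t dst)
{
    uint32_t a = src >> 24;
    uint32_t result = 0;
    int saturate;

    /* Case 1: opaque source simply replaces the destination. */
    if (a == 0xff)
        return src;

    saturate = is_additive(src);
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = mul_un8((uint8_t)((dst >> shift) & 0xff),
                             (uint8_t)(255 - a));
        uint32_t sum = s + d;

        if (saturate) {
            /* Case 3: additive source, the sum must be clamped. */
            if (sum > 255)
                sum = 255;
        } else {
            /* Case 2: s <= a and d <= 255 - a, so no overflow is possible. */
            assert(sum <= 255);
        }
        result |= sum << shift;
    }
    return result;
}
```

The point of separating case 2 is that the clamp can be dropped entirely there, which matters on platforms without a saturated addition instruction.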

Re: [Pixman] [PATCH 2/3] SSE2 optimizations for scaled over_8888_8888 with nearest filter

2010-09-09 Thread Siarhei Siamashka
On Wednesday 08 September 2010 10:45:07 Siarhei Siamashka wrote: +/* A variant of 'core_combine_over_u_sse2' with minor tweaks */ +static force_inline void +scaled_nearest_scanline_sse2___none_OVER (uint32_t* pd, + const uint32_t

[Pixman] [PATCH 0/3] [ARM] NEON optimized over_0565_8_0565 fast path

2010-09-08 Thread Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com Some minor refactoring of ARM NEON optimizations (addition of more common macros) and the introduction of over_0565_8_0565 fast path. The patches are also available at: http://cgit.freedesktop.org/~siamashka/pixman/log/?h

[Pixman] [PATCH 1/3] ARM: common init/cleanup macro for saving/restoring NEON registers

2010-09-08 Thread Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com This is a typical prologue/epilogue for many NEON fast path functions, so it makes sense to provide common reusable macros for it in the header file. --- pixman/pixman-arm-neon-asm.S | 52 - pixman

[Pixman] [PATCH 0/3] Improvements for the nearest scaling (let's use SIMD)

2010-09-08 Thread Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com The following patches reorganize nearest neighbor scaling fast path macros to allow SIMD optimizations. SSE2 optimization for scaled over__ operation with nearest filter is provided as an example (this operation becomes more than 2x

[Pixman] [PATCH 1/3] Nearest scaling fast path macro moved to header file and split into parts

2010-09-08 Thread Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com Scanline processing is split into a separate function. This provides an easy way of overriding it with a platform specific implementation, which may use SIMD optimizations. Only basic C data types are used as the arguments for this function
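As a rough illustration of what a standalone scanline function with only basic C argument types might look like, here is a minimal sketch of nearest-neighbour scaling for one scanline. It is not the function from the patch: the name `scaled_nearest_scanline_pad_sketch`, the argument order, and the choice of PAD-style clamping are assumptions, and the source coordinate is assumed to be in 16.16 fixed point.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scanline routine: for every destination pixel, pick the
 * source pixel at the integer part of a 16.16 fixed point coordinate 'vx',
 * then advance 'vx' by 'unit_x'. Coordinates outside the source are
 * clamped to the nearest edge pixel (PAD-like behaviour). */
void scaled_nearest_scanline_pad_sketch(uint32_t *dst, const uint32_t *src,
                                        int32_t dst_width, int32_t src_width,
                                        int32_t vx, int32_t unit_x)
{
    assert(src_width > 0);

    for (int32_t i = 0; i < dst_width; i++) {
        int32_t x;

        if (vx < 0)
            x = 0;                  /* left of the image: first pixel */
        else
            x = vx >> 16;           /* integer part of 16.16 value */
        if (x >= src_width)
            x = src_width - 1;      /* right of the image: last pixel */

        dst[i] = src[x];
        vx += unit_x;
    }
}
```

Because the arguments are plain pointers and integers, a platform specific variant (for example one using SIMD) could in principle be swapped in with the same signature while the surrounding loop over scanlines stays in shared code.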

[Pixman] [PATCH 2/3] SSE2 optimizations for scaled over_8888_8888 with nearest filter

2010-09-08 Thread Siarhei Siamashka
Benchmarked on Intel Core i7 860: == before (nearest OVER) == op=3, src_fmt=2002, dst_fmt=2002, speed=142.01 MPix/s == after (nearest OVER) == op=3, src_fmt=2002, dst_fmt=2002, speed=314.99 MPix/s == performance of nonscaled operation as a reference == op=3, src_fmt=2002,

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-07 Thread Siarhei Siamashka
assembly code yourself or was the output of some C compiler (at least partially) used for it? -- Best regards, Siarhei Siamashka

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-07 Thread Siarhei Siamashka
On Friday 03 September 2010 01:39:54 Soeren Sandmann wrote: Siarhei Siamashka siarhei.siamas...@gmail.com writes: Apparently software prefetch also disables or interferes with the hardware prefetcher on Intel Atom, hurting performance a lot. More advanced processors can cope

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-03 Thread Siarhei Siamashka
and for the trailing pixels. This can and IMHO should be avoided. -- Best regards, Siarhei Siamashka

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-02 Thread Siarhei Siamashka
On Monday 30 August 2010 23:31:35 Siarhei Siamashka wrote: And Intel Atom does not like software prefetch very much. This reminds me of an older report: http://lists.freedesktop.org/archives/pixman/2010-June/000218.html I can try to run a full set of cairo-perf-trace benchmarks to get more

Re: [Pixman] [PATCH] Store a2b2g2r2 pixel through the WRITE macro

2010-08-30 Thread Siarhei Siamashka
of accessors? -- Best regards, Siarhei Siamashka

Re: [Pixman] [PATCH 2/2] ARM: added 'neon_composite_src_8888_8_0565' fast path

2010-08-30 Thread Siarhei Siamashka
On Thursday 26 August 2010 15:09:32 Soeren Sandmann wrote: Siarhei Siamashka siarhei.siamas...@gmail.com writes: +.macro pixman_composite_src_n_8_0565_process_pixblock_head +/* in */ +vmull.u8 q15, d24, d2 +vmull.u8 q3, d24, d1 +vmull.u8 q2, d24, d0

Re: [Pixman] Valgrind-clean pixman

2010-08-29 Thread Siarhei Siamashka
On Saturday 28 August 2010 12:21:52 Andrea Canciani wrote: On Fri, Aug 27, 2010 at 3:58 PM, Siarhei Siamashka Having something like pixman_init()/pixman_cleanup() functions could solve all the problems, but it's an API change. Other solutions (using atexit(), using mutexes or atomic

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-29 Thread Siarhei Siamashka
On Sunday 29 August 2010 12:25:47 Xu, Samuel wrote: Hi, Siarhei Siamashka: Q:--- What problems do you have without merge mechanism? A: Of course there isn't correctness issue w/o merge. Currently, sse2_fast_paths/mmx_fast_paths/c_fast_paths...are excluded each other, although some checking

Re: [Pixman] Valgrind-clean pixman

2010-08-28 Thread Siarhei Siamashka
On Saturday 28 August 2010 12:21:52 Andrea Canciani wrote: On Fri, Aug 27, 2010 at 3:58 PM, Siarhei Siamashka wrote: This code which is setting a global implementation pointer is also not quite thread safe (though very unlikely to cause any practical problems other than a bit bigger one

Re: [Pixman] Compiling for iOS

2010-08-28 Thread Siarhei Siamashka
about iPhone, but pixman crosscompiles just fine at least in linux using ./configure --host=your-crosstoolchain-triplet-here If you want some help, you would need to provide a bit more information, like the logs from your failed builds. -- Best regards, Siarhei Siamashka

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-27 Thread Siarhei Siamashka
On Friday 27 August 2010 05:59:00 Xu, Samuel wrote: Hi Siarhei Siamashka, Here is a new patch, can you review it? Thank you! It address following suggestions: 1: SSSE3 file is split to a new file. Thanks. Comparing with to duplicate every content from SSE2 file, I added a way to merge

Re: [Pixman] Valgrind-clean pixman

2010-08-27 Thread Siarhei Siamashka
runtime overhead. -- Best regards, Siarhei Siamashka

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-27 Thread Siarhei Siamashka
On Friday 27 August 2010 15:00:49 Xu, Samuel wrote: Hi, Siarhei Siamashka: Thanks for quick response! For 64 bit detect_cpu_features(), if ignore HAVE_GETISAX and _MSC_VER, it is ok for us to simplify it as your example in next update. If you can ensure MSVC compatibility

Re: [Pixman] [PATCH/RFC 1/2] New FAST_PATH_SIMPLE_ROTATE_TRANSFORM flag

2010-08-26 Thread Siarhei Siamashka
On Monday 02 August 2010 17:08:58 Soeren Sandmann wrote: Siarhei Siamashka siarhei.siamas...@gmail.com writes: +static pixman_bool_t +has_suitable_filter_for_simple_rotate (pixman_image_t *image) +{ +if (image->common.filter == PIXMAN_FILTER_NEAREST) + return TRUE

[Pixman] [PATCH 1/2] ARM: added 'neon_composite_over_8888_8_0565' fast path

2010-08-24 Thread Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com --- pixman/pixman-arm-neon-asm.S | 36 pixman/pixman-arm-neon.c |4 2 files changed, 40 insertions(+), 0 deletions(-) diff --git a/pixman/pixman-arm-neon-asm.S b/pixman/pixman-arm-neon-asm.S

Re: [Pixman] [cairo] Floating point API in Pixman

2010-08-24 Thread Siarhei Siamashka
). But it depends on the pixel data. Solid filled images are going to be faster than the ones filled with random data. Also table lookups make SIMD optimizations quite challenging. -- Best regards, Siarhei Siamashka

Re: [Pixman] [cairo] Floating point API in Pixman

2010-08-24 Thread Siarhei Siamashka
negative. If there are lots of pixman users on less capable ARM hardware and they are just silent for whatever reason (maybe they are just perfectly satisfied?), then a bit more feedback would be welcome in order not to forget that they exist :) -- Best regards, Siarhei Siamashka

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-21 Thread Siarhei Siamashka
percentage reduced from 68.0% to 62.6% Maybe it is not dramatically, while we are glad to see those gain on both perf and power. A performance gain in the 4-5% ballpark looks like a major improvement to me. -- Best regards, Siarhei Siamashka

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-21 Thread Siarhei Siamashka
On Friday 20 August 2010 18:39:47 Liu, Xinyun wrote: Hi Siarhei Siamashka, Here is a new patch, can you review it? Thank you! Sure, thanks for the updated patch. Some comments follow. From 9783651899a2763d7fcca2960fc354bd1f541980 Mon Sep 17 00:00:00 2001 From: root r...@d501.localdomain

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-19 Thread Siarhei Siamashka
that 'make check' detects it. It may be also convenient to configure pixman with '--disable-shared' option to make debugging easier. -- Best regards, Siarhei Siamashka

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-16 Thread Siarhei Siamashka
did not say that it can't be solved :) It's just better to address this particular problem in the next revision of ssse3 patch. -- Best regards, Siarhei Siamashka

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-13 Thread Siarhei Siamashka
): [...] The rest of code contains a lot of repeatable patterns (the main loop is replicated 16 times for different alignments). IMHO they could be simplified a lot by using macros, making the code size less scary ;-) -- Best regards, Siarhei Siamashka

[Pixman] [PATCH/RFC 0/2] Faster 90/180/270 degrees rotation

2010-07-30 Thread Siarhei Siamashka
for this and will submit them a bit later. Comments are very much welcome. Siarhei Siamashka (2): New FAST_PATH_SIMPLE_ROTATE_TRANSFORM flag C fast path for a simple 90/180/270 degrees rotation. pixman/pixman-fast-path.c | 295 + pixman/pixman-image.c
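For readers unfamiliar with what a simple rotation fast path has to do, the following is a naive sketch of 90 and 180 degree rotation for 32-bit pixels. It is only an illustration under stated assumptions: strides are counted in pixels, source and destination do not overlap, and the function names are hypothetical. The actual patch is not shown in this snippet, and a real fast path would likely care about cache behaviour, which this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a src_w x src_h image by 90 degrees clockwise.
 * The destination is src_h pixels wide and src_w pixels high. */
void rotate_90_sketch(uint32_t *dst, int dst_stride,
                      const uint32_t *src, int src_stride,
                      int src_w, int src_h)
{
    assert(dst != src);
    for (int y = 0; y < src_h; y++)
        for (int x = 0; x < src_w; x++)
            dst[x * dst_stride + (src_h - 1 - y)] = src[y * src_stride + x];
}

/* Rotate by 180 degrees: the destination has the same size as the source. */
void rotate_180_sketch(uint32_t *dst, int dst_stride,
                       const uint32_t *src, int src_stride,
                       int src_w, int src_h)
{
    assert(dst != src);
    for (int y = 0; y < src_h; y++)
        for (int x = 0; x < src_w; x++)
            dst[(src_h - 1 - y) * dst_stride + (src_w - 1 - x)] =
                src[y * src_stride + x];
}
```

A 270 degree rotation follows the same pattern with the index mapping mirrored the other way.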

[Pixman] [PATCH] Code simplification (no need advancing 'vx' at the end of scanline)

2010-07-27 Thread Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com --- pixman/pixman-fast-path.c |7 --- 1 files changed, 0 insertions(+), 7 deletions(-) diff --git a/pixman/pixman-fast-path.c b/pixman/pixman-fast-path.c index 6ed1580..014bab6 100644 --- a/pixman/pixman-fast-path.c +++ b/pixman/pixman

[Pixman] [PATCH] ARM: 'neon_combine_out_reverse_u' combiner

2010-07-27 Thread Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com This operation was seen in mozilla browser profiling logs. Implemented so that 'over' and 'out_reverse' operations now reuse common parts of code. --- pixman/pixman-arm-neon-asm.S | 101 -- pixman/pixman

Re: [Pixman] FAST_PATH_SAMPLES_COVER_CLIP flag fast_composite_scaled_nearest_*

2010-07-26 Thread Siarhei Siamashka
; } } There is also one more call to pixman_transform_bounds() a few lines below which is used to check whether to set FAST_PATH_16BIT_SAFE flag. Shouldn't it be also converted to use compute_sample_extent()? -- Best regards, Siarhei Siamashka

[Pixman] [PATCHv2 1/3] 'pixman_transform_bounds' fixed to match the rest of transform code

2010-07-20 Thread Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com This fixes the discrepancy between how FAST_PATH_SAMPLES_COVER_CLIP flag is calculated and the code in fast_composite_scaled_nearest_* functions. Function 'pixman_transform_bounds' which is used for deciding whether to set

[Pixman] [PATCH 1/3] 'pixman_transform_bounds' fixed to match the rest of transform code

2010-07-19 Thread Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com This fixes the discrepancy between how FAST_PATH_SAMPLES_COVER_CLIP flag is calculated and the code in fast_composite_scaled_nearest_* functions. Function 'pixman_transform_bounds' which is used for deciding whether to set

Re: [Pixman] [cairo] is self-copy supposed to work?

2010-05-05 Thread Siarhei Siamashka
On Wednesday 05 May 2010, Soeren Sandmann wrote: Siarhei Siamashka siarhei.siamas...@gmail.com writes: But the point is that fast scrolling (quite a common operation, not something totally fancy) requires self-copy support. So IMHO it makes sense to have this functionality somehow
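To show why scrolling is a self-copy problem, here is a small sketch that shifts the contents of a 32-bit image to the right within its own buffer. It does not use pixman at all: the function name and parameters are hypothetical, strides are in pixels, and standard `memmove` is used because source and destination overlap inside each row.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Shift every row 'dx' pixels to the right in place. The leftmost 'dx'
 * pixels of each row keep their old contents and would normally be
 * repainted by the caller. */
void scroll_right_sketch(uint32_t *bits, int stride, int width, int height,
                         int dx)
{
    assert(dx >= 0 && dx <= width && stride >= width);

    for (int y = 0; y < height; y++) {
        uint32_t *row = bits + (size_t)y * stride;

        /* Source and destination ranges overlap, so memcpy is not safe. */
        memmove(row + dx, row, (size_t)(width - dx) * sizeof(uint32_t));
    }
}
```

A copy routine that always walks pixels in one fixed direction would overwrite source pixels before reading them in this case, which is the kind of hazard self-copy support has to handle.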

[Pixman] [RFC] Extend cairo-perf traces to cover new pixman-0.18.0 optimizations

2010-04-01 Thread Siarhei Siamashka
0.5127 libpixman-1.so.0.18.0 sse2_composite_over_n_8_ -- Best regards, Siarhei Siamashka

Re: [Pixman] [PATCH] SIMD: Try without any CFLAGS before forcing -mcpu=

2010-03-19 Thread Siarhei Siamashka
On Friday 19 March 2010, Siarhei Siamashka wrote: [...] So when generating binaries for ARMv5, the linker is permitted to do 'bl' -> 'blx' conversion. That's what we actually see here, except that we actually want this code to also run on ARMv4. In order to make the code ARMv4 compatible

Re: [Pixman] fast-scale branch performance improvements

2010-03-16 Thread Siarhei Siamashka
On Monday 15 March 2010, Alexander Larsson wrote: On Mon, 2010-03-15 at 03:25 +0200, Siarhei Siamashka wrote: But before it gets committed, some problems with potential fixed point overflows when dealing with large images need to be addressed. I have made a test program which can expose
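The snippet does not say which overflow is meant, so the following is only a generic illustration of the hazard. With 16.16 fixed point coordinates stored in a signed 32-bit integer, the integer part cannot exceed 32767, and stepping across a wide scanline can exceed that range. The helper below is hypothetical and checks, using 64-bit arithmetic, whether the coordinate of the last pixel still fits.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if vx + unit_x * (width - 1), evaluated without
 * wraparound, is still representable as a signed 32-bit 16.16 value. */
int scanline_fits_16_16(int32_t vx, int32_t unit_x, int32_t width)
{
    int64_t last;

    assert(width > 0);
    last = (int64_t)vx + (int64_t)unit_x * (width - 1);
    return last >= INT32_MIN && last <= INT32_MAX;
}
```

With a step of exactly 1.0 (0x10000) starting from zero, a scanline of 32768 pixels still fits, while 32769 pixels does not.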

Re: [Pixman] [PATCH] SIMD: Try without any CFLAGS before forcing -mcpu=

2010-03-16 Thread Siarhei Siamashka
On Sunday 14 March 2010, Loïc Minier wrote: On Wed, Mar 10, 2010, Siarhei Siamashka wrote: I would prefer a bit more descriptive comment (with the details copied from that launchpad page). I see you pushed this now; thanks! Yeah, it's not obvious why one needs to try with the toolchain

Re: [Pixman] [PATCH] SIMD: Try without any CFLAGS before forcing -mcpu=

2010-03-10 Thread Siarhei Siamashka
this patch or try to do something better. -- Best regards, Siarhei Siamashka

Re: [Pixman] 0.18.0 schedule

2010-03-06 Thread Siarhei Siamashka
, Siarhei Siamashka #include stdio.h #include stdlib.h #include fenv.h #include pixman.h #include gtk-utils.h int main (int argc, char **argv) { #define WIDTH 400 #define HEIGHT 200 uint32_t *dest = malloc (WIDTH * HEIGHT * 4); pixman_image_t *src_img; pixman_image_t *dest_img
