On Wednesday 20 October 2010 13:40:26 Maarten Bosmans wrote:
2010/10/20 Siarhei Siamashka siarhei.siamas...@gmail.com:
Here is a work-in-progress branch with the initial variant slow path
reporting code:
http://cgit.freedesktop.org/~siamashka/pixman/log/?h=perfstat-wip
I tried to compile
On Tuesday 28 September 2010 07:01:55 Soeren Sandmann wrote:
Siarhei Siamashka siarhei.siamas...@gmail.com writes:
Let's face it, I think the end users rarely report performance problems
in pixman unless the performance is really notoriously bad (and not
ignored because it's software
On Tuesday 05 October 2010 20:47:46 Soeren Sandmann wrote:
Siarhei Siamashka siarhei.siamas...@gmail.com writes:
On Sunday 07 March 2010, Søren Sandmann wrote:
Previously this test would try to exhaustively test all combinations
of formats and operators, which meant that it would take
From: Siarhei Siamashka siarhei.siamas...@nokia.com
After fast path cache introduction, the overhead of having this fallback is
insignificant. On the other hand, some of the ARM assembly optimizations (for
example nearest neighbor scaling) do not need NEON.
---
pixman/pixman-arm-neon.c |8
From: Siarhei Siamashka siarhei.siamas...@nokia.com
Without this fix, setting PAD repeat on a source image prevents
the use of any nonscaled standard fast paths, affecting performance
a lot. But as long as no pixels outside the source image boundaries
are touched by the compositing operation, all
From: Siarhei Siamashka siarhei.siamas...@nokia.com
We need to implement a true PIXMAN_REPEAT_NONE support later (padding
the source with zero pixels). So it's better not to use PIXMAN_REPEAT_NONE
for handling FAST_PATH_SAMPLES_COVER_CLIP special case.
---
pixman/pixman-fast-path.c | 10
From: Siarhei Siamashka siarhei.siamas...@nokia.com
Implemented very similarly to PAD repeat.
And gcc also seems to be able to completely eliminate the
code responsible for left and right padding pixels for OVER
operation with NONE repeat.
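
To illustrate the idea of left and right padding pixels, here is a minimal
sketch of how a scaled scanline could be split into a left padding part, a
middle part which samples real source pixels, and a right padding part. The
function name 'split_scanline_sketch' and its arguments are hypothetical and
are not taken from the patch; a 16.16 fixed point coordinate and a positive
step are assumed, and the simple counting loop is used for clarity, not speed.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not pixman's actual code): count how many of the
 * 'w' destination pixels sample to the left of the source image, inside
 * it, and to the right of it.  'vx' is the 16.16 fixed point source
 * coordinate of the first pixel, 'unit_x' is the (positive) step.
 */
static void
split_scanline_sketch (int32_t src_width, int64_t vx, int64_t unit_x,
                       int32_t w, int32_t *left_pad, int32_t *middle,
                       int32_t *right_pad)
{
    int64_t limit = (int64_t) src_width << 16;
    int32_t i;

    *left_pad = *middle = *right_pad = 0;
    for (i = 0; i < w; i++)
    {
        if (vx < 0)
            (*left_pad)++;      /* PAD: first pixel, NONE: zero pixel */
        else if (vx >= limit)
            (*right_pad)++;     /* PAD: last pixel, NONE: zero pixel */
        else
            (*middle)++;        /* a real source pixel is sampled */
        vx += unit_x;
    }
}
```

With NONE repeat the padding pixels are all zero, and OVER with a zero source
leaves the destination unchanged, which would explain why the compiler can
drop the padding code for that case.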
---
pixman/pixman-fast-path.c |5 +
pixman/pixman
From: Siarhei Siamashka siarhei.siamas...@nokia.com
This is the first demo implementation; it should be possible to
generalize it later to cover more operations with fewer lines of code.
It should be also possible to introduce the use of '__builtin_constant_p'
gcc builtin function
On Friday 17 September 2010 22:45:56 Siarhei Siamashka wrote:
This is the second revision of the nearest neighbour scaling patchset
posted earlier:
http://lists.freedesktop.org/archives/pixman/2010-September/000477.html
The changes since the last submission now include:
- the move
or don't want to do this? At least this may be
also handy for debugging or developing new code when some complex fast path
function is only partially implemented initially.
--
Best regards,
Siarhei Siamashka
signature.asc
Description: This is a digitally signed message part
. There is another add_8000_8000 operation which typically accompanies
over_n_8_ when dealing with fonts, it makes sense to optimize it too.
--
Best regards,
Siarhei Siamashka
even not having the big endian
hardware.
--
Best regards,
Siarhei Siamashka
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
to TESTPROGRAMS. TESTPROGRAMS are run on
'make check' invocation and are testing correctness. Adding a benchmarking
program there is really not a good idea.
--
Best regards,
Siarhei Siamashka
On Tuesday 14 September 2010 08:53:37 Soeren Sandmann wrote:
Siarhei Siamashka siarhei.siamas...@gmail.com writes:
+/* A variant of 'over', which works faster for non-additive blending on the
+ * platforms which do not have special instructions for saturated addition
+ */
+static
. That's much appreciated.
--
Best regards,
Siarhei Siamashka
to be done
for gettimeofday().
--
Best regards,
Siarhei Siamashka
optimizations as a requirement for approving patches (the all or nothing
approach). But I guess, optimistically, getting more performance is in your
best interests anyway. I just hope that my reply was somewhat useful.
--
Best regards,
Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com
The main loop is split into handling 3 cases:
- opaque source
- translucent source without additive blending
- translucent source with additive blending
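
The three cases above can be sketched in plain C roughly as follows. This is
only an illustration under my own assumptions and not the code from the
patch: the helper names are hypothetical, pixels are premultiplied a8r8g8b8,
and a source pixel is treated as "non-additive" when none of its color
components exceeds its alpha, so that the final addition cannot overflow and
needs no saturation.

```c
#include <stdint.h>

/* Multiply each of the four 8-bit channels of x by a/255 (rounded). */
static uint32_t
mul_un8x4 (uint32_t x, uint32_t a)
{
    uint32_t rb = (x & 0x00ff00ff) * a + 0x00800080;
    uint32_t ag = ((x >> 8) & 0x00ff00ff) * a + 0x00800080;

    rb = ((rb + ((rb >> 8) & 0x00ff00ff)) >> 8) & 0x00ff00ff;
    ag = (ag + ((ag >> 8) & 0x00ff00ff)) & 0xff00ff00;
    return rb | ag;
}

/* Per-channel saturated addition, done the slow and obvious way. */
static uint32_t
add_sat_un8x4 (uint32_t x, uint32_t y)
{
    uint32_t r = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t s = ((x >> shift) & 0xff) + ((y >> shift) & 0xff);
        r |= (s > 0xff ? 0xff : s) << shift;
    }
    return r;
}

/* Assumption: no color component above alpha means no overflow later. */
static int
is_non_additive (uint32_t s)
{
    uint32_t a = s >> 24;

    return ((s >> 16) & 0xff) <= a && ((s >> 8) & 0xff) <= a &&
           (s & 0xff) <= a;
}

static void
over_scanline_sketch (uint32_t *dst, const uint32_t *src, int w)
{
    int i;

    for (i = 0; i < w; i++)
    {
        uint32_t s = src[i];
        uint32_t a = s >> 24;

        if (a == 0xff)                 /* opaque source: plain copy */
            dst[i] = s;
        else if (is_non_additive (s))  /* ordinary add is safe here */
            dst[i] = s + mul_un8x4 (dst[i], 255 - a);
        else                           /* needs saturated addition */
            dst[i] = add_sat_un8x4 (s, mul_un8x4 (dst[i], 255 - a));
    }
}
```

The second branch is the interesting one: on hardware without a saturated
add instruction it replaces the expensive clamping with a single 32-bit
addition.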
When using a normal premultiplied alpha format (by converting to it
from non-premultiplied
On Wednesday 08 September 2010 10:45:07 Siarhei Siamashka wrote:
+/* A variant of 'core_combine_over_u_sse2' with minor tweaks */
+static force_inline void
+scaled_nearest_scanline_sse2___none_OVER (uint32_t* pd,
+ const uint32_t
From: Siarhei Siamashka siarhei.siamas...@nokia.com
Some minor refactoring of ARM NEON optimizations (addition of more
common macros) and the introduction of over_0565_8_0565 fast path.
The patches are also available at:
http://cgit.freedesktop.org/~siamashka/pixman/log/?h
From: Siarhei Siamashka siarhei.siamas...@nokia.com
This is a typical prologue/epilogue for many NEON fast path functions, so
it makes sense to provide common reusable macros for it in the header file.
---
pixman/pixman-arm-neon-asm.S | 52 -
pixman
From: Siarhei Siamashka siarhei.siamas...@nokia.com
The following patches reorganize nearest neighbor scaling fast
path macros to allow SIMD optimizations.
SSE2 optimization for scaled over__ operation with
nearest filter is provided as an example (this operation
becomes more than 2x
From: Siarhei Siamashka siarhei.siamas...@nokia.com
Scanline processing is split into a separate function. This provides
an easy way of overriding it with a platform specific implementation,
which may use SIMD optimizations. Only basic C data types are used as
the arguments for this function
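
As a rough illustration of this kind of split, a scanline function taking
only basic C types might look like the sketch below. The names, the function
pointer based override and the choice of starting at coordinate 0 are my
assumptions for the example; they are not copied from the patch, which may
well wire things up differently (for instance via macros).

```c
#include <stdint.h>

/* Scanline worker: only basic C types, 16.16 fixed point coordinates. */
typedef void (*scanline_func_t) (uint32_t *dst, const uint32_t *src,
                                 int32_t w, uint32_t vx, uint32_t unit_x);

/* Generic C version; a SIMD variant would have the same signature. */
static void
scaled_nearest_scanline_sketch (uint32_t *dst, const uint32_t *src,
                                int32_t w, uint32_t vx, uint32_t unit_x)
{
    while (w-- > 0)
    {
        *dst++ = src[vx >> 16];   /* integer part selects the pixel */
        vx += unit_x;
    }
}

/* Outer loop: picks the source row, delegates each row to 'scanline'. */
static void
scale_rows_sketch (scanline_func_t scanline,
                   uint32_t *dst, int dst_stride,
                   const uint32_t *src, int src_stride,
                   int32_t w, int32_t h,
                   uint32_t unit_x, uint32_t unit_y)
{
    uint32_t vy = 0;
    int32_t y;

    for (y = 0; y < h; y++)
    {
        scanline (dst + y * dst_stride, src + (vy >> 16) * src_stride,
                  w, 0, unit_x);
        vy += unit_y;
    }
}
```

A platform specific implementation then only has to provide another function
with the 'scanline_func_t' signature and pass it to the outer loop.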
Benchmarked on Intel Core i7 860:
== before (nearest OVER) ==
op=3, src_fmt=2002, dst_fmt=2002, speed=142.01 MPix/s
== after (nearest OVER) ==
op=3, src_fmt=2002, dst_fmt=2002, speed=314.99 MPix/s
== performance of nonscaled operation as a reference ==
op=3, src_fmt=2002,
assembly
code yourself or was the output of some C compiler (at least partially) used
for it?
--
Best regards,
Siarhei Siamashka
On Friday 03 September 2010 01:39:54 Soeren Sandmann wrote:
Siarhei Siamashka siarhei.siamas...@gmail.com writes:
Apparently software prefetch also disables or interferes with the hardware
prefetcher on Intel Atom, hurting performance a lot. More advanced
processors can cope
and for the trailing
pixels. This can and IMHO should be avoided.
--
Best regards,
Siarhei Siamashka
On Monday 30 August 2010 23:31:35 Siarhei Siamashka wrote:
And Intel Atom does not like software prefetch very much. This reminds me of an
older report:
http://lists.freedesktop.org/archives/pixman/2010-June/000218.html
I can try to run a full set of cairo-perf-trace benchmarks to get more
of accessors?
--
Best regards,
Siarhei Siamashka
On Thursday 26 August 2010 15:09:32 Soeren Sandmann wrote:
Siarhei Siamashka siarhei.siamas...@gmail.com writes:
+.macro pixman_composite_src_n_8_0565_process_pixblock_head
+/* in */
+    vmull.u8    q15, d24, d2
+    vmull.u8    q3, d24, d1
+    vmull.u8    q2, d24, d0
On Saturday 28 August 2010 12:21:52 Andrea Canciani wrote:
On Fri, Aug 27, 2010 at 3:58 PM, Siarhei Siamashka
Having something like pixman_init()/pixman_cleanup() functions could
solve all the problems, but it's an API change. Other solutions (using
atexit(), using mutexes or atomic
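
For what it's worth, the "atomic" option mentioned here could look roughly
like the sketch below, which lazily sets a global pointer with a C11
compare-and-swap. Everything in it is hypothetical: 'impl_t', 'create_impl'
and 'get_impl' are stand-ins invented for the example and are not pixman
functions, and C11 <stdatomic.h> support is assumed.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an implementation object. */
typedef struct
{
    int id;
} impl_t;

static _Atomic (impl_t *) global_impl = NULL;

/* Hypothetical constructor; a real one would detect CPU features. */
static impl_t *
create_impl (void)
{
    impl_t *imp = malloc (sizeof *imp);

    if (imp)
        imp->id = 1;
    return imp;
}

/* Returns the shared instance, creating it on first use. */
static impl_t *
get_impl (void)
{
    impl_t *imp = atomic_load (&global_impl);
    impl_t *fresh, *expected = NULL;

    if (imp)
        return imp;

    fresh = create_impl ();
    if (!fresh)
        return NULL;

    /* Only one thread wins the race; the losers free their copy. */
    if (atomic_compare_exchange_strong (&global_impl, &expected, fresh))
        return fresh;

    free (fresh);
    return expected;   /* the pointer installed by the winning thread */
}
```

This avoids an API change, at the price of possibly constructing the object
more than once when several threads race on the first call.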
On Sunday 29 August 2010 12:25:47 Xu, Samuel wrote:
Hi, Siarhei Siamashka:
Q:--- What problems do you have without merge mechanism?
A: Of course there isn't correctness issue w/o merge.
Currently, sse2_fast_paths/mmx_fast_paths/c_fast_paths...are excluded each
other, although some checking
On Saturday 28 August 2010 12:21:52 Andrea Canciani wrote:
On Fri, Aug 27, 2010 at 3:58 PM, Siarhei Siamashka wrote:
This code which is setting a global implementation pointer is also not
quite thread safe (though very unlikely to cause any practical problems
other than a bit bigger one
about iPhone, but pixman crosscompiles just fine at
least in linux using ./configure --host=your-crosstoolchain-triplet-here
If you want some help, you would need to provide a bit more information, like
the logs from your failed builds.
--
Best regards,
Siarhei Siamashka
On Friday 27 August 2010 05:59:00 Xu, Samuel wrote:
Hi Siarhei Siamashka,
Here is a new patch, can you review it? Thank you!
It addresses the following suggestions:
1: SSSE3 file is split to a new file.
Thanks.
Comparing with to duplicate every
content from SSE2 file, I added a way to merge
runtime overhead.
--
Best regards,
Siarhei Siamashka
On Friday 27 August 2010 15:00:49 Xu, Samuel wrote:
Hi, Siarhei Siamashka:
Thanks for quick response!
For 64 bit detect_cpu_features(), if ignore HAVE_GETISAX and _MSC_VER,
it is ok for us to simplify it as your example in next update.
If you can ensure MSVC compatibility
On Monday 02 August 2010 17:08:58 Soeren Sandmann wrote:
Siarhei Siamashka siarhei.siamas...@gmail.com writes:
+static pixman_bool_t
+has_suitable_filter_for_simple_rotate (pixman_image_t *image)
+{
+if (image->common.filter == PIXMAN_FILTER_NEAREST)
+ return TRUE
From: Siarhei Siamashka siarhei.siamas...@nokia.com
---
pixman/pixman-arm-neon-asm.S | 36
pixman/pixman-arm-neon.c |4
2 files changed, 40 insertions(+), 0 deletions(-)
diff --git a/pixman/pixman-arm-neon-asm.S b/pixman/pixman-arm-neon-asm.S
). But it depends on the
pixel data. Solid filled images are going to be faster than the ones filled
with random data. Also table lookups make SIMD optimizations quite challenging.
--
Best regards,
Siarhei Siamashka
negative. If there are lots of pixman users on less
capable ARM hardware and they are just silent for whatever reason (maybe they
are just perfectly satisfied?), then a bit more feedback would be welcome in
order not to forget that they exist :)
--
Best regards,
Siarhei Siamashka
percentage reduced from 68.0% to 62.6%
Maybe it is not dramatic, but we are glad to see those gains in
both perf and power.
A performance gain in the 4-5% ballpark looks like a major improvement to me.
--
Best regards,
Siarhei Siamashka
On Friday 20 August 2010 18:39:47 Liu, Xinyun wrote:
Hi Siarhei Siamashka,
Here is a new patch, can you review it? Thank you!
Sure, thanks for the updated patch. Some comments follow.
From 9783651899a2763d7fcca2960fc354bd1f541980 Mon Sep 17 00:00:00 2001
From: root r...@d501.localdomain
that 'make check' detects it.
It may be also convenient to configure pixman with '--disable-shared' option to
make debugging easier.
--
Best regards,
Siarhei Siamashka
did not say that it can't be solved :) It's just better to address
this particular problem in the next revision of ssse3 patch.
--
Best regards,
Siarhei Siamashka
):
[...]
The rest of code contains a lot of repeatable patterns (the main loop is
replicated 16 times for different alignments). IMHO they could be
simplified a lot by using macros, making the code size less scary ;-)
--
Best regards,
Siarhei Siamashka
for this
and will submit them a bit later.
Comments are very much welcome.
Siarhei Siamashka (2):
New FAST_PATH_SIMPLE_ROTATE_TRANSFORM flag
C fast path for a simple 90/180/270 degrees rotation.
pixman/pixman-fast-path.c | 295 +
pixman/pixman-image.c
From: Siarhei Siamashka siarhei.siamas...@nokia.com
---
pixman/pixman-fast-path.c |7 ---
1 files changed, 0 insertions(+), 7 deletions(-)
diff --git a/pixman/pixman-fast-path.c b/pixman/pixman-fast-path.c
index 6ed1580..014bab6 100644
--- a/pixman/pixman-fast-path.c
+++ b/pixman/pixman
From: Siarhei Siamashka siarhei.siamas...@nokia.com
This operation was seen in mozilla browser profiling logs.
Implemented so that 'over' and 'out_reverse' operations
now reuse common parts of code.
---
pixman/pixman-arm-neon-asm.S | 101 --
pixman/pixman
;
}
}
There is also one more call to pixman_transform_bounds() a few lines below
which is used to check whether to set FAST_PATH_16BIT_SAFE flag. Shouldn't it
be also converted to use compute_sample_extent()?
--
Best regards,
Siarhei Siamashka
From: Siarhei Siamashka siarhei.siamas...@nokia.com
This fixes the discrepancy between how FAST_PATH_SAMPLES_COVER_CLIP
flag is calculated and the code in fast_composite_scaled_nearest_*
functions.
Function 'pixman_transform_bounds' which is used for deciding whether
to set
From: Siarhei Siamashka siarhei.siamas...@nokia.com
This fixes the discrepancy between how FAST_PATH_SAMPLES_COVER_CLIP
flag is calculated and the code in fast_composite_scaled_nearest_*
functions.
Function 'pixman_transform_bounds' which is used for deciding whether
to set
On Wednesday 05 May 2010, Soeren Sandmann wrote:
Siarhei Siamashka siarhei.siamas...@gmail.com writes:
But the point is that fast scrolling (quite a common operation, not
something totally fancy) requires self-copy support. So IMHO it makes
sense to have this functionality somehow
0.5127 libpixman-1.so.0.18.0 sse2_composite_over_n_8_
--
Best regards,
Siarhei Siamashka
On Friday 19 March 2010, Siarhei Siamashka wrote:
[...]
So when generating binaries for ARMv5, the linker is permitted to
do 'bl' -> 'blx' conversion. That's what we actually see here, except that
we actually want this code to also run on ARMv4. In order to make the code
ARMv4 compatible
On Monday 15 March 2010, Alexander Larsson wrote:
On Mon, 2010-03-15 at 03:25 +0200, Siarhei Siamashka wrote:
But before it gets committed, some problems with potential fixed point
overflows when dealing with large images need to be addressed. I have
made a test program which can expose
On Sunday 14 March 2010, Loïc Minier wrote:
On Wed, Mar 10, 2010, Siarhei Siamashka wrote:
I would prefer a bit more descriptive comment (with the details copied
from that launchpad page).
I see you pushed this now; thanks! Yeah, it's not obvious why one
needs to try with the toolchain
this patch or try to do something better.
--
Best regards,
Siarhei Siamashka
,
Siarhei Siamashka
#include <stdio.h>
#include <stdlib.h>
#include <fenv.h>
#include <pixman.h>
#include "gtk-utils.h"
int
main (int argc, char **argv)
{
#define WIDTH 400
#define HEIGHT 200
uint32_t *dest = malloc (WIDTH * HEIGHT * 4);
pixman_image_t *src_img;
pixman_image_t *dest_img