[ANNOUNCE] pixman release 0.43.4 now available
A new pixman release 0.43.4 is now available. tar.gz: https://cairographics.org/releases/pixman-0.43.4.tar.gz https://www.x.org/releases/individual/lib/pixman-0.43.4.tar.gz tar.xz: https://www.x.org/releases/individual/lib/pixman-0.43.4.tar.xz Hashes: SHA256: a0624db90180c7ddb79fc7a9151093dc37c646d8c38d3f232f767cf64b85a226 pixman-0.43.4.tar.gz SHA256: 48d8539f35488d694a2fef3ce17394d1153ed4e71c05d1e621904d574be5df19 pixman-0.43.4.tar.xz SHA512: 08802916648bab51fd804fc3fd823ac2c6e3d622578a534052b657491c38165696d5929d03639c52c4f29d8850d676a909f0299d1a4c76a07df18a34a896e43d pixman-0.43.4.tar.gz SHA512: b40fb05bd58dc78f4e4e9b19c86991ab0611b708657c9a7fb42bfe82d57820a0fde01a34b00a0848a41da6c3fb90c2213942a70f435a0e9467631695d3bc7e36 pixman-0.43.4.tar.xz PGP signature: https://cairographics.org/releases/pixman-0.43.4.tar.gz.sha512.asc Git: https://gitlab.freedesktop.org/pixman/pixman.git tag: pixman-0.43.4 Log: Gayathri Berli (1): Revert the changes to fix the problem in big-endian architectures Heiko Lewin (1): Allow to build pixman on clang/arm32 Makoto Kato (1): pixman-arm: Fix build on clang/arm32 Matt Turner (5): pixman-x86: Use cpuid.h header pixman-x86: Move #include "cpuid.h" inside conditionals Revert "Allow to build pixman on clang/arm32" pixman-arm: Use unified syntax Pre-release version bump to 0.43.4 Simon Ser (1): Post-release version bump to 0.43.3 signature.asc Description: PGP signature
Re: [Pixman] [ANNOUNCE] pixman release 0.42.2 now available
On Wed, Nov 2, 2022 at 1:37 PM Matt Turner wrote: > > A new pixman release 0.42.2 is now available. This is a stable release > in the 0.42 series. > > This version contains a fix for a heap overflow. A CVE has been > requested, and I'll reply to this email with the number when it is > allocated. This has been assigned CVE-2022-44638.
[Pixman] [ANNOUNCE] pixman release 0.42.2 now available
A new pixman release 0.42.2 is now available. This is a stable release in the 0.42 series. This version contains a fix for a heap overflow. A CVE has been requested, and I'll reply to this email with the number when it is allocated. See https://gitlab.freedesktop.org/pixman/pixman/-/commit/a1f88e842e0216a5b4df1ab023caebe33c101395 and https://gitlab.freedesktop.org/pixman/pixman/-/issues/63 for more information. Thanks to Maddie Stone and Google's Project Zero for discovering this issue, providing a proof-of-concept, and a great analysis. tar.gz: https://cairographics.org/releases/pixman-0.42.2.tar.gz https://www.x.org/releases/individual/lib/pixman-0.42.2.tar.gz tar.xz: https://www.x.org/releases/individual/lib/pixman-0.42.2.tar.xz Hashes: SHA256: ea1480efada2fd948bc75366f7c349e1c96d3297d09a3fe62626e38e234a625e pixman-0.42.2.tar.gz SHA256: 5747d2ec498ad0f1594878cc897ef5eb6c29e91c53b899f7f71b506785fc1376 pixman-0.42.2.tar.xz SHA512: 0a4e327aef89c25f8cb474fbd01de834fd2a1b13fdf7db11ab72072082e45881cd16060673b59d02054b1711ae69c6e2395f6ae9214225ee7153939efcd2fa5d pixman-0.42.2.tar.gz SHA512: 3476e2676e66756b1af61b1e532cd80c985c191fb7956eb01702b419726cce99e79163b7f287f74f66414680e7396d13c3fee525cd663f12b6ac4877070ff4e8 pixman-0.42.2.tar.xz GPG signature: https://cairographics.org/releases/pixman-0.42.2.tar.gz.sha512.asc (signed by [ultimate] Matt Turner [ultimate] Matt Turner [ultimate] Matt Turner [ultimate] Matt Turner ) Git: https://gitlab.freedesktop.org/pixman/pixman.git tag: pixman-0.42.2 Log: Matt Turner (4): build: Add a64-neon-test.S to EXTRA_DIST Revert "Fix signed-unsigned semantics in reduce_32" Avoid integer overflow leading to out-of-bounds write Pre-release version bump to 0.42.2 Simon Ser (3): Post-release version bump to 0.42.1 meson: override pixman-1 dependency meson: explicitly set C standard to gnu99 Thomas Klausner (2): configure.ac: avoid unportable test(1) operator Makefile.am: increase shell portability signature.asc Description: PGP signature
Re: [Pixman] Performance regression with pixman 0.40
Cc'ing the patch author, since I don't think he's subscribed. On Fri, Jun 4, 2021 at 12:15 AM wrote: > > Hi, > > We are developping a graphics framework called EGT dedicated to Microchip > parts: > https://github.com/linux4sam/egt > > We are using Cairo, and so Pixman, for the drawing part. Updating our > distribution, we noticed a performance decrease in our benchmark suite, in > the worst case our fps decrease from 200 to 60. > > We have identified the move from Pixman 0.38.4 to 0.40 as the cause. I did a > bisect to find which commit impacts us and it's this one: > > commit 6fe0131394fb029d2fccaee6b8edcb108840ad8a (refs/bisect/bad) > Author: Federico Mena Quintero > Date: Wed Mar 18 18:49:30 2020 -0600 > > Initialize temporary buffers in general_composite_rect() > > Otherwise, Valgrind shows things like "conditional jump or move > depends on uninitialised values" errors much later in calling code. > For example, see https://gitlab.gnome.org/GNOME/librsvg/issues/572 > > Fixes https://gitlab.freedesktop.org/pixman/pixman/issues/9 > > diff --git a/pixman/pixman-general.c b/pixman/pixman-general.c > index 7d74f98..7e5a0d0 100644 > --- a/pixman/pixman-general.c > +++ b/pixman/pixman-general.c > @@ -165,6 +165,12 @@ general_composite_rect (pixman_implementation_t *imp, > > if (!scanline_buffer) > return; > + > + memset (scanline_buffer, 0, width * Bpp * 3 + 15 * 3); > +} > +else > +{ > + memset (stack_scanline_buffer, 0, sizeof (stack_scanline_buffer)); > } > > src_buffer = ALIGN (scanline_buffer); > > > I don't know which drawing paths are impacted by this change, I can dig > further > if needed. We have 2 benches with small performance decrease for all our > devices: armv5 and armv7. And one bench with huge performance decrease on our > armv5 device. This bench is about drawing circles with alpha blending. Other > benches which draw squares, squares with alpha blending, and circles are not > impacted. > > For sure, having an extra memset in the path can explain the performance > decrease. > > Do we have to consider that the new scores we get are the valid ones or can we > find an alternative? > > Thanks > > Regards, > Ludovic > ___ > Pixman mailing list > Pixman@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/pixman ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Prevent empty top-level declaration
On Sun, Nov 17, 2019 at 4:48 PM Michael Forney wrote: > > The expansion of PIXMAN_DEFINE_THREAD_LOCAL(...) may end in a > function definition, so the following semicolon is considered an > empty top-level declaration, which is not allowed in ISO C. > --- > pixman/pixman-compiler.h | 6 +++--- > pixman/pixman-implementation.c | 2 +- > 2 files changed, 4 insertions(+), 4 deletions(-) > Thanks! Committed, and sorry for losing track of the patch. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [ANNOUNCE] pixman release 0.40.0 now available
A new pixman release 0.40.0 is now available. This is a stable release. tar.gz: https://cairographics.org/releases/pixman-0.40.0.tar.gz https://www.x.org/releases/individual/lib/pixman-0.40.0.tar.gz tar.xz: https://www.x.org/releases/individual/lib/pixman-0.40.0.tar.xz Hashes: SHA256: 6d200dec3740d9ec4ec8d1180e25779c00bc749f94278c8b9021f5534db223fc pixman-0.40.0.tar.gz SHA256: da8ed9fe2d1c5ef8ce5d1207992db959226bd4e37e3f88acf908fd9a71e2704e pixman-0.40.0.tar.xz SHA512: 063776e132f5d59a6d3f94497da41d6fc1c7dca0d269149c78247f0e0d7f520a25208d908cf5e421d1564889a91da44267b12d61c0bd7934cd54261729a7de5f pixman-0.40.0.tar.gz SHA512: 8a60edb113d68791b41bd90b761ff7b3934260cb3dada3234c9351416f61394e4157353bc4d61b8f6c2c619de470f6feefffb4935bfcf79d291ece6285de7270 pixman-0.40.0.tar.xz GPG signature: https://cairographics.org/releases/pixman-0.40.0.tar.gz.sha512.asc (signed by [ultimate] Matt Turner [ultimate] Matt Turner [ultimate] Matt Turner [ultimate] Matt Turner ) Git: https://gitlab.freedesktop.org/pixman/pixman.git tag: pixman-0.40.0 Log: Adam Jackson (17): test: Fix undefined left shift in affine-test test: Fix undefined left shift in pixel_checker_init pixman: Fix undefined left shift in pixel_contract_from_float pixman-access: Fix various undefined left shifts pixman-combine: Fix various undefined left shifts pixman-image: Fix undefined left shift pixman-gradient-walker: Fix undefined left shift pixman-sse2: Fix an undefined left shift pixman-fast-path: Fix various undefined left shifts pixman-bits-image: Fix various undefined left shifts pixman-bits-image: Fix left shift of a negative number pixman-matrix: Fix left shift of a negative number test: Fix unrepresentable subtraction in stress-test pixman-mmx: Fix undefined left-shifts pixman-mmx: Fix undefined unaligned loads pixman-sse2: Fix undefined unaligned loads fast-path: Fix some sketchy pointer arithmetic Antonio Ospite (1): pixman-compiler.h: fix building tests with MinGW Basile Clement (6): Fix bilinear filter computation in wide pipeline Implement basic dithering for the wide pipeline, v3 test: Check the dithering path in tolerance-test demos: Add a dithering demo Ordered dithering with blue noise, v2 Don't use GNU extension for binary numbers Christoph Reiter (3): meson: define SIZEOF_LONG and use -Wundef meson: allow building a static library meson: fix TLS support under mingw Chun-wei Fan (11): meson.build: Fix MMX, SSE2 and SSSE3 checks on MSVC meson.build: Disable OpenMP on MSVC builds build: Don't assume PThreads if threading support is found meson.build: Improve libpng search on MSVC pixman/pixman-version.h.in: Add a PIXMAN_API macro pixman/pixman.h: Mark public APIs with PIXMAN_API pixman-[compiler|private].h: Export symbols for tests pixman/meson.build: Define PIXMAN_API on MSVC-style compilers test/solid-test.c: Include stdint.h demos: Define _USE_MATH_DEFINES on MSVC-style compilers thread-test.c: Use Windows Threading API on Windows Dylan Baker (1): meson: don't use link_with for library() Fan Jinke (1): add Hygon Dhyana support to enable X86_MMX_EXTENSIONS feature Federico Mena Quintero (1): Initialize temporary buffers in general_composite_rect() Ghabry (1): Enabled armv6 SIMD for 3DS (devkitARM) and arm neon SIMD for PS Vita (vit Jonathan Kew (2): Explicitly cast byte to uint32_t before left-shifting. Avoid undefined behavior (left-shifting negative value) in pixman_int_to_ Khem Raj (1): test/utils: Check for FE_INVALID definition before use Mathieu Duponchelle (2): meson: finish porting over mmx and ssse2 flags for sun and msvc meson: add missing function check (getisax) Matt Turner (7): Post-release version bump to 0.38.5 lowlevel-blt-bench: Remove unused variable loongson: Avoid C90 mixing-code-and-decls warning Distribute the blue-noise files Build xz tarballs instead of bzip2 Move from MD5/SHA1 to SHA256/SHA512 digests Pre-release version bump to 0.40.0
Re: [Pixman] [PATCH 1/2] configure.ac: Use '-mloongson-mmi' for Loongson MMI.
On Thu, Mar 26, 2020 at 5:57 AM Shiyou Yin wrote: > > It's recommended to use '-mloongson-mmi' for MMI. > --- > configure.ac | 2 +- > meson.build | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/configure.ac b/configure.ac > index 1ca3974..fd7df47 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -273,7 +273,7 @@ dnl > === > dnl Check for Loongson Multimedia Instructions > > if test "x$LS_CFLAGS" = "x" ; then > -LS_CFLAGS="-march=loongson2f" > +LS_CFLAGS="-mloongson-mmi" > fi > > have_loongson_mmi=no > diff --git a/meson.build b/meson.build > index 15d3409..a45c969 100644 > --- a/meson.build > +++ b/meson.build > @@ -51,7 +51,7 @@ endforeach > > use_loongson_mmi = get_option('loongson-mmi') > have_loongson_mmi = false > -loongson_mmi_flags = ['-march=loongson2f'] > +loongson_mmi_flags = ['-mloongson-mmi'] > if not use_loongson_mmi.disabled() >if host_machine.cpu_family() == 'mips64' and cc.compiles(''' >#ifndef __mips_loongson_vector_rev > -- Thanks very much. This looks good to me. My only (minor) concern is that the -mloongson-mmi flag is only available since GCC 9, but likely any users would need to change -march=loongson2f to -march=loongson3a anyway, and they can easily change -mloongson-mmi back to -march=... if needed. I'll just double check that with this patch that the test suite passes on my Yeeloong and then commit it. (and sorry for my delayed response) ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH v2] build: improve control logic for enabling MMI.
Thank you for the patch! On Fri, Mar 6, 2020 at 3:28 AM Shiyou Yin wrote: > > From: Yin Shiyou Should be yinshiyou-hf@loongson*.cn*? > > 1. Replace LS_CFLAGS with MMI_CFLAGS to express its intention more accurately. >LS_CFLAGS is still available, but it is not recommended. I'm not aware of any reasons why LS_CFLAGS needs to stay for compatibility. Do we know of any distros that set it to override the -march=... value? > 2. Improve the control logic for enabling MMI. > > Three essential conditions for enabling MMI: > 1) user have not specify --disable-loongson-mmi. > 2) MMI options has been specified by MMI_CFLAGS,CC or compiler's default > setting. > 3) compiler supports these MMI options. > --- > configure.ac | 69 > We should also update meson.build. I expect/hope that the autotools build system will go away sometime in the future. I'm not sure I entirely understand the patch. I understand that the objective is to make it possible to easily build pixman for Loongson3A and use the pixman-mmx.c optimizations. I think it's currently possible to build pixman on mips without specifying -march=loongson* in CFLAGS and it will enable the pixman-mmx.c paths and choose them at runtime. Is part of the goal to keep that working? If so, could we just use the -mloongson-mmi flag to compile pixman-mmx.c? Or does that flag mean the Loongson3A variants of the instructions? What happens if you compile with -march=loongson2f -mloongson-mmi? Does GCC generate instructions compatible with 2F or 3A? ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH v2 2/3] build: use '-mloongson-mmi' for Loongson MMI.
On Sat, Feb 22, 2020 at 6:34 AM YunQiang Su wrote: > > Shiyou Yin 于2020年2月22日周六 下午9:26写道: > > > > >-Original Message- > > >From: Adam Jackson [mailto:a...@redhat.com] > > >Sent: Friday, February 21, 2020 11:33 PM > > >To: Yin Shiyou; pixman@lists.freedesktop.org > > >Subject: Re: [Pixman] [PATCH v2 2/3] build: use '-mloongson-mmi' for > > >Loongson MMI. > > > > > >On Thu, 2020-02-20 at 22:23 +0800, Yin Shiyou wrote: > > >> It's suggested to use '-mloongson-mmi' to enable MMI. > > >> To keep compatible with old processor, '-mloongson-mmi' will be > > >> setted for Loongson-3A only. > > > > > >The pattern we've used for other CPUs is to build support for as many > > >ISA extensions as possible, unless they are explicitly disabled. > > >Distributions tend to want to set their own minimum ISA levels, and if > > >they wanted to assert -mloongson-mmi they would already have added it > > >to CFLAGS globally. > > > > > >Do you have any performance data for this change? > > > > > >If setting -mloongson-mmi means the compiler can do useful > > >autovectorization, then that's probably true for other arches too (eg > > >amd64 vs avx2), and we should support this kind of thing more > > >generically. But as it stands I don't think this patch is a good idea. > > > > > First, that's introduce the history of '-march=loongson2f' and > > '-mloongson-mmi'. > > From loongson2f start, mmi is supported by loongson processor. > > Yes. So that's why when we code, we should be very careful, especially > when we work on base part of a OS, just like pixman. > One, history mistake will make all of the people painful. > > An exmaple is about time_t on 32bit system. > > > Unfortunately, the compiler's support for MMI extention is not standardized. > > Gcc compiler use '-march=loongson2f' for loongson2f at first, but from > > Loongson-3A, > > opcode of mmi instruction has changed, and '-march=loongson3a' is in > > replaced. > > That is the reason some of Loongson's extensions make upstream unhappy. > You need be always very careful when you design a CPU. > 如履薄冰. No zuo no die. > > > From last year, compile option for mmi instruction has been standardized. > > Just like -mmsa for mips MSA. (MMI,LSX,LASX is Loongson SIMD extention.) > > -mloongson-mmi for MMI (-march=loongson3a still works, but -mloongson-mmi > > is recommended for new processors except Loongson2f. ) > > -mloongson-sx for LSX > > -mloongson-asxfor LASX > > That is good news. > > > > > Second, back to this patch itself. > > I meet a problem when compile pixman on my Loongson3a with gcc, MMI can't > > be enabled. > > configure check failure: " linking mips:loongson_2f module with previous > > mips:gs464 modules" > > It can be solved by assign LS_CFLAGS="-mloongson-mmi" while config. > > So I submit this patch in hope that no need to assign LS_CFLAGS explicitly. > > This won't have much impact on performance as I know. > > Here is not about performance. You made a bad design, that is burden of > history. If you're referring to using -march=loongson2f in configure.ac, then I should point out that that was my choice, and I don't really know what other options I had -- or even have today. As far as I know, -march=loongson* was, until recently, the only way to enable the SIMD instructions, and worse, if I recall correctly Loongson 2E and 2F are not entirely binary compatible themselves! The only stable Loongson system I've ever had is a Yeeloong -- 2F, so it's what I chose. Like I said in another email, I even tried building pixman-mmx.c multiple times with different -march=... values, linking them all into libpixman, and choosing which to execute at runtime, but binutils does not allow linking object files that are compiled with different -march=... values on mips for reasons I do not know. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH v2 2/3] build: use '-mloongson-mmi' for Loongson MMI.
On Sat, Feb 22, 2020 at 5:26 AM Shiyou Yin wrote: > > >-Original Message- > >From: Adam Jackson [mailto:a...@redhat.com] > >Sent: Friday, February 21, 2020 11:33 PM > >To: Yin Shiyou; pixman@lists.freedesktop.org > >Subject: Re: [Pixman] [PATCH v2 2/3] build: use '-mloongson-mmi' for > >Loongson MMI. > > > >On Thu, 2020-02-20 at 22:23 +0800, Yin Shiyou wrote: > >> It's suggested to use '-mloongson-mmi' to enable MMI. > >> To keep compatible with old processor, '-mloongson-mmi' will be > >> setted for Loongson-3A only. > > > >The pattern we've used for other CPUs is to build support for as many > >ISA extensions as possible, unless they are explicitly disabled. > >Distributions tend to want to set their own minimum ISA levels, and if > >they wanted to assert -mloongson-mmi they would already have added it > >to CFLAGS globally. > > > >Do you have any performance data for this change? > > > >If setting -mloongson-mmi means the compiler can do useful > >autovectorization, then that's probably true for other arches too (eg > >amd64 vs avx2), and we should support this kind of thing more > >generically. But as it stands I don't think this patch is a good idea. > > > First, that's introduce the history of '-march=loongson2f' and > '-mloongson-mmi'. > From loongson2f start, mmi is supported by loongson processor. > Unfortunately, the compiler's support for MMI extention is not standardized. > Gcc compiler use '-march=loongson2f' for loongson2f at first, but from > Loongson-3A, > opcode of mmi instruction has changed, and '-march=loongson3a' is in replaced. > From last year, compile option for mmi instruction has been standardized. > Just like -mmsa for mips MSA. (MMI,LSX,LASX is Loongson SIMD extention.) > -mloongson-mmi for MMI (-march=loongson3a still works, but -mloongson-mmi > is recommended for new processors except Loongson2f. ) > -mloongson-sx for LSX > -mloongson-asxfor LASX > > Second, back to this patch itself. > I meet a problem when compile pixman on my Loongson3a with gcc, MMI can't be > enabled. > configure check failure: " linking mips:loongson_2f module with previous > mips:gs464 modules" Do you know why this is? Obviously we can and do build MMX, SSE2, SSSE3 paths and choose to execute them at runtime. Why does binutils not allow combining object files that are compiled with mixed -march=... values on mips? I cannot find the branch now, but I tried once to make pixman build pixman-mmx.c with three different -march=... values (2e, 2f, 3a) and choose which to execute at runtime, but binutils would not allow the files to be linked into the same binary. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] pixman-combine: Fix wrong value of RB_MASK_PLUS_ONE.
On Thu, Feb 20, 2020 at 6:35 AM Shiyou Yin wrote: > Will this patch be merged? Yes, pushed. Thanks! ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] pixman-combine: Fix wrong value of RB_MASK_PLUS_ONE.
On Mon, Feb 3, 2020 at 1:56 AM Yin Shiyou wrote: > > --- > pixman/pixman-combine32.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/pixman/pixman-combine32.h b/pixman/pixman-combine32.h > index cdd56a6..59bb247 100644 > --- a/pixman/pixman-combine32.h > +++ b/pixman/pixman-combine32.h > @@ -12,7 +12,7 @@ > #define RB_MASK 0xff00ff > #define AG_MASK 0xff00ff00 > #define RB_ONE_HALF 0x800080 > -#define RB_MASK_PLUS_ONE 0x1100 > +#define RB_MASK_PLUS_ONE 0x1000100 Thanks. The patch looks correct, but obviously nothing in the test suite is failing. How did you discover this? Does this patch fix something for you? ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Optimize Graphic Routines for s390x in Pixman - Queries
On Sat, Jan 25, 2020 at 4:57 AM Naveen Naidu wrote: > > Hello Everyone, > > I am Naveen a Senior Year Computer Science Undergraduate from India. I am > planning to apply for Open Mainframe Project > Internship(https://github.com/openmainframeproject-internship/resources) > program, whose one of the proposed project is to Optimize graphics routines > for s390x in pixman. > > The description of the project is as follows: > >> With the introduction of VirtIO GPU hardware (virtual graphic adapter for >> KVM-based virtual machines) for the s390x platform it makes sense to provide >> optimized routines in the pixman library also for the s390x architecture. > > > From what I gather from the description, t s390x has support for vector > instruction i.e SIMD instructions and since these instructions quicken the > processing, the project asks us to write an implementation of pixman that > uses the vector instructions for s390x. > > I have also been going through the Implementation for Power VMX SIMD, which > was created to use the Vector instructions for Power PC. But I must confess > that I am a little lost. > > It would be really kind of you all if you could guide me in what I would need > to learn/do in order for me to be able to implement the project. I've had a > course on computer graphics in our undergrad so I do understand the > fundamentals. But I would really like to know the right way of steps to do > the project so that I can get a better understanding of the project. > > Thank you very much for your time, > Naveen Welcome :) Here's some snippets of an email I sent to someone else interested in contributing optimization to pixman: Background information for the operations pixman implements: http://ssp.impulsetrain.com/porterduff.html (written by the author of Pixman) https://en.wikipedia.org/wiki/Alpha_compositing `lowlevel-blt-bench` lives in pixman's test/ directory. It's a small self-contained benchmark. Run with ./test/lowlevel-blt-bench all ./test/lowlevel-blt-bench over__ etc. The -b (bilinear) and -n (nearest) options are useful as well. Firefox traces will show lots of usage of bilinear and nearest scaling functions. There's an environment variable named PIXMAN_DISABLE=... which is very useful for getting side-by-side performance comparisons of MMX vs SSE2 vs AVX2. (For S390, since it doesn't already have some optimizations, it may not be particularly useful). It works for both lowlevel-blt-bench and cairo-perf-trace. Cairo https://cgit.freedesktop.org/cairo/My https://cgit.freedesktop.org/cairo-traces/ `cairo-perf-trace` lives in cairo's perf directory. Run with CAIRO_TEST_TARGET=image16,image ./perf/cairo-perf-trace ~/path/to/trace The trace files in cairo-traces are .lzma files which will have to be decompressed. Decompress with lzma -dk trace.lzma or alternatively run make in cairo-traces to uncompress them all. Pass the uncompressed file to cairo-perf-trace. The arguments to CAIRO_TEST_TARGET specify what backend Cairo should use. 'image' corresponds to 32-bit visuals, and 'image16' is 16-bit visuals. Here's a couple of my blog posts about some work I did on pixman. Maybe you can find something valuable in them. https://mattst88.com/blog/2012/05/17/Optimizing_pixman_for_Loongson:_Process_and_Results/ https://mattst88.com/blog/2012/07/06/My_time_optimizing_graphics_performance_on_the_OLPC_XO_1,75_laptop/ I would look at the pixman_sse2.c file for examples of what pixman optimizations look like. That may be a better starting point than the POWER optimizations. I have a small branch here (https://cgit.freedesktop.org/~mattst88/pixman/log/?h=avx2) that demonstrates adding a set of optimizations for a new instruction set. I expect it would be helpful to look over. Thanks, Matt ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] [dither] Don't use GNU extension for binary numbers
Thanks. Pushed. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Dithering patches, v2
On Sat, May 11, 2019 at 7:42 AM Bryce Harrington wrote: > > On Tue, May 07, 2019 at 09:52:39AM -0700, Matt Turner wrote: > > On Sun, May 5, 2019 at 11:50 AM Bryce Harrington > > wrote: > > > > > > On Mon, Apr 22, 2019 at 09:26:48AM -0700, Matt Turner wrote: > > > > On Fri, Apr 19, 2019 at 4:52 PM Bryce Harrington > > > > wrote: > > > > > Inkscape would love to see Basile's dithering patches included. Our > > > > > testing shows that they make a huge quality difference for our users; > > > > > this solves a critical need. > > > > > > > > > > Mc and I have done some preliminary investigation into how to plumb > > > > > this > > > > > into Cairo, and would love to hear your review of Basile's approach to > > > > > the problem. > > > > > > > > I don't feel like I'm experienced enough with that side of pixman to > > > > offer meaningful comments. I've Cc'd Søren in the hopes that he > > > > remains interested enough in the project to review the patches that > > > > Basile says implement the approach Søren described. > > > > > > I totally understand, I'd feel the same. But I think this is an > > > important patch, so how can we move forward with it? > > > > If you're happy with the patches, I'd say let's commit them. > > Works for me, would you prefer me to commit them, or will you be > committing them yourself? I'd prefer you commit them since they're for Inkscape. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Dithering patches, v2
On Sun, May 5, 2019 at 11:50 AM Bryce Harrington wrote: > > On Mon, Apr 22, 2019 at 09:26:48AM -0700, Matt Turner wrote: > > On Fri, Apr 19, 2019 at 4:52 PM Bryce Harrington > > wrote: > > > Inkscape would love to see Basile's dithering patches included. Our > > > testing shows that they make a huge quality difference for our users; > > > this solves a critical need. > > > > > > Mc and I have done some preliminary investigation into how to plumb this > > > into Cairo, and would love to hear your review of Basile's approach to > > > the problem. > > > > I don't feel like I'm experienced enough with that side of pixman to > > offer meaningful comments. I've Cc'd Søren in the hopes that he > > remains interested enough in the project to review the patches that > > Basile says implement the approach Søren described. > > I totally understand, I'd feel the same. But I think this is an > important patch, so how can we move forward with it? If you're happy with the patches, I'd say let's commit them. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Dithering patches, v2
On Fri, Apr 19, 2019 at 4:52 PM Bryce Harrington wrote: > Inkscape would love to see Basile's dithering patches included. Our > testing shows that they make a huge quality difference for our users; > this solves a critical need. > > Mc and I have done some preliminary investigation into how to plumb this > into Cairo, and would love to hear your review of Basile's approach to > the problem. I don't feel like I'm experienced enough with that side of pixman to offer meaningful comments. I've Cc'd Søren in the hopes that he remains interested enough in the project to review the patches that Basile says implement the approach Søren described. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/2] AVX2 implementation of OVER, ROVER, ADD, ROUT operators.
On Thu, Mar 28, 2019 at 10:41 PM Matt Turner wrote: > > On Wed, Mar 27, 2019 at 1:06 PM Matt Turner wrote: > > > > Thank you. I'll run some benchmarks on my KBL system to confirm and > > then commit them. > > > > I'm planning to do a 0.40 release soon with some Meson fixes and other > > small things. Seems like these patches will be good to include to make > > the release have a new feature :) > > Or maybe not. > > I benchmarked cairo-traces. The only thing that improved measurably > was poppler. I thought, well, at least we improved that and then > remembering my patch that also improved it I applied it, only to > realize that you incorporated my patch into your work without > mentioning it. > > And so your poppler improvements are in fact from my patch, now > modified and silently combined into this one. That's really bad form. Review processes undertaken indicate that Raghu wrote this code independently of me. My apologies for suggesting otherwise. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [ANNOUNCE] pixman release 0.38.4 now available
A new pixman release 0.38.4 is now available. This is a stable release in the in the 0.38 series. tar.gz: https://cairographics.org/releases/pixman-0.38.4.tar.gz https://www.x.org/releases/individual/lib/pixman-0.38.4.tar.gz tar.bz2: https://www.x.org/releases/individual/lib/pixman-0.38.4.tar.bz2 Hashes: MD5: 267a7af290f93f643a1bc74490d9fdd1 pixman-0.38.4.tar.gz MD5: 16a350a8a40116ddf67632a1d2623711 pixman-0.38.4.tar.bz2 SHA1: 8594e0a31c1802ae0c155d6b502c0953aa862baa pixman-0.38.4.tar.gz SHA1: 87e1abc91ac4e5dfcc275f744f1d0ec3277ee7cd pixman-0.38.4.tar.bz2 GPG signature: https://cairographics.org/releases/pixman-0.38.4.tar.gz.sha1.asc (signed by [ultimate] Matt Turner [ultimate] Matt Turner [ultimate] Matt Turner [ultimate] Matt Turner ) Git: https://gitlab.freedesktop.org/pixman/pixman.git tag: pixman-0.38.4 Log: Matt Turner (4): Post-release version bump to 0.38.3 Makefile.am: Update download links Makefile.am: Ship Meson assembly test files in the tarball Pre-release version bump to 0.38.4 signature.asc Description: PGP signature ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [ANNOUNCE] pixman release 0.38.2 now available
A new pixman release 0.38.2 is now available. This is a stable release in the in the 0.38 series. This release mostly contains fixes for the Meson build system. tar.gz: https://cairographics.org/releases/pixman-0.38.2.tar.gz https://www.x.org/releases/individual/lib/pixman-0.38.2.tar.gz tar.bz2: https://www.x.org/releases/individual/lib/pixman-0.38.2.tar.bz2 Hashes: MD5: e216abae705641038ca782c6d6fd4204 pixman-0.38.2.tar.gz MD5: dfdbebf2ce6c2ff0891247c55f928d97 pixman-0.38.2.tar.bz2 SHA1: c2abaea13ff9f12f31592859604047d8b1fa082a pixman-0.38.2.tar.gz SHA1: ce40833fe4337aa6329ac5694d9ff342338219c1 pixman-0.38.2.tar.bz2 GPG signature: http://cairographics.org/releases/pixman-0.38.2.tar.gz.sha1.asc (signed by [ultimate] Matt Turner [ultimate] Matt Turner [ultimate] Matt Turner [ultimate] Matt Turner ) Git: https://gitlab.freedesktop.org/pixman/pixman.git tag: pixman-0.38.2 Log: Dylan Baker (6): meson: work around meson issue #5115 meson: fix typo which breaks loongson checks meson: fix copy-n-paste error for arm simd assembly meson: Add proper include paths for the loongson check meson: simplify and fix mmx library compilation meson: store ARM SIMD and NEON tests as text files Matt Turner (2): meson: Correct copy-and-paste mistake Pre-release version bump to 0.38.2 Niveditha Rau (1): void function should not return a value Simon Richter (2): Windows: Show compiler invocation Windows: Support building with SHELL=cmd.exe signature.asc Description: PGP signature ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/2] AVX2 implementation of OVER, ROVER, ADD, ROUT operators.
On Wed, Mar 27, 2019 at 1:06 PM Matt Turner wrote: > > Thank you. I'll run some benchmarks on my KBL system to confirm and > then commit them. > > I'm planning to do a 0.40 release soon with some Meson fixes and other > small things. Seems like these patches will be good to include to make > the release have a new feature :) Or maybe not. I benchmarked cairo-traces. The only thing that improved measurably was poppler. I thought, well, at least we improved that and then remembering my patch that also improved it I applied it, only to realize that you incorporated my patch into your work without mentioning it. And so your poppler improvements are in fact from my patch, now modified and silently combined into this one. That's really bad form. From a technical perspective, I think we're back where we started: with an AVX2 implementation of over__ that does not provide a meaningful improvement in any cairo-trace and me doubting whether it's worth pursuing this project any further. To be honest, at this point I would prefer that you not continue this project. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] void function should not return a value
Thanks. Merged. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 1/2] Windows: Show compiler invocation
Thanks. Merged both. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/2] AVX2 implementation of OVER, ROVER, ADD, ROUT operators.
Thank you. I'll run some benchmarks on my KBL system to confirm and then commit them. I'm planning to do a 0.40 release soon with some Meson fixes and other small things. Seems like these patches will be good to include to make the release have a new feature :) ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH] avx2: Add fast path for over_reverse_n_8888
lowlevel-blt-bench, over_reverse_n_, 100 iterations: Before After Mean StdDev Mean StdDev Confidence Change L1 2372.6 2.50 4387.6 8.00100.00% +84.9% L2 2490.3 5.29 4326.5 20.79100.00% +73.7% M 2418.3 10.43 3718.0 38.55100.00% +53.7% HT 1555.8 13.35 2112.9 23.85100.00% +35.8% VT 1120.1 9.58 1403.7 15.43100.00% +25.3% R958.5 17.66 1176.9 20.87100.00% +22.8% RT 407.3 6.79450.1 7.22100.00% +10.5% At most 18 outliers rejected per test per set. cairo-perf-trace with trimmed traces, 30 iterations: Before After Mean StdDev Mean StdDev Confidence Change poppler 0.516 0.0030.478 0.002 100.000% +8.1% Cairo perf reports the running time, but the change is computed for operations per second instead (inverse of running time). --- pixman/pixman-avx2.c | 94 1 file changed, 94 insertions(+) diff --git a/pixman/pixman-avx2.c b/pixman/pixman-avx2.c index faef552..6a67515 100644 --- a/pixman/pixman-avx2.c +++ b/pixman/pixman-avx2.c @@ -28,6 +28,18 @@ negate_2x256 (__m256i data_lo, *neg_hi = _mm256_xor_si256 (data_hi, MASK_00FF_AVX2); } +static force_inline __m256i +unpack_32_1x256 (uint32_t data) +{ +return _mm256_unpacklo_epi8 (_mm256_broadcastd_epi32 (_mm_cvtsi32_si128 (data)), _mm256_setzero_si256 ()); +} + +static force_inline __m256i +expand_pixel_32_1x256 (uint32_t data) +{ +return _mm256_shuffle_epi32 (unpack_32_1x256 (data), _MM_SHUFFLE (1, 0, 1, 0)); +} + static force_inline __m256i pack_2x256_256 (__m256i lo, __m256i hi) { @@ -100,6 +112,13 @@ save_256_aligned (__m256i* dst, _mm256_store_si256 (dst, data); } +static force_inline void +save_256_unaligned (__m256i* dst, + __m256i data) +{ +_mm256_storeu_si256 (dst, data); +} + static force_inline int is_opaque_256 (__m256i x) { @@ -429,12 +448,87 @@ avx2_composite_over__ (pixman_implementation_t *imp, src += src_stride; } } + +static void +avx2_composite_over_reverse_n_ (pixman_implementation_t *imp, + pixman_composite_info_t *info) +{ +PIXMAN_COMPOSITE_ARGS (info); +uint32_t src; +uint32_t*dst_line, *dst; +__m256i ymm_src; +__m256i ymm_dst, ymm_dst_lo, ymm_dst_hi; +__m256i ymm_dsta_hi, ymm_dsta_lo; +int dst_stride; +int32_t w; + +src = _pixman_image_get_solid (imp, src_image, dest_image->bits.format); + +if (src == 0) + return; + +PIXMAN_IMAGE_GET_LINE ( + dest_image, dest_x, dest_y, uint32_t, dst_stride, dst_line, 1); + +ymm_src = expand_pixel_32_1x256 (src); + +while (height--) +{ + dst = dst_line; + + dst_line += dst_stride; + w = width; + + while (w >= 8) + { + __m256i tmp_lo, tmp_hi; + + ymm_dst = load_256_unaligned ((__m256i*)dst); + + unpack_256_2x256 (ymm_dst, _dst_lo, _dst_hi); + expand_alpha_2x256 (ymm_dst_lo, ymm_dst_hi, _dsta_lo, _dsta_hi); + + tmp_lo = ymm_src; + tmp_hi = ymm_src; + + over_2x256 (_dst_lo, _dst_hi, + _dsta_lo, _dsta_hi, + _lo, _hi); + + save_256_unaligned ( + (__m256i*)dst, pack_2x256_256 (tmp_lo, tmp_hi)); + + w -= 8; + dst += 8; + } + + while (w) + { + __m128i vd; + + vd = unpack_32_1x128 (*dst); + + *dst = pack_1x128_32 (over_1x128 (vd, expand_alpha_1x128 (vd), + _mm256_castsi256_si128 (ymm_src))); + w--; + dst++; + } + +} + +} + static const pixman_fast_path_t avx2_fast_paths[] = { PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, a8r8g8b8, avx2_composite_over__), PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, x8r8g8b8, avx2_composite_over__), PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, a8b8g8r8, avx2_composite_over__), PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, x8b8g8r8, avx2_composite_over__), + +/* PIXMAN_OP_OVER_REVERSE */ +PIXMAN_STD_FAST_PATH (OVER_REVERSE, solid, null, a8r8g8b8, avx2_composite_over_reverse_n_), +PIXMAN_STD_FAST_PATH (OVER_REVERSE, solid, null, a8b8g8r8, avx2_composite_over_reverse_n_), + { PIXMAN_OP_NONE }, }; -- 2.19.2 ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 3/3] Rev2 of patch: AVX2 versions of OVER and ROVER operators.
On Wed, Jan 16, 2019 at 4:57 PM Raghuveer Devulapalli wrote: > > From: raghuveer devulapalli > > These were found to be upto 1.8 times faster (depending on the array > size) than the corresponding SSE2 version. The AVX2 and SSE2 were > benchmarked on a Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz. The AVX2 and > SSE versions were benchmarked by measuring how many TSC cycles each of > the avx2_combine_over_u and sse2_combine_over_u functions took to run > for various array sizes. For the purpose of benchmarking, turbo was > disabled and intel_pstate governor was set to performance to avoid > variance in CPU frequencies across multiple runs. > > | Array size | #cycles SSE2 | #cycles AVX2 | > > | 400| 53966| 32800| > | 800| 107595 | 62595| > | 1600 | 214810 | 122482 | > | 3200 | 429748 | 241971 | > | 6400 | 859070 | 481076 | > > Also ran lowlevel-blt-bench for OVER__ operation and that > also shows a 1.55x-1.79x improvement over SSE2. Here are the details: > > AVX2: OVER__ = L1:2136.35 L2:2109.46 M:1751.99 ( 60.90%) > SSE2: OVER__ = L1:1188.91 L2:1190.63 M:1128.32 ( 40.31%) > > The AVX2 implementation uses the SSE2 version for manipulating pixels > that are not 32 byte aligned. The helper functions from pixman-sse2.h > are re-used for this purpose. I still cannot measure any performance improvement with cairo-traces. If we're not improving performance in any real world application, then I don't think it's worth adding a significant amount of code. As I told you in person and in private mail, I suspect that you're more likely to see real performance improvements in operations that are more compute-heavy, like bilinear filtering. You could probably use AVX2's gather instructions in the bilinear code as well. Filling out the avx2_iters array would also be a good place to start, since those functions execute when we do not have a specific fast-path for an operation (which will be the case for AVX2). I sense that you want to check this off your todo list and move on. If that's the case, we can include the avx2_composite_over_reverse_n_ function I wrote (and will send as a separate patch) to confirm that using AVX2 is capable of giving a performance improvement in some cairo traces. > --- > pixman/pixman-avx2.c | 431 ++- > 1 file changed, 430 insertions(+), 1 deletion(-) > > diff --git a/pixman/pixman-avx2.c b/pixman/pixman-avx2.c > index d860d67..faef552 100644 > --- a/pixman/pixman-avx2.c > +++ b/pixman/pixman-avx2.c > @@ -6,13 +6,439 @@ > #include "pixman-private.h" > #include "pixman-combine32.h" > #include "pixman-inlines.h" > +#include "pixman-sse2.h" > > +#define MASK_0080_AVX2 _mm256_set1_epi16(0x0080) > +#define MASK_00FF_AVX2 _mm256_set1_epi16(0x00ff) > +#define MASK_0101_AVX2 _mm256_set1_epi16(0x0101) > + > +static force_inline __m256i > +load_256_aligned (__m256i* src) > +{ > +return _mm256_load_si256(src); > +} > + > +static force_inline void > +negate_2x256 (__m256i data_lo, > + __m256i data_hi, > + __m256i* neg_lo, > + __m256i* neg_hi) > +{ > +*neg_lo = _mm256_xor_si256 (data_lo, MASK_00FF_AVX2); > +*neg_hi = _mm256_xor_si256 (data_hi, MASK_00FF_AVX2); > +} > + > +static force_inline __m256i > +pack_2x256_256 (__m256i lo, __m256i hi) > +{ > +return _mm256_packus_epi16 (lo, hi); > +} > + Stray space > +static force_inline void > +pix_multiply_2x256 (__m256i* data_lo, > + __m256i* data_hi, > + __m256i* alpha_lo, > + __m256i* alpha_hi, > + __m256i* ret_lo, > + __m256i* ret_hi) > +{ > +__m256i lo, hi; > + > +lo = _mm256_mullo_epi16 (*data_lo, *alpha_lo); > +hi = _mm256_mullo_epi16 (*data_hi, *alpha_hi); > +lo = _mm256_adds_epu16 (lo, MASK_0080_AVX2); > +hi = _mm256_adds_epu16 (hi, MASK_0080_AVX2); > +*ret_lo = _mm256_mulhi_epu16 (lo, MASK_0101_AVX2); > +*ret_hi = _mm256_mulhi_epu16 (hi, MASK_0101_AVX2); > +} > + Stray space > +static force_inline void > +over_2x256 (__m256i* src_lo, > + __m256i* src_hi, > + __m256i* alpha_lo, > + __m256i* alpha_hi, > + __m256i* dst_lo, > + __m256i* dst_hi) > +{ > +__m256i t1, t2; > + > +negate_2x256 (*alpha_lo, *alpha_hi, , ); > + > +pix_multiply_2x256 (dst_lo, dst_hi, , , dst_lo, dst_hi); > + > +*dst_lo = _mm256_adds_epu8 (*src_lo, *dst_lo); > +*dst_hi = _mm256_adds_epu8 (*src_hi, *dst_hi); > +} > + > +static force_inline void > +expand_alpha_2x256 (__m256i data_lo, > + __m256i data_hi, > + __m256i* alpha_lo, > + __m256i* alpha_hi) > +{ > +__m256i lo, hi; > + > +lo = _mm256_shufflelo_epi16 (data_lo, _MM_SHUFFLE (3, 3, 3, 3)); > +hi =
Re: [Pixman] [PATCH 2/3] Moving helper functions in pixman-sse2.c to pixman-sse2.h.
On Wed, Jan 16, 2019 at 4:57 PM Raghuveer Devulapalli wrote: > > From: raghuveer devulapalli > > These helper function will be reused in pixman-avx2.c implementations in > the future. > --- > pixman/pixman-sse2.c | 504 +-- > pixman/pixman-sse2.h | 502 ++ > 2 files changed, 503 insertions(+), 503 deletions(-) > create mode 100644 pixman/pixman-sse2.h > > diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c > index 8955103..8dea0c2 100644 > --- a/pixman/pixman-sse2.c > +++ b/pixman/pixman-sse2.c > @@ -32,509 +32,7 @@ > > /* PSHUFD is slow on a lot of old processors, and new processors have SSSE3 > */ > #define PSHUFD_IS_FAST 0 > - > -#include /* for _mm_shuffle_pi16 and _MM_SHUFFLE */ > -#include /* for SSE2 intrinsics */ > -#include "pixman-private.h" > -#include "pixman-combine32.h" > -#include "pixman-inlines.h" > - > -static __m128i mask_0080; > -static __m128i mask_00ff; > -static __m128i mask_0101; > -static __m128i mask_; > -static __m128i mask_ff00; > -static __m128i mask_alpha; > - > -static __m128i mask_565_r; > -static __m128i mask_565_g1, mask_565_g2; > -static __m128i mask_565_b; > -static __m128i mask_red; > -static __m128i mask_green; > -static __m128i mask_blue; > - > -static __m128i mask_565_fix_rb; > -static __m128i mask_565_fix_g; > - > -static __m128i mask_565_rb; > -static __m128i mask_565_pack_multiplier; > - These are moving to pixman-sse2.h to be used by the code below, which is to be used by the AVX2 code. But they're initialized in _pixman_implementation_create_sse2(), which means if you used PIXMAN_DISABLE=sse2 the AVX2 paths would fail. I suspect these constants do need to be prefixed with "sse2_", and in _pixman_x86_get_implementations() you should disable avx2 if PIXMAN_DISABLE=sse2. > -static force_inline __m128i > -unpack_32_1x128 (uint32_t data) > -{ > -return _mm_unpacklo_epi8 (_mm_cvtsi32_si128 (data), _mm_setzero_si128 > ()); > -} > - > -static force_inline void > -unpack_128_2x128 (__m128i data, __m128i* data_lo, __m128i* data_hi) > -{ > -*data_lo = _mm_unpacklo_epi8 (data, _mm_setzero_si128 ()); > -*data_hi = _mm_unpackhi_epi8 (data, _mm_setzero_si128 ()); > -} > - > -static force_inline __m128i > -unpack_565_to_ (__m128i lo) > -{ > -__m128i r, g, b, rb, t; > - > -r = _mm_and_si128 (_mm_slli_epi32 (lo, 8), mask_red); > -g = _mm_and_si128 (_mm_slli_epi32 (lo, 5), mask_green); > -b = _mm_and_si128 (_mm_slli_epi32 (lo, 3), mask_blue); > - > -rb = _mm_or_si128 (r, b); > -t = _mm_and_si128 (rb, mask_565_fix_rb); > -t = _mm_srli_epi32 (t, 5); > -rb = _mm_or_si128 (rb, t); > - > -t = _mm_and_si128 (g, mask_565_fix_g); > -t = _mm_srli_epi32 (t, 6); > -g = _mm_or_si128 (g, t); > - > -return _mm_or_si128 (rb, g); > -} > - > -static force_inline void > -unpack_565_128_4x128 (__m128i data, > - __m128i* data0, > - __m128i* data1, > - __m128i* data2, > - __m128i* data3) > -{ > -__m128i lo, hi; > - > -lo = _mm_unpacklo_epi16 (data, _mm_setzero_si128 ()); > -hi = _mm_unpackhi_epi16 (data, _mm_setzero_si128 ()); > - > -lo = unpack_565_to_ (lo); > -hi = unpack_565_to_ (hi); > - > -unpack_128_2x128 (lo, data0, data1); > -unpack_128_2x128 (hi, data2, data3); > -} > - > -static force_inline uint16_t > -pack_565_32_16 (uint32_t pixel) > -{ > -return (uint16_t) (((pixel >> 8) & 0xf800) | > - ((pixel >> 5) & 0x07e0) | > - ((pixel >> 3) & 0x001f)); > -} > - > -static force_inline __m128i > -pack_2x128_128 (__m128i lo, __m128i hi) > -{ > -return _mm_packus_epi16 (lo, hi); > -} > - > -static force_inline __m128i > -pack_565_2packedx128_128 (__m128i lo, __m128i hi) > -{ > -__m128i rb0 = _mm_and_si128 (lo, mask_565_rb); > -__m128i rb1 = _mm_and_si128 (hi, mask_565_rb); > - > -__m128i t0 = _mm_madd_epi16 (rb0, mask_565_pack_multiplier); > -__m128i t1 = _mm_madd_epi16 (rb1, mask_565_pack_multiplier); > - > -__m128i g0 = _mm_and_si128 (lo, mask_green); > -__m128i g1 = _mm_and_si128 (hi, mask_green); > - > -t0 = _mm_or_si128 (t0, g0); > -t1 = _mm_or_si128 (t1, g1); > - > -/* Simulates _mm_packus_epi32 */ > -t0 = _mm_slli_epi32 (t0, 16 - 5); > -t1 = _mm_slli_epi32 (t1, 16 - 5); > -t0 = _mm_srai_epi32 (t0, 16); > -t1 = _mm_srai_epi32 (t1, 16); > -return _mm_packs_epi32 (t0, t1); > -} > - > -static force_inline __m128i > -pack_565_2x128_128 (__m128i lo, __m128i hi) > -{ > -__m128i data; > -__m128i r, g1, g2, b; > - > -data = pack_2x128_128 (lo, hi); > - > -r = _mm_and_si128 (data, mask_565_r); > -g1 = _mm_and_si128 (_mm_slli_epi32 (data, 3), mask_565_g1); > -g2 = _mm_and_si128 (_mm_srli_epi32 (data, 5), mask_565_g2); > -b = _mm_and_si128 (_mm_srli_epi32 (data, 3), mask_565_b); > -
Re: [Pixman] [PATCH] mmx: compile on MIPS for Loongson-3A MMI optimizations
On Tue, Sep 18, 2018 at 2:34 AM wrote: > > From: Xianju Diao > > make check: > when I enable the USE_OPENMP, the test of 'glyph-test' and > 'cover-test' will failed on Loongson-3A3000. > Neither of the two test examples passed without optimizing the > code.Maybe be multi-core synchronization > of cpu bug,I will continue to debug this problem, Now, I use the > critical of openMP, 'glyph-test' and ' > cover-test' can passed. > > benchmark: > Running cairo-perf-trace benchmark on Loongson-3A. > image image16 > gvim 5.425 -> 5.069 5.531 -> 5.236 > popler-reseau 2.149 -> 2.13 2.152 -> 2.139 > swfdec-giant-steps-full 18.672 -> 8.21533.167 -> 18.28 > swfdec-giant-steps7.014 -> 2.45512.48 -> 5.982 > xfce4-terminal-al13.695 -> 5.24115.703 -> 5.859 > gonme-system-monitor 12.783 -> 7.05812.780 -> 7.104 > grads-heat-map0.482 -> 0.486 0.516 -> 0.514 > firefox-talos-svg 141.138 -> 134.621 152.495 -> 159.069 > firefox-talos-gfx23.119 -> 14.437 24.870 -> 15.161 > firefox-world-map32.018 -> 27.139 33.817 -> 28.085 > firefox-periodic-table 12.305 -> 12.443 12.876 -> 12.913 > evolution 7.071 -> 3.564 8.550 -> 3.784 > firefox-planet-gnome 77.926 -> 67.526 81.554 -> 65.840 > ocitysmap 4.934 -> 1.702 4.937 -> 1.701 > --- Thanks for the patch. I will review it when I have time (I'm preparing for a trip at the moment). I have a Loongson3 system that I have found to be unstable. I assume it is due to the hardware bugs that must be worked around in gcc and binutils. I have patched both of them with the patches I found in https://github.com/loongson-community/binutils-gdb etc, but I still have instability. I would appreciate it very much if you could offer some suggestions or help in improving the stability of my system. Looks like there are a couple of different things happening in this patch. We should try to split them up. One patch could be making the assembly memcpy implementation usable on mips64. A separate patch would add new functions to pixman-mmx.c. A few quick comments inline. > configure.ac|7 +- > pixman/Makefile.am |4 +- > pixman/loongson-mmintrin.h | 46 ++ > pixman/pixman-combine32.h |6 + > pixman/pixman-mips-dspr2-asm.h |2 +- > pixman/pixman-mips-memcpy-asm.S | 324 +--- > pixman/pixman-mmx.c | 1088 > ++- > pixman/pixman-private.h | 32 +- > pixman/pixman-solid-fill.c | 49 +- > pixman/pixman-utils.c | 65 ++- > test/Makefile.am|2 +- > test/utils.c|8 + This diff stat doesn't correspond to this patch. > 12 files changed, 1418 insertions(+), 215 deletions(-) > > diff --git a/configure.ac b/configure.ac > index e833e45..3e3dde5 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -154,9 +154,9 @@ AC_CHECK_DECL([__amd64], [AMD64_ABI="yes"], > [AMD64_ABI="no"]) > # has set CFLAGS. > if test $SUNCC = yes &&\ > test "x$test_CFLAGS" = "x" && \ > - test "$CFLAGS" = "-g" > + test "$CFLAGS" = "-g -mabi=n64" > then > - CFLAGS="-O -g" > + CFLAGS="-O -g -mabi=n64" This isn't acceptable. > fi > > # > @@ -183,6 +183,7 @@ AC_SUBST(LT_VERSION_INFO) > # Check for dependencies > > PIXMAN_CHECK_CFLAG([-Wall]) > +PIXMAN_CHECK_CFLAG([-mabi=n64]) > PIXMAN_CHECK_CFLAG([-Wdeclaration-after-statement]) > PIXMAN_CHECK_CFLAG([-Wno-unused-local-typedefs]) > PIXMAN_CHECK_CFLAG([-fno-strict-aliasing]) > @@ -273,7 +274,7 @@ dnl > === > dnl Check for Loongson Multimedia Instructions > > if test "x$LS_CFLAGS" = "x" ; then > -LS_CFLAGS="-march=loongson2f" > +LS_CFLAGS="-march=loongson3a" Also not acceptable. I see that recent gcc and binutils have gotten new options for enabling MMI separately from -march=loongson*. Maybe we could use those if available. I'm not sure there is currently a good solution. Let me think about it. > fi > > have_loongson_mmi=no > diff --git a/pixman/Makefile.am b/pixman/Makefile.am > index 581b6f6..e3a080c 100644 > --- a/pixman/Makefile.am > +++ b/pixman/Makefile.am > @@ -122,7 +122,7 @@ libpixman_mips_dspr2_la_SOURCES = \ > pixman-mips-dspr2.h \ > pixman-mips-dspr2-asm.S \ > pixman-mips-dspr2-asm.h \ > -pixman-mips-memcpy-asm.S > +#pixman-mips-memcpy-asm.S Can't do this. > libpixman_1_la_LIBADD += libpixman-mips-dspr2.la > >
Re: [Pixman] [PATCH] Adding AVX2 implementation of the OVER and REVERSE-OVER operator
On Wed, Aug 29, 2018 at 12:09 PM Matt Turner wrote: > Trailing whitespace. There's a lot throughout this patch. I'm not > going to point them out individually. I just looked up how to configure git to alert you to bad whitespace: git config core.whitespace indent-with-non-tab,space-before-tab,trailing-space Give that a try. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Adding AVX2 implementation of the OVER and REVERSE-OVER operator
On Wed, Aug 29, 2018 at 12:09 PM Matt Turner wrote: > > On Wed, Aug 22, 2018 at 10:03 AM raghuveer devulapalli > wrote: > > > > The AVX2 implementation of OVER and REVERSE OVER operator was > > found to be upto 2.2 times faster (depending on the array size) than > > the corresponding SSE2 version. The AVX2 and SSE2 were benchmarked > > on a Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz > > > > Moving the helper functions in pixman-sse2.c to pixman-sse2.h. The AVX2 > > implementation uses the SSE2 version for manipulating pixels that are not > > 32 byte aligned and hence, it made sense to separate the SSE2 helper > > functions into a separate file to be included in the AVX2 file rather > > than duplicate code. > > Let's please move the helpers into pixman-sse2.h in a separate commit > from the one that adds AVX2 code paths. > > We typically have more substantial benchmarks in the commit message. I ran all of the cairo traces in the benchmarks directory and couldn't measure any difference. You'll have to describe your benchmarking. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Adding AVX2 implementation of the OVER and REVERSE-OVER operator
On Wed, Aug 22, 2018 at 10:03 AM raghuveer devulapalli wrote: > > The AVX2 implementation of OVER and REVERSE OVER operator was > found to be upto 2.2 times faster (depending on the array size) than > the corresponding SSE2 version. The AVX2 and SSE2 were benchmarked > on a Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz > > Moving the helper functions in pixman-sse2.c to pixman-sse2.h. The AVX2 > implementation uses the SSE2 version for manipulating pixels that are not > 32 byte aligned and hence, it made sense to separate the SSE2 helper > functions into a separate file to be included in the AVX2 file rather > than duplicate code. Let's please move the helpers into pixman-sse2.h in a separate commit from the one that adds AVX2 code paths. We typically have more substantial benchmarks in the commit message. Let me run some cairo traces and see what I come up with. Also, what about the problems of AVX2 turbo? https://mobile.twitter.com/rygorous/status/992170573819138048 https://gist.github.com/rygorous/32bc3ea8301dba09358fd2c64e02d774 It doesn't seem like we are doing anything related to it in these patches. > --- > pixman/pixman-avx2.c | 401 > pixman/pixman-sse2.c | 504 > +-- > pixman/pixman-sse2.h | 502 ++ > 3 files changed, 904 insertions(+), 503 deletions(-) > create mode 100644 pixman/pixman-sse2.h > > diff --git a/pixman/pixman-avx2.c b/pixman/pixman-avx2.c > index d860d67..60b1b2b 100644 > --- a/pixman/pixman-avx2.c > +++ b/pixman/pixman-avx2.c > @@ -6,6 +6,404 @@ > #include "pixman-private.h" > #include "pixman-combine32.h" > #include "pixman-inlines.h" > +#include "pixman-sse2.h" > + > +#define MASK_0080_AVX2 _mm256_set1_epi16(0x0080) > +#define MASK_00FF_AVX2 _mm256_set1_epi16(0x00ff) > +#define MASK_0101_AVX2 _mm256_set1_epi16(0x0101) > + > +static force_inline __m256i Trailing whitespace. There's a lot throughout this patch. I'm not going to point them out individually. > +load_256_aligned (__m256i* src) > +{ > +return _mm256_load_si256(src); > +} > + > +static force_inline void > +negate_2x256 (__m256i data_lo, > + __m256i data_hi, > + __m256i* neg_lo, > + __m256i* neg_hi) > +{ > +*neg_lo = _mm256_xor_si256 (data_lo, MASK_00FF_AVX2); > +*neg_hi = _mm256_xor_si256 (data_hi, MASK_00FF_AVX2); > +} > + > +static force_inline __m256i > +pack_2x256_256 (__m256i lo, __m256i hi) > +{ > +return _mm256_packus_epi16 (lo, hi); > +} > + > +static force_inline void > +pix_multiply_2x256 (__m256i* data_lo, > + __m256i* data_hi, > + __m256i* alpha_lo, > + __m256i* alpha_hi, > + __m256i* ret_lo, > + __m256i* ret_hi) > +{ > +__m256i lo, hi; > + > +lo = _mm256_mullo_epi16 (*data_lo, *alpha_lo); > +hi = _mm256_mullo_epi16 (*data_hi, *alpha_hi); > +lo = _mm256_adds_epu16 (lo, MASK_0080_AVX2); > +hi = _mm256_adds_epu16 (hi, MASK_0080_AVX2); > +*ret_lo = _mm256_mulhi_epu16 (lo, MASK_0101_AVX2); > +*ret_hi = _mm256_mulhi_epu16 (hi, MASK_0101_AVX2); > +} > + > +static force_inline void > +over_2x256 (__m256i* src_lo, > + __m256i* src_hi, > + __m256i* alpha_lo, > + __m256i* alpha_hi, > + __m256i* dst_lo, > + __m256i* dst_hi) > +{ > +__m256i t1, t2; > + > +negate_2x256 (*alpha_lo, *alpha_hi, , ); > + > +pix_multiply_2x256 (dst_lo, dst_hi, , , dst_lo, dst_hi); > + > +*dst_lo = _mm256_adds_epu8 (*src_lo, *dst_lo); > +*dst_hi = _mm256_adds_epu8 (*src_hi, *dst_hi); > +} > + > +static force_inline void > +expand_alpha_2x256 (__m256i data_lo, > + __m256i data_hi, > + __m256i* alpha_lo, > + __m256i* alpha_hi) > +{ > +__m256i lo, hi; > + > +lo = _mm256_shufflelo_epi16 (data_lo, _MM_SHUFFLE (3, 3, 3, 3)); > +hi = _mm256_shufflelo_epi16 (data_hi, _MM_SHUFFLE (3, 3, 3, 3)); > + > +*alpha_lo = _mm256_shufflehi_epi16 (lo, _MM_SHUFFLE (3, 3, 3, 3)); > +*alpha_hi = _mm256_shufflehi_epi16 (hi, _MM_SHUFFLE (3, 3, 3, 3)); > +} > + > +static force_inline void > +unpack_256_2x256 (__m256i data, __m256i* data_lo, __m256i* data_hi) > +{ > +*data_lo = _mm256_unpacklo_epi8 (data, _mm256_setzero_si256 ()); > +*data_hi = _mm256_unpackhi_epi8 (data, _mm256_setzero_si256 ()); > +} > + > +/* save 4 pixels on a 16-byte boundary aligned address */ > +static force_inline void > +save_256_aligned (__m256i* dst, > + __m256i data) > +{ > +_mm256_store_si256 (dst, data); > +} > + > +static force_inline int > +is_opaque_256 (__m256i x) > +{ > +__m256i ffs = _mm256_cmpeq_epi8 (x, x); > + > +return (_mm256_movemask_epi8 > + (_mm256_cmpeq_epi8 (x, ffs)) & 0x) == 0x; > +} > + > +static force_inline int > +is_zero_256 (__m256i x) > +{ > +
Re: [Pixman] [PATCH] Adding infrastructure to permit future AVX2 implementations
Thank you for the patches! Some comments inline. On Wed, Aug 22, 2018 at 10:03 AM raghuveer devulapalli wrote: > > --- > configure.ac| 44 > pixman/Makefile.am | 12 > pixman/pixman-avx2.c| 32 > pixman/pixman-private.h | 5 + > pixman/pixman-x86.c | 15 +-- > 5 files changed, 106 insertions(+), 2 deletions(-) > create mode 100644 pixman/pixman-avx2.c > > diff --git a/configure.ac b/configure.ac > index e833e45..27f4305 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -503,6 +503,48 @@ fi > AM_CONDITIONAL(USE_SSSE3, test $have_ssse3_intrinsics = yes) > > dnl > === > +dnl Check for AVX2 Trailing whitespace > + > +if test "x$AVX2_CFLAGS" = "x" ; then > +AVX2_CFLAGS="-mavx2 -Winline" > +fi > + > +have_avx2_intrinsics=no > +AC_MSG_CHECKING(whether to use AVX2 intrinsics) > +xserver_save_CFLAGS=$CFLAGS > +CFLAGS="$AVX2_CFLAGS $CFLAGS" > + > +AC_COMPILE_IFELSE([AC_LANG_SOURCE([[ > +#include > +int param; > +int main () { > +__m256i a = _mm256_set1_epi32 (param), b = _mm256_set1_epi32 (param + > 1), c; > +c = _mm256_maddubs_epi16 (a, b); > +return _mm256_cvtsi256_si32(c); > +}]])], have_avx2_intrinsics=yes) > +CFLAGS=$xserver_save_CFLAGS > + > +AC_ARG_ENABLE(avx2, > + [AC_HELP_STRING([--disable-avx2], > + [disable AVX2 fast paths])], > + [enable_avx2=$enableval], [enable_avx2=auto]) > + > +if test $enable_avx2 = no ; then > + have_avx2_intrinsics=disabled > +fi > + > +if test $have_avx2_intrinsics = yes ; then > + AC_DEFINE(USE_AVX2, 1, [use AVX2 compiler intrinsics]) > +fi > + > +AC_MSG_RESULT($have_avx2_intrinsics) > +if test $enable_avx2 = yes && test $have_avx2_intrinsics = no ; then > + AC_MSG_ERROR([AVX2 intrinsics not detected]) > +fi > + > +AM_CONDITIONAL(USE_AVX2, test $have_avx2_intrinsics = yes) > + > +dnl > === > dnl Other special flags needed when building code using MMX or SSE > instructions > case $host_os in > solaris*) > @@ -538,6 +580,8 @@ AC_SUBST(MMX_LDFLAGS) > AC_SUBST(SSE2_CFLAGS) > AC_SUBST(SSE2_LDFLAGS) > AC_SUBST(SSSE3_CFLAGS) > +AC_SUBST(AVX2_CFLAGS) > +AC_SUBST(AVX2_LDFLAGS) > > dnl > === > dnl Check for VMX/Altivec > diff --git a/pixman/Makefile.am b/pixman/Makefile.am > index 581b6f6..7204621 100644 > --- a/pixman/Makefile.am > +++ b/pixman/Makefile.am > @@ -64,6 +64,18 @@ libpixman_1_la_LIBADD += libpixman-ssse3.la > ASM_CFLAGS_ssse3=$(SSSE3_CFLAGS) > endif > > +# avx2 code > +if USE_AVX2 > +noinst_LTLIBRARIES += libpixman-avx2.la > +libpixman_avx2_la_SOURCES = \ > + pixman-avx2.c > +libpixman_avx2_la_CFLAGS = $(AVX2_CFLAGS) > +libpixman_1_la_LDFLAGS += $(AVX2_LDFLAGS) > +libpixman_1_la_LIBADD += libpixman-avx2.la > + > +ASM_CFLAGS_avx2=$(AVX2_CFLAGS) > +endif > + > # arm simd code > if USE_ARM_SIMD > noinst_LTLIBRARIES += libpixman-arm-simd.la > diff --git a/pixman/pixman-avx2.c b/pixman/pixman-avx2.c > new file mode 100644 > index 000..d860d67 > --- /dev/null > +++ b/pixman/pixman-avx2.c > @@ -0,0 +1,32 @@ > +#ifdef HAVE_CONFIG_H > +#include > +#endif > + > +#include /* for AVX2 intrinsics */ > +#include "pixman-private.h" > +#include "pixman-combine32.h" > +#include "pixman-inlines.h" > + > +static const pixman_fast_path_t avx2_fast_paths[] = > +{ > +{ PIXMAN_OP_NONE }, > +}; > + > +static const pixman_iter_info_t avx2_iters[] = Trailing whitespace > +{ > +{ PIXMAN_null }, > +}; > + > +#if defined(__GNUC__) && !defined(__x86_64__) && !defined(__amd64__) > +__attribute__((__force_align_arg_pointer__)) > +#endif > +pixman_implementation_t * > +_pixman_implementation_create_avx2 (pixman_implementation_t *fallback) > +{ > +pixman_implementation_t *imp = _pixman_implementation_create (fallback, > avx2_fast_paths); > + > +/* Set up function pointers */ > +imp->iter_info = avx2_iters; > + > +return imp; > +} > diff --git a/pixman/pixman-private.h b/pixman/pixman-private.h > index 73a5414..b6b15df 100644 > --- a/pixman/pixman-private.h > +++ b/pixman/pixman-private.h > @@ -597,6 +597,11 @@ pixman_implementation_t * > _pixman_implementation_create_ssse3 (pixman_implementation_t *fallback); > #endif > > +#ifdef USE_AVX2 > +pixman_implementation_t * > +_pixman_implementation_create_avx2 (pixman_implementation_t *fallback); > +#endif > + > #ifdef USE_ARM_SIMD > pixman_implementation_t * > _pixman_implementation_create_arm_simd (pixman_implementation_t *fallback); > diff --git a/pixman/pixman-x86.c b/pixman/pixman-x86.c > index 05297c4..687c83b 100644 > --- a/pixman/pixman-x86.c > +++ b/pixman/pixman-x86.c At the top of this file there is a preprocessor check: #if defined(USE_X86_MMX) || defined (USE_SSE2) || defined
Re: [Pixman] [Patch 1/1] Clang compile failure due to use of __builtin_shuffle
On Tue, Aug 7, 2018 at 2:50 AM StormByte wrote: > > While playing with Clang and compiling a Gentoo system with it, I realized > that pixman is not compiling because of the use of __builtin_shuffle which > according to LLVM mailing list, should not be used directly [1]. > > As such, I investigated a bit, and made a patch for making it compile > compatible with Clang that I attach here in the hope that it is reviewed. > Thanks, > David C. Manuelda > [1]: http://lists.llvm.org/pipermail/cfe-dev/2017-August/055142.html Thanks. This has already been reported as https://bugs.gentoo.org/646360 and I committed a patch two months ago to fix it -- see https://gitlab.freedesktop.org/pixman/pixman/commit/bd2b49185b28c5024597a5e530af9fc25de3193a The next version of pixman will include the patch. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Pushing unreviewed patches to the pixman git repository
On Tue, Jun 5, 2018 at 6:06 PM, Siarhei Siamashka wrote: > Hello, > > I noticed that some people with commit access started pushing patches > to the pixman git repository without giving the pixman mailing list > subscribers any reasonable chance to review them: > > https://cgit.freedesktop.org/pixman/commit/?id=8b95e0e460baa499e54c19d29bf761d34c25badc > https://cgit.freedesktop.org/pixman/commit/?id=bd2b49185b28c5024597a5e530af9fc25de3193a > > Yes, these fixes were trivial. But still it would be more polite to > actually post patches to the mailing list, collect some reviews and > then *wait* at least severaldays before pushing them to the repository > (unless the issue is really urgent). Not everyone constantly monitors > the mailing list and is able to provide an instant response. I hope you don't consider those two patches to be similar cases. One was committed without going to the mailing list by someone with one patch in pixman every 5 years. The other was was sent to the mailing list by a person with plenty of pixman contributions and reviewed by two people. In Mesa we wait 24 hours, for the reasons you describe. Looks like it was close to 24 hours in this case. I'm happy to wait more than 24 hours in the future -- that's no problem. I'm just taking issue with the suggestion that the two cited examples are somehow the same. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] test: Adjust for clang's removal of __builtin_shuffle
On Mon, Jun 4, 2018 at 10:37 AM, Adam Jackson wrote: > On Mon, 2018-06-04 at 10:04 -0700, Matt Turner wrote: > >> #ifdef HAVE_GCC_VECTOR_EXTENSIONS >> -const uint8x16 bswap_shufflemask = >> +# if __has_builtin(__builtin_shufflevector) >> +randdata.vb = >> +__builtin_shufflevector (randdata.vb, randdata.vb, >> + 3, 2, 1, 0, 7, 6 , 5, 4, >> + 11, 10, 9, 8, 15, 14, 13, 12); >> +# else >> +static const uint8x16 bswap_shufflemask = > ^^^ > > Seems superfluous, though I guess it doesn't change semantics. With or > without that bit: Oh, I think I added that when I was trying to consolidate the constants between the two paths. I'll remove that. > Reviewed-by: Adam Jackson > > I think we're starting to be well overdue for an 0.36 release, but I'd > like to take the opportunity to suggest moving to fdo's gitlab as we do > that. I already have a copy imported personally and have CI working: > > https://gitlab.freedesktop.org/ajax/pixman/-/jobs/986 Agreed. I would like to make 0.36 pass the test suite with clang, so if you have any time or interest I'd appreciate a second set of eyes. I'll filed https://bugs.freedesktop.org/show_bug.cgi?id=106818 so we can track it. I guess it's possible it's a clang bug. I also need to take some time to look into the Loongson3 patch. If you're not in a particular hurry, it would be nice to get that in. ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH] test: Adjust for clang's removal of __builtin_shuffle
From: Vladimir Smirnov __builtin_shuffle was removed in clang 5.0. Build log says: test/utils-prng.c:207:27: error: use of unknown builtin '__builtin_shuffle' [-Wimplicit-function-declaration] randdata.vb = __builtin_shuffle (randdata.vb, bswap_shufflemask); ^ test/utils-prng.c:207:25: error: assigning to 'uint8x16' (vector of 16 'uint8_t' values) from incompatible type 'int' randdata.vb = __builtin_shuffle (randdata.vb, bswap_shufflemask); ^ ~~ 2 errors generated Link to original discussion: http://lists.llvm.org/pipermail/cfe-dev/2017-August/055140.html It's possible to build pixman if attached patch is applied. Basically patch adds check for __builtin_shuffle support and in case there is none, falls back to clang-specific __builtin_shufflevector that do the same but have different API. Bugzilla: https://bugs.gentoo.org/646360 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104886 Tested-by: Philip Chimento Reviewed-by: Matt Turner --- I turned https://bugs.freedesktop.org/show_bug.cgi?id=104886#c2 into a Tested-by tag for Philip. I also reversed the order of the preprocessor conditions in order to simplify it a bit (the !defined(__clang__) looked like a problem waiting to happen). Unfortunately combiner-test, gradient-crash-test, and stress-test fail when built with clang for unrelated reasons. test/utils-prng.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/test/utils-prng.c b/test/utils-prng.c index c27b5be..0cf53dd 100644 --- a/test/utils-prng.c +++ b/test/utils-prng.c @@ -199,12 +199,25 @@ randmemset_internal (prng_t *prng, } else { + +#ifndef __has_builtin +#define __has_builtin(x) 0 +#endif + #ifdef HAVE_GCC_VECTOR_EXTENSIONS -const uint8x16 bswap_shufflemask = +# if __has_builtin(__builtin_shufflevector) +randdata.vb = +__builtin_shufflevector (randdata.vb, randdata.vb, + 3, 2, 1, 0, 7, 6 , 5, 4, + 11, 10, 9, 8, 15, 14, 13, 12); +# else +static const uint8x16 bswap_shufflemask = { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 }; randdata.vb = __builtin_shuffle (randdata.vb, bswap_shufflemask); +# endif + store_rand_128_data (buf, , aligned); buf += 16; #else -- 2.16.1 ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] vmx: Fix vector loads on ppc64le
Tested-by: Matt Turner <matts...@gmail.com> ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Pixman not building on MacOS X 10.11
On Sun, Oct 11, 2015 at 10:34 AM, Andrea Cancianiwrote: > On Sun, Oct 11, 2015 at 5:30 AM, Siarhei Siamashka > wrote: >> >> On Sun, 11 Oct 2015 04:53:08 +0300 >> Siarhei Siamashka wrote: >> >> > On Sat, 10 Oct 2015 16:03:53 -0700 >> > Jeremy Huddleston Sequoia wrote: >> > >> > > > On Oct 10, 2015, at 13:48, Andrea Canciani >> > > > wrote: >> > > > The attached hack gets the code to compile on modern clang, but I >> > > > believe first of all we should improve the configure.ac detection >> > > > code >> > > > so that pixman can actually build both on old and on new clang >> > > > versions (possibly with mmx disabled, if the asm constraints we need >> > > > are not implemented). >> > >> > This workaround looks reasonable to me. We should probably just drop >> > the whole "ifdef __OPTIMIZE__" part in >> > >> > http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n92 >> > >> > I don't quite like the fact that this way of returning results from >> > a macro is a GNU C specific extension. But as you said, the configure >> > test can be updated to better match the code and also check if the >> > compiler supports this particular construct. >> > >> > Could you please submit the final variant of your patch in a >> > "git format-patch" format with a commit message and your >> > Signed-off-by tag? >> >> After looking at this issue a bit more, I realized that we are >> about to add a second layer of workarounds on top of the existing >> old workarounds :-) > > > The attached patch should fix the issue with only minor changes. > It keeps the workarounds :( but somewhat it simplifies them :) > I followed your suggestion of checking block expressions. > Given that the _mm_shuffle_pi16() function is always used in a "return" > statement, if needed we could avoid the usage of block expressions by > defining a macro "_return_mm_shuffle_pi16()" (which would return the result > of the operation instead of making it available as an expression) both for > the xmmintrin branch and for the hand-coded one. > >> The original problem is that certain compilers (just GCC?) did not >> support some intrinsics when compiling MMX code (_mm_movemask_pi8, >> _mm_mulhi_pu16, _mm_shuffle_pi16) and we got the following code: >> >> http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n66 >> >> In fact, these instructions were not available as part of the original >> MMX, but only got introduced later with AMD Extended 3DNow! and Intel >> SSE1. This is mentioned in the commit messages: >> >> http://cgit.freedesktop.org/pixman/commit/?id=84221f4c1687b8ea14e9cbdc78b2ba7258e62c9e >> >> http://cgit.freedesktop.org/pixman/commit/?id=14208344964f341a7b4a704b05cf4804c23792e9 >> >> These extra instructions are unofficially known as MMX2. But GCC does >> not have a separate option for "-mmmx2". Instead the GCC manual says >> that these intrinsics are available when either "-msse" or a >> combination of "-m3dnow -march=athlon" is used: >> >> https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/x86-Built-in-Functions.html#x86-Built-in-Functions >> >> >> Now I wonder if the comment "We have to compile with -msse to use >> xmmintrin.h" is still valid. I tried to tweak the following ifdef to >> use the part of code, which includes and the it compiled >> fine for me with CFLAGS="-O2 -m32" using recent versions of GCC and >> Clang: >> >> http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n63 >> >> I believe that this might be somehow related to the new __ALL_ISA__ >> define, which had been mentioned in 2013: >> https://gcc.gnu.org/ml/gcc-patches/2013-04/txts5M0c0uU9y.txt >> >> So what about just dropping this ugly stuff and adding a configure >> check, which would verify if the MMX code can include ? > > > I would love getting rid of the workarounds, but I'm somewhat worried about > the possibility of regressions. > If you believe is a valid option, we might definitely try to pursue it. > > What is the best way forward? I've now reverted my commit and pushed yours. Thanks. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.
On Sun, Oct 25, 2015 at 1:13 PM, Matt Turner <matts...@gmail.com> wrote: > On Sun, Oct 11, 2015 at 8:59 PM, Matt Turner <matts...@gmail.com> wrote: >> We had lots of hacks to handle the inability to include xmmintrin.h >> without compiling with -msse (lest SSE instructions be used in >> pixman-mmx.c). Some recent version of gcc relaxed this restriction. >> >> Change configure.ac to test that xmmintrin.h can be included and that we >> can use some intrinsics from it, and remove the work-around code from >> pixman-mmx.c. >> >> Evidently allows gcc 4.9.3 to optimize better as well: >> >>textdata bss dec hex filename >> 657078 30848 680 688606 a81de libpixman-1.so.0.33.3 before >> 656710 30848 680 688238 a806e libpixman-1.so.0.33.3 after >> >> Signed-off-by: Matt Turner <matts...@gmail.com> >> --- > > Ugh. This is apparently not sufficient... > > https://bugs.gentoo.org/show_bug.cgi?id=564024 > > GCC allows you to *include* xmmintrin.h without enabling SSE, but it > still doesn't allow you to use any of the functions: > > conftest.c: In function ‘main’: > /usr/lib/gcc/x86_64-pc-linux-gnu/5.1.0/include/xmmintrin.h:1124:1: > error: inlining failed in call to always_inline ‘_mm_mulhi_pu16’: > target specific option mismatch > _mm_mulhi_pu16 (__m64 __A, __m64 __B) > ^ > conftest.c:12:7: error: called from here > w = _mm_mulhi_pu16(w, w); > > I'm not sure what to do except to revert. > > The MMX but no SSE case is important, at least it was in the past > because of OLPC's XO-1. > > Suggestions besides reverting this? I've now reverted this commit and committed Andrea's fix for clang. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Pixman not building on MacOS X 10.11
On Wed, Nov 18, 2015 at 8:35 PM, Siarhei Siamashka <siarhei.siamas...@gmail.com> wrote: > On Wed, 18 Nov 2015 14:22:09 -0800 > Matt Turner <matts...@gmail.com> wrote: > >> On Sun, Oct 11, 2015 at 10:34 AM, Andrea Canciani <ranm...@gmail.com> wrote: >> > On Sun, Oct 11, 2015 at 5:30 AM, Siarhei Siamashka >> > <siarhei.siamas...@gmail.com> wrote: >> >> >> >> On Sun, 11 Oct 2015 04:53:08 +0300 >> >> Siarhei Siamashka <siarhei.siamas...@gmail.com> wrote: >> >> >> >> > On Sat, 10 Oct 2015 16:03:53 -0700 >> >> > Jeremy Huddleston Sequoia <jerem...@freedesktop.org> wrote: >> >> > >> >> > > > On Oct 10, 2015, at 13:48, Andrea Canciani <ranm...@gmail.com> >> >> > > > wrote: >> >> > > > The attached hack gets the code to compile on modern clang, but I >> >> > > > believe first of all we should improve the configure.ac detection >> >> > > > code >> >> > > > so that pixman can actually build both on old and on new clang >> >> > > > versions (possibly with mmx disabled, if the asm constraints we need >> >> > > > are not implemented). >> >> > >> >> > This workaround looks reasonable to me. We should probably just drop >> >> > the whole "ifdef __OPTIMIZE__" part in >> >> > >> >> > http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n92 >> >> > >> >> > I don't quite like the fact that this way of returning results from >> >> > a macro is a GNU C specific extension. But as you said, the configure >> >> > test can be updated to better match the code and also check if the >> >> > compiler supports this particular construct. >> >> > >> >> > Could you please submit the final variant of your patch in a >> >> > "git format-patch" format with a commit message and your >> >> > Signed-off-by tag? >> >> >> >> After looking at this issue a bit more, I realized that we are >> >> about to add a second layer of workarounds on top of the existing >> >> old workarounds :-) >> > >> > >> > The attached patch should fix the issue with only minor changes. >> > It keeps the workarounds :( but somewhat it simplifies them :) >> > I followed your suggestion of checking block expressions. >> > Given that the _mm_shuffle_pi16() function is always used in a "return" >> > statement, if needed we could avoid the usage of block expressions by >> > defining a macro "_return_mm_shuffle_pi16()" (which would return the result >> > of the operation instead of making it available as an expression) both for >> > the xmmintrin branch and for the hand-coded one. >> > >> >> The original problem is that certain compilers (just GCC?) did not >> >> support some intrinsics when compiling MMX code (_mm_movemask_pi8, >> >> _mm_mulhi_pu16, _mm_shuffle_pi16) and we got the following code: >> >> >> >> http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n66 >> >> >> >> In fact, these instructions were not available as part of the original >> >> MMX, but only got introduced later with AMD Extended 3DNow! and Intel >> >> SSE1. This is mentioned in the commit messages: >> >> >> >> http://cgit.freedesktop.org/pixman/commit/?id=84221f4c1687b8ea14e9cbdc78b2ba7258e62c9e >> >> >> >> http://cgit.freedesktop.org/pixman/commit/?id=14208344964f341a7b4a704b05cf4804c23792e9 >> >> >> >> These extra instructions are unofficially known as MMX2. But GCC does >> >> not have a separate option for "-mmmx2". Instead the GCC manual says >> >> that these intrinsics are available when either "-msse" or a >> >> combination of "-m3dnow -march=athlon" is used: >> >> >> >> https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/x86-Built-in-Functions.html#x86-Built-in-Functions >> >> >> >> >> >> Now I wonder if the comment "We have to compile with -msse to use >> >> xmmintrin.h" is still valid. I tried to tweak the following ifdef to >> >> use the part of code, which includes and the it compiled >> >> fine for me with CFLAGS="-O2 -m32" using recent versions of GCC and >>
Re: [Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.
On Sun, Oct 25, 2015 at 5:10 PM, Siarhei Siamashka <siarhei.siamas...@gmail.com> wrote: > On Sun, 25 Oct 2015 13:13:09 -0700 > Matt Turner <matts...@gmail.com> wrote: > >> On Sun, Oct 11, 2015 at 8:59 PM, Matt Turner <matts...@gmail.com> wrote: >> > We had lots of hacks to handle the inability to include xmmintrin.h >> > without compiling with -msse (lest SSE instructions be used in >> > pixman-mmx.c). Some recent version of gcc relaxed this restriction. >> > >> > Change configure.ac to test that xmmintrin.h can be included and that we >> > can use some intrinsics from it, and remove the work-around code from >> > pixman-mmx.c. >> > >> > Evidently allows gcc 4.9.3 to optimize better as well: >> > >> >textdata bss dec hex filename >> > 657078 30848 680 688606 a81de libpixman-1.so.0.33.3 before >> > 656710 30848 680 688238 a806e libpixman-1.so.0.33.3 after >> > >> > Signed-off-by: Matt Turner <matts...@gmail.com> >> > --- >> >> Ugh. This is apparently not sufficient... >> >> https://bugs.gentoo.org/show_bug.cgi?id=564024 >> >> GCC allows you to *include* xmmintrin.h without enabling SSE, but it >> still doesn't allow you to use any of the functions: >> >> conftest.c: In function ‘main’: >> /usr/lib/gcc/x86_64-pc-linux-gnu/5.1.0/include/xmmintrin.h:1124:1: >> error: inlining failed in call to always_inline ‘_mm_mulhi_pu16’: >> target specific option mismatch >> _mm_mulhi_pu16 (__m64 __A, __m64 __B) >> ^ >> conftest.c:12:7: error: called from here >> w = _mm_mulhi_pu16(w, w); > > Oh, looks like the restriction used to be relaxed for a while, but then > GCC 4.9 started to be strict again: > https://bugzilla.redhat.com/show_bug.cgi?id=1092991#c1 > >> I'm not sure what to do except to revert. > > The real problem is that GCC does not provide a separate option for > MMX2 (a common subset of 3DNOW and SSE). We usually solve compiler > problems by reporting bugs to compiler developers. This particular > case had not been handled according to the usual rule, and now > we have a nice practical demonstration of the consequences ;-) > > BTW, we can still report a bug to GCC. Better late than never. Yeah, I suppose. The disappointing thing is that Google says an -m3dnowext flag existed at one point... >> The MMX but no SSE case is important, at least it was in the past >> because of OLPC's XO-1. > > I'm not sure how many OLPC XO-1 laptops might be still remaining in > real use in the hands of real people: > http://www.olpcnews.com/about_olpc_news/goodbye_one_laptop_per_child.html > >> Suggestions besides reverting this? > > Because OLPC XO-1 is using the AMD Geode processor, we could probably > treat the code in pixman-mmx.c as 3dnow optimizations on x86 hardware? The problem is that -m3dnow isn't sufficient. The instructions we want to use are a subset of SSE that AMD implemented in the Athlon. We need an -m3dnowext flag. We can't pass -march=athlon in MMX_CFLAGS either, since the user is likely to have specified a -march= value of their own. > Another option is to start using assembly instead of intrinsics. > Unless a miracle happens and somebody decides to pay for this job, > we definitely don't have resources to do a high quality assembly > implementation for MMX/MMX2. But we still can take the assembly > output of GCC and tweak it a bit. This is ugly and not very > maintainable though. Been there, done that with ARMv6. Not interested. > Or we could simply do nothing and finally retire MMX support on x86. > If OLPC XO-1 users still do exist, they can always contact us. I don't care so much about XO-1, but I do want to retain the ability to test the MMX code on x86. iwMMXt/loongson systems are slow, and most development can be done on a fast desktop this way. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.
On Sun, Oct 25, 2015 at 7:12 PM, Søren Sandmannwrote: > On Sun, Oct 25, 2015 at 8:10 PM, Siarhei Siamashka > wrote: > >> >> Or we could simply do nothing and finally retire MMX support on x86. >> If OLPC XO-1 users still do exist, they can always contact us. > > > This is probably the way forward. Except for XO-1, MMX hasn't really done > anything useful on > x86 for a long time, but it has been an endless source of compiler headaches > and maintenance > issues. I agree that it has caused a huge number of compiler headaches. I suppose I'd be okay with disabling it by default, but like I said to Siarhei I would like to keep it working on x86 because that's a much easier way to test and prototype code than using slow iwMMXt/loongson systems. Though, I do fear that if we disable it by default it'll just get close to zero testing. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.
On Sun, Oct 11, 2015 at 8:59 PM, Matt Turner <matts...@gmail.com> wrote: > We had lots of hacks to handle the inability to include xmmintrin.h > without compiling with -msse (lest SSE instructions be used in > pixman-mmx.c). Some recent version of gcc relaxed this restriction. > > Change configure.ac to test that xmmintrin.h can be included and that we > can use some intrinsics from it, and remove the work-around code from > pixman-mmx.c. > > Evidently allows gcc 4.9.3 to optimize better as well: > >textdata bss dec hex filename > 657078 30848 680 688606 a81de libpixman-1.so.0.33.3 before > 656710 30848 680 688238 a806e libpixman-1.so.0.33.3 after > > Signed-off-by: Matt Turner <matts...@gmail.com> > --- Ugh. This is apparently not sufficient... https://bugs.gentoo.org/show_bug.cgi?id=564024 GCC allows you to *include* xmmintrin.h without enabling SSE, but it still doesn't allow you to use any of the functions: conftest.c: In function ‘main’: /usr/lib/gcc/x86_64-pc-linux-gnu/5.1.0/include/xmmintrin.h:1124:1: error: inlining failed in call to always_inline ‘_mm_mulhi_pu16’: target specific option mismatch _mm_mulhi_pu16 (__m64 __A, __m64 __B) ^ conftest.c:12:7: error: called from here w = _mm_mulhi_pu16(w, w); I'm not sure what to do except to revert. The MMX but no SSE case is important, at least it was in the past because of OLPC's XO-1. Suggestions besides reverting this? ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.
We had lots of hacks to handle the inability to include xmmintrin.h without compiling with -msse (lest SSE instructions be used in pixman-mmx.c). Some recent version of gcc relaxed this restriction. Change configure.ac to test that xmmintrin.h can be included and that we can use some intrinsics from it, and remove the work-around code from pixman-mmx.c. Evidently allows gcc to optimize better as well: textdata bss dec hex filename 657078 30848 680 688606 a81de libpixman-1.so.0.33.3 before 656710 30848 680 688238 a806e libpixman-1.so.0.33.3 after --- configure.ac| 15 -- pixman/pixman-mmx.c | 60 + 2 files changed, 5 insertions(+), 70 deletions(-) diff --git a/configure.ac b/configure.ac index 424bfd3..b04cc69 100644 --- a/configure.ac +++ b/configure.ac @@ -347,21 +347,14 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[ #error "Need GCC >= 3.4 for MMX intrinsics" #endif #include +#include int main () { __m64 v = _mm_cvtsi32_si64 (1); __m64 w; -/* Some versions of clang will choke on K */ -asm ("pshufw %2, %1, %0\n\t" -: "=y" (w) -: "y" (v), "K" (5) -); - -/* Some versions of clang will choke on this */ -asm ("pmulhuw %1, %0\n\t" - : "+y" (w) - : "y" (v) -); +/* Test some intrinsics from xmmintrin.h */ +w = _mm_shuffle_pi16(v, 5); +w = _mm_mulhi_pu16(w, w); return _mm_cvtsi64_si32 (v); }]])], have_mmx_intrinsics=yes) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index 05c48a4..6bcdee2 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -39,6 +39,7 @@ #include #else #include +#include #endif #include "pixman-private.h" #include "pixman-combine32.h" @@ -59,65 +60,6 @@ _mm_empty (void) } #endif -#ifdef USE_X86_MMX -# if (defined(__SUNPRO_C) || defined(_MSC_VER) || defined(_WIN64)) -# include -# else -/* We have to compile with -msse to use xmmintrin.h, but that causes SSE - * instructions to be generated that we don't want. Just duplicate the - * functions we want to use. */ -extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_movemask_pi8 (__m64 __A) -{ -int ret; - -asm ("pmovmskb %1, %0\n\t" - : "=r" (ret) - : "y" (__A) -); - -return ret; -} - -extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mulhi_pu16 (__m64 __A, __m64 __B) -{ -asm ("pmulhuw %1, %0\n\t" - : "+y" (__A) - : "y" (__B) -); -return __A; -} - -# ifdef __OPTIMIZE__ -extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_shuffle_pi16 (__m64 __A, int8_t const __N) -{ -__m64 ret; - -asm ("pshufw %2, %1, %0\n\t" - : "=y" (ret) - : "y" (__A), "K" (__N) -); - -return ret; -} -# else -# define _mm_shuffle_pi16(A, N) \ -({ \ - __m64 ret; \ - \ - asm ("pshufw %2, %1, %0\n\t"\ -: "=y" (ret) \ -: "y" (A), "K" ((const int8_t)N) \ - ); \ - \ - ret;\ -}) -# endif -# endif -#endif - #ifndef _MSC_VER #define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \ (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0)) -- 2.4.9 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.
We had lots of hacks to handle the inability to include xmmintrin.h without compiling with -msse (lest SSE instructions be used in pixman-mmx.c). Some recent version of gcc relaxed this restriction. Change configure.ac to test that xmmintrin.h can be included and that we can use some intrinsics from it, and remove the work-around code from pixman-mmx.c. Evidently allows gcc 4.9.3 to optimize better as well: textdata bss dec hex filename 657078 30848 680 688606 a81de libpixman-1.so.0.33.3 before 656710 30848 680 688238 a806e libpixman-1.so.0.33.3 after Signed-off-by: Matt Turner <matts...@gmail.com> --- Looks like _MM_SHUFFLE isn't defined by ARM's mmintrin.h. configure.ac| 15 - pixman/pixman-mmx.c | 64 - 2 files changed, 8 insertions(+), 71 deletions(-) diff --git a/configure.ac b/configure.ac index 424bfd3..b04cc69 100644 --- a/configure.ac +++ b/configure.ac @@ -347,21 +347,14 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[ #error "Need GCC >= 3.4 for MMX intrinsics" #endif #include +#include int main () { __m64 v = _mm_cvtsi32_si64 (1); __m64 w; -/* Some versions of clang will choke on K */ -asm ("pshufw %2, %1, %0\n\t" -: "=y" (w) -: "y" (v), "K" (5) -); - -/* Some versions of clang will choke on this */ -asm ("pmulhuw %1, %0\n\t" - : "+y" (w) - : "y" (v) -); +/* Test some intrinsics from xmmintrin.h */ +w = _mm_shuffle_pi16(v, 5); +w = _mm_mulhi_pu16(w, w); return _mm_cvtsi64_si32 (v); }]])], have_mmx_intrinsics=yes) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index 05c48a4..88c3a39 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -40,6 +40,9 @@ #else #include #endif +#ifdef USE_X86_MMX +#include +#endif #include "pixman-private.h" #include "pixman-combine32.h" #include "pixman-inlines.h" @@ -59,66 +62,7 @@ _mm_empty (void) } #endif -#ifdef USE_X86_MMX -# if (defined(__SUNPRO_C) || defined(_MSC_VER) || defined(_WIN64)) -# include -# else -/* We have to compile with -msse to use xmmintrin.h, but that causes SSE - * instructions to be generated that we don't want. Just duplicate the - * functions we want to use. */ -extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_movemask_pi8 (__m64 __A) -{ -int ret; - -asm ("pmovmskb %1, %0\n\t" - : "=r" (ret) - : "y" (__A) -); - -return ret; -} - -extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mulhi_pu16 (__m64 __A, __m64 __B) -{ -asm ("pmulhuw %1, %0\n\t" - : "+y" (__A) - : "y" (__B) -); -return __A; -} - -# ifdef __OPTIMIZE__ -extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_shuffle_pi16 (__m64 __A, int8_t const __N) -{ -__m64 ret; - -asm ("pshufw %2, %1, %0\n\t" - : "=y" (ret) - : "y" (__A), "K" (__N) -); - -return ret; -} -# else -# define _mm_shuffle_pi16(A, N) \ -({ \ - __m64 ret; \ - \ - asm ("pshufw %2, %1, %0\n\t"\ -: "=y" (ret) \ -: "y" (A), "K" ((const int8_t)N) \ - ); \ - \ - ret;\ -}) -# endif -# endif -#endif - -#ifndef _MSC_VER +#ifndef _MM_SHUFFLE #define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \ (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0)) #endif -- 2.4.9 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.
On Sun, Oct 11, 2015 at 8:41 PM, Siarhei Siamashka <siarhei.siamas...@gmail.com> wrote: > On Sun, 11 Oct 2015 14:55:28 -0700 > Matt Turner <matts...@gmail.com> wrote: > > Hello, > > Thanks. The patch looks good. In fact, it also allows the MMX code to > be compiled with the Intel Compiler now (previously it was disabled by > the configure check). A few minor things need to be fixed though. See > the comments below. > >> We had lots of hacks to handle the inability to include xmmintrin.h >> without compiling with -msse (lest SSE instructions be used in > > "lest" -> "lets" ? Nope, I mean "lest" (means "otherwise something bad would happen") >> pixman-mmx.c). Some recent version of gcc relaxed this restriction. >> >> Change configure.ac to test that xmmintrin.h can be included and that we >> can use some intrinsics from it, and remove the work-around code from >> pixman-mmx.c. >> >> Evidently allows gcc to optimize better as well: >> >>text data bss dec hex filename >> 657078 30848 680 688606 a81de libpixman-1.so.0.33.3 before >> 656710 30848 680 688238 a806e libpixman-1.so.0.33.3 after > > It is always a good idea to mention the exact version of gcc in the > commit message. For example, it could help if somebody happens to be > reading this commit message a few years in the future. Sure, will do. > As for being able to optimize better. Yes, the "asm" blocks are > treated by the compiler as opaque boxes (with just the input/output > interface specified by constraints). The optimizer has difficulties > generating efficient code if it has to deal with these bubbles. So > it is a good idea to use intrinsics instead of single-instruction > "asm" statements. > > Also I'm not completely sure, but now we probably prefer (require?) the > "Signed-off-by" tags in commit messages. Will do. >> --- >> configure.ac| 15 -- >> pixman/pixman-mmx.c | 60 >> + >> 2 files changed, 5 insertions(+), 70 deletions(-) > > Nice stats :-) > >> >> diff --git a/configure.ac b/configure.ac >> index 424bfd3..b04cc69 100644 >> --- a/configure.ac >> +++ b/configure.ac >> @@ -347,21 +347,14 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[ >> #error "Need GCC >= 3.4 for MMX intrinsics" >> #endif >> #include >> +#include > > We still would want to have this under the USE_X86_MMX ifdef check. > Otherwise crosscompiling for ARM fails: > > $ ./configure --host=arm-linux-gnueabihf --disable-libpng --disable-gtk > $ make > > pixman-mmx.c:42:23: fatal error: xmmintrin.h: No such file or directory > #include >^ Heh, can't believe I forgot about that since I added the iwMMXt support. :) >> int main () { >> __m64 v = _mm_cvtsi32_si64 (1); >> __m64 w; >> >> -/* Some versions of clang will choke on K */ >> -asm ("pshufw %2, %1, %0\n\t" >> -: "=y" (w) >> -: "y" (v), "K" (5) >> -); >> - >> -/* Some versions of clang will choke on this */ >> -asm ("pmulhuw %1, %0\n\t" >> - : "+y" (w) >> - : "y" (v) >> -); >> +/* Test some intrinsics from xmmintrin.h */ >> +w = _mm_shuffle_pi16(v, 5); >> +w = _mm_mulhi_pu16(w, w); >> >> return _mm_cvtsi64_si32 (v); >> }]])], have_mmx_intrinsics=yes) >> diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c >> index 05c48a4..6bcdee2 100644 >> --- a/pixman/pixman-mmx.c >> +++ b/pixman/pixman-mmx.c >> @@ -39,6 +39,7 @@ >> #include >> #else >> #include >> +#include >> #endif >> #include "pixman-private.h" >> #include "pixman-combine32.h" >> @@ -59,65 +60,6 @@ _mm_empty (void) >> } >> #endif >> >> -#ifdef USE_X86_MMX >> -# if (defined(__SUNPRO_C) || defined(_MSC_VER) || defined(_WIN64)) >> -# include >> -# else >> -/* We have to compile with -msse to use xmmintrin.h, but that causes SSE >> - * instructions to be generated that we don't want. Just duplicate the >> - * functions we want to use. */ >> -extern __inline int __attribute__((__gnu_inline__, __always_inline__, >> __artificial__)) >> -_mm_movemask_pi8 (__m64 __A) >> -{ >> -int ret; >> - >
Re: [Pixman] [PATCH 1/4] pixman-fast-path: Add over_n_8888 fast path (disabled)
On Thu, Aug 20, 2015 at 6:58 AM, Pekka Paalanen ppaala...@gmail.com wrote: From: Ben Avison bavi...@riscosopen.org This is a C fast path, useful for reference or for platforms that don't have their own fast path for this operation. This new fast path is initially disabled by putting the entries in the lookup table after the sentinel. The compiler cannot tell the new code is not used, so it cannot eliminate the code. Also the lookup table size will include the new fast path. When the follow-up patch then enables the new fast path, the binary layout (alignments, size, etc.) will stay the same compared to the disabled case. Keeping the binary layout identical is important for benchmarking on Raspberry Pi 1. The addresses at which functions are loaded will have a significant impact on benchmark results, causing unexpected performance changes. Keeping all function addresses the same across the patch enabling a new fast path improves the reliability of benchmarks. Benchmark results are included in the patch enabling this fast path. [Pekka: disabled the fast path, commit message] Signed-off-by: Pekka Paalanen pekka.paala...@collabora.co.uk I don't care strongly, but I might just squash 1+2, 3+4 together and make a mention in the commit message of exactly what the benchmark numbers are comparing. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] test: Add cover-test
On Tue, May 26, 2015 at 3:58 PM, Ben Avison bavi...@riscosopen.org wrote: This test aims to verify both numerical correctness and the honouring of array bounds for scaled plots (both nearest-neighbour and bilinear) at or close to the boundary conditions for applicability of cover type fast paths and iter fetch routines. It has a secondary purpose: by setting the env var EXACT (to any value) it will only test plots that are exactly on the boundary condition. This makes it possible to ensure that cover routines are being used to the maximum, although this requires the use of a debugger or code instrumentation to verify. --- Note that this must be pushed after Pekka's fence-image patches. test/Makefile.sources |1 + test/cover-test.c | 376 + 2 files changed, 377 insertions(+), 0 deletions(-) create mode 100644 test/cover-test.c diff --git a/test/Makefile.sources b/test/Makefile.sources index 14a3710..5b901db 100644 --- a/test/Makefile.sources +++ b/test/Makefile.sources @@ -26,6 +26,7 @@ TESTPROGRAMS = \ glyph-test\ solid-test\ stress-test \ + cover-test\ Remember to add cover-test to .gitignore. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Is Pixman being maintained at all?
On Thu, Apr 2, 2015 at 2:26 AM, Pekka Paalanen ppaala...@gmail.com wrote: On Wed, 1 Apr 2015 18:46:10 -0700 Matt Turner matts...@gmail.com wrote: On Mon, Mar 30, 2015 at 10:58 AM, Bill Spitzak spit...@gmail.com wrote: On 03/30/2015 10:25 AM, Matt Turner wrote: Do you just need someone to push them? I'm not capable of reviewing these. Since Søren isn't really maintaining pixman anymore I'm not really sure how to proceed. Is this true? I don't see anyone but Pekka reviewing patches and there hasn't been a release in 15 months, so yeah. I think something needs to be done about this as all new work on X and Cairo is depending on pixman. I mean, sure. I have had an outstanding patch set for 8 months now. Søren responded to an earlier version and I tried to address it but have not heard anything since. This is very frustrating as I would like to work on this but I'm not going to do it if it is useless. As far as I know, Søren isn't working at Redhat any more, so I don't think you can expect him to continue maintaining pixman. Ok. Søren, Matt, Siarhei, how can we get the Pixman maintenance communitized? Maybe a la libdrm, because no-one has the resources to become a dedicated maintainer? Seems fine to me, though I don't really feel like a pixman maintainer. :) What does it take to get push and release authorization, in the political sense that Pixman quality would not degrade and the current/old maintainers would approve? What kind of review policies should be enforced? Søren told me back in December on IRC Feel free to do a release. I'm happy to have people commit to pixman who have a track record of contributions to other X.Org projects. What development guidelines should there be? Should it be strictly no new API/ABI nor features, only performance work and new platform support like the latest new ARM? I'm not aware of any backwards-incompatible changes to pixman, at least in a really long time. Keeping that policy in place seems like a good idea. New APIs do happen. I think that's probably fine. If there is one person contributing arch or cpu-specific optimizations in assembly that no-one is willing to review apart from the scope of code changes and style, should we trust that one person and just land his work if he shows the performance numbers are good? I might be a bit biased in my answer, since I have some patches to the MMX code in my tree that I don't expect anyone to review, but yeah I think we should mostly trust the author (obviously depends on the author's credibility). I mean, I'm a newbie here. I don't want to hijack this project and push it only to my own directions, also because I cannot become a dedicated maintainer, nor promise to review anyone else's stuff. But, there are patches I'd like to see landed. I could work on them with Ben, but if there is no-one upstream to tell us what goes and what doesn't, we are left to our own judgement. Would you trust my and Ben's judgement so that I could land Ben's patches and make Pixman releases? I don't think you're hijacking at all. I think this conversation needed to happen sooner or later, though I do wish Søren or Siarhei could spend a little time on it. You probably don't have a good understanding about how I work and what kind of a developer I am, nor have that kind of trust in me. That is fine. We need time to build that trust through discussion and patches. But it's hard to have a discussion if no-one can reply. I also understand that because I will not promise to be a maintainer, there is less incentive in educating me. It is quite likely that I hang around here for a while and then wander off when my needs are filled. I haven't worked with you, but I'm familiar with your contributions. I'd trust you to commit to pixman. But I don't think I could really educate anyone except in the MMX and SSE2 code. The same goes for everyone, I believe. What could we do to let Pixman go forward? I suppose a project in a similar state would just get forked by some new people, who will then drive it with their own goals. Except here that doesn't work, because the fork would soon fall into the same state as the original project, except the world would just be more fragmented. Couldn't we as well just loosen up on the master branch and let stuff land whenever someone is active and someone else doesn't see anything bad in it? There are always the stable branches, too, for those who want to stick to old and well-tested code. Yes, the software quality will likely degrade somewhat, at least from the old maintainers' perspective. However, the alternative seems to be a completely stalled project. Which one is better? FWIW, distros (well, Raspbian at least) already maintain their own forks, most likely as a single-person project. At upstream we could at least aim for a different person to review a change than the one who wrote it. For distribution
Re: [Pixman] [PATCH 1/5] armv6: Fix typo in preload macro
Pushed. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Is Pixman being maintained at all?
On Mon, Mar 30, 2015 at 10:58 AM, Bill Spitzak spit...@gmail.com wrote: On 03/30/2015 10:25 AM, Matt Turner wrote: Do you just need someone to push them? I'm not capable of reviewing these. Since Søren isn't really maintaining pixman anymore I'm not really sure how to proceed. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman Is this true? I don't see anyone but Pekka reviewing patches and there hasn't been a release in 15 months, so yeah. I think something needs to be done about this as all new work on X and Cairo is depending on pixman. I mean, sure. I have had an outstanding patch set for 8 months now. Søren responded to an earlier version and I tried to address it but have not heard anything since. This is very frustrating as I would like to work on this but I'm not going to do it if it is useless. As far as I know, Søren isn't working at Redhat any more, so I don't think you can expect him to continue maintaining pixman. If nothing is going to change in pixman I think Cairo is going to have to fork it and make a local copy. This is going to remove the ability for Cairo to use X remote rendering (since X will still be using the old pixman), though it is unclear if any serious software is using this mode any more. Sounds ridiculous. Get a Cairo developer to review and commit your pixman changes? I don't know. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/3] armv7: Faster fill operations
On Wed, Mar 4, 2015 at 5:56 PM, Ben Avison bavi...@riscosopen.org wrote: This eliminates a number of branches over blocks of code that are either empty or can be trivially combined with a separate code block at the start and end of each scanline. This has a surprisingly big effect, at least on Cortex-A7, for src_n_8: Before After Mean StdDev Mean StdDev Confidence Change L1 1570.4 133.11639.6 110.7 100.0% +4.4% L2 1042.6 19.9 1086.6 23.4100.0% +4.2% M 1030.8 7.2 1036.8 3.2 100.0% +0.6% HT 287.4 3.5 303.3 2.9 100.0% +5.5% VT 262.0 2.6 263.3 2.6 99.9% +0.5% R 206.5 2.4 209.9 2.4 100.0% +1.7% RT 56.5 1.0 59.2 0.5 100.0% +4.7% --- What do you use to generate this? I'd certainly like to use it. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Unable to build master on Raspberry PI
On Wed, Dec 3, 2014 at 9:18 AM, Andrea Giammarchi andrea.giammar...@gmail.com wrote: Thank you very much Siarhei, I am still building something huge and had no way to double check but at least I can confirm the gcc is 4.9.2. I will try to --disable-arm-iwmmxt when it shows arm6l as uname -m and let you know if that fixed. Do you think it should be enabled in the future or it's needed to let pixman properly work? iwMMXt is a SIMD instruction set that the Raspberry Pi's CPU doesn't support, so it's not useful for your use case. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 0/2] mmx nearest scaling paths
On Tue, Sep 23, 2014 at 12:24 PM, Søren Sandmann soren.sandm...@gmail.com wrote: IIRC, we have already discussed it before. Maybe we should just disable MMX support for x86 and use it only for MIPS Loongson and ARM IWMMXT? I don't really see the benefit. The bugs we've had have all been trivially fixed. I'm concerned that if we disable the MMX code on x86 that over time we might not notice a bug and it'll become harder to debug. But I suppose you had to disable SSE2 to find those bugs anyway.. I'd be in favor of that. For a long time the only real use case for MMX/x86 has been the XO 1 laptops, and I really doubt that they are getting updated pixman libraries any more. Søren Cc'ing Daniel Drake, who should know. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH 1/2] mmx: Add nearest over_8888_n_8888
lowlevel-blt-bench -n, over__n_, 15 iterations on Loongson 2f: Before After Mean StdDev Mean StdDev Change L1 9.7 0.01 19.2 0.02 +98.2% L2 9.6 0.11 19.2 0.16 +99.5% M 7.3 0.02 12.5 0.01 +72.0% HT 6.6 0.01 13.4 0.02 +103.2% VT 6.4 0.01 12.6 0.03 +96.1% R 6.3 0.01 11.2 0.01 +76.5% RT 4.4 0.01 8.1 0.03 +82.6% --- pixman/pixman-mmx.c | 62 + 1 file changed, 62 insertions(+) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index f9a92ce..63f4cdf 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -3555,6 +3555,59 @@ mmx_composite_over_reverse_n_ (pixman_implementation_t *imp, _mm_empty (); } +static force_inline void +scaled_nearest_scanline_mmx__n__OVER (const uint32_t * mask, + uint32_t * dst, + const uint32_t * src, + int32_t w, + pixman_fixed_t vx, + pixman_fixed_t unit_x, + pixman_fixed_t src_width_fixed, + pixman_bool_tzero_src) +{ +__m64 mm_mask; + +if (zero_src || (*mask 24) == 0) + return; + +mm_mask = expand_alpha (load (mask)); + +while (w) +{ + uint32_t s = *(src + pixman_fixed_to_int (vx)); + vx += unit_x; + while (vx = 0) + vx -= src_width_fixed; + + if (s) + { + __m64 ms = load (s); + __m64 alpha = expand_alpha (ms); + __m64 dest = load (dst); + + store (dst, (in_over (ms, alpha, mm_mask, dest))); + } + + dst++; + w--; +} + +_mm_empty (); +} + +FAST_NEAREST_MAINLOOP_COMMON (mmx__n__cover_OVER, + scaled_nearest_scanline_mmx__n__OVER, + uint32_t, uint32_t, uint32_t, COVER, TRUE, TRUE) +FAST_NEAREST_MAINLOOP_COMMON (mmx__n__pad_OVER, + scaled_nearest_scanline_mmx__n__OVER, + uint32_t, uint32_t, uint32_t, PAD, TRUE, TRUE) +FAST_NEAREST_MAINLOOP_COMMON (mmx__n__none_OVER, + scaled_nearest_scanline_mmx__n__OVER, + uint32_t, uint32_t, uint32_t, NONE, TRUE, TRUE) +FAST_NEAREST_MAINLOOP_COMMON (mmx__n__normal_OVER, + scaled_nearest_scanline_mmx__n__OVER, + uint32_t, uint32_t, uint32_t, NORMAL, TRUE, TRUE) + #define BSHIFT ((1 BILINEAR_INTERPOLATION_BITS)) #define BMSK (BSHIFT - 1) @@ -3995,6 +4048,15 @@ static const pixman_fast_path_t mmx_fast_paths[] = PIXMAN_STD_FAST_PATH(IN, a8, null, a8, mmx_composite_in_8_8 ), PIXMAN_STD_FAST_PATH(IN, solid,a8, a8, mmx_composite_in_n_8_8), +SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, mmx__n_ ), +SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, mmx__n_ ), +SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8r8g8b8, x8r8g8b8, mmx__n_ ), +SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8b8g8r8, x8b8g8r8, mmx__n_ ), +SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8r8g8b8, a8r8g8b8, mmx__n_ ), +SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8b8g8r8, a8b8g8r8, mmx__n_ ), +SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8r8g8b8, x8r8g8b8, mmx__n_ ), +SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8b8g8r8, x8b8g8r8, mmx__n_ ), + SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8, a8r8g8b8, mmx__ ), SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8, x8r8g8b8, mmx__ ), SIMPLE_BILINEAR_FAST_PATH (SRC, x8r8g8b8, x8r8g8b8, mmx__ ), -- 1.8.5.5 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH 0/2] mmx nearest scaling paths
Here are a couple of nearest scaling MMX paths I wrote a long time ago for Loongson and other things using the MMX code. I've got a few more patches for the MMX code that I'll send out as I benchmark them. I don't really expect any reviews, so barring objections I'll plan to commit them in a few days. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] Test suite failures on 32-bit x86?
Building the 0.32.2 release and from git with CC=gcc -m32 ./autogen.sh make check PASS: prng-test PASS: a1-trap-test PASS: region-translate-test PASS: pdf-op-test PASS: region-test PASS: fetch-test ../test-driver: line 95: 3312 Segmentation fault $@ $log_file 21 FAIL: rotate-test PASS: oob-test PASS: infinite-loop PASS: combiner-test PASS: pixel-test PASS: trap-crasher PASS: alpha-loop PASS: thread-test PASS: scaling-helpers-test PASS: scaling-crash-test ../test-driver: line 95: 3571 Segmentation fault $@ $log_file 21 FAIL: matrix-test PASS: gradient-crash-test ../test-driver: line 95: 3637 Segmentation fault $@ $log_file 21 FAIL: blitters-test ../test-driver: line 95: 3659 Segmentation fault $@ $log_file 21 FAIL: glyph-test ../test-driver: line 95: 3681 Segmentation fault $@ $log_file 21 FAIL: scaling-test ../test-driver: line 95: 3703 Segmentation fault $@ $log_file 21 FAIL: affine-test PASS: alphamap PASS: composite-traps-test PASS: region-contains-test PASS: stress-test PASS: composite Manually running the tests shows that they all crash in prng_rand_128_r (utils-prng.h:138): uint32x4 e = x-a - ((x-b 27) + (x-b (32 - 27))); which is code inside an #ifdef GCC_VECTOR_EXTENSIONS_SUPPORTED block. I realize this may be a gcc bug, so I tested with 4.8.1 and 4.7.2 and got the same results. Testing with 4.6.3 leads to only a single failure, in matrix-test (with a different backtrace, so probably different). Do we need some kind of configure check to make sure that our use of gcc's vector extensions is actually going to work? ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Latest GIT source for 'pixman-sse2.c'
On Sun, Oct 6, 2013 at 1:50 AM, John Emmas john...@tiscali.co.uk wrote: On 05/10/2013 19:32, John Emmas wrote: On 5 Oct 2013, at 19:00, Siarhei Siamashka wrote: Andrea Canciani has already investigated the problem and submitted the fixes here: http://lists.freedesktop.org/archives/pixman/2013-September/002954.html Many thanks for the super fast response guys. I'm at a different PC now but I'll apply that patch tomorrow. I applied that patch this morning and sure enough, it does fix the problem. Thanks to Andrea for noticing it. BTW... while reading the patch I noticed that, quite by accident, the source file 'pixman-mmx.c' had somehow gotten excluded from my MSVC build project, so I took the opportunity to add it. Although the build still succeeds, I see several warnings of this form while building 'pixman-mmx.c':- pixman-mmx.c(586) : warning C4799: function 'whatever' has no EMMS instruction I don't know if that means anything bad but I thought it wouldn't do any harm flag it up. Here's a list of the affected functions:- function 'expand_4xpacked565' has no EMMS instruction function 'is_opaque' has no EMMS instruction function 'is_equal' has no EMMS instruction function 'to_uint64' has no EMMS instruction function 'expand_4x565' has no EMMS instruction function 'is_zero' has no EMMS instruction function 'store' has no EMMS instruction All of these are all inline functions, so _mm_empty() isn't required. function 'fast_composite_scaled_bilinear_mmx__8__none_OVER' has no EMMS instruction function 'fast_composite_scaled_bilinear_mmx__8__pad_OVER' has no EMMS instruction This has _mm_empty(). ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 1/2] Add empty SSSE3 implementation
On Thu, Aug 29, 2013 at 10:02 AM, Søren Sandmann Pedersen sandm...@cs.au.dk wrote: This commit adds a new, empty SSSE3 implementation and the associated build system support. configure.ac: detect whether the compiler understands SSSE3 intrinsics and set up the required CFLAGS Makefile.am:Add libpixman-ssse3.la pixman-x86.c: Add X86_SSSE3 feature flag and detect it in detect_cpu_features(). pixman-ssse3.c: New file with an empty SSSE3 implementation --- configure.ac| 46 +++ pixman/Makefile.am | 12 +++ pixman/pixman-private.h |5 pixman/pixman-ssse3.c | 50 +++ pixman/pixman-x86.c | 15 - 5 files changed, 126 insertions(+), 2 deletions(-) create mode 100644 pixman/pixman-ssse3.c diff --git a/configure.ac b/configure.ac index 5b9512c..ff97bfb 100644 --- a/configure.ac +++ b/configure.ac @@ -437,6 +437,50 @@ fi AM_CONDITIONAL(USE_SSE2, test $have_sse2_intrinsics = yes) dnl === +dnl Check for SSSE3 + +if test x$SSSE3_CFLAGS = x ; then +SSSE3_CFLAGS=-mssse3 -Winline +fi + +have_ssse3_intrinsics=no +AC_MSG_CHECKING(whether to use SSSE3 intrinsics) +xserver_save_CFLAGS=$CFLAGS +CFLAGS=$SSSE3_CFLAGS $CFLAGS + +AC_COMPILE_IFELSE([AC_LANG_SOURCE([[ +#include mmintrin.h +#include xmmintrin.h +#include emmintrin.h +#include tmmintrin.h +int main () { +__m128i a = _mm_set1_epi32 (0), b = _mm_set1_epi32 (0), c; +c = _mm_maddubs_epi16 (a, b); +return 0; +}]])], have_ssse3_intrinsics=yes) +CFLAGS=$xserver_save_CFLAGS + +AC_ARG_ENABLE(ssse3, + [AC_HELP_STRING([--disable-ssse3], + [disable SSSE3 fast paths])], + [enable_ssse3=$enableval], [enable_ssse3=auto]) + +if test $enable_ssse3 = no ; then + have_ssse3_intrinsics=disabled +fi + +if test $have_ssse3_intrinsics = yes ; then + AC_DEFINE(USE_SSSE3, 1, [use SSSE3 compiler intrinsics]) +fi + +AC_MSG_RESULT($have_ssse3_intrinsics) +if test $enable_ssse3 = yes test $have_ssse3_intrinsics = no ; then + AC_MSG_ERROR([SSSE3 intrinsics not detected]) +fi + +AM_CONDITIONAL(USE_SSSE3, test $have_ssse3_intrinsics = yes) + +dnl === dnl Other special flags needed when building code using MMX or SSE instructions case $host_os in solaris*) @@ -471,6 +515,8 @@ AC_SUBST(MMX_CFLAGS) AC_SUBST(MMX_LDFLAGS) AC_SUBST(SSE2_CFLAGS) AC_SUBST(SSE2_LDFLAGS) +AC_SUBST(SSSE3_CFLAGS) +AC_SUBST(SSSE3_LDFLAGS) No need for SSSE3_LDFLAGS. Remove it? dnl === dnl Check for VMX/Altivec diff --git a/pixman/Makefile.am b/pixman/Makefile.am index b9ea754..b376d9a 100644 --- a/pixman/Makefile.am +++ b/pixman/Makefile.am @@ -52,6 +52,18 @@ libpixman_1_la_LIBADD += libpixman-sse2.la ASM_CFLAGS_sse2=$(SSE2_CFLAGS) endif +# ssse3 code +if USE_SSSE3 +noinst_LTLIBRARIES += libpixman-ssse3.la +libpixman_ssse3_la_SOURCES = \ + pixman-ssse3.c +libpixman_ssse3_la_CFLAGS = $(SSSE3_CFLAGS) +libpixman_1_la_LDFLAGS += $(SSSE3_LDFLAGS) +libpixman_1_la_LIBADD += libpixman-ssse3.la + +ASM_CFLAGS_ssse3=$(SSSE3_CFLAGS) +endif + # arm simd code if USE_ARM_SIMD noinst_LTLIBRARIES += libpixman-arm-simd.la diff --git a/pixman/pixman-private.h b/pixman/pixman-private.h index 0afabad..732f3d1 100644 --- a/pixman/pixman-private.h +++ b/pixman/pixman-private.h @@ -593,6 +593,11 @@ pixman_implementation_t * _pixman_implementation_create_sse2 (pixman_implementation_t *fallback); #endif +#ifdef USE_SSSE3 +pixman_implementation_t * +_pixman_implementation_create_ssse3 (pixman_implementation_t *fallback); +#endif + #ifdef USE_ARM_SIMD pixman_implementation_t * _pixman_implementation_create_arm_simd (pixman_implementation_t *fallback); diff --git a/pixman/pixman-ssse3.c b/pixman/pixman-ssse3.c new file mode 100644 index 000..19d71e7 --- /dev/null +++ b/pixman/pixman-ssse3.c @@ -0,0 +1,50 @@ +/* + * Copyright © 2013 Soren Sandmann Pedersen + * Copyright © 2013 Red Hat, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS
[Pixman] [PATCH] mmx: Document implementation(s) of pix_multiply().
--- I look at that function and can never remember what it does or how it manages to do it. pixman/pixman-mmx.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index 14790c0..746ecd6 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -301,6 +301,29 @@ negate (__m64 mask) return _mm_xor_si64 (mask, MC (4x00ff)); } +/* Computes the product of two unsigned fixed-point 8-bit values from 0 to 1 + * and maps its result to the same range. + * + * Jim Blinn gives multiple ways to compute this in Jim Blinn's Corner: + * Notation, Notation, Notation, the first of which is + * + * prod(a, b) = (a * b + 128) / 255. + * + * By approximating the division by 255 as 257/65536 it can be replaced by a + * multiply and a right shift. This is the implementation that we use in + * pix_multiply(), but we _mm_mulhi_pu16() by 257 (part of SSE1 or Extended + * 3DNow!, and unavailable at the time of the book's publication) to perform + * the multiplication and right shift in a single operation. + * + * prod(a, b) = ((a * b + 128) * 257) 16. + * + * A third way (how pix_multiply() was implemented prior to 14208344) exists + * also that performs the multiplication by 257 with adds and shifts. + * + * Where temp = a * b + 128 + * + * prod(a, b) = (temp + (temp 8)) 8. + */ static force_inline __m64 pix_multiply (__m64 a, __m64 b) { -- 1.8.1.5 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH] Use AC_LINK_IFELSE to check if the Loongson MMI code can link
From: Markos Chandras markos.chand...@imgtec.com The Loongson code is compiled with -march=loongson2f to enable the MMI instructions, but binutils refuses to link object code compiled with different -march settings, leading to link failures later in the compile. This avoids that problem by checking if we can link code compiled for Loongson. Signed-off-by: Markos Chandras markos.chand...@imgtec.com --- configure.ac | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/configure.ac b/configure.ac index c43a0d2..221179f 100644 --- a/configure.ac +++ b/configure.ac @@ -279,7 +279,7 @@ AC_MSG_CHECKING(whether to use Loongson MMI assembler) xserver_save_CFLAGS=$CFLAGS CFLAGS= $LS_CFLAGS $CFLAGS -I$srcdir -AC_COMPILE_IFELSE([AC_LANG_SOURCE([[ +AC_LINK_IFELSE([AC_LANG_SOURCE([[ #ifndef __mips_loongson_vector_rev #error Loongson Multimedia Instructions are only available on Loongson #endif -- 1.8.1.5 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] As per : Please report to pixman@lists.freedesktop.org
On Tue, Apr 9, 2013 at 2:39 PM, David Lisle da...@lisle.ca wrote: Thanks for responding, the problem remains a mystery bu the overall project now is operational. I appreciate that you took time. I really meant that there certainly must have been more error output that wasn't in your email. This would lead to the actual problem. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] As per : Please report to pixman@lists.freedesktop.org
On Mon, Apr 8, 2013 at 11:13 AM, David Lisle da...@lisle.ca wrote: === make[2]: *** [check-TESTS] Error 1 make[2]: Leaving directory `/usr/src/pixman-0.28.2/test' make[1]: *** [check-am] Error 2 make[1]: Leaving directory `/usr/src/pixman-0.28.2/test' make: *** [check-recursive] Error 1 == The test failed. Which test? I am using Slackware 2.6.37.6-smp KDE SC Version 4.5.5(KDE 4.5.5) Compiles as root, added other programs that were dependencies i.e. wv-1.2.4 prior to configuration and make. Make gave no error messages or warnings. Seems doubtful. This program did not correctly pass the tests, therefore installation is held in abeyance until it does. There is insufficient information for me to solve this problem. Us too. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/4] Added fast path for pad type repeats
On Tue, Feb 5, 2013 at 4:39 PM, Ben Avison bavi...@riscosopen.org wrote: diff --git a/test/Makefile.sources b/test/Makefile.sources index e323a8e..bcbca37 100644 --- a/test/Makefile.sources +++ b/test/Makefile.sources @@ -1,6 +1,7 @@ # Tests (sorted by expected completion time) TESTPROGRAMS = \ prng-test \ + repeat-test \ Update .gitignore for the new test. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] 0.29.2
On Sun, Jan 27, 2013 at 11:43 AM, Siarhei Siamashka siarhei.siamas...@gmail.com wrote: Still, I'm not very happy about the code duplication. We already have similar iterators (fetch only, no writeback) in pixman-mmx.c: http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.28.2#n3904 Ideally, a lot of this code can be reused in different backends. The only unique parts are just the fetch/store functions themselves. I'm not sure I understand totally. Is the suggestion adding writeback iterators, thereby allowing the removal of src_x888_0565? ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] build: Support building Loongson code for 2e, 2f, 3a
Some preemptive explanations: On Sat, Jan 26, 2013 at 6:54 PM, Matt Turner matts...@gmail.com wrote: diff --git a/pixman/pixman-mips.c b/pixman/pixman-mips.c index 3048813..77bef5c 100644 --- a/pixman/pixman-mips.c +++ b/pixman/pixman-mips.c @@ -27,6 +27,10 @@ #if defined(USE_MIPS_DSPR2) || defined(USE_LOONGSON_MMI) +#ifdef DLOPEN_LOONGSON_MMI +#include dlfcn.h +#endif + #include string.h #include stdlib.h @@ -69,10 +73,64 @@ pixman_implementation_t * _pixman_mips_get_implementations (pixman_implementation_t *imp) { #ifdef USE_LOONGSON_MMI +void *mmi_handle = NULL; mmi_handle is outside of DLOPEN_LOONGSON_MMI so that I don't have to do funny things to the if-statements below. In the !dlopen case, I expect gcc to recognize that it's always NULL and optimize it completely out. +#ifdef DLOPEN_LOONGSON_MMI +pixman_implementation_t *(*_pixman_implementation_create_mmx) (pixman_implementation_t *); +#endif /* I really don't know if some Loongson CPUs don't have MMI. */ -if (!_pixman_disabled (loongson-mmi) have_feature (Loongson)) +#ifdef HAVE_LOONGSON2E_MMI +if (!mmi_handle !_pixman_disabled (loongson-mmi) +have_feature (Loongson) have_feature (-2e)) +{ +#ifdef DLOPEN_LOONGSON_MMI + mmi_handle = dlopen(libpixman-1-loongson2e-mmi.so, RTLD_LAZY | RTLD_LOCAL); +#else + imp = _pixman_implementation_create_mmx (imp); +#endif +} +#endif +#ifdef HAVE_LOONGSON2F_MMI +if (!mmi_handle !_pixman_disabled (loongson-mmi) +have_feature (Loongson) have_feature (-2f)) +{ +#ifdef DLOPEN_LOONGSON_MMI + mmi_handle = dlopen(libpixman-1-loongson2f-mmi.so, RTLD_LAZY | RTLD_LOCAL); +#else + imp = _pixman_implementation_create_mmx (imp); +#endif +} +#endif +#ifdef HAVE_LOONGSON3A_MMI +if (!mmi_handle !_pixman_disabled (loongson-mmi) +have_feature (Loongson-3A)) +{ +#ifdef DLOPEN_LOONGSON_MMI + mmi_handle = dlopen(libpixman-1-loongson3a-mmi.so, RTLD_LAZY | RTLD_LOCAL); +#else imp = _pixman_implementation_create_mmx (imp); #endif +} +#endif + +#ifdef DLOPEN_LOONGSON_MMI +if (mmi_handle) +{ + _pixman_implementation_create_mmx = dlsym(mmi_handle, _pixman_implementation_create_mmx); + if (_pixman_implementation_create_mmx) + { + imp = _pixman_implementation_create_mmx (imp); + } + else + { + puts(dlerror()); + } +} +else +{ + puts(dlerror()); +} +#endif +#endif I don't ever dlclose() the handle. I expect that it will be live for the rest of process execution. I think there are other cases of leaks like this in pixman already. #ifdef USE_MIPS_DSPR2 if (!_pixman_disabled (mips-dspr2)) ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] sse2: Implement simple bilinear scaling for x8r8g8b8 to a8r8g8b8
On Wed, Jan 23, 2013 at 6:37 AM, Chris Wilson ch...@chris-wilson.co.uk wrote: Improves firefon-tron on a IVB i7-3720qm: 68.6s to 45.2s. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- pixman/pixman-sse2.c | 63 ++ 1 file changed, 63 insertions(+) diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c index fc873cc..bc3e2f1 100644 --- a/pixman/pixman-sse2.c +++ b/pixman/pixman-sse2.c @@ -5679,6 +5679,67 @@ FAST_BILINEAR_MAINLOOP_COMMON (sse2___normal_SRC, NORMAL, FLAG_NONE) static force_inline void +scaled_bilinear_scanline_sse2_0888__SRC (uint32_t * dst, Maybe some funny whitespace before dst? Or maybe just a spaces vs tabs issue. Anyway, Reviewed-by: Matt Turner matts...@gmail.com +const uint32_t * mask, +const uint32_t * src_top, +const uint32_t * src_bottom, +int32_t w, +int wt, +int wb, +pixman_fixed_t vx, +pixman_fixed_t unit_x, +pixman_fixed_t max_vx, +pixman_bool_tzero_src) +{ +BILINEAR_DECLARE_VARIABLES; +uint32_t pix1, pix2, pix3, pix4; + +while ((w -= 4) = 0) +{ + BILINEAR_INTERPOLATE_ONE_PIXEL (pix1); + BILINEAR_INTERPOLATE_ONE_PIXEL (pix2); + BILINEAR_INTERPOLATE_ONE_PIXEL (pix3); + BILINEAR_INTERPOLATE_ONE_PIXEL (pix4); + *dst++ = pix1 | 0xff00; + *dst++ = pix2 | 0xff00; + *dst++ = pix3 | 0xff00; + *dst++ = pix4 | 0xff00; +} + +if (w 2) +{ + BILINEAR_INTERPOLATE_ONE_PIXEL (pix1); + BILINEAR_INTERPOLATE_ONE_PIXEL (pix2); + *dst++ = pix1 | 0xff00; + *dst++ = pix2 | 0xff00; +} + +if (w 1) +{ + BILINEAR_INTERPOLATE_ONE_PIXEL (pix1); + *dst = pix1 | 0xff00; +} + +} + +FAST_BILINEAR_MAINLOOP_COMMON (sse2_0888__cover_SRC, + scaled_bilinear_scanline_sse2_0888__SRC, + uint32_t, uint32_t, uint32_t, + COVER, FLAG_NONE) +FAST_BILINEAR_MAINLOOP_COMMON (sse2_0888__pad_SRC, + scaled_bilinear_scanline_sse2_0888__SRC, + uint32_t, uint32_t, uint32_t, + PAD, FLAG_NONE) +FAST_BILINEAR_MAINLOOP_COMMON (sse2_0888__none_SRC, + scaled_bilinear_scanline_sse2_0888__SRC, + uint32_t, uint32_t, uint32_t, + NONE, FLAG_NONE) +FAST_BILINEAR_MAINLOOP_COMMON (sse2_0888__normal_SRC, + scaled_bilinear_scanline_sse2_0888__SRC, + uint32_t, uint32_t, uint32_t, + NORMAL, FLAG_NONE) + +static force_inline void scaled_bilinear_scanline_sse2___OVER (uint32_t * dst, const uint32_t * mask, const uint32_t * src_top, @@ -6185,6 +6246,8 @@ static const pixman_fast_path_t sse2_fast_paths[] = SIMPLE_BILINEAR_FAST_PATH (SRC, a8b8g8r8, a8b8g8r8, sse2__), SIMPLE_BILINEAR_FAST_PATH (SRC, a8b8g8r8, x8b8g8r8, sse2__), SIMPLE_BILINEAR_FAST_PATH (SRC, x8b8g8r8, x8b8g8r8, sse2__), +SIMPLE_BILINEAR_FAST_PATH (SRC, x8r8g8b8, a8r8g8b8, sse2_0888_), +SIMPLE_BILINEAR_FAST_PATH (SRC, x8b8g8r8, a8b8g8r8, sse2_0888_), SIMPLE_BILINEAR_FAST_PATH (OVER, a8r8g8b8, x8r8g8b8, sse2__), SIMPLE_BILINEAR_FAST_PATH (OVER, a8b8g8r8, x8b8g8r8, sse2__), -- 1.7.10.4 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH] Add new demos and tests to .gitignore
--- .gitignore | 3 +++ 1 file changed, 3 insertions(+) diff --git a/.gitignore b/.gitignore index a4d9f99..dcb3f8e 100644 --- a/.gitignore +++ b/.gitignore @@ -37,6 +37,7 @@ demos/quad2quad demos/radial-test demos/screen-test demos/srgb-test +demos/srgb-trap-test demos/trap-test demos/tri-test pixman/pixman-combine32.c @@ -61,6 +62,7 @@ test/fetch-test test/glyph-test test/gradient-crash-test test/gradient-test +test/infinite-loop test/lowlevel-blt-bench test/oob-test test/pdf-op-test @@ -68,6 +70,7 @@ test/region-contains-test test/region-test test/region-translate test/region-translate-test +test/rotate-test test/scaling-crash-test test/scaling-helpers-test test/scaling-test -- 1.7.12.4 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH] Convert INCLUDES to AM_CPPFLAGS
INCLUDES has been deprecated starting with automake 1.13. Convert all occurrences with the recommended AM_CPPFLAGS replacement. --- demos/Makefile.am | 2 +- pixman/Makefile.am | 2 +- test/Makefile.am | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/demos/Makefile.am b/demos/Makefile.am index f324f5f..fca2710 100644 --- a/demos/Makefile.am +++ b/demos/Makefile.am @@ -4,7 +4,7 @@ AM_CFLAGS = $(OPENMP_CFLAGS) AM_LDFLAGS = $(OPENMP_CFLAGS) LDADD = $(top_builddir)/pixman/libpixman-1.la -lm $(GTK_LIBS) $(PNG_LIBS) -INCLUDES = -I$(top_srcdir)/pixman -I$(top_builddir)/pixman $(GTK_CFLAGS) $(PNG_CFLAGS) +AM_CPPFLAGS = -I$(top_srcdir)/pixman -I$(top_builddir)/pixman $(GTK_CFLAGS) $(PNG_CFLAGS) GTK_UTILS = gtk-utils.c gtk-utils.h ../test/utils.c ../test/utils.h diff --git a/pixman/Makefile.am b/pixman/Makefile.am index 270d65e..d4b7bb3 100644 --- a/pixman/Makefile.am +++ b/pixman/Makefile.am @@ -91,7 +91,7 @@ noinst_LTLIBRARIES += libpixman-iwmmxt.la libpixman_1_la_LIBADD += libpixman-iwmmxt.la libpixman_iwmmxt_la-pixman-mmx.lo: pixman-mmx.c - $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(CFLAGS) $(IWMMXT_CFLAGS) -MT libpixman_iwmmxt_la-pixman-mmx.lo -MD -MP -MF $(DEPDIR)/libpixman_iwmmxt_la-pixman-mmx.Tpo -c -o libpixman_iwmmxt_la-pixman-mmx.lo `test -f 'pixman-mmx.c' || echo '$(srcdir)/'`pixman-mmx.c + $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(AM_CPPFLAGS) $(AM_CPPFLAGS) $(CPPFLAGS) $(CFLAGS) $(IWMMXT_CFLAGS) -MT libpixman_iwmmxt_la-pixman-mmx.lo -MD -MP -MF $(DEPDIR)/libpixman_iwmmxt_la-pixman-mmx.Tpo -c -o libpixman_iwmmxt_la-pixman-mmx.lo `test -f 'pixman-mmx.c' || echo '$(srcdir)/'`pixman-mmx.c $(AM_V_at)$(am__mv) $(DEPDIR)/libpixman_iwmmxt_la-pixman-mmx.Tpo $(DEPDIR)/libpixman_iwmmxt_la-pixman-mmx.Plo libpixman_iwmmxt_la_DEPENDENCIES = $(am__DEPENDENCIES_1) diff --git a/test/Makefile.am b/test/Makefile.am index eeb3679..ca87f4e 100644 --- a/test/Makefile.am +++ b/test/Makefile.am @@ -3,7 +3,7 @@ include $(top_srcdir)/test/Makefile.sources AM_CFLAGS = $(OPENMP_CFLAGS) AM_LDFLAGS = $(OPENMP_CFLAGS) $(TESTPROGS_EXTRA_LDFLAGS) LDADD = libutils.la $(top_builddir)/pixman/libpixman-1.la -lm $(PNG_LIBS) -INCLUDES = -I$(top_srcdir)/pixman -I$(top_builddir)/pixman $(PNG_CFLAGS) +AM_CPPFLAGS = -I$(top_srcdir)/pixman -I$(top_builddir)/pixman $(PNG_CFLAGS) libutils_la_SOURCES = $(libutils_sources) $(libutils_headers) -- 1.7.12.4 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] 0.29.2
On Fri, Jan 18, 2013 at 4:15 PM, Søren Sandmann sandm...@cs.au.dk wrote: Hi, It's about time to get a 0.29.2 development snapshot out, but there are some outstanding patches I'd like to get my triple build loongson patch in, but haven't gotten any testers yet. I'll set up a chroot this weekend to test it. Matt ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] build: Support building Loongson code for 2e, 2f, 3a
On Sun, Jan 6, 2013 at 7:46 PM, Cyril Brulebois k...@debian.org wrote: Hello Matt, Matt Turner matts...@gmail.com (06/01/2013): On Sat, Sep 15, 2012 at 11:59 PM, Matt Turner matts...@gmail.com wrote: pixman/Makefile.am contains a hack that allows pixman-mmx.c to be compiled with different overriding CFLAGS, since automake seriously doesn't have a way to do this. Seriously stupid. It works by defining a new rule and recursively calling make with modified CFLAGS set. Note the difference between the USE_LOONGSON* and HAVE_LOONGSON* preprocessor macros. Cc: Cyril Brulebois k...@debian.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51451 --- Cyril, I've updated the patch so that it builds .so files for each architecture against which pixman links and attached it to the bug report. Please give it a test. I cannot test it, as my system is compiled with -march=loongson2f and therefore I cannot even link code compiled with -march=loongson2e with my C library. thanks; unfortunately I'm busy working on the Debian Installer right now and pixman is a bit further down my todo list. Adding debian-mips@ to Cc, hoping somebody there will be able to perform some tests/share some insight. Mraw, KiBi. Any testers, debian-mips@? ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] sse2: Add fast paths for bilinear source with a solid mask
On Tue, Jan 8, 2013 at 12:55 PM, Chris Wilson ch...@chris-wilson.co.uk wrote: Based on the existing sse2__n_ nearest scaling routines. fishbowl on an i5-2500: 60.9s - 56.9s Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- Looks good to me. Reviewed-by: Matt Turner matts...@gmail.com ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] build: Support building Loongson code for 2e, 2f, 3a
On Sat, Sep 15, 2012 at 11:59 PM, Matt Turner matts...@gmail.com wrote: pixman/Makefile.am contains a hack that allows pixman-mmx.c to be compiled with different overriding CFLAGS, since automake seriously doesn't have a way to do this. Seriously stupid. It works by defining a new rule and recursively calling make with modified CFLAGS set. Note the difference between the USE_LOONGSON* and HAVE_LOONGSON* preprocessor macros. Cc: Cyril Brulebois k...@debian.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51451 --- Cyril, I've updated the patch so that it builds .so files for each architecture against which pixman links and attached it to the bug report. Please give it a test. I cannot test it, as my system is compiled with -march=loongson2f and therefore I cannot even link code compiled with -march=loongson2e with my C library. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Fix build with automake-1.13
On Wed, Jan 2, 2013 at 8:38 PM, Marko Lindqvist cazf...@gmail.com wrote: Automake-1.13 has removed long obsolete AM_CONFIG_HEADER macro ( http://lists.gnu.org/archive/html/automake/2012-12/msg00038.html ) and autoreconf errors out upon seeing it. Attached patch replaces obsolete AM_CONFIG_HEADER with now proper AC_CONFIG_HEADERS. I'm not subscribed to the mailing list. Thanks, I tried to apply this, but git won't let me push... will try to get this worked out. In the future, please use git format-patch and git send-email. To apply your patch, I had to patch -p1 ... git commit --author=Marko Lindqvist cazf...@gmail.com -a write a commit title and summary message It's a lot nicer to just be able to type git am :) Matt ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] sse2: Add a fast path for add_n_8888
On Wed, Jan 2, 2013 at 3:01 AM, Chris Wilson ch...@chris-wilson.co.uk wrote: This path is being exercised by inplace compositing of trapezoids, for instance as used in the firefox-asteroids cairo-trace. cairo-perf-trace numbers from firefox-asteroids would be cool to have in the commit message. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] sse2: Add a fast path for add_n_8888
On Wed, Jan 2, 2013 at 3:01 AM, Chris Wilson ch...@chris-wilson.co.uk wrote: This path is being exercised by inplace compositing of trapezoids, for instance as used in the firefox-asteroids cairo-trace. core2 @ 2.66GHz, reference memcpy speed = 4898.2MB/s (1224.6MP/s for 32bpp fills) before: add_n_ = L1: 4.36 L2: 4.27 M: 1.61 ( 0.13%) HT: 1.65 VT: 1.63 R: 1.63 RT: 1.59 ( 21Kops/s) after: add_n_ = L1:2969.09 L2:3926.11 M:603.30 ( 49.27%) HT:524.69 VT:401.01 R:407.59 RT:210.34 ( 804Kops/s) Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- pixman/pixman-sse2.c | 63 ++ 1 file changed, 63 insertions(+) diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c index 665eead..73eee68 100644 --- a/pixman/pixman-sse2.c +++ b/pixman/pixman-sse2.c @@ -4519,9 +4519,70 @@ sse2_composite_add__ (pixman_implementation_t *imp, sse2_combine_add_u (imp, op, dst, src, NULL, width); } +} + +static void +sse2_composite_add_n_ (pixman_implementation_t *imp, + pixman_composite_info_t *info) +{ +PIXMAN_COMPOSITE_ARGS (info); +uint32_t *dst_line, *dst, src; +int dst_stride; + +__m128i xmm_src; + +PIXMAN_IMAGE_GET_LINE (dest_image, dest_x, dest_y, uint32_t, dst_stride, dst_line, 1); + +src = _pixman_image_get_solid (imp, src_image, dest_image-bits.format); +if (src == 0) + return; + +if (src == ~0) +{ + pixman_fill (dest_image-bits.bits, dest_image-bits.rowstride, 32, +dest_x, dest_y, width, height, ~0); + + return; +} + +xmm_src = _mm_set_epi32 (src, src, src, src); +while (height--) +{ + int w = width; + uint32_t d; + dst = dst_line; + dst_line += dst_stride; + + while (w (unsigned long)dst 15) Use uintptr_t instead. The rest of the patch looks good to me. + { + d = *dst; + *dst++ = + _mm_cvtsi128_si32 ( _mm_adds_epu8 (xmm_src, _mm_cvtsi32_si128 (d))); + w--; + } + + while (w = 4) + { + save_128_aligned + ((__m128i*)dst, +_mm_adds_epu8 (xmm_src, load_128_aligned ((__m128i*)dst))); + + dst += 4; + w -= 4; + } + + while (w--) + { + d = *dst; + *dst++ = + _mm_cvtsi128_si32 (_mm_adds_epu8 (xmm_src, + _mm_cvtsi32_si128 (d))); + } +} } + static pixman_bool_t pixman_blt_sse2 (uint32_t *src_bits, uint32_t *dst_bits, @@ -5814,6 +5875,8 @@ static const pixman_fast_path_t sse2_fast_paths[] = PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, null, a8b8g8r8, sse2_composite_add__), PIXMAN_STD_FAST_PATH (ADD, solid, a8, a8, sse2_composite_add_n_8_8), PIXMAN_STD_FAST_PATH (ADD, solid, null, a8, sse2_composite_add_n_8), +PIXMAN_STD_FAST_PATH (ADD, solid, null, x8r8g8b8, sse2_composite_add_n_), +PIXMAN_STD_FAST_PATH (ADD, solid, null, a8r8g8b8, sse2_composite_add_n_), /* PIXMAN_OP_SRC */ PIXMAN_STD_FAST_PATH (SRC, solid, a8, a8r8g8b8, sse2_composite_src_n_8_), -- 1.7.10.4 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [cairo] issue with blend modes in pixman
On Mon, Dec 31, 2012 at 1:05 PM, Rik Cabanier caban...@gmail.com wrote: Looking at the formulas, I can see what's wrong but I don't know who to contact. These mailing lists are perfect. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Always use xmmintrin.h for 64 bit Windows
On Tue, Nov 13, 2012 at 10:44 AM, Stefan Weil s...@weilnetz.de wrote: MinGW-w64 uses the GNU compiler and does not define _MSC_VER. Nevertheless, it provides xmmintrin.h and must be handled here like the MS compiler. Otherwise compilation fails due to conflicting declarations. Signed-off-by: Stefan Weil s...@weilnetz.de --- pixman/pixman-mmx.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index c2ae4ea..aef468a 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -62,7 +62,7 @@ _mm_empty (void) #endif #ifdef USE_X86_MMX -# if (defined(__SUNPRO_C) || defined(_MSC_VER)) +# if (defined(__SUNPRO_C) || defined(_MSC_VER) || defined(_WIN64)) # include xmmintrin.h # else /* We have to compile with -msse to use xmmintrin.h, but that causes SSE -- 1.7.10.4 If you're compiling for Win64, you have SSE2. Why even compile the MMX code? ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Questionable numbers from lowlevel-blt-bench
On Mon, Oct 1, 2012 at 1:17 AM, Jonathan Morton jonathan.mor...@movial.com wrote: On Sun, 30 Sep 2012 15:05:18 -0700, Matt Turner matts...@gmail.com wrote: In doing performance work, I've noticed some weird results from lowlevel-blt-bench. Often it has seemed that the RT results determined the Kops/s almost entirely. I came across an instance of this today which was particularly striking: Before: add__ = L1: 47.01 L2: 36.84 M: 18.96 ( 33.14%) HT: 35.94 VT: 33.82 R: 30.64 RT: 19.36 ( 181Kops/s) After: add__ = L1: 230.78 L2: 200.86 M: 90.48 (159.44%) HT: 48.41 VT: 45.46 R: 42.78 RT: 19.22 ( 181Kops/s) L1/L2/M numbers are improved by ~5x. HT, VT, and R numbers are improved by ~1.35x. RT doesn't change... neither does Kops/s. What's going on here, and can we make the composite result more sensible? The figures in brackets are derived directly from one or more of the other figures. In this case, the Kops/s number is derived directly from the RT number, which should explain why they correlate. Ahh. At least I (and I'm pretty sure others too) thought that the Kops number was supposed to be a composite of HT, VT, RT, and R. This explains it then. The percentage figure, meanwhile, represents a percentage of memory bandwidth used by this blitter (under the M test), the peak bandwidth being derived from an earlier measurement. (You're seeing more than 100%, which suggests that the earlier measurement is not optimal.) Indeed. I'm prefetching in the modified function. The RT figure is meant to measure, as directly as possible, the per-call overhead which does not depend on the number of pixels involved. Accordingly, it is not expected to change significantly when doing pixel-related optimisations. Right, makes sense. Thanks! Matt ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] Questionable numbers from lowlevel-blt-bench
Hi Jonathan, In doing performance work, I've noticed some weird results from lowlevel-blt-bench. Often it has seemed that the RT results determined the Kops/s almost entirely. I came across an instance of this today which was particularly striking: Before: add__ = L1: 47.01 L2: 36.84 M: 18.96 ( 33.14%) HT: 35.94 VT: 33.82 R: 30.64 RT: 19.36 ( 181Kops/s) After: add__ = L1: 230.78 L2: 200.86 M: 90.48 (159.44%) HT: 48.41 VT: 45.46 R: 42.78 RT: 19.22 ( 181Kops/s) L1/L2/M numbers are improved by ~5x. HT, VT, and R numbers are improved by ~1.35x. RT doesn't change... neither does Kops/s. What's going on here, and can we make the composite result more sensible? Thanks, Matt ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [test PATCH] Use _mm_maddubs_epi16 in BILINEAR_INTERPOLATE_ONE_PIXEL
Siarhei, can you measure any performance improvement with this? I can't... :( --- pixman/pixman-sse2.c |8 +++- 1 files changed, 3 insertions(+), 5 deletions(-) diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c index efed310..4fbc045 100644 --- a/pixman/pixman-sse2.c +++ b/pixman/pixman-sse2.c @@ -32,6 +32,7 @@ #include xmmintrin.h /* for _mm_shuffle_pi16 and _MM_SHUFFLE */ #include emmintrin.h /* for SSE2 intrinsics */ +#include tmmintrin.h /* for SSSE3 intrinsics */ #include pixman-private.h #include pixman-combine32.h #include pixman-inlines.h @@ -5414,7 +5415,7 @@ FAST_NEAREST_MAINLOOP_COMMON (sse2__n__normal_OVER, #define BILINEAR_INTERPOLATE_ONE_PIXEL(pix) \ do { \ -__m128i xmm_wh, xmm_lo, xmm_hi, a; \ +__m128i xmm_wh, a; \ /* fetch 2x2 pixel block into sse2 registers */ \ __m128i tltr = _mm_loadl_epi64 ( \ (__m128i *)src_top[pixman_fixed_to_int (vx)]); \ @@ -5443,10 +5444,7 @@ do { \ _mm_srli_epi16 (xmm_x, 16 - BILINEAR_INTERPOLATION_BITS))); \ xmm_x = _mm_add_epi16 (xmm_x, xmm_ux); \ /* horizontal interpolation */ \ - xmm_lo = _mm_mullo_epi16 (a, xmm_wh); \ - xmm_hi = _mm_mulhi_epu16 (a, xmm_wh); \ - a = _mm_add_epi32 (_mm_unpacklo_epi16 (xmm_lo, xmm_hi), \ - _mm_unpackhi_epi16 (xmm_lo, xmm_hi)); \ + a = _mm_maddubs_epi16 (a, xmm_wh); \ } \ /* shift and pack the result */ \ a = _mm_srli_epi32 (a, BILINEAR_INTERPOLATION_BITS * 2); \ -- 1.7.8.6 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 05/10] pixman-utils.c, pixman-private.h: Add floating point conversion routines
On Wed, Sep 26, 2012 at 1:43 PM, Søren Sandmann sandm...@cs.au.dk wrote: From: Søren Sandmann Pedersen s...@redhat.com A new struct argb_t containing a floating point pixel is added to pixman-private.h, and conversion routines are added to pixman-utils.c to convert normalized integers to and from that struct. New functions: - pixman_expand_to_float() Expands a buffer of integer pixels to a buffer of argb_t pixels - pixman_contract_from_float() Converts a buffer of argb_t pixels to a buffer integer pixels - pixman_float_to_unorm() Converts a floating point number to an unsigned normalized integer - pixman_unorm_to_float() Converts an unsigned normalized integer to a floating point number --- pixman/pixman-private.h | 35 +++ pixman/pixman-utils.c | 107 +++ 2 files changed, 142 insertions(+), 0 deletions(-) diff --git a/pixman/pixman-private.h b/pixman/pixman-private.h index c82316f..91f35ed 100644 --- a/pixman/pixman-private.h +++ b/pixman/pixman-private.h @@ -45,6 +45,16 @@ typedef struct radial_gradient radial_gradient_t; typedef struct bits_image bits_image_t; typedef struct circle circle_t; +typedef struct argb_t argb_t; + +struct argb_t +{ +float a; +float r; +float g; +float b; +}; + typedef void (*fetch_scanline_t) (pixman_image_t *image, int x, int y, @@ -792,12 +802,34 @@ pixman_expand (uint64_t * dst, const uint32_t * src, pixman_format_code_t format, int width); +void +pixman_expand_to_float (argb_t *dst, + const uint32_t *src, + pixman_format_code_t format, + int width); void pixman_contract (uint32_t * dst, const uint64_t *src, int width); +void +pixman_contract_from_float (uint32_t *dst, + const argb_t *src, + int width); + +pixman_bool_t +_pixman_lookup_composite_function (pixman_implementation_t *toplevel, + pixman_op_t op, + pixman_format_code_t src_format, + uint32_t src_flags, + pixman_format_code_t mask_format, + uint32_t mask_flags, + pixman_format_code_t dest_format, + uint32_t dest_flags, + pixman_implementation_t**out_imp, + pixman_composite_func_t *out_func); + /* Region Helpers */ pixman_bool_t pixman_region32_copy_from_region16 (pixman_region32_t *dst, @@ -957,6 +989,9 @@ unorm_to_unorm (uint32_t val, int from_bits, int to_bits) return result; } +uint16_t pixman_float_to_unorm (float f, int n_bits); +float pixman_unorm_to_float (uint16_t u, int n_bits); + /* * Various debugging code */ diff --git a/pixman/pixman-utils.c b/pixman/pixman-utils.c index e4a9730..4f9db29 100644 --- a/pixman/pixman-utils.c +++ b/pixman/pixman-utils.c @@ -162,6 +162,113 @@ pixman_expand (uint64_t * dst, } } +static force_inline uint16_t +float_to_unorm (float f, int n_bits) +{ +uint32_t u; + +if (f 1.0) + f = 1.0; +if (f 0.0) + f = 0.0; + +u = f * (1 n_bits); +u -= (u n_bits); + +return u; +} + +static force_inline float +unorm_to_float (uint16_t u, int n_bits) +{ +uint32_t m = ((1 n_bits) - 1); + +return (u m) * (1.f / (float)m); +} + +/* + * This function expands images from a8r8g8b8 to argb_t. To preserve + * precision, it needs to know from which source format the a8r8g8b8 pixels + * originally came. + * + * For example, if the source was PIXMAN_x1r5g5b5 and the red component + * contained bits 12345, then the 8-bit value is 12345123. To correctly + * expand this to floating point, it should be 12345 / 31.0 and not + * 12345123 / 255.0. + */ +void +pixman_expand_to_float (argb_t *dst, + const uint32_t *src, + pixman_format_code_t format, + int width) +{ +int a_size, r_size, g_size, b_size; +int a_shift, r_shift, g_shift, b_shift; +int i; + +if (!PIXMAN_FORMAT_VIS (format)) + format = PIXMAN_a8r8g8b8; + +/* + * Determine the sizes of each component and the masks and shifts + * required to extract them from the source pixel. + */ +
[Pixman] [PATCH] sse2: mark pack_565_2x128_128 as static force_inline
--- pixman/pixman-sse2.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c index e273a95..cf21ef8 100644 --- a/pixman/pixman-sse2.c +++ b/pixman/pixman-sse2.c @@ -146,7 +146,7 @@ pack_565_2packedx128_128 (__m128i lo, __m128i hi) return _mm_packs_epi32 (t0, t1); } -__m128i +static force_inline __m128i pack_565_2x128_128 (__m128i lo, __m128i hi) { __m128i data; -- 1.7.8.6 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH] build: Remove useless DEP_CFLAGS/DEP_LIBS variables
Reduces the size of the generated pixman/Makefile from 46k to 41k. --- configure.ac |2 -- pixman-1.pc.in |4 ++-- pixman/Makefile.am | 23 +-- 3 files changed, 7 insertions(+), 22 deletions(-) diff --git a/configure.ac b/configure.ac index e3a5ff9..5fda547 100644 --- a/configure.ac +++ b/configure.ac @@ -796,8 +796,6 @@ AM_CONDITIONAL(HAVE_GTK, [test x$enable_gtk = xyes]) AC_SUBST(GTK_CFLAGS) AC_SUBST(GTK_LIBS) -AC_SUBST(DEP_CFLAGS) -AC_SUBST(DEP_LIBS) dnl = dnl posix_memalign, sigaction, alarm, gettimeofday diff --git a/pixman-1.pc.in b/pixman-1.pc.in index 936d95d..e3b9711 100644 --- a/pixman-1.pc.in +++ b/pixman-1.pc.in @@ -6,6 +6,6 @@ includedir=@includedir@ Name: Pixman Description: The pixman library (version 1) Version: @PACKAGE_VERSION@ -Cflags: -I${includedir}/pixman-1 @DEP_CFLAGS@ -Libs: -L${libdir} -lpixman-1 @DEP_LIBS@ +Cflags: -I${includedir}/pixman-1 +Libs: -L${libdir} -lpixman-1 diff --git a/pixman/Makefile.am b/pixman/Makefile.am index 843711a..270d65e 100644 --- a/pixman/Makefile.am +++ b/pixman/Makefile.am @@ -3,7 +3,7 @@ include $(top_srcdir)/pixman/Makefile.sources lib_LTLIBRARIES = libpixman-1.la libpixman_1_la_LDFLAGS = -version-info $(LT_VERSION_INFO) -no-undefined @PTHREAD_LDFLAGS@ -libpixman_1_la_LIBADD = @PTHREAD_LIBS@ @DEP_LIBS@ -lm +libpixman_1_la_LIBADD = @PTHREAD_LIBS@ -lm libpixman_1_la_SOURCES = $(libpixman_sources) $(libpixman_headers) libpixmanincludedir = $(includedir)/pixman-1 @@ -27,8 +27,7 @@ if USE_X86_MMX noinst_LTLIBRARIES += libpixman-mmx.la libpixman_mmx_la_SOURCES = \ pixman-mmx.c -libpixman_mmx_la_CFLAGS = $(DEP_CFLAGS) $(MMX_CFLAGS) -libpixman_mmx_la_LIBADD = $(DEP_LIBS) +libpixman_mmx_la_CFLAGS = $(MMX_CFLAGS) libpixman_1_la_LDFLAGS += $(MMX_LDFLAGS) libpixman_1_la_LIBADD += libpixman-mmx.la @@ -41,8 +40,7 @@ noinst_LTLIBRARIES += libpixman-vmx.la libpixman_vmx_la_SOURCES = \ pixman-vmx.c \ pixman-combine32.h -libpixman_vmx_la_CFLAGS = $(DEP_CFLAGS) $(VMX_CFLAGS) -libpixman_vmx_la_LIBADD = $(DEP_LIBS) +libpixman_vmx_la_CFLAGS = $(VMX_CFLAGS) libpixman_1_la_LIBADD += libpixman-vmx.la ASM_CFLAGS_vmx=$(VMX_CFLAGS) @@ -53,8 +51,7 @@ if USE_SSE2 noinst_LTLIBRARIES += libpixman-sse2.la libpixman_sse2_la_SOURCES = \ pixman-sse2.c -libpixman_sse2_la_CFLAGS = $(DEP_CFLAGS) $(SSE2_CFLAGS) -libpixman_sse2_la_LIBADD = $(DEP_LIBS) +libpixman_sse2_la_CFLAGS = $(SSE2_CFLAGS) libpixman_1_la_LDFLAGS += $(SSE2_LDFLAGS) libpixman_1_la_LIBADD += libpixman-sse2.la @@ -68,8 +65,6 @@ libpixman_arm_simd_la_SOURCES = \ pixman-arm-simd.c \ pixman-arm-common.h \ pixman-arm-simd-asm.S -libpixman_arm_simd_la_CFLAGS = $(DEP_CFLAGS) -libpixman_arm_simd_la_LIBADD = $(DEP_LIBS) libpixman_1_la_LIBADD += libpixman-arm-simd.la ASM_CFLAGS_arm_simd= @@ -84,8 +79,6 @@ libpixman_arm_neon_la_SOURCES = \ pixman-arm-neon-asm.S \ pixman-arm-neon-asm-bilinear.S \ pixman-arm-neon-asm.h -libpixman_arm_neon_la_CFLAGS = $(DEP_CFLAGS) -libpixman_arm_neon_la_LIBADD = $(DEP_LIBS) libpixman_1_la_LIBADD += libpixman-arm-neon.la ASM_CFLAGS_arm_neon= @@ -106,7 +99,6 @@ libpixman_iwmmxt_la_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC \ $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=link $(CCLD) \ $(CFLAGS) $(IWMMXT_CFLAGS) $(AM_LDFLAGS) \ $(LDFLAGS) -o $@ -libpixman_iwmmxt_la_LIBADD = $(DEP_LIBS) libpixman-iwmmxt.la: libpixman_iwmmxt_la-pixman-mmx.lo $(libpixman_iwmmxt_la_DEPENDENCIES) $(AM_V_CCLD)$(libpixman_iwmmxt_la_LINK) libpixman_iwmmxt_la-pixman-mmx.lo $(libpixman_iwmmxt_la_LIBADD) $(LIBS) @@ -121,8 +113,6 @@ libpixman_mips_dspr2_la_SOURCES = \ pixman-mips-dspr2-asm.S \ pixman-mips-dspr2-asm.h \ pixman-mips-memcpy-asm.S -libpixman_mips_dspr2_la_CFLAGS = $(DEP_CFLAGS) -libpixman_mips_dspr2_la_LIBADD = $(DEP_LIBS) libpixman_1_la_LIBADD += libpixman-mips-dspr2.la ASM_CFLAGS_mips_dspr2= @@ -132,12 +122,9 @@ endif if USE_LOONGSON_MMI noinst_LTLIBRARIES += libpixman-loongson-mmi.la libpixman_loongson_mmi_la_SOURCES = pixman-mmx.c loongson-mmintrin.h -libpixman_loongson_mmi_la_CFLAGS = $(DEP_CFLAGS) $(LS_CFLAGS) -libpixman_loongson_mmi_la_LIBADD = $(DEP_LIBS) +libpixman_loongson_mmi_la_CFLAGS = $(LS_CFLAGS) libpixman_1_la_LDFLAGS += $(LS_LDFLAGS) libpixman_1_la_LIBADD += libpixman-loongson-mmi.la - -ASM_CFLAGS_ls=$(LS_CFLAGS) endif .c.s : $(libpixmaninclude_HEADERS) $(BUILT_SOURCES) -- 1.7.8.6 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH] build: Support building Loongson code for 2e, 2f, 3a
pixman/Makefile.am contains a hack that allows pixman-mmx.c to be compiled with different overriding CFLAGS, since automake seriously doesn't have a way to do this. Seriously stupid. It works by defining a new rule and recursively calling make with modified CFLAGS set. Note the difference between the USE_LOONGSON* and HAVE_LOONGSON* preprocessor macros. Cc: Cyril Brulebois k...@debian.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51451 --- This patch applies on top of the previous. Although the build system works, linking unfortunately doesn't. gcc refuses to link object files that have been compiled with different -march=loongson* options together. This sucks. I'm not sure what to do. I guess I could make them separate shared objects or even dlopen them, but that really sucks, especially when I don't see a reason why gcc shouldn't be able to link this code together. Anyone have any other ideas? It's really obnoxious that there's not just a simple -mloongson-mmi flag irrespective of -march=... configure.ac| 87 ++ pixman/Makefile.am | 36 +--- pixman/pixman-mips.c| 16 +++- pixman/pixman-mmx.c | 10 +- pixman/pixman-private.h | 13 +++ 5 files changed, 146 insertions(+), 16 deletions(-) diff --git a/configure.ac b/configure.ac index 5fda547..f3804ba 100644 --- a/configure.ac +++ b/configure.ac @@ -273,21 +273,27 @@ PIXMAN_CHECK_CFLAG([-xldscope=hidden], [dnl dnl === dnl Check for Loongson Multimedia Instructions -if test x$LS_CFLAGS = x ; then -LS_CFLAGS=-march=loongson2f +if test x$LS2E_CFLAGS = x ; then +LS2E_CFLAGS=-march=loongson2e +fi +if test x$LS2F_CFLAGS = x ; then +LS2F_CFLAGS=-march=loongson2f +fi +if test x$LS3A_CFLAGS = x ; then +LS3A_CFLAGS=-march=loongson3a fi have_loongson_mmi=no AC_MSG_CHECKING(whether to use Loongson MMI assembler) xserver_save_CFLAGS=$CFLAGS -CFLAGS= $LS_CFLAGS $CFLAGS -I$srcdir +CFLAGS= $LS2F_CFLAGS $CFLAGS -I$srcdir AC_COMPILE_IFELSE([AC_LANG_SOURCE([[ #ifndef __mips_loongson_vector_rev #error Loongson Multimedia Instructions are only available on Loongson #endif #if defined(__GNUC__) (__GNUC__ 4 || (__GNUC__ == 4 __GNUC_MINOR__ 4)) -#error Need GCC = 4.4 for Loongson MMI compilation +#error Need GCC = 4.4 for Loongson 2e/f MMI compilation #endif #include pixman/loongson-mmintrin.h int main () { @@ -299,29 +305,95 @@ int main () { __m64 c = _mm_srli_pi16 (a.v, b); return 0; }]])], have_loongson_mmi=yes) +have_loongson2e_mmi=$have_loongson_mmi +have_loongson2f_mmi=$have_loongson_mmi +CFLAGS=$xserver_save_CFLAGS + +xserver_save_CFLAGS=$CFLAGS +CFLAGS= $LS3A_CFLAGS $CFLAGS -I$srcdir +AC_COMPILE_IFELSE([AC_LANG_SOURCE([[ +#ifndef __mips_loongson_vector_rev +#error Loongson Multimedia Instructions are only available on Loongson +#endif +#if defined(__GNUC__) (__GNUC__ 4 || (__GNUC__ == 4 __GNUC_MINOR__ 6)) +#error Need GCC = 4.6 for Loongson 3A MMI compilation +#endif +#include pixman/loongson-mmintrin.h +int main () { +union { +__m64 v; +char c[8]; +} a = { .c = {1, 2, 3, 4, 5, 6, 7, 8} }; +int b = 4; +__m64 c = _mm_srli_pi16 (a.v, b); +return 0; +}]])], have_loongson3a_mmi=yes) CFLAGS=$xserver_save_CFLAGS AC_ARG_ENABLE(loongson-mmi, [AC_HELP_STRING([--disable-loongson-mmi], [disable Loongson MMI fast paths])], [enable_loongson_mmi=$enableval], [enable_loongson_mmi=auto]) +AC_ARG_ENABLE(loongson2e-mmi, + [AC_HELP_STRING([--disable-loongson2e-mmi], + [do not build Loongson MMI fast paths for 2e])], + [enable_loongson2e_mmi=$enableval], [enable_loongson2e_mmi=auto]) +AC_ARG_ENABLE(loongson2f-mmi, + [AC_HELP_STRING([--disable-loongson2f-mmi], + [do not build Loongson MMI fast paths for 2f])], + [enable_loongson2f_mmi=$enableval], [enable_loongson2f_mmi=auto]) +AC_ARG_ENABLE(loongson3a-mmi, + [AC_HELP_STRING([--disable-loongson3a-mmi], + [do not build Loongson MMI fast paths for 3a])], + [enable_loongson3a_mmi=$enableval], [enable_loongson3a_mmi=auto]) if test $enable_loongson_mmi = no ; then have_loongson_mmi=disabled fi +if test $enable_loongson2e_mmi = no ; then + have_loongson2e_mmi=disabled +fi +if test $enable_loongson2f_mmi = no ; then + have_loongson2f_mmi=disabled +fi +if test $enable_loongson3a_mmi = no ; then + have_loongson3a_mmi=disabled +fi if test $have_loongson_mmi = yes ; then + loongson_msg=yes: AC_DEFINE(USE_LOONGSON_MMI, 1, [use Loongson Multimedia Instructions]) + if test $have_loongson2e_mmi = yes ; then + loongson_msg=$loongson_msg 2e + AC_DEFINE(HAVE_LOONGSON2E_MMI, 1, [use Loongson 2e Multimedia Instructions]) + fi + if test $have_loongson2f_mmi = yes ; then + loongson_msg=$loongson_msg 2f +
Re: [Pixman] [PATCH] Make pixman-mmx.c compile on x86-32 without optimization
On Mon, Jul 9, 2012 at 10:19 PM, Matt Turner matts...@gmail.com wrote: Works for me. On second glance, did I just make a mistake in b87cd1f and write ifdef instead of ifndef? ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Make pixman-mmx.c compile on x86-32 without optimization
On Mon, Jul 9, 2012 at 7:31 AM, Søren Sandmann sandm...@cs.au.dk wrote: From: Søren Sandmann Pedersen s...@redhat.com When not optimizing, write _mm_shuffle_pi16() as a statement expression with inline assembly. That way we avoid __builtin_ia32_pshufw(), which is only available when compiling with -msse, while still allowing the non-optimizing gcc to understand that the second argument is a compile time constant. Cc: matts...@gmail.com --- pixman/pixman-mmx.c | 13 +++-- 1 files changed, 11 insertions(+), 2 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index 5441d6b..74a5e87 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -105,8 +105,17 @@ _mm_shuffle_pi16 (__m64 __A, int8_t const __N) return ret; } # else -# define _mm_shuffle_pi16(A, N) \ -((__m64) __builtin_ia32_pshufw ((__v4hi)(__m64)(A), (int)(N))) +# define _mm_shuffle_pi16(A, N) \ +({ \ + __m64 ret; \ + \ + asm (pshufw %2, %1, %0\n\t\ +: =y (ret) \ +: y (A), K ((const int8_t)N) \ + ); \ + \ + ret;\ +}) # endif # endif #endif -- 1.7.4 Works for me. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [BUG pixman] f9c91ee2f27eaea68d8c3a130bf7d4bc0c860834 breaks compilation
On Mon, Jul 9, 2012 at 1:55 AM, Knut Petersen knut_peter...@t-online.de wrote: Søren, the bad commit was supposed to fix a gcc -O0 compile problem, but it breaks gcc -O0 compilation here. Reverting f9c91ee2 fixes the problem for me. Is this build automated? If it's an automated build that runs the test suite, you're actually spending way more time running the test suite when built with -O0 than you save by building with -O0. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 1/5] mmx: add scaled bilinear src_8888_8888
On Sun, Jul 1, 2012 at 12:56 PM, Søren Sandmann sandm...@cs.au.dk wrote: Matt Turner matts...@gmail.com writes: +SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8, a8r8g8b8, mmx__ ), +SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8, x8r8g8b8, mmx__ ), +SIMPLE_BILINEAR_FAST_PATH (SRC, x8r8g8b8, x8r8g8b8, mmx__ ), + Looks like the abrg entries are missing. Soren Indeed. They're missing from SSE2 as well. I'll fix that up when I push it. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Use a compile-time constant for the K constraint in the MMX detection.
On Sun, Jul 1, 2012 at 5:03 PM, Søren Sandman sandm...@cs.au.dk wrote: From: Søren Sandmann Pedersen s...@redhat.com When compiling with -O0, gcc doesn't understand that in signed char x = 0; ... asm (..., : K (x)); x is constant. Fix this by using an immediate constant instead of a variable. --- configure.ac |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/configure.ac b/configure.ac index 2b9d1ba..36f423e 100644 --- a/configure.ac +++ b/configure.ac @@ -351,12 +351,11 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[ int main () { __m64 v = _mm_cvtsi32_si64 (1); __m64 w; -signed char x = 0; /* Some versions of clang will choke on K */ asm (pshufw %2, %1, %0\n\t : =y (w) -: y (v), K (x) +: y (v), K (5) ); return _mm_cvtsi64_si32 (v); -- 1.7.10.4 Seems like the smart thing to me. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [ANNOUNCE] pixman release 0.26.2 now available
On Sat, Jun 30, 2012 at 3:04 AM, Andreas Radke a.ra...@arcor.de wrote: Somehow I get different checksums: [andyrtr@workstation64 trunk]$ md5sum pixman-0.26.2.tar.* 6b3e4c5300adb893a2baa9631c23efb2 pixman-0.26.2.tar.bz2 276242da5b3af1258d072cf205d18f0b pixman-0.26.2.tar.gz Can you confirm the sums please again? Confirmed. Not sure exactly how this happened. It was the first release I've done, and I hit a couple of permissions hick-ups uploading the tarballs to cairo.fdo. Sorry about that. The .sha1 files should be right though. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [ANNOUNCE] pixman release 0.26.2 now available
A new pixman release 0.26.2 is now available. This is a stable release. It contains some bug fixes, custom build rules for ARM/iwMMXt, and an important bug fix for MMX/x86. tar.gz: http://cairographics.org/releases/pixman-0.26.2.tar.gz http://xorg.freedesktop.org/archive/individual/lib/pixman-0.26.2.tar.gz tar.bz2: http://xorg.freedesktop.org/archive/individual/lib/pixman-0.26.2.tar.bz2 Hashes: MD5: 69af3cf4ce6515ee01b0960edf8009fb pixman-0.26.2.tar.gz MD5: 2b57fb3038be4890ec433d11176280cd pixman-0.26.2.tar.bz2 SHA1: ba71d029d174aa8b9d23b1072ab76e6b4ea3de59 pixman-0.26.2.tar.gz SHA1: c7cdb5803061ee6614acc66258b0339ad4e52314 pixman-0.26.2.tar.bz2 GPG signature: http://cairographics.org/releases/pixman-0.26.2.tar.gz.sha1.asc (signed by Matt Turner matts...@gmail.com) Git: git://git.freedesktop.org/git/pixman tag: pixman-0.26.2 Log: Matt Turner (6): Post-release version bump to 0.26.1 mmx: add missing _mm_empty calls autotools: use custom build rule to build iwMMXt code configure.ac: add iwmmxt2 configure flag Fix distcheck due to custom iwMMXt rules Pre-release version bump to 0.26.2 Siarhei Siamashka (2): test: OpenMP 2.5 requires signed loop iteration variables test: fix bisecting issue in fuzzer-find-diff.pl Søren Sandmann Pedersen (1): test: Add missing break in stress-test.c pgpgxsYR8Ypir.pgp Description: PGP signature ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH] mmx: Use expand_alpha instead of mask/shift
--- pixman/pixman-mmx.c |8 ++-- 1 files changed, 2 insertions(+), 6 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index bff8585..071cdfd 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -1618,9 +1618,7 @@ mmx_composite_over__n_ (pixman_implementation_t *imp, PIXMAN_IMAGE_GET_LINE (src_image, src_x, src_y, uint32_t, src_stride, src_line, 1); mask = _pixman_image_get_solid (imp, mask_image, dest_image-bits.format); -mask = 0xff00; -mask = mask | mask 8 | mask 16 | mask 24; -vmask = load (mask); +vmask = expand_alpha (load (mask)); while (height--) { @@ -1689,9 +1687,7 @@ mmx_composite_over_x888_n_ (pixman_implementation_t *imp, PIXMAN_IMAGE_GET_LINE (src_image, src_x, src_y, uint32_t, src_stride, src_line, 1); mask = _pixman_image_get_solid (imp, mask_image, dest_image-bits.format); -mask = 0xff00; -mask = mask | mask 8 | mask 16 | mask 24; -vmask = load (mask); +vmask = expand_alpha (load (mask)); srca = MC (4x00ff); while (height--) -- 1.7.3.4 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 00/10] Cleanups to CPU detection
On Fri, Jun 29, 2012 at 5:20 PM, Alan Coopersmith alan.coopersm...@oracle.com wrote: On 06/29/12 01:44 PM, Søren Sandmann Pedersen wrote: I was looking at making use of some of the newer x86 SIMD instruction sets and realized that (a) we don't ever call cpuid on x86-64, we just assume that MMX and SSE2 are present, I thought the amd64 ABI guaranteed MMX SSE2 would always be present - is that not the case? SSE2 seems to be required by the ABI, but I don't know why MMX would (maybe x87 FPU is, and by extension MMX?). I'm guessing here -- but since newer AMD chips dropped 3DNow, I would think it'd be possible for future chips to drop MMX as well. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 3/5] mmx: add scaled bilinear over_8888_8_8888
On Wed, Jun 27, 2012 at 10:38 PM, Matt Turner matts...@gmail.com wrote: Reduces runtime of firefox-fishtank trace from 1510 to 1030 seconds on Loongson. --- pixman/pixman-mmx.c | 84 +++ 1 files changed, 84 insertions(+), 0 deletions(-) Loongson: image firefox-fishtank 1665.163 1670.370 0.17%3/3 image firefox-fishtank 1037.738 1040.218 0.19%3/3 ARM/iwMMXt: image firefox-fishtank 2042.723 2045.308 0.10%3/3 image firefox-fishtank 1487.282 1492.640 0.17%3/3 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 5/5] mmx: optimize bilinear function when using 7-bit precision
On Wed, Jun 27, 2012 at 10:38 PM, Matt Turner matts...@gmail.com wrote: --- Reduces runtime of firefox-planet-gnome trace from 156 to 153 seconds on Loongson. Increases runtime of firefox-fishtank trace from 1030 to 1060 seconds. Why? pixman/pixman-mmx.c | 45 - 1 files changed, 32 insertions(+), 13 deletions(-) Loongson: image firefox-fishtank 1037.738 1040.218 0.19%3/3 image firefox-fishtank 1056.611 1057.581 0.20%3/3 ARM/iwMMXt: image firefox-fishtank 1487.282 1492.640 0.17%3/3 image firefox-fishtank 1363.913 1364.366 0.11%3/3 I'm mostly okay with the slight decrease in performance on Loongson, given the speed-up on ARM (and on x86). Maybe look at it later.. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] Bilinear scaling patches for MMX
These five patches implement the same bilinear scaling compositing functions as provided by the SSE2 code. They pass the test suite on x86, Loongson, and iwMMXt, but I haven't done extensive benchmarking yet on iwMMXt. The fifth patch optimizes the functions for 7-bit bilinear interpolation, but doesn't give the performance differences I would expect. firefox-planet-gnome performance is increased by ~1% and firefox-fishtank performance is reduced. Matt ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH 1/5] mmx: add scaled bilinear src_8888_8888
--- pixman/loongson-mmintrin.h | 73 ++ pixman/pixman-mmx.c| 93 2 files changed, 166 insertions(+), 0 deletions(-) diff --git a/pixman/loongson-mmintrin.h b/pixman/loongson-mmintrin.h index 1a114fe..f0931ac 100644 --- a/pixman/loongson-mmintrin.h +++ b/pixman/loongson-mmintrin.h @@ -45,6 +45,28 @@ _mm_setzero_si64 (void) } extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_pi16 (__m64 __m1, __m64 __m2) +{ + __m64 ret; + asm(paddh %0, %1, %2\n\t + : =f (ret) + : f (__m1), f (__m2) + ); + return ret; +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_pi32 (__m64 __m1, __m64 __m2) +{ + __m64 ret; + asm(paddw %0, %1, %2\n\t + : =f (ret) + : f (__m1), f (__m2) + ); + return ret; +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_adds_pu16 (__m64 __m1, __m64 __m2) { __m64 ret; @@ -150,6 +172,35 @@ _mm_packs_pu16 (__m64 __m1, __m64 __m2) } extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_packs_pi32 (__m64 __m1, __m64 __m2) +{ + __m64 ret; + asm(packsswh %0, %1, %2\n\t + : =f (ret) + : f (__m1), f (__m2) + ); + return ret; +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_pi16 (uint16_t __w3, uint16_t __w2, uint16_t __w1, uint16_t __w0) +{ + uint64_t val = ((uint64_t)__w3 48) +| ((uint64_t)__w2 32) +| ((uint64_t)__w1 16) +| ((uint64_t)__w0 0); + return *(__m64 *)val; +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_pi32 (unsigned __i1, unsigned __i0) +{ + uint64_t val = ((uint64_t)__i1 32) +| ((uint64_t)__i0 0); + return *(__m64 *)val; +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_shuffle_pi16 (__m64 __m, int64_t __n) { __m64 ret; @@ -193,6 +244,17 @@ _mm_srli_pi16 (__m64 __m, int64_t __count) } extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srli_pi32 (__m64 __m, int64_t __count) +{ + __m64 ret; + asm(psrlw %0, %1, %2\n\t + : =f (ret) + : f (__m), f (*(__m64 *)__count) + ); + return ret; +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_srli_si64 (__m64 __m, int64_t __count) { __m64 ret; @@ -204,6 +266,17 @@ _mm_srli_si64 (__m64 __m, int64_t __count) } extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sub_pi16 (__m64 __m1, __m64 __m2) +{ + __m64 ret; + asm(psubh %0, %1, %2\n\t + : =f (ret) + : f (__m1), f (__m2) + ); + return ret; +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_unpackhi_pi8 (__m64 __m1, __m64 __m2) { __m64 ret; diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index d869c04..904529f 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -42,6 +42,7 @@ #endif #include pixman-private.h #include pixman-combine32.h +#include pixman-inlines.h #define no_vERBOSE @@ -3506,6 +3507,94 @@ mmx_composite_over_reverse_n_ (pixman_implementation_t *imp, _mm_empty (); } +#define BSHIFT ((1 BILINEAR_INTERPOLATION_BITS)) + +#define BILINEAR_DECLARE_VARIABLES \ +const __m64 mm_wt = _mm_set_pi16 (wt, wt, wt, wt); \ +const __m64 mm_wb = _mm_set_pi16 (wb, wb, wb, wb); \ +const __m64 mm_BSHIFT = _mm_set_pi16 (BSHIFT, BSHIFT, BSHIFT, BSHIFT); \ +const __m64 mm_ux = _mm_set_pi16 (unit_x, unit_x, unit_x, unit_x); \ +const __m64 mm_zero = _mm_setzero_si64 (); \ +__m64 mm_x = _mm_set_pi16 (vx, vx, vx, vx) + +#define BILINEAR_INTERPOLATE_ONE_PIXEL(pix) \ +do { \ +/* fetch 2x2 pixel block into 2 mmx registers */ \ +__m64 t = ldq_u ((__m64 *)src_top [pixman_fixed_to_int (vx)]); \ +__m64 b = ldq_u ((__m64 *)src_bottom [pixman_fixed_to_int (vx)]); \ +vx += unit_x; \ +/* vertical interpolation */ \ +__m64 t_hi = _mm_mullo_pi16 (_mm_unpackhi_pi8 (t, mm_zero), mm_wt); \ +__m64 t_lo = _mm_mullo_pi16 (_mm_unpacklo_pi8 (t,
[Pixman] [PATCH 3/5] mmx: add scaled bilinear over_8888_8_8888
Reduces runtime of firefox-fishtank trace from 1510 to 1030 seconds on Loongson. --- pixman/pixman-mmx.c | 84 +++ 1 files changed, 84 insertions(+), 0 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index a504b60..ea732bb 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -3571,6 +3571,12 @@ do { \ pix = lo; \ } while (0) +#define BILINEAR_SKIP_ONE_PIXEL() \ +do { \ +vx += unit_x; \ +mm_x = _mm_add_pi16 (mm_x, mm_ux); \ +} while(0) + static force_inline void scaled_bilinear_scanline_mmx___SRC (uint32_t * dst, const uint32_t * mask, @@ -3663,6 +3669,79 @@ FAST_BILINEAR_MAINLOOP_COMMON (mmx___normal_OVER, scaled_bilinear_scanline_mmx___OVER, uint32_t, uint32_t, uint32_t, NORMAL, FLAG_NONE) + +static force_inline void +scaled_bilinear_scanline_mmx__8__OVER (uint32_t * dst, + const uint8_t * mask, + const uint32_t * src_top, + const uint32_t * src_bottom, + int32_t w, + int wt, + int wb, + pixman_fixed_t vx, + pixman_fixed_t unit_x, + pixman_fixed_t max_vx, + pixman_bool_tzero_src) +{ +BILINEAR_DECLARE_VARIABLES; +__m64 pix1, pix2; +uint32_t m; + +while (w) +{ + m = (uint32_t) *mask++; + + if (m) + { + BILINEAR_INTERPOLATE_ONE_PIXEL (pix1); + + if (m == 0xff is_opaque (pix1)) + { + store (dst, pix1); + } + else + { + __m64 ms, md, ma, msa; + + pix2 = load (dst); + ma = expand_alpha_rev (to_m64 (m)); + ms = _mm_unpacklo_pi8 (pix1, _mm_setzero_si64 ()); + md = _mm_unpacklo_pi8 (pix2, _mm_setzero_si64 ()); + + msa = expand_alpha (ms); + + store (dst, (in_over (ms, msa, ma, md))); + } + } + else + { + BILINEAR_SKIP_ONE_PIXEL (); + } + + w--; + dst++; +} + +_mm_empty (); +} + +FAST_BILINEAR_MAINLOOP_COMMON (mmx__8__cover_OVER, + scaled_bilinear_scanline_mmx__8__OVER, + uint32_t, uint8_t, uint32_t, + COVER, FLAG_HAVE_NON_SOLID_MASK) +FAST_BILINEAR_MAINLOOP_COMMON (mmx__8__pad_OVER, + scaled_bilinear_scanline_mmx__8__OVER, + uint32_t, uint8_t, uint32_t, + PAD, FLAG_HAVE_NON_SOLID_MASK) +FAST_BILINEAR_MAINLOOP_COMMON (mmx__8__none_OVER, + scaled_bilinear_scanline_mmx__8__OVER, + uint32_t, uint8_t, uint32_t, + NONE, FLAG_HAVE_NON_SOLID_MASK) +FAST_BILINEAR_MAINLOOP_COMMON (mmx__8__normal_OVER, + scaled_bilinear_scanline_mmx__8__OVER, + uint32_t, uint8_t, uint32_t, + NORMAL, FLAG_HAVE_NON_SOLID_MASK) + static uint32_t * mmx_fetch_x8r8g8b8 (pixman_iter_t *iter, const uint32_t *mask) { @@ -3927,6 +4006,11 @@ static const pixman_fast_path_t mmx_fast_paths[] = SIMPLE_BILINEAR_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, mmx__ ), SIMPLE_BILINEAR_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, mmx__ ), +SIMPLE_BILINEAR_A8_MASK_FAST_PATH (OVER, a8r8g8b8, x8r8g8b8, mmx__8_ ), +SIMPLE_BILINEAR_A8_MASK_FAST_PATH (OVER, a8b8g8r8, x8b8g8r8, mmx__8_ ), +SIMPLE_BILINEAR_A8_MASK_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, mmx__8_ ), +SIMPLE_BILINEAR_A8_MASK_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, mmx__8_ ), + { PIXMAN_OP_NONE }, }; -- 1.7.3.4 ___ Pixman mailing list Pixman@lists.freedesktop.org
[Pixman] [PATCH 5/5] mmx: optimize bilinear function when using 7-bit precision
--- Reduces runtime of firefox-planet-gnome trace from 156 to 153 seconds on Loongson. Increases runtime of firefox-fishtank trace from 1030 to 1060 seconds. Why? pixman/pixman-mmx.c | 45 - 1 files changed, 32 insertions(+), 13 deletions(-) diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c index ea732bb..bff8585 100644 --- a/pixman/pixman-mmx.c +++ b/pixman/pixman-mmx.c @@ -3526,11 +3526,14 @@ mmx_composite_over_reverse_n_ (pixman_implementation_t *imp, } #define BSHIFT ((1 BILINEAR_INTERPOLATION_BITS)) +#define BMSK (BSHIFT - 1) #define BILINEAR_DECLARE_VARIABLES \ const __m64 mm_wt = _mm_set_pi16 (wt, wt, wt, wt); \ const __m64 mm_wb = _mm_set_pi16 (wb, wb, wb, wb); \ const __m64 mm_BSHIFT = _mm_set_pi16 (BSHIFT, BSHIFT, BSHIFT, BSHIFT); \ +const __m64 mm_addc7 = _mm_set_pi16 (0, 1, 0, 1); \ +const __m64 mm_xorc7 = _mm_set_pi16 (0, BMSK, 0, BMSK); \ const __m64 mm_ux = _mm_set_pi16 (unit_x, unit_x, unit_x, unit_x); \ const __m64 mm_zero = _mm_setzero_si64 (); \ __m64 mm_x = _mm_set_pi16 (vx, vx, vx, vx) @@ -3548,21 +3551,37 @@ do { \ __m64 b_lo = _mm_mullo_pi16 (_mm_unpacklo_pi8 (b, mm_zero), mm_wb); \ __m64 hi = _mm_add_pi16 (t_hi, b_hi); \ __m64 lo = _mm_add_pi16 (t_lo, b_lo); \ -/* calculate horizontal weights */ \ -__m64 mm_wh_lo = _mm_sub_pi16 (mm_BSHIFT, _mm_srli_pi16 (mm_x, \ +if (BILINEAR_INTERPOLATION_BITS 8) \ +{ \ + /* calculate horizontal weights */ \ + __m64 mm_wh = _mm_add_pi16 (mm_addc7, _mm_xor_si64 (mm_xorc7, \ + _mm_srli_pi16 (mm_x, \ +16 - BILINEAR_INTERPOLATION_BITS))); \ + mm_x = _mm_add_pi16 (mm_x, mm_ux); \ + /* horizontal interpolation */ \ + __m64 p = _mm_unpacklo_pi16 (lo, hi); \ + __m64 q = _mm_unpackhi_pi16 (lo, hi); \ + lo = _mm_madd_pi16 (p, mm_wh); \ + hi = _mm_madd_pi16 (q, mm_wh); \ +} \ +else \ +{ \ + /* calculate horizontal weights */ \ + __m64 mm_wh_lo = _mm_sub_pi16 (mm_BSHIFT, _mm_srli_pi16 (mm_x, \ 16 - BILINEAR_INTERPOLATION_BITS)); \ -__m64 mm_wh_hi = _mm_srli_pi16 (mm_x, \ + __m64 mm_wh_hi = _mm_srli_pi16 (mm_x, \ 16 - BILINEAR_INTERPOLATION_BITS); \ -mm_x = _mm_add_pi16 (mm_x, mm_ux); \ -/* horizontal interpolation */ \ -__m64 mm_lo_lo = _mm_mullo_pi16 (lo, mm_wh_lo); \ -__m64 mm_lo_hi = _mm_mullo_pi16 (hi, mm_wh_hi); \ -__m64 mm_hi_lo = _mm_mulhi_pu16 (lo, mm_wh_lo); \ -__m64 mm_hi_hi = _mm_mulhi_pu16 (hi, mm_wh_hi); \ -lo = _mm_add_pi32 (_mm_unpacklo_pi16 (mm_lo_lo, mm_hi_lo), \ - _mm_unpacklo_pi16 (mm_lo_hi, mm_hi_hi)); \ -hi = _mm_add_pi32 (_mm_unpackhi_pi16 (mm_lo_lo, mm_hi_lo), \ - _mm_unpackhi_pi16 (mm_lo_hi, mm_hi_hi)); \ + mm_x = _mm_add_pi16 (mm_x, mm_ux); \ + /* horizontal interpolation */ \ + __m64 mm_lo_lo = _mm_mullo_pi16 (lo, mm_wh_lo); \ + __m64 mm_lo_hi = _mm_mullo_pi16 (hi, mm_wh_hi); \ + __m64 mm_hi_lo = _mm_mulhi_pu16 (lo, mm_wh_lo); \ + __m64 mm_hi_hi = _mm_mulhi_pu16 (hi, mm_wh_hi); \ + lo = _mm_add_pi32 (_mm_unpacklo_pi16 (mm_lo_lo, mm_hi_lo),