from:"Matt Turner"

[ANNOUNCE] pixman release 0.43.4 now available

2024-02-29 Thread Matt Turner

A new pixman release 0.43.4 is now available.

tar.gz:
https://cairographics.org/releases/pixman-0.43.4.tar.gz
https://www.x.org/releases/individual/lib/pixman-0.43.4.tar.gz

tar.xz:
https://www.x.org/releases/individual/lib/pixman-0.43.4.tar.xz

Hashes:
SHA256: a0624db90180c7ddb79fc7a9151093dc37c646d8c38d3f232f767cf64b85a226  pixman-0.43.4.tar.gz
SHA256: 48d8539f35488d694a2fef3ce17394d1153ed4e71c05d1e621904d574be5df19  pixman-0.43.4.tar.xz
SHA512: 08802916648bab51fd804fc3fd823ac2c6e3d622578a534052b657491c38165696d5929d03639c52c4f29d8850d676a909f0299d1a4c76a07df18a34a896e43d  pixman-0.43.4.tar.gz
SHA512: b40fb05bd58dc78f4e4e9b19c86991ab0611b708657c9a7fb42bfe82d57820a0fde01a34b00a0848a41da6c3fb90c2213942a70f435a0e9467631695d3bc7e36  pixman-0.43.4.tar.xz

PGP signature:
https://cairographics.org/releases/pixman-0.43.4.tar.gz.sha512.asc

Git:
https://gitlab.freedesktop.org/pixman/pixman.git
tag: pixman-0.43.4

Log:

Gayathri Berli (1):
  Revert the changes to fix the problem in big-endian architectures

Heiko Lewin (1):
  Allow to build pixman on clang/arm32

Makoto Kato (1):
  pixman-arm: Fix build on clang/arm32

Matt Turner (5):
  pixman-x86: Use cpuid.h header
  pixman-x86: Move #include "cpuid.h" inside conditionals
  Revert "Allow to build pixman on clang/arm32"
  pixman-arm: Use unified syntax
  Pre-release version bump to 0.43.4

Simon Ser (1):
  Post-release version bump to 0.43.3




Re: [Pixman] [ANNOUNCE] pixman release 0.42.2 now available

2022-11-03 Thread Matt Turner

On Wed, Nov 2, 2022 at 1:37 PM Matt Turner  wrote:
>
> A new pixman release 0.42.2 is now available. This is a stable release
> in the 0.42 series.
>
> This version contains a fix for a heap overflow. A CVE has been
> requested, and I'll reply to this email with the number when it is
> allocated.

This has been assigned CVE-2022-44638.

[Pixman] [ANNOUNCE] pixman release 0.42.2 now available

2022-11-02 Thread Matt Turner

A new pixman release 0.42.2 is now available. This is a stable release
in the 0.42 series.

This version contains a fix for a heap overflow. A CVE has been
requested, and I'll reply to this email with the number when it is
allocated. 

See 
https://gitlab.freedesktop.org/pixman/pixman/-/commit/a1f88e842e0216a5b4df1ab023caebe33c101395
and https://gitlab.freedesktop.org/pixman/pixman/-/issues/63 for more 
information.

Thanks to Maddie Stone and Google's Project Zero for discovering this
issue, providing a proof-of-concept, and a great analysis.
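
For readers unfamiliar with this class of bug, here is a minimal C sketch of
how a size computation can be guarded so that an integer overflow cannot
produce an undersized buffer. It is a generic illustration only, not the code
from the commit linked above; the helper names mul_overflows_int and
alloc_array_checked are hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if a * b cannot be computed safely as
 * an int. Non-positive sizes are treated as invalid as well. */
static int mul_overflows_int(int a, int b)
{
    if (a <= 0 || b <= 0)
        return 1;
    return a > INT_MAX / b;
}

/* Hypothetical allocator: returns count * elem_size bytes, or NULL when
 * the product is not representable. Without the check, a wrapped
 * product would give a buffer smaller than the caller expects, and
 * later writes would land outside it. */
static void *alloc_array_checked(int count, int elem_size)
{
    if (mul_overflows_int(count, elem_size))
        return NULL;
    return malloc((size_t)count * (size_t)elem_size);
}
```

A caller would test the result for NULL and bail out before touching the
buffer, the same way it would for an ordinary allocation failure.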

tar.gz:
https://cairographics.org/releases/pixman-0.42.2.tar.gz
https://www.x.org/releases/individual/lib/pixman-0.42.2.tar.gz

tar.xz:
https://www.x.org/releases/individual/lib/pixman-0.42.2.tar.xz

Hashes:
SHA256: ea1480efada2fd948bc75366f7c349e1c96d3297d09a3fe62626e38e234a625e  pixman-0.42.2.tar.gz
SHA256: 5747d2ec498ad0f1594878cc897ef5eb6c29e91c53b899f7f71b506785fc1376  pixman-0.42.2.tar.xz
SHA512: 0a4e327aef89c25f8cb474fbd01de834fd2a1b13fdf7db11ab72072082e45881cd16060673b59d02054b1711ae69c6e2395f6ae9214225ee7153939efcd2fa5d  pixman-0.42.2.tar.gz
SHA512: 3476e2676e66756b1af61b1e532cd80c985c191fb7956eb01702b419726cce99e79163b7f287f74f66414680e7396d13c3fee525cd663f12b6ac4877070ff4e8  pixman-0.42.2.tar.xz

GPG signature:
https://cairographics.org/releases/pixman-0.42.2.tar.gz.sha512.asc
(signed by [ultimate] Matt Turner)

Git:
https://gitlab.freedesktop.org/pixman/pixman.git
tag: pixman-0.42.2

Log:
Matt Turner (4):
  build: Add a64-neon-test.S to EXTRA_DIST
  Revert "Fix signed-unsigned semantics in reduce_32"
  Avoid integer overflow leading to out-of-bounds write
  Pre-release version bump to 0.42.2

Simon Ser (3):
  Post-release version bump to 0.42.1
  meson: override pixman-1 dependency
  meson: explicitly set C standard to gnu99

Thomas Klausner (2):
  configure.ac: avoid unportable test(1) operator
  Makefile.am: increase shell portability



Re: [Pixman] Performance regression with pixman 0.40

2021-06-15 Thread Matt Turner

Cc'ing the patch author, since I don't think he's subscribed.

On Fri, Jun 4, 2021 at 12:15 AM  wrote:
>
> Hi,
>
> We are developing a graphics framework called EGT dedicated to Microchip
> parts:
> https://github.com/linux4sam/egt
>
> We are using Cairo, and so Pixman, for the drawing part. After updating our
> distribution, we noticed a performance decrease in our benchmark suite; in
> the worst case our fps dropped from 200 to 60.
>
> We have identified the move from Pixman 0.38.4 to 0.40 as the cause. I did a
> bisect to find which commit impacts us and it's this one:
>
> commit 6fe0131394fb029d2fccaee6b8edcb108840ad8a (refs/bisect/bad)
> Author: Federico Mena Quintero 
> Date:   Wed Mar 18 18:49:30 2020 -0600
>
> Initialize temporary buffers in general_composite_rect()
>
> Otherwise, Valgrind shows things like "conditional jump or move
> depends on uninitialised values" errors much later in calling code.
> For example, see https://gitlab.gnome.org/GNOME/librsvg/issues/572
>
> Fixes https://gitlab.freedesktop.org/pixman/pixman/issues/9
>
> diff --git a/pixman/pixman-general.c b/pixman/pixman-general.c
> index 7d74f98..7e5a0d0 100644
> --- a/pixman/pixman-general.c
> +++ b/pixman/pixman-general.c
> @@ -165,6 +165,12 @@ general_composite_rect  (pixman_implementation_t *imp,
>
> if (!scanline_buffer)
> return;
> +
> +   memset (scanline_buffer, 0, width * Bpp * 3 + 15 * 3);
> +}
> +else
> +{
> +   memset (stack_scanline_buffer, 0, sizeof (stack_scanline_buffer));
>  }
>
>  src_buffer = ALIGN (scanline_buffer);
>
>
> I don't know which drawing paths are impacted by this change; I can dig
> further if needed. We have two benchmarks with a small performance decrease
> on all our devices (armv5 and armv7), and one benchmark with a huge
> performance decrease on our armv5 device. That benchmark draws circles with
> alpha blending. Other benchmarks, which draw squares, squares with alpha
> blending, and circles, are not impacted.
>
> For sure, having an extra memset in the path can explain the performance
> decrease.
>
> Do we have to consider that the new scores we get are the valid ones or can we
> find an alternative?
>
> Thanks
>
> Regards,
> Ludovic
> ___
> Pixman mailing list
> Pixman@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/pixman
___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman
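
To get a rough sense of the extra work added by the commit quoted in the
message above, the following C sketch computes how many bytes each branch of
the diff clears per call. The heap branch clears an amount that grows with the
width, while the else branch clears the whole stack buffer regardless of
width. The function names are hypothetical, and stack_size stands in for
sizeof (stack_scanline_buffer), whose actual value is not shown in the diff.

```c
#include <stddef.h>

/* Bytes cleared on the heap path, following the memset in the quoted
 * diff: three scanlines of width * bpp bytes, plus 15 bytes of
 * alignment slack for each of them. */
static size_t heap_bytes_cleared(size_t width, size_t bpp)
{
    return width * bpp * 3 + 15 * 3;
}

/* Bytes cleared on the stack path: the entire buffer, no matter how
 * narrow the composited rectangle is. */
static size_t stack_bytes_cleared(size_t width, size_t bpp, size_t stack_size)
{
    (void)width;   /* unused: the cost does not depend on the width */
    (void)bpp;     /* unused: nor on the bytes per pixel */
    return stack_size;
}
```

If the stack buffer is large compared with a typical scanline, the fixed cost
on the stack path could dominate for narrow spans. That is only one possible
reading of the report, not a measured result.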

Re: [Pixman] [PATCH] Prevent empty top-level declaration

2020-04-26 Thread Matt Turner

On Sun, Nov 17, 2019 at 4:48 PM Michael Forney  wrote:
>
> The expansion of PIXMAN_DEFINE_THREAD_LOCAL(...) may end in a
> function definition, so the following semicolon is considered an
> empty top-level declaration, which is not allowed in ISO C.
> ---
>  pixman/pixman-compiler.h   | 6 +++---
>  pixman/pixman-implementation.c | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
>

Thanks! Committed, and sorry for losing track of the patch.
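
For reference, the problem described in the patch can be reproduced with a
small C sketch. The macro below is hypothetical (it is not
PIXMAN_DEFINE_THREAD_LOCAL), but like that macro its expansion ends in a
function definition. Leaving the semicolon off at the use site is one way to
avoid the empty declaration; I am not claiming this is exactly how the
committed patch does it.

```c
/* Hypothetical macro whose expansion ends in a function definition. */
#define DEFINE_COUNTER(name)                    \
    static int name##_value;                    \
    static int name##_next(void)                \
    {                                           \
        return ++name##_value;                  \
    }

/* Writing "DEFINE_COUNTER(ticket);" would leave a stray ';' after the
 * closing brace of ticket_next(). That stray ';' is an empty top-level
 * declaration, which ISO C does not allow (GCC and Clang report it
 * with -pedantic). Without the semicolon the expansion is valid. */
DEFINE_COUNTER(ticket)
```

Each call to ticket_next() returns the next integer, starting at 1.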

[Pixman] [ANNOUNCE] pixman release 0.40.0 now available

2020-04-19 Thread Matt Turner

A new pixman release 0.40.0 is now available. This is a stable release.

tar.gz:
https://cairographics.org/releases/pixman-0.40.0.tar.gz
https://www.x.org/releases/individual/lib/pixman-0.40.0.tar.gz

tar.xz:
https://www.x.org/releases/individual/lib/pixman-0.40.0.tar.xz

Hashes:
SHA256: 6d200dec3740d9ec4ec8d1180e25779c00bc749f94278c8b9021f5534db223fc  pixman-0.40.0.tar.gz
SHA256: da8ed9fe2d1c5ef8ce5d1207992db959226bd4e37e3f88acf908fd9a71e2704e  pixman-0.40.0.tar.xz
SHA512: 063776e132f5d59a6d3f94497da41d6fc1c7dca0d269149c78247f0e0d7f520a25208d908cf5e421d1564889a91da44267b12d61c0bd7934cd54261729a7de5f  pixman-0.40.0.tar.gz
SHA512: 8a60edb113d68791b41bd90b761ff7b3934260cb3dada3234c9351416f61394e4157353bc4d61b8f6c2c619de470f6feefffb4935bfcf79d291ece6285de7270  pixman-0.40.0.tar.xz

GPG signature:
https://cairographics.org/releases/pixman-0.40.0.tar.gz.sha512.asc
(signed by [ultimate] Matt Turner)

Git:
https://gitlab.freedesktop.org/pixman/pixman.git
tag: pixman-0.40.0

Log:
Adam Jackson (17):
  test: Fix undefined left shift in affine-test
  test: Fix undefined left shift in pixel_checker_init
  pixman: Fix undefined left shift in pixel_contract_from_float
  pixman-access: Fix various undefined left shifts
  pixman-combine: Fix various undefined left shifts
  pixman-image: Fix undefined left shift
  pixman-gradient-walker: Fix undefined left shift
  pixman-sse2: Fix an undefined left shift
  pixman-fast-path: Fix various undefined left shifts
  pixman-bits-image: Fix various undefined left shifts
  pixman-bits-image: Fix left shift of a negative number
  pixman-matrix: Fix left shift of a negative number
  test: Fix unrepresentable subtraction in stress-test
  pixman-mmx: Fix undefined left-shifts
  pixman-mmx: Fix undefined unaligned loads
  pixman-sse2: Fix undefined unaligned loads
  fast-path: Fix some sketchy pointer arithmetic

Antonio Ospite (1):
  pixman-compiler.h: fix building tests with MinGW

Basile Clement (6):
  Fix bilinear filter computation in wide pipeline
  Implement basic dithering for the wide pipeline, v3
  test: Check the dithering path in tolerance-test
  demos: Add a dithering demo
  Ordered dithering with blue noise, v2
  Don't use GNU extension for binary numbers

Christoph Reiter (3):
  meson: define SIZEOF_LONG  and use -Wundef
  meson: allow building a static library
  meson: fix TLS support under mingw

Chun-wei Fan (11):
  meson.build: Fix MMX, SSE2 and SSSE3 checks on MSVC
  meson.build: Disable OpenMP on MSVC builds
  build: Don't assume PThreads if threading support is found
  meson.build: Improve libpng search on MSVC
  pixman/pixman-version.h.in: Add a PIXMAN_API macro
  pixman/pixman.h: Mark public APIs with PIXMAN_API
  pixman-[compiler|private].h: Export symbols for tests
  pixman/meson.build: Define PIXMAN_API on MSVC-style compilers
  test/solid-test.c: Include stdint.h
  demos: Define _USE_MATH_DEFINES on MSVC-style compilers
  thread-test.c: Use Windows Threading API on Windows

Dylan Baker (1):
  meson: don't use link_with for library()

Fan Jinke (1):
  add Hygon Dhyana support to enable X86_MMX_EXTENSIONS feature

Federico Mena Quintero (1):
  Initialize temporary buffers in general_composite_rect()

Ghabry (1):
  Enabled armv6 SIMD for 3DS (devkitARM) and arm neon SIMD for PS Vita (vit

Jonathan Kew (2):
  Explicitly cast byte to uint32_t before left-shifting.
  Avoid undefined behavior (left-shifting negative value) in pixman_int_to_

Khem Raj (1):
  test/utils: Check for FE_INVALID definition before use

Mathieu Duponchelle (2):
  meson: finish porting over mmx and ssse2 flags for sun and msvc
  meson: add missing function check (getisax)

Matt Turner (7):
  Post-release version bump to 0.38.5
  lowlevel-blt-bench: Remove unused variable
  loongson: Avoid C90 mixing-code-and-decls warning
  Distribute the blue-noise files
  Build xz tarballs instead of bzip2
  Move from MD5/SHA1 to SHA256/SHA512 digests
  Pre-release version bump to 0.40.0

Re: [Pixman] [PATCH 1/2] configure.ac: Use '-mloongson-mmi' for Loongson MMI.

2020-04-06 Thread Matt Turner

On Thu, Mar 26, 2020 at 5:57 AM Shiyou Yin  wrote:
>
> It's recommended to use '-mloongson-mmi' for MMI.
> ---
>  configure.ac | 2 +-
>  meson.build  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index 1ca3974..fd7df47 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -273,7 +273,7 @@ dnl 
> ===
>  dnl Check for Loongson Multimedia Instructions
>
>  if test "x$LS_CFLAGS" = "x" ; then
> -LS_CFLAGS="-march=loongson2f"
> +LS_CFLAGS="-mloongson-mmi"
>  fi
>
>  have_loongson_mmi=no
> diff --git a/meson.build b/meson.build
> index 15d3409..a45c969 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -51,7 +51,7 @@ endforeach
>
>  use_loongson_mmi = get_option('loongson-mmi')
>  have_loongson_mmi = false
> -loongson_mmi_flags = ['-march=loongson2f']
> +loongson_mmi_flags = ['-mloongson-mmi']
>  if not use_loongson_mmi.disabled()
>if host_machine.cpu_family() == 'mips64' and cc.compiles('''
>#ifndef __mips_loongson_vector_rev
> --

Thanks very much. This looks good to me. My only (minor) concern is
that the -mloongson-mmi flag is only available since GCC 9, but likely
any users would need to change -march=loongson2f to -march=loongson3a
anyway, and they can easily change -mloongson-mmi back to -march=...
if needed.

I'll just double check that the test suite passes with this patch on my
Yeeloong and then commit it. (And sorry for my delayed response.)
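
As a quick way to see which setting a given compiler invocation ends up with,
here is a small C sketch built around the __mips_loongson_vector_rev macro
that the quoted meson.build check tests for. That GCC defines this macro
whenever MMI is enabled, whether through -march=loongson2f or through
-mloongson-mmi, is my assumption; the function name is hypothetical.

```c
/* Returns 1 when this translation unit was compiled with Loongson MMI
 * enabled, judged by the __mips_loongson_vector_rev macro (assumed to
 * be predefined by the compiler in that case), and 0 otherwise. */
static int built_with_loongson_mmi(void)
{
#ifdef __mips_loongson_vector_rev
    return 1;
#else
    return 0;
#endif
}
```

On a non-MIPS host this returns 0, so it is only informative when built with
the cross or native MIPS toolchain and the flags under discussion.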

Re: [Pixman] [PATCH v2] build: improve control logic for enabling MMI.

2020-03-08 Thread Matt Turner

Thank you for the patch!

On Fri, Mar 6, 2020 at 3:28 AM Shiyou Yin  wrote:
>
> From: Yin Shiyou 

Should be yinshiyou-hf@loongson*.cn*?

>
> 1. Replace LS_CFLAGS with MMI_CFLAGS to express its intention more accurately.
>LS_CFLAGS is still available, but it is not recommended.

I'm not aware of any reasons why LS_CFLAGS needs to stay for
compatibility. Do we know of any distros that set it to override the
-march=... value?

> 2. Improve the control logic for enabling MMI.
>
> Three essential conditions for enabling MMI:
> 1) the user has not specified --disable-loongson-mmi.
> 2) MMI options have been specified by MMI_CFLAGS, CC, or the compiler's
> default setting.
> 3) the compiler supports these MMI options.
> ---
>  configure.ac | 69 
> 

We should also update meson.build. I expect/hope that the autotools
build system will go away sometime in the future.

I'm not sure I entirely understand the patch. I understand that the
objective is to make it possible to easily build pixman for Loongson3A
and use the pixman-mmx.c optimizations.

I think it's currently possible to build pixman on mips without
specifying -march=loongson* in CFLAGS and it will enable the
pixman-mmx.c paths and choose them at runtime. Is part of the goal to
keep that working? If so, could we just use the -mloongson-mmi flag to
compile pixman-mmx.c?

Or does that flag mean the Loongson3A variants of the instructions?
What happens if you compile with -march=loongson2f -mloongson-mmi?
Does GCC generate instructions compatible with 2F or 3A?

Re: [Pixman] [PATCH v2 2/3] build: use '-mloongson-mmi' for Loongson MMI.

2020-03-08 Thread Matt Turner

On Sat, Feb 22, 2020 at 6:34 AM YunQiang Su  wrote:
>
> On Sat, Feb 22, 2020 at 9:26 PM Shiyou Yin wrote:
> >
> > >-Original Message-
> > >From: Adam Jackson [mailto:a...@redhat.com]
> > >Sent: Friday, February 21, 2020 11:33 PM
> > >To: Yin Shiyou; pixman@lists.freedesktop.org
> > >Subject: Re: [Pixman] [PATCH v2 2/3] build: use '-mloongson-mmi' for 
> > >Loongson MMI.
> > >
> > >On Thu, 2020-02-20 at 22:23 +0800, Yin Shiyou wrote:
> > >> It's suggested to use '-mloongson-mmi' to enable MMI.
> > >> To stay compatible with old processors, '-mloongson-mmi' will be
> > >> set for Loongson-3A only.
> > >
> > >The pattern we've used for other CPUs is to build support for as many
> > >ISA extensions as possible, unless they are explicitly disabled.
> > >Distributions tend to want to set their own minimum ISA levels, and if
> > >they wanted to assert -mloongson-mmi they would already have added it
> > >to CFLAGS globally.
> > >
> > >Do you have any performance data for this change?
> > >
> > >If setting -mloongson-mmi means the compiler can do useful
> > >autovectorization, then that's probably true for other arches too (eg
> > >amd64 vs avx2), and we should support this kind of thing more
> > >generically. But as it stands I don't think this patch is a good idea.
> > >
> > First, let me introduce the history of '-march=loongson2f' and
> > '-mloongson-mmi'.
> > Starting with Loongson2F, MMI is supported by Loongson processors.
>
> Yes. That is why we should be very careful when we code, especially
> when we work on a base part of an OS, such as pixman.
> One historical mistake will cause pain for everyone.
>
> An example is time_t on 32-bit systems.
>
> > Unfortunately, compiler support for the MMI extension was not standardized.
> > GCC used '-march=loongson2f' for Loongson2F at first, but starting with
> > Loongson-3A the opcodes of the MMI instructions changed, and
> > '-march=loongson3a' replaced it.
>
> That is the reason some of Loongson's extensions make upstream unhappy.
> You always need to be very careful when you design a CPU.
> It is like walking on thin ice. If you don't court trouble, you won't get hurt.
>
> > Since last year, the compile options for MMI instructions have been
> > standardized, just like -mmsa for MIPS MSA. (MMI, LSX, and LASX are
> > Loongson SIMD extensions.)
> > -mloongson-mmi   for MMI (-march=loongson3a still works, but -mloongson-mmi
> >                  is recommended for new processors except Loongson2F.)
> > -mloongson-sx    for LSX
> > -mloongson-asx   for LASX
>
> That is good news.
>
> >
> > Second, back to this patch itself.
> > I hit a problem when compiling pixman on my Loongson3A with GCC: MMI can't
> > be enabled.
> > The configure check fails with: "linking mips:loongson_2f module with
> > previous mips:gs464 modules"
> > It can be solved by setting LS_CFLAGS="-mloongson-mmi" when configuring.
> > So I submitted this patch in the hope that LS_CFLAGS no longer needs to be
> > set explicitly.
> > As far as I know, this won't have much impact on performance.
>
> This is not about performance. You made a bad design; that is the burden of
> history.

If you're referring to using -march=loongson2f in configure.ac, then I
should point out that that was my choice, and I don't really know what
other options I had -- or even have today. As far as I know,
-march=loongson* was, until recently, the only way to enable the SIMD
instructions, and worse, if I recall correctly Loongson 2E and 2F are
not entirely binary compatible themselves!

The only stable Loongson system I've ever had is a Yeeloong -- 2F, so
it's what I chose. Like I said in another email, I even tried building
pixman-mmx.c multiple times with different -march=... values, linking
them all into libpixman, and choosing which to execute at runtime, but
binutils does not allow linking object files that are compiled with
different -march=... values on mips for reasons I do not know.

Re: [Pixman] [PATCH v2 2/3] build: use '-mloongson-mmi' for Loongson MMI.

2020-03-08 Thread Matt Turner

On Sat, Feb 22, 2020 at 5:26 AM Shiyou Yin  wrote:
>
> >-Original Message-
> >From: Adam Jackson [mailto:a...@redhat.com]
> >Sent: Friday, February 21, 2020 11:33 PM
> >To: Yin Shiyou; pixman@lists.freedesktop.org
> >Subject: Re: [Pixman] [PATCH v2 2/3] build: use '-mloongson-mmi' for 
> >Loongson MMI.
> >
> >On Thu, 2020-02-20 at 22:23 +0800, Yin Shiyou wrote:
> >> It's suggested to use '-mloongson-mmi' to enable MMI.
> >> To stay compatible with old processors, '-mloongson-mmi' will be
> >> set for Loongson-3A only.
> >
> >The pattern we've used for other CPUs is to build support for as many
> >ISA extensions as possible, unless they are explicitly disabled.
> >Distributions tend to want to set their own minimum ISA levels, and if
> >they wanted to assert -mloongson-mmi they would already have added it
> >to CFLAGS globally.
> >
> >Do you have any performance data for this change?
> >
> >If setting -mloongson-mmi means the compiler can do useful
> >autovectorization, then that's probably true for other arches too (eg
> >amd64 vs avx2), and we should support this kind of thing more
> >generically. But as it stands I don't think this patch is a good idea.
> >
> First, let me introduce the history of '-march=loongson2f' and
> '-mloongson-mmi'.
> Starting with Loongson2F, MMI is supported by Loongson processors.
> Unfortunately, compiler support for the MMI extension was not standardized.
> GCC used '-march=loongson2f' for Loongson2F at first, but starting with
> Loongson-3A the opcodes of the MMI instructions changed, and
> '-march=loongson3a' replaced it.
> Since last year, the compile options for MMI instructions have been
> standardized, just like -mmsa for MIPS MSA. (MMI, LSX, and LASX are
> Loongson SIMD extensions.)
> -mloongson-mmi   for MMI (-march=loongson3a still works, but -mloongson-mmi
>                  is recommended for new processors except Loongson2F.)
> -mloongson-sx    for LSX
> -mloongson-asx   for LASX
>
> Second, back to this patch itself.
> I hit a problem when compiling pixman on my Loongson3A with GCC: MMI can't be
> enabled.
> The configure check fails with: "linking mips:loongson_2f module with
> previous mips:gs464 modules"

Do you know why this is?

Obviously we can and do build MMX, SSE2, SSSE3 paths and choose to
execute them at runtime.

Why does binutils not allow combining object files that are compiled
with mixed -march=... values on mips? I cannot find the branch now,
but I tried once to make pixman build pixman-mmx.c with three
different -march=... values (2e, 2f, 3a) and choose which to execute
at runtime, but binutils would not allow the files to be linked into
the same binary.
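
For anyone following along, the "build several paths, choose one at runtime"
pattern mentioned above looks roughly like the C sketch below. It is a
simplified illustration and not pixman's actual implementation machinery: the
function names, the feature bit, and the "fast" variant (a plain unrolled
loop standing in for a SIMD path) are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*fill_fn)(uint32_t *dst, size_t n, uint32_t value);

/* Portable fallback that works on every CPU. */
static void fill_generic(uint32_t *dst, size_t n, uint32_t value)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = value;
}

/* Stand-in for an optimized variant; a real one would use SIMD
 * instructions and be compiled with the matching compiler flags. */
static void fill_fast(uint32_t *dst, size_t n, uint32_t value)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        dst[i] = value;
        dst[i + 1] = value;
    }
    if (i < n)
        dst[i] = value;
}

/* Hypothetical feature bit; a real build would query the CPU. */
enum { CPU_HAS_FAST = 1 << 0 };

/* Pick the implementation once, based on the detected features. */
static fill_fn choose_fill(unsigned cpu_features)
{
    return (cpu_features & CPU_HAS_FAST) ? fill_fast : fill_generic;
}
```

The catch discussed in this thread is that each variant normally lives in its
own object file built with different flags, and all of those objects have to
link into one library.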

Re: [Pixman] [PATCH] pixman-combine: Fix wrong value of RB_MASK_PLUS_ONE.

2020-02-20 Thread Matt Turner

On Thu, Feb 20, 2020 at 6:35 AM Shiyou Yin  wrote:
> Will this patch be merged?

Yes, pushed. Thanks!

Re: [Pixman] [PATCH] pixman-combine: Fix wrong value of RB_MASK_PLUS_ONE.

2020-02-08 Thread Matt Turner

On Mon, Feb 3, 2020 at 1:56 AM Yin Shiyou  wrote:
>
> ---
>  pixman/pixman-combine32.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/pixman/pixman-combine32.h b/pixman/pixman-combine32.h
> index cdd56a6..59bb247 100644
> --- a/pixman/pixman-combine32.h
> +++ b/pixman/pixman-combine32.h
> @@ -12,7 +12,7 @@
>  #define RB_MASK 0xff00ff
>  #define AG_MASK 0xff00ff00
>  #define RB_ONE_HALF 0x800080
> -#define RB_MASK_PLUS_ONE 0x1100
> +#define RB_MASK_PLUS_ONE 0x1000100

Thanks. The patch looks correct, but obviously nothing in the test
suite is failing. How did you discover this? Does this patch fix
something for you?
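
A quick way to double check the new value is to derive it per channel
instead of hard-coding it. The C sketch below does that, assuming (from
RB_MASK in the quoted header) that the two 8-bit lanes sit in bits 0-7 and
16-23; the function name is hypothetical.

```c
#include <stdint.h>

#define RB_MASK 0xff00ffu   /* one 8-bit lane in bits 16-23, one in bits 0-7 */

/* Add one to each 8-bit lane of RB_MASK separately and recombine.
 * Each lane holds 0xff, so each becomes 0x100 at its own offset. */
static uint32_t rb_mask_plus_one(void)
{
    uint32_t low  = (RB_MASK & 0xffu) + 1u;          /* 0x100 */
    uint32_t high = ((RB_MASK >> 16) & 0xffu) + 1u;  /* 0x100 */
    return (high << 16) | low;                        /* 0x1000100 */
}
```

The result matches the value on the "+" line of the patch.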

Re: [Pixman] Optimize Graphic Routines for s390x in Pixman - Queries

2020-02-08 Thread Matt Turner

On Sat, Jan 25, 2020 at 4:57 AM Naveen Naidu  wrote:
>
> Hello Everyone,
>
> I am Naveen, a senior-year Computer Science undergraduate from India. I am
> planning to apply for the Open Mainframe Project Internship
> (https://github.com/openmainframeproject-internship/resources) program, one
> of whose proposed projects is to optimize graphics routines for s390x in
> pixman.
>
> The description of the project is as follows:
>
>> With the introduction of VirtIO GPU hardware (virtual graphic adapter for 
>> KVM-based virtual machines) for the s390x platform it makes sense to provide 
>> optimized routines in the pixman library also for the s390x architecture.
>
>
> From what I gather from the description, s390x has support for vector
> instructions, i.e. SIMD instructions, and since these instructions speed up
> processing, the project asks us to write an implementation of pixman that
> uses the vector instructions for s390x.
>
> I have also been going through the Implementation for Power VMX SIMD, which 
> was created to use the Vector instructions for Power PC. But I must confess 
> that I am a little lost.
>
> It would be really kind of you all if you could guide me on what I would need
> to learn and do in order to implement the project. I took a course on
> computer graphics as an undergraduate, so I do understand the fundamentals.
> But I would really like to know the right sequence of steps for the project
> so that I can understand it better.
>
> Thank you very much for your time,
> Naveen

Welcome :)

Here are some snippets of an email I sent to someone else interested in
contributing optimizations to pixman:

Background information for the operations pixman implements:
http://ssp.impulsetrain.com/porterduff.html (written by the author of Pixman)
https://en.wikipedia.org/wiki/Alpha_compositing
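
As a concrete starting point, here is a scalar C sketch of the Porter-Duff
OVER operator on premultiplied ARGB32 pixels, the kind of per-pixel operation
that the SIMD fast paths speed up. It is an illustration written for this
email, not pixman's code; the function names are hypothetical, and the
rounding division by 255 uses a common integer trick.

```c
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* OVER for premultiplied ARGB32:
 *   result = src + dst * (1 - src_alpha), applied to each channel. */
static uint32_t over_argb32(uint32_t src, uint32_t dst)
{
    uint8_t inv_alpha = (uint8_t)(255 - (src >> 24));
    uint32_t result = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = mul_un8((uint8_t)((dst >> shift) & 0xff), inv_alpha);
        uint32_t sum = s + d;
        if (sum > 255)
            sum = 255;   /* saturate; only happens if src is not premultiplied */
        result |= sum << shift;
    }
    return result;
}
```

An opaque source replaces the destination, a fully transparent source leaves
it unchanged, and half-transparent black over opaque white gives a mid grey.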

`lowlevel-blt-bench` lives in pixman's test/ directory. It's a small
self-contained benchmark. Run with

   ./test/lowlevel-blt-bench all
   ./test/lowlevel-blt-bench over__

etc. The -b (bilinear) and -n (nearest) options are useful as well.
Firefox traces will show lots of usage of bilinear and nearest scaling
functions.

There's an environment variable named PIXMAN_DISABLE=... which is very
useful for getting side-by-side performance comparisons of MMX vs SSE2
vs AVX2. (For S390, since it doesn't already have some optimizations,
it may not be particularly useful). It works for both
lowlevel-blt-bench and cairo-perf-trace.

Cairo
https://cgit.freedesktop.org/cairo/My
https://cgit.freedesktop.org/cairo-traces/

`cairo-perf-trace` lives in cairo's perf directory. Run with

   CAIRO_TEST_TARGET=image16,image ./perf/cairo-perf-trace ~/path/to/trace

The trace files in cairo-traces are .lzma files which will have to be
decompressed. Decompress with lzma -dk trace.lzma or alternatively run
make in cairo-traces to uncompress them all. Pass the uncompressed
file to cairo-perf-trace. The arguments to CAIRO_TEST_TARGET specify
what backend Cairo should use. 'image' corresponds to 32-bit visuals,
and 'image16' is 16-bit visuals.

Here are a couple of my blog posts about some work I did on pixman.
Maybe you can find something valuable in them.
https://mattst88.com/blog/2012/05/17/Optimizing_pixman_for_Loongson:_Process_and_Results/
https://mattst88.com/blog/2012/07/06/My_time_optimizing_graphics_performance_on_the_OLPC_XO_1,75_laptop/

I would look at the pixman-sse2.c file for examples of what pixman
optimizations look like. That may be a better starting point than the
POWER optimizations. I have a small branch here
(https://cgit.freedesktop.org/~mattst88/pixman/log/?h=avx2) that
demonstrates adding a set of optimizations for a new instruction set.
I expect it would be helpful to look over.

Thanks,
Matt

Re: [Pixman] [PATCH] [dither] Don't use GNU extension for binary numbers

2019-06-10 Thread Matt Turner

Thanks. Pushed.

Re: [Pixman] Dithering patches, v2

2019-05-13 Thread Matt Turner

On Sat, May 11, 2019 at 7:42 AM Bryce Harrington
 wrote:
>
> On Tue, May 07, 2019 at 09:52:39AM -0700, Matt Turner wrote:
> > On Sun, May 5, 2019 at 11:50 AM Bryce Harrington
> >  wrote:
> > >
> > > On Mon, Apr 22, 2019 at 09:26:48AM -0700, Matt Turner wrote:
> > > > On Fri, Apr 19, 2019 at 4:52 PM Bryce Harrington
> > > >  wrote:
> > > > > Inkscape would love to see Basile's dithering patches included.  Our
> > > > > testing shows that they make a huge quality difference for our users;
> > > > > this solves a critical need.
> > > > >
> > > > > Mc and I have done some preliminary investigation into how to plumb 
> > > > > this
> > > > > into Cairo, and would love to hear your review of Basile's approach to
> > > > > the problem.
> > > >
> > > > I don't feel like I'm experienced enough with that side of pixman to
> > > > offer meaningful comments. I've Cc'd Søren in the hopes that he
> > > > remains interested enough in the project to review the patches that
> > > > Basile says implement the approach Søren described.
> > >
> > > I totally understand, I'd feel the same.  But I think this is an
> > > important patch, so how can we move forward with it?
> >
> > If you're happy with the patches, I'd say let's commit them.
>
> Works for me, would you prefer me to commit them, or will you be
> committing them yourself?

I'd prefer you commit them since they're for Inkscape.

Re: [Pixman] Dithering patches, v2

2019-05-07 Thread Matt Turner

On Sun, May 5, 2019 at 11:50 AM Bryce Harrington
 wrote:
>
> On Mon, Apr 22, 2019 at 09:26:48AM -0700, Matt Turner wrote:
> > On Fri, Apr 19, 2019 at 4:52 PM Bryce Harrington
> >  wrote:
> > > Inkscape would love to see Basile's dithering patches included.  Our
> > > testing shows that they make a huge quality difference for our users;
> > > this solves a critical need.
> > >
> > > Mc and I have done some preliminary investigation into how to plumb this
> > > into Cairo, and would love to hear your review of Basile's approach to
> > > the problem.
> >
> > I don't feel like I'm experienced enough with that side of pixman to
> > offer meaningful comments. I've Cc'd Søren in the hopes that he
> > remains interested enough in the project to review the patches that
> > Basile says implement the approach Søren described.
>
> I totally understand, I'd feel the same.  But I think this is an
> important patch, so how can we move forward with it?

If you're happy with the patches, I'd say let's commit them.

Re: [Pixman] Dithering patches, v2

2019-04-22 Thread Matt Turner

On Fri, Apr 19, 2019 at 4:52 PM Bryce Harrington
 wrote:
> Inkscape would love to see Basile's dithering patches included.  Our
> testing shows that they make a huge quality difference for our users;
> this solves a critical need.
>
> Mc and I have done some preliminary investigation into how to plumb this
> into Cairo, and would love to hear your review of Basile's approach to
> the problem.

I don't feel like I'm experienced enough with that side of pixman to
offer meaningful comments. I've Cc'd Søren in the hopes that he
remains interested enough in the project to review the patches that
Basile says implement the approach Søren described.

Re: [Pixman] [PATCH 2/2] AVX2 implementation of OVER, ROVER, ADD, ROUT operators.

2019-04-16 Thread Matt Turner

On Thu, Mar 28, 2019 at 10:41 PM Matt Turner  wrote:
>
> On Wed, Mar 27, 2019 at 1:06 PM Matt Turner  wrote:
> >
> > Thank you. I'll run some benchmarks on my KBL system to confirm and
> > then commit them.
> >
> > I'm planning to do a 0.40 release soon with some Meson fixes and other
> > small things. Seems like these patches will be good to include to make
> > the release have a new feature :)
>
> Or maybe not.
>
> I benchmarked cairo-traces. The only thing that improved measurably
> was poppler. I thought, well, at least we improved that and then
> remembering my patch that also improved it I applied it, only to
> realize that you incorporated my patch into your work without
> mentioning it.
>
> And so your poppler improvements are in fact from my patch, now
> modified and silently combined into this one. That's really bad form.

Review processes undertaken indicate that Raghu wrote this code
independently of me. My apologies for suggesting otherwise.

[Pixman] [ANNOUNCE] pixman release 0.38.4 now available

2019-04-10 Thread Matt Turner


A new pixman release 0.38.4 is now available. This is a stable release in the
0.38 series.

tar.gz:
https://cairographics.org/releases/pixman-0.38.4.tar.gz
https://www.x.org/releases/individual/lib/pixman-0.38.4.tar.gz

tar.bz2:
https://www.x.org/releases/individual/lib/pixman-0.38.4.tar.bz2

Hashes:
MD5:  267a7af290f93f643a1bc74490d9fdd1  pixman-0.38.4.tar.gz
MD5:  16a350a8a40116ddf67632a1d2623711  pixman-0.38.4.tar.bz2
SHA1: 8594e0a31c1802ae0c155d6b502c0953aa862baa  pixman-0.38.4.tar.gz
SHA1: 87e1abc91ac4e5dfcc275f744f1d0ec3277ee7cd  pixman-0.38.4.tar.bz2

GPG signature:
https://cairographics.org/releases/pixman-0.38.4.tar.gz.sha1.asc
(signed by [ultimate] Matt Turner)

Git:
https://gitlab.freedesktop.org/pixman/pixman.git
tag: pixman-0.38.4

Log:
Matt Turner (4):
  Post-release version bump to 0.38.3
  Makefile.am: Update download links
  Makefile.am: Ship Meson assembly test files in the tarball
  Pre-release version bump to 0.38.4


signature.asc
Description: PGP signature

[Pixman] [ANNOUNCE] pixman release 0.38.2 now available

2019-04-07 Thread Matt Turner



A new pixman release 0.38.2 is now available. This is a stable release in the
0.38 series.

This release mostly contains fixes for the Meson build system.

tar.gz:
https://cairographics.org/releases/pixman-0.38.2.tar.gz
https://www.x.org/releases/individual/lib/pixman-0.38.2.tar.gz

tar.bz2:
https://www.x.org/releases/individual/lib/pixman-0.38.2.tar.bz2

Hashes:
MD5:  e216abae705641038ca782c6d6fd4204  pixman-0.38.2.tar.gz
MD5:  dfdbebf2ce6c2ff0891247c55f928d97  pixman-0.38.2.tar.bz2
SHA1: c2abaea13ff9f12f31592859604047d8b1fa082a  pixman-0.38.2.tar.gz
SHA1: ce40833fe4337aa6329ac5694d9ff342338219c1  pixman-0.38.2.tar.bz2

GPG signature:
http://cairographics.org/releases/pixman-0.38.2.tar.gz.sha1.asc
(signed by [ultimate] Matt Turner)

Git:
https://gitlab.freedesktop.org/pixman/pixman.git
tag: pixman-0.38.2

Log:
Dylan Baker (6):
  meson: work around meson issue #5115
  meson: fix typo which breaks loongson checks
  meson: fix copy-n-paste error for arm simd assembly
  meson: Add proper include paths for the loongson check
  meson: simplify and fix mmx library compilation
  meson: store ARM SIMD and NEON tests as text files

Matt Turner (2):
  meson: Correct copy-and-paste mistake
  Pre-release version bump to 0.38.2

Niveditha Rau (1):
  void function should not return a value

Simon Richter (2):
  Windows: Show compiler invocation
  Windows: Support building with SHELL=cmd.exe


signature.asc
Description: PGP signature

Re: [Pixman] [PATCH 2/2] AVX2 implementation of OVER, ROVER, ADD, ROUT operators.

2019-03-28 Thread Matt Turner

On Wed, Mar 27, 2019 at 1:06 PM Matt Turner  wrote:
>
> Thank you. I'll run some benchmarks on my KBL system to confirm and
> then commit them.
>
> I'm planning to do a 0.40 release soon with some Meson fixes and other
> small things. Seems like these patches will be good to include to make
> the release have a new feature :)

Or maybe not.

I benchmarked cairo-traces. The only thing that improved measurably
was poppler. I thought, well, at least we improved that and then
remembering my patch that also improved it I applied it, only to
realize that you incorporated my patch into your work without
mentioning it.

And so your poppler improvements are in fact from my patch, now
modified and silently combined into this one. That's really bad form.

From a technical perspective, I think we're back where we started:
with an AVX2 implementation of over_8888_8888 that does not provide a
meaningful improvement in any cairo-trace and me doubting whether it's
worth pursuing this project any further. To be honest, at this point I
would prefer that you not continue this project.

Re: [Pixman] [PATCH] void function should not return a value

2019-03-27 Thread Matt Turner

Thanks. Merged.

Re: [Pixman] [PATCH 1/2] Windows: Show compiler invocation

2019-03-27 Thread Matt Turner

Thanks. Merged both.

Re: [Pixman] [PATCH 2/2] AVX2 implementation of OVER, ROVER, ADD, ROUT operators.

2019-03-27 Thread Matt Turner

Thank you. I'll run some benchmarks on my KBL system to confirm and
then commit them.

I'm planning to do a 0.40 release soon with some Meson fixes and other
small things. Seems like these patches will be good to include to make
the release have a new feature :)

[Pixman] [PATCH] avx2: Add fast path for over_reverse_n_8888

2019-01-21 Thread Matt Turner

lowlevel-blt-bench, over_reverse_n_8888, 100 iterations:

               Before            After
            Mean  StdDev     Mean  StdDev   Confidence   Change
L1        2372.6    2.50   4387.6    8.00      100.00%   +84.9%
L2        2490.3    5.29   4326.5   20.79      100.00%   +73.7%
M         2418.3   10.43   3718.0   38.55      100.00%   +53.7%
HT        1555.8   13.35   2112.9   23.85      100.00%   +35.8%
VT        1120.1    9.58   1403.7   15.43      100.00%   +25.3%
R          958.5   17.66   1176.9   20.87      100.00%   +22.8%
RT         407.3    6.79    450.1    7.22      100.00%   +10.5%

At most 18 outliers rejected per test per set.

cairo-perf-trace with trimmed traces, 30 iterations:

               Before            After
            Mean  StdDev     Mean  StdDev   Confidence   Change
poppler    0.516   0.003    0.478   0.002     100.000%    +8.1%

Cairo perf reports the running time, but the change is computed for
operations per second instead (inverse of running time).
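
As a worked example of that convention, here is a minimal sketch in C. The
function names are hypothetical and are not part of pixman or cairo-perf.
Throughput is the inverse of running time, so the relative change in
operations per second is before/after - 1. Fed the rounded poppler means
from the table, this gives about +7.9% rather than the reported +8.1%; the
gap presumably comes from the table being computed on unrounded data.

```c
#include <assert.h>

/* Percent change in throughput (operations per second) given two running
 * times. Throughput is 1/time, so the ratio is before/after. */
static double throughput_change_percent(double before_s, double after_s)
{
    assert(before_s > 0.0 && after_s > 0.0);
    return (before_s / after_s - 1.0) * 100.0;
}

/* Percent change in running time itself, for comparison. */
static double running_time_change_percent(double before_s, double after_s)
{
    assert(before_s > 0.0 && after_s > 0.0);
    return (after_s / before_s - 1.0) * 100.0;
}
```

Note that the two measures are not symmetric: halving the running time is a
-50% change in time but a +100% change in throughput.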
---
 pixman/pixman-avx2.c | 94 
 1 file changed, 94 insertions(+)

diff --git a/pixman/pixman-avx2.c b/pixman/pixman-avx2.c
index faef552..6a67515 100644
--- a/pixman/pixman-avx2.c
+++ b/pixman/pixman-avx2.c
@@ -28,6 +28,18 @@ negate_2x256 (__m256i  data_lo,
 *neg_hi = _mm256_xor_si256 (data_hi, MASK_00FF_AVX2);
 }
 
+static force_inline __m256i
+unpack_32_1x256 (uint32_t data)
+{
+    return _mm256_unpacklo_epi8 (_mm256_broadcastd_epi32 (_mm_cvtsi32_si128 (data)), _mm256_setzero_si256 ());
+}
+
+static force_inline __m256i
+expand_pixel_32_1x256 (uint32_t data)
+{
+    return _mm256_shuffle_epi32 (unpack_32_1x256 (data), _MM_SHUFFLE (1, 0, 1, 0));
+}
+
 static force_inline __m256i
 pack_2x256_256 (__m256i lo, __m256i hi)
 {
@@ -100,6 +112,13 @@ save_256_aligned (__m256i* dst,
 _mm256_store_si256 (dst, data);
 }
 
+static force_inline void
+save_256_unaligned (__m256i* dst,
+                    __m256i  data)
+{
+    _mm256_storeu_si256 (dst, data);
+}
+
 static force_inline int
 is_opaque_256 (__m256i x)
 {
@@ -429,12 +448,87 @@ avx2_composite_over_8888_8888 (pixman_implementation_t *imp,
src += src_stride;
 }
 }
+
+static void
+avx2_composite_over_reverse_n_8888 (pixman_implementation_t *imp,
+                                    pixman_composite_info_t *info)
+{
+    PIXMAN_COMPOSITE_ARGS (info);
+    uint32_t src;
+    uint32_t *dst_line, *dst;
+    __m256i ymm_src;
+    __m256i ymm_dst, ymm_dst_lo, ymm_dst_hi;
+    __m256i ymm_dsta_hi, ymm_dsta_lo;
+    int dst_stride;
+    int32_t w;
+
+    src = _pixman_image_get_solid (imp, src_image, dest_image->bits.format);
+
+    if (src == 0)
+        return;
+
+    PIXMAN_IMAGE_GET_LINE (
+        dest_image, dest_x, dest_y, uint32_t, dst_stride, dst_line, 1);
+
+    ymm_src = expand_pixel_32_1x256 (src);
+
+    while (height--)
+    {
+        dst = dst_line;
+
+        dst_line += dst_stride;
+        w = width;
+
+        while (w >= 8)
+        {
+            __m256i tmp_lo, tmp_hi;
+
+            ymm_dst = load_256_unaligned ((__m256i*)dst);
+
+            unpack_256_2x256 (ymm_dst, &ymm_dst_lo, &ymm_dst_hi);
+            expand_alpha_2x256 (ymm_dst_lo, ymm_dst_hi, &ymm_dsta_lo, &ymm_dsta_hi);
+
+            tmp_lo = ymm_src;
+            tmp_hi = ymm_src;
+
+            over_2x256 (&ymm_dst_lo, &ymm_dst_hi,
+                        &ymm_dsta_lo, &ymm_dsta_hi,
+                        &tmp_lo, &tmp_hi);
+
+            save_256_unaligned (
+                (__m256i*)dst, pack_2x256_256 (tmp_lo, tmp_hi));
+
+            w -= 8;
+            dst += 8;
+        }
+
+        while (w)
+        {
+            __m128i vd;
+
+            vd = unpack_32_1x128 (*dst);
+
+            *dst = pack_1x128_32 (over_1x128 (vd, expand_alpha_1x128 (vd),
+                                              _mm256_castsi256_si128 (ymm_src)));
+            w--;
+            dst++;
+        }
+
+    }
+
+}
+
 static const pixman_fast_path_t avx2_fast_paths[] =
 {
     PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, a8r8g8b8, avx2_composite_over_8888_8888),
     PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, x8r8g8b8, avx2_composite_over_8888_8888),
     PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, a8b8g8r8, avx2_composite_over_8888_8888),
     PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, x8b8g8r8, avx2_composite_over_8888_8888),
+
+    /* PIXMAN_OP_OVER_REVERSE */
+    PIXMAN_STD_FAST_PATH (OVER_REVERSE, solid, null, a8r8g8b8, avx2_composite_over_reverse_n_8888),
+    PIXMAN_STD_FAST_PATH (OVER_REVERSE, solid, null, a8b8g8r8, avx2_composite_over_reverse_n_8888),
+
 { PIXMAN_OP_NONE },
 };
 
-- 
2.19.2
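
For readers following the arithmetic, the per-pixel operation that the fast
path above vectorizes can be sketched in scalar C. This is an illustrative
reference only, not code from the patch or from pixman: the names mul_un8,
add_sat_un8, over_reverse_pixel and over_reverse_n_8888_ref are hypothetical.
It assumes premultiplied a8r8g8b8 pixels and mirrors the add-0x80,
multiply-by-0x0101, keep-the-high-16-bits rounding used by pix_multiply_2x256.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded x * a / 255 for 8-bit inputs, using the same sequence as the
 * SIMD helper: add 0x80, multiply by 0x0101, keep the high 16 bits. */
static uint8_t mul_un8(uint8_t x, uint8_t a)
{
    uint32_t t = (uint32_t)x * a + 0x80;
    return (uint8_t)((t * 0x101) >> 16);
}

/* Saturating 8-bit add, the scalar analogue of _mm256_adds_epu8. */
static uint8_t add_sat_un8(uint8_t x, uint8_t y)
{
    uint32_t s = (uint32_t)x + y;
    return (uint8_t)(s > 255 ? 255 : s);
}

/* OVER_REVERSE for one pixel: dest = dest + src * (255 - dest.alpha) / 255,
 * applied to each of the four 8-bit channels. */
static uint32_t over_reverse_pixel(uint32_t src, uint32_t dst)
{
    uint8_t inv_da = (uint8_t)(255 - (dst >> 24));
    uint32_t out = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t s = (uint8_t)(src >> shift);
        uint8_t d = (uint8_t)(dst >> shift);
        out |= (uint32_t)add_sat_un8(d, mul_un8(s, inv_da)) << shift;
    }
    return out;
}

/* Apply a solid source to a row of destination pixels. */
static void over_reverse_n_8888_ref(uint32_t src, uint32_t *dst, int width)
{
    assert(width >= 0);
    for (int i = 0; i < width; i++)
        dst[i] = over_reverse_pixel(src, dst[i]);
}
```

An opaque destination pixel is left unchanged and a fully transparent one
takes the source value, which is a quick sanity check for any vectorized
version.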


Re: [Pixman] [PATCH 3/3] Rev2 of patch: AVX2 versions of OVER and ROVER operators.

2019-01-21 Thread Matt Turner

On Wed, Jan 16, 2019 at 4:57 PM Raghuveer Devulapalli
 wrote:
>
> From: raghuveer devulapalli 
>
> These were found to be upto 1.8 times faster (depending on the array
> size) than the corresponding SSE2 version. The AVX2 and SSE2 were
> benchmarked on a Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz. The AVX2 and
> SSE versions were benchmarked by measuring how many TSC cycles each of
> the avx2_combine_over_u and sse2_combine_over_u functions took to run
> for various array sizes. For the purpose of benchmarking, turbo was
> disabled and intel_pstate governor was set to performance to avoid
> variance in CPU frequencies across multiple runs.
>
> | Array size | #cycles SSE2 | #cycles AVX2 |
> |------------|--------------|--------------|
> | 400        | 53966        | 32800        |
> | 800        | 107595       | 62595        |
> | 1600       | 214810       | 122482       |
> | 3200       | 429748       | 241971       |
> | 6400       | 859070       | 481076       |
>
> Also ran lowlevel-blt-bench for OVER_8888_8888 operation and that
> also shows a 1.55x-1.79x improvement over SSE2. Here are the details:
>
> AVX2: OVER_8888_8888 =  L1:2136.35  L2:2109.46  M:1751.99 ( 60.90%)
> SSE2: OVER_8888_8888 =  L1:1188.91  L2:1190.63  M:1128.32 ( 40.31%)
>
> The AVX2 implementation uses the SSE2 version for manipulating pixels
> that are not 32 byte aligned. The helper functions from pixman-sse2.h
> are re-used for this purpose.

I still cannot measure any performance improvement with cairo-traces.
If we're not improving performance in any real world application, then
I don't think it's worth adding a significant amount of code.

As I told you in person and in private mail, I suspect that you're
more likely to see real performance improvements in operations that
are more compute-heavy, like bilinear filtering. You could probably
use AVX2's gather instructions in the bilinear code as well. Filling
out the avx2_iters array would also be a good place to start, since
those functions execute when we do not have a specific fast-path for
an operation (which will be the case for AVX2).

I sense that you want to check this off your todo list and move on. If
that's the case, we can include the avx2_composite_over_reverse_n_8888
function I wrote (and will send as a separate patch) to confirm that
using AVX2 is capable of giving a performance improvement in some
cairo traces.

> ---
>  pixman/pixman-avx2.c | 431 ++-
>  1 file changed, 430 insertions(+), 1 deletion(-)
>
> diff --git a/pixman/pixman-avx2.c b/pixman/pixman-avx2.c
> index d860d67..faef552 100644
> --- a/pixman/pixman-avx2.c
> +++ b/pixman/pixman-avx2.c
> @@ -6,13 +6,439 @@
>  #include "pixman-private.h"
>  #include "pixman-combine32.h"
>  #include "pixman-inlines.h"
> +#include "pixman-sse2.h"
>
> +#define MASK_0080_AVX2 _mm256_set1_epi16(0x0080)
> +#define MASK_00FF_AVX2 _mm256_set1_epi16(0x00ff)
> +#define MASK_0101_AVX2 _mm256_set1_epi16(0x0101)
> +
> +static force_inline __m256i
> +load_256_aligned (__m256i* src)
> +{
> +return _mm256_load_si256(src);
> +}
> +
> +static force_inline void
> +negate_2x256 (__m256i  data_lo,
> + __m256i  data_hi,
> + __m256i* neg_lo,
> + __m256i* neg_hi)
> +{
> +*neg_lo = _mm256_xor_si256 (data_lo, MASK_00FF_AVX2);
> +*neg_hi = _mm256_xor_si256 (data_hi, MASK_00FF_AVX2);
> +}
> +
> +static force_inline __m256i
> +pack_2x256_256 (__m256i lo, __m256i hi)
> +{
> +return _mm256_packus_epi16 (lo, hi);
> +}
> +

Stray space

> +static force_inline void
> +pix_multiply_2x256 (__m256i* data_lo,
> +   __m256i* data_hi,
> +   __m256i* alpha_lo,
> +   __m256i* alpha_hi,
> +   __m256i* ret_lo,
> +   __m256i* ret_hi)
> +{
> +__m256i lo, hi;
> +
> +lo = _mm256_mullo_epi16 (*data_lo, *alpha_lo);
> +hi = _mm256_mullo_epi16 (*data_hi, *alpha_hi);
> +lo = _mm256_adds_epu16 (lo, MASK_0080_AVX2);
> +hi = _mm256_adds_epu16 (hi, MASK_0080_AVX2);
> +*ret_lo = _mm256_mulhi_epu16 (lo, MASK_0101_AVX2);
> +*ret_hi = _mm256_mulhi_epu16 (hi, MASK_0101_AVX2);
> +}
> +

Stray space

> +static force_inline void
> +over_2x256 (__m256i* src_lo,
> +   __m256i* src_hi,
> +   __m256i* alpha_lo,
> +   __m256i* alpha_hi,
> +   __m256i* dst_lo,
> +   __m256i* dst_hi)
> +{
> +__m256i t1, t2;
> +
> +negate_2x256 (*alpha_lo, *alpha_hi, &t1, &t2);
> +
> +pix_multiply_2x256 (dst_lo, dst_hi, &t1, &t2, dst_lo, dst_hi);
> +
> +*dst_lo = _mm256_adds_epu8 (*src_lo, *dst_lo);
> +*dst_hi = _mm256_adds_epu8 (*src_hi, *dst_hi);
> +}
> +
> +static force_inline void
> +expand_alpha_2x256 (__m256i  data_lo,
> +   __m256i  data_hi,
> +   __m256i* alpha_lo,
> +   __m256i* alpha_hi)
> +{
> +__m256i lo, hi;
> +
> +lo = _mm256_shufflelo_epi16 (data_lo, _MM_SHUFFLE (3, 3, 3, 3));
> +hi =

Re: [Pixman] [PATCH 2/3] Moving helper functions in pixman-sse2.c to pixman-sse2.h.

2019-01-21 Thread Matt Turner

On Wed, Jan 16, 2019 at 4:57 PM Raghuveer Devulapalli
 wrote:
>
> From: raghuveer devulapalli 
>
> These helper function will be reused in pixman-avx2.c implementations in
> the future.
> ---
>  pixman/pixman-sse2.c | 504 +--
>  pixman/pixman-sse2.h | 502 ++
>  2 files changed, 503 insertions(+), 503 deletions(-)
>  create mode 100644 pixman/pixman-sse2.h
>
> diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
> index 8955103..8dea0c2 100644
> --- a/pixman/pixman-sse2.c
> +++ b/pixman/pixman-sse2.c
> @@ -32,509 +32,7 @@
>
>  /* PSHUFD is slow on a lot of old processors, and new processors have SSSE3 */
>  #define PSHUFD_IS_FAST 0
> -
> -#include <xmmintrin.h> /* for _mm_shuffle_pi16 and _MM_SHUFFLE */
> -#include <emmintrin.h> /* for SSE2 intrinsics */
> -#include "pixman-private.h"
> -#include "pixman-combine32.h"
> -#include "pixman-inlines.h"
> -
> -static __m128i mask_0080;
> -static __m128i mask_00ff;
> -static __m128i mask_0101;
> -static __m128i mask_ffff;
> -static __m128i mask_ff00;
> -static __m128i mask_alpha;
> -
> -static __m128i mask_565_r;
> -static __m128i mask_565_g1, mask_565_g2;
> -static __m128i mask_565_b;
> -static __m128i mask_red;
> -static __m128i mask_green;
> -static __m128i mask_blue;
> -
> -static __m128i mask_565_fix_rb;
> -static __m128i mask_565_fix_g;
> -
> -static __m128i mask_565_rb;
> -static __m128i mask_565_pack_multiplier;
> -

These are moving to pixman-sse2.h to be used by the code below, which
is to be used by the AVX2 code. But they're initialized in
_pixman_implementation_create_sse2(), which means if you used
PIXMAN_DISABLE=sse2 the AVX2 paths would fail.

I suspect these constants do need to be prefixed with "sse2_", and in
_pixman_x86_get_implementations() you should disable avx2 if
PIXMAN_DISABLE=sse2.
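
A minimal sketch of that gating, assuming PIXMAN_DISABLE is a
space-separated list of implementation names. The helpers is_disabled and
may_use_avx2 are hypothetical names for illustration, not pixman functions;
the point is only that AVX2 must be refused whenever SSE2 is disabled,
because the AVX2 paths would rely on constants that the SSE2 implementation
initializes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the PIXMAN_DISABLE lookup: returns true when
 * `name` appears as a whole space-separated token in `disable_list`. */
static bool is_disabled(const char *disable_list, const char *name)
{
    size_t n = strlen(name);
    const char *p = disable_list;

    while (p && *p) {
        while (*p == ' ')
            p++;
        const char *end = p;
        while (*end && *end != ' ')
            end++;
        if ((size_t)(end - p) == n && strncmp(p, name, n) == 0)
            return true;
        p = end;
    }
    return false;
}

/* AVX2 is usable only if the CPU supports it and neither avx2 nor sse2
 * has been disabled, since the AVX2 code reuses SSE2 state. */
static bool may_use_avx2(bool cpu_has_avx2, const char *disable_list)
{
    return cpu_has_avx2
        && !is_disabled(disable_list, "avx2")
        && !is_disabled(disable_list, "sse2");
}
```

The alternative is to give the AVX2 code its own copies of the constants so
the two implementations can be disabled independently.
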
> -static force_inline __m128i
> -unpack_32_1x128 (uint32_t data)
> -{
> -return _mm_unpacklo_epi8 (_mm_cvtsi32_si128 (data), _mm_setzero_si128 ());
> -}
> -
> -static force_inline void
> -unpack_128_2x128 (__m128i data, __m128i* data_lo, __m128i* data_hi)
> -{
> -*data_lo = _mm_unpacklo_epi8 (data, _mm_setzero_si128 ());
> -*data_hi = _mm_unpackhi_epi8 (data, _mm_setzero_si128 ());
> -}
> -
> -static force_inline __m128i
> -unpack_565_to_8888 (__m128i lo)
> -{
> -__m128i r, g, b, rb, t;
> -
> -r = _mm_and_si128 (_mm_slli_epi32 (lo, 8), mask_red);
> -g = _mm_and_si128 (_mm_slli_epi32 (lo, 5), mask_green);
> -b = _mm_and_si128 (_mm_slli_epi32 (lo, 3), mask_blue);
> -
> -rb = _mm_or_si128 (r, b);
> -t  = _mm_and_si128 (rb, mask_565_fix_rb);
> -t  = _mm_srli_epi32 (t, 5);
> -rb = _mm_or_si128 (rb, t);
> -
> -t  = _mm_and_si128 (g, mask_565_fix_g);
> -t  = _mm_srli_epi32 (t, 6);
> -g  = _mm_or_si128 (g, t);
> -
> -return _mm_or_si128 (rb, g);
> -}
> -
> -static force_inline void
> -unpack_565_128_4x128 (__m128i  data,
> -  __m128i* data0,
> -  __m128i* data1,
> -  __m128i* data2,
> -  __m128i* data3)
> -{
> -__m128i lo, hi;
> -
> -lo = _mm_unpacklo_epi16 (data, _mm_setzero_si128 ());
> -hi = _mm_unpackhi_epi16 (data, _mm_setzero_si128 ());
> -
> -lo = unpack_565_to_8888 (lo);
> -hi = unpack_565_to_8888 (hi);
> -
> -unpack_128_2x128 (lo, data0, data1);
> -unpack_128_2x128 (hi, data2, data3);
> -}
> -
> -static force_inline uint16_t
> -pack_565_32_16 (uint32_t pixel)
> -{
> -return (uint16_t) (((pixel >> 8) & 0xf800) |
> -  ((pixel >> 5) & 0x07e0) |
> -  ((pixel >> 3) & 0x001f));
> -}
> -
> -static force_inline __m128i
> -pack_2x128_128 (__m128i lo, __m128i hi)
> -{
> -return _mm_packus_epi16 (lo, hi);
> -}
> -
> -static force_inline __m128i
> -pack_565_2packedx128_128 (__m128i lo, __m128i hi)
> -{
> -__m128i rb0 = _mm_and_si128 (lo, mask_565_rb);
> -__m128i rb1 = _mm_and_si128 (hi, mask_565_rb);
> -
> -__m128i t0 = _mm_madd_epi16 (rb0, mask_565_pack_multiplier);
> -__m128i t1 = _mm_madd_epi16 (rb1, mask_565_pack_multiplier);
> -
> -__m128i g0 = _mm_and_si128 (lo, mask_green);
> -__m128i g1 = _mm_and_si128 (hi, mask_green);
> -
> -t0 = _mm_or_si128 (t0, g0);
> -t1 = _mm_or_si128 (t1, g1);
> -
> -/* Simulates _mm_packus_epi32 */
> -t0 = _mm_slli_epi32 (t0, 16 - 5);
> -t1 = _mm_slli_epi32 (t1, 16 - 5);
> -t0 = _mm_srai_epi32 (t0, 16);
> -t1 = _mm_srai_epi32 (t1, 16);
> -return _mm_packs_epi32 (t0, t1);
> -}
> -
> -static force_inline __m128i
> -pack_565_2x128_128 (__m128i lo, __m128i hi)
> -{
> -__m128i data;
> -__m128i r, g1, g2, b;
> -
> -data = pack_2x128_128 (lo, hi);
> -
> -r  = _mm_and_si128 (data, mask_565_r);
> -g1 = _mm_and_si128 (_mm_slli_epi32 (data, 3), mask_565_g1);
> -g2 = _mm_and_si128 (_mm_srli_epi32 (data, 5), mask_565_g2);
> -b  = _mm_and_si128 (_mm_srli_epi32 (data, 3), mask_565_b);
> -

Re: [Pixman] [PATCH] mmx: compile on MIPS for Loongson-3A MMI optimizations

2018-09-19 Thread Matt Turner

On Tue, Sep 18, 2018 at 2:34 AM  wrote:
>
> From: Xianju Diao 
>
> make check:
> With USE_OPENMP enabled, 'glyph-test' and 'cover-test' fail on
> Loongson-3A3000. Neither test passes even without the optimized code, so
> this may be a multi-core synchronization bug in the CPU. I will continue
> to debug this problem. For now, using an OpenMP critical section,
> 'glyph-test' and 'cover-test' pass.
>
> benchmark:
> Running cairo-perf-trace benchmark on Loongson-3A.
>                                 image                  image16
> gvim                       5.425 -> 5.069         5.531 -> 5.236
> popler-reseau              2.149 -> 2.13          2.152 -> 2.139
> swfdec-giant-steps-full   18.672 -> 8.215        33.167 -> 18.28
> swfdec-giant-steps         7.014 -> 2.455        12.48  -> 5.982
> xfce4-terminal-al         13.695 -> 5.241        15.703 -> 5.859
> gonme-system-monitor      12.783 -> 7.058        12.780 -> 7.104
> grads-heat-map             0.482 -> 0.486         0.516 -> 0.514
> firefox-talos-svg        141.138 -> 134.621     152.495 -> 159.069
> firefox-talos-gfx         23.119 -> 14.437       24.870 -> 15.161
> firefox-world-map         32.018 -> 27.139       33.817 -> 28.085
> firefox-periodic-table    12.305 -> 12.443       12.876 -> 12.913
> evolution                  7.071 -> 3.564         8.550 -> 3.784
> firefox-planet-gnome      77.926 -> 67.526       81.554 -> 65.840
> ocitysmap                  4.934 -> 1.702         4.937 -> 1.701
> ---

Thanks for the patch. I will review it when I have time (I'm preparing
for a trip at the moment).

I have a Loongson3 system that I have found to be unstable. I assume
it is due to the hardware bugs that must be worked around in gcc and
binutils. I have patched both of them with the patches I found in
https://github.com/loongson-community/binutils-gdb etc, but I still
have instability. I would appreciate it very much if you could offer
some suggestions or help in improving the stability of my system.

Looks like there are a couple of different things happening in this
patch. We should try to split them up. One patch could be making the
assembly memcpy implementation usable on mips64. A separate patch
would add new functions to pixman-mmx.c.

A few quick comments inline.

>  configure.ac|7 +-
>  pixman/Makefile.am  |4 +-
>  pixman/loongson-mmintrin.h  |   46 ++
>  pixman/pixman-combine32.h   |6 +
>  pixman/pixman-mips-dspr2-asm.h  |2 +-
>  pixman/pixman-mips-memcpy-asm.S |  324 +---
>  pixman/pixman-mmx.c | 1088 ++-
>  pixman/pixman-private.h |   32 +-
>  pixman/pixman-solid-fill.c  |   49 +-
>  pixman/pixman-utils.c   |   65 ++-
>  test/Makefile.am|2 +-
>  test/utils.c|8 +

This diff stat doesn't correspond to this patch.

>  12 files changed, 1418 insertions(+), 215 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index e833e45..3e3dde5 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -154,9 +154,9 @@ AC_CHECK_DECL([__amd64], [AMD64_ABI="yes"], [AMD64_ABI="no"])
>  # has set CFLAGS.
>  if test $SUNCC = yes &&\
> test "x$test_CFLAGS" = "x" &&   \
> -   test "$CFLAGS" = "-g"
> +   test "$CFLAGS" = "-g -mabi=n64"
>  then
> -  CFLAGS="-O -g"
> +  CFLAGS="-O -g -mabi=n64"

This isn't acceptable.

>  fi
>
>  #
> @@ -183,6 +183,7 @@ AC_SUBST(LT_VERSION_INFO)
>  # Check for dependencies
>
>  PIXMAN_CHECK_CFLAG([-Wall])
> +PIXMAN_CHECK_CFLAG([-mabi=n64])
>  PIXMAN_CHECK_CFLAG([-Wdeclaration-after-statement])
>  PIXMAN_CHECK_CFLAG([-Wno-unused-local-typedefs])
>  PIXMAN_CHECK_CFLAG([-fno-strict-aliasing])
> @@ -273,7 +274,7 @@ dnl ===
>  dnl Check for Loongson Multimedia Instructions
>
>  if test "x$LS_CFLAGS" = "x" ; then
> -LS_CFLAGS="-march=loongson2f"
> +LS_CFLAGS="-march=loongson3a"

Also not acceptable. I see that recent gcc and binutils have gotten
new options for enabling MMI separately from -march=loongson*. Maybe
we could use those if available.

I'm not sure there is currently a good solution. Let me think about it.

>  fi
>
>  have_loongson_mmi=no
> diff --git a/pixman/Makefile.am b/pixman/Makefile.am
> index 581b6f6..e3a080c 100644
> --- a/pixman/Makefile.am
> +++ b/pixman/Makefile.am
> @@ -122,7 +122,7 @@ libpixman_mips_dspr2_la_SOURCES = \
>  pixman-mips-dspr2.h \
>  pixman-mips-dspr2-asm.S \
>  pixman-mips-dspr2-asm.h \
> -pixman-mips-memcpy-asm.S
> +#pixman-mips-memcpy-asm.S

Can't do this.

>  libpixman_1_la_LIBADD += libpixman-mips-dspr2.la
>
>

Re: [Pixman] [PATCH] Adding AVX2 implementation of the OVER and REVERSE-OVER operator

2018-08-30 Thread Matt Turner

On Wed, Aug 29, 2018 at 12:09 PM Matt Turner  wrote:
> Trailing whitespace. There's a lot throughout this patch. I'm not
> going to point them out individually.

I just looked up how to configure git to alert you to bad whitespace:

git config core.whitespace indent-with-non-tab,space-before-tab,trailing-space

Give that a try.

Re: [Pixman] [PATCH] Adding AVX2 implementation of the OVER and REVERSE-OVER operator

2018-08-30 Thread Matt Turner

On Wed, Aug 29, 2018 at 12:09 PM Matt Turner  wrote:
>
> On Wed, Aug 22, 2018 at 10:03 AM raghuveer devulapalli
>  wrote:
> >
> > The AVX2 implementation of OVER and REVERSE OVER operator was
> > found to be upto 2.2 times faster (depending on the array size) than
> > the corresponding SSE2 version. The AVX2 and SSE2 were benchmarked
> > on a Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz
> >
> > Moving the helper functions in pixman-sse2.c to pixman-sse2.h. The AVX2
> > implementation uses the SSE2 version for manipulating pixels that are not
> > 32 byte aligned and hence, it made sense to separate the SSE2 helper
> > functions into a separate file to be included in the AVX2 file rather
> > than duplicate code.
>
> Let's please move the helpers into pixman-sse2.h in a separate commit
> from the one that adds AVX2 code paths.
>
> We typically have more substantial benchmarks in the commit message.

I ran all of the cairo traces in the benchmarks directory and couldn't
measure any difference. You'll have to describe your benchmarking.

Re: [Pixman] [PATCH] Adding AVX2 implementation of the OVER and REVERSE-OVER operator

2018-08-29 Thread Matt Turner

On Wed, Aug 22, 2018 at 10:03 AM raghuveer devulapalli
 wrote:
>
> The AVX2 implementation of OVER and REVERSE OVER operator was
> found to be upto 2.2 times faster (depending on the array size) than
> the corresponding SSE2 version. The AVX2 and SSE2 were benchmarked
> on a Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz
>
> Moving the helper functions in pixman-sse2.c to pixman-sse2.h. The AVX2
> implementation uses the SSE2 version for manipulating pixels that are not
> 32 byte aligned and hence, it made sense to separate the SSE2 helper
> functions into a separate file to be included in the AVX2 file rather
> than duplicate code.

Let's please move the helpers into pixman-sse2.h in a separate commit
from the one that adds AVX2 code paths.

We typically have more substantial benchmarks in the commit message.

Let me run some cairo traces and see what I come up with.

Also, what about the problems of AVX2 turbo?

https://mobile.twitter.com/rygorous/status/992170573819138048
https://gist.github.com/rygorous/32bc3ea8301dba09358fd2c64e02d774

It doesn't seem like we are doing anything related to it in these patches.

> ---
>  pixman/pixman-avx2.c | 401 
>  pixman/pixman-sse2.c | 504 +--
>  pixman/pixman-sse2.h | 502 ++
>  3 files changed, 904 insertions(+), 503 deletions(-)
>  create mode 100644 pixman/pixman-sse2.h
>
> diff --git a/pixman/pixman-avx2.c b/pixman/pixman-avx2.c
> index d860d67..60b1b2b 100644
> --- a/pixman/pixman-avx2.c
> +++ b/pixman/pixman-avx2.c
> @@ -6,6 +6,404 @@
>  #include "pixman-private.h"
>  #include "pixman-combine32.h"
>  #include "pixman-inlines.h"
> +#include "pixman-sse2.h"
> +
> +#define MASK_0080_AVX2 _mm256_set1_epi16(0x0080)
> +#define MASK_00FF_AVX2 _mm256_set1_epi16(0x00ff)
> +#define MASK_0101_AVX2 _mm256_set1_epi16(0x0101)
> +
> +static force_inline __m256i

Trailing whitespace. There's a lot throughout this patch. I'm not
going to point them out individually.

> +load_256_aligned (__m256i* src)
> +{
> +return _mm256_load_si256(src);
> +}
> +
> +static force_inline void
> +negate_2x256 (__m256i  data_lo,
> + __m256i  data_hi,
> + __m256i* neg_lo,
> + __m256i* neg_hi)
> +{
> +*neg_lo = _mm256_xor_si256 (data_lo, MASK_00FF_AVX2);
> +*neg_hi = _mm256_xor_si256 (data_hi, MASK_00FF_AVX2);
> +}
> +
> +static force_inline __m256i
> +pack_2x256_256 (__m256i lo, __m256i hi)
> +{
> +return _mm256_packus_epi16 (lo, hi);
> +}
> +
> +static force_inline void
> +pix_multiply_2x256 (__m256i* data_lo,
> +   __m256i* data_hi,
> +   __m256i* alpha_lo,
> +   __m256i* alpha_hi,
> +   __m256i* ret_lo,
> +   __m256i* ret_hi)
> +{
> +__m256i lo, hi;
> +
> +lo = _mm256_mullo_epi16 (*data_lo, *alpha_lo);
> +hi = _mm256_mullo_epi16 (*data_hi, *alpha_hi);
> +lo = _mm256_adds_epu16 (lo, MASK_0080_AVX2);
> +hi = _mm256_adds_epu16 (hi, MASK_0080_AVX2);
> +*ret_lo = _mm256_mulhi_epu16 (lo, MASK_0101_AVX2);
> +*ret_hi = _mm256_mulhi_epu16 (hi, MASK_0101_AVX2);
> +}
> +
> +static force_inline void
> +over_2x256 (__m256i* src_lo,
> +   __m256i* src_hi,
> +   __m256i* alpha_lo,
> +   __m256i* alpha_hi,
> +   __m256i* dst_lo,
> +   __m256i* dst_hi)
> +{
> +__m256i t1, t2;
> +
> +negate_2x256 (*alpha_lo, *alpha_hi, &t1, &t2);
> +
> +pix_multiply_2x256 (dst_lo, dst_hi, &t1, &t2, dst_lo, dst_hi);
> +
> +*dst_lo = _mm256_adds_epu8 (*src_lo, *dst_lo);
> +*dst_hi = _mm256_adds_epu8 (*src_hi, *dst_hi);
> +}
> +
> +static force_inline void
> +expand_alpha_2x256 (__m256i  data_lo,
> +   __m256i  data_hi,
> +   __m256i* alpha_lo,
> +   __m256i* alpha_hi)
> +{
> +__m256i lo, hi;
> +
> +lo = _mm256_shufflelo_epi16 (data_lo, _MM_SHUFFLE (3, 3, 3, 3));
> +hi = _mm256_shufflelo_epi16 (data_hi, _MM_SHUFFLE (3, 3, 3, 3));
> +
> +*alpha_lo = _mm256_shufflehi_epi16 (lo, _MM_SHUFFLE (3, 3, 3, 3));
> +*alpha_hi = _mm256_shufflehi_epi16 (hi, _MM_SHUFFLE (3, 3, 3, 3));
> +}
> +
> +static force_inline  void
> +unpack_256_2x256 (__m256i data, __m256i* data_lo, __m256i* data_hi)
> +{
> +*data_lo = _mm256_unpacklo_epi8 (data, _mm256_setzero_si256 ());
> +*data_hi = _mm256_unpackhi_epi8 (data, _mm256_setzero_si256 ());
> +}
> +
> +/* save 4 pixels on a 16-byte boundary aligned address */
> +static force_inline void
> +save_256_aligned (__m256i* dst,
> + __m256i  data)
> +{
> +_mm256_store_si256 (dst, data);
> +}
> +
> +static force_inline int
> +is_opaque_256 (__m256i x)
> +{
> +__m256i ffs = _mm256_cmpeq_epi8 (x, x);
> +
> +return (_mm256_movemask_epi8
> +   (_mm256_cmpeq_epi8 (x, ffs)) & 0x88888888) == 0x88888888;
> +}
> +
> +static force_inline int
> +is_zero_256 (__m256i x)
> +{
> +

Re: [Pixman] [PATCH] Adding infrastructure to permit future AVX2 implementations

2018-08-29 Thread Matt Turner

Thank you for the patches! Some comments inline.

On Wed, Aug 22, 2018 at 10:03 AM raghuveer devulapalli
 wrote:
>
> ---
>  configure.ac| 44 
>  pixman/Makefile.am  | 12 
>  pixman/pixman-avx2.c| 32 
>  pixman/pixman-private.h |  5 +
>  pixman/pixman-x86.c | 15 +--
>  5 files changed, 106 insertions(+), 2 deletions(-)
>  create mode 100644 pixman/pixman-avx2.c
>
> diff --git a/configure.ac b/configure.ac
> index e833e45..27f4305 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -503,6 +503,48 @@ fi
>  AM_CONDITIONAL(USE_SSSE3, test $have_ssse3_intrinsics = yes)
>
>  dnl ===
> +dnl Check for AVX2

Trailing whitespace

> +
> +if test "x$AVX2_CFLAGS" = "x" ; then
> +AVX2_CFLAGS="-mavx2 -Winline"
> +fi
> +
> +have_avx2_intrinsics=no
> +AC_MSG_CHECKING(whether to use AVX2 intrinsics)
> +xserver_save_CFLAGS=$CFLAGS
> +CFLAGS="$AVX2_CFLAGS $CFLAGS"
> +
> +AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
> +#include <immintrin.h>
> +int param;
> +int main () {
> +__m256i a = _mm256_set1_epi32 (param), b = _mm256_set1_epi32 (param + 1), c;
> +c = _mm256_maddubs_epi16 (a, b);
> +return _mm256_cvtsi256_si32(c);
> +}]])], have_avx2_intrinsics=yes)
> +CFLAGS=$xserver_save_CFLAGS
> +
> +AC_ARG_ENABLE(avx2,
> +   [AC_HELP_STRING([--disable-avx2],
> +   [disable AVX2 fast paths])],
> +   [enable_avx2=$enableval], [enable_avx2=auto])
> +
> +if test $enable_avx2 = no ; then
> +   have_avx2_intrinsics=disabled
> +fi
> +
> +if test $have_avx2_intrinsics = yes ; then
> +   AC_DEFINE(USE_AVX2, 1, [use AVX2 compiler intrinsics])
> +fi
> +
> +AC_MSG_RESULT($have_avx2_intrinsics)
> +if test $enable_avx2 = yes && test $have_avx2_intrinsics = no ; then
> +   AC_MSG_ERROR([AVX2 intrinsics not detected])
> +fi
> +
> +AM_CONDITIONAL(USE_AVX2, test $have_avx2_intrinsics = yes)
> +
> +dnl ===
>  dnl Other special flags needed when building code using MMX or SSE instructions
>  case $host_os in
> solaris*)
> @@ -538,6 +580,8 @@ AC_SUBST(MMX_LDFLAGS)
>  AC_SUBST(SSE2_CFLAGS)
>  AC_SUBST(SSE2_LDFLAGS)
>  AC_SUBST(SSSE3_CFLAGS)
> +AC_SUBST(AVX2_CFLAGS)
> +AC_SUBST(AVX2_LDFLAGS)
>
>  dnl ===
>  dnl Check for VMX/Altivec
> diff --git a/pixman/Makefile.am b/pixman/Makefile.am
> index 581b6f6..7204621 100644
> --- a/pixman/Makefile.am
> +++ b/pixman/Makefile.am
> @@ -64,6 +64,18 @@ libpixman_1_la_LIBADD += libpixman-ssse3.la
>  ASM_CFLAGS_ssse3=$(SSSE3_CFLAGS)
>  endif
>
> +# avx2 code
> +if USE_AVX2
> +noinst_LTLIBRARIES += libpixman-avx2.la
> +libpixman_avx2_la_SOURCES = \
> +   pixman-avx2.c
> +libpixman_avx2_la_CFLAGS = $(AVX2_CFLAGS)
> +libpixman_1_la_LDFLAGS += $(AVX2_LDFLAGS)
> +libpixman_1_la_LIBADD += libpixman-avx2.la
> +
> +ASM_CFLAGS_avx2=$(AVX2_CFLAGS)
> +endif
> +
>  # arm simd code
>  if USE_ARM_SIMD
>  noinst_LTLIBRARIES += libpixman-arm-simd.la
> diff --git a/pixman/pixman-avx2.c b/pixman/pixman-avx2.c
> new file mode 100644
> index 000..d860d67
> --- /dev/null
> +++ b/pixman/pixman-avx2.c
> @@ -0,0 +1,32 @@
> +#ifdef HAVE_CONFIG_H
> +#include <config.h>
> +#endif
> +
> +#include <immintrin.h> /* for AVX2 intrinsics */
> +#include "pixman-private.h"
> +#include "pixman-combine32.h"
> +#include "pixman-inlines.h"
> +
> +static const pixman_fast_path_t avx2_fast_paths[] =
> +{
> +{ PIXMAN_OP_NONE },
> +};
> +
> +static const pixman_iter_info_t avx2_iters[] =

Trailing whitespace

> +{
> +{ PIXMAN_null },
> +};
> +
> +#if defined(__GNUC__) && !defined(__x86_64__) && !defined(__amd64__)
> +__attribute__((__force_align_arg_pointer__))
> +#endif
> +pixman_implementation_t *
> +_pixman_implementation_create_avx2 (pixman_implementation_t *fallback)
> +{
> +pixman_implementation_t *imp = _pixman_implementation_create (fallback, avx2_fast_paths);
> +
> +/* Set up function pointers */
> +imp->iter_info = avx2_iters;
> +
> +return imp;
> +}
> diff --git a/pixman/pixman-private.h b/pixman/pixman-private.h
> index 73a5414..b6b15df 100644
> --- a/pixman/pixman-private.h
> +++ b/pixman/pixman-private.h
> @@ -597,6 +597,11 @@ pixman_implementation_t *
>  _pixman_implementation_create_ssse3 (pixman_implementation_t *fallback);
>  #endif
>
> +#ifdef USE_AVX2
> +pixman_implementation_t *
> +_pixman_implementation_create_avx2 (pixman_implementation_t *fallback);
> +#endif
> +
>  #ifdef USE_ARM_SIMD
>  pixman_implementation_t *
>  _pixman_implementation_create_arm_simd (pixman_implementation_t *fallback);
> diff --git a/pixman/pixman-x86.c b/pixman/pixman-x86.c
> index 05297c4..687c83b 100644
> --- a/pixman/pixman-x86.c
> +++ b/pixman/pixman-x86.c

At the top of this file there is a preprocessor check:

#if defined(USE_X86_MMX) || defined (USE_SSE2) || defined

Re: [Pixman] [Patch 1/1] Clang compile failure due to use of __builtin_shuffle

2018-08-14 Thread Matt Turner

On Tue, Aug 7, 2018 at 2:50 AM StormByte  wrote:
>
> While playing with Clang and compiling a Gentoo system with it, I realized 
> that pixman is not compiling because of the use of __builtin_shuffle which 
> according to LLVM mailing list, should not be used directly [1].
>
> As such, I investigated a bit, and made a patch for making it compile 
> compatible with Clang that I attach here in the hope that it is reviewed.
> Thanks,
> David C. Manuelda
> [1]: http://lists.llvm.org/pipermail/cfe-dev/2017-August/055142.html

Thanks. This has already been reported as
https://bugs.gentoo.org/646360 and I committed a patch two months ago
to fix it -- see
https://gitlab.freedesktop.org/pixman/pixman/commit/bd2b49185b28c5024597a5e530af9fc25de3193a

The next version of pixman will include the patch.
___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman

Re: [Pixman] Pushing unreviewed patches to the pixman git repository

2018-06-06 Thread Matt Turner

On Tue, Jun 5, 2018 at 6:06 PM, Siarhei Siamashka
 wrote:
> Hello,
>
> I noticed that some people with commit access started pushing patches
> to the pixman git repository without giving the pixman mailing list
> subscribers any reasonable chance to review them:
>
> https://cgit.freedesktop.org/pixman/commit/?id=8b95e0e460baa499e54c19d29bf761d34c25badc
> https://cgit.freedesktop.org/pixman/commit/?id=bd2b49185b28c5024597a5e530af9fc25de3193a
>
> Yes, these fixes were trivial. But still it would be more polite to
> actually post patches to the mailing list, collect some reviews and
> then *wait* at least several days before pushing them to the repository
> (unless the issue is really urgent). Not everyone constantly monitors
> the mailing list and is able to provide an instant response.

I hope you don't consider those two patches to be similar cases.

One was committed without going to the mailing list by someone with
one patch in pixman every 5 years.

The other was sent to the mailing list by a person with plenty of
pixman contributions and reviewed by two people. In Mesa we wait 24
hours, for the reasons you describe. Looks like it was close to 24
hours in this case.

I'm happy to wait more than 24 hours in the future -- that's no
problem. I'm just taking issue with the suggestion that the two cited
examples are somehow the same.

Re: [Pixman] [PATCH] test: Adjust for clang's removal of __builtin_shuffle

2018-06-04 Thread Matt Turner

On Mon, Jun 4, 2018 at 10:37 AM, Adam Jackson  wrote:
> On Mon, 2018-06-04 at 10:04 -0700, Matt Turner wrote:
>
>> #ifdef HAVE_GCC_VECTOR_EXTENSIONS
>> -const uint8x16 bswap_shufflemask =
>> +# if __has_builtin(__builtin_shufflevector)
>> +randdata.vb =
>> +__builtin_shufflevector (randdata.vb, randdata.vb,
>> +  3,  2,  1,  0,  7,  6 , 5,  4,
>> + 11, 10,  9,  8, 15, 14, 13, 12);
>> +# else
>> +static const uint8x16 bswap_shufflemask =
>   ^^^
>
> Seems superfluous, though I guess it doesn't change semantics. With or
> without that bit:

Oh, I think I added that when I was trying to consolidate the
constants between the two paths. I'll remove that.

> Reviewed-by: Adam Jackson 
>
> I think we're starting to be well overdue for an 0.36 release, but I'd
> like to take the opportunity to suggest moving to fdo's gitlab as we do
> that. I already have a copy imported personally and have CI working:
>
> https://gitlab.freedesktop.org/ajax/pixman/-/jobs/986

Agreed.

I would like to make 0.36 pass the test suite with clang, so if you
have any time or interest I'd appreciate a second set of eyes. I've
filed https://bugs.freedesktop.org/show_bug.cgi?id=106818 so we can
track it.

I guess it's possible it's a clang bug.

I also need to take some time to look into the Loongson3 patch. If
you're not in a particular hurry, it would be nice to get that in.

[Pixman] [PATCH] test: Adjust for clang's removal of __builtin_shuffle

2018-06-04 Thread Matt Turner

From: Vladimir Smirnov 

__builtin_shuffle was removed in clang 5.0.

Build log says:
test/utils-prng.c:207:27: error: use of unknown builtin '__builtin_shuffle' [-Wimplicit-function-declaration]
randdata.vb = __builtin_shuffle (randdata.vb, bswap_shufflemask);
  ^
test/utils-prng.c:207:25: error: assigning to 'uint8x16' (vector of 16 'uint8_t' values) from incompatible type 'int'
randdata.vb = __builtin_shuffle (randdata.vb, bswap_shufflemask);
^ ~~
2 errors generated

Link to original discussion:
http://lists.llvm.org/pipermail/cfe-dev/2017-August/055140.html

It's possible to build pixman if the attached patch is applied. Basically,
the patch adds a check for __builtin_shuffle support and, in case there is
none, falls back to the clang-specific __builtin_shufflevector, which does
the same but has a different API.

Bugzilla: https://bugs.gentoo.org/646360
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104886
Tested-by: Philip Chimento 
Reviewed-by: Matt Turner 
---
I turned https://bugs.freedesktop.org/show_bug.cgi?id=104886#c2 into a
Tested-by tag for Philip.

I also reversed the order of the preprocessor conditions in order to
simplify it a bit (the !defined(__clang__) looked like a problem waiting
to happen).
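
For anyone who wants to try the two builtins outside of pixman, here is a
minimal standalone sketch of the same idea. The function name bswap32_lanes
and the memcpy-based load/store are made up for this illustration and are not
part of the patch; it assumes a compiler with GCC-style vector extensions.

```c
#include <stdint.h>
#include <string.h>

/* Older compilers have no __has_builtin; treat every query as "no". */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* 16 bytes handled as one vector (GCC/clang vector extension). */
typedef uint8_t uint8x16 __attribute__ ((vector_size (16)));

/* Hypothetical helper: reverse the byte order inside each 32-bit lane. */
static void
bswap32_lanes (uint8_t bytes[16])
{
    uint8x16 v;

    memcpy (&v, bytes, sizeof v);
#if __has_builtin(__builtin_shufflevector)
    /* Indices are passed as compile-time constant arguments. */
    v = __builtin_shufflevector (v, v,
                                 3,  2,  1,  0,  7,  6,  5,  4,
                                 11, 10,  9,  8, 15, 14, 13, 12);
#else
    /* Indices are passed as a mask vector. */
    {
        const uint8x16 mask =
        {
            3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
        };
        v = __builtin_shuffle (v, mask);
    }
#endif
    memcpy (bytes, &v, sizeof v);
}
```

Checking for __builtin_shufflevector first means no compiler name has to be
tested at all; whichever builtin is present gets used.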

Unfortunately combiner-test, gradient-crash-test, and stress-test fail
when built with clang for unrelated reasons.

 test/utils-prng.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/test/utils-prng.c b/test/utils-prng.c
index c27b5be..0cf53dd 100644
--- a/test/utils-prng.c
+++ b/test/utils-prng.c
@@ -199,12 +199,25 @@ randmemset_internal (prng_t  *prng,
 }
 else
 {
+
+#ifndef __has_builtin
+#define __has_builtin(x) 0
+#endif
+
 #ifdef HAVE_GCC_VECTOR_EXTENSIONS
-const uint8x16 bswap_shufflemask =
+# if __has_builtin(__builtin_shufflevector)
+randdata.vb =
+__builtin_shufflevector (randdata.vb, randdata.vb,
+  3,  2,  1,  0,  7,  6 , 5,  4,
+ 11, 10,  9,  8, 15, 14, 13, 12);
+# else
+static const uint8x16 bswap_shufflemask =
 {
 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
 };
 randdata.vb = __builtin_shuffle (randdata.vb, bswap_shufflemask);
+# endif
+
 store_rand_128_data (buf, &randdata, aligned);
 buf += 16;
 #else
-- 
2.16.1


Re: [Pixman] [PATCH] vmx: Fix vector loads on ppc64le

2018-05-10 Thread Matt Turner

Tested-by: Matt Turner <matts...@gmail.com>

Re: [Pixman] Pixman not building on MacOS X 10.11

2015-11-18 Thread Matt Turner

On Sun, Oct 11, 2015 at 10:34 AM, Andrea Canciani  wrote:
> On Sun, Oct 11, 2015 at 5:30 AM, Siarhei Siamashka
>  wrote:
>>
>> On Sun, 11 Oct 2015 04:53:08 +0300
>> Siarhei Siamashka  wrote:
>>
>> > On Sat, 10 Oct 2015 16:03:53 -0700
>> > Jeremy Huddleston Sequoia  wrote:
>> >
>> > > > On Oct 10, 2015, at 13:48, Andrea Canciani 
>> > > > wrote:
>> > > > The attached hack gets the code to compile on modern clang, but I
>> > > > believe first of all we should improve the configure.ac detection
>> > > > code
>> > > > so that pixman can actually build both on old and on new clang
>> > > > versions (possibly with mmx disabled, if the asm constraints we need
>> > > > are not implemented).
>> >
>> > This workaround looks reasonable to me. We should probably just drop
>> > the whole "ifdef __OPTIMIZE__" part in
>> >
>> > http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n92
>> >
>> > I don't quite like the fact that this way of returning results from
>> > a macro is a GNU C specific extension. But as you said, the configure
>> > test can be updated to better match the code and also check if the
>> > compiler supports this particular construct.
>> >
>> > Could you please submit the final variant of your patch in a
>> > "git format-patch" format with a commit message and your
>> > Signed-off-by tag?
>>
>> After looking at this issue a bit more, I realized that we are
>> about to add a second layer of workarounds on top of the existing
>> old workarounds :-)
>
>
> The attached patch should fix the issue with only minor changes.
> It keeps the workarounds :( but somewhat it simplifies them :)
> I followed your suggestion of checking block expressions.
> Given that the _mm_shuffle_pi16() function is always used in a "return"
> statement, if needed we could avoid the usage of block expressions by
> defining a macro "_return_mm_shuffle_pi16()" (which would return the result
> of the operation instead of making it available as an expression) both for
> the xmmintrin branch and for the hand-coded one.
>
>> The original problem is that certain compilers (just GCC?) did not
>> support some intrinsics when compiling MMX code (_mm_movemask_pi8,
>> _mm_mulhi_pu16, _mm_shuffle_pi16) and we got the following code:
>>
>> http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n66
>>
>> In fact, these instructions were not available as part of the original
>> MMX, but only got introduced later with AMD Extended 3DNow! and Intel
>> SSE1. This is mentioned in the commit messages:
>>
>> http://cgit.freedesktop.org/pixman/commit/?id=84221f4c1687b8ea14e9cbdc78b2ba7258e62c9e
>>
>> http://cgit.freedesktop.org/pixman/commit/?id=14208344964f341a7b4a704b05cf4804c23792e9
>>
>> These extra instructions are unofficially known as MMX2. But GCC does
>> not have a separate option for "-mmmx2". Instead the GCC manual says
>> that these intrinsics are available when either "-msse" or a
>> combination of "-m3dnow -march=athlon" is used:
>>
>> https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/x86-Built-in-Functions.html#x86-Built-in-Functions
>>
>>
>> Now I wonder if the comment "We have to compile with -msse to use
>> xmmintrin.h" is still valid. I tried to tweak the following ifdef to
>> use the part of code, which includes <xmmintrin.h> and then it compiled
>> fine for me with CFLAGS="-O2 -m32" using recent versions of GCC and
>> Clang:
>>
>> http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n63
>>
>> I believe that this might be somehow related to the new __ALL_ISA__
>> define, which had been mentioned in 2013:
>> https://gcc.gnu.org/ml/gcc-patches/2013-04/txts5M0c0uU9y.txt
>>
>> So what about just dropping this ugly stuff and adding a configure
>> check, which would verify if the MMX code can include <xmmintrin.h>?
>
>
> I would love getting rid of the workarounds, but I'm somewhat worried about
> the possibility of regressions.
> If you believe it is a valid option, we might definitely try to pursue it.
>
> What is the best way forward?

I've now reverted my commit and pushed yours.

Thanks.

Re: [Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.

2015-11-18 Thread Matt Turner

On Sun, Oct 25, 2015 at 1:13 PM, Matt Turner <matts...@gmail.com> wrote:
> On Sun, Oct 11, 2015 at 8:59 PM, Matt Turner <matts...@gmail.com> wrote:
>> We had lots of hacks to handle the inability to include xmmintrin.h
>> without compiling with -msse (lest SSE instructions be used in
>> pixman-mmx.c). Some recent version of gcc relaxed this restriction.
>>
>> Change configure.ac to test that xmmintrin.h can be included and that we
>> can use some intrinsics from it, and remove the work-around code from
>> pixman-mmx.c.
>>
>> Evidently allows gcc 4.9.3 to optimize better as well:
>>
>>    text    data     bss     dec     hex filename
>>  657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
>>  656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after
>>
>> Signed-off-by: Matt Turner <matts...@gmail.com>
>> ---
>
> Ugh. This is apparently not sufficient...
>
> https://bugs.gentoo.org/show_bug.cgi?id=564024
>
> GCC allows you to *include* xmmintrin.h without enabling SSE, but it
> still doesn't allow you to use any of the functions:
>
> conftest.c: In function ‘main’:
> /usr/lib/gcc/x86_64-pc-linux-gnu/5.1.0/include/xmmintrin.h:1124:1:
> error: inlining failed in call to always_inline ‘_mm_mulhi_pu16’:
> target specific option mismatch
>  _mm_mulhi_pu16 (__m64 __A, __m64 __B)
>  ^
> conftest.c:12:7: error: called from here
>  w = _mm_mulhi_pu16(w, w);
>
> I'm not sure what to do except to revert.
>
> The MMX but no SSE case is important, at least it was in the past
> because of OLPC's XO-1.
>
> Suggestions besides reverting this?

I've now reverted this commit and committed Andrea's fix for clang.

Re: [Pixman] Pixman not building on MacOS X 10.11

2015-11-18 Thread Matt Turner

On Wed, Nov 18, 2015 at 8:35 PM, Siarhei Siamashka
<siarhei.siamas...@gmail.com> wrote:
> On Wed, 18 Nov 2015 14:22:09 -0800
> Matt Turner <matts...@gmail.com> wrote:
>
>> On Sun, Oct 11, 2015 at 10:34 AM, Andrea Canciani <ranm...@gmail.com> wrote:
>> > On Sun, Oct 11, 2015 at 5:30 AM, Siarhei Siamashka
>> > <siarhei.siamas...@gmail.com> wrote:
>> >>
>> >> On Sun, 11 Oct 2015 04:53:08 +0300
>> >> Siarhei Siamashka <siarhei.siamas...@gmail.com> wrote:
>> >>
>> >> > On Sat, 10 Oct 2015 16:03:53 -0700
>> >> > Jeremy Huddleston Sequoia <jerem...@freedesktop.org> wrote:
>> >> >
>> >> > > > On Oct 10, 2015, at 13:48, Andrea Canciani <ranm...@gmail.com>
>> >> > > > wrote:
>> >> > > > The attached hack gets the code to compile on modern clang, but I
>> >> > > > believe first of all we should improve the configure.ac detection
>> >> > > > code
>> >> > > > so that pixman can actually build both on old and on new clang
>> >> > > > versions (possibly with mmx disabled, if the asm constraints we need
>> >> > > > are not implemented).
>> >> >
>> >> > This workaround looks reasonable to me. We should probably just drop
>> >> > the whole "ifdef __OPTIMIZE__" part in
>> >> >
>> >> > http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n92
>> >> >
>> >> > I don't quite like the fact that this way of returning results from
>> >> > a macro is a GNU C specific extension. But as you said, the configure
>> >> > test can be updated to better match the code and also check if the
>> >> > compiler supports this particular construct.
>> >> >
>> >> > Could you please submit the final variant of your patch in a
>> >> > "git format-patch" format with a commit message and your
>> >> > Signed-off-by tag?
>> >>
>> >> After looking at this issue a bit more, I realized that we are
>> >> about to add a second layer of workarounds on top of the existing
>> >> old workarounds :-)
>> >
>> >
>> > The attached patch should fix the issue with only minor changes.
>> > It keeps the workarounds :( but somewhat it simplifies them :)
>> > I followed your suggestion of checking block expressions.
>> > Given that the _mm_shuffle_pi16() function is always used in a "return"
>> > statement, if needed we could avoid the usage of block expressions by
>> > defining a macro "_return_mm_shuffle_pi16()" (which would return the result
>> > of the operation instead of making it available as an expression) both for
>> > the xmmintrin branch and for the hand-coded one.
>> >
>> >> The original problem is that certain compilers (just GCC?) did not
>> >> support some intrinsics when compiling MMX code (_mm_movemask_pi8,
>> >> _mm_mulhi_pu16, _mm_shuffle_pi16) and we got the following code:
>> >>
>> >> http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n66
>> >>
>> >> In fact, these instructions were not available as part of the original
>> >> MMX, but only got introduced later with AMD Extended 3DNow! and Intel
>> >> SSE1. This is mentioned in the commit messages:
>> >>
>> >> http://cgit.freedesktop.org/pixman/commit/?id=84221f4c1687b8ea14e9cbdc78b2ba7258e62c9e
>> >>
>> >> http://cgit.freedesktop.org/pixman/commit/?id=14208344964f341a7b4a704b05cf4804c23792e9
>> >>
>> >> These extra instructions are unofficially known as MMX2. But GCC does
>> >> not have a separate option for "-mmmx2". Instead the GCC manual says
>> >> that these intrinsics are available when either "-msse" or a
>> >> combination of "-m3dnow -march=athlon" is used:
>> >>
>> >> https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/x86-Built-in-Functions.html#x86-Built-in-Functions
>> >>
>> >>
>> >> Now I wonder if the comment "We have to compile with -msse to use
>> >> xmmintrin.h" is still valid. I tried to tweak the following ifdef to
>> >> use the part of code, which includes <xmmintrin.h> and then it compiled
>> >> fine for me with CFLAGS="-O2 -m32" using recent versions of GCC and
>>

Re: [Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.

2015-11-03 Thread Matt Turner

On Sun, Oct 25, 2015 at 5:10 PM, Siarhei Siamashka
<siarhei.siamas...@gmail.com> wrote:
> On Sun, 25 Oct 2015 13:13:09 -0700
> Matt Turner <matts...@gmail.com> wrote:
>
>> On Sun, Oct 11, 2015 at 8:59 PM, Matt Turner <matts...@gmail.com> wrote:
>> > We had lots of hacks to handle the inability to include xmmintrin.h
>> > without compiling with -msse (lest SSE instructions be used in
>> > pixman-mmx.c). Some recent version of gcc relaxed this restriction.
>> >
>> > Change configure.ac to test that xmmintrin.h can be included and that we
>> > can use some intrinsics from it, and remove the work-around code from
>> > pixman-mmx.c.
>> >
>> > Evidently allows gcc 4.9.3 to optimize better as well:
>> >
>> >    text    data     bss     dec     hex filename
>> >  657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
>> >  656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after
>> >
>> > Signed-off-by: Matt Turner <matts...@gmail.com>
>> > ---
>>
>> Ugh. This is apparently not sufficient...
>>
>> https://bugs.gentoo.org/show_bug.cgi?id=564024
>>
>> GCC allows you to *include* xmmintrin.h without enabling SSE, but it
>> still doesn't allow you to use any of the functions:
>>
>> conftest.c: In function ‘main’:
>> /usr/lib/gcc/x86_64-pc-linux-gnu/5.1.0/include/xmmintrin.h:1124:1:
>> error: inlining failed in call to always_inline ‘_mm_mulhi_pu16’:
>> target specific option mismatch
>>  _mm_mulhi_pu16 (__m64 __A, __m64 __B)
>>  ^
>> conftest.c:12:7: error: called from here
>>  w = _mm_mulhi_pu16(w, w);
>
> Oh, looks like the restriction used to be relaxed for a while, but then
> GCC 4.9 started to be strict again:
> https://bugzilla.redhat.com/show_bug.cgi?id=1092991#c1
>
>> I'm not sure what to do except to revert.
>
> The real problem is that GCC does not provide a separate option for
> MMX2 (a common subset of 3DNOW and SSE). We usually solve compiler
> problems by reporting bugs to compiler developers. This particular
> case had not been handled according to the usual rule, and now
> we have a nice practical demonstration of the consequences ;-)
>
> BTW, we can still report a bug to GCC. Better late than never.

Yeah, I suppose. The disappointing thing is that Google says an
-m3dnowext flag existed at one point...

>> The MMX but no SSE case is important, at least it was in the past
>> because of OLPC's XO-1.
>
> I'm not sure how many OLPC XO-1 laptops might be still remaining in
> real use in the hands of real people:
> http://www.olpcnews.com/about_olpc_news/goodbye_one_laptop_per_child.html
>
>> Suggestions besides reverting this?
>
> Because OLPC XO-1 is using the AMD Geode processor, we could probably
> treat the code in pixman-mmx.c as 3dnow optimizations on x86 hardware?

The problem is that -m3dnow isn't sufficient. The instructions we want
to use are a subset of SSE that AMD implemented in the Athlon. We need
an -m3dnowext flag.

We can't pass -march=athlon in MMX_CFLAGS either, since the user is
likely to have specified a -march= value of their own.
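
For readers following along, here is a rough scalar model in C of two of the
MMX2 operations this thread keeps coming back to: pmulhuw (behind
_mm_mulhi_pu16) and pmovmskb (behind _mm_movemask_pi8). The model_* names are
made up for illustration and are not pixman code. It is only a sketch of what
the instructions compute, with a 64-bit integer standing in for an __m64
register.

```c
#include <stdint.h>

/* Scalar model of pmulhuw: for each of the four unsigned 16-bit lanes,
 * keep the high 16 bits of the 32-bit product. */
static uint64_t
model_mulhi_pu16 (uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    int lane;

    for (lane = 0; lane < 4; lane++)
    {
        uint32_t x = (uint32_t) ((a >> (16 * lane)) & 0xffff);
        uint32_t y = (uint32_t) ((b >> (16 * lane)) & 0xffff);

        r |= (uint64_t) ((x * y) >> 16) << (16 * lane);
    }
    return r;
}

/* Scalar model of pmovmskb: gather the top bit of each of the eight
 * bytes into the low eight bits of the result. */
static int
model_movemask_pi8 (uint64_t a)
{
    int mask = 0;
    int byte;

    for (byte = 0; byte < 8; byte++)
        mask |= (int) ((a >> (8 * byte + 7)) & 1) << byte;
    return mask;
}
```

Neither operation is part of the original MMX instruction set, which is why
plain -mmmx does not make the corresponding intrinsics usable.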

> Another option is to start using assembly instead of intrinsics.
> Unless a miracle happens and somebody decides to pay for this job,
> we definitely don't have resources to do a high quality assembly
> implementation for MMX/MMX2. But we still can take the assembly
> output of GCC and tweak it a bit. This is ugly and not very
> maintainable though. Been there, done that with ARMv6.

Not interested.

> Or we could simply do nothing and finally retire MMX support on x86.
> If OLPC XO-1 users still do exist, they can always contact us.

I don't care so much about XO-1, but I do want to retain the ability
to test the MMX code on x86. iwMMXt/loongson systems are slow, and
most development can be done on a fast desktop this way.

Re: [Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.

2015-11-03 Thread Matt Turner

On Sun, Oct 25, 2015 at 7:12 PM, Søren Sandmann
 wrote:
> On Sun, Oct 25, 2015 at 8:10 PM, Siarhei Siamashka
>  wrote:
>
>>
>> Or we could simply do nothing and finally retire MMX support on x86.
>> If OLPC XO-1 users still do exist, they can always contact us.
>
>
> This is probably the way forward. Except for XO-1, MMX hasn't really done
> anything useful on
> x86 for a long time, but it has been an endless source of compiler headaches
> and maintenance
> issues.

I agree that it has caused a huge number of compiler headaches. I
suppose I'd be okay with disabling it by default, but like I said to
Siarhei I would like to keep it working on x86 because that's a much
easier way to test and prototype code than using slow iwMMXt/loongson
systems. Though, I do fear that if we disable it by default it'll just
get close to zero testing.

Re: [Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.

2015-10-25 Thread Matt Turner

On Sun, Oct 11, 2015 at 8:59 PM, Matt Turner <matts...@gmail.com> wrote:
> We had lots of hacks to handle the inability to include xmmintrin.h
> without compiling with -msse (lest SSE instructions be used in
> pixman-mmx.c). Some recent version of gcc relaxed this restriction.
>
> Change configure.ac to test that xmmintrin.h can be included and that we
> can use some intrinsics from it, and remove the work-around code from
> pixman-mmx.c.
>
> Evidently allows gcc 4.9.3 to optimize better as well:
>
>    text    data     bss     dec     hex filename
>  657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
>  656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after
>
> Signed-off-by: Matt Turner <matts...@gmail.com>
> ---

Ugh. This is apparently not sufficient...

https://bugs.gentoo.org/show_bug.cgi?id=564024

GCC allows you to *include* xmmintrin.h without enabling SSE, but it
still doesn't allow you to use any of the functions:

conftest.c: In function ‘main’:
/usr/lib/gcc/x86_64-pc-linux-gnu/5.1.0/include/xmmintrin.h:1124:1:
error: inlining failed in call to always_inline ‘_mm_mulhi_pu16’:
target specific option mismatch
 _mm_mulhi_pu16 (__m64 __A, __m64 __B)
 ^
conftest.c:12:7: error: called from here
 w = _mm_mulhi_pu16(w, w);

I'm not sure what to do except to revert.

The MMX but no SSE case is important, at least it was in the past
because of OLPC's XO-1.

Suggestions besides reverting this?

[Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.

2015-10-11 Thread Matt Turner

We had lots of hacks to handle the inability to include xmmintrin.h
without compiling with -msse (lest SSE instructions be used in
pixman-mmx.c). Some recent version of gcc relaxed this restriction.

Change configure.ac to test that xmmintrin.h can be included and that we
can use some intrinsics from it, and remove the work-around code from
pixman-mmx.c.

Evidently allows gcc to optimize better as well:

   text    data     bss     dec     hex filename
 657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
 656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after
---
 configure.ac| 15 --
 pixman/pixman-mmx.c | 60 +
 2 files changed, 5 insertions(+), 70 deletions(-)

diff --git a/configure.ac b/configure.ac
index 424bfd3..b04cc69 100644
--- a/configure.ac
+++ b/configure.ac
@@ -347,21 +347,14 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
 #error "Need GCC >= 3.4 for MMX intrinsics"
 #endif
 #include <mmintrin.h>
+#include <xmmintrin.h>
 int main () {
 __m64 v = _mm_cvtsi32_si64 (1);
 __m64 w;
 
-/* Some versions of clang will choke on K */
-asm ("pshufw %2, %1, %0\n\t"
-: "=y" (w)
-: "y" (v), "K" (5)
-);
-
-/* Some versions of clang will choke on this */
-asm ("pmulhuw %1, %0\n\t"
-   : "+y" (w)
-   : "y" (v)
-);
+/* Test some intrinsics from xmmintrin.h */
+w = _mm_shuffle_pi16(v, 5);
+w = _mm_mulhi_pu16(w, w);
 
 return _mm_cvtsi64_si32 (v);
 }]])], have_mmx_intrinsics=yes)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 05c48a4..6bcdee2 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -39,6 +39,7 @@
 #include 
 #else
 #include <mmintrin.h>
+#include <xmmintrin.h>
 #endif
 #include "pixman-private.h"
 #include "pixman-combine32.h"
@@ -59,65 +60,6 @@ _mm_empty (void)
 }
 #endif
 
-#ifdef USE_X86_MMX
-# if (defined(__SUNPRO_C) || defined(_MSC_VER) || defined(_WIN64))
-#  include <xmmintrin.h>
-# else
-/* We have to compile with -msse to use xmmintrin.h, but that causes SSE
- * instructions to be generated that we don't want. Just duplicate the
- * functions we want to use.  */
-extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_movemask_pi8 (__m64 __A)
-{
-int ret;
-
-asm ("pmovmskb %1, %0\n\t"
-   : "=r" (ret)
-   : "y" (__A)
-);
-
-return ret;
-}
-
-extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mulhi_pu16 (__m64 __A, __m64 __B)
-{
-asm ("pmulhuw %1, %0\n\t"
-   : "+y" (__A)
-   : "y" (__B)
-);
-return __A;
-}
-
-#  ifdef __OPTIMIZE__
-extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_shuffle_pi16 (__m64 __A, int8_t const __N)
-{
-__m64 ret;
-
-asm ("pshufw %2, %1, %0\n\t"
-   : "=y" (ret)
-   : "y" (__A), "K" (__N)
-);
-
-return ret;
-}
-#  else
-#   define _mm_shuffle_pi16(A, N)  \
-({ \
-   __m64 ret;  \
-   \
-   asm ("pshufw %2, %1, %0\n\t"\
-: "=y" (ret)   \
-: "y" (A), "K" ((const int8_t)N)   \
-   );  \
-   \
-   ret;\
-})
-#  endif
-# endif
-#endif
-
 #ifndef _MSC_VER
 #define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \
  (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))
-- 
2.4.9


[Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.

2015-10-11 Thread Matt Turner

We had lots of hacks to handle the inability to include xmmintrin.h
without compiling with -msse (lest SSE instructions be used in
pixman-mmx.c). Some recent version of gcc relaxed this restriction.

Change configure.ac to test that xmmintrin.h can be included and that we
can use some intrinsics from it, and remove the work-around code from
pixman-mmx.c.

Evidently allows gcc 4.9.3 to optimize better as well:

   text    data     bss     dec     hex filename
 657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
 656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after

Signed-off-by: Matt Turner <matts...@gmail.com>
---
Looks like _MM_SHUFFLE isn't defined by ARM's mmintrin.h.
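
As a quick illustration of what the macro encodes, here is a small sketch in
C. It reuses the _MM_SHUFFLE definition from the patch and pairs it with a
scalar model of pshufw (the instruction behind _mm_shuffle_pi16). The name
model_shuffle_pi16 is made up for this example and a 64-bit integer stands in
for an __m64 register, so treat it as an approximation, not pixman code.

```c
#include <stdint.h>

#ifndef _MM_SHUFFLE
#define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \
    (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))
#endif

/* Scalar model of pshufw: result lane i is the source lane selected by
 * bits [2i+1:2i] of the 8-bit immediate. */
static uint64_t
model_shuffle_pi16 (uint64_t a, int imm)
{
    uint64_t r = 0;
    int lane;

    for (lane = 0; lane < 4; lane++)
    {
        int src = (imm >> (2 * lane)) & 3;

        r |= ((a >> (16 * src)) & 0xffff) << (16 * lane);
    }
    return r;
}
```

With this encoding _MM_SHUFFLE (3, 2, 1, 0) is the identity selector and
_MM_SHUFFLE (0, 1, 2, 3) reverses the four 16-bit lanes.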

 configure.ac| 15 -
 pixman/pixman-mmx.c | 64 -
 2 files changed, 8 insertions(+), 71 deletions(-)

diff --git a/configure.ac b/configure.ac
index 424bfd3..b04cc69 100644
--- a/configure.ac
+++ b/configure.ac
@@ -347,21 +347,14 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
 #error "Need GCC >= 3.4 for MMX intrinsics"
 #endif
 #include <mmintrin.h>
+#include <xmmintrin.h>
 int main () {
 __m64 v = _mm_cvtsi32_si64 (1);
 __m64 w;
 
-/* Some versions of clang will choke on K */
-asm ("pshufw %2, %1, %0\n\t"
-: "=y" (w)
-: "y" (v), "K" (5)
-);
-
-/* Some versions of clang will choke on this */
-asm ("pmulhuw %1, %0\n\t"
-   : "+y" (w)
-   : "y" (v)
-);
+/* Test some intrinsics from xmmintrin.h */
+w = _mm_shuffle_pi16(v, 5);
+w = _mm_mulhi_pu16(w, w);
 
 return _mm_cvtsi64_si32 (v);
 }]])], have_mmx_intrinsics=yes)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 05c48a4..88c3a39 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -40,6 +40,9 @@
 #else
 #include <mmintrin.h>
 #endif
+#ifdef USE_X86_MMX
+#include <xmmintrin.h>
+#endif
 #include "pixman-private.h"
 #include "pixman-combine32.h"
 #include "pixman-inlines.h"
@@ -59,66 +62,7 @@ _mm_empty (void)
 }
 #endif
 
-#ifdef USE_X86_MMX
-# if (defined(__SUNPRO_C) || defined(_MSC_VER) || defined(_WIN64))
-#  include <xmmintrin.h>
-# else
-/* We have to compile with -msse to use xmmintrin.h, but that causes SSE
- * instructions to be generated that we don't want. Just duplicate the
- * functions we want to use.  */
-extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_movemask_pi8 (__m64 __A)
-{
-int ret;
-
-asm ("pmovmskb %1, %0\n\t"
-   : "=r" (ret)
-   : "y" (__A)
-);
-
-return ret;
-}
-
-extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mulhi_pu16 (__m64 __A, __m64 __B)
-{
-asm ("pmulhuw %1, %0\n\t"
-   : "+y" (__A)
-   : "y" (__B)
-);
-return __A;
-}
-
-#  ifdef __OPTIMIZE__
-extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_shuffle_pi16 (__m64 __A, int8_t const __N)
-{
-__m64 ret;
-
-asm ("pshufw %2, %1, %0\n\t"
-   : "=y" (ret)
-   : "y" (__A), "K" (__N)
-);
-
-return ret;
-}
-#  else
-#   define _mm_shuffle_pi16(A, N)  \
-({ \
-   __m64 ret;  \
-   \
-   asm ("pshufw %2, %1, %0\n\t"\
-: "=y" (ret)   \
-: "y" (A), "K" ((const int8_t)N)   \
-   );  \
-   \
-   ret;\
-})
-#  endif
-# endif
-#endif
-
-#ifndef _MSC_VER
+#ifndef _MM_SHUFFLE
 #define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \
  (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))
 #endif
-- 
2.4.9


Re: [Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.

2015-10-11 Thread Matt Turner

On Sun, Oct 11, 2015 at 8:41 PM, Siarhei Siamashka
<siarhei.siamas...@gmail.com> wrote:
> On Sun, 11 Oct 2015 14:55:28 -0700
> Matt Turner <matts...@gmail.com> wrote:
>
> Hello,
>
> Thanks. The patch looks good. In fact, it also allows the MMX code to
> be compiled with the Intel Compiler now (previously it was disabled by
> the configure check). A few minor things need to be fixed though. See
> the comments below.
>
>> We had lots of hacks to handle the inability to include xmmintrin.h
>> without compiling with -msse (lest SSE instructions be used in
>
> "lest" -> "lets" ?

Nope, I mean "lest" (means "otherwise something bad would happen")

>> pixman-mmx.c). Some recent version of gcc relaxed this restriction.
>>
>> Change configure.ac to test that xmmintrin.h can be included and that we
>> can use some intrinsics from it, and remove the work-around code from
>> pixman-mmx.c.
>>
>> Evidently allows gcc to optimize better as well:
>>
>>text  data bss dec hex filename
>>  657078 30848 680  688606   a81de libpixman-1.so.0.33.3 before
>>  656710 30848 680  688238   a806e libpixman-1.so.0.33.3 after
>
> It is always a good idea to mention the exact version of gcc in the
> commit message. For example, it could help if somebody happens to be
> reading this commit message a few years in the future.

Sure, will do.

> As for being able to optimize better. Yes, the "asm" blocks are
> treated by the compiler as opaque boxes (with just the input/output
> interface specified by constraints). The optimizer has difficulties
> generating efficient code if it has to deal with these bubbles. So
> it is a good idea to use intrinsics instead of single-instruction
> "asm" statements.
>
> Also I'm not completely sure, but now we probably prefer (require?) the
> "Signed-off-by" tags in commit messages.

Will do.

>> ---
>>  configure.ac| 15 --
>>  pixman/pixman-mmx.c | 60 
>> +
>>  2 files changed, 5 insertions(+), 70 deletions(-)
>
> Nice stats :-)
>
>>
>> diff --git a/configure.ac b/configure.ac
>> index 424bfd3..b04cc69 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -347,21 +347,14 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
>>  #error "Need GCC >= 3.4 for MMX intrinsics"
>>  #endif
>>  #include <mmintrin.h>
>> +#include <xmmintrin.h>
>
> We still would want to have this under the USE_X86_MMX ifdef check.
> Otherwise crosscompiling for ARM fails:
>
> $ ./configure --host=arm-linux-gnueabihf --disable-libpng --disable-gtk
> $ make
>
> pixman-mmx.c:42:23: fatal error: xmmintrin.h: No such file or directory
>  #include <xmmintrin.h>
>^

Heh, can't believe I forgot about that since I added the iwMMXt support. :)

>>  int main () {
>>  __m64 v = _mm_cvtsi32_si64 (1);
>>  __m64 w;
>>
>> -/* Some versions of clang will choke on K */
>> -asm ("pshufw %2, %1, %0\n\t"
>> -: "=y" (w)
>> -: "y" (v), "K" (5)
>> -);
>> -
>> -/* Some versions of clang will choke on this */
>> -asm ("pmulhuw %1, %0\n\t"
>> - : "+y" (w)
>> - : "y" (v)
>> -);
>> +/* Test some intrinsics from xmmintrin.h */
>> +w = _mm_shuffle_pi16(v, 5);
>> +w = _mm_mulhi_pu16(w, w);
>>
>>  return _mm_cvtsi64_si32 (v);
>>  }]])], have_mmx_intrinsics=yes)
>> diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
>> index 05c48a4..6bcdee2 100644
>> --- a/pixman/pixman-mmx.c
>> +++ b/pixman/pixman-mmx.c
>> @@ -39,6 +39,7 @@
>>  #include <loongson-mmintrin.h>
>>  #else
>>  #include <mmintrin.h>
>> +#include <xmmintrin.h>
>>  #endif
>>  #include "pixman-private.h"
>>  #include "pixman-combine32.h"
>> @@ -59,65 +60,6 @@ _mm_empty (void)
>>  }
>>  #endif
>>
>> -#ifdef USE_X86_MMX
>> -# if (defined(__SUNPRO_C) || defined(_MSC_VER) || defined(_WIN64))
>> -#  include <xmmintrin.h>
>> -# else
>> -/* We have to compile with -msse to use xmmintrin.h, but that causes SSE
>> - * instructions to be generated that we don't want. Just duplicate the
>> - * functions we want to use.  */
>> -extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>> -_mm_movemask_pi8 (__m64 __A)
>> -{
>> -int ret;
>> -
>

Re: [Pixman] [PATCH 1/4] pixman-fast-path: Add over_n_8888 fast path (disabled)

2015-08-22 Thread Matt Turner

On Thu, Aug 20, 2015 at 6:58 AM, Pekka Paalanen <ppaala...@gmail.com> wrote:
> From: Ben Avison <bavi...@riscosopen.org>
>
> This is a C fast path, useful for reference or for platforms that don't
> have their own fast path for this operation.
>
> This new fast path is initially disabled by putting the entries in the
> lookup table after the sentinel. The compiler cannot tell the new code
> is not used, so it cannot eliminate the code. Also the lookup table size
> will include the new fast path. When the follow-up patch then enables
> the new fast path, the binary layout (alignments, size, etc.) will stay
> the same compared to the disabled case.
>
> Keeping the binary layout identical is important for benchmarking on
> Raspberry Pi 1. The addresses at which functions are loaded will have a
> significant impact on benchmark results, causing unexpected performance
> changes. Keeping all function addresses the same across the patch
> enabling a new fast path improves the reliability of benchmarks.
>
> Benchmark results are included in the patch enabling this fast path.
>
> [Pekka: disabled the fast path, commit message]
> Signed-off-by: Pekka Paalanen <pekka.paala...@collabora.co.uk>

I don't care strongly, but I might just squash 1+2, 3+4 together and
make a mention in the commit message of exactly what the benchmark
numbers are comparing.
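
To illustrate the "entries after the sentinel" trick described in the commit
message, here is a minimal sketch in plain C. All names (fast_path_t,
demo_fast_paths, lookup_fast_path, the OP_* values) are hypothetical and much
simpler than pixman's real tables; the only point is that the lookup stops at
the sentinel while the array, and the code it references, still contains the
disabled entry.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*op_func_t) (int);

enum { OP_NONE = 0, OP_SRC = 1, OP_OVER = 2 };

typedef struct
{
    int       op;     /* operation id; OP_NONE marks the sentinel */
    op_func_t func;
} fast_path_t;

static int do_src (int x)  { return x; }
static int do_over (int x) { return x + 1; }

static const fast_path_t demo_fast_paths[] =
{
    { OP_SRC,  do_src  },
    { OP_NONE, NULL    },   /* sentinel: the lookup stops here */
    { OP_OVER, do_over },   /* still in the binary, but never found */
};

/* Walk the table until the sentinel; return NULL if no entry matches. */
static op_func_t
lookup_fast_path (const fast_path_t *table, int op)
{
    const fast_path_t *p;

    assert (op != OP_NONE);

    for (p = table; p->op != OP_NONE; p++)
    {
        if (p->op == op)
            return p->func;
    }

    return NULL;
}
```

Enabling the disabled path is then just a matter of moving its entry above
the sentinel, which changes neither the table size nor the set of functions
the compiler has to keep.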

Re: [Pixman] [PATCH] test: Add cover-test

2015-05-26 Thread Matt Turner

On Tue, May 26, 2015 at 3:58 PM, Ben Avison <bavi...@riscosopen.org> wrote:
> This test aims to verify both numerical correctness and the honouring of
> array bounds for scaled plots (both nearest-neighbour and bilinear) at or
> close to the boundary conditions for applicability of "cover" type fast paths
> and iter fetch routines.
>
> It has a secondary purpose: by setting the env var EXACT (to any value) it
> will only test plots that are exactly on the boundary condition. This makes
> it possible to ensure that "cover" routines are being used to the maximum,
> although this requires the use of a debugger or code instrumentation to
> verify.
> ---
> Note that this must be pushed after Pekka's fence-image patches.
>
>  test/Makefile.sources |1 +
>  test/cover-test.c |  376 
> +
>  2 files changed, 377 insertions(+), 0 deletions(-)
>  create mode 100644 test/cover-test.c
>
> diff --git a/test/Makefile.sources b/test/Makefile.sources
> index 14a3710..5b901db 100644
> --- a/test/Makefile.sources
> +++ b/test/Makefile.sources
> @@ -26,6 +26,7 @@ TESTPROGRAMS =  \
> glyph-test\
> solid-test\
> stress-test   \
> +   cover-test\

Remember to add cover-test to .gitignore.

Re: [Pixman] Is Pixman being maintained at all?

2015-04-07 Thread Matt Turner

On Thu, Apr 2, 2015 at 2:26 AM, Pekka Paalanen <ppaala...@gmail.com> wrote:
> On Wed, 1 Apr 2015 18:46:10 -0700
> Matt Turner <matts...@gmail.com> wrote:
>
>> On Mon, Mar 30, 2015 at 10:58 AM, Bill Spitzak <spit...@gmail.com> wrote:
>> > On 03/30/2015 10:25 AM, Matt Turner wrote:
>> >>
>> >> Do you just need someone to push them?
>> >>
>> >> I'm not capable of reviewing these.
>> >>
>> >> Since Søren isn't really maintaining pixman anymore I'm not really
>> >> sure how to proceed.
>> >
>> >
>> > Is this true?
>>
>> I don't see anyone but Pekka reviewing patches and there hasn't been a
>> release in 15 months, so yeah.
>>
>> > I think something needs to be done about this as all new work on X and
>> > Cairo is depending on pixman.
>>
>> I mean, sure.
>>
>> > I have had an outstanding patch set for 8 months now. Søren responded to an
>> > earlier version and I tried to address it but have not heard anything
>> > since.
>> > This is very frustrating as I would like to work on this but I'm not going
>> > to do it if it is useless.
>>
>> As far as I know, Søren isn't working at Redhat any more, so I don't
>> think you can expect him to continue maintaining pixman.
>
> Ok.
>
> Søren, Matt, Siarhei,
>
> how can we get the Pixman maintenance "communitized"? Maybe a la
> libdrm, because no-one has the resources to become a dedicated
> maintainer?

Seems fine to me, though I don't really feel like a pixman maintainer. :)

> What does it take to get push and release authorization, in the
> political sense that Pixman quality would not degrade and the
> current/old maintainers would approve?
> What kind of review policies should be enforced?

Søren told me back in December on IRC "Feel free to do a release."

I'm happy to have people commit to pixman who have a track record of
contributions to other X.Org projects.

> What development guidelines should there be? Should it be strictly no
> new API/ABI nor features, only performance work and new platform
> support like the latest new ARM?

I'm not aware of any backwards-incompatible changes to pixman, at
least in a really long time. Keeping that policy in place seems like a
good idea.

New APIs do happen. I think that's probably fine.

> If there is one person contributing arch or cpu-specific optimizations
> in assembly that no-one is willing to review apart from the scope of
> code changes and style, should we trust that one person and just land
> his work if he shows the performance numbers are good?

I might be a bit biased in my answer, since I have some patches to the
MMX code in my tree that I don't expect anyone to review, but yeah I
think we should mostly trust the author (obviously depends on the
author's credibility).

> I mean, I'm a newbie here. I don't want to hijack this project and push
> it only to my own directions, also because I cannot become a dedicated
> maintainer, nor promise to review anyone else's stuff. But, there are
> patches I'd like to see landed. I could work on them with Ben, but if
> there is no-one upstream to tell us what goes and what doesn't, we
> are left to our own judgement. Would you trust my and Ben's judgement
> so that I could land Ben's patches and make Pixman releases?

I don't think you're hijacking at all. I think this conversation
needed to happen sooner or later, though I do wish Søren or Siarhei
could spend a little time on it.

> You probably don't have a good understanding about how I work and what
> kind of a developer I am, nor have that kind of trust in me. That is
> fine. We need time to build that trust through discussion and patches.
> But it's hard to have a discussion if no-one can reply. I also
> understand that because I will not promise to be a maintainer, there is
> less incentive in educating me. It is quite likely that I hang around
> here for a while and then wander off when my needs are filled.

I haven't worked with you, but I'm familiar with your contributions.
I'd trust you to commit to pixman.

But I don't think I could really educate anyone except in the MMX and SSE2 code.

> The same goes for everyone, I believe.
>
> What could we do to let Pixman go forward?
>
> I suppose a project in a similar state would just get forked by some
> new people, who will then drive it with their own goals. Except here
> that doesn't work, because the fork would soon fall into the same state
> as the original project, except the world would just be more
> fragmented. Couldn't we as well just loosen up on the master branch and
> let stuff land whenever someone is active and someone else doesn't see
> anything bad in it? There are always the stable branches, too, for
> those who want to stick to old and well-tested code.
>
> Yes, the software quality will likely degrade somewhat, at least from
> the old maintainers' perspective. However, the alternative seems to be a
> completely stalled project. Which one is better?
>
> FWIW, distros (well, Raspbian at least) already maintain their own
> forks, most likely as a single-person project. At upstream we could at
> least aim for a different person to review a change than the one who
> wrote it. For distribution

Re: [Pixman] [PATCH 1/5] armv6: Fix typo in preload macro

2015-04-01 Thread Matt Turner

Pushed.

Re: [Pixman] Is Pixman being maintained at all?

2015-04-01 Thread Matt Turner

On Mon, Mar 30, 2015 at 10:58 AM, Bill Spitzak <spit...@gmail.com> wrote:
> On 03/30/2015 10:25 AM, Matt Turner wrote:
>>
>> Do you just need someone to push them?
>>
>> I'm not capable of reviewing these.
>>
>> Since Søren isn't really maintaining pixman anymore I'm not really
>> sure how to proceed.
>
>
> Is this true?

I don't see anyone but Pekka reviewing patches and there hasn't been a
release in 15 months, so yeah.

> I think something needs to be done about this as all new work on X and Cairo
> is depending on pixman.

I mean, sure.

> I have had an outstanding patch set for 8 months now. Søren responded to an
> earlier version and I tried to address it but have not heard anything since.
> This is very frustrating as I would like to work on this but I'm not going
> to do it if it is useless.

As far as I know, Søren isn't working at Redhat any more, so I don't
think you can expect him to continue maintaining pixman.

> If nothing is going to change in pixman I think Cairo is going to have to
> fork it and make a local copy. This is going to remove the ability for Cairo
> to use X remote rendering (since X will still be using the old pixman),
> though it is unclear if any serious software is using this mode any more.

Sounds ridiculous.

Get a Cairo developer to review and commit your pixman changes? I don't know.

Re: [Pixman] [PATCH 2/3] armv7: Faster fill operations

2015-03-04 Thread Matt Turner

On Wed, Mar 4, 2015 at 5:56 PM, Ben Avison <bavi...@riscosopen.org> wrote:
> This eliminates a number of branches over blocks of code that are either
> empty or can be trivially combined with a separate code block at the start
> and end of each scanline. This has a surprisingly big effect, at least on
> Cortex-A7, for src_n_8:
>
>     Before          After
>     Mean    StdDev  Mean    StdDev  Confidence  Change
> L1  1570.4  133.1   1639.6  110.7   100.0%      +4.4%
> L2  1042.6  19.9    1086.6  23.4    100.0%      +4.2%
> M   1030.8  7.2     1036.8  3.2     100.0%      +0.6%
> HT  287.4   3.5     303.3   2.9     100.0%      +5.5%
> VT  262.0   2.6     263.3   2.6     99.9%       +0.5%
> R   206.5   2.4     209.9   2.4     100.0%      +1.7%
> RT  56.5    1.0     59.2    0.5     100.0%      +4.7%
> ---

What do you use to generate this?

I'd certainly like to use it.

Re: [Pixman] Unable to build master on Raspberry PI

2014-12-03 Thread Matt Turner

On Wed, Dec 3, 2014 at 9:18 AM, Andrea Giammarchi
<andrea.giammar...@gmail.com> wrote:
> Thank you very much Siarhei, I am still building something huge and had no
> way to double check but at least I can confirm the gcc is 4.9.2.
>
> I will try to --disable-arm-iwmmxt when it shows arm6l as uname -m and let
> you know if that fixed.
>
> Do you think it should be enabled in the future or it's needed to let pixman
> properly work?

iwMMXt is a SIMD instruction set that the Raspberry Pi's CPU doesn't
support, so it's not useful for your use case.

Re: [Pixman] [PATCH 0/2] mmx nearest scaling paths

2014-09-26 Thread Matt Turner

On Tue, Sep 23, 2014 at 12:24 PM, Søren Sandmann
<soren.sandm...@gmail.com> wrote:
>>
>> IIRC, we have already discussed it before. Maybe we should just disable
>> MMX support for x86 and use it only for MIPS Loongson and ARM IWMMXT?

I don't really see the benefit. The bugs we've had have all been
trivially fixed.

I'm concerned that if we disable the MMX code on x86 that over time we
might not notice a bug and it'll become harder to debug. But I suppose
you had to disable SSE2 to find those bugs anyway..

> I'd be in favor of that. For a long time the only real use case for MMX/x86
> has been the XO 1 laptops, and I really doubt that they are getting updated
> pixman libraries any more.
>
> Søren

Cc'ing Daniel Drake, who should know.

[Pixman] [PATCH 1/2] mmx: Add nearest over_8888_n_8888

2014-09-05 Thread Matt Turner

lowlevel-blt-bench -n, over_8888_n_8888, 15 iterations on Loongson 2f:

   Before  After
  Mean StdDev Mean StdDev   Change
L1 9.7   0.01 19.2   0.02   +98.2%
L2 9.6   0.11 19.2   0.16   +99.5%
M  7.3   0.02 12.5   0.01   +72.0%
HT 6.6   0.01 13.4   0.02  +103.2%
VT 6.4   0.01 12.6   0.03   +96.1%
R  6.3   0.01 11.2   0.01   +76.5%
RT 4.4   0.01  8.1   0.03   +82.6%
---
 pixman/pixman-mmx.c | 62 +
 1 file changed, 62 insertions(+)

diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index f9a92ce..63f4cdf 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -3555,6 +3555,59 @@ mmx_composite_over_reverse_n_8888 (pixman_implementation_t *imp,
 _mm_empty ();
 }
 
+static force_inline void
+scaled_nearest_scanline_mmx_8888_n_8888_OVER (const uint32_t * mask,
+ uint32_t *   dst,
+ const uint32_t * src,
+ int32_t  w,
+ pixman_fixed_t   vx,
+ pixman_fixed_t   unit_x,
+ pixman_fixed_t   src_width_fixed,
+ pixman_bool_t    zero_src)
+{
+__m64 mm_mask;
+
+if (zero_src || (*mask >> 24) == 0)
+   return;
+
+mm_mask = expand_alpha (load8888 (mask));
+
+while (w)
+{
+   uint32_t s = *(src + pixman_fixed_to_int (vx));
+   vx += unit_x;
+   while (vx >= 0)
+   vx -= src_width_fixed;
+
+   if (s)
+   {
+   __m64 ms = load8888 (&s);
+   __m64 alpha = expand_alpha (ms);
+   __m64 dest  = load8888 (dst);
+
+   store8888 (dst, (in_over (ms, alpha, mm_mask, dest)));
+   }
+
+   dst++;
+   w--;
+}
+
+_mm_empty ();
+}
+
+FAST_NEAREST_MAINLOOP_COMMON (mmx_8888_n_8888_cover_OVER,
+ scaled_nearest_scanline_mmx_8888_n_8888_OVER,
+ uint32_t, uint32_t, uint32_t, COVER, TRUE, TRUE)
+FAST_NEAREST_MAINLOOP_COMMON (mmx_8888_n_8888_pad_OVER,
+ scaled_nearest_scanline_mmx_8888_n_8888_OVER,
+ uint32_t, uint32_t, uint32_t, PAD, TRUE, TRUE)
+FAST_NEAREST_MAINLOOP_COMMON (mmx_8888_n_8888_none_OVER,
+ scaled_nearest_scanline_mmx_8888_n_8888_OVER,
+ uint32_t, uint32_t, uint32_t, NONE, TRUE, TRUE)
+FAST_NEAREST_MAINLOOP_COMMON (mmx_8888_n_8888_normal_OVER,
+ scaled_nearest_scanline_mmx_8888_n_8888_OVER,
+ uint32_t, uint32_t, uint32_t, NORMAL, TRUE, TRUE)
+
 #define BSHIFT ((1 << BILINEAR_INTERPOLATION_BITS))
 #define BMSK (BSHIFT - 1)
 
@@ -3995,6 +4048,15 @@ static const pixman_fast_path_t mmx_fast_paths[] =
 PIXMAN_STD_FAST_PATH(IN,   a8,   null, a8,   mmx_composite_in_8_8  ),
 PIXMAN_STD_FAST_PATH(IN,   solid,a8,   a8,   mmx_composite_in_n_8_8),
 
+SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, mmx_8888_n_8888 ),
+SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, mmx_8888_n_8888 ),
+SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8r8g8b8, x8r8g8b8, mmx_8888_n_8888 ),
+SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8b8g8r8, x8b8g8r8, mmx_8888_n_8888 ),
+SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8r8g8b8, a8r8g8b8, mmx_8888_n_8888 ),
+SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8b8g8r8, a8b8g8r8, mmx_8888_n_8888 ),
+SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8r8g8b8, x8r8g8b8, mmx_8888_n_8888 ),
+SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8b8g8r8, x8b8g8r8, mmx_8888_n_8888 ),
+
 SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8,  a8r8g8b8, mmx_8888_8888 ),
 SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8,  x8r8g8b8, mmx_8888_8888 ),
 SIMPLE_BILINEAR_FAST_PATH (SRC, x8r8g8b8,  x8r8g8b8, mmx_8888_8888 ),
-- 
1.8.5.5

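For anyone trying to follow the stepping logic in the scanline function, here
is a minimal scalar sketch of nearest-neighbour sampling with 16.16
fixed-point coordinates and wrap-around. The type and function names are
hypothetical stand-ins, there is no mask or OVER blending, and the
convention is simplified: as far as I can tell, pixman's own loop keeps vx
biased by -src_width_fixed (hence the comparison against 0 in the patch),
whereas this sketch keeps vx in [0, src_width_fixed).

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_16_16_t;   /* hypothetical stand-in for pixman_fixed_t */

#define FIXED_1         ((fixed_16_16_t) 1 << 16)
#define FIXED_TO_INT(f) ((int) ((f) >> 16))

/* Copy w pixels from src to dst, advancing the source coordinate by unit_x
 * per destination pixel and wrapping at the source width. */
static void
nearest_scanline_wrap (uint32_t       *dst,
                       const uint32_t *src,
                       int             w,
                       fixed_16_16_t   vx,
                       fixed_16_16_t   unit_x,
                       fixed_16_16_t   src_width_fixed)
{
    assert (unit_x >= 0);
    assert (vx >= 0 && vx < src_width_fixed);

    while (w-- > 0)
    {
        *dst++ = src[FIXED_TO_INT (vx)];

        vx += unit_x;
        while (vx >= src_width_fixed)
            vx -= src_width_fixed;
    }
}
```

A unit_x of FIXED_1 / 2 doubles every source pixel; a unit_x of FIXED_1 with
w larger than the source width wraps back to the first pixel.
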

[Pixman] [PATCH 0/2] mmx nearest scaling paths

2014-09-05 Thread Matt Turner

Here are a couple of nearest scaling MMX paths I wrote a long time ago
for Loongson and other things using the MMX code.

I've got a few more patches for the MMX code that I'll send out as I
benchmark them.

I don't really expect any reviews, so barring objections I'll plan to
commit them in a few days.


[Pixman] Test suite failures on 32-bit x86?

2013-11-12 Thread Matt Turner

Building the 0.32.2 release and from git with

CC="gcc -m32" ./autogen.sh && make check

PASS: prng-test
PASS: a1-trap-test
PASS: region-translate-test
PASS: pdf-op-test
PASS: region-test
PASS: fetch-test
../test-driver: line 95:  3312 Segmentation fault  "$@" > $log_file 2>&1
FAIL: rotate-test
PASS: oob-test
PASS: infinite-loop
PASS: combiner-test
PASS: pixel-test
PASS: trap-crasher
PASS: alpha-loop
PASS: thread-test
PASS: scaling-helpers-test
PASS: scaling-crash-test
../test-driver: line 95:  3571 Segmentation fault  "$@" > $log_file 2>&1
FAIL: matrix-test
PASS: gradient-crash-test
../test-driver: line 95:  3637 Segmentation fault  "$@" > $log_file 2>&1
FAIL: blitters-test
../test-driver: line 95:  3659 Segmentation fault  "$@" > $log_file 2>&1
FAIL: glyph-test
../test-driver: line 95:  3681 Segmentation fault  "$@" > $log_file 2>&1
FAIL: scaling-test
../test-driver: line 95:  3703 Segmentation fault  "$@" > $log_file 2>&1
FAIL: affine-test
PASS: alphamap
PASS: composite-traps-test
PASS: region-contains-test
PASS: stress-test
PASS: composite

Manually running the tests shows that they all crash in
prng_rand_128_r (utils-prng.h:138):

 uint32x4 e = x->a - ((x->b << 27) + (x->b >> (32 - 27)));

which is code inside an #ifdef GCC_VECTOR_EXTENSIONS_SUPPORTED block.

I realize this may be a gcc bug, so I tested with 4.8.1 and 4.7.2 and
got the same results. Testing with 4.6.3 leads to only a single
failure, in matrix-test (with a different backtrace, so probably
different).

Do we need some kind of configure check to make sure that our use of
gcc's vector extensions is actually going to work?
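
For reference, here is what that statement computes on a single 32-bit lane,
as a minimal scalar sketch. The first line of small_prng_next() mirrors the
crashing expression (a subtract of b rotated left by 27); the remaining
steps are an assumption on my part, following the commonly published form of
Bob Jenkins' small PRNG, and the names small_prng_t, rot32 and
small_prng_next are hypothetical rather than taken from utils-prng.h.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t a, b, c, d; } small_prng_t;   /* hypothetical name */

/* Rotate a 32-bit value left by k bits, written as shift + shift like the
 * vector expression in the test utility. */
static uint32_t
rot32 (uint32_t x, unsigned k)
{
    assert (k > 0 && k < 32);
    return (x << k) + (x >> (32 - k));
}

/* One step of the generator on a single lane. */
static uint32_t
small_prng_next (small_prng_t *x)
{
    uint32_t e = x->a - rot32 (x->b, 27);

    x->a = x->b ^ rot32 (x->c, 17);
    x->b = x->c + x->d;
    x->c = x->d + e;
    x->d = e + x->a;

    return x->d;
}
```

The vector version does the same arithmetic on four lanes at once, so a
scalar build of this code is one way to check whether the crash is specific
to the vector type.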

Re: [Pixman] Latest GIT source for 'pixman-sse2.c'

2013-10-06 Thread Matt Turner

On Sun, Oct 6, 2013 at 1:50 AM, John Emmas <john...@tiscali.co.uk> wrote:
> On 05/10/2013 19:32, John Emmas wrote:
>>
>> On 5 Oct 2013, at 19:00, Siarhei Siamashka wrote:
>>>
>>> Andrea Canciani has already investigated the problem and submitted the
>>> fixes here:
>>>
>>>
>>> http://lists.freedesktop.org/archives/pixman/2013-September/002954.html
>>
>> Many thanks for the super fast response guys.  I'm at a different PC now
>> but I'll apply that patch tomorrow.
>>
>
> I applied that patch this morning and sure enough, it does fix the problem.
> Thanks to Andrea for noticing it.
>
> BTW...  while reading the patch I noticed that, quite by accident, the
> source file 'pixman-mmx.c' had somehow gotten excluded from my MSVC build
> project, so I took the opportunity to add it.  Although the build still
> succeeds, I see several warnings of this form while building
> 'pixman-mmx.c':-
>
>   pixman-mmx.c(586) : warning C4799: function 'whatever' has no EMMS
> instruction
>
> I don't know if that means anything bad but I thought it wouldn't do any
> harm flag it up.  Here's a list of the affected functions:-
>
>   function 'expand_4xpacked565' has no EMMS instruction
>   function 'is_opaque' has no EMMS instruction
>   function 'is_equal' has no EMMS instruction
>   function 'to_uint64' has no EMMS instruction
>   function 'expand_4x565' has no EMMS instruction
>   function 'is_zero' has no EMMS instruction
>   function 'store' has no EMMS instruction

All of these are all inline functions, so _mm_empty() isn't required.

>   function 'fast_composite_scaled_bilinear_mmx_8888_8_8888_none_OVER'
> has no EMMS instruction
>   function 'fast_composite_scaled_bilinear_mmx_8888_8_8888_pad_OVER' has
> no EMMS instruction

This has _mm_empty().
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman

Re: [Pixman] [PATCH 1/2] Add empty SSSE3 implementation

2013-09-05 Thread Matt Turner

On Thu, Aug 29, 2013 at 10:02 AM, Søren Sandmann Pedersen
sandm...@cs.au.dk wrote:
 This commit adds a new, empty SSSE3 implementation and the associated
 build system support.

 configure.ac:   detect whether the compiler understands SSSE3
 intrinsics and set up the required CFLAGS

 Makefile.am:Add libpixman-ssse3.la

 pixman-x86.c:   Add X86_SSSE3 feature flag and detect it in
 detect_cpu_features().

 pixman-ssse3.c: New file with an empty SSSE3 implementation
 ---
  configure.ac|   46 +++
  pixman/Makefile.am  |   12 +++
  pixman/pixman-private.h |5 
  pixman/pixman-ssse3.c   |   50 
 +++
  pixman/pixman-x86.c |   15 -
  5 files changed, 126 insertions(+), 2 deletions(-)
  create mode 100644 pixman/pixman-ssse3.c

 diff --git a/configure.ac b/configure.ac
 index 5b9512c..ff97bfb 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -437,6 +437,50 @@ fi
  AM_CONDITIONAL(USE_SSE2, test $have_sse2_intrinsics = yes)

  dnl 
 ===
 +dnl Check for SSSE3
 +
 +if test "x$SSSE3_CFLAGS" = "x" ; then
 +SSSE3_CFLAGS="-mssse3 -Winline"
 +fi
 +
 +have_ssse3_intrinsics=no
 +AC_MSG_CHECKING(whether to use SSSE3 intrinsics)
 +xserver_save_CFLAGS=$CFLAGS
 +CFLAGS="$SSSE3_CFLAGS $CFLAGS"
 +
 +AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
 +#include <mmintrin.h>
 +#include <xmmintrin.h>
 +#include <emmintrin.h>
 +#include <tmmintrin.h>
 +int main () {
 +__m128i a = _mm_set1_epi32 (0), b = _mm_set1_epi32 (0), c;
 +c = _mm_maddubs_epi16 (a, b);
 +return 0;
 +}]])], have_ssse3_intrinsics=yes)
 +CFLAGS=$xserver_save_CFLAGS
 +
 +AC_ARG_ENABLE(ssse3,
 +   [AC_HELP_STRING([--disable-ssse3],
 +   [disable SSSE3 fast paths])],
 +   [enable_ssse3=$enableval], [enable_ssse3=auto])
 +
 +if test $enable_ssse3 = no ; then
 +   have_ssse3_intrinsics=disabled
 +fi
 +
 +if test $have_ssse3_intrinsics = yes ; then
 +   AC_DEFINE(USE_SSSE3, 1, [use SSSE3 compiler intrinsics])
 +fi
 +
 +AC_MSG_RESULT($have_ssse3_intrinsics)
 +if test $enable_ssse3 = yes && test $have_ssse3_intrinsics = no ; then
 +   AC_MSG_ERROR([SSSE3 intrinsics not detected])
 +fi
 +
 +AM_CONDITIONAL(USE_SSSE3, test $have_ssse3_intrinsics = yes)
 +
 +dnl 
 ===
  dnl Other special flags needed when building code using MMX or SSE 
 instructions
  case $host_os in
 solaris*)
 @@ -471,6 +515,8 @@ AC_SUBST(MMX_CFLAGS)
  AC_SUBST(MMX_LDFLAGS)
  AC_SUBST(SSE2_CFLAGS)
  AC_SUBST(SSE2_LDFLAGS)
 +AC_SUBST(SSSE3_CFLAGS)
 +AC_SUBST(SSSE3_LDFLAGS)

No need for SSSE3_LDFLAGS. Remove it?

  dnl 
 ===
  dnl Check for VMX/Altivec
 diff --git a/pixman/Makefile.am b/pixman/Makefile.am
 index b9ea754..b376d9a 100644
 --- a/pixman/Makefile.am
 +++ b/pixman/Makefile.am
 @@ -52,6 +52,18 @@ libpixman_1_la_LIBADD += libpixman-sse2.la
  ASM_CFLAGS_sse2=$(SSE2_CFLAGS)
  endif

 +# ssse3 code
 +if USE_SSSE3
 +noinst_LTLIBRARIES += libpixman-ssse3.la
 +libpixman_ssse3_la_SOURCES = \
 +   pixman-ssse3.c
 +libpixman_ssse3_la_CFLAGS = $(SSSE3_CFLAGS)
 +libpixman_1_la_LDFLAGS += $(SSSE3_LDFLAGS)
 +libpixman_1_la_LIBADD += libpixman-ssse3.la
 +
 +ASM_CFLAGS_ssse3=$(SSSE3_CFLAGS)
 +endif
 +
  # arm simd code
  if USE_ARM_SIMD
  noinst_LTLIBRARIES += libpixman-arm-simd.la
 diff --git a/pixman/pixman-private.h b/pixman/pixman-private.h
 index 0afabad..732f3d1 100644
 --- a/pixman/pixman-private.h
 +++ b/pixman/pixman-private.h
 @@ -593,6 +593,11 @@ pixman_implementation_t *
  _pixman_implementation_create_sse2 (pixman_implementation_t *fallback);
  #endif

 +#ifdef USE_SSSE3
 +pixman_implementation_t *
 +_pixman_implementation_create_ssse3 (pixman_implementation_t *fallback);
 +#endif
 +
  #ifdef USE_ARM_SIMD
  pixman_implementation_t *
  _pixman_implementation_create_arm_simd (pixman_implementation_t *fallback);
 diff --git a/pixman/pixman-ssse3.c b/pixman/pixman-ssse3.c
 new file mode 100644
 index 000..19d71e7
 --- /dev/null
 +++ b/pixman/pixman-ssse3.c
 @@ -0,0 +1,50 @@
 +/*
 + * Copyright © 2013 Soren Sandmann Pedersen
 + * Copyright © 2013 Red Hat, Inc.
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the Software),
 + * to deal in the Software without restriction, including without limitation
 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the next
 + * paragraph) shall be included in all copies or substantial portions of the
 + * Software.
 + *
 + * THE SOFTWARE IS

[Pixman] [PATCH] mmx: Document implementation(s) of pix_multiply().

2013-05-15 Thread Matt Turner

---
I look at that function and can never remember what it does or how it
manages to do it.

 pixman/pixman-mmx.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 14790c0..746ecd6 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -301,6 +301,29 @@ negate (__m64 mask)
 return _mm_xor_si64 (mask, MC (4x00ff));
 }
 
+/* Computes the product of two unsigned fixed-point 8-bit values from 0 to 1
+ * and maps its result to the same range.
+ *
+ * Jim Blinn gives multiple ways to compute this in "Jim Blinn's Corner:
+ * Notation, Notation, Notation", the first of which is
+ *
+ *   prod(a, b) = (a * b + 128) / 255.
+ *
+ * By approximating the division by 255 as 257/65536 it can be replaced by a
+ * multiply and a right shift. This is the implementation that we use in
+ * pix_multiply(), but we _mm_mulhi_pu16() by 257 (part of SSE1 or Extended
+ * 3DNow!, and unavailable at the time of the book's publication) to perform
+ * the multiplication and right shift in a single operation.
+ *
+ *   prod(a, b) = ((a * b + 128) * 257) >> 16.
+ *
+ * A third way (how pix_multiply() was implemented prior to 14208344) exists
+ * also that performs the multiplication by 257 with adds and shifts.
+ *
+ * Where temp = a * b + 128
+ *
+ *   prod(a, b) = (temp + (temp >> 8)) >> 8.
+ */
 static force_inline __m64
 pix_multiply (__m64 a, __m64 b)
 {
-- 
1.8.1.5
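
As a sanity check on the comment, the following scalar sketch compares the
multiply-by-257 form and the adds-and-shifts form against a rounded division
by 255 for every pair of 8-bit inputs. The function names are hypothetical
and this is plain C on one channel, not the MMX code; the reference uses
(a * b + 127) / 255 as round-to-nearest, which is my choice of reference
rather than something stated in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: a * b / 255 rounded to nearest, using a real division. */
static uint8_t
mul_div255_ref (uint8_t a, uint8_t b)
{
    return (uint8_t) (((uint32_t) a * b + 127) / 255);
}

/* Multiply by 257 and keep the high 16 bits, one lane of what
 * _mm_mulhi_pu16 (t, 0x0101) computes. */
static uint8_t
mul_div255_mulhi (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 128;

    assert (t <= 0xffff);   /* must fit a 16-bit lane */
    return (uint8_t) ((t * 257) >> 16);
}

/* The same multiplication by 257 done with an add and two shifts. */
static uint8_t
mul_div255_shift (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 128;

    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Number of (a, b) pairs on which either form differs from the reference. */
static int
count_mismatches (void)
{
    int a, b, bad = 0;

    for (a = 0; a < 256; a++)
    {
        for (b = 0; b < 256; b++)
        {
            uint8_t r = mul_div255_ref ((uint8_t) a, (uint8_t) b);

            if (mul_div255_mulhi ((uint8_t) a, (uint8_t) b) != r ||
                mul_div255_shift ((uint8_t) a, (uint8_t) b) != r)
                bad++;
        }
    }

    return bad;
}
```

Calling count_mismatches() from a small test program should report zero
disagreements, which is why the approximation is safe over the full 8-bit
range.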


[Pixman] [PATCH] Use AC_LINK_IFELSE to check if the Loongson MMI code can link

2013-05-15 Thread Matt Turner

From: Markos Chandras <markos.chand...@imgtec.com>

The Loongson code is compiled with -march=loongson2f to enable the MMI
instructions, but binutils refuses to link object code compiled with
different -march settings, leading to link failures later in the
compile. This avoids that problem by checking if we can link code
compiled for Loongson.

Signed-off-by: Markos Chandras <markos.chand...@imgtec.com>
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index c43a0d2..221179f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -279,7 +279,7 @@ AC_MSG_CHECKING(whether to use Loongson MMI assembler)
 
 xserver_save_CFLAGS=$CFLAGS
 CFLAGS=" $LS_CFLAGS $CFLAGS -I$srcdir"
-AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
+AC_LINK_IFELSE([AC_LANG_SOURCE([[
 #ifndef __mips_loongson_vector_rev
 #error "Loongson Multimedia Instructions are only available on Loongson"
 #endif
-- 
1.8.1.5


Re: [Pixman] As per : Please report to pixman@lists.freedesktop.org

2013-04-09 Thread Matt Turner

On Tue, Apr 9, 2013 at 2:39 PM, David Lisle <da...@lisle.ca> wrote:
> Thanks for responding, the problem remains a mystery but the overall project
> now is operational. I appreciate that you took time.

I really meant that there certainly must have been more error output
that wasn't in your email. This would lead to the actual problem.

Re: [Pixman] As per : Please report to pixman@lists.freedesktop.org

2013-04-08 Thread Matt Turner

On Mon, Apr 8, 2013 at 11:13 AM, David Lisle <da...@lisle.ca> wrote:
> ===
> make[2]: *** [check-TESTS] Error 1
> make[2]: Leaving directory `/usr/src/pixman-0.28.2/test'
> make[1]: *** [check-am] Error 2
> make[1]: Leaving directory `/usr/src/pixman-0.28.2/test'
> make: *** [check-recursive] Error 1
> ==
>
> The test failed.

Which test?

> I am using Slackware 2.6.37.6-smp
> KDE SC Version 4.5.5(KDE 4.5.5)
>
> Compiles as root, added other programs that were dependencies i.e. wv-1.2.4
> prior to configuration and make. Make gave no error messages or warnings.

Seems doubtful.

> This program did not correctly pass the tests, therefore installation is
> held in abeyance until it does.
>
> There is insufficient information for me to solve this problem.

Us too.

Re: [Pixman] [PATCH 2/4] Added fast path for pad type repeats

2013-02-05 Thread Matt Turner

On Tue, Feb 5, 2013 at 4:39 PM, Ben Avison bavi...@riscosopen.org wrote:
 diff --git a/test/Makefile.sources b/test/Makefile.sources
 index e323a8e..bcbca37 100644
 --- a/test/Makefile.sources
 +++ b/test/Makefile.sources
 @@ -1,6 +1,7 @@
  # Tests (sorted by expected completion time)
  TESTPROGRAMS = \
 prng-test   \
 +   repeat-test \

Update .gitignore for the new test.

Re: [Pixman] 0.29.2

2013-01-27 Thread Matt Turner

On Sun, Jan 27, 2013 at 11:43 AM, Siarhei Siamashka
siarhei.siamas...@gmail.com wrote:
 Still, I'm not very happy about the code duplication. We already have
 similar iterators (fetch only, no writeback) in pixman-mmx.c:

 
 http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.28.2#n3904

 Ideally, a lot of this code can be reused in different backends. The
 only unique parts are just the fetch/store functions themselves.

I'm not sure I understand totally. Is the suggestion adding writeback
iterators, thereby allowing the removal of src_x888_0565?

Re: [Pixman] [PATCH] build: Support building Loongson code for 2e, 2f, 3a

2013-01-26 Thread Matt Turner

Some preemptive explanations:

On Sat, Jan 26, 2013 at 6:54 PM, Matt Turner matts...@gmail.com wrote:
 diff --git a/pixman/pixman-mips.c b/pixman/pixman-mips.c
 index 3048813..77bef5c 100644
 --- a/pixman/pixman-mips.c
 +++ b/pixman/pixman-mips.c
 @@ -27,6 +27,10 @@

  #if defined(USE_MIPS_DSPR2) || defined(USE_LOONGSON_MMI)

 +#ifdef DLOPEN_LOONGSON_MMI
 +#include <dlfcn.h>
 +#endif
 +
  #include <string.h>
  #include <stdlib.h>

 @@ -69,10 +73,64 @@ pixman_implementation_t *
  _pixman_mips_get_implementations (pixman_implementation_t *imp)
  {
  #ifdef USE_LOONGSON_MMI
 +void *mmi_handle = NULL;

mmi_handle is outside of DLOPEN_LOONGSON_MMI so that I don't have to
do funny things to the if-statements below. In the !dlopen case, I
expect gcc to recognize that it's always NULL and optimize it
completely out.

 +#ifdef DLOPEN_LOONGSON_MMI
 +pixman_implementation_t *(*_pixman_implementation_create_mmx) (pixman_implementation_t *);
 +#endif
  /* I really don't know if some Loongson CPUs don't have MMI. */
 -if (!_pixman_disabled ("loongson-mmi") && have_feature ("Loongson"))
 +#ifdef HAVE_LOONGSON2E_MMI
 +if (!mmi_handle && !_pixman_disabled ("loongson-mmi")
 +&& have_feature ("Loongson") && have_feature ("-2e"))
 +{
 +#ifdef DLOPEN_LOONGSON_MMI
 +   mmi_handle = dlopen("libpixman-1-loongson2e-mmi.so", RTLD_LAZY | RTLD_LOCAL);
 +#else
 +   imp = _pixman_implementation_create_mmx (imp);
 +#endif
 +}
 +#endif
 +#ifdef HAVE_LOONGSON2F_MMI
 +if (!mmi_handle && !_pixman_disabled ("loongson-mmi")
 +&& have_feature ("Loongson") && have_feature ("-2f"))
 +{
 +#ifdef DLOPEN_LOONGSON_MMI
 +   mmi_handle = dlopen("libpixman-1-loongson2f-mmi.so", RTLD_LAZY | RTLD_LOCAL);
 +#else
 +   imp = _pixman_implementation_create_mmx (imp);
 +#endif
 +}
 +#endif
 +#ifdef HAVE_LOONGSON3A_MMI
 +if (!mmi_handle && !_pixman_disabled ("loongson-mmi")
 +&& have_feature ("Loongson-3A"))
 +{
 +#ifdef DLOPEN_LOONGSON_MMI
 +   mmi_handle = dlopen("libpixman-1-loongson3a-mmi.so", RTLD_LAZY | RTLD_LOCAL);
 +#else
  imp = _pixman_implementation_create_mmx (imp);
  #endif
 +}
 +#endif
 +
 +#ifdef DLOPEN_LOONGSON_MMI
 +if (mmi_handle)
 +{
 +   _pixman_implementation_create_mmx = dlsym(mmi_handle, "_pixman_implementation_create_mmx");
 +   if (_pixman_implementation_create_mmx)
 +   {
 +   imp = _pixman_implementation_create_mmx (imp);
 +   }
 +   else
 +   {
 +   puts(dlerror());
 +   }
 +}
 +else
 +{
 +   puts(dlerror());
 +}
 +#endif
 +#endif

I don't ever dlclose() the handle. I expect that it will be live for
the rest of process execution. I think there are other cases of
leaks like this in pixman already.
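
The load-or-fall-back pattern described above can be sketched as follows.
This is a minimal illustration that assumes a POSIX system providing
dlopen(); it is not code from the patch. The names load_create_fn and
create_generic, the int-based signature, and the library name in the usage
note are all hypothetical. One detail worth noting: dlerror() returns NULL
when no error is pending, so the sketch checks the result before printing.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical signature standing in for an implementation constructor. */
typedef int (*create_fn) (int);

/* Built-in fallback used when no optimized library can be loaded. */
static int
create_generic (int imp)
{
    return imp;
}

/* Try to load sym_name from lib_name; fall back to create_generic. */
static create_fn
load_create_fn (const char *lib_name, const char *sym_name)
{
    create_fn fn = NULL;
    void *handle;

    assert (lib_name != NULL && sym_name != NULL);

    handle = dlopen (lib_name, RTLD_LAZY | RTLD_LOCAL);
    if (handle)
    {
        /* Store the dlsym() result through a void ** to avoid a direct
         * object-pointer to function-pointer cast. */
        *(void **) (&fn) = dlsym (handle, sym_name);

        /* On success the handle is deliberately left open for the rest
         * of the process; it is only closed if the symbol is missing. */
        if (!fn)
            dlclose (handle);
    }

    if (!fn)
    {
        const char *err = dlerror ();   /* NULL when no error is pending */

        if (err)
            fprintf (stderr, "%s\n", err);
        fn = create_generic;
    }

    return fn;
}
```

Calling load_create_fn ("libhypothetical-missing-mmi.so", "create") on a
system without that library prints the loader's error message and returns
create_generic, so the caller always gets a usable function pointer. On
older glibc versions the program has to be linked with -ldl.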

  #ifdef USE_MIPS_DSPR2
  if (!_pixman_disabled (mips-dspr2))

Re: [Pixman] [PATCH] sse2: Implement simple bilinear scaling for x8r8g8b8 to a8r8g8b8

2013-01-23 Thread Matt Turner

On Wed, Jan 23, 2013 at 6:37 AM, Chris Wilson ch...@chris-wilson.co.uk wrote:
 Improves firefon-tron on an IVB i7-3720qm: 68.6s to 45.2s.

 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 ---
  pixman/pixman-sse2.c |   63 
 ++
  1 file changed, 63 insertions(+)

 diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
 index fc873cc..bc3e2f1 100644
 --- a/pixman/pixman-sse2.c
 +++ b/pixman/pixman-sse2.c
 @@ -5679,6 +5679,67 @@ FAST_BILINEAR_MAINLOOP_COMMON (sse2_8888_8888_normal_SRC,
 NORMAL, FLAG_NONE)

  static force_inline void
 +scaled_bilinear_scanline_sse2_0888_8888_SRC (uint32_t *   dst,

Maybe some funny whitespace before dst? Or maybe just a spaces vs tabs issue.

Anyway, Reviewed-by: Matt Turner matts...@gmail.com

 +const uint32_t * mask,
 +const uint32_t * src_top,
 +const uint32_t * src_bottom,
 +int32_t  w,
 +int  wt,
 +int  wb,
 +pixman_fixed_t   vx,
 +pixman_fixed_t   unit_x,
 +pixman_fixed_t   max_vx,
 +pixman_bool_tzero_src)
 +{
 +BILINEAR_DECLARE_VARIABLES;
 +uint32_t pix1, pix2, pix3, pix4;
 +
 +while ((w -= 4) >= 0)
 +{
 +   BILINEAR_INTERPOLATE_ONE_PIXEL (pix1);
 +   BILINEAR_INTERPOLATE_ONE_PIXEL (pix2);
 +   BILINEAR_INTERPOLATE_ONE_PIXEL (pix3);
 +   BILINEAR_INTERPOLATE_ONE_PIXEL (pix4);
 +   *dst++ = pix1 | 0xff000000;
 +   *dst++ = pix2 | 0xff000000;
 +   *dst++ = pix3 | 0xff000000;
 +   *dst++ = pix4 | 0xff000000;
 +}
 +
 +if (w & 2)
 +{
 +   BILINEAR_INTERPOLATE_ONE_PIXEL (pix1);
 +   BILINEAR_INTERPOLATE_ONE_PIXEL (pix2);
 +   *dst++ = pix1 | 0xff000000;
 +   *dst++ = pix2 | 0xff000000;
 +}
 +
 +if (w & 1)
 +{
 +   BILINEAR_INTERPOLATE_ONE_PIXEL (pix1);
 +   *dst = pix1 | 0xff000000;
 +}
 +
 +}
 +
 +FAST_BILINEAR_MAINLOOP_COMMON (sse2_0888_8888_cover_SRC,
 +  scaled_bilinear_scanline_sse2_0888_8888_SRC,
 +  uint32_t, uint32_t, uint32_t,
 +  COVER, FLAG_NONE)
 +FAST_BILINEAR_MAINLOOP_COMMON (sse2_0888_8888_pad_SRC,
 +  scaled_bilinear_scanline_sse2_0888_8888_SRC,
 +  uint32_t, uint32_t, uint32_t,
 +  PAD, FLAG_NONE)
 +FAST_BILINEAR_MAINLOOP_COMMON (sse2_0888_8888_none_SRC,
 +  scaled_bilinear_scanline_sse2_0888_8888_SRC,
 +  uint32_t, uint32_t, uint32_t,
 +  NONE, FLAG_NONE)
 +FAST_BILINEAR_MAINLOOP_COMMON (sse2_0888_8888_normal_SRC,
 +  scaled_bilinear_scanline_sse2_0888_8888_SRC,
 +  uint32_t, uint32_t, uint32_t,
 +  NORMAL, FLAG_NONE)
 +
 +static force_inline void
  scaled_bilinear_scanline_sse2_8888_8888_OVER (uint32_t *   dst,
   const uint32_t * mask,
   const uint32_t * src_top,
 @@ -6185,6 +6246,8 @@ static const pixman_fast_path_t sse2_fast_paths[] =
  SIMPLE_BILINEAR_FAST_PATH (SRC, a8b8g8r8, a8b8g8r8, sse2_8888_8888),
  SIMPLE_BILINEAR_FAST_PATH (SRC, a8b8g8r8, x8b8g8r8, sse2_8888_8888),
  SIMPLE_BILINEAR_FAST_PATH (SRC, x8b8g8r8, x8b8g8r8, sse2_8888_8888),
 +SIMPLE_BILINEAR_FAST_PATH (SRC, x8r8g8b8, a8r8g8b8, sse2_0888_8888),
 +SIMPLE_BILINEAR_FAST_PATH (SRC, x8b8g8r8, a8b8g8r8, sse2_0888_8888),

  SIMPLE_BILINEAR_FAST_PATH (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888),
  SIMPLE_BILINEAR_FAST_PATH (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888),
 --
 1.7.10.4


[Pixman] [PATCH] Add new demos and tests to .gitignore

2013-01-18 Thread Matt Turner

---
 .gitignore | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/.gitignore b/.gitignore
index a4d9f99..dcb3f8e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -37,6 +37,7 @@ demos/quad2quad
 demos/radial-test
 demos/screen-test
 demos/srgb-test
+demos/srgb-trap-test
 demos/trap-test
 demos/tri-test
 pixman/pixman-combine32.c
@@ -61,6 +62,7 @@ test/fetch-test
 test/glyph-test
 test/gradient-crash-test
 test/gradient-test
+test/infinite-loop
 test/lowlevel-blt-bench
 test/oob-test
 test/pdf-op-test
@@ -68,6 +70,7 @@ test/region-contains-test
 test/region-test
 test/region-translate
 test/region-translate-test
+test/rotate-test
 test/scaling-crash-test
 test/scaling-helpers-test
 test/scaling-test
-- 
1.7.12.4


[Pixman] [PATCH] Convert INCLUDES to AM_CPPFLAGS

2013-01-18 Thread Matt Turner

INCLUDES has been deprecated starting with automake 1.13. Replace all
occurrences with the recommended AM_CPPFLAGS.
---
 demos/Makefile.am  | 2 +-
 pixman/Makefile.am | 2 +-
 test/Makefile.am   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/demos/Makefile.am b/demos/Makefile.am
index f324f5f..fca2710 100644
--- a/demos/Makefile.am
+++ b/demos/Makefile.am
@@ -4,7 +4,7 @@ AM_CFLAGS = $(OPENMP_CFLAGS)
 AM_LDFLAGS = $(OPENMP_CFLAGS)
 
 LDADD = $(top_builddir)/pixman/libpixman-1.la -lm $(GTK_LIBS) $(PNG_LIBS)
-INCLUDES = -I$(top_srcdir)/pixman -I$(top_builddir)/pixman $(GTK_CFLAGS) $(PNG_CFLAGS)
+AM_CPPFLAGS = -I$(top_srcdir)/pixman -I$(top_builddir)/pixman $(GTK_CFLAGS) $(PNG_CFLAGS)
 
 GTK_UTILS = gtk-utils.c gtk-utils.h ../test/utils.c ../test/utils.h
 
diff --git a/pixman/Makefile.am b/pixman/Makefile.am
index 270d65e..d4b7bb3 100644
--- a/pixman/Makefile.am
+++ b/pixman/Makefile.am
@@ -91,7 +91,7 @@ noinst_LTLIBRARIES += libpixman-iwmmxt.la
 libpixman_1_la_LIBADD += libpixman-iwmmxt.la
 
 libpixman_iwmmxt_la-pixman-mmx.lo: pixman-mmx.c
-   $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(CFLAGS) $(IWMMXT_CFLAGS) -MT libpixman_iwmmxt_la-pixman-mmx.lo -MD -MP -MF $(DEPDIR)/libpixman_iwmmxt_la-pixman-mmx.Tpo -c -o libpixman_iwmmxt_la-pixman-mmx.lo `test -f 'pixman-mmx.c' || echo '$(srcdir)/'`pixman-mmx.c
+   $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(AM_CPPFLAGS) $(AM_CPPFLAGS) $(CPPFLAGS) $(CFLAGS) $(IWMMXT_CFLAGS) -MT libpixman_iwmmxt_la-pixman-mmx.lo -MD -MP -MF $(DEPDIR)/libpixman_iwmmxt_la-pixman-mmx.Tpo -c -o libpixman_iwmmxt_la-pixman-mmx.lo `test -f 'pixman-mmx.c' || echo '$(srcdir)/'`pixman-mmx.c
    $(AM_V_at)$(am__mv) $(DEPDIR)/libpixman_iwmmxt_la-pixman-mmx.Tpo $(DEPDIR)/libpixman_iwmmxt_la-pixman-mmx.Plo
 
 libpixman_iwmmxt_la_DEPENDENCIES = $(am__DEPENDENCIES_1)
diff --git a/test/Makefile.am b/test/Makefile.am
index eeb3679..ca87f4e 100644
--- a/test/Makefile.am
+++ b/test/Makefile.am
@@ -3,7 +3,7 @@ include $(top_srcdir)/test/Makefile.sources
 AM_CFLAGS = $(OPENMP_CFLAGS)
 AM_LDFLAGS = $(OPENMP_CFLAGS) $(TESTPROGS_EXTRA_LDFLAGS)
 LDADD = libutils.la $(top_builddir)/pixman/libpixman-1.la -lm  $(PNG_LIBS)
-INCLUDES = -I$(top_srcdir)/pixman -I$(top_builddir)/pixman $(PNG_CFLAGS)
+AM_CPPFLAGS = -I$(top_srcdir)/pixman -I$(top_builddir)/pixman $(PNG_CFLAGS)
 
 libutils_la_SOURCES = $(libutils_sources) $(libutils_headers)
 
-- 
1.7.12.4


Re: [Pixman] 0.29.2

2013-01-18 Thread Matt Turner

On Fri, Jan 18, 2013 at 4:15 PM, Søren Sandmann sandm...@cs.au.dk wrote:
 Hi,

 It's about time to get a 0.29.2 development snapshot out, but there are
 some outstanding patches

I'd like to get my triple build loongson patch in, but haven't gotten
any testers yet. I'll set up a chroot this weekend to test it.

Matt

Re: [Pixman] [PATCH] build: Support building Loongson code for 2e, 2f, 3a

2013-01-17 Thread Matt Turner

On Sun, Jan 6, 2013 at 7:46 PM, Cyril Brulebois k...@debian.org wrote:
 Hello Matt,

 Matt Turner matts...@gmail.com (06/01/2013):
 On Sat, Sep 15, 2012 at 11:59 PM, Matt Turner matts...@gmail.com wrote:
  pixman/Makefile.am contains a hack that allows pixman-mmx.c to
  be compiled with different overriding CFLAGS, since automake
  seriously doesn't have a way to do this. Seriously stupid.
 
  It works by defining a new rule and recursively calling make
  with modified CFLAGS set.
 
  Note the difference between the USE_LOONGSON* and HAVE_LOONGSON*
  preprocessor macros.
 
  Cc: Cyril Brulebois k...@debian.org
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51451
  ---

 Cyril,

 I've updated the patch so that it builds .so files for each
 architecture against which pixman links and attached it to the bug
 report. Please give it a test. I cannot test it, as my system is
 compiled with -march=loongson2f and therefore I cannot even link code
 compiled with -march=loongson2e with my C library.

 thanks; unfortunately I'm busy working on the Debian Installer right
 now and pixman is a bit further down my todo list. Adding debian-mips@
 to Cc, hoping somebody there will be able to perform some tests/share
 some insight.

 Mraw,
 KiBi.

Any testers, debian-mips@?

Re: [Pixman] [PATCH] sse2: Add fast paths for bilinear source with a solid mask

2013-01-08 Thread Matt Turner

On Tue, Jan 8, 2013 at 12:55 PM, Chris Wilson ch...@chris-wilson.co.uk wrote:
 Based on the existing sse2_8888_n_8888 nearest scaling routines.

 fishbowl on an i5-2500: 60.9s -> 56.9s

 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 ---

Looks good to me. Reviewed-by: Matt Turner matts...@gmail.com

Re: [Pixman] [PATCH] build: Support building Loongson code for 2e, 2f, 3a

2013-01-06 Thread Matt Turner

On Sat, Sep 15, 2012 at 11:59 PM, Matt Turner matts...@gmail.com wrote:
 pixman/Makefile.am contains a hack that allows pixman-mmx.c to
 be compiled with different overriding CFLAGS, since automake
 seriously doesn't have a way to do this. Seriously stupid.

 It works by defining a new rule and recursively calling make
 with modified CFLAGS set.

 Note the difference between the USE_LOONGSON* and HAVE_LOONGSON*
 preprocessor macros.

 Cc: Cyril Brulebois k...@debian.org
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51451
 ---

Cyril,

I've updated the patch so that it builds .so files for each
architecture against which pixman links and attached it to the bug
report. Please give it a test. I cannot test it, as my system is
compiled with -march=loongson2f and therefore I cannot even link code
compiled with -march=loongson2e with my C library.

Re: [Pixman] [PATCH] Fix build with automake-1.13

2013-01-03 Thread Matt Turner

On Wed, Jan 2, 2013 at 8:38 PM, Marko Lindqvist cazf...@gmail.com wrote:
 Automake-1.13 has removed the long-obsolete AM_CONFIG_HEADER macro (
 http://lists.gnu.org/archive/html/automake/2012-12/msg00038.html )
 and autoreconf errors out upon seeing it.

 The attached patch replaces the obsolete AM_CONFIG_HEADER with the proper
 AC_CONFIG_HEADERS.

 I'm not subscribed to the mailing list.

Thanks, I tried to apply this, but git won't let me push... will try
to get this worked out.

In the future, please use git format-patch and git send-email. To
apply your patch, I had to

patch -p1 < ...
git commit --author="Marko Lindqvist <cazf...@gmail.com>" -a
write a commit title and summary message

It's a lot nicer to just be able to type git am :)

Matt

Re: [Pixman] [PATCH] sse2: Add a fast path for add_n_8888

2013-01-02 Thread Matt Turner

On Wed, Jan 2, 2013 at 3:01 AM, Chris Wilson ch...@chris-wilson.co.uk wrote:
 This path is being exercised by inplace compositing of trapezoids, for
 instance as used in the firefox-asteroids cairo-trace.

cairo-perf-trace numbers from firefox-asteroids would be cool to have
in the commit message.

Re: [Pixman] [PATCH] sse2: Add a fast path for add_n_8888

2013-01-02 Thread Matt Turner

On Wed, Jan 2, 2013 at 3:01 AM, Chris Wilson ch...@chris-wilson.co.uk wrote:
 This path is being exercised by inplace compositing of trapezoids, for
 instance as used in the firefox-asteroids cairo-trace.

 core2 @ 2.66GHz,

 reference memcpy speed = 4898.2MB/s (1224.6MP/s for 32bpp fills)

 before: add_n_8888 = L1:   4.36  L2:   4.27  M:  1.61 (  0.13%)  HT:
 1.65  VT:  1.63  R:  1.63  RT:  1.59 (  21Kops/s)

 after:  add_n_8888 = L1:2969.09  L2:3926.11  M:603.30 ( 49.27%)  HT:524.69
 VT:401.01  R:407.59  RT:210.34 ( 804Kops/s)

 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 ---
  pixman/pixman-sse2.c |   63 
 ++
  1 file changed, 63 insertions(+)

 diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
 index 665eead..73eee68 100644
 --- a/pixman/pixman-sse2.c
 +++ b/pixman/pixman-sse2.c
 @@ -4519,9 +4519,70 @@ sse2_composite_add_8888_8888 (pixman_implementation_t *imp,

 sse2_combine_add_u (imp, op, dst, src, NULL, width);
  }
 +}
 +
 +static void
 +sse2_composite_add_n_8888 (pixman_implementation_t *imp,
 +  pixman_composite_info_t *info)
 +{
 +PIXMAN_COMPOSITE_ARGS (info);
 +uint32_t *dst_line, *dst, src;
 +int dst_stride;
 +
 +__m128i xmm_src;
 +
 +PIXMAN_IMAGE_GET_LINE (dest_image, dest_x, dest_y, uint32_t, dst_stride, dst_line, 1);
 +
 +src = _pixman_image_get_solid (imp, src_image, dest_image->bits.format);
 +if (src == 0)
 +   return;
 +
 +if (src == ~0)
 +{
 +   pixman_fill (dest_image->bits.bits, dest_image->bits.rowstride, 32,
 +dest_x, dest_y, width, height, ~0);
 +
 +   return;
 +}
 +
 +xmm_src = _mm_set_epi32 (src, src, src, src);
 +while (height--)
 +{
 +   int w = width;
 +   uint32_t d;

 +   dst = dst_line;
 +   dst_line += dst_stride;
 +
 +   while (w && (unsigned long)dst & 15)

Use uintptr_t instead. The rest of the patch looks good to me.
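
A minimal sketch of the suggested alignment test, written as a standalone
helper. The function name unaligned_head_pixels is hypothetical and does not
appear in the patch. The reason for preferring uintptr_t is portability:
unsigned long is 32 bits wide on 64-bit Windows, so casting a pointer to it
truncates the value (the low four bits would still survive, but compilers
warn about the cast), whereas uintptr_t is defined to hold any object
pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Count the leading 32-bit pixels that have to be handled one at a time
 * before dst reaches a 16-byte boundary, or until w runs out. */
static int
unaligned_head_pixels (const uint32_t *dst, int w)
{
    uintptr_t addr = (uintptr_t) dst;   /* holds any object pointer */
    int n = 0;

    assert (w >= 0);

    while (w > 0 && (addr & 15))
    {
        addr += sizeof (uint32_t);
        w--;
        n++;
    }

    return n;
}
```

For a destination that starts 4 bytes past a 16-byte boundary the helper
reports 3 head pixels, after which aligned 128-bit loads and stores are
safe.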

 +   {
 +   d = *dst;
 +   *dst++ =
 +   _mm_cvtsi128_si32 ( _mm_adds_epu8 (xmm_src, _mm_cvtsi32_si128 (d)));
 +   w--;
 +   }
 +
 +   while (w >= 4)
 +   {
 +   save_128_aligned
 +   ((__m128i*)dst,
 +_mm_adds_epu8 (xmm_src, load_128_aligned ((__m128i*)dst)));
 +
 +   dst += 4;
 +   w -= 4;
 +   }
 +
 +   while (w--)
 +   {
 +   d = *dst;
 +   *dst++ =
 +   _mm_cvtsi128_si32 (_mm_adds_epu8 (xmm_src,
 + _mm_cvtsi32_si128 (d)));
 +   }
 +}
  }

 +
  static pixman_bool_t
  pixman_blt_sse2 (uint32_t *src_bits,
   uint32_t *dst_bits,
 @@ -5814,6 +5875,8 @@ static const pixman_fast_path_t sse2_fast_paths[] =
  PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, null, a8b8g8r8, sse2_composite_add_8888_8888),
  PIXMAN_STD_FAST_PATH (ADD, solid, a8, a8, sse2_composite_add_n_8_8),
  PIXMAN_STD_FAST_PATH (ADD, solid, null, a8, sse2_composite_add_n_8),
 +PIXMAN_STD_FAST_PATH (ADD, solid, null, x8r8g8b8, sse2_composite_add_n_8888),
 +PIXMAN_STD_FAST_PATH (ADD, solid, null, a8r8g8b8, sse2_composite_add_n_8888),

  /* PIXMAN_OP_SRC */
  PIXMAN_STD_FAST_PATH (SRC, solid, a8, a8r8g8b8, sse2_composite_src_n_8_8888),
 --
 1.7.10.4

Re: [Pixman] [cairo] issue with blend modes in pixman

2012-12-31 Thread Matt Turner

On Mon, Dec 31, 2012 at 1:05 PM, Rik Cabanier caban...@gmail.com wrote:
 Looking at the formulas, I can see what's wrong but I don't know who to
 contact.

These mailing lists are perfect.

Re: [Pixman] [PATCH] Always use xmmintrin.h for 64 bit Windows

2012-11-16 Thread Matt Turner

On Tue, Nov 13, 2012 at 10:44 AM, Stefan Weil s...@weilnetz.de wrote:
 MinGW-w64 uses the GNU compiler and does not define _MSC_VER.
 Nevertheless, it provides xmmintrin.h and must be handled
 here like the MS compiler. Otherwise compilation fails due to
 conflicting declarations.

 Signed-off-by: Stefan Weil s...@weilnetz.de
 ---
  pixman/pixman-mmx.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
 index c2ae4ea..aef468a 100644
 --- a/pixman/pixman-mmx.c
 +++ b/pixman/pixman-mmx.c
 @@ -62,7 +62,7 @@ _mm_empty (void)
  #endif

  #ifdef USE_X86_MMX
 -# if (defined(__SUNPRO_C) || defined(_MSC_VER))
 +# if (defined(__SUNPRO_C) || defined(_MSC_VER) || defined(_WIN64))
  #  include <xmmintrin.h>
  # else
  /* We have to compile with -msse to use xmmintrin.h, but that causes SSE
 --
 1.7.10.4

If you're compiling for Win64, you have SSE2. Why even compile the MMX code?

Re: [Pixman] Questionable numbers from lowlevel-blt-bench

2012-10-01 Thread Matt Turner

On Mon, Oct 1, 2012 at 1:17 AM, Jonathan Morton
jonathan.mor...@movial.com wrote:
 On Sun, 30 Sep 2012 15:05:18 -0700, Matt Turner matts...@gmail.com
 wrote:
 In doing performance work, I've noticed some weird results from
 lowlevel-blt-bench. Often it has seemed that the RT results determined
 the Kops/s almost entirely. I came across an instance of this today
 which was particularly striking:

 Before:
 add_8888_8888 =  L1:  47.01  L2:  36.84  M: 18.96 ( 33.14%)  HT: 35.94
  VT: 33.82  R: 30.64  RT: 19.36 ( 181Kops/s)

 After:
 add_8888_8888 =  L1: 230.78  L2: 200.86  M: 90.48 (159.44%)  HT: 48.41
  VT: 45.46  R: 42.78  RT: 19.22 ( 181Kops/s)

 L1/L2/M numbers are improved by ~5x. HT, VT, and R numbers are
 improved by ~1.35x. RT doesn't change... neither does Kops/s.

 What's going on here, and can we make the composite result more sensible?

 The figures in brackets are derived directly from one or more of the
 other figures.  In this case, the Kops/s number is derived directly
 from the RT number, which should explain why they correlate.

Ahh. At least I (and I'm pretty sure others too) thought that the Kops
number was supposed to be a composite of HT, VT, RT, and R. This
explains it then.

 The percentage figure, meanwhile, represents a percentage of memory
 bandwidth used by this blitter (under the M test), the peak bandwidth
 being derived from an earlier measurement.  (You're seeing more than
 100%, which suggests that the earlier measurement is not optimal.)

Indeed. I'm prefetching in the modified function.
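
The relationship described above can be checked with a little arithmetic.
The sketch below assumes the percentage is simply the M figure divided by a
reference copy rate; the exact formula used by lowlevel-blt-bench is not
shown in this thread, and both function names are hypothetical.

```c
#include <assert.h>

/* Percentage of the reference rate achieved by the M test.
 * Both rates must use the same unit (for example MPix/s). */
static double
bandwidth_percent (double m_rate, double ref_rate)
{
    assert (ref_rate > 0.0);
    return 100.0 * m_rate / ref_rate;
}

/* Reference rate implied by a reported M figure and its percentage. */
static double
implied_reference (double m_rate, double percent)
{
    assert (percent > 0.0);
    return m_rate * 100.0 / percent;
}
```

Under that assumption, the "before" line (M: 18.96, 33.14%) and the "after"
line (M: 90.48, 159.44%) both imply a reference rate of roughly 57 MPix/s,
which is consistent with the percentage being relative to one earlier
measurement that the modified function then exceeds.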

 The RT figure is meant to measure, as directly as possible, the per-call
 overhead which does not depend on the number of pixels involved.
 Accordingly, it is not expected to change significantly when doing
 pixel-related optimisations.

Right, makes sense.
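
Why a per-pixel speedup leaves an overhead-dominated figure unchanged can be
illustrated with a simple cost model. This is only a sketch: the model
(fixed per-call cost plus a per-pixel cost) and the numbers in the usage
note are assumptions made for illustration, not measurements or code from
lowlevel-blt-bench.

```c
#include <assert.h>

/* Operations per second when each call costs a fixed overhead plus a
 * per-pixel cost. All times are in seconds. */
static double
ops_per_second (double call_overhead_s, double per_pixel_s, long pixels)
{
    assert (call_overhead_s >= 0.0 && per_pixel_s >= 0.0 && pixels >= 0);
    assert (call_overhead_s + per_pixel_s * (double) pixels > 0.0);

    return 1.0 / (call_overhead_s + per_pixel_s * (double) pixels);
}
```

With a hypothetical 5 microsecond call overhead and 16 pixels per call,
making the per-pixel cost five times smaller (20 ns to 4 ns) raises the
rate by only about 5%. With a million pixels per call the same change gives
close to the full 5x.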

Thanks!
Matt

[Pixman] Questionable numbers from lowlevel-blt-bench

2012-09-30 Thread Matt Turner

Hi Jonathan,

In doing performance work, I've noticed some weird results from
lowlevel-blt-bench. Often it has seemed that the RT results determined
the Kops/s almost entirely. I came across an instance of this today
which was particularly striking:

Before:
add_8888_8888 =  L1:  47.01  L2:  36.84  M: 18.96 ( 33.14%)  HT: 35.94
 VT: 33.82  R: 30.64  RT: 19.36 ( 181Kops/s)

After:
add_8888_8888 =  L1: 230.78  L2: 200.86  M: 90.48 (159.44%)  HT: 48.41
 VT: 45.46  R: 42.78  RT: 19.22 ( 181Kops/s)

L1/L2/M numbers are improved by ~5x. HT, VT, and R numbers are
improved by ~1.35x. RT doesn't change... neither does Kops/s.

What's going on here, and can we make the composite result more sensible?

Thanks,
Matt

[Pixman] [test PATCH] Use _mm_maddubs_epi16 in BILINEAR_INTERPOLATE_ONE_PIXEL

2012-09-29 Thread Matt Turner

Siarhei, can you measure any performance improvement with this? I
can't... :(
---
 pixman/pixman-sse2.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
index efed310..4fbc045 100644
--- a/pixman/pixman-sse2.c
+++ b/pixman/pixman-sse2.c
@@ -32,6 +32,7 @@
 
 #include <xmmintrin.h> /* for _mm_shuffle_pi16 and _MM_SHUFFLE */
 #include <emmintrin.h> /* for SSE2 intrinsics */
+#include <tmmintrin.h> /* for SSSE3 intrinsics */
 #include "pixman-private.h"
 #include "pixman-combine32.h"
 #include "pixman-inlines.h"
@@ -5414,7 +5415,7 @@ FAST_NEAREST_MAINLOOP_COMMON (sse2_8888_n_8888_normal_OVER,
 
 #define BILINEAR_INTERPOLATE_ONE_PIXEL(pix) \
 do { \
-    __m128i xmm_wh, xmm_lo, xmm_hi, a; \
+    __m128i xmm_wh, a; \
     /* fetch 2x2 pixel block into sse2 registers */ \
     __m128i tltr = _mm_loadl_epi64 ( \
         (__m128i *)&src_top[pixman_fixed_to_int (vx)]); \
@@ -5443,10 +5444,7 @@ do { \
         _mm_srli_epi16 (xmm_x, 16 - BILINEAR_INTERPOLATION_BITS))); \
     xmm_x = _mm_add_epi16 (xmm_x, xmm_ux); \
     /* horizontal interpolation */ \
-    xmm_lo = _mm_mullo_epi16 (a, xmm_wh); \
-    xmm_hi = _mm_mulhi_epu16 (a, xmm_wh); \
-    a = _mm_add_epi32 (_mm_unpacklo_epi16 (xmm_lo, xmm_hi), \
-                       _mm_unpackhi_epi16 (xmm_lo, xmm_hi)); \
+    a = _mm_maddubs_epi16 (a, xmm_wh); \
     } \
     /* shift and pack the result */ \
     a = _mm_srli_epi32 (a, BILINEAR_INTERPOLATION_BITS * 2); \
-- 
1.7.8.6
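
For reference, a scalar model of what one 16-bit output lane of
_mm_maddubs_epi16 (a, b) computes according to the documented instruction
semantics: bytes of the first operand are treated as unsigned, bytes of the
second as signed, and each adjacent pair of products is added with signed
saturation. The helper name maddubs_lane is hypothetical, and the sketch
makes no claim about how the test patch arranges its operands.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of PMADDUBSW: a0, a1 are unsigned bytes from the first
 * operand; b0, b1 are signed bytes from the second operand. */
static int16_t
maddubs_lane (uint8_t a0, int8_t b0, uint8_t a1, int8_t b1)
{
    int32_t sum = (int32_t) a0 * b0 + (int32_t) a1 * b1;

    /* Signed saturation to the 16-bit range. */
    if (sum > INT16_MAX)
        sum = INT16_MAX;
    if (sum < INT16_MIN)
        sum = INT16_MIN;

    assert (sum >= INT16_MIN && sum <= INT16_MAX);
    return (int16_t) sum;
}
```

Two consequences follow from this model: weights in the second operand must
fit in a signed byte, and large products saturate instead of wrapping, so
the instruction is only a drop-in replacement for a multiply-and-add
sequence when the operand ranges allow it.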


Re: [Pixman] [PATCH 05/10] pixman-utils.c, pixman-private.h: Add floating point conversion routines

2012-09-26 Thread Matt Turner

On Wed, Sep 26, 2012 at 1:43 PM, Søren Sandmann sandm...@cs.au.dk wrote:
 From: Søren Sandmann Pedersen s...@redhat.com

 A new struct argb_t containing a floating point pixel is added to
 pixman-private.h, and conversion routines are added to pixman-utils.c
 to convert normalized integers to and from that struct.

 New functions:

   - pixman_expand_to_float()
 Expands a buffer of integer pixels to a buffer of argb_t pixels

   - pixman_contract_from_float()
 Converts a buffer of argb_t pixels to a buffer of integer pixels

   - pixman_float_to_unorm()
 Converts a floating point number to an unsigned normalized integer

   - pixman_unorm_to_float()
 Converts an unsigned normalized integer to a floating point number
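
 A minimal sketch of the two scalar conversions, modelled on the
 float_to_unorm and unorm_to_float helpers in the quoted diff. The sketch_
 prefix is hypothetical and only marks these as standalone illustrations;
 the clamping and the subtraction that maps 1.0 onto the largest integer
 follow the quoted code.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a float in [0, 1] to an n_bits-wide unsigned normalized integer.
 * Out-of-range inputs are clamped first. */
static uint16_t
sketch_float_to_unorm (float f, int n_bits)
{
    uint32_t u;

    assert (n_bits > 0 && n_bits <= 16);

    if (f > 1.0f)
        f = 1.0f;
    if (f < 0.0f)
        f = 0.0f;

    u = (uint32_t) (f * (float) (1 << n_bits));
    u -= (u >> n_bits);     /* maps the single value 2^n down to 2^n - 1 */

    return (uint16_t) u;
}

/* Convert an n_bits-wide unsigned normalized integer to a float in [0, 1]. */
static float
sketch_unorm_to_float (uint16_t u, int n_bits)
{
    uint32_t m;

    assert (n_bits > 0 && n_bits <= 16);

    m = (1u << n_bits) - 1;
    return (float) (u & m) * (1.0f / (float) m);
}
```

 With 8 bits, 1.0 maps to 255, 0.5 maps to 128, and values outside [0, 1]
 are clamped to 0 or 255.
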
 ---
  pixman/pixman-private.h |   35 +++
  pixman/pixman-utils.c   |  107 
 +++
  2 files changed, 142 insertions(+), 0 deletions(-)

 diff --git a/pixman/pixman-private.h b/pixman/pixman-private.h
 index c82316f..91f35ed 100644
 --- a/pixman/pixman-private.h
 +++ b/pixman/pixman-private.h
 @@ -45,6 +45,16 @@ typedef struct radial_gradient radial_gradient_t;
  typedef struct bits_image bits_image_t;
  typedef struct circle circle_t;

 +typedef struct argb_t argb_t;
 +
 +struct argb_t
 +{
 +float a;
 +float r;
 +float g;
 +float b;
 +};
 +
  typedef void (*fetch_scanline_t) (pixman_image_t *image,
   int x,
   int y,
 @@ -792,12 +802,34 @@ pixman_expand (uint64_t *   dst,
 const uint32_t * src,
 pixman_format_code_t format,
 int  width);
 +void
 +pixman_expand_to_float (argb_t   *dst,
 +   const uint32_t   *src,
 +   pixman_format_code_t  format,
 +   int   width);

  void
  pixman_contract (uint32_t *  dst,
   const uint64_t *src,
   int width);

 +void
 +pixman_contract_from_float (uint32_t *dst,
 +   const argb_t *src,
 +   int   width);
 +
 +pixman_bool_t
 +_pixman_lookup_composite_function (pixman_implementation_t *toplevel,
 +  pixman_op_t  op,
 +  pixman_format_code_t src_format,
 +  uint32_t src_flags,
 +  pixman_format_code_t mask_format,
 +  uint32_t mask_flags,
 +  pixman_format_code_t dest_format,
 +  uint32_t dest_flags,
 +  pixman_implementation_t**out_imp,
 +  pixman_composite_func_t *out_func);
 +
  /* Region Helpers */
  pixman_bool_t
  pixman_region32_copy_from_region16 (pixman_region32_t *dst,
 @@ -957,6 +989,9 @@ unorm_to_unorm (uint32_t val, int from_bits, int to_bits)
  return result;
  }

 +uint16_t pixman_float_to_unorm (float f, int n_bits);
 +float pixman_unorm_to_float (uint16_t u, int n_bits);
 +
  /*
   * Various debugging code
   */
 diff --git a/pixman/pixman-utils.c b/pixman/pixman-utils.c
 index e4a9730..4f9db29 100644
 --- a/pixman/pixman-utils.c
 +++ b/pixman/pixman-utils.c
 @@ -162,6 +162,113 @@ pixman_expand (uint64_t *   dst,
  }
  }

 +static force_inline uint16_t
 +float_to_unorm (float f, int n_bits)
 +{
 +uint32_t u;
 +
 +if (f > 1.0)
 +   f = 1.0;
 +if (f < 0.0)
 +   f = 0.0;
 +
 +u = f * (1 << n_bits);
 +u -= (u >> n_bits);
 +
 +return u;
 +}
 +
 +static force_inline float
 +unorm_to_float (uint16_t u, int n_bits)
 +{
 +uint32_t m = ((1 << n_bits) - 1);
 +
 +return (u & m) * (1.f / (float)m);
 +}
 +
 +/*
 + * This function expands images from a8r8g8b8 to argb_t.  To preserve
 + * precision, it needs to know from which source format the a8r8g8b8 pixels
 + * originally came.
 + *
 + * For example, if the source was PIXMAN_x1r5g5b5 and the red component
 + * contained bits 12345, then the 8-bit value is 12345123.  To correctly
 + * expand this to floating point, it should be 12345 / 31.0 and not
 + * 12345123 / 255.0.
 + */
 +void
 +pixman_expand_to_float (argb_t   *dst,
 +   const uint32_t   *src,
 +   pixman_format_code_t  format,
 +   int   width)
 +{
 +int a_size, r_size, g_size, b_size;
 +int a_shift, r_shift, g_shift, b_shift;
 +int i;
 +
 +if (!PIXMAN_FORMAT_VIS (format))
 +   format = PIXMAN_a8r8g8b8;
 +
 +/*
 + * Determine the sizes of each component and the masks and shifts
 + * required to extract them from the source pixel.
 + */
 +

[Pixman] [PATCH] sse2: mark pack_565_2x128_128 as static force_inline

2012-09-24 Thread Matt Turner

---
 pixman/pixman-sse2.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
index e273a95..cf21ef8 100644
--- a/pixman/pixman-sse2.c
+++ b/pixman/pixman-sse2.c
@@ -146,7 +146,7 @@ pack_565_2packedx128_128 (__m128i lo, __m128i hi)
 return _mm_packs_epi32 (t0, t1);
 }
 
-__m128i
+static force_inline __m128i
 pack_565_2x128_128 (__m128i lo, __m128i hi)
 {
 __m128i data;
-- 
1.7.8.6


[Pixman] [PATCH] build: Remove useless DEP_CFLAGS/DEP_LIBS variables

2012-09-16 Thread Matt Turner

Reduces the size of the generated pixman/Makefile from 46k to 41k.
---
 configure.ac   |2 --
 pixman-1.pc.in |4 ++--
 pixman/Makefile.am |   23 +++++------------------
 3 files changed, 7 insertions(+), 22 deletions(-)

diff --git a/configure.ac b/configure.ac
index e3a5ff9..5fda547 100644
--- a/configure.ac
+++ b/configure.ac
@@ -796,8 +796,6 @@ AM_CONDITIONAL(HAVE_GTK, [test x$enable_gtk = xyes])
 
 AC_SUBST(GTK_CFLAGS)
 AC_SUBST(GTK_LIBS)
-AC_SUBST(DEP_CFLAGS)
-AC_SUBST(DEP_LIBS)
 
 dnl =
 dnl posix_memalign, sigaction, alarm, gettimeofday
diff --git a/pixman-1.pc.in b/pixman-1.pc.in
index 936d95d..e3b9711 100644
--- a/pixman-1.pc.in
+++ b/pixman-1.pc.in
@@ -6,6 +6,6 @@ includedir=@includedir@
 Name: Pixman
 Description: The pixman library (version 1)
 Version: @PACKAGE_VERSION@
-Cflags: -I${includedir}/pixman-1 @DEP_CFLAGS@
-Libs: -L${libdir} -lpixman-1 @DEP_LIBS@
+Cflags: -I${includedir}/pixman-1
+Libs: -L${libdir} -lpixman-1
 
diff --git a/pixman/Makefile.am b/pixman/Makefile.am
index 843711a..270d65e 100644
--- a/pixman/Makefile.am
+++ b/pixman/Makefile.am
@@ -3,7 +3,7 @@ include $(top_srcdir)/pixman/Makefile.sources
 lib_LTLIBRARIES = libpixman-1.la
 
 libpixman_1_la_LDFLAGS = -version-info $(LT_VERSION_INFO) -no-undefined @PTHREAD_LDFLAGS@ 
-libpixman_1_la_LIBADD = @PTHREAD_LIBS@ @DEP_LIBS@ -lm
+libpixman_1_la_LIBADD = @PTHREAD_LIBS@ -lm
 libpixman_1_la_SOURCES = $(libpixman_sources) $(libpixman_headers)
 
 libpixmanincludedir = $(includedir)/pixman-1
@@ -27,8 +27,7 @@ if USE_X86_MMX
 noinst_LTLIBRARIES += libpixman-mmx.la
 libpixman_mmx_la_SOURCES = \
pixman-mmx.c
-libpixman_mmx_la_CFLAGS = $(DEP_CFLAGS) $(MMX_CFLAGS)
-libpixman_mmx_la_LIBADD = $(DEP_LIBS)
+libpixman_mmx_la_CFLAGS = $(MMX_CFLAGS)
 libpixman_1_la_LDFLAGS += $(MMX_LDFLAGS)
 libpixman_1_la_LIBADD += libpixman-mmx.la
 
@@ -41,8 +40,7 @@ noinst_LTLIBRARIES += libpixman-vmx.la
 libpixman_vmx_la_SOURCES = \
pixman-vmx.c \
pixman-combine32.h
-libpixman_vmx_la_CFLAGS = $(DEP_CFLAGS) $(VMX_CFLAGS)
-libpixman_vmx_la_LIBADD = $(DEP_LIBS)
+libpixman_vmx_la_CFLAGS = $(VMX_CFLAGS)
 libpixman_1_la_LIBADD += libpixman-vmx.la
 
 ASM_CFLAGS_vmx=$(VMX_CFLAGS)
@@ -53,8 +51,7 @@ if USE_SSE2
 noinst_LTLIBRARIES += libpixman-sse2.la
 libpixman_sse2_la_SOURCES = \
pixman-sse2.c
-libpixman_sse2_la_CFLAGS = $(DEP_CFLAGS) $(SSE2_CFLAGS)
-libpixman_sse2_la_LIBADD = $(DEP_LIBS)
+libpixman_sse2_la_CFLAGS = $(SSE2_CFLAGS)
 libpixman_1_la_LDFLAGS += $(SSE2_LDFLAGS)
 libpixman_1_la_LIBADD += libpixman-sse2.la
 
@@ -68,8 +65,6 @@ libpixman_arm_simd_la_SOURCES = \
pixman-arm-simd.c   \
pixman-arm-common.h \
pixman-arm-simd-asm.S
-libpixman_arm_simd_la_CFLAGS = $(DEP_CFLAGS)
-libpixman_arm_simd_la_LIBADD = $(DEP_LIBS)
 libpixman_1_la_LIBADD += libpixman-arm-simd.la
 
 ASM_CFLAGS_arm_simd=
@@ -84,8 +79,6 @@ libpixman_arm_neon_la_SOURCES = \
 pixman-arm-neon-asm.S  \
pixman-arm-neon-asm-bilinear.S \
 pixman-arm-neon-asm.h
-libpixman_arm_neon_la_CFLAGS = $(DEP_CFLAGS)
-libpixman_arm_neon_la_LIBADD = $(DEP_LIBS)
 libpixman_1_la_LIBADD += libpixman-arm-neon.la
 
 ASM_CFLAGS_arm_neon=
@@ -106,7 +99,6 @@ libpixman_iwmmxt_la_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC \
 $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=link $(CCLD) \
$(CFLAGS) $(IWMMXT_CFLAGS) $(AM_LDFLAGS) \
$(LDFLAGS) -o $@
-libpixman_iwmmxt_la_LIBADD = $(DEP_LIBS)
 
 libpixman-iwmmxt.la: libpixman_iwmmxt_la-pixman-mmx.lo $(libpixman_iwmmxt_la_DEPENDENCIES) 
 	$(AM_V_CCLD)$(libpixman_iwmmxt_la_LINK) libpixman_iwmmxt_la-pixman-mmx.lo $(libpixman_iwmmxt_la_LIBADD) $(LIBS)
@@ -121,8 +113,6 @@ libpixman_mips_dspr2_la_SOURCES = \
 pixman-mips-dspr2-asm.S \
 pixman-mips-dspr2-asm.h \
 pixman-mips-memcpy-asm.S
-libpixman_mips_dspr2_la_CFLAGS = $(DEP_CFLAGS)
-libpixman_mips_dspr2_la_LIBADD = $(DEP_LIBS)
 libpixman_1_la_LIBADD += libpixman-mips-dspr2.la
 
 ASM_CFLAGS_mips_dspr2=
@@ -132,12 +122,9 @@ endif
 if USE_LOONGSON_MMI
 noinst_LTLIBRARIES += libpixman-loongson-mmi.la
 libpixman_loongson_mmi_la_SOURCES = pixman-mmx.c loongson-mmintrin.h
-libpixman_loongson_mmi_la_CFLAGS = $(DEP_CFLAGS) $(LS_CFLAGS)
-libpixman_loongson_mmi_la_LIBADD = $(DEP_LIBS)
+libpixman_loongson_mmi_la_CFLAGS = $(LS_CFLAGS)
 libpixman_1_la_LDFLAGS += $(LS_LDFLAGS)
 libpixman_1_la_LIBADD += libpixman-loongson-mmi.la
-
-ASM_CFLAGS_ls=$(LS_CFLAGS)
 endif
 
 .c.s : $(libpixmaninclude_HEADERS) $(BUILT_SOURCES)
-- 
1.7.8.6


[Pixman] [PATCH] build: Support building Loongson code for 2e, 2f, 3a

2012-09-16 Thread Matt Turner

pixman/Makefile.am contains a hack that allows pixman-mmx.c to
be compiled with different overriding CFLAGS, since automake
seriously doesn't have a way to do this. Seriously stupid.

It works by defining a new rule and recursively calling make
with modified CFLAGS set.

Note the difference between the USE_LOONGSON* and HAVE_LOONGSON*
preprocessor macros.

Cc: Cyril Brulebois <k...@debian.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51451
---
This patch applies on top of the previous.

Although the build system works, linking unfortunately doesn't. gcc
refuses to link object files that have been compiled with different
-march=loongson* options together. This sucks.

I'm not sure what to do. I guess I could make them separate shared
objects or even dlopen them, but that really sucks, especially when
I don't see a reason why gcc shouldn't be able to link this code
together.

Anyone have any other ideas? It's really obnoxious that there's not
just a simple -mloongson-mmi flag irrespective of -march=...

 configure.ac|   87 ++
 pixman/Makefile.am  |   36 +---
 pixman/pixman-mips.c|   16 +++-
 pixman/pixman-mmx.c |   10 +-
 pixman/pixman-private.h |   13 +++
 5 files changed, 146 insertions(+), 16 deletions(-)

diff --git a/configure.ac b/configure.ac
index 5fda547..f3804ba 100644
--- a/configure.ac
+++ b/configure.ac
@@ -273,21 +273,27 @@ PIXMAN_CHECK_CFLAG([-xldscope=hidden], [dnl
 dnl ===
 dnl Check for Loongson Multimedia Instructions
 
-if test x$LS_CFLAGS = x ; then
-LS_CFLAGS=-march=loongson2f
+if test x$LS2E_CFLAGS = x ; then
+LS2E_CFLAGS=-march=loongson2e
+fi
+if test x$LS2F_CFLAGS = x ; then
+LS2F_CFLAGS=-march=loongson2f
+fi
+if test x$LS3A_CFLAGS = x ; then
+LS3A_CFLAGS=-march=loongson3a
 fi
 
 have_loongson_mmi=no
 AC_MSG_CHECKING(whether to use Loongson MMI assembler)
 
 xserver_save_CFLAGS=$CFLAGS
-CFLAGS=" $LS_CFLAGS $CFLAGS -I$srcdir"
+CFLAGS=" $LS2F_CFLAGS $CFLAGS -I$srcdir"
 AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
 #ifndef __mips_loongson_vector_rev
 #error "Loongson Multimedia Instructions are only available on Loongson"
 #endif
 #if defined(__GNUC__) && (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 4))
-#error "Need GCC >= 4.4 for Loongson MMI compilation"
+#error "Need GCC >= 4.4 for Loongson 2e/f MMI compilation"
 #endif
 #include "pixman/loongson-mmintrin.h"
 int main () {
@@ -299,29 +305,95 @@ int main () {
 __m64 c = _mm_srli_pi16 (a.v, b);
 return 0;
 }]])], have_loongson_mmi=yes)
+have_loongson2e_mmi=$have_loongson_mmi
+have_loongson2f_mmi=$have_loongson_mmi
+CFLAGS=$xserver_save_CFLAGS
+
+xserver_save_CFLAGS=$CFLAGS
+CFLAGS=" $LS3A_CFLAGS $CFLAGS -I$srcdir"
+AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
+#ifndef __mips_loongson_vector_rev
+#error "Loongson Multimedia Instructions are only available on Loongson"
+#endif
+#if defined(__GNUC__) && (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 6))
+#error "Need GCC >= 4.6 for Loongson 3A MMI compilation"
+#endif
+#include "pixman/loongson-mmintrin.h"
+int main () {
+union {
+__m64 v;
+char c[8];
+} a = { .c = {1, 2, 3, 4, 5, 6, 7, 8} };
+int b = 4;
+__m64 c = _mm_srli_pi16 (a.v, b);
+return 0;
+}]])], have_loongson3a_mmi=yes)
 CFLAGS=$xserver_save_CFLAGS
 
 AC_ARG_ENABLE(loongson-mmi,
[AC_HELP_STRING([--disable-loongson-mmi],
[disable Loongson MMI fast paths])],
[enable_loongson_mmi=$enableval], [enable_loongson_mmi=auto])
+AC_ARG_ENABLE(loongson2e-mmi,
+   [AC_HELP_STRING([--disable-loongson2e-mmi],
+   [do not build Loongson MMI fast paths for 2e])],
+   [enable_loongson2e_mmi=$enableval], [enable_loongson2e_mmi=auto])
+AC_ARG_ENABLE(loongson2f-mmi,
+   [AC_HELP_STRING([--disable-loongson2f-mmi],
+   [do not build Loongson MMI fast paths for 2f])],
+   [enable_loongson2f_mmi=$enableval], [enable_loongson2f_mmi=auto])
+AC_ARG_ENABLE(loongson3a-mmi,
+   [AC_HELP_STRING([--disable-loongson3a-mmi],
+   [do not build Loongson MMI fast paths for 3a])],
+   [enable_loongson3a_mmi=$enableval], [enable_loongson3a_mmi=auto])
 
 if test $enable_loongson_mmi = no ; then
have_loongson_mmi=disabled
 fi
+if test $enable_loongson2e_mmi = no ; then
+   have_loongson2e_mmi=disabled
+fi
+if test $enable_loongson2f_mmi = no ; then
+   have_loongson2f_mmi=disabled
+fi
+if test $enable_loongson3a_mmi = no ; then
+   have_loongson3a_mmi=disabled
+fi
 
 if test $have_loongson_mmi = yes ; then
+   loongson_msg=yes:
AC_DEFINE(USE_LOONGSON_MMI, 1, [use Loongson Multimedia Instructions])
+   if test $have_loongson2e_mmi = yes ; then
+       loongson_msg="$loongson_msg 2e"
+       AC_DEFINE(HAVE_LOONGSON2E_MMI, 1, [use Loongson 2e Multimedia Instructions])
+   fi
+   if test $have_loongson2f_mmi = yes ; then
+       loongson_msg="$loongson_msg 2f"
+

Re: [Pixman] [PATCH] Make pixman-mmx.c compile on x86-32 without optimization

2012-07-11 Thread Matt Turner

On Mon, Jul 9, 2012 at 10:19 PM, Matt Turner <matts...@gmail.com> wrote:
> Works for me.

On second glance, did I just make a mistake in b87cd1f and write ifdef
instead of ifndef?

Re: [Pixman] [PATCH] Make pixman-mmx.c compile on x86-32 without optimization

2012-07-09 Thread Matt Turner

On Mon, Jul 9, 2012 at 7:31 AM, Søren Sandmann <sandm...@cs.au.dk> wrote:
> From: Søren Sandmann Pedersen <s...@redhat.com>
>
> When not optimizing, write _mm_shuffle_pi16() as a statement
> expression with inline assembly. That way we avoid
> __builtin_ia32_pshufw(), which is only available when compiling with
> -msse, while still allowing the non-optimizing gcc to understand that
> the second argument is a compile time constant.
>
> Cc: matts...@gmail.com
> ---
>  pixman/pixman-mmx.c |   13 +++--
>  1 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
> index 5441d6b..74a5e87 100644
> --- a/pixman/pixman-mmx.c
> +++ b/pixman/pixman-mmx.c
> @@ -105,8 +105,17 @@ _mm_shuffle_pi16 (__m64 __A, int8_t const __N)
>      return ret;
>  }
>  #  else
> -#   define _mm_shuffle_pi16(A, N) \
> -    ((__m64) __builtin_ia32_pshufw ((__v4hi)(__m64)(A), (int)(N)))
> +#   define _mm_shuffle_pi16(A, N)                      \
> +    ({                                                  \
> +        __m64 ret;                                      \
> +                                                        \
> +        asm ("pshufw %2, %1, %0\n\t"                    \
> +             : "=y" (ret)                               \
> +             : "y" (A), "K" ((const int8_t)N)           \
> +        );                                              \
> +                                                        \
> +        ret;                                            \
> +    })
>  #  endif
>  # endif
>  #endif
> --
> 1.7.4

Works for me.
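
For readers who have not met the instruction behind _mm_shuffle_pi16(), here is a portable scalar sketch of what pshufw computes. It is not pixman code: the name shuffle_pi16_model is made up for illustration, and the 64-bit integer merely stands in for an MMX register with lane 0 in the low 16 bits. The selector is encoded as an 8-bit immediate in the instruction itself, which is why the compiler must be able to see it as a compile-time constant.

```c
#include <stdint.h>

/* Hypothetical helper, not part of pixman: scalar model of pshufw.
 * v holds four 16-bit lanes (lane 0 in the low bits); imm is the
 * 8-bit selector, two bits per destination lane. */
static uint64_t
shuffle_pi16_model (uint64_t v, uint8_t imm)
{
    uint64_t ret = 0;
    int i;

    for (i = 0; i < 4; i++)
    {
        /* bits [2i+1:2i] of imm pick the source lane for result lane i */
        unsigned src_lane = (imm >> (2 * i)) & 3;
        uint64_t lane = (v >> (16 * src_lane)) & 0xffff;

        ret |= lane << (16 * i);
    }

    return ret;
}
```

With this model, a selector of 0xE4 leaves the lanes in place, 0x1B reverses them, and 0x00 broadcasts lane 0 to all four positions.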

Re: [Pixman] [BUG pixman] f9c91ee2f27eaea68d8c3a130bf7d4bc0c860834 breaks compilation

2012-07-09 Thread Matt Turner

On Mon, Jul 9, 2012 at 1:55 AM, Knut Petersen <knut_peter...@t-online.de> wrote:
> Søren, the bad commit was supposed to fix a gcc -O0 compile problem, but it
> breaks gcc -O0 compilation here. Reverting f9c91ee2 fixes the problem for me.

Is this build automated?

If it's an automated build that runs the test suite, you're actually
spending way more time running the test suite when built with -O0 than
you save by building with -O0.

Re: [Pixman] [PATCH 1/5] mmx: add scaled bilinear src_8888_8888

2012-07-01 Thread Matt Turner

On Sun, Jul 1, 2012 at 12:56 PM, Søren Sandmann <sandm...@cs.au.dk> wrote:
> Matt Turner <matts...@gmail.com> writes:
>
>> +    SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8,  a8r8g8b8, mmx_8888_8888 ),
>> +    SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8,  x8r8g8b8, mmx_8888_8888 ),
>> +    SIMPLE_BILINEAR_FAST_PATH (SRC, x8r8g8b8,  x8r8g8b8, mmx_8888_8888 ),
>> +
>
> Looks like the abgr entries are missing.
>
>
> Soren

Indeed. They're missing from SSE2 as well. I'll fix that up when I push it.

Re: [Pixman] [PATCH] Use a compile-time constant for the K constraint in the MMX detection.

2012-07-01 Thread Matt Turner

On Sun, Jul 1, 2012 at 5:03 PM, Søren Sandmann <sandm...@cs.au.dk> wrote:
> From: Søren Sandmann Pedersen <s...@redhat.com>
>
> When compiling with -O0, gcc doesn't understand that in
>
>     signed char x = 0;
>
>     ...
>
>     asm (...,
>          : "K" (x));
>
> x is constant. Fix this by using an immediate constant instead of a
> variable.
> ---
>  configure.ac |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index 2b9d1ba..36f423e 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -351,12 +351,11 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
>  int main () {
>      __m64 v = _mm_cvtsi32_si64 (1);
>      __m64 w;
> -    signed char x = 0;
>
>      /* Some versions of clang will choke on K */
>      asm ("pshufw %2, %1, %0\n\t"
>          : "=y" (w)
> -        : "y" (v), "K" (x)
> +        : "y" (v), "K" (5)
>      );
>
>      return _mm_cvtsi64_si32 (v);
> --
> 1.7.10.4

Seems like the smart thing to me.

Re: [Pixman] [ANNOUNCE] pixman release 0.26.2 now available

2012-06-30 Thread Matt Turner

On Sat, Jun 30, 2012 at 3:04 AM, Andreas Radke <a.ra...@arcor.de> wrote:
> Somehow I get different checksums:
>
> [andyrtr@workstation64 trunk]$ md5sum pixman-0.26.2.tar.*
> 6b3e4c5300adb893a2baa9631c23efb2  pixman-0.26.2.tar.bz2
> 276242da5b3af1258d072cf205d18f0b  pixman-0.26.2.tar.gz
>
> Can you confirm the sums please again?

Confirmed.

Not sure exactly how this happened. It was the first release I've
done, and I hit a couple of permissions hiccups uploading the
tarballs to cairo.fdo.

Sorry about that. The .sha1 files should be right though.

[Pixman] [ANNOUNCE] pixman release 0.26.2 now available

2012-06-29 Thread Matt Turner

A new pixman release 0.26.2 is now available. This is a stable release. It
contains some bug fixes, custom build rules for ARM/iwMMXt, and an important
bug fix for MMX/x86.

tar.gz:
http://cairographics.org/releases/pixman-0.26.2.tar.gz
http://xorg.freedesktop.org/archive/individual/lib/pixman-0.26.2.tar.gz

tar.bz2:
http://xorg.freedesktop.org/archive/individual/lib/pixman-0.26.2.tar.bz2

Hashes:
MD5:  69af3cf4ce6515ee01b0960edf8009fb  pixman-0.26.2.tar.gz
MD5:  2b57fb3038be4890ec433d11176280cd  pixman-0.26.2.tar.bz2
SHA1: ba71d029d174aa8b9d23b1072ab76e6b4ea3de59  pixman-0.26.2.tar.gz
SHA1: c7cdb5803061ee6614acc66258b0339ad4e52314  pixman-0.26.2.tar.bz2

GPG signature:
http://cairographics.org/releases/pixman-0.26.2.tar.gz.sha1.asc
(signed by Matt Turner <matts...@gmail.com>)

Git:
git://git.freedesktop.org/git/pixman
tag: pixman-0.26.2

Log:
Matt Turner (6):
  Post-release version bump to 0.26.1
  mmx: add missing _mm_empty calls
  autotools: use custom build rule to build iwMMXt code
  configure.ac: add iwmmxt2 configure flag
  Fix distcheck due to custom iwMMXt rules
  Pre-release version bump to 0.26.2

Siarhei Siamashka (2):
  test: OpenMP 2.5 requires signed loop iteration variables
  test: fix bisecting issue in fuzzer-find-diff.pl

Søren Sandmann Pedersen (1):
  test: Add missing break in stress-test.c




[Pixman] [PATCH] mmx: Use expand_alpha instead of mask/shift

2012-06-29 Thread Matt Turner

---
 pixman/pixman-mmx.c |8 ++--
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index bff8585..071cdfd 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -1618,9 +1618,7 @@ mmx_composite_over_8888_n_8888 (pixman_implementation_t *imp,
     PIXMAN_IMAGE_GET_LINE (src_image, src_x, src_y, uint32_t, src_stride, src_line, 1);
 
     mask = _pixman_image_get_solid (imp, mask_image, dest_image->bits.format);
-    mask &= 0xff000000;
-    mask = mask | mask >> 8 | mask >> 16 | mask >> 24;
-    vmask = load8888 (&mask);
+    vmask = expand_alpha (load8888 (&mask));
 
     while (height--)
     {
@@ -1689,9 +1687,7 @@ mmx_composite_over_x888_n_8888 (pixman_implementation_t *imp,
     PIXMAN_IMAGE_GET_LINE (src_image, src_x, src_y, uint32_t, src_stride, src_line, 1);
     mask = _pixman_image_get_solid (imp, mask_image, dest_image->bits.format);
 
-    mask &= 0xff000000;
-    mask = mask | mask >> 8 | mask >> 16 | mask >> 24;
-    vmask = load8888 (&mask);
+    vmask = expand_alpha (load8888 (&mask));
     srca = MC (4x00ff);
 
 while (height--)
-- 
1.7.3.4
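
For reference, the mask-and-shift sequence removed by the patch above can be written as a small scalar function. This is only an illustrative sketch: replicate_alpha is a made-up name, and it assumes the solid mask is an a8r8g8b8 value with alpha in the top byte. The patch does the equivalent broadcast on the MMX register through expand_alpha instead of on the 32-bit integer.

```c
#include <stdint.h>

/* Hypothetical helper, not part of pixman: copy the alpha byte of an
 * a8r8g8b8 pixel into all four byte positions, as the removed
 * mask/shift lines did before the value was loaded into a register. */
static uint32_t
replicate_alpha (uint32_t pixel)
{
    uint32_t mask = pixel & 0xff000000;   /* keep only the alpha byte */

    /* OR the alpha byte into the red, green and blue positions */
    return mask | mask >> 8 | mask >> 16 | mask >> 24;
}
```

For example, a pixel with alpha 0x80 yields 0x80808080 regardless of its colour channels.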


Re: [Pixman] [PATCH 00/10] Cleanups to CPU detection

2012-06-29 Thread Matt Turner

On Fri, Jun 29, 2012 at 5:20 PM, Alan Coopersmith
<alan.coopersm...@oracle.com> wrote:
> On 06/29/12 01:44 PM, Søren Sandmann Pedersen wrote:
>> I was looking at making use of some of the newer x86 SIMD instruction
>> sets and realized that (a) we don't ever call cpuid on x86-64, we just
>> assume that MMX and SSE2 are present,
>
> I thought the amd64 ABI guaranteed MMX & SSE2 would always be present - is
> that not the case?

SSE2 seems to be required by the ABI, but I don't know why MMX would
(maybe x87 FPU is, and by extension MMX?).

I'm guessing here -- but since newer AMD chips dropped 3DNow, I would
think it'd be possible for future chips to drop MMX as well.
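
As a rough sketch of what asking the CPU instead of assuming might look like, the following queries CPUID leaf 1 through the <cpuid.h> header shipped with GCC and clang. It is not the detection code from the patch series: x86_simd_features and the FEATURE_* names are invented here, and on non-x86 builds the function simply reports that it cannot tell.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* Hypothetical flag names, used only in this sketch */
#define FEATURE_MMX  (1 << 0)
#define FEATURE_SSE2 (1 << 1)

/* Returns a bitmask of FEATURE_* flags, or -1 if CPUID cannot be queried. */
static int
x86_simd_features (void)
{
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    int features = 0;

    /* __get_cpuid returns 0 when the requested leaf is unsupported */
    if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
        return -1;

    if (edx & (1u << 23))   /* CPUID.1:EDX bit 23 = MMX */
        features |= FEATURE_MMX;
    if (edx & (1u << 26))   /* CPUID.1:EDX bit 26 = SSE2 */
        features |= FEATURE_SSE2;

    return features;
#else
    return -1;
#endif
}
```

A caller would test the returned mask for FEATURE_MMX or FEATURE_SSE2 before selecting the corresponding fast paths, and fall back to the generic code when the result is -1.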

Re: [Pixman] [PATCH 3/5] mmx: add scaled bilinear over_8888_8_8888

2012-06-29 Thread Matt Turner

On Wed, Jun 27, 2012 at 10:38 PM, Matt Turner <matts...@gmail.com> wrote:
> Reduces runtime of firefox-fishtank trace from 1510 to 1030 seconds on
> Loongson.
>
> ---
>  pixman/pixman-mmx.c |   84 +++
>  1 files changed, 84 insertions(+), 0 deletions(-)

Loongson:
image firefox-fishtank 1665.163 1670.370   0.17%   3/3
image firefox-fishtank 1037.738 1040.218   0.19%   3/3

ARM/iwMMXt:
image firefox-fishtank 2042.723 2045.308   0.10%   3/3
image firefox-fishtank 1487.282 1492.640   0.17%   3/3

Re: [Pixman] [PATCH 5/5] mmx: optimize bilinear function when using 7-bit precision

2012-06-29 Thread Matt Turner

On Wed, Jun 27, 2012 at 10:38 PM, Matt Turner <matts...@gmail.com> wrote:
> ---
> Reduces runtime of firefox-planet-gnome trace from 156 to 153 seconds on
> Loongson.
>
> Increases runtime of firefox-fishtank trace from 1030 to 1060 seconds. Why?
>
>  pixman/pixman-mmx.c |   45 -
>  1 files changed, 32 insertions(+), 13 deletions(-)

Loongson:
image firefox-fishtank 1037.738 1040.218   0.19%   3/3
image firefox-fishtank 1056.611 1057.581   0.20%   3/3

ARM/iwMMXt:
image firefox-fishtank 1487.282 1492.640   0.17%   3/3
image firefox-fishtank 1363.913 1364.366   0.11%   3/3

I'm mostly okay with the slight decrease in performance on Loongson,
given the speed-up on ARM (and on x86). Maybe look at it later..

[Pixman] Bilinear scaling patches for MMX

2012-06-27 Thread Matt Turner


These five patches implement the same bilinear scaling compositing functions
as provided by the SSE2 code. They pass the test suite on x86, Loongson, and
iwMMXt, but I haven't done extensive benchmarking yet on iwMMXt.

The fifth patch optimizes the functions for 7-bit bilinear interpolation, but
doesn't give the performance differences I would expect. firefox-planet-gnome
performance is increased by ~1% and firefox-fishtank performance is reduced.

Matt
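
To make the 7-bit case concrete, here is a scalar sketch of bilinear interpolation for a single 8-bit channel. It is an illustration rather than the MMX code in the patches: the names bilinear_channel and left_weight_xor are made up, and it assumes the vertical weights wt and wb sum to 128 and that wx is the 7-bit horizontal weight. It also shows the identity the 7-bit path appears to rely on: for 0 <= wx <= 127, (wx ^ 127) + 1 equals 128 - wx, so both horizontal weights come from one xor and one add.

```c
#include <stdint.h>

#define BILINEAR_BITS 7
#define BSHIFT (1 << BILINEAR_BITS)   /* 128 */
#define BMSK   (BSHIFT - 1)           /* 127 */

/* Hypothetical helper: weight of the left pixel, computed with the
 * xor/add trick; equal to BSHIFT - wx for wx in [0, BMSK]. */
static unsigned
left_weight_xor (unsigned wx)
{
    return (wx ^ BMSK) + 1;
}

/* Hypothetical helper: interpolate one channel of a 2x2 block.
 * tl/tr are the top-left/top-right samples, bl/br the bottom ones,
 * wb is the bottom weight and wx the right weight, both 7-bit. */
static uint8_t
bilinear_channel (uint8_t tl, uint8_t tr, uint8_t bl, uint8_t br,
                  unsigned wb, unsigned wx)
{
    unsigned wt = BSHIFT - wb;

    /* vertical pass: each sum is at most 255 * 128, so it fits 16 bits */
    unsigned left  = tl * wt + bl * wb;
    unsigned right = tr * wt + br * wb;

    /* horizontal pass, then drop the two 7-bit weight scalings */
    return (uint8_t) ((left * left_weight_xor (wx) + right * wx)
                      >> (2 * BILINEAR_BITS));
}
```

Because the vertical sums stay within 16 bits when the weights are 7 bits wide, the horizontal pass can be done as a pair of multiply-adds per pixel, which is what the _mm_madd_pi16 calls in the fifth patch express.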

[Pixman] [PATCH 1/5] mmx: add scaled bilinear src_8888_8888

2012-06-27 Thread Matt Turner

---
 pixman/loongson-mmintrin.h |   73 ++
 pixman/pixman-mmx.c|   93 
 2 files changed, 166 insertions(+), 0 deletions(-)

diff --git a/pixman/loongson-mmintrin.h b/pixman/loongson-mmintrin.h
index 1a114fe..f0931ac 100644
--- a/pixman/loongson-mmintrin.h
+++ b/pixman/loongson-mmintrin.h
@@ -45,6 +45,28 @@ _mm_setzero_si64 (void)
 }
 
 extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_add_pi16 (__m64 __m1, __m64 __m2)
+{
+   __m64 ret;
+   asm("paddh %0, %1, %2\n\t"
+      : "=f" (ret)
+      : "f" (__m1), "f" (__m2)
+   );
+   return ret;
+}
+
+extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_add_pi32 (__m64 __m1, __m64 __m2)
+{
+   __m64 ret;
+   asm("paddw %0, %1, %2\n\t"
+      : "=f" (ret)
+      : "f" (__m1), "f" (__m2)
+   );
+   return ret;
+}
+
+extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_adds_pu16 (__m64 __m1, __m64 __m2)
 {
    __m64 ret;
@@ -150,6 +172,35 @@ _mm_packs_pu16 (__m64 __m1, __m64 __m2)
 }
 
 extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_packs_pi32 (__m64 __m1, __m64 __m2)
+{
+   __m64 ret;
+   asm("packsswh %0, %1, %2\n\t"
+      : "=f" (ret)
+      : "f" (__m1), "f" (__m2)
+   );
+   return ret;
+}
+
+extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set_pi16 (uint16_t __w3, uint16_t __w2, uint16_t __w1, uint16_t __w0)
+{
+   uint64_t val = ((uint64_t)__w3 << 48)
+                | ((uint64_t)__w2 << 32)
+                | ((uint64_t)__w1 << 16)
+                | ((uint64_t)__w0 <<  0);
+   return *(__m64 *)&val;
+}
+
+extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set_pi32 (unsigned __i1, unsigned __i0)
+{
+   uint64_t val = ((uint64_t)__i1 << 32)
+                | ((uint64_t)__i0 <<  0);
+   return *(__m64 *)&val;
+}
+
+extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_shuffle_pi16 (__m64 __m, int64_t __n)
 {
    __m64 ret;
@@ -193,6 +244,17 @@ _mm_srli_pi16 (__m64 __m, int64_t __count)
 }
 
 extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_srli_pi32 (__m64 __m, int64_t __count)
+{
+   __m64 ret;
+   asm("psrlw %0, %1, %2\n\t"
+      : "=f" (ret)
+      : "f" (__m), "f" (*(__m64 *)&__count)
+   );
+   return ret;
+}
+
+extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_srli_si64 (__m64 __m, int64_t __count)
 {
    __m64 ret;
@@ -204,6 +266,17 @@ _mm_srli_si64 (__m64 __m, int64_t __count)
 }
 
 extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sub_pi16 (__m64 __m1, __m64 __m2)
+{
+   __m64 ret;
+   asm("psubh %0, %1, %2\n\t"
+      : "=f" (ret)
+      : "f" (__m1), "f" (__m2)
+   );
+   return ret;
+}
+
+extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_unpackhi_pi8 (__m64 __m1, __m64 __m2)
 {
    __m64 ret;
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index d869c04..904529f 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -42,6 +42,7 @@
 #endif
 #include "pixman-private.h"
 #include "pixman-combine32.h"
+#include "pixman-inlines.h"
 
 #define no_vERBOSE
 
@@ -3506,6 +3507,94 @@ mmx_composite_over_reverse_n_8888 (pixman_implementation_t *imp,
     _mm_empty ();
 }
 
+#define BSHIFT ((1 << BILINEAR_INTERPOLATION_BITS))
+
+#define BILINEAR_DECLARE_VARIABLES                                          \
+    const __m64 mm_wt = _mm_set_pi16 (wt, wt, wt, wt);                      \
+    const __m64 mm_wb = _mm_set_pi16 (wb, wb, wb, wb);                      \
+    const __m64 mm_BSHIFT = _mm_set_pi16 (BSHIFT, BSHIFT, BSHIFT, BSHIFT);  \
+    const __m64 mm_ux = _mm_set_pi16 (unit_x, unit_x, unit_x, unit_x);      \
+    const __m64 mm_zero = _mm_setzero_si64 ();                              \
+    __m64 mm_x = _mm_set_pi16 (vx, vx, vx, vx)
+
+#define BILINEAR_INTERPOLATE_ONE_PIXEL(pix)                                 \
+do {                                                                        \
+    /* fetch 2x2 pixel block into 2 mmx registers */                        \
+    __m64 t = ldq_u ((__m64 *)&src_top [pixman_fixed_to_int (vx)]);         \
+    __m64 b = ldq_u ((__m64 *)&src_bottom [pixman_fixed_to_int (vx)]);      \
+    vx += unit_x;                                                           \
+    /* vertical interpolation */                                            \
+    __m64 t_hi = _mm_mullo_pi16 (_mm_unpackhi_pi8 (t, mm_zero), mm_wt);     \
+    __m64 t_lo = _mm_mullo_pi16 (_mm_unpacklo_pi8 (t,

[Pixman] [PATCH 3/5] mmx: add scaled bilinear over_8888_8_8888

2012-06-27 Thread Matt Turner

Reduces runtime of firefox-fishtank trace from 1510 to 1030 seconds on Loongson.

---
 pixman/pixman-mmx.c |   84 +++
 1 files changed, 84 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index a504b60..ea732bb 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -3571,6 +3571,12 @@ do {                                                  \
     pix = lo;                                                               \
 } while (0)
 
+#define BILINEAR_SKIP_ONE_PIXEL()                                           \
+do {                                                                        \
+    vx += unit_x;                                                           \
+    mm_x = _mm_add_pi16 (mm_x, mm_ux);                                      \
+} while(0)
+
 static force_inline void
 scaled_bilinear_scanline_mmx_8888_8888_SRC (uint32_t *       dst,
                                             const uint32_t * mask,
@@ -3663,6 +3669,79 @@ FAST_BILINEAR_MAINLOOP_COMMON (mmx_8888_8888_normal_OVER,
                                scaled_bilinear_scanline_mmx_8888_8888_OVER,
                                uint32_t, uint32_t, uint32_t,
                                NORMAL, FLAG_NONE)
+
+static force_inline void
+scaled_bilinear_scanline_mmx_8888_8_8888_OVER (uint32_t *       dst,
+                                               const uint8_t  * mask,
+                                               const uint32_t * src_top,
+                                               const uint32_t * src_bottom,
+                                               int32_t          w,
+                                               int              wt,
+                                               int              wb,
+                                               pixman_fixed_t   vx,
+                                               pixman_fixed_t   unit_x,
+                                               pixman_fixed_t   max_vx,
+                                               pixman_bool_t    zero_src)
+{
+    BILINEAR_DECLARE_VARIABLES;
+    __m64 pix1, pix2;
+    uint32_t m;
+
+    while (w)
+    {
+        m = (uint32_t) *mask++;
+
+        if (m)
+        {
+            BILINEAR_INTERPOLATE_ONE_PIXEL (pix1);
+
+            if (m == 0xff && is_opaque (pix1))
+            {
+                store8888 (dst, pix1);
+            }
+            else
+            {
+                __m64 ms, md, ma, msa;
+
+                pix2 = load8888 (dst);
+                ma = expand_alpha_rev (to_m64 (m));
+                ms = _mm_unpacklo_pi8 (pix1, _mm_setzero_si64 ());
+                md = _mm_unpacklo_pi8 (pix2, _mm_setzero_si64 ());
+
+                msa = expand_alpha (ms);
+
+                store8888 (dst, (in_over (ms, msa, ma, md)));
+            }
+        }
+        else
+        {
+            BILINEAR_SKIP_ONE_PIXEL ();
+        }
+
+        w--;
+        dst++;
+    }
+
+    _mm_empty ();
+}
+
+FAST_BILINEAR_MAINLOOP_COMMON (mmx_8888_8_8888_cover_OVER,
+                               scaled_bilinear_scanline_mmx_8888_8_8888_OVER,
+                               uint32_t, uint8_t, uint32_t,
+                               COVER, FLAG_HAVE_NON_SOLID_MASK)
+FAST_BILINEAR_MAINLOOP_COMMON (mmx_8888_8_8888_pad_OVER,
+                               scaled_bilinear_scanline_mmx_8888_8_8888_OVER,
+                               uint32_t, uint8_t, uint32_t,
+                               PAD, FLAG_HAVE_NON_SOLID_MASK)
+FAST_BILINEAR_MAINLOOP_COMMON (mmx_8888_8_8888_none_OVER,
+                               scaled_bilinear_scanline_mmx_8888_8_8888_OVER,
+                               uint32_t, uint8_t, uint32_t,
+                               NONE, FLAG_HAVE_NON_SOLID_MASK)
+FAST_BILINEAR_MAINLOOP_COMMON (mmx_8888_8_8888_normal_OVER,
+                               scaled_bilinear_scanline_mmx_8888_8_8888_OVER,
+                               uint32_t, uint8_t, uint32_t,
+                               NORMAL, FLAG_HAVE_NON_SOLID_MASK)
+
 static uint32_t *
 mmx_fetch_x8r8g8b8 (pixman_iter_t *iter, const uint32_t *mask)
 {
@@ -3927,6 +4006,11 @@ static const pixman_fast_path_t mmx_fast_paths[] =
     SIMPLE_BILINEAR_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, mmx_8888_8888 ),
     SIMPLE_BILINEAR_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, mmx_8888_8888 ),
 
+    SIMPLE_BILINEAR_A8_MASK_FAST_PATH (OVER, a8r8g8b8, x8r8g8b8, mmx_8888_8_8888 ),
+    SIMPLE_BILINEAR_A8_MASK_FAST_PATH (OVER, a8b8g8r8, x8b8g8r8, mmx_8888_8_8888 ),
+    SIMPLE_BILINEAR_A8_MASK_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, mmx_8888_8_8888 ),
+    SIMPLE_BILINEAR_A8_MASK_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, mmx_8888_8_8888 ),
+
 { PIXMAN_OP_NONE },
 };
 
-- 
1.7.3.4


[Pixman] [PATCH 5/5] mmx: optimize bilinear function when using 7-bit precision

2012-06-27 Thread Matt Turner

---
Reduces runtime of firefox-planet-gnome trace from 156 to 153 seconds on 
Loongson.

Increases runtime of firefox-fishtank trace from 1030 to 1060 seconds. Why?

 pixman/pixman-mmx.c |   45 -
 1 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index ea732bb..bff8585 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -3526,11 +3526,14 @@ mmx_composite_over_reverse_n_8888 (pixman_implementation_t *imp,
 }
 
 #define BSHIFT ((1 << BILINEAR_INTERPOLATION_BITS))
+#define BMSK (BSHIFT - 1)
 
 #define BILINEAR_DECLARE_VARIABLES                                          \
     const __m64 mm_wt = _mm_set_pi16 (wt, wt, wt, wt);                      \
     const __m64 mm_wb = _mm_set_pi16 (wb, wb, wb, wb);                      \
     const __m64 mm_BSHIFT = _mm_set_pi16 (BSHIFT, BSHIFT, BSHIFT, BSHIFT);  \
+    const __m64 mm_addc7 = _mm_set_pi16 (0, 1, 0, 1);                       \
+    const __m64 mm_xorc7 = _mm_set_pi16 (0, BMSK, 0, BMSK);                 \
     const __m64 mm_ux = _mm_set_pi16 (unit_x, unit_x, unit_x, unit_x);      \
     const __m64 mm_zero = _mm_setzero_si64 ();                              \
     __m64 mm_x = _mm_set_pi16 (vx, vx, vx, vx)
@@ -3548,21 +3551,37 @@ do {                                                \
     __m64 b_lo = _mm_mullo_pi16 (_mm_unpacklo_pi8 (b, mm_zero), mm_wb);     \
     __m64 hi = _mm_add_pi16 (t_hi, b_hi);                                   \
     __m64 lo = _mm_add_pi16 (t_lo, b_lo);                                   \
-    /* calculate horizontal weights */                                      \
-    __m64 mm_wh_lo = _mm_sub_pi16 (mm_BSHIFT, _mm_srli_pi16 (mm_x,          \
+    if (BILINEAR_INTERPOLATION_BITS < 8)                                    \
+    {                                                                       \
+        /* calculate horizontal weights */                                  \
+        __m64 mm_wh = _mm_add_pi16 (mm_addc7, _mm_xor_si64 (mm_xorc7,       \
+                          _mm_srli_pi16 (mm_x,                              \
+                                         16 - BILINEAR_INTERPOLATION_BITS))); \
+        mm_x = _mm_add_pi16 (mm_x, mm_ux);                                  \
+        /* horizontal interpolation */                                      \
+        __m64 p = _mm_unpacklo_pi16 (lo, hi);                               \
+        __m64 q = _mm_unpackhi_pi16 (lo, hi);                               \
+        lo = _mm_madd_pi16 (p, mm_wh);                                      \
+        hi = _mm_madd_pi16 (q, mm_wh);                                      \
+    }                                                                       \
+    else                                                                    \
+    {                                                                       \
+        /* calculate horizontal weights */                                  \
+        __m64 mm_wh_lo = _mm_sub_pi16 (mm_BSHIFT, _mm_srli_pi16 (mm_x,      \
                                         16 - BILINEAR_INTERPOLATION_BITS)); \
-    __m64 mm_wh_hi = _mm_srli_pi16 (mm_x,                                   \
+        __m64 mm_wh_hi = _mm_srli_pi16 (mm_x,                               \
                                         16 - BILINEAR_INTERPOLATION_BITS);  \
-    mm_x = _mm_add_pi16 (mm_x, mm_ux);                                      \
-    /* horizontal interpolation */                                          \
-    __m64 mm_lo_lo = _mm_mullo_pi16 (lo, mm_wh_lo);                         \
-    __m64 mm_lo_hi = _mm_mullo_pi16 (hi, mm_wh_hi);                         \
-    __m64 mm_hi_lo = _mm_mulhi_pu16 (lo, mm_wh_lo);                         \
-    __m64 mm_hi_hi = _mm_mulhi_pu16 (hi, mm_wh_hi);                         \
-    lo = _mm_add_pi32 (_mm_unpacklo_pi16 (mm_lo_lo, mm_hi_lo),              \
-                       _mm_unpacklo_pi16 (mm_lo_hi, mm_hi_hi));             \
-    hi = _mm_add_pi32 (_mm_unpackhi_pi16 (mm_lo_lo, mm_hi_lo),              \
-                       _mm_unpackhi_pi16 (mm_lo_hi, mm_hi_hi));             \
+        mm_x = _mm_add_pi16 (mm_x, mm_ux);                                  \
+        /* horizontal interpolation */                                      \
+        __m64 mm_lo_lo = _mm_mullo_pi16 (lo, mm_wh_lo);                     \
+        __m64 mm_lo_hi = _mm_mullo_pi16 (hi, mm_wh_hi);                     \
+        __m64 mm_hi_lo = _mm_mulhi_pu16 (lo, mm_wh_lo);                     \
+        __m64 mm_hi_hi = _mm_mulhi_pu16 (hi, mm_wh_hi);                     \
+        lo = _mm_add_pi32 (_mm_unpacklo_pi16 (mm_lo_lo, mm_hi_lo),
