On Wed, Jul 27, 2011 at 1:03 PM, Matt Turner wrote:
> On Wed, Jul 27, 2011 at 12:52 PM, Soeren Sandmann wrote:
>> Matt Turner writes:
>>
>>> The 3 patch series adds support for compiling pixman's pixman-mmx.c
>>> for ARM/iwmmxt for some performance improvements on iwmmxt-enabled ARM
>>> CPUs. This is done by taking advantage of the fact that gcc provides
>>> MMX-compatible _mm_*-style intrinsics for iwmmxt on ARM.
>>>
>>> On my OLPC XO 1.75 (with a Marvell CPU), they pass the pixman test
>>> suite (verified that test suite passes on x86/MMX as well) and improve
>>> performance of most cairo-traces 7% or more. (See attached)
>>>
>>> For lowlevel-blit-bench, iwmmxt paths are not always faster, at times
>>> losing to ARMv6 or generic paths (but even ARMv6 is sometimes slower
>>> than generic...) but providing some massive speed-ups at times:
>>
>> A few overall comments:
>>
>> - It would make sense to rename USE_MMX to USE_X86_MMX for symmetry, and
>>   also to add a comment at the top of pixman-mmx.c to indicate that it
>>   is being used on both x86 and ARM.
>
> OK, I can do that.
>
>> - We need more details in the commit messages.
>
> Indeed. Will do.
>
>> Thanks for generating the detailed data. I have formatted it here:
>>
>> low-level-blit:
>> http://people.freedesktop.org/~sandmann/bench-data/all-llblit.txt
>> traces:
>> http://people.freedesktop.org/~sandmann/bench-data/all-traces.txt
>>
>> to more clearly show the differences between the various
>> implementations. As Siarhei already commented, the most surprising
>> result is that the armv6 assembly is generally slower than the generic C
>> code, in some cases a lot slower.
>>
>>> gcc's current support for iwmmxt code generation is atrocious (see gcc
>>> bugs 35294, 36798, 36966), so I have patched gcc to add missing shift
>>> and logical iwmmxt instructions. I have seen patches posted improving
>>> gcc's iwmmxt support, so I hope that gcc-4.7 will be able to use
>>> pixman's iwmmxt code without trouble. (Reminds me as I write this that
>>> I need to modify the configure.ac test to use instructions that cause
>>> current gcc to crash.)
>>
>> Are you saying that current versions of GCC basically don't work with
>> iwmmxt? If so, we should probably just check for GCC 4.7 in
>> configure.
>
> Yes, patches have been sent to gcc-patches@ but I don't think they're
> in gcc-4.7 yet. gcc-4.6 and older, unless there have been some
> startling regressions, certainly cannot use basic shift and logical
> instruction intrinsics.
>
> I will modify the configure.ac hunk to check for gcc-4.7 and also
> modify the test code to use an intrinsic that is used in pixman-mmx.c
> and known to not work with gcc-4.6.1.
>
> Thanks,
> Matt
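The quoted thread relies on gcc exposing the same MMX-style _mm_* intrinsics on x86 and on ARM iwmmxt, plus a configure-time guard against gcc versions older than 4.7. Below is a minimal C sketch of both ideas, not pixman's actual code. It assumes that <mmintrin.h> provides _mm_adds_pu8 and _mm_empty on both targets and that gcc defines __IWMMXT__ when iwmmxt is enabled; the helper add_sat_u8x8 and the exact form of the version gate are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical version gate, in the spirit of the configure.ac check
 * discussed above: refuse to build the iwmmxt path with GCC < 4.7.
 * The __IWMMXT__ macro name and the 4.7 threshold are assumptions.
 * clang is excluded because it reports itself as GCC 4.2.
 */
#if defined(__IWMMXT__) && defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 7))
#error "iwmmxt build assumed to require GCC 4.7 or later"
#endif

/* Use the shared _mm_* intrinsics only where an MMX-like unit exists. */
#if defined(__MMX__) || defined(__IWMMXT__)
#include <mmintrin.h>
#define HAVE_MM_INTRINSICS 1
#else
#define HAVE_MM_INTRINSICS 0
#endif

/*
 * Hypothetical helper: dst[i] = min(a[i] + b[i], 255) for 8 unsigned
 * bytes, i.e. the per-channel arithmetic of a saturating ADD operation.
 * The same source compiles to an MMX or (assumed) iwmmxt instruction.
 */
static void add_sat_u8x8(uint8_t dst[8], const uint8_t a[8], const uint8_t b[8])
{
#if HAVE_MM_INTRINSICS
    __m64 va, vb, vr;
    memcpy(&va, a, 8);           /* memcpy sidesteps alignment/aliasing issues */
    memcpy(&vb, b, 8);
    vr = _mm_adds_pu8(va, vb);   /* unsigned saturating byte add */
    memcpy(dst, &vr, 8);
    _mm_empty();                 /* needed on x86 before FPU code; assumed a no-op on iwmmxt */
#else
    /* Portable scalar fallback with the same saturating semantics. */
    for (int i = 0; i < 8; i++) {
        unsigned s = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(s > 255u ? 255u : s);
    }
#endif
}
```

The scalar branch lets the helper build on targets with neither extension; the sketch makes no attempt to model how pixman itself selects between its implementations.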
I've been trying to figure out whether the ARM iwmmxt inline assembly makes any difference at all. I think the conclusion is that it does not.

Updated code is here:
http://cgit.freedesktop.org/~mattst88/pixman/log/?h=iwmmxt-optimizations

Benchmark data:
http://people.freedesktop.org/~mattst88/pixman-iwmmxt-benchdata.txt

In no case does the inline assembly make a meaningful difference compared with simply compiling pixman-mmx.c for ARM/iwmmxt. In the 'wip' commit I tried checking alignment in the blt function to avoid a lot of unnecessary walign instructions, but as the benchmark results show, it doesn't help.

Should I just drop the inline assembly pieces? It would definitely make the code simpler.

Thanks,
Matt

_______________________________________________
Pixman mailing list
http://lists.freedesktop.org/mailman/listinfo/pixman
