On Wed, Feb 06, 2013 at 12:39:13AM +0000, Ben Avison wrote:
> Similar in concept to fast_composite_tiled_repeat(), this breaks up any
> unscaled composites, where source/mask areas outside the bitmap grid are
> not clipped, into a series of simpler composites (either bitmap to bitmap
> or solid to bitmap). These simpler composites are usually likely to match
> existing fast path implementations, and so should benefit all platforms.
>
> This produces some significant speedups for some cairo-perf-trace tests.
> For example, timings on ARMv6 (using Siarhei's trimmed traces) are
>
> Before:
> [ # ]  backend                  test   min(s) median(s) stddev. count
> [ # ]    image: pixman 0.29.3
> [  0]    image  t-firefox-chalkboard   35.715   35.736   0.03%    6/6
>
> After:
> [ # ]  backend                  test   min(s) median(s) stddev. count
> [ # ]    image: pixman 0.29.3
> [  0]    image  t-firefox-chalkboard    9.254    9.261   0.15%    6/6
>
> That's a speedup of 3.86x.

Impressive. On IVB, I'm only seeing an improvement of 42.6s -> 34.8s with
the full firefox-chalkboard trace, which is marginally better than just
implementing PAD support for the simple fetcher. How does that compare
with the tiled approach?

To handle the fallback case, I think we probably want another
implementation level between general and fast (tiled?) to clarify that
these kernels behave differently and that only the general routine would
match otherwise. That would make the fallback to general clearer, and we
could possibly even export general_composite_rect for use by the tiled
implementation.

> Also added a simple test program to check different repeat types.

This is a nice test and would serve as a good precursor to this patch.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
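
To illustrate the decomposition described in the quoted patch summary, here
is a minimal sketch of the underlying idea along one axis: a span in source
coordinates is split into the part before the bitmap, the part that overlaps
it, and the part after it. This is not the code from the patch. The names
segment_t, split_t, clamp_int and split_span are hypothetical and are not
part of pixman. The sketch assumes the outer parts can then be handled as
solid (or otherwise simpler) composites and the inner part as a plain
bitmap-to-bitmap composite.

```c
#include <assert.h>

/* Hypothetical helper types for illustration only; not pixman API. */
typedef struct {
    int start;   /* first coordinate of the segment, in source space */
    int length;  /* number of pixels; 0 means the segment is empty */
} segment_t;

typedef struct {
    segment_t before;  /* source coordinates < 0 (outside the bitmap) */
    segment_t inside;  /* source coordinates in [0, size) */
    segment_t after;   /* source coordinates >= size (outside the bitmap) */
} split_t;

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Split the span [start, start + length) against a bitmap that covers
 * [0, size) on the same axis. Any of the three segments may be empty.
 */
static split_t split_span(int start, int length, int size)
{
    split_t s;
    int end, in_lo, in_hi;

    assert(length >= 0 && size > 0);

    end = start + length;
    in_lo = clamp_int(0, start, end);     /* where the bitmap part begins */
    in_hi = clamp_int(size, in_lo, end);  /* where the bitmap part ends */

    s.before.start = start;
    s.before.length = in_lo - start;
    s.inside.start = in_lo;
    s.inside.length = in_hi - in_lo;
    s.after.start = in_hi;
    s.after.length = end - in_hi;
    return s;
}
```

Applying the same split to both x and y gives up to nine sub-rectangles, of
which only the centre one reads real bitmap data. For example,
split_span(-3, 10, 4) yields a 3-pixel segment before the bitmap, a 4-pixel
segment inside it, and a 3-pixel segment after it.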
