On Thu, 29 Aug 2013 13:02:53 -0400 "Søren Sandmann Pedersen" <[email protected]> wrote:
> This new iterator uses the SSSE3 instructions pmaddubsw and pabsw to
> implement a fast iterator for bilinear scaling.

This patch shows some really good performance for upscaling. In fact,
it is even better than I expected. And the trick with PABSW is very nice.

> There is a graph here recording the per-pixel time for various
> bilinear scaling algorithms as reported by scaling-bench:
>
>     http://people.freedesktop.org/~sandmann/ssse3/ssse3.png

I wonder if the discontinuities in the lines on the graph are caused by
the calloc behaviour:

    http://lists.freedesktop.org/archives/pixman/2013-September/002884.html

On my system, doing an explicit memset or solid fill of the allocated
memory (before starting the timer) resulted in generally lower measured
times and less chaotic graphs. Also, running the tests multiple times for
each scaling ratio and selecting the best result seems to filter out even
more measurement noise.

> As the graph shows, this new iterator is clearly faster than the
> existing C iterator, and when used with an SSE2 combiner, it is also
> faster than the existing SSE2 fast paths except for the lowest scaling
> ratios.
>
> The data was measured on an Ivy Bridge i7-3520M @ 2.0GHz and is
> available in this directory:
>
>     http://people.freedesktop.org/~sandmann/ssse3/

I just spotted one problem with the patch. Compilation in 32-bit mode
fails with "undefined reference to `_mm_cvtsi128_si64'", because that
intrinsic is only available on x86-64. It looks like _mm_storel_epi64
needs to be used instead of _mm_cvtsi128_si64.

-- 
Best regards,
Siarhei Siamashka
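
P.S. To illustrate the 32-bit build problem mentioned above, here is a
minimal sketch of writing the low 64 bits of an SSE register with
_mm_storel_epi64, which needs only SSE2 and builds on both 32-bit and
64-bit x86. The helper names store_low_64 and low_64 are hypothetical
and are not taken from the patch; the actual call site in the patch may
need a different shape.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_storel_epi64 */
#include <stdint.h>

/* Hypothetical helper (not from the patch): write the low 64 bits of v
 * to dst. _mm_storel_epi64 does not require dst to be 16-byte aligned. */
static void
store_low_64 (void *dst, __m128i v)
{
    _mm_storel_epi64 ((__m128i *) dst, v);
}

/* Hypothetical helper: return the low 64 bits of v as an integer,
 * going through memory instead of _mm_cvtsi128_si64, which is not
 * available when compiling for 32-bit x86. */
static uint64_t
low_64 (__m128i v)
{
    uint64_t out;

    store_low_64 (&out, v);
    return out;
}
```

When the result is two packed 32-bit pixels, the destination pointer can
be passed to store_low_64 directly, so no 64-bit general purpose
register is involved at all.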
