On Sun, Jul 1, 2012 at 12:25 AM, Søren Sandmann <[email protected]> wrote:
> Siarhei Siamashka <[email protected]> writes:
>
>> On Tue, Jun 26, 2012 at 6:13 AM, Jeff Muizelaar
>> <[email protected]> wrote:
>>> We recently switched to using the same precision as Skia on
>>> Android. (4 bits for RGBA32 and 2 bits for RGB565)
>>
>> This is a good point. If using just 4 or even 2 bits of interpolation
>> precision is acceptable for Skia, then maybe the current bilinear
>> interpolation precision is really excessive in pixman. It would be
>> too generous to give a handicap to Skia :)
>>
>> So should we also try 4-bit interpolation? Compared to 7-bit or 8-bit
>> interpolation, it has an advantage that all the calculations need
>> only 16-bit unsigned variables. This is faster for at least C, x86
>> SSE2 and ARM NEON code.
>
> Four bits is also the precision that GdkPixbuf has used for many years
> with no complaints from users that I'm aware of. If you zoom more than
> 16x with GdkPixbuf, banding artefacts definitely start showing up, but
> zooming that far is unlikely to look good with bilinear filtering
> anyway. So dropping to four bits doesn't sound too bad to me.

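To make the 16-bit claim quoted above concrete, here is a minimal sketch in C.
It is not pixman code; the function name bilinear_4bit and its parameters are
hypothetical and chosen only for illustration. With 4-bit weights the two
horizontal weights sum to 16 and so do the two vertical ones, so the largest
intermediate value is 255 * 16 * 16 = 65280, which still fits in an unsigned
16-bit variable. With 7-bit weights the same product can reach
255 * 128 * 128, which needs more than 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of pixman): bilinear blend of one 8-bit
 * channel from four neighbouring pixels, using 4-bit weights.
 * wx and wy are the horizontal and vertical weights in the range 0..15. */
uint8_t bilinear_4bit(uint8_t tl, uint8_t tr, uint8_t bl, uint8_t br,
                      unsigned wx, unsigned wy)
{
    assert(wx < 16 && wy < 16);

    /* Horizontal step: each result is at most 255 * 16 = 4080. */
    uint16_t top = (uint16_t)(tl * (16 - wx) + tr * wx);
    uint16_t bot = (uint16_t)(bl * (16 - wx) + br * wx);

    /* Vertical step: at most 4080 * 16 = 65280, still below 65536. */
    uint16_t sum = (uint16_t)(top * (16 - wy) + bot * wy);

    /* The weights multiply to 256 in total, so drop 8 fractional bits. */
    return (uint8_t)(sum >> 8);
}
```

The result is truncated rather than rounded here; whether rounding is worth
an extra addition is a separate question.
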
OK, we'll see how much performance can be gained by going to lower
precision. There is only one way to find out: implement different
variants and benchmark them against each other.

> That said, I'm a little concerned that nobody is trying separable
> scaling

Well, you are trying. So this does not qualify as nobody ;)

> instead of worrying about these microoptimizations.

It's not instead, but in addition to. All these optimizations are
independent from each other and can be used together with some minor
adaptation.

Separable scaling is a good idea, but it is not a silver bullet.
Downscaling is still a valid use case, and separable scaling would
provide no reduction in the number of arithmetic operations for it.
Also x86 SSSE3 and ARM NEON add some extra challenges:
 * Using 8-bit multiplications for horizontal interpolation is
   difficult as the weight factors need to be updated for each pixel.
   Single pass scaling can easily use 8-bit multiplications for
   vertical interpolation as the weight factors are pre-calculated
   before entering the loop.
 * Separable scaling needs extra load/store instructions to save
   temporary data between passes.
 * When we are approaching the memory speed barrier, the separation
   of operations into passes may result in uneven usage of the memory
   subsystem.

Still, for example on ARMv6 without real SIMD, it seems to be difficult
to implement both vertical and horizontal bilinear interpolation in one
pass. There are just not enough general purpose registers to
sufficiently unroll the loop to get rid of pipeline stalls and do this
without spilling temporary data to memory or reloading constants. So
the support for separable scaling is a very welcome feature for
microoptimizations.

There is no need to put all eggs into one basket. Having multiple
scaling methods available should not be a problem, each tuned for
different use cases and different target architectures.

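For reference, the separable approach under discussion (scale the two closest
source lines horizontally, cache the result, then interpolate vertically) can
be sketched roughly as below. This is only an illustration under assumptions:
a single 8-bit channel, 16.16 fixed-point steps, 4-bit weights, and the
hypothetical function names scale_row_h and scale_separable. It is not the
code from the separable-bilinear branch. It returns the number of horizontal
passes so that the reuse of cached lines is visible.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Horizontal pass: resample one source line to dst_w samples.
 * Each output keeps 4 extra fractional bits (at most 255 * 16 = 4080). */
void scale_row_h(const uint8_t *src, int src_w,
                 uint16_t *out, int dst_w, uint32_t x_step)
{
    uint32_t x = 0;                                   /* 16.16 fixed point */
    for (int i = 0; i < dst_w; i++, x += x_step) {
        int x0 = (int)(x >> 16);
        if (x0 > src_w - 1) x0 = src_w - 1;           /* clamp at the edge */
        int x1 = x0 + 1 < src_w ? x0 + 1 : src_w - 1;
        unsigned w = (x >> 12) & 15;                  /* top 4 fraction bits */
        out[i] = (uint16_t)(src[x0] * (16 - w) + src[x1] * w);
    }
}

/* Two-pass scaling of a single-channel image.
 * Returns the number of horizontal passes performed, or -1 on allocation
 * failure. x_step and y_step are 16.16 source steps per destination pixel. */
int scale_separable(const uint8_t *src, int src_w, int src_h, int src_stride,
                    uint8_t *dst, int dst_w, int dst_h, int dst_stride,
                    uint32_t x_step, uint32_t y_step)
{
    assert(src_w > 0 && src_h > 0 && dst_w > 0);

    uint16_t *cache[2];
    int cached_row[2] = { -1, -1 };   /* which source line each slot holds */
    int h_passes = 0;

    cache[0] = malloc((size_t)dst_w * sizeof(uint16_t));
    cache[1] = malloc((size_t)dst_w * sizeof(uint16_t));
    if (!cache[0] || !cache[1]) {
        free(cache[0]);
        free(cache[1]);
        return -1;
    }

    uint32_t y = 0;
    for (int j = 0; j < dst_h; j++, y += y_step) {
        int ys[2];
        const uint16_t *rows[2];
        unsigned w = (y >> 12) & 15;

        ys[0] = (int)(y >> 16);
        if (ys[0] > src_h - 1) ys[0] = src_h - 1;
        ys[1] = ys[0] + 1 < src_h ? ys[0] + 1 : src_h - 1;

        /* Slots are chosen by line parity, so the two neighbouring lines
         * never evict each other and a line can be reused on the next step. */
        for (int k = 0; k < 2; k++) {
            int slot = ys[k] & 1;
            if (cached_row[slot] != ys[k]) {
                scale_row_h(src + ys[k] * src_stride, src_w,
                            cache[slot], dst_w, x_step);
                cached_row[slot] = ys[k];
                h_passes++;
            }
            rows[k] = cache[slot];
        }

        /* Vertical pass: one blend per pixel, at most 4080 * 16 = 65280. */
        for (int i = 0; i < dst_w; i++)
            dst[j * dst_stride + i] =
                (uint8_t)((rows[0][i] * (16 - w) + rows[1][i] * w) >> 8);
    }

    free(cache[0]);
    free(cache[1]);
    return h_passes;
}
```

When upscaling a 2-line source to 4 lines with a step of 0.5, this sketch runs
the horizontal pass only twice. When downscaling by 2x or more, every
destination line needs fresh source lines, so the cache saves nothing, which
is the downscaling concern mentioned above.
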
> As far as I know Skia does this as well, as does pretty much everybody
> who cares about fast scaling.
>
> That is, the two closest source lines are scaled horizontally, the
> result is cached, and then the intermediate destination lines can be
> generated by just interpolating vertically. This cuts down the amount
> of arithmetic required quite substantially, especially for scale
> factors larger than 2.
>
> Here is an example.
>
> http://cgit.freedesktop.org/~sandmann/pixman/log/?h=separable-bilinear
>
> Performance results for fishtank with SSE2 and MMX disabled:
>
> Before:
> [ 0] image firefox-fishtank 603.804 603.804 0.00%
>
> After:
> [ 0] image firefox-fishtank 535.321 535.321 0.00%
>
> And this is with fishtank, which is exclusively downscalings so each
> line is reused at most once.

Looks good to me.

--
Best regards,
Siarhei Siamashka
_______________________________________________
Pixman mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pixman
