On Fri, 6 Jul 2012 00:08:46 +0300 Siarhei Siamashka <[email protected]> wrote:
> On Thu, 05 Jul 2012 12:25:55 -0700 > Bill Spitzak <[email protected]> wrote: > > > On 07/05/2012 12:22 AM, Siarhei Siamashka wrote: > > > > > Separable scaling is good idea, but it is not a silver bullet. > > > Downscaling is still a valid use case, and separable scaling would > > > provide no reduction for the number of arithmetic operations for > > > it. > > > > This is not true. In fact 2-pass scaling is a much *bigger* win for > > downscaling if you use filters of size greater than 1. And filters > > larger than 1 are absolutely required to fix the artifacts that are > > in current downscaling. > > Looks like you got it backwards. Let's do some calculations. For the > sake of simplicity, let's assume that MxM image is scaled to NxN size. > > In the case of single pass bilinear scaling, we fetch 2x2 pixel block > from the source image for each destination pixel (total N * N times) > and perform 1 vertical interpolation followed by 1 horizontal > interpolation for it. This is total "N * N + N * N" interpolation > operations (let's ignore the fact that horizontal interpolation is a > little bit more expensive). Actually I stand corrected. If we are fetching 2x2 pixel block, then we need to interpolate 2 pairs of pixels first in one direction. And then do one more interpolation in another direction. So it is "N * N + N * N + N * N" interpolation operations in plain C. My mistake was getting too much reliant on SIMD. And at least for 128-bit wide SIMD, it does not matter whether we are interpolating one pair of pixels or two pairs at once. > In the case of 2-pass scaling, typically all the pixels from the > source image are participating in calculations (unless M >= 2 * N, > which is 2x+ downscaling). So the number of horizontal interpolation > operations is M * N (each source scanline gets horizontally scaled > to size N and cached). The number of vertical interpolation > operations is N * N (we are doing it for every destination pixel > on the second pass). So the total number of interpolation operations > is "M * N + N * N" for 2-pass bilinear scaling. > > Now we only need to compare "N * N + N * N" with "M * N + N * N". In fact the comparison turns into "3 * N * N" vs. "(M + N) * N" operations, which is clearly more favourable for 2-pass processing in plain C for a wide range of scaling factors. > In the case of downscaling (N < M), 2-pass scaling needs more arithmetic > operations. Now let's remember that horizontal interpolation is more > expensive (horizontal weights need to be recalculated every time) and > storing/reloading data between passes adds more overhead, which makes > the outcome even worse for 2-pass scaling. > > > Single-pass bilinear is *not* a big loss for upscaling, because as > > you noticed there is overhead of the intermediate values. It is > > pretty much equivalent to stretching the image in one direction, and > > then in the other. You need to store this intermediate stretched > > image, which is bigger than the original. > > Increasing the scale factor for upscaling is going to eventually let > the 2-pass algorithm outperform its single pass counterpart. There > must be a crossover point somewhere, and it can be used to make a > decision which algorithm to select for better performance in each > particular case. Still the underlying instruction set does matter a lot. We can't simply ignore the existence of wide SIMD instruction sets. -- Best regards, Siarhei Siamashka _______________________________________________ Pixman mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/pixman
