On Wed, 13 Feb 2013 00:08:20 -0000, Søren Sandmann <[email protected]> wrote:
One thing that would help would
be to make downconversions round instead of truncating by bit shifting;
that is, to convert from 8 to 6 bits, do

      v6 = floor ((v8 / 255.0) * 63.0 + 0.5)

but of course implemented with integer arithmetic:

      v6 = ((253 * v8 + 512) >> 10).
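Because 253/1024 (about 0.24707) is only very slightly larger than 63/255 (about 0.24706), the integer expression can be checked exhaustively against the floating-point definition. A minimal C sketch, assuming hypothetical helper names (these are illustrations, not pixman API):

```c
#include <assert.h>
#include <stdint.h>

/* Reference: v6 = floor((v8 / 255.0) * 63.0 + 0.5).
 * The operand is non-negative, so the cast's truncation equals floor(). */
static uint8_t round_8_to_6_ref(uint8_t v8)
{
    return (uint8_t)((v8 / 255.0) * 63.0 + 0.5);
}

/* Integer form: 253/1024 approximates 63/255, and 512/1024 supplies the +0.5. */
static uint8_t round_8_to_6(uint8_t v8)
{
    return (uint8_t)((253u * v8 + 512u) >> 10);
}

/* Compare both forms for all 256 inputs; assert() aborts on any mismatch. */
static void check_round_8_to_6(void)
{
    for (unsigned v = 0; v <= 255; v++)
        assert(round_8_to_6((uint8_t)v) == round_8_to_6_ref((uint8_t)v));
}
```

The largest intermediate value is 253 * 255 + 512 = 65027, so the arithmetic fits in 16 bits.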

One question is what the performance impact of doing that would be on
ARMv6, compared with the current:

      v6 = v8 >> 2

And for the 5-bit components (red & blue), I assume we'd use

       v5 = ((249 * v8 + 1024) >> 11)
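The same kind of exhaustive check works for the 5-bit case, where 249/2048 approximates 31/255. Again a sketch with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Reference: v5 = floor((v8 / 255.0) * 31.0 + 0.5); truncation equals floor here. */
static uint8_t round_8_to_5_ref(uint8_t v8)
{
    return (uint8_t)((v8 / 255.0) * 31.0 + 0.5);
}

/* Integer form: 249/2048 approximates 31/255, and 1024/2048 supplies the +0.5. */
static uint8_t round_8_to_5(uint8_t v8)
{
    return (uint8_t)((249u * v8 + 1024u) >> 11);
}

/* Compare both forms for all 256 inputs; assert() aborts on any mismatch. */
static void check_round_8_to_5(void)
{
    for (unsigned v = 0; v <= 255; v++)
        assert(round_8_to_5((uint8_t)v) == round_8_to_5_ref((uint8_t)v));
}
```

Here the largest intermediate value is 249 * 255 + 1024 = 64519, which also fits in 16 bits.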

I've done some on-paper implementations of src_x888_0565 for ARMv6 for
the sake of comparison. I haven't gone as far as worrying about register
allocation, but I reckon I can usefully say the bitshift version can
load, process and store 2 pixels in about 13 cycles (with 1 register
used to hold a constant), and the rounding version can do the same in
about 25 cycles (with 5 registers holding constants). At a guess,
therefore, I'd say that when L1-cache constrained it would run at about
half the speed; however, when memory constrained, those sorts of
differences are likely to be overwhelmed by memory fetch times.
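For reference, a scalar C sketch of the two per-pixel conversions being compared. The names are hypothetical; this is neither pixman's src_x888_0565 nor the ARMv6 code behind the cycle estimates. It only shows where the extra work comes from: each channel's single shift becomes a multiply, an add and a shift.

```c
#include <stdint.h>

/* Truncating conversion: keep the top 5/6/5 bits of each channel. */
static uint16_t x888_to_0565_shift(uint32_t p)
{
    uint32_t r = (p >> 16) & 0xff, g = (p >> 8) & 0xff, b = p & 0xff;
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Rounding conversion using the multiply-add-shift expressions for 5 and 6 bits. */
static uint16_t x888_to_0565_round(uint32_t p)
{
    uint32_t r = (p >> 16) & 0xff, g = (p >> 8) & 0xff, b = p & 0xff;
    uint32_t r5 = (249u * r + 1024u) >> 11;
    uint32_t g6 = (253u * g + 512u) >> 10;
    uint32_t b5 = (249u * b + 1024u) >> 11;
    return (uint16_t)((r5 << 11) | (g6 << 5) | b5);
}

/* Hypothetical scanline loop; the unused top byte of each x888 pixel is ignored. */
static void convert_row(uint16_t *dst, const uint32_t *src, int w, int rounding)
{
    for (int i = 0; i < w; i++)
        dst[i] = rounding ? x888_to_0565_round(src[i])
                          : x888_to_0565_shift(src[i]);
}
```

For a dark pixel such as 0x070307, truncation yields 0x0000 while rounding yields 0x0821, which is the kind of difference the rounding proposal is meant to fix.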

Hope that's useful,
Ben
_______________________________________________
Pixman mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pixman
