On 15 May, Simon Thum wrote:
>
>
> On Mon, 13 May 2002 [EMAIL PROTECTED] wrote:
>
> This actually is a correct and fast implementation, because the steps you
> mentioned already went into the computation of the XDCCC tables. At least
> they should, and a table lookup is nothing I'd call complicated or slow.
>
Slow, not complicated. Using tables you need to do the following
sequence once per color channel per pixel:

    Voltage Alpha                  Correct Alpha
    -------------                  -------------
                                   Convert 8-bit to 32-bit (A)
                                   Convert 8-bit to 32-bit (B)
    8-bit compute A*alpha          32-bit compute A*alpha
    8-bit compute B*(1-alpha)      32-bit compute B*(1-alpha)
    8-bit add                      32-bit add
                                   Convert 32-bit to 8-bit (result)

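
For illustration only, the two sequences in the table can be sketched in C roughly as follows. This is not the Render code. The names (to_linear, init_tables, blend_voltage, from_linear, blend_correct) are hypothetical, alpha is assumed to be an 8-bit value in 0..255, and a gamma of exactly 2.0 stands in for the real XDCCC tables so that the sketch needs only integer arithmetic.

```c
#include <stdint.h>

/* ASSUMPTION: gamma 2.0 (linear = code * code) stands in for the real
 * XDCCC-derived table. It is used here only to keep the sketch integer-only. */
static uint32_t to_linear[256];

/* Fill the forward table once, before any blending. */
static void init_tables(void)
{
    for (uint32_t i = 0; i < 256; i++)
        to_linear[i] = i * i;
}

/* "Voltage alpha": multiply and add directly on the 8-bit values. */
static uint8_t blend_voltage(uint8_t a, uint8_t b, uint8_t alpha)
{
    uint32_t sum = (uint32_t)a * alpha + (uint32_t)b * (255u - alpha);
    return (uint8_t)((sum + 127u) / 255u);   /* rounded divide by 255 */
}

/* Convert 32-bit back to 8-bit. Deliberately naive: a linear scan for the
 * largest code whose linear value does not exceed v. */
static uint8_t from_linear(uint32_t v)
{
    uint8_t i = 0;
    while (i < 255 && to_linear[i + 1] <= v)
        i++;
    return i;
}

/* "Correct alpha": table lookup for A and B, 32-bit multiply and add,
 * then the inverse conversion of the result. */
static uint8_t blend_correct(uint8_t a, uint8_t b, uint8_t alpha)
{
    uint32_t la = to_linear[a];              /* convert 8-bit to 32-bit (A) */
    uint32_t lb = to_linear[b];              /* convert 8-bit to 32-bit (B) */
    uint32_t sum = la * alpha + lb * (255u - alpha);
    return from_linear((sum + 127u) / 255u); /* convert 32-bit to 8-bit */
}
```

Under these assumptions, after calling init_tables(), blending white over black at alpha 128 gives 128 from blend_voltage and a visibly brighter 180 from blend_correct. The extra lookups and the inverse step in blend_correct are the additional per-channel work the table lists.
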
I assumed table lookup for the first two steps, and accepted the use of
somewhat inaccurate mathematical tricks to speed the second. Even so,
this is a lot more computation per pixel per channel. The 8-bit
computations correspond to hardware operations that are available and
work in parallel on both video cards and vector-capable processors. With
an x86 processor these computations take only 1 cycle/pixel, so other
loop-related computation and memory access time become significant.

The accurate computations will take several cycles per color channel per
pixel. The first table lookup will probably have to go out to L2 or L3
cache, but subsequent lookups should be fast. The inverse 32-bit to 8-bit
conversion will not optimize as well. Several cycles is really optimistic
for it. So the compositing operation will be about 10 times slower. The
final conversion from 32-bit to 8-bit is the step where I do not see a
simple fast implementation. If someone does not find one, the 10x
becomes 30-50x.

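
To make the cost of the inverse step concrete, here is one possible sketch in C, not a proposed solution. It assumes the forward table is monotonic and again uses a gamma 2.0 stand-in for the real tables; the names to_linear, init_to_linear and from_linear_search are hypothetical. A wide linear value cannot index a small table directly, so this version does a binary search over the 256-entry forward table.

```c
#include <stdint.h>

/* ASSUMPTION: gamma 2.0 stand-in for the real forward (8-bit to 32-bit)
 * table. All that matters here is that it is monotonically increasing. */
static uint32_t to_linear[256];

static void init_to_linear(void)
{
    for (uint32_t i = 0; i < 256; i++)
        to_linear[i] = i * i;
}

/* Inverse conversion by binary search: returns the largest 8-bit code whose
 * linear value does not exceed v. *steps receives the number of compare
 * iterations, each of which depends on the result of the previous one. */
static uint8_t from_linear_search(uint32_t v, int *steps)
{
    uint32_t lo = 0, hi = 255;

    *steps = 0;
    while (lo < hi) {
        uint32_t mid = (lo + hi + 1) / 2;
        if (to_linear[mid] <= v)
            lo = mid;        /* answer is mid or above */
        else
            hi = mid - 1;    /* answer is below mid */
        (*steps)++;
    }
    return (uint8_t)lo;
}
```

With 256 entries the loop always runs 8 iterations, each a dependent load, compare and branch. That is per channel per pixel, which is consistent with "several cycles" being optimistic for this step.
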
R Horn
_______________________________________________
Render mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/render