Around 19 o'clock on Jan 1, Owen Taylor wrote:

> I can certainly provide code. What's the requirement for the results
> -- does it have to produce identical results to the slow-path code or
> are other "equally accurate" results acceptable?

The intent of the specification is to require that all implementations
produce the same results.  I'd like to keep this in the spec because it
gives applications the same results on every target drawable, which seems
like a nice property.
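Identical results across implementations only work if the arithmetic, rounding included, is pinned down. As a rough illustration, here is one way to do an exact integer multiply and a single-channel OVER. It assumes 8-bit premultiplied channels and round-to-nearest. Those are assumptions for this sketch, not text from the RENDER spec, and `mul_un8` and `over_un8` are hypothetical names:

```c
#include <stdint.h>

/* Hypothetical helper: multiply two 8-bit values that represent
 * fractions of 255 and round to nearest.  Computes round(a * b / 255)
 * exactly for every a, b in [0, 255] using only adds and shifts. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Hypothetical OVER for one premultiplied 8-bit channel:
 * dst' = src + dst * (1 - src_alpha).  Cannot overflow as long as
 * src <= src_alpha, which premultiplication guarantees. */
static uint8_t over_un8(uint8_t src, uint8_t src_alpha, uint8_t dst)
{
    return (uint8_t)(src + mul_un8(dst, (uint8_t)(255 - src_alpha)));
}
```

Because the shift-and-add form gives exactly round(a * b / 255) for every 8-bit input, a software path and an accelerated path that both use it would agree bit for bit.  A cheaper version that just shifts by 8 (dividing by 256) would disagree with it in the low bit for some inputs.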

As an anecdote about performance -- I was configuring my new laptop and
performed some tests on text performance.  The sub-pixel text is less than
10% slower than regular AA text while performing some 4 times as many
operations per pixel. I know that slow machines exist in abundance, but the
speed of the AGP/PCI bus will dominate software rendering, making concerns
about the number of instructions executed much less relevant than in a pure
memory rendering environment.
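To make the per-pixel difference concrete, here is a rough sketch comparing two cases. In the first, one coverage value per pixel scales the source. In the second, sub-pixel text, the mask carries a separate coverage per component. The helper names are hypothetical and a solid premultiplied ARGB source is assumed. The sketch only shows where the extra per-channel work comes from. It is not the server's code and is not meant to reproduce the 4x figure:

```c
#include <stdint.h>

/* round(a * b / 255) for 8-bit a, b (hypothetical helper). */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Grayscale AA text: one coverage value per pixel scales the
 * premultiplied ARGB source, which is then composited OVER dst. */
static uint32_t over_aa(uint32_t src, uint8_t mask, uint32_t dst)
{
    uint8_t sa = mul_un8((uint8_t)(src >> 24), mask); /* one alpha for all channels */
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t s = mul_un8((uint8_t)(src >> shift), mask);
        uint8_t d = (uint8_t)(dst >> shift);
        out |= (uint32_t)(uint8_t)(s + mul_un8(d, (uint8_t)(255 - sa))) << shift;
    }
    return out;
}

/* Sub-pixel text: the mask holds a separate coverage per component,
 * so every channel gets its own effective source alpha. */
static uint32_t over_subpixel(uint32_t src, uint32_t mask, uint32_t dst)
{
    uint8_t sa = (uint8_t)(src >> 24);
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t m = (uint8_t)(mask >> shift);
        uint8_t s = mul_un8((uint8_t)(src >> shift), m);
        uint8_t a = mul_un8(sa, m);               /* per-channel alpha */
        uint8_t d = (uint8_t)(dst >> shift);
        out |= (uint32_t)(uint8_t)(s + mul_un8(d, (uint8_t)(255 - a))) << shift;
    }
    return out;
}
```

With a uniform mask (the same coverage in every component) the two functions give identical pixels. The sub-pixel path just recomputes the alpha term for each channel instead of once per pixel.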

>  (A) Allocate a new pixmap
>  (B) Clear it to a solid color
>  (C) Draw some lines, RGB pictures, and text on it
>  (D) Blit it to the screen
> 
> If we have hardware accelerated RENDER, doing everything with the
> pixmap in video RAM works well. If not, how do you avoid pulling the
> solid color data back from the video card from (B) to (C)?

One of the keys is to allow the data to exist on *both* sides of the bus; 
the data slowly "migrates" from one side to the other as the need becomes 
clear.  This avoids thrashing the bus while still eventually getting data 
on the "right" side.  As the CPU and graphics engine can draw in parallel, 
there's little need to push things across for simple drawing operations 
like these.  One maintains a global as well as local history of operations 
to get an idea of what current applications are doing and tune things 
appropriately.  We've got plenty of computrons; there's just this 
thin hose to move the bits around with.
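A toy version of that kind of migration policy might look like the following. Everything here is hypothetical: the `pixmap_state` structure, the threshold values, and the `global_bias` parameter, which stands in for the global history. Each operation bumps a per-pixmap score toward the side that wants the data. The pixmap only moves once the score crosses a threshold, so an occasional operation from the other side does not trigger a copy across the bus:

```c
#include <stdbool.h>

/* Hypothetical sketch of hysteresis-based pixmap migration; not XFree86 code. */
enum location { IN_SYSTEM_RAM, IN_VIDEO_RAM };

struct pixmap_state {
    enum location home;      /* side holding the authoritative copy */
    bool sys_valid;          /* system RAM copy is up to date */
    bool vid_valid;          /* video RAM copy is up to date */
    int  score;              /* > 0 favors video RAM, < 0 favors system RAM */
    int  bus_copies;         /* full transfers across the bus, for illustration */
};

#define MIGRATE_THRESHOLD 8  /* assumed value */
#define SCORE_LIMIT       16 /* assumed clamp so history decays quickly */

static int clamp_score(int v)
{
    return v < -SCORE_LIMIT ? -SCORE_LIMIT : v > SCORE_LIMIT ? SCORE_LIMIT : v;
}

/* Record one drawing operation on p and return the side it runs on.
 * gpu_capable: the graphics engine could do this operation.
 * global_bias: hypothetical nudge from the global operation history. */
static enum location draw(struct pixmap_state *p, bool gpu_capable, int global_bias)
{
    p->score = clamp_score(p->score + (gpu_capable ? 1 : -1) + global_bias);

    if (p->home == IN_SYSTEM_RAM && p->score >= MIGRATE_THRESHOLD) {
        if (!p->vid_valid) { p->bus_copies++; p->vid_valid = true; }
        p->home = IN_VIDEO_RAM;
    } else if (p->home == IN_VIDEO_RAM && p->score <= -MIGRATE_THRESHOLD) {
        if (!p->sys_valid) { p->bus_copies++; p->sys_valid = true; }
        p->home = IN_SYSTEM_RAM;
    }

    /* Simplification: the operation always targets the home copy, so the
     * copy on the other side of the bus becomes stale. */
    if (p->home == IN_VIDEO_RAM)
        p->sys_valid = false;
    else
        p->vid_valid = false;
    return p->home;
}
```

The gap between the two thresholds (+8 to move into video RAM, -8 to move back) is what prevents thrashing. A pixmap that has just migrated needs a sustained run of CPU-side work before it is copied back.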

> Worrying about memory accesses sounds like a sound strategy, but my
> experiments indicate we're still quite a ways from being able to
> composite at memcpy() speeds (in C anyways) even with inline code.

I thought your experiments showed that doing the compositing operation in
two discrete steps (premultiply, PutImage, Composite) ran at essentially
half the speed of doing it all at once; that suggests the limiting step
is still the memory references.  Of course, the real goal is to get the
compositing operations done by the graphics hardware.  Given the amount
of work needed, I suspect that will take at least another year for the
core XFree86 drivers.
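The argument is about memory traffic rather than arithmetic. In the hypothetical sketch below, both versions do the same per-pixel arithmetic. The two-pass version, however, writes the premultiplied pixels to an intermediate buffer and then reads them back, much as the premultiply/PutImage/Composite path stages data in a separate image. The staging image is modeled here as a plain array. That is an assumption that ignores protocol overhead, and all function names are made up:

```c
#include <stddef.h>
#include <stdint.h>

/* round(a * b / 255) for 8-bit a, b (hypothetical helper). */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Premultiply a non-premultiplied ARGB pixel by its own alpha. */
static uint32_t premultiply(uint32_t p)
{
    uint8_t a = (uint8_t)(p >> 24);
    return (uint32_t)a << 24 |
           (uint32_t)mul_un8((uint8_t)(p >> 16), a) << 16 |
           (uint32_t)mul_un8((uint8_t)(p >> 8), a) << 8 |
           mul_un8((uint8_t)p, a);
}

/* OVER for a premultiplied ARGB source pixel. */
static uint32_t over(uint32_t src, uint32_t dst)
{
    uint8_t ia = (uint8_t)(255 - (src >> 24));
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t s = (uint8_t)(src >> shift);
        uint8_t d = (uint8_t)(dst >> shift);
        out |= (uint32_t)(uint8_t)(s + mul_un8(d, ia)) << shift;
    }
    return out;
}

/* Two passes: read src, write tmp; then read tmp, read and write dst. */
static void composite_two_pass(const uint32_t *src, uint32_t *tmp,
                               uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) tmp[i] = premultiply(src[i]);
    for (size_t i = 0; i < n; i++) dst[i] = over(tmp[i], dst[i]);
}

/* Fused: read src, read and write dst; no intermediate buffer. */
static void composite_fused(const uint32_t *src, uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) dst[i] = over(premultiply(src[i]), dst[i]);
}
```

Both produce identical pixels. The fused loop simply touches fewer bytes per pixel, and in a memory-bound compositor that is where the time goes.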

> I'm about to check in my new compositing code to GTK+ over the 
> next few days. Once I do that and make sure I'm happy with
> it, I'll send you the relevant special case compositing routines.

OK; I'll be in Durham most of next week, so it'll likely be after that 
before I get things into XFree86 CVS.

Keith Packard        XFree86 Core Team        Compaq Cambridge Research Lab


_______________________________________________
Render mailing list
http://XFree86.Org/mailman/listinfo/render