Around 19 o'clock on Jan 1, Owen Taylor wrote:
> I can certainly provide code. What's the requirement for the results
> -- does it have to produce identical results to the slow-path code or
> are other "equally accurate" results acceptable?

The intent of the specification is to require that all implementations
produce the same results. I'd like to leave this in the spec, as it makes
applications get the same results on all target drawables, which seems
like a nice property.

As an anecdote about performance: I was configuring my new laptop and
performed some tests on text performance. Sub-pixel text is less than
10% slower than regular AA text, while performing some 4 times as many
operations per pixel. I know that slow machines exist in abundance, but
the speed of the AGP/PCI bus will dominate software rendering. That makes
concerns about the number of instructions executed much less relevant
than in a pure memory rendering environment.

> (A) Allocate a new pixmap
> (B) Clear it to a solid color
> (C) Draw some lines, RGB pictures, and text on it
> (D) Blit it to the screen
>
> If we have hardware accelerated RENDER, doing everything with the
> pixmap in video RAM works well. If not, how do you avoid pulling the
> solid color data back from the video card from (B) to (C)?

One of the keys is to allow the data to exist on *both* sides of the
bus. The data slowly "migrates" from one side to the other as the need
becomes clear. This avoids thrashing the bus while still eventually
getting the data to the "right" side. As the CPU and graphics engine can
draw in parallel, there's little need to push things across for simple
drawing operations like these. One maintains a global as well as a local
history of operations to get an idea of what current applications are
doing, and tunes things accordingly. We've got plenty of computrons;
there's just this thin hose to move the bits around with.
> Worrying about memory accesses sounds like a sound strategy, but my
> experiments indicate we're still quite a ways from being able to
> composite at memcpy() speeds (in C anyways) even with inline code.

I thought your experiments showed that doing the compositing operation
in separate steps (premultiply, PutImage, Composite) was essentially
half the speed of doing it all at once. That seems to indicate that the
limiting step remains the memory references.

Of course, the key will be to get the compositing operations done by the
graphics hardware. Given the amount of work needed, I suspect that will
take at least another year for the core XFree86 drivers.

> I'm about to check in my new compositing code to GTK+ over the
> next few days. Once I do that and make sure I'm happy with
> it, I'll send you the relevant special case compositing routines.

OK; I'll be in Durham most of next week, so it will likely wait until
after that before I get things into XFree86 CVS.

Keith Packard
XFree86 Core Team
Compaq Cambridge Research Lab

_______________________________________________
Render mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/render
