Keith Packard <[EMAIL PROTECTED]> writes:
> Around 17 o'clock on Dec 30, Owen Taylor wrote:
>
> > * Optimized software/hardware support for compositing (pix_format=xBGR
> > mask_format=Axxx), (pix_format=RGBx, mask_format=xxxA) with pix and mask
> > pointing to the same buffer. Probably people with better
> > initial choices of formats would like to see xRGB/Axxx
> > added at the same time.
>
> These optimizations are pretty easy to add in software; the code all
lives in programs/Xserver/fb/fbpict.c. If you have some existing code
> which operates on the datatype you use, send it along and I'll just plug
> it in. Adding appropriate formats is easy as well. With code in hand, I
> can add this to XFree86 4.2.
I can certainly provide code. What's the requirement for the results
-- does the fast path have to produce results identical to the
slow-path code, or are other "equally accurate" results acceptable?
In particular, is it necessary to simulate the loss of
precision when pre-multiplying ... is:
Cd = ROUND8 ((Ca * Aa + Cb * (255 - Aa)) / 255)
acceptable, or is it necessary to do:
Cd = ROUND8 ((ROUND8 (Ca * Aa / 255) * 255 + Cb * (255 - Aa)) / 255)
(I hope the notation here is clear). It seems to me that the RENDER
protocol spec perhaps needs a bit more specification about compositing
results ... the version I have defines precisely conversion from
integer components to real components, but doesn't say anything about
going the other way, or about acceptable errors as compared to the
"exact" results.
> > * Some way of compositing source data from memory without risk
> > of round-tripping it to the video card first, such as a
> > CompositeImage request
>
> I've thought quite a bit about how such a request would look; the main
> problem is that there are two source operands to the composite operator
> and I'd like to allow either (or both) to be provided over the wire. I'd
> also like to support image compression, likely using the basic compression
> stuff from PNG. How about magic XIDs for the existing composite operator
> which indicate that the data are inline (and, optionally, compressed)?
This sounds like a reasonable way to avoid an explosion of requests.
> The alternative is to add several new op codes, one for each combination
> of data. As you've noticed, it's nice to allow both source and mask
> data to come from the same image in different formats; that will mean
additional magic is required to avoid transmitting the A separately from
RGB in your non-premultiplied environment.
Here "noticed" is defined as "you told me about it" ;-).
I'm not completely happy about the trick as compared to having
non-precomposited formats, since it means one can't use the mask for
other things (overall alpha, or simulating stipples), forcing the use
of intermediate buffers in some cases, but it should work well for
what I'm doing now.
> The alternative seems to be to continue using PutImage and to make the
> server smarter about video memory management for pixmaps. One should
> avoid migrating image data across the AGP bus except when necessary.
>
> A scheme I've used in the past is to allow pixmaps to exist on either or both
> sides of the bus; data are migrated when necessary (like copying to a
> window). Rendering can occur on either side of the bus, and possibly both
> depending on where the data are actively used. Ref US patent 6,006,238;
> (fortunately, that patent was carefully worded to refer only to networked
> environments (one does try)).
Yeah, this seems like an area with some fertile possibilities. At a
minimum, I'd expect some simple measures would help a lot, like:
- Never put a pixmap into video RAM until we do some operation
on it where we get a benefit from having it in video RAM
- Kick inactive pixmaps out of video RAM after a time
Some cases may be pretty intractable. GTK+'s typical
drawing pattern is:
(A) Allocate a new pixmap
(B) Clear it to a solid color
(C) Draw some lines, RGB pictures, and text on it
(D) Blit it to the screen
If we have hardware accelerated RENDER, doing everything with the
pixmap in video RAM works well. If not, how do you avoid pulling the
solid color data from (B) back across the bus before drawing in (C)?
You can do sophisticated things like remembering that portions of the
pixmap are solid colors, but I'm not sure that is going to scale to
slightly more complicated scenarios.
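As a strawman, the "remember it's solid" idea might look like this in
C -- every name here is invented for illustration, nothing comes from
the XFree86 tree:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool      is_solid;     /* entire pixmap known to be one color? */
    uint32_t  solid_color;
    uint32_t *sys_copy;     /* system-memory shadow of the pixels */
    int       n_pixels;
} ShadowPixmap;

/* Record a full-pixmap solid fill without touching video memory. */
static void
note_solid_fill (ShadowPixmap *p, uint32_t color)
{
    p->is_solid = true;
    p->solid_color = color;
}

/* Before software rendering, materialize the contents in system
 * memory; a remembered solid fill can be replayed locally instead
 * of read back across the bus. */
static void
sync_for_software (ShadowPixmap *p)
{
    if (p->is_solid) {
        for (int i = 0; i < p->n_pixels; i++)
            p->sys_copy[i] = p->solid_color;
        p->is_solid = false;  /* contents now materialized */
    }
    /* else: a (slow) readback from video memory would be needed */
}
```

This handles the (B)-to-(C) case above, but as soon as (C) draws
anything, the "solid" flag is gone and you're back to tracking
arbitrary damage.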
Not to be defeatist... I'm sure a lot of improvement is possible, but
getting fancy might risk spending a lot of work speeding up app A
while slowing down app B in the process.
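For what it's worth, the two minimal measures I listed above (lazy
migration in, timed eviction out) could be sketched as something like
the following -- again purely illustrative, with made-up names and an
arbitrary eviction threshold:

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { LOC_SYSTEM, LOC_VIDEO } PixmapLocation;

typedef struct {
    PixmapLocation where;
    uint32_t       last_accel_use;  /* tick of last accelerated use */
} PixmapPriv;

#define EVICT_AFTER_TICKS 1000      /* arbitrary inactivity threshold */

/* Called before each operation on the pixmap: migrate into video RAM
 * only once an accelerated operation would benefit, and kick the
 * pixmap back out after a period without accelerated use. */
static void
consider_migration (PixmapPriv *p, bool op_is_accelerated, uint32_t now)
{
    if (op_is_accelerated) {
        if (p->where == LOC_SYSTEM)
            p->where = LOC_VIDEO;   /* first accelerated use: move in */
        p->last_accel_use = now;
    } else if (p->where == LOC_VIDEO &&
               now - p->last_accel_use > EVICT_AFTER_TICKS) {
        p->where = LOC_SYSTEM;      /* inactive: evict from video RAM */
    }
}
```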
> > We can represent not premultiplied data in the framework
> > of RENDER by using the data both as source and mask, though
> > that's not possible with XFree86-4.1 and seems to be
> > unaccelerated/unoptimized with current CVS.
>
> Current CVS should have all of the operators working correctly in all
> modes; that's made possible by having really general code at the bottom
> that catches all of the "uncommon" cases. It takes four or five function
calls per pixel, and a bunch of other activity. With an infinitely fast
> CPU, that should be the same speed as your custom code.
Know where I can pick up one of those infinitely fast CPUs? ;-)
Worrying about memory accesses seems like a sound strategy, but my
experiments indicate we're still quite a ways from being able to
composite at memcpy() speeds (in C anyways) even with inline code.
> most of that code tries to touch memory only when needed. Until
> then, we'll need to special case the GTK formats.
[...]
I'm about to check in my new compositing code to GTK+ over the
next few days. Once I do that and make sure I'm happy with
it, I'll send you the relevant special case compositing routines.
Regards,
Owen
_______________________________________________
Render mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/render