On 9/12/06, Attila Kinali <[EMAIL PROTECTED]> wrote:
On Sat, 9 Sep 2006 12:38:17 -0400
"Timothy Miller" <[EMAIL PROTECTED]> wrote:
> It has the advantage of being able to, for zero cost, apply alpha
> blending, rops, and planemasks directly to image uploads.
> Essentially, a graphics memory write can be diverted through the
> drawing engine where it becomes a fragment and can therefore have any
> drawing engine feature applied to it. (There are limits for cases
> where the fragment has an address but not coordinates.) The drawback
> is that writes have a huge latency... if you ever want to read the
> word back, you have to know what you did and flush the engine pipeline
> before you try to read it back.
Assume for now, that we do not provide any read back to host memory.
It makes our life much simpler :)
Not host memory. Graphics memory.
You drop a YUV into OGA. It's in convert-YUV-to-RGB mode, so it
diverts the word through the drawing engine.
Now, the host decides it wants to read the address in graphics memory
where it had tried to write the YUV value (knowing, of course, that it
would get an RGB back instead). The host would have to wait until the
YUV had made its way all the way through the drawing engine. The
synchronization/coherency for this particular path is not automatic.
This is true because it's true about all GPU activity. But we're used
to that. If you tell the GPU to do a bitblt, it's going to run in
parallel with the CPU, and we know that if we read part of the
graphics memory before the bitblt is finished, we're going to get
stale data.
What's odd here is that a graphics memory write is being implicitly
converted into a GPU operation. We're not used to that implicit
conversion, so we have to be aware of that in this case.
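To make the coherency rule concrete, here's a toy host-side model (pure
Python, entirely hypothetical -- the real hardware exposes no such API) of a
write that gets diverted through the drawing engine: it is invisible to
readback until the pipeline is explicitly flushed.

```python
# Hypothetical model of a diverted graphics-memory write. Nothing here is
# real OGA register-level code; it only illustrates the flush-before-read
# requirement described above.

class DrawingEngineModel:
    def __init__(self):
        self.vram = {}          # graphics memory: address -> value
        self.pipeline = []      # fragments still in flight

    def write(self, addr, value):
        # In diverted mode the write becomes a fragment; it lands in the
        # pipeline, not directly in graphics memory.
        self.pipeline.append((addr, value))

    def flush(self):
        # Drain the pipeline: only now do fragments reach graphics memory.
        for addr, value in self.pipeline:
            self.vram[addr] = value
        self.pipeline.clear()

    def read(self, addr):
        # Readback sees only what has been flushed; otherwise, stale data.
        return self.vram.get(addr)

eng = DrawingEngineModel()
eng.write(0x1000, 0xCAFE)
stale = eng.read(0x1000)    # still in the pipeline, so nothing is there yet
eng.flush()
fresh = eng.read(0x1000)    # visible only after the explicit flush
```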
> We can move the YUV/RGB logic into the engine where we can send YUYV
> for one scanline, then change modes (just drop a configuration write
> down the pipeline) and provide an offset where we provide YYYY and
> have the GPU read memory from an appropriate offset back from where
> we're writing. You could alternate YUYV and YYYY, or you could do all
> YUYV at once and then interleave the YYYY in there.
>
> In this case, since we're storing as RGB, we'd have to sneak U and V
> into the alpha channel bits of image being uploaded. So alternating
> pixels would be stored as URGB and then VRGB. If you want to apply an
> alpha blend to the video data when it's being composited onto the
> screen, you can provide it as a constant in the texture unit.
I would not mix in YUYV; that format is hardly used anymore
and IMHO makes our life harder rather than simpler. You'd have to be
very careful in associating the correct pixel with the correct
converter to get the right pixel values in the end.
YUYV is, you might say, an intermediate format for even (odd?)
scanlines when doing the conversion from YUV where every pixel has Y,
but 2x2 blocks of pixels have U and V to share between them. I don't
know the meanings of 4:4:4 or 4:2:2 or 4:2:0. I can never remember
them. But one of them is what I described, and it's what someone on
the list said we'd have to deal with.
The way we mechanically have to deal with it is to have Y come with U
and V for half of the scanlines but come WITHOUT U and V for the other
half.
So YUYV isn't the format. It's an encoding that we could accept. For
all we care, it could be YYUV. Or UVUV, where Y comes later. In the
latter case, you send |X|/4 UVUV 32-bit words. That's |X|/2 encodings
of U and V together, and they apply to pairs of scanlines. Later,
when you send the Y's (one for each pixel), the GPU knows where to go
to fetch the U and V values sent earlier.
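The addressing the GPU would need for that fetch is simple to state. Here's a
sketch (the function name and the half-resolution chroma store are my
assumptions, not anything the hardware defines) of mapping a pixel to its
shared chroma pair when each 2x2 block shares one (U, V):

```python
# Hypothetical chroma addressing: U/V pairs were sent earlier into a
# half-by-half store; each incoming Y at pixel (x, y) looks up the pair
# belonging to its 2x2 block.

def chroma_index(x, y, width):
    """Index into the shared U/V store for pixel (x, y) of a `width`-pixel-wide
    image where every 2x2 block of pixels shares one (U, V) pair."""
    return (y // 2) * (width // 2) + (x // 2)
```

All four pixels of a 2x2 block resolve to the same index, which is exactly the
"GPU knows where to go" step above.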
One way to handle this is to use the texture unit. You're going to
fill a rectangle with pixels. First, you upload all of the U and V
values to a texture. When you're doing that, the GPU is implicitly
storing UVuv words as pairs of words, XXUV and XXuv. What you create
is a 1/4 scale (1/2 x 1/2) image that has U values stored in G and V
values stored in B.
Next, you set up the texture unit to work in a sampling mode where it
blows up the image by 2 on each axis when it's read back. Now, for
each 2x2 block of pixels, you get the same U and V values.
Next, you start uploading Y values in YYYY format. The GPU implicitly
converts them to XYXX format pixels where Y is stored in R. By
splicing, the texture unit puts the pixels back together in YUV
format, to be converted to RGB in the pipeline. You could also do
AYay format, where you get alpha channels.
If you do your offsets right and put the texture unit into an
interpolating mode, you can get the texture unit to smooth out the U
and V values. In this case, U and V values are exact for even
scanlines (and pixels) but interpolations for odd scanlines (and
pixels).
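Here's what the texture-unit trick computes, written out as a host-side sketch
(pure Python, nearest-neighbour only -- the interpolating mode is left out,
and the function name is mine, not the hardware's): the quarter-scale U/V
texture is blown up by 2 on each axis and spliced with the full-resolution Y
plane.

```python
# Host-side model of the splice: uv_texture is the 1/4-scale (1/2 x 1/2)
# image with U in G and V in B; y_plane is full resolution. Each 2x2 block
# of output pixels reads the same (U, V) pair.

def splice_yuv(y_plane, uv_texture):
    """y_plane: H x W rows of Y values; uv_texture: (H/2) x (W/2) rows of
    (u, v) pairs. Returns H x W rows of (y, u, v) tuples."""
    height, width = len(y_plane), len(y_plane[0])
    out = []
    for row in range(height):
        line = []
        for col in range(width):
            # 2x nearest-neighbour blow-up of the chroma texture
            u, v = uv_texture[row // 2][col // 2]
            line.append((y_plane[row][col], u, v))
        out.append(line)
    return out
```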
One of the benefits of using the drawing engine is that we may be able
to build it so that alignment isn't restricted to 4-pixel boundaries
for YYYY units. It's simpler WITH the restrictions, so if we're
always uploading to a back buffer, we'll just leave it that way.
Currently I propose to do the whole thing in 3 steps:
1) When the image data enters our card, do a horizontal
   upsampling. This is fairly easy as the data enters
   line by line, and each line will be complete. The only
   difficulty is that the PCI transfers will not be
   continuous and that some other transfer might sneak in.
   So we need to save some state at the end of each transfer
   operation and compare that state against the next operation
   to ensure that we are working with the correct data.
Image uploads can go in two ways. One is just a linear memory copy.
The other is linear out of host memory but gets pushed as fragments
through the drawing engine, so it can be rectangular.
The rasterizer is put in a mode where it's counting coordinates, but
it only counts when a pixel comes in. So you set up a rectangular
area to fill and then start pushing pixels in which are taken to be
the primary fragment color (rather than a computed one).
There is state here, but the rasterizer isn't "busy" while waiting on
pixels. The pixels coming in are events. If you decide to stop, you
can just change the state of the rasterizer and do something else.
This way, you can keep track of the state and then restore it later.
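A toy model of that counting rasterizer (hypothetical -- the class and its
state layout are illustrative, not the real register set) shows how small the
state to save and restore actually is:

```python
# Host-side model of the "counting" rasterizer mode: a rectangle is set up,
# each incoming pixel advances the counter, and the whole state fits in a
# handful of values that can be saved when another transfer sneaks in.

class CountingRasterizer:
    def __init__(self, x0, y0, width, height):
        self.x0, self.y0 = x0, y0
        self.width, self.height = width, height
        self.count = 0                      # pixels consumed so far

    def push_pixel(self, framebuffer, color):
        # Each incoming pixel lands at the next coordinate in the rectangle;
        # the rasterizer only counts when a pixel actually arrives.
        x = self.x0 + self.count % self.width
        y = self.y0 + self.count // self.width
        framebuffer[(x, y)] = color
        self.count += 1

    def save_state(self):
        # Everything needed to resume this upload later.
        return (self.x0, self.y0, self.width, self.height, self.count)
```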
2) Read the image out, not in the usual order but in
   vertical lines, and perform vertical upsampling if necessary.
   The advantage is that we have to read the data only once, when
   we will read it anyway. The drawback is that we get the data
   in an order we don't want (vertical lines instead of
   horizontal lines).
I'm not sure I get this part. What's vertical? If you're thinking
about what I'm thinking about, it's just a matter of using the texture
unit to do math on coordinates and pixel values read from the texture.
You just blow up the texture by a factor of two in each dimension.
3) Perform the YUV->RGB conversion. Now that the data is upsampled,
   this is very easy: just a matrix*vector multiplication.
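That multiply, written out per sample. The coefficients below are the common
full-range BT.601 ones -- an assumption on my part, since the thread leaves
the exact coefficient set open (and concludes it matters little):

```python
# One full-range YCbCr sample -> RGB, as a plain matrix*vector multiply.
# Coefficients are BT.601 full-range; other coefficient sets just swap the
# constants.

def yuv_to_rgb(y, u, v):
    """Convert one full-range YUV (YCbCr) sample to RGB, clamped to 0..255."""
    u -= 128.0                              # center chroma around zero
    v -= 128.0
    r = y + 1.402 * v
    g = y - 0.344136 * u - 0.714136 * v
    b = y + 1.772 * u
    clamp = lambda c: max(0, min(255, int(round(c))))
    return clamp(r), clamp(g), clamp(b)
```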
> A point not to be lost here is that we're not wasting any memory
> bandwidth by storing unnecessary YUV in the framebuffer. It's an
> on-the-fly conversion, and it's one-way.
Who cares about our memory usage ;)
BTW: I started a discussion on ffmpeg-devel about this conversion
stuff, to see what is really important:
http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2006-September/044690.html
If you want to join the discussion, please get the mails from the mbox
at http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/ so that threading
isn't broken.
Currently the main points that came out are
* Different YUV->RGB coefficients aren't that important
* Correct upsampling from 4:2:0 and 4:2:2 to 4:4:4 is important
We can handle each of these different formats by changing how we up-sample
the texture. We can blow it up horizontally without doing so
vertically, or vice versa. And we can interpolate or not.
* Only 4:2:2 and 4:2:0 are important
* Only 4:2:2, 4:2:0, 4:1:1 and 4:1:0 are relevant, but 4:1:x can be left out.
I'm not familiar with all of those formats.
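For reference, those formats boil down to a pair of (horizontal, vertical)
blow-up factors for the chroma texture -- which is all the texture unit would
need per format. A sketch (the table and function are mine, by pixel
replication only; 4:1:x is included for completeness even though the thread
says it can be left out):

```python
# Chroma blow-up factors per subsampling format: how much smaller the U/V
# plane is than the Y plane on each axis.

CHROMA_SCALE = {
    "4:4:4": (1, 1),  # chroma at full resolution: no blow-up needed
    "4:2:2": (2, 1),  # chroma halved horizontally only
    "4:2:0": (2, 2),  # chroma halved on both axes
    "4:1:1": (4, 1),  # chroma quartered horizontally
}

def upsample_chroma(plane, fmt):
    """Blow a chroma plane (list of rows) up to full resolution by pixel
    replication, according to the format's scale factors."""
    sx, sy = CHROMA_SCALE[fmt]
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(sx)]
        out.extend([wide] * sy)
    return out
```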
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)