On 7/13/06, Patrick McNamara <[EMAIL PROTECTED]> wrote:
Hrm, now I'm pretty sure I'm being dense. I can't even do math properly. You
are correct, there are 8 pixels per 256-bit memory word. 8 bits per byte * 4
bytes per pixel * 8 pixels does in fact equal 256 bits. Imagine that. :)
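Just to sanity-check that arithmetic in code (trivial, but it's what tripped me up; assumes 32bpp pixels and a 256-bit memory word, as discussed):

```python
BITS_PER_BYTE = 8
BYTES_PER_PIXEL = 4   # 32bpp, the only format the engine supports
WORD_BITS = 256       # one memory word

# 256 bits / (4 bytes * 8 bits) = 8 pixels per memory word
pixels_per_word = WORD_BITS // (BYTES_PER_PIXEL * BITS_PER_BYTE)
print(pixels_per_word)  # 8
```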
----- Original Message ----
> From: Timothy Miller <[EMAIL PROTECTED]>
> To: Patrick McNamara <[EMAIL PROTECTED]>
> Cc: Open Graphics Project List <[email protected]>
> Sent: Thursday, July 13, 2006 5:55:09 AM
> Subject: Re: Questions about the video controller...
>
> On 7/12/06, Patrick McNamara <[EMAIL PROTECTED]> wrote:
>> Each instruction takes four pixel clocks, or put another way, we clock
>> out four pixels per instruction, correct?
>
> Yeah, at this point, we're pushing out 128 bits per clock.
That would be the video controller instruction clock, not the pixel clock, correct?
The pixel clock is effectively the instruction clock times four, yes?
Yes, although some of the video devices take multiple pixels in
parallel. For instance, dual-link DVI. But I'm just picking at nits.
At some point along the way, there is a clock that is 4x the one
we're using.
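The clock relationship being described can be sketched like this (my numbers, inferred from the thread: 32bpp, four pixels per instruction):

```python
BPP = 32                # bits per pixel (engine is 32bpp-only)
PIXELS_PER_INSN = 4     # pixels clocked out per video controller instruction

# 128 bits pushed out per instruction clock
bits_per_insn_clock = BPP * PIXELS_PER_INSN

# Somewhere along the way there is a clock running at 4x this one,
# i.e. the pixel clock.
pixel_to_insn_clock_ratio = PIXELS_PER_INSN
print(bits_per_insn_clock, pixel_to_insn_clock_ratio)  # 128 4
```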
>
>> Is there a valid video mode that would require a back porch less than
>> four pixel clocks wide? How about eight? Better yet, how about an odd
>> pixel width for the back porch, front porch, or either blanking interval?
>
> I've never seen anything less than 8, and certainly nothing odd.
>
Good enough. Does anybody have a reference they could point to for the
expected timing values for valid video modes?
Old Xfree86 config files list lots of modes. I'm not sure where they
list the VESA modes now.
I found this on the web:
http://stuff.mit.edu/afs/athena/system/sun4x_510/os-9.4/usr/X/server/etc/pgxresinfo
This is a file that's part of a Tech Source PGX32 driver package,
based on VESA modes (being a VESA member, they can do that). The fact
that that file is available on the web may or may not be a violation
of Tech Source copyright, although anyone with a PGX32 card can see
it. But in any case, since it's there, have a look.
>
>> For a given instruction what is the skew between instruction execution
>> and flag assertion? Will the pixel data have an equal skew? In other
>> words, will the vsync and hsync pulse edges align with the pixel pulse
>> edges. Does it even matter as long as it is within the timing constraints?
>
> It matters, especially for de. And I'm just delaying the syncs in a
> shift register to keep them lined up.
>
What this means is that the output to the display device will be delayed some
time T from the perceived execution time of the video controller. This may
sound like stating the obvious, since we all know that logic is not
instantaneous, but I was thinking more along the lines of T possibly being one
or more pixel clocks, perhaps even multiple instruction clocks. This
really shouldn't matter, but I figured it was worth stating.
Well, it's pipelined, but as long as all information flows down the
pipeline together, it's all fine. Besides some pipelining in the
video controller, and the cursor logic, there's some logic to change
clock domains to talk to the DVI transmitter or DAC, and then that's
pipelined, and then there's latency in the DVI channel, and then any
amount of pipelining in the monitor circuitry. And the negative
effect of this is... nothing. :)
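Here's a toy model of why the latency is harmless (purely illustrative, not the actual RTL): syncs ride through a shift register of the same depth as the pixel pipeline, so both emerge T clocks later, still lined up:

```python
from collections import deque

PIPELINE_DEPTH = 5  # arbitrary latency T in clocks; the actual value is irrelevant

def delay_line(depth, fill=0):
    # A fixed-depth shift register: one value in, one value out, per clock.
    return deque([fill] * depth, maxlen=depth)

pixel_pipe = delay_line(PIPELINE_DEPTH)
sync_pipe = delay_line(PIPELINE_DEPTH)

# Feed matched (pixel, hsync) pairs down both pipes in lock step.
stream = [(p, p % 4 == 0) for p in range(16)]
out = []
for pixel, hsync in stream:
    # The oldest entries pop out of both pipes on the same clock...
    out.append((pixel_pipe[0], sync_pipe[0]))
    # ...as the new entries shift in.
    pixel_pipe.append(pixel)
    sync_pipe.append(hsync)

# Everything after the initial fill is the input stream, delayed but aligned.
print(out[PIPELINE_DEPTH:] == stream[:len(stream) - PIPELINE_DEPTH])  # True
```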
>>
>> I assume that we can be fetching a new scanline while we are outputting
>> the current one? In other words, the following code would not corrupt
>> the pixel data (I'm not worried about the syncs here.)
>>
>> FETCH 640 ;Fetch a scanline. The data is not completely valid for
>> about 320 instruction cycles
>> DELAY 640 ;We will delay a full scanline just to make sure the data is
>> ready
>> FETCH 640 ;Now, start the 2nd scanline fetch.
>> SEND 640 ;Start output of the first scanline
>
> There's a fifo that's supposed to get hooked up to this that keeps
> everything in order. It might be good to send out a reset signal
> every frame to the fifo just to make sure we never get junk data in
> there.
>
> Anyhow, the sample program fetches one scanline early, which is
> critical, since memory latency is not predictable.
And that is what I was getting at. We have to fetch one scanline early,
meaning that we have to be able to start the fetch of scanline n+1 right before
we start outputting scanline n. I think there may be a bug in the
original example that made things confusing.
Possibly. I've already found a couple of minor bugs...
vactive:
FETCH [+v,-h,-d] #1 ; last 4 pixels of hsync <--- Should this be #640?
It should be pixels/8. (80 for this)
NOOP [+v,+h,-d] #2 ; back porch
SEND [+v,+h,-d] #640 ; active period, but vblank
And this should be 160.
NOOP [+v,+h,-d,ret] #2 ; front porch
If the answer to the above question is yes, then we are in agreement and I was
understanding things correctly as that would start the fetch of the next line
of pixel data prior to displaying the current one.
Yeah. You're right. Also, the count in the FETCH instruction is not
the same sort of count as in the other instructions. It's a regular number
(1 means 1), and the fetch process goes on in parallel with the processor.
So to us, the FETCH instruction takes the time of 4 pixels, but the data
isn't available until later.
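So, as I understand the thread so far (my inference, not a spec): FETCH counts 256-bit memory words (pixels/8), while SEND and the NOOP-style counts are in instruction clocks (pixels/4). A little helper for computing the immediates for a given line width:

```python
def fetch_count(pixels):
    # FETCH counts 256-bit memory words: 8 x 32bpp pixels per word.
    assert pixels % 8 == 0, "line width must be a multiple of 8 pixels"
    return pixels // 8

def send_count(pixels):
    # SEND counts instruction clocks: 4 pixels clocked out per instruction.
    assert pixels % 4 == 0, "line width must be a multiple of 4 pixels"
    return pixels // 4

# For a 640-pixel line, this gives the corrected immediates from the
# example program: FETCH #80 and SEND #160.
print(fetch_count(640), send_count(640))  # 80 160
```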
That part of it I get. My question is this: do we continue clocking out valid
pixel data even when we are not executing a SEND instruction? If the answer is
no, then we cannot send 8192 pixels per line, since we would have to break it
up into two SEND instructions and two FETCH instructions. Since you would have
to interleave them, you would have a four-pixel-clock period when we were not
clocking out valid pixels, but the display device was expecting them. Not
that an 8192x2048 display is exactly small or anything. :) If we do continue
outputting the contents of the pixel FIFO for commands other than SEND, we need
to take that into account in the programs.
You're right. The max number of pixels you can request to fetch at a
time is (I think) 4095*8. And since there's only one fetch machine,
starting another fetch immediately after would screw it up. There
would have to be a small queue of requests, so we could do them back
to back.
Breaking into two SEND instructions is NOT a problem. Inserting a
FETCH instruction in the middle of a scanline IS a problem. In
intermediate stages of our video pipeline, whenever you're not doing
SEND, the pixels coming out are whatever is at the output of the video
data fifo. You're just not dequeueing it.
One idea occurred to me: Instead of having a SEND instruction, make
bit 24 mean SEND, so we could send at the same time as some other
instruction.
I just wonder if 32760 pixels (more usefully, 16384) isn't enough.
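Sanity-checking those limits (the 12-bit count field is my assumption; it's what 4095 suggests):

```python
COUNT_BITS = 12          # assumed width of the FETCH count field
PIXELS_PER_WORD = 8      # 256-bit memory word at 32bpp

max_words = (1 << COUNT_BITS) - 1          # largest encodable word count
max_pixels = max_words * PIXELS_PER_WORD   # widest line a single FETCH covers
print(max_words, max_pixels)  # 4095 32760
```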
> They're counters being fed to a separate block. I'll check that into
> SVN, but I also posted it to the list. The cursor block shifts the
> cursor data to 1-pixel boundaries and overlays it on the video stream.
> The cursor overlay is a separate pipeline, and the cursor counters
> are just a convenience; I could infer them from the syncs.
I'll have to go dig into the cursor overlay code. I guess my question was
"When would we want to set these flags in a video controller program?" And
given that the "when" might change from frame to frame, "How would we detect
that we were at the correct point to assert the flags?" Right now, it appears
to me that those flags will never be used in a video controller program.
Which flags? The cursor inc flags? We'd just set them in some
reasonably appropriate instruction.
Anyhow, the way it works is we have config registers that have the
initial counter values. When the cursor flags indicate "reset", then
the counter resets to that config value. When the flags mean to
increment, they increment. The cursor block is then designed to check
when the cursor coordinate values are within range of the cursor.
Changing the config registers changes the position of "in range".
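A toy model of the counter behavior described (the names and flag shapes are mine; the real encoding may differ):

```python
class CursorCounter:
    """One cursor coordinate counter, reloadable from a config register."""

    def __init__(self, config_value):
        self.config = config_value  # programmed initial value
        self.count = config_value

    def step(self, reset, inc):
        # "reset" reloads the counter from the config register;
        # "inc" advances it; otherwise it holds.
        if reset:
            self.count = self.config
        elif inc:
            self.count += 1

def in_range(counter, cursor_pos, cursor_size):
    # The cursor block overlays cursor data when the coordinate counter
    # falls within the cursor's extent; moving cursor_pos (a config
    # register) moves where "in range" is.
    return cursor_pos <= counter.count < cursor_pos + cursor_size
```

Changing the config register therefore moves the cursor without touching the video controller program at all; the program only has to assert the reset/increment flags at consistent points in the frame.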
> I'm not planning support for 1-bit, and in fact, I'm probably going to
> rip the 8 and 16 out for OGA, since the engine doesn't support
> anything other than 32.
That is something we need to be aware of, then. If the frame buffer is always
32bpp, then any interaction with that frame buffer will have to be translated
on the fly. I'm specifically thinking about VGA and VESA modes here, but the
problem would exist for X running in something other than 32bpp as well. This
also limits the output devices we can support. I know it's probably unusual to
run across a display that is not capable of accepting 24-bit color input, but
in the embedded space, where you may be using lower-end LCD panels, that is
something we need to factor in. I certainly don't have a problem with using
fixed 32bpp internally; we just need to make sure that we can still output to
any device we could ever conceivably want to talk to.
I'm planning on doing format conversions somewhere between the host
interface and the memory controller. In the 8-bit mode, for instance,
a 32-bit PCI write gets each byte replicated out four times, so ABCD
turns into AAAABBBBCCCCDDDD, and then that's how we write it to memory
(with a planemask to disable writing of some of those bytes). On
reads, which of A, B, C, and D are extracted from this 128-bit word is
selectable.
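A software model of that 8-bit write path (illustrative only; byte ordering is my assumption): each byte of the 32-bit PCI write is replicated four times to fill the 128-bit memory word, with the planemask then selecting which bytes actually land in memory.

```python
def replicate_8bpp_write(word32):
    # ABCD -> AAAABBBBCCCCDDDD: replicate each byte of a 32-bit PCI
    # write four times to build the 128-bit (16-byte) memory word.
    out = bytearray()
    for byte in word32.to_bytes(4, "big"):
        out.extend([byte] * 4)
    return bytes(out)

print(replicate_8bpp_write(0x41424344))  # b'AAAABBBBCCCCDDDD'
```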
The VGA mode would not have to have an efficient memory format--just
one that's convenient for the requirements of VGA.
>> 854x480 -- Wide screen 480p
>> 1365/66x768 -- Wide XGA, 768 line format
>
> I don't know how most video controllers would be that specific. I bet
> the monitors are designed to handle some padding on the right. We
> should research this.
>
Browsing through all the various resolutions I could find on the net, the two
above were the only ones I could find that were not evenly divisible by eight,
so perhaps it is not a big issue.
Going back to an earlier question: are all the various sync timings going to
be divisible by eight?
I think so, but I can't say 100% for sure.
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)