On 5/27/05, Alexander van Heukelum <[EMAIL PROTECTED]> wrote:
> On Thu, 26 May 2005 13:51:11 -0400
>  Timothy Miller <[EMAIL PROTECTED]> wrote:
> >On 5/26/05, Alexander van Heukelum <[EMAIL PROTECTED]> wrote:
> >
> >> Hi!
> >>
> >> Would it be an option to do all drawing in a line-by-line fashion?
> >
> >What do you mean?  That video memory would be accessed linearly?  That
> >is the general idea.
> 
> Ok. I was thinking about the nanocontroller just being in charge of
> setting a small set of configuration registers for drawing each line,
> then submitting the start address for drawing a complete line.
> 
> The 'original' vga model basically splits naturally in two parts: on the
> one side there is the cpu accessing the configuration registers and the
> memory window (and thereby possibly changing the contents of the vga
> buffer) and on the other side there is the translation of the vga
> buffer, via the palette and the character generator, into the displayed
> picture.
> 
> I think we agree that the first part can be done with a nanocontroller.
> Translating the vga buffer to (I imagine) a texture that can be
> displayed by the main graphics pipeline may be too slow. So I imagined
> that the nanocontroller would load the necessary information into some
> dedicated ram in the fpga, where the last part of the translation
> process would be done by dedicated hardware line-by-line...

We could make some sort of SIMD logic for the translation that's
controlled by the processor.  But I think we can absorb the latency by
splitting read instructions.

> 
> >> For
> >> the bios graphics modes, each line occupies more or less contiguous
> >> memory in the vga buffer (worst case is 1280 bytes per line for
> >> mode 13h, because it's only using every 4th byte of every vga plane).
> >> For non-bios modes (including the win95 splashscreen), the memory
> >> access would sometimes need to be split in two because of strange
> >> address wraps.
> 
> (correcting myself: for the win95 splashscreen, the stride of 4 bytes is
> disabled, and then no 'strange' memory wraps occur.)

Do we need to know more about this?

> 
> >> For text modes, caching the character set(s) would be
> >> beneficial. Worst case here is 512 characters of 32 bytes each: 16kb
> >> total. Could this amount of memory be allocated on the fpga?
> >
> >There's no reason to store the VGA memory on-chip.  We're fast enough
> >that we can put it in graphics memory.
> 
> If no vga memory is stored/cached on the fpga, the worst-case scenario
> for line-by-line drawing is textmode; the character/attribute accesses
> are perfectly cachable (heh: "perhaps you mean: catchable, cashable"),
> but for each character one needs to look-up a byte in the font table,
> for every line: 400 lines * 80 chars/line -> 32000 uncachable fetches...
> For the sake of hard numbers: say we run at 100 MHz, and we want to be
> able to do 70 frames per second: 44 cycles per memory fetch. This seems
> doable; however translating this to a set of pixels with the
> nanocontroller in reasonable time seems undoable.

Some of those cycles are fifo overhead, and some are memory row
misses.  If we have to use prefetch instructions, then we have to do
some loop-unrolling anyhow, and lots of latency will be sucked up.
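
For reference, the per-fetch budget quoted above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes every font fetch is fully serialized, which is the pessimistic case that prefetching is meant to avoid.

```python
def cycles_per_fetch(clock_hz, frames_per_second, lines, chars_per_line):
    """Clock cycles available per font fetch if fetches never overlap."""
    fetches_per_frame = lines * chars_per_line      # one font byte per cell per line
    cycles_per_frame = clock_hz / frames_per_second
    return cycles_per_frame / fetches_per_frame

# Numbers from the thread: 100 MHz clock, 70 frames/s, 400 lines, 80 chars/line.
budget = cycles_per_fetch(100_000_000, 70, 400, 80)
print(f"{400 * 80} fetches per frame, about {budget:.1f} cycles each")
```

FIFO overhead and row misses come out of this budget, so whatever is left for translating the fetched byte into pixels is smaller than the figure printed here.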

> 
> Thinking a bit about this, instead of drawing truly line-by-line, it
> would be possible to have the nanocontroller set some state of a pixel
> pipeline, and submit vga buffer addresses... Each vga buffer address
> contains information for 4/8/9(/16/32) pixels, but the interpretation
> depends on the video mode.
> 
> For textmode the nanocontroller would need to fetch a
> character/attribute from memory (cachable), translate it into a
> font/line-address (uncachable->needs good pipelining) and submit
> foreground color, background color, whether "the 9th pixel is equal to
> the 8th pixel, or blank" and the vga address containing the line of the
> font to the pixel pipeline. For graphics modes one would just submit the
> next vga address, using the same pipeline as for text modes.

I'm not sure I totally follow you, but we could do something like
this, perhaps.  Also, what good is the 9-pixel thing?  Does it only
make things look better?  Then dump it.  We'll do 8-pixel only.  I've
done it before.  It looks fine.
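
To make the 8-pixel text path concrete, here is a rough Python sketch of what one character cell on one scanline would involve. The function names are hypothetical, the 32 bytes per character comes from the worst-case font size mentioned earlier in the thread, and treating the whole high nibble of the attribute as the background color (that is, blinking disabled) is an assumption.

```python
def font_row_address(font_base, char_code, row, bytes_per_char=32):
    """Address of the font byte for one scanline of one character."""
    return font_base + char_code * bytes_per_char + row

def expand_cell_row(glyph_byte, attribute):
    """Turn one font byte into 8 palette indices, leftmost pixel first."""
    fg = attribute & 0x0F          # low nibble: foreground color
    bg = (attribute >> 4) & 0x0F   # high nibble: background (assumes blink off)
    # Bit 7 of the font byte is the leftmost pixel of the cell.
    return [fg if glyph_byte & (0x80 >> bit) else bg for bit in range(8)]

# One scanline of a glyph with two pixels set on each edge, white on blue.
pixels = expand_cell_row(0b11000011, 0x1F)
print(pixels)   # [15, 15, 1, 1, 1, 1, 15, 15]
```

The character/attribute read is the cachable part; the read at `font_row_address(...)` is the one that needs good pipelining.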

> 
> Setup of the pipeline contains information like "x", "y", the
> pixel/shiftregister mode (ega/cga/"256-compatible"), graphics blinking
> state, and the number of pixels to be drawn 8/9/16/32. Underlining,
> text blinking and cursor support can be done by the nanocontroller by
> setting foreground and background appropriately; horizontal pixel/byte
> panning can be done by setting an appropriate starting value for "x". Do
> we need support for blinking in graphics modes?

I don't think there's blink in graphics mode.  And I agree with just
changing the colors for blinking.

> 
> All of the above assumes that the EGA/VGA palette is implemented
> completely in hardware, and that overscan color is either completely
> ignored, or handled separately by the main pixel pipeline.

I think we can skip the overscan.

> >I think, however, that even done poorly, the translation code will be
> >fast enough, but if every instruction involves a lookup to execute,
> >that'll come out too slow.
> 
> Basically I have no idea how ram works, but I think there are two
> important questions one must ask:
>  - If one submits random read requests and handles them in a fully
> pipelined way, how often can you then get a result (I guess that is:
> what is the throughput, measured in units of optimally-sized fetches per
> second, of reads at random addresses)?

If a read causes a row-miss, that's an 8-cycle penalty at 200MHz.  Row
hits are faster than you can use them.  :)

>  - If one submits such a random read request, and one has to wait for
> the result; how long does that take (maximum latency)?

Here, you have to assume a row miss (it may not happen) and fifo
latency overhead.

> The latter would be of the order of 200 nanoseconds, I assume. The
> former is quicker than 5000000 per second (Your answer seems to imply
> this...). On the other hand, my trusty Matrox Millennium II only runs
> textmodes up to a pixel clock of 66MHz, before all kinds of funny things
> happen to the text; this would point to a limit of about 7300000 reads
> per second...
> 
> The above are notably different from the question that the
> hardware-manufacturers like to answer (would that be you in this case ;)
> ): If one reads from memory in a linear fashion, without any
> interruptions, how many Gigabytes per second could one read?

6.4

> >In order to make memory reads efficient enough for the nanocontroller,
> >the read instructions should be split in two.  One instruction (like a
> >prefetch) requests the memory, and another pulls it out of a FIFO.
> >You can then write the code to absorb some of the latency.
> 
> That sounds nice... *ugh*

It may be necessary.
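
As an illustration of why splitting reads helps, here is a toy Python model of the request/pop scheme. It is a sketch under simplifying assumptions, not a description of the actual memory controller: every read has the same fixed latency, a request instruction costs one cycle, results come back in order, and the function name and parameters are invented for the example.

```python
from collections import deque

def split_read_cycles(n_reads, latency, work, depth):
    """Cycles needed to fetch and process n_reads items.

    Up to `depth` requests are kept in flight; results are popped from a
    FIFO in order. depth=1 behaves like an ordinary blocking read.
    """
    fifo = deque()   # cycle at which each outstanding request completes
    now = 0
    issued = 0
    for _ in range(n_reads):
        # Request instruction(s): top up the FIFO, one cycle per request.
        while issued < n_reads and len(fifo) < depth:
            now += 1
            fifo.append(now + latency)
            issued += 1
        # Pop instruction: stall only if the data has not arrived yet.
        ready = fifo.popleft()
        now = max(now, ready)
        now += work   # process the fetched data
    return now

blocking = split_read_cycles(32, latency=10, work=5, depth=1)
pipelined = split_read_cycles(32, latency=10, work=5, depth=4)
print(blocking, pipelined)
```

With several requests in flight, most of the latency overlaps with the work on earlier results, which is the effect the loop-unrolling is meant to achieve.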

> 
> Could the nanocontroller run two 'programs' with timesharing (avoiding
> the word concurrently here!)? Such that one program is waiting for a
> memory access to complete, while the other one is 'working'. The idea
> being that one can handle the cpu->vga and cpu->register requests, and
> the other one do the vga->output translation.

Yes.
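
A minimal way to picture the timesharing is two cooperative programs that give up the processor whenever they would otherwise wait on memory. The Python sketch below uses generators for that; the program names, the round-robin policy, and the log messages are all assumptions made for illustration, and nothing here reflects an actual nanocontroller design.

```python
def host_requests(log):
    """Hypothetical program A: services cpu->vga and cpu->register requests."""
    for n in range(3):
        log.append(f"host: request {n} issued")
        yield   # waiting on memory: hand the processor to the other program
        log.append(f"host: request {n} done")

def scanout(log):
    """Hypothetical program B: vga->output translation."""
    for n in range(3):
        log.append(f"scan: fetch {n} issued")
        yield
        log.append(f"scan: fetch {n} done")

def run_timeshared(programs):
    """Round-robin between programs, switching at every memory wait."""
    ready = list(programs)
    while ready:
        prog = ready.pop(0)
        try:
            next(prog)           # run until the next memory wait
            ready.append(prog)   # still has work: back of the queue
        except StopIteration:
            pass                 # program finished

log = []
run_timeshared([host_requests(log), scanout(log)])
print("\n".join(log))
```

While one program sits on an outstanding read, the other gets to run, so neither has to be written with the other's latency in mind.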
