On Mon, Jul 21, 2008 at 1:41 PM, Petter Urkedal <[EMAIL PROTECTED]> wrote:
>
> Is there a need for HQ to switch the bypass?  Else, I'd say leave it to
> the driver, as it probably needs to know about the bypass state anyway.

I can't think of a case where it wouldn't be dangerous, so we can
leave out that capability for now.

>> It could actually be challenging.  A row of characters is 160 bytes,
>> or 40 words.  Since that's not an even multiple of 128, the code that
>> requests reads will have to be designed to figure out where to split
>> the request, and in as few instructions as possible.
>
> Do you mean that these 40 words will be stored in chip memory, rather
> than being transferred from PCI directly to HQs BRAM?  Of course that
> could be necessary to support some wide and long text modes.

The text mode is 4000 bytes (80x25 characters at 2 bytes each), or
1000 words.  That won't fit in the scratch memory.

>> Enough of the
>> way the bridge bus protocol works is mingled into the address decoder
>> that we may have to make some changes to be able to sensibly queue up
>> multiple separate read requests back to back so that HQ can always be
>> able to do something else while waiting on read data.  I'll have to go
>> back and look to see what would happen if a command were queued up
>> while in read mode.  Right now, that will never happen, since the
>> address decoder is the only thing ever talking to the bridge.
>>
>> We can also consider changes to the bridge protocol.
>
> To optimise the fetches, I'd consider something like
>
>  * issue first read
>  * fetch glyph of first character
>  * issue second read
>  * render first glyph (while second read is in progress)
>  * fetch glyph of second character
>  * issue third read
>  * ...

You can't render (which involves writes) while a read is outstanding.
Normally, this would not be a problem, since inside the S3, we always
have separate queues for writes, possibly a separate one for read
requests, and one for read data.  With the bridge as it is, we're
sucking everything through one straw.  About all we can do is make a
read request, do some other computation, then wait for the read data.
If we do it right, we may be able to queue up more than one read
request before deciding to wait for data in the return queue, and as
such, we can queue writes as well.  But that command queue is only 16
entries, which amounts to 8 requests, and a write won't complete ahead
of a read that came before it, so the only parallelism is that the
queue is serviced in parallel with whatever HQ is doing.  Right now
that may or may not work, since with PCI, there's no opportunity for
queueing things like this (except for writes behind other writes).

Keep in mind that, at least as a first approximation, we don't have to
convert the whole screen in one video frame.  10 FPS will be more than
adequate.

> and after a certain number of glyphs, write the pixel data, then
> continue.
>
> An algorithm like that could benefit from having the bridge access HQ
> BRAM directly.  Even though we can't avoid the small pipe to the bridge
> due to bypass mode, the transfer unit can be connected either to the
> pipe or parallel to it.  That saves the program from fetching glyph
> data.

Let's keep this in mind.  But if we manage to meet or exceed 30 FPS,
then it becomes an unnecessary optimization.  On the other hand, it
may become necessary for DMA to be efficient!

> The advantage of direct BRAM transfers is less obvious if we can come up
> with a modification to the bridge protocol, as you indicate, to allow
> outstanding reads, so that the algorithm can fetch and render
> simultaneously without unnecessary stalling.

The first, easiest thing we can do is make sure that the bridge logic
doesn't try to dequeue a command unless the bus can take it.  This
way, we can queue up multiple read requests and writes.  If a read
request comes along, then the bridge holds up all other requests until
the whole read is serviced, but that doesn't prevent us from queueing
up some writes.  We can also think about enlarging that command queue.
This will allow us to dump quite a lot of write data into it and let
that go at its own pace while HQ computes on something else.

We need to be strategic about the queue management.  Do we want to
clear the read return data sooner or later?  As it is, if we request
more reads than can be held in the queue, and we don't clear the queue
fast enough, then read data will be lost, which could be disastrous,
since everything downstream expects a certain number of read words to
appear.
As the code I've written works now, a glyph fetch is only four words,
so it can leave them in the queue and pull them out when it's time.
One drawback of this is that since there are outstanding words in the
queue, we can't poll PCI as often as we might like.


-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
