On 2008-07-20, Timothy Normand Miller wrote:
> On Sun, Jul 20, 2008 at 8:38 AM, Petter Urkedal <[EMAIL PROTECTED]> wrote:
> > So if I understand this correctly, the current plan is full-intercept or
> > no intercept, but it's something we may need to reconsider.  I guess for
> > VGA, full-intercept is okay since most data is translated, but if we use
> > HQ in GPU mode, then full-intercept would be a major bottleneck.
> 
> Making this selectable by HQ itself could be good, although we'll have
> to be very careful about race conditions where there are PCI accesses
> coming through at the same time that HQ makes the switch.
> Alternatively, we could require the driver to do it.  If we want to
> switch between PIO and DMA, we have to require the driver to switch
> the bypass on and off.  Ideally, in GPU mode, DMA will be used for
> almost everything.  Any PIOs that do happen will have latency, since
> HQ will have to poll for them and pass them along, but that will have
> minimal impact.  Of course, DMA is for later.

Is there a need for HQ to switch the bypass?  Else, I'd say leave it to
the driver, as it probably needs to know about the bypass state anyway.
 
> >> BTW, there are some facts about the bus protocol that we might want to
> >> change.  When accessing the bridge, the first cycle is the address,
> >> and the flag bits indicate the target (memory or config registers).
> >
> > These flag bits sound like a natural extension as the highest bits of
> > the address.
> 
> Yeah, so an early change we can make is to move those bits into the
> address, even before HQ is in.  Various things in the XP10 and S3 will
> have to change for that.

Sounds good.  I'll have a look at your VGA code in the meantime.
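For concreteness, folding the target-select flags into the top of the
address could look something like the sketch below.  The bit positions
and target encodings here are purely illustrative, not the real XP10
register map:

```python
# Sketch of packing the bridge target-select flags into the address
# cycle itself.  Assumes (hypothetically) a 32-bit bus word with the
# target in the top two bits; the actual widths are up for discussion.

TARGET_MEMORY = 0x0  # hypothetical encodings
TARGET_CONFIG = 0x1

def encode_address(target, addr):
    """Pack the target-select flags into bits 31:30 of the address word."""
    assert addr < (1 << 30), "address must fit below the flag bits"
    return (target << 30) | addr

def decode_address(word):
    """Recover (target, address) from a combined address word."""
    return word >> 30, word & ((1 << 30) - 1)
```

That would let the address decoder treat config space as just another
address range, which is presumably the point of the change.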
 
> >> For reads, the subsequent cycle is the word count, after which the bus
> >> switches direction and waits.
> >>
> >> For writes, subsequent cycles are data, flags indicate which bytes are
> >> valid, and the address auto-increments.
> >
> > So, these flags can't be combined with the other data.  I guess the
> > common case is that all are 1, so shall we
> >  * write an optional byte-enable before write with default 1111, and
> >    then it applies to all data, or
> >  * add a write-mode where byte-enables and data are interlaced?
> 
> Another option would be to have 15 I/O ports for writes, one for each
> combination of flags.  If you already know the flags (usually 1111),
> you can hard-code it.  Otherwise, you can add the flags to some
> address.

That's the solution, of course :-)
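To spell out the "15 ports" idea: the 4-bit byte-enable pattern becomes
part of the port address, so the common all-bytes-valid case can be
hard-coded while the other patterns just select a different port.  The
base address below is a made-up placeholder:

```python
# Sketch of one write port per byte-enable combination: the 4-bit
# enable pattern (1..15; 0000 would be a no-op write) is folded into
# the low bits of the port address.  PORT_BASE is hypothetical.

PORT_BASE = 0x100  # placeholder; real decoding belongs to the bridge

def write_port(byte_enables):
    """Return the port address for a given 4-bit byte-enable pattern."""
    assert 1 <= byte_enables <= 0b1111, "at least one byte must be enabled"
    return PORT_BASE | byte_enables

# The common case can be hard-coded by the program:
PORT_WRITE_ALL = write_port(0b1111)  # 0x10F
```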

> >> The address counter in the S3 auto-increments, but it only increments
> >> the lower 7 bits of the word address.  So every 128 32-bit words, it's
> >> required that a new address be sent.  That happens automatically with
> >> PCI due to the way this target is designed, but HQ will have to
> >> enforce it in the program.
> >
> > I think we can manage that.
> 
> It could actually be challenging.  A row of characters is 160 bytes,
> or 40 words.  Since that's not an even multiple of 128, the code that
> requests reads will have to be designed to figure out where to split
> the request, and in as few instructions as possible.

Do you mean that these 40 words will be stored in chip memory, rather
than being transferred from PCI directly to HQ's BRAM?  Of course that
could be necessary to support some wide and long text modes.
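The split computation itself seems cheap: the distance to the next wrap
is just 128 minus the low 7 bits of the word address.  A sketch, with
the character-row example from above:

```python
# Sketch of splitting a burst read at the S3's 128-word address-counter
# wrap.  Only the lower 7 bits of the word address auto-increment, so
# any request crossing a 128-word boundary must be re-addressed there.

WRAP = 128  # words per auto-increment window

def split_read(word_addr, count):
    """Yield (address, length) chunks that never cross a wrap boundary."""
    while count > 0:
        room = WRAP - (word_addr % WRAP)  # words left before the wrap
        n = min(count, room)
        yield word_addr, n
        word_addr += n
        count -= n

# A 40-word character row starting 100 words into a window splits once:
# list(split_read(100, 40)) -> [(100, 28), (128, 12)]
```

The harder part, as you say, is doing the equivalent in as few HQ
instructions as possible.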

> Enough of the
> way the bridge bus protocol works is mingled into the address decoder
> that we may have to make some changes to be able to sensibly queue up
> multiple separate read requests back to back so that HQ can always be
> able to do something else while waiting on read data.  I'll have to go
> back and look to see what would happen if a command were queued up
> while in read mode.  Right now, that will never happen, since the
> address decoder is the only thing ever talking to the bridge.
>
> We can also consider changes to the bridge protocol.

To optimise the fetches, I'd consider something like

  * issue first read
  * fetch glyph of first character
  * issue second read
  * render first glyph (while second read is in progress)
  * fetch glyph of second character
  * issue third read
  * ...

and after a certain number of glyphs, write the pixel data, then
continue.
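The interleaving above amounts to a software-pipelined loop.  A sketch,
where issue_read, fetch_glyph, render_glyph and flush stand in for the
real HQ operations (hypothetical names; only the overlap structure
matters):

```python
# Software-pipelined glyph loop: each read is issued one iteration
# ahead, so rendering the previous glyph overlaps the read in flight.

def render_row(chars, issue_read, fetch_glyph, render_glyph, flush):
    issue_read(chars[0])
    glyph = fetch_glyph(chars[0])
    for c in chars[1:]:
        issue_read(c)        # next read in flight...
        render_glyph(glyph)  # ...while the previous glyph is rendered
        glyph = fetch_glyph(c)
    render_glyph(glyph)
    flush()                  # write out the accumulated pixel data
```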

An algorithm like that could benefit from having the bridge access HQ
BRAM directly.  Even though we can't avoid the small pipe to the bridge
due to bypass mode, the transfer unit can be connected either to the
pipe or parallel to it.  That saves the program from fetching glyph
data.

The advantage of direct BRAM transfers is less obvious if we can come up
with a modification to the bridge protocol, as you indicate, to allow
outstanding reads, so that the algorithm can fetch and render
simultaneously without unnecessary stalling.
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)