On Sun, Jul 20, 2008 at 8:38 AM, Petter Urkedal <[EMAIL PROTECTED]> wrote:
> On 2008-07-19, Timothy Normand Miller wrote:
>> [...] We need to consider the consequences
>> of having HQ intercept EVERY access to the bridge.
>
> So if I understand this correctly, the current plan is full-intercept or
> no intercept, but it's something we may need to reconsider.  I guess for
> VGA, full-intercept is okay since most data is translated, but if we use
> HQ in GPU mode, then full-intercept would be a major bottleneck.

Making this selectable by HQ itself could be good, although we'll have
to be very careful about race conditions where there are PCI accesses
coming through at the same time that HQ makes the switch.
Alternatively, we could require the driver to do it.  If we want to
switch between PIO and DMA, we have to require the driver to switch
the bypass on and off.  Ideally, in GPU mode, DMA will be used for
almost everything.  Any PIOs that do happen will have latency, since
HQ will have to poll for them and pass them along, but that will have
minimal impact.  Of course, DMA is for later.

> But there is no way to do the clock-switch properly, is there?

Not really.  Too complicated, requiring so much extra logic that you
might as well just add another queue.

>> That could be very useful, for performance and more asynchrony.
>> However, the bypass won't work that way, so we'd have to implement
>> both mechanisms.  We should start with the dumber one that works with
>> bypass and see if we can really benefit from the optimization
>> afterwards.
>
> So, data always passes through HQ's pipes and clock domain even in bypass
> mode?  That solves the clock-switching issue.  It adds latency for
> bypass mode, but it's probably negligible overall.

It'll be minor compared to the other delays.

>> BTW, there are some facts about the bus protocol that we might want to
>> change.  When accessing the bridge, the first cycle is the address,
>> and the flag bits indicate the target (memory or config registers).
>
> These flag bits sound like a natural extension as the highest bits of
> the address.

Yeah, so an early change we can make is to move those bits into the
address, even before HQ is in.  Various things in the XP10 and S3 will
have to change for that.
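
To make the idea concrete, here is a minimal sketch of carrying the
target flags in the top bits of a 32-bit address word.  The field
widths and the flag encodings are assumptions made up for illustration;
the thread only says that the address isn't 32 bits and that the flags
select memory or config registers.

```python
# Hypothetical layout: the thread does not give the real address width
# or the real flag encodings.
ADDR_BITS = 30                      # assumed address width
FLAG_BITS = 32 - ADDR_BITS          # bits left over for target flags
ADDR_MASK = (1 << ADDR_BITS) - 1

TARGET_MEMORY = 0b01                # assumed encoding
TARGET_CONFIG = 0b10                # assumed encoding


def pack(flags, addr):
    """Combine target flags and address into one 32-bit word."""
    if flags >> FLAG_BITS or addr >> ADDR_BITS:
        raise ValueError("flags or address out of range")
    return (flags << ADDR_BITS) | addr


def unpack(word):
    """Split a 32-bit word back into (flags, addr)."""
    return word >> ADDR_BITS, word & ADDR_MASK
```

With this layout a single flag can be tested with one mask on the
packed word, and the address is recovered with one AND.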

>> For reads, the subsequent cycle is the word count, after which the bus
>> switches direction and waits.
>>
>> For writes, subsequent cycles are data, flags indicate which bytes are
>> valid, and the address auto-increments.
>
> So, these flags can't be combined with the other data.  I guess the
> common case is that all are 1, so shall we
>  * write an optional byte-enable before write with default 1111, and
>    then it applies to all data, or
>  * add a write-mode where byte-enables and data are interlaced?

Another option would be to have 15 I/O ports for writes, one for each
combination of flags.  If you already know the flags (usually 1111),
you can hard-code it.  Otherwise, you can add the flags to some
address.
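
A rough sketch of that port scheme, assuming the ports are laid out
contiguously so that the byte-enable pattern is simply an offset from a
base port.  The base value is hypothetical, and reading "15" as the
patterns 0001 through 1111 (all-zero excluded, since it writes nothing)
is an interpretation, not something stated in the thread.

```python
WRITE_PORT_BASE = 0x40   # hypothetical base port number


def write_port(byte_enables):
    """Return the write port for a 4-bit byte-enable pattern (1..15)."""
    if not 1 <= byte_enables <= 0b1111:
        raise ValueError("byte enables must be between 0001 and 1111")
    return WRITE_PORT_BASE + byte_enables


# The common case can be hard-coded as a constant.
WRITE_PORT_ALL_BYTES = write_port(0b1111)
```

When the flags are only known at run time, the port is one add away
from the base, which matches the "add the flags to some address" case.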

>> The address counter in the S3 auto-increments, but it only increments
>> the lower 7 bits of the word address.  So every 128 32-bit words, it's
>> required that a new address be sent.  That happens automatically with
>> PCI due to the way this target is designed, but HQ will have to
>> enforce it in the program.
>
> I think we can manage that.

It could actually be challenging.  A row of characters is 160 bytes,
or 40 words.  Since 128 is not an even multiple of 40, some rows will
straddle a 128-word boundary, so the code that requests reads will
have to figure out where to split the request, and in as few
instructions as possible.  Enough of the
way the bridge bus protocol works is mingled into the address decoder
that we may have to make some changes before we can sensibly queue up
multiple separate read requests back to back, so that HQ can always
do something else while waiting on read data.  I'll have to go
back and look to see what would happen if a command were queued up
while in read mode.  Right now, that will never happen, since the
address decoder is the only thing ever talking to the bridge.
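
The splitting arithmetic itself can be sketched as follows.  This is
only an illustration in Python, not HQ code; the function name is made
up, and it assumes that a fresh address must be sent whenever the low
7 bits of the word address would wrap.

```python
WINDOW_WORDS = 128  # the S3 counter increments only the low 7 address bits


def split_read(start_word, count):
    """Split a read into pieces that each stay inside one 128-word window.

    Returns a list of (word_address, word_count) requests.
    """
    pieces = []
    while count > 0:
        # Words remaining before the low 7 address bits would wrap.
        room = WINDOW_WORDS - (start_word & (WINDOW_WORDS - 1))
        n = min(count, room)
        pieces.append((start_word, n))
        start_word += n
        count -= n
    return pieces
```

For 40-word rows stored back to back, the row starting at word 120
becomes two requests, 8 words at 120 and 32 words at 128, while the
row starting at word 0 needs only one.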

We can also consider changes to the bridge protocol.

>> One thing we may want to change is how the target flags are presented.
>>  Right now, they're separate from the address, but the address isn't
>> 32 bits, so they could be prepended.  However, it may actually be
>> faster to make them separate, potentially saving some HQ code to
>> extract them.
>
> I'm not sure either.  If the flags are encoded in the address in such a
> way that it does not affect the use of the address, and if the common
> usage for flags is to test them individually, then combining flags and
> addresses can save register usage and fetch commands.

This would be easy enough to change even now.


-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project