On Fri, Aug 22, 2008 at 4:14 PM, Petter Urkedal <[EMAIL PROTECTED]> wrote:

>> For the moment, let's just do the easy thing, which is to hook the
>> unused port of the scratch memory up to the same top-level HQ module
>> ports that let us load the program file.  Just one extra address bit
>> and an extra gate on each write-enable.
>>
>> If space gets tight, we can look into program access to the program
>> file, via the HQIO handler.
>
> I think we're still speaking different languages, but I'm not sure where
> the misunderstanding lies.  When I refer to the HQIO handler, I mean a
> fragment of HQ code which handles PCI target commands addressed to
> TARGET_IO (as defined in pci_address_decode.v).  It has been my
> assumption that TARGET_IO is the mechanism by which the host
> communicates with and controls HQ.  Maybe that's my misunderstanding.
> I have not at all considered accessing the program file from HQ code,
> and I don't see how an HQIO handler, which is a host-to-HQ channel,
> could do that.

Yeah.  We need more names for things.  :)  I think your understanding
is fine, and what I was incorrectly referring to as HQIO is basically
the MEM stage and some logic in the HQ wrapper that hooks up IO ports.

Things we need names for:

- VGA legacy I/O space on the PCI bus
- What you get when you access a negative address in the MEM stage
- Microcode handlers that move data between PCI and bridge

>> All the termination tells you is that the next thing you'll get is an
>> address command.  But as you have designed this, you maintain enough
>> state that it doesn't matter.  It looks like you grab the address into
>> a global, then when you get write commands, you send the address over
>> the bridge for the first one, then forward writes until you run out,
>> at which point you bail.  If the next thing you get after an idle
>> moment is a write command, you send the address again from the global.
>>
>> Don't forget to count how many words you sent and increment the
>> address.  The fact that you have to do this is unfortunate, and it
>> makes me reconsider the null command, so we wouldn't have to bother
>> storing and incrementing the address.
>
> I have not forgotten the increment, but thanks for the reminder.  I
> wouldn't worry about that single instruction to store the address, since
> it's not part of the inner write loop.  If the inner write loop ever
> exits, then the write commands are coming in slow enough that we can
> keep up, anyway, and we could possibly benefit from exiting poll_pci()
> to do some other work.

That's true.  If PCI is so slow that we get ahead, then we can do
something else useful, then come back and catch up again.

>
> The address increment, on the other hand, costs us one instruction in
> the inner loop.  But remember, we have optimised out two instructions,
> namely one to fetch PCI_T_CMD_COUNT and one to test it for zero.

True.  Plus, we have no idea how well this will perform in reality,
and if we can't keep up, we may not care anyhow.
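For concreteness, the forwarding scheme you described (grab the address
into a global, stream the data words over the bridge, then count and bump
the address) could look roughly like this in C.  This is just a software
model: saved_addr, bridge_send, and forward_writes are hypothetical
stand-ins, not the actual HQ microcode or I/O map.

```c
#include <stdint.h>
#include <stddef.h>

static uint32_t saved_addr;              /* global: last address command seen */
static uint32_t bridge_log[64];          /* stand-in for the bridge FIFO */
static size_t   bridge_len;

static void bridge_send(uint32_t word)   /* stand-in for an I/O port write */
{
    bridge_log[bridge_len++] = word;
}

/* Forward `avail` queued data words to the bridge: re-send the saved
 * address first, stream the data, then bump the address by the count. */
static size_t forward_writes(const uint32_t *data, size_t avail)
{
    bridge_send(saved_addr);
    for (size_t i = 0; i < avail; i++)
        bridge_send(data[i]);            /* the inner write loop */
    saved_addr += avail;                 /* the increment we must not forget */
    return avail;
}
```

The address increment happens once per burst here; in the real microcode
it's the per-word version of this bookkeeping that costs the extra inner-loop
instruction.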

>> Let's see...
>> Jump 1 gets executed while jump 2 gets fetched
>> Jump 2 gets executed while target gets fetched
>>
>> OK, so basically, the target of the first jump ends up filling the
>> delay slot of the second jump?
>
> Don't worry, I have tested it.  Have you managed to build the
> assembler and simulator?  If so, try to save the following as
> "test_delay_hack.asm" under tools/oga1hq:

I'll have to give it a try.

> include SIM
>
>        ;; r0 is the operator to apply, r1 and r2 are the operands.
>        move 77, r1
>        move 70, r2
>        move 1, r0
>        add r0, apply_operator, r0
>        jump r0
>          jump cont
> cont:
>        noop
>        noop
>        noop
>        jump SIM_DUMP
>        noop
>        jump SIM_HALT
>        noop
> apply_operator:
>        add r1, r2, r1  ; If r0 = 0, then add
>        sub r1, r2, r1  ; If r0 = 1, then subtract
>        and r1, r2, r1  ; etc
>        or r1, r2, r1
>        xor r1, r2, r1
>
> Then run
>
>    shell$ ./runsim test_delay_hack.hex
>
> This will invoke the assembler to compile the .asm file to .hex, then
> run the simulator on it.  Pass an option "-s" to enter the debugger on
> startup.

The nice thing about designing our own CPU is that we know exactly how
it'll behave with what would otherwise be inadvisable instruction
sequences.  One of the things that I always thought would be fun about
designing this hardware in general (the GPU in particular) would be to
see how people would creatively abuse it.
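Incidentally, the computed jump in your example is the moral equivalent of
an indexed dispatch table.  A C analogue of just the dispatch (the
delay-slot trick itself has no C counterpart, and these function names are
mine, not anything in the tree):

```c
#include <stdint.h>

typedef int32_t (*op_fn)(int32_t, int32_t);

static int32_t op_add(int32_t a, int32_t b) { return a + b; }
static int32_t op_sub(int32_t a, int32_t b) { return a - b; }
static int32_t op_and(int32_t a, int32_t b) { return a & b; }
static int32_t op_or (int32_t a, int32_t b) { return a | b; }
static int32_t op_xor(int32_t a, int32_t b) { return a ^ b; }

/* Same order as the instruction sequence at apply_operator: */
static const op_fn apply_operator[] = { op_add, op_sub, op_and, op_or, op_xor };

/* With r1 = 77, r2 = 70, and r0 = 1, the example selects sub: 77 - 70 = 7. */
static int32_t dispatch(int32_t r0, int32_t r1, int32_t r2)
{
    return apply_operator[r0](r1, r2);
}
```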

>> It's always all or nothing.  HQ becomes a gatekeeper for the bridge
>> over to the S3.  It's only safe to switch when there's absolutely
>> nothing pending on the bridge.  So we'll need some way, via PCI, to
>> tell the microcode to go into a safe state, then we wait until it's in
>> that state before changing modes.
>
> Sorry, I misread your original comment.  But in any case, if the video
> controller is doing a long read, does it affect HQ in any way other
> than delaying the eventual MEM_READQ_AVAIL becoming non-zero?

Yeah.  It could back up the write queues.  In the S3, there's a write
queue connecting the bridge to the memory arbiter; when it fills to 8
entries, it signals busy to the bridge, which propagates to the XP10;
due to the delays you get roughly 12 entries there before the XP10
bridge registers the busy.  Then you get another 16 entries on the
XP10 command queue from HQ to the bridge.  If those all fill, then you
have to stop trying to send commands, or else they'll get lost.
That's why there is/should be a "free command fifo entries" I/O port
that HQ can read.
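A sketch of how the microcode might use that port.  The name
free_entries, the depth constant, and try_send_cmd are all illustrative
stand-ins for the proposed register, not an actual register map:

```c
#include <stdint.h>
#include <stdbool.h>

#define XP10_CMD_FIFO_DEPTH 16          /* HQ-to-bridge command queue depth */

static uint32_t free_entries = XP10_CMD_FIFO_DEPTH; /* stand-in for the I/O port */
static uint32_t last_cmd;

static void emit_cmd(uint32_t cmd)      /* stand-in for the actual send */
{
    last_cmd = cmd;
}

/* Send a command only if the FIFO has room; otherwise the caller must
 * back off (or go do other useful work) and retry later, so nothing is
 * lost once the downstream queues back up. */
static bool try_send_cmd(uint32_t cmd)
{
    if (free_entries == 0)
        return false;
    free_entries--;     /* in hardware, the port itself would track this */
    emit_cmd(cmd);
    return true;
}
```

In the real design the port would also reflect credits freed as the S3
side drains, rather than only counting down.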


-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)