Re: [Open-graphics] HQ assembly code, bypass

Timothy Normand Miller Fri, 22 Aug 2008 11:30:37 -0700

On Fri, Aug 22, 2008 at 1:42 PM, Petter Urkedal <[EMAIL PROTECTED]> wrote:
>> But that's only if we're using both ports on the RAM.  If we're using
>> only one, then this is trivial... just hook to the other port, using
>> the same mechanism that we use to write to the program file from the
>> host.
>
> Okay, that'll work.  Maybe it's not a big issue, but for ASIC it may
> incur more logic since we could have gotten away with single-ported
> memory.


Certain kinds of ASICs have dedicated RAM blocks too, and they're
typically dual-ported.

>> The RAMs are dual-ported.  One port is accessed in the fetch stage,
>> while the other is accessed in the MEM stage.
>
> Yes, but this does not have to be efficient at all, right?  Why not use
> a HQIO handler?  That is, no changes to the RTL.

Yeah, that's basically what I had in mind, although I thought of it
more like how we access scratch memory.

With the use of a fifo, we can have both data access to the program
file from microcode as well as host access to the scratch memory.

For the moment, let's just do the easy thing, which is to hook the
unused port of the scratch memory up to the same top-level HQ module
ports that let us load the program file.  Just one extra address bit
and an extra gate on each write-enable.
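
A rough behavioural sketch of that decode, in Python rather than RTL. The
names (host_write_decode, HostPort) and the 512-word depth for both RAMs
are assumptions made for illustration; they are not taken from the HQ
source.

```python
# Behavioural sketch only; names and sizes are assumptions, not the HQ RTL.
PROG_WORDS = 512      # assumed program file depth
SCRATCH_WORDS = 512   # assumed scratch memory depth


def host_write_decode(addr, we):
    """Split one host write-enable into per-RAM write-enables.

    The extra (top) address bit selects the RAM: 0 = program file,
    1 = scratch memory.  Each RAM's write-enable is the host
    write-enable ANDed with that select (the "extra gate").
    """
    sel_scratch = (addr >> 9) & 1          # the one extra address bit
    ram_addr = addr & (PROG_WORDS - 1)     # low 9 bits go to both RAMs
    prog_we = bool(we) and not sel_scratch
    scratch_we = bool(we) and bool(sel_scratch)
    return ram_addr, prog_we, scratch_we


class HostPort:
    """Models the shared host-side load port feeding both RAMs."""

    def __init__(self):
        self.prog = [0] * PROG_WORDS
        self.scratch = [0] * SCRATCH_WORDS

    def write(self, addr, data, we=True):
        ram_addr, prog_we, scratch_we = host_write_decode(addr, we)
        if prog_we:
            self.prog[ram_addr] = data
        if scratch_we:
            self.scratch[ram_addr] = data
```

With this layout, host addresses 0..511 land in the program file and
512..1023 land in scratch, so the host-side load path does not need a
second mechanism.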

If space gets tight, we can look into program access to the program
file, via the HQIO handler.

>> >> Yeah.  But we're going to end up with a lot more ports anyhow.  If
>> >> there's no use in having the combined port, then ditch it.  If we find
>> >> a use for it, we can put it back later.
>> >
>> > Good, I have prepared to commit this to the port decode:
>> >
>> >        PCI_T_CMD_TYPE:
>> >            hqio_inport = pci2hq_cmd_type & {32{pci2hq_cmd_valid}};
>> >        PCI_T_CMD_FLAGS:
>> >            hqio_inport = pci2hq_cmd_flags;
>> >
>> > Then we also have the bit to avoid checking PCI_T_CMD_COUNT, as
>> > discussed.  I'm assuming pci2hq_cmd_valid is the same as
>> > pci2hq_cmd_count != 0.
>>
>> Yes.  valid means count != 0.  But the problem is that now you can't
>> dequeue a null command.  Have we decided not to do the null command?
>
> I didn't think of the dequeuing issue.  But yes, I don't see a use for
> a null command if it is only there to terminate writes.  Isn't it so
> that a PCI target write of N words followed by one of M words in a
> continuous range should be considered equivalent to a single write of
> N + M words?  If that's the PCI semantics, then I don't think a
> termination command carries any meaning.

All the termination tells you is that the next thing you'll get is an
address command.  But as you have designed this, you maintain enough
state that it doesn't matter.  It looks like you grab the address into
a global, then when you get write commands, you send the address over
the bridge for the first one, then forward writes until you run out,
at which point you bail.  If the next thing you get after an idle
moment is a write command, you send the address again from the global.

Don't forget to count how many words you sent and increment the
address.  The fact that you have to do this is unfortunate, and it
makes me consider keeping the null command so we don't have to bother
storing and incrementing the address.
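
The bookkeeping described above can be sketched roughly as follows, in
Python rather than HQ microcode. The class and method names are
hypothetical, and word addressing (increment by 1 per word) is an
assumption.

```python
class WriteForwarder:
    """Forward PCI target writes over the bridge, resending the address
    after an idle gap.  Sketch only; names are not from the HQ code."""

    def __init__(self):
        self.addr = 0          # the "global" holding the next write address
        self.in_burst = False  # True while a run of writes is being forwarded
        self.bridge = []       # everything sent over the bridge, in order

    def on_address(self, addr):
        # Address command: remember it and force a resend on the next write.
        self.addr = addr
        self.in_burst = False

    def on_write(self, data):
        # The first write after an address command or an idle moment
        # sends the stored address before the data.
        if not self.in_burst:
            self.bridge.append(("addr", self.addr))
            self.in_burst = True
        self.bridge.append(("data", data))
        self.addr += 1         # count the word so a resend is correct

    def on_idle(self):
        # Ran out of write commands: bail out of the burst.
        self.in_burst = False
```

A write of N words, an idle moment, then M more words ends up at the
same addresses as a single write of N + M words, which is why the
termination command carries no extra information in this scheme.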

>> The jump table approach appeals to me because it's so much more
>> flexible.  I think we should definitely have host access to scratch
>> (at least for writing).
>
> Well in this particular case, I'm now down to 3 jump instructions, as
> apparent from the attached code, but in general I agree.

Yeah.  It definitely makes sense to do the sieve here.

>> Also, we do have more block RAMs available for program and scratch.
>> It just means more MUXing after the registered outputs.
>
> That's good to know.  We're down to 85 words for poll_pci() mostly due
> to the 1 or 16 specialisation of the reads, but the VGA parts could
> become much bigger.

Yeah.  But who would ever need more than 512 words of program memory?
Maybe we need 640.  :)

>> As long as it never corresponds to anything, that's fine.  Hey, how
>> about a branch instruction that always comes out false?  How would
>> that work?  It would be odd to have a branch in a delay slot... but
>> this is one that would always do nothing.  What would happen?
>
> Good idea.  I think that's safe.
>
> I have so far played it safe with the delay slot, but we can in fact do

I vaguely recall from my computer arch. class some statistics about
delay slots.  They're only usefully filled about 50% of the time.  I
may have it backwards, but I think you can only put an instruction in
there 80% of the time, and only about 60% of the time are the results
of that instruction actually used.  Yeah, I may have that backwards...
but the 50% is right.

> some clever tricks with it like executing a single instruction somewhere
> followed by an immediate jump to some other location.  E.g. the
> following code executes an arithmetic operation which is encoded in r0,
> applies it to r1 and r2 and stores the result in r1:
>
>        add r0, apply_operator, r0
>        jump r0
>          jump cont
> cont:
>        ...
>
> apply_operator:
>        add r1, r2, r1  ; If r0 = 0, then add
>        sub r1, r2, r1  ; If r0 = 1, then subtract
>        and r1, r2, r1  ; etc
>        or r1, r2, r1
>        xor r1, r2, r1


That sounds clever, although the semantics of a taken jump in the
delay slot of a taken jump is something that I would have to
understand more.  Are you sure that the instruction target of the
first jump will actually get executed?

Let's see...
Jump 1 gets executed while jump 2 gets fetched
Jump 2 gets executed while target gets fetched

Ok, so basically, the target of the first jump ends up filling the
delay slot of the second jump?
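
One way to check that reading is a toy model with a single branch delay
slot, written in Python. It uses the usual pc/next_pc formulation from
textbook delay-slot machines; whether the HQ pipeline behaves exactly
like this is an assumption, and the addresses in the example are made up.

```python
def run(program, steps):
    """Execute `steps` instructions on a machine with one branch delay slot.

    `program` maps address -> ("jump", target) or ("op", name).
    Returns the list of addresses executed, in order.
    """
    pc, next_pc = 0, 1
    trace = []
    for _ in range(steps):
        kind, arg = program[pc]
        trace.append(pc)
        # A jump redirects the fetch *after* the already-fetched next_pc,
        # so the instruction at next_pc (the delay slot) still executes.
        after_next = arg if kind == "jump" else next_pc + 1
        pc, next_pc = next_pc, after_next
    return trace


# Hypothetical layout: the operator table starts at address 10.
program = {
    0: ("jump", 10),      # jump r0, with r0 pointing at the "add" entry
    1: ("jump", 2),       # jump cont, sitting in the first jump's delay slot
    2: ("op", "cont"),
    3: ("op", "cont+1"),
    10: ("op", "add"),
    11: ("op", "sub"),
}
trace = run(program, 5)
```

Under this model the trace is 0, 1, 10, 2, 3: exactly one instruction at
the first jump's target executes, and control then continues at cont.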

>> I meant a long video read being done by the video controller in the
>> S3.  We're only intercepting PCI access to memory.
>
> In the current bridge wrapper HQ intercepts all or nothing.  Will we
> change that, or will the driver switch modes at will even while the
> bridge is active reading?

It's always all or nothing.  HQ becomes a gatekeeper for the bridge
over to the S3.  It's only safe to switch when there's absolutely
nothing pending on the bridge.  So we'll need some way, via PCI, to
tell the microcode to go into a safe state, then we wait until it's in
that state before changing modes.
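
A minimal sketch of that handshake, in Python. Everything here
(Gatekeeper, request_safe_state, set_mode, the pending counter) is a
hypothetical name for illustration; the real mechanism would be a PCI
visible flag plus microcode, and has not been designed yet.

```python
class Gatekeeper:
    """Toy model of HQ as all-or-nothing gatekeeper for the bridge."""

    def __init__(self):
        self.pending = 0                # transfers outstanding on the bridge
        self.quiesce_requested = False  # host has asked for the safe state
        self.safe = False               # microcode reports nothing pending
        self.intercept = True           # current mode: HQ intercepts everything

    def start_transfer(self):
        if self.quiesce_requested:
            return False                # refuse new work while draining
        self.pending += 1
        return True

    def finish_transfer(self):
        self.pending -= 1
        self._update()

    def request_safe_state(self):
        self.quiesce_requested = True
        self._update()

    def _update(self):
        # Safe only when asked to quiesce and the bridge is fully drained.
        self.safe = self.quiesce_requested and self.pending == 0

    def set_mode(self, intercept):
        if not self.safe:
            return False                # host must keep waiting
        self.intercept = intercept
        self.quiesce_requested = False
        self.safe = False
        return True
```

The host side would call request_safe_state(), poll until safe is set,
and only then call set_mode().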




-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
