Re: [Open-graphics] OGP Status: OGA1 in OGD1, hardware programmed and recognized under Linux

Petter Urkedal Sat, 19 Jul 2008 05:28:50 -0700

On 2008-07-18, Timothy Normand Miller wrote:
> On Fri, Jul 18, 2008 at 1:31 PM, Petter Urkedal <[EMAIL PROTECTED]> wrote:
> > I believe HQ has not yet been connected to the bridge?
> 
> No.  Not yet.  We're going to get to that once we have debugged it to
> this point.  However, that doesn't mean we can't start on the coding
> and then merge things later.  Would you like to work with me to
> architect this?


I'm not sure how this will be done, but I can probably help anyway.

  * As I recall from previous discussion, we want to decode the PCI
    address and dispatch to HQ in hardware, rather than equipping HQ
    with a control bit to intercept all incoming PCI commands.  Can we
    assume the BAR for HQ is fixed, or shall HQ be able to configure the
    pipe from PCI to intercept different address ranges on demand?

  * As far as I can see, there are four clocks involved: the PCI clock,
    two separate clocks for bridge transmission and reception, and the
    HQ clock.  Are all these different?
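
To make the first question concrete, here is a minimal software model of
the address decode, written in Python rather than Verilog.  The window
base and size are made-up values, and the names Window and dispatch are
only for illustration.  A fixed BAR corresponds to a constant Window; a
configurable pipe would let HQ replace the Window at run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """An address range claimed by HQ (hypothetical; not from the OGP sources)."""
    base: int  # first byte address of the window, aligned to size
    size: int  # window size in bytes, a power of two

    def __post_init__(self):
        if self.size <= 0 or self.size & (self.size - 1):
            raise ValueError("size must be a power of two")
        if self.base & (self.size - 1):
            raise ValueError("base must be aligned to size")

    def hits(self, addr: int) -> bool:
        # Hardware-style decode: mask off the low bits and compare the rest.
        return (addr & ~(self.size - 1)) == self.base


def dispatch(addr: int, hq_window: Window) -> str:
    """Decide where an incoming PCI access goes."""
    return "hq" if hq_window.hits(addr) else "bridge"
```

With Window(0x1000, 0x1000), accesses to 0x1000..0x1FFF go to HQ and
everything else continues to the bridge.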

I assume memory access goes through the bridge.  So, we must extend
xp10_bridge_wrapper.v with an additional internal interface for HQ
memory operations.  If we need high throughput, is there any alternative
to two extra FIFOs using two BRAMs?  And yet another two for PCI?

Since HQ's BRAM has an unused port with its own clock domain, it may be
possible to let the bridge read and write data directly to HQ memory.
That is, for memory-write, HQ prepares the data in a subrange of its
BRAM, and tells the bridge to transmit the range to a given memory
address.  For memory-read, HQ tells the bridge to transfer a memory
range to a BRAM range.  That could also work for PCI, though we'd need
to extend HQ internal memory with another BRAM due to the separate clock
domains.
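
As a rough behavioural model of that idea (Python, word-addressed lists
standing in for the BRAM and for card memory; the function names and the
argument order are assumptions, not an existing interface), the two
operations amount to range copies in opposite directions:

```python
def _check_range(buf, start, count, what):
    # Reject transfers that would run off either end of the buffer.
    if count < 0 or start < 0 or start + count > len(buf):
        raise IndexError(f"{what} range out of bounds")


def memory_write(hq_bram, memory, bram_start, count, mem_addr):
    """HQ has prepared `count` words at bram_start; the bridge sends them to memory."""
    _check_range(hq_bram, bram_start, count, "BRAM")
    _check_range(memory, mem_addr, count, "memory")
    memory[mem_addr:mem_addr + count] = hq_bram[bram_start:bram_start + count]


def memory_read(hq_bram, memory, mem_addr, count, bram_start):
    """The bridge fetches `count` words from memory into the given BRAM range."""
    _check_range(memory, mem_addr, count, "memory")
    _check_range(hq_bram, bram_start, count, "BRAM")
    hq_bram[bram_start:bram_start + count] = memory[mem_addr:mem_addr + count]
```

The model ignores clock-domain crossing and handshaking entirely; it only
shows what HQ would have to hand to the bridge: a BRAM start, a memory
address, and a word count.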

> >> The next tasks, things we really need help with, include:
> >>
> >> - HQ microcode.
> >
> > When we start on the microcode it may be an idea to come up with a
> > (manually enforced) ABI to utilise the registers as best we can
> > without ending up with a web of dependencies to rework if we need to
> > change something.  Is there a register-based ABI/practice we can adapt?
> > Otherwise, I can write up some ideas.
> 
> Are you referring to the assignment of "names" to scratch space
> addresses?

That's an issue, too.  For the moment, I was just considering
parameters, results, and scratch registers for subroutine calls.  Since
our programs are small, it's probably not a big issue.  It may suffice
with a single level of calls to rather simple subroutines using only a
few registers, and once written the register usage of the subroutine is
unlikely to change.

> We definitely need something like that, but it may be very
> program-dependent.  Unless things turn out to be surprisingly small,
> we'll have one program for VGA text, one for VGA graphics, and
> eventually, one for DMA.  BIOS or kernel can reload the program as
> necessary.
> 
> I'm always in favor of creating good design structures.  512 program
> words doesn't seem like a lot, but that's part of the challenge --
> fitting a program into that space.  To keep our sanity, we really need
> to be organized about it.  This is especially important for us and our
> progeny to be able to maintain it later.  Unfortunately, I'm not sure
> what pre-existing paradigms might apply here, so let's develop
> something new.

This is what I had in mind.  We allocate from r0 upwards in the
following order with possible overlaps (s = scratch, r = read, w =
write)

                                        caller  callee
  1. Scratch registers.                 s       s
  2. Result registers.                  s/r     s/w
  3. Scratched parameter registers.     s/w     s/r
  4. Preserved parameter registers.     s/w     r
  5. Continuation address register.     s/w     r
  6. Callee preserved registers.        s       -
  7. Caller preserved registers.        -       -

Relating this to a stack-based ABI, categories 2 to 5 make up the
current frame, and higher registers belong to higher stack frames.
Relating it to a CPS-based ABI, categories 4 to 7 form the continuation,
and category 2 holds the parameters passed to the continuation.

E.g. "z += x*y" would be (cf hqlib/mulu.asm though it doesn't use this
convention),

    r0 - parameter and result z
    r1 - parameter x
    r3 - parameter y
    r4 - continuation address
    r5..r31 - preserved

If we had a subroutine using the above, its usage might be

    r0..r4 - scratch
    r5..r[N-1] - output, input, cont
    r[N]..r31 - preserved

This facilitates bottom-up coding.  As long as we are dealing with
simple and well-defined subroutines, we'll be able to allocate registers
precisely.  For higher-level subroutines, we can think ahead and set
aside some extra scratch registers.
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
