[Open-graphics] HQ assembler and coding conventions

Petter Urkedal Sun, 20 Jul 2008 04:13:47 -0700

On 2008-07-19, Timothy Normand Miller wrote:
> >> > When we start on the microcode it may be an idea to come up with an
> >> > (manually enforced) ABI for utilise the registers as best as we can
> >> > without ending up with a web of dependencies to rework if we need to
> >> > change something.  Is there a register-based ABI/practices we can adapt?
> >> > Otherwise, I can write up some ideas.
> >>
> >> Are you referring to the assignment of "names" to scratch space
> >> addresses?
> >
> > That's an issue, too.  For the moment, I was just considering
> > parameters, results, and scratch registers for subroutine calls.  Since
> > our programs are small, it's probably not a big issue.  It may suffice
> > with a single level of calls to rather simple subroutines using only a
> > few registers, and once written the register usage of the subroutine is
> > unlikely to change.
> 
> We're extremely constrained, so we need to pick something that's very
> efficient, even it's more challenging to program.  We could think like
> fortran 77, where parameters are passed by reference and live at fixed
> addresses.  So if you call function X, then you know exactly where in
> scratch space to dump its parameters.  Exactly what to do with the
> return/continuation address is a question, but if we do it this way,
> there's a place in scratch memory to put it (if the subroutine needs
> the register for something else).  Not having a stack does pose some
> challenges.  We can simulate a stack, but the instruction overhead
> could hurt performance.  We could also consider implementing a
> 16-entry stack, although I don't want to if we can avoid it.


With so many registers and so limited program space, I think we can
manage the manual register allocation without saving, at least for the
inner calls.  As you indicate, parameters and results may overlap.
Likewise return/continuation address is just a preserved input
parameter.   I listed these explicitly below only to suggest the
ordering.

> If we were ever to take this basic architecture and scale it to a more
> powerful processor design in the future, it would not be binary
> compatible.  Moreover, different revisions of OGA do not have to have
> binary-compatible HQs.  As long as the instruction sets are
> well-documented, the system and BIOS can handle figuring out which
> code file to load for any given purpose.
> 
> >> We definitely need something like that, but it may be very
> >> program-dependent.  Unless things turn out to be surprisingly small,
> >> we'll have one program for VGA text, one for VGA graphics, and
> >> eventually, one for DMA.  BIOS or kernel can reload the program as
> >> necessary.
> >>
> >> I'm always in favor of creating good design structures.  512 program
> >> words doesn't seem like a lot, but that's part of the challenge --
> >> fitting a program into that space.  To keep our sanity, we really need
> >> to be organized about it.  This is especially important for us and our
> >> progeny to be able to maintain it later.  Unfortunately, I'm not sure
> >> what pre-existing paradigms might apply here, so lets develop
> >> something new.
> >
> > This is what I had in mind.  We allocate from r0 upwards in the
> > following order with possible overlaps (s = scratch, r = read, w =
> > write)
> >
> >                                        caller  callee
> >  1. Scratch registers.                 s       s
> >  2. Result registers.                  s/r     s/w
> >  3. Scratched parameter registers.     s/w     s/r
> >  4. Preserved parameter registers.     s/w     r
> >  5. Continuation address register.     s/w     r
> >  6. Callee preserved registers.        s       -
> >  7. Caller preserved registers.        -       -
> >
> > Relating to stack-based ABI, regs 2 to 5 makes up the current frame, and
> > higher registers are higher stack frames.  Relating to CPS-based ABI,
> > regs 4 to 7 are the continuation and regs 2 are the parameters passed to
> > the continuation.
> 
> Continuations are interesting for high-level programming, but I'm not
> even thinking of doing a proper stack-based ABI.

Yes, I'm just drawing the analog to see if the proposal makes sense.
Since we have neither a stack nor higher order functions, the analogs
both match and mismatch.

> A simple approach:
> 
> Every function has access to every register for any purpose, but for
> any register it's going to use, it is responsible for storing it to a
> fixed (computed by the assembler) location in scratch space, and
> restoring it before returning.  This includes the call return address.
>  We can devise assembler directives that indicate which registers hold
> which parameters (no scratch space needed), which registers are for
> local variables (requiring a backing store in scratch space), and how
> many additional scratch locations are necessary for the task to
> compute.
> Parameters can be passed in registers, and we are free to restrict it
> so that no more than some number of parameters can be passed, owing to
> certain registers being reserved (particularly the return address).
> Recursion is not allowed.  Indeed, disallowing recursion may help us
> simplify things further.

I was hoping we could avoid saving registers in most cases, since we
have quite many of them.  This also means we don't fix the return
address register.  I think the assembler directive to indicate register
usage is mostly just documentation, but see below.
 
> It's slightly faster to use registers than scratch space, so we should
> see how we can optimize to use registers more.

Yes.

> > E.g. "z += x*y" would be (cf hqlib/mulu.asm though it doesn't use this
> > convention),
> >
> >    r0 - parameter and result z
> >    r1 - parameter x
> >    r3 - parameter y
> >    r4 - continuation address
> >    r5..r31 - preserved
> >
> > If we had a subroutine using the above, it's usage may be
> >
> >    r0..r4 - scratch
> >    r5..r[N-1] - output, input, cont
> >    r[N]..r31 - preserved
> >
> > This facilitates bottom-up coding.  As long as we are dealing with
> > simple and well-defined subroutines, we'll be able to allocate registers
> > precisely.  For higher level subroutines, we can think forward and set a
> > side some extra scratch registers.
> 
> I like this.  We can put some intelligence into the assembler (or
> compiler?) that allows a function to state which registers it's going
> to use.  Then the CALLER is responsible for saving and restoring, and
> this makes room for the caller to simply avoid certain registers,
> rather than the callee having to save and restore registers even if
> nothing useful is in them.  So for leaf nodes and their parents, this
> can result in some significant optimizations, although that will
> diminish dramatically for the next level up.

Yes, I've considered some simple extension to the assembler.  The
assembler has no concept of subroutine boundaries.  A simple extension
would be to introduce a register aliasing directive like

        reg p0..p2 = r5..

so that p-registers are our parameters.  There is no point in aliasing
the scratch registers, since they always start at r0, and registers
above our parameters are preserved by us, so we don't refer to them.  It
could make sense to alias registers between those scratched by all inner
calls and our own parameters:

        reg q0..q3 = r6..

When calling a subroutine, the subroutine's register definitions will
not be in scope, so we still need to refer to physical registers for
parameters.  We could allow references like label.p0, but that's
probably not a big advantage since we need to keep track of which
r-registers are scratched by the subroutine, anyway.  If we see

        move ..., r3
        move ..., r4
        jump func, r5

then the reader of the code knows that r0..r5 will be scratched by the
call, whereas "move ..., func.p0" will be less informative.

To introduce subroutine-awareness we could add a directive right after
the label:

   some_function:
        sub(r3..r4; r5) r0..r2  ; (params, cont) scratch
        ...
        endsub

and in the caller

        sub(r10..r11; r12) r0..r10
        local q0..q3 = r6..  ; assert that no subroutine scratches these
        ...
        ;; Here the assembler checks the "sub" directive of
        ;; "some_function" and fails if it overlaps with r6..r9.
        call some_function(l3, r1)
        ...
        endsub

where the call, which must not be in a delay-slot, translates to

        move l3, r3
        move r1, r4
        jump some_function, r5

This complicates the assembler, so maybe we should start coding and see
if and what we need before adding such an extension.

A technicality:  The assembler uses a dedicated lexical class for
registers, so if we introduce any of the above we must make them
distinct from other identifiers.  We could

  * Reserve certain literals for registers, say /[a-z][0-9]+/ or
    more limited /[p-s][0-9]+/.

  * Use a prefix as in %r8, %base_addr.
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)

[Open-graphics] HQ assembler and coding conventions

Reply via email to