Re: [Open-graphics] Designing a CPU

Timothy Normand Miller Sat, 17 Mar 2007 14:32:06 -0800

On 3/17/07, Stephen Pollei <[EMAIL PROTECTED]> wrote:

and if you keep it MIPS-like, then maybe even just patching GNU as (aka
gas) a little bit.


That would minimize the development time...


http://www.simplescalar.com/ portable instruction set architecture
or http://www.cise.ufl.edu/~mpf/manuscript/pisa.ps Pendulum
instruction set architecture ??


I believe it's simplescalar.  At least, that's the one we used in class.

BTW, this is the class I took:

http://www.cse.ohio-state.edu/%7Elauria/cse775/index.html

I did well in it partly because of my chip design background and
partly because I'd tinkered with CPU designs for quite a long time.
This class did answer a number of critical questions I had in mind,
and knowing those answers, I believe I know how to do it now.

So, yes, although I've done quite a number of complex chip designs,
the closest thing to a CPU design that I've ever done and actually
deployed is the video controller for OGA.

> > > This is also where we need to deal with branches.  If the instruction
> > > is a branch, the condition needs to be resolved, and the address needs
> > > to be fed back to stage (1).  This is why RISC processors typically
> > > have a delayed branch.  The possible branch conditions are reg-value=0
> > > and reg-value!=0.
> > BEQ -- Branch on equal
> > BNE -- Branch on not equal
> > BGEZ, BGEZAL, BGTZ, BLEZ, BLTZ, BLTZAL from mips not used?
>
> I'm not sure what those do, but I'm working from the simplest MIPS
> model from the textbook.  We need to strike a balance between
> functionality and logic area.  We need for the general case to be able
> to keep up with the dataflow from PCI at 66MHz.  If the CPU runs at,
> say, 100MHz, we can do that if we use lots of unrolled loops for data
> movement.  Instructions that don't help us with that are just not
> needed.
GEZ is "greater than or equal to zero" (>= 0); GTZ is > 0; LEZ is <= 0; LTZ is < 0.
AL is "and link" .. like JAL


Gotcha.  In MIPS, I forgot... do they use a stack for linking?  Or a
register?  Is the register fixed or selectable in the instruction?

They are not completely orthogonal, but they should be very cheap.
However, you maybe wanted to add shift-and-add instructions, so
freeing up some space would be nice. Also, if I had to choose
two branch instructions from the list, I'd maybe choose BEQ and BGTZ
rather than BEQ and BNE.
Also, do you need 32 registers, or could you live with 16? reg0 is
usually hardwired to zero IIRC. Register 31 (or 15) is where JAL would
store the return address.


Some of these are just signed/unsigned versions of each other.  Noting
that some of the work is done in a preceding SUB or SLT instruction,
here are some of the basic comparisons being done here:

Unsigned:
BEQ -- reg == 0
BNE -- reg != 0

Signed:
BLTZ -- reg[sign] == 1
BGEZ -- reg[sign] == 0
BLEZ -- reg[sign] == 1 || reg == 0
BGTZ -- reg[sign] == 0 && reg != 0
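
The table above can be written out as a small executable model. This is only a sketch of the six tests as listed, assuming a 32-bit register whose sign is bit 31; the names sign_bit and branch_taken are made up for illustration and are not part of any proposed ISA.

```python
MASK32 = 0xFFFFFFFF

def sign_bit(reg):
    """Return bit 31 of a 32-bit register value."""
    return (reg >> 31) & 1

# Each predicate sees the raw 32-bit register contents (for BEQ/BNE this
# would be the result of a preceding SUB) and says whether the branch is taken.
BRANCH_TESTS = {
    "BEQ":  lambda r: r == 0,
    "BNE":  lambda r: r != 0,
    "BLTZ": lambda r: sign_bit(r) == 1,
    "BGEZ": lambda r: sign_bit(r) == 0,
    "BLEZ": lambda r: sign_bit(r) == 1 or r == 0,
    "BGTZ": lambda r: sign_bit(r) == 0 and r != 0,
}

def branch_taken(op, reg):
    """Evaluate one branch condition on a value truncated to 32 bits."""
    return BRANCH_TESTS[op](reg & MASK32)
```

Note that every condition needs only two signals from the register: the sign bit and an "all bits zero" flag.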

> > J -- Jump
> > JAL -- Jump and link
> > JR -- Jump register
> > You might not want some of these J's, if you want it real simple.
>
> Yeah.  I figured we'd have a CALL-like instruction that puts the
> return pointer into a register.  Return would just be to jump to an
> address contained in a register.
Sure, JAL and then JR. You can't nest JALs without handling a stack yourself.
JAL always stores the old IP (Instruction Pointer) into register 31.


That answers my question above.  :)
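
To make the nesting problem concrete, here is a minimal sketch of link-register calls. It assumes a word-addressed program counter and no branch delay slot, so the saved return address is simply pc + 1 (real MIPS saves a later address because of the delay slot). The Cpu class is hypothetical.

```python
LINK_REG = 31  # JAL's fixed link register, as described above

class Cpu:
    def __init__(self):
        self.pc = 0
        self.regs = [0] * 32

    def jal(self, target):
        # Save the address of the next instruction, then jump.
        self.regs[LINK_REG] = self.pc + 1
        self.pc = target

    def jr(self, reg):
        # Jump to the address held in a register (used as "return").
        self.pc = self.regs[reg]

cpu = Cpu()
cpu.pc = 10
cpu.jal(100)                        # outer call: r31 = 11
outer_return = cpu.regs[LINK_REG]
cpu.pc = 105
cpu.jal(200)                        # nested call: r31 overwritten with 106
nested_return = cpu.regs[LINK_REG]
cpu.jr(LINK_REG)                    # returns to 106; the value 11 is gone
```

Unless the callee copies r31 somewhere before the nested JAL, the outer return address is lost, which is why nesting needs a software-managed stack or a spare register.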

Also, how big is your program going to be? Jump uses 26 bits for the
address. If you stay under 64 KiB then you would only need about
14 bits, which gives you 12 bits for something else.


I think I want to provide a 512-word program file, so we need 9 bits.
However, I think we should future proof a little in case that turns
out to be just too small.  We'll reserve 10 or 11 bits.
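
The bit counts in this exchange can be checked with a few lines of arithmetic. This is just a worked calculation; address_bits is a helper name invented here, and the 26-bit figure is the MIPS J-type target field mentioned above.

```python
def address_bits(num_words):
    """Bits needed to give each of num_words locations a distinct address."""
    return (num_words - 1).bit_length()

J_FIELD_BITS = 26                       # MIPS J/JAL target field width

bits_512 = address_bits(512)            # 512-word program file
bits_64k = address_bits(64 * 1024 // 4) # 64 KiB of 32-bit words
spare_bits = J_FIELD_BITS - bits_64k    # left over for other uses

words_10 = 1 << 10                      # program size if 10 bits are reserved
words_11 = 1 << 11                      # program size if 11 bits are reserved
```

With these numbers, bits_512 is 9, bits_64k is 14, spare_bits is 12, and reserving 10 or 11 bits allows growth to 1024 or 2048 words.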

> Since our addresses are only 9 bits, that gives us some freedom that
> MIPS didn't have.  We should probably reserve 10 or 11 bits to
> future-proof it, but since this is very special-purpose, we shouldn't
> be afraid to use a completely different ISA in a future product.
sure and maybe you can use that to carve out space for your
shift-and-add instructions.


However, with a real multiply instruction, shift and add may not be
necessary.  Multiple shift-and-add could be accomplished by
multiplying by a constant.  Our multiply will be cheap.
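
As a quick check of that claim, the sketch below compares a chain of shift-and-add steps against a single multiply by the equivalent constant, with results wrapped to 32 bits. The function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF

def shift_and_add(x, shifts):
    """Sum of (x << s) for each shift amount, wrapped to 32 bits."""
    total = 0
    for s in shifts:
        total = (total + (x << s)) & MASK32
    return total

def constant_for(shifts):
    """The single constant whose product with x equals the shift-add chain."""
    return sum(1 << s for s in shifts)

def multiply(x, k):
    """32-bit wrapping multiply."""
    return (x * k) & MASK32

# Example: x*10 computed as (x << 3) + (x << 1)
ten_times_seven = shift_and_add(7, [3, 1])
```

For example, shifts of 3 and 1 correspond to the constant 10, so one multiply replaces two shift-and-add steps.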

> Yeah, but ditch the divide.  It's a multi-cycle instruction.  In its
> place, we can add some single-cycle instructions that assist with
> divides.  Early SPARCs didn't have multiply, but they did have
> multiply step instructions.  We can do multiply, though, because the
> FPGAs and ASICs have dedicated multiplier circuits.
OK you can always just make a divide be a function that you JAL into as well.


Yes.  Now, here's the thing... if we want to have our CPU be a
translator to make 3D ops for some 2D ops, we'll end up having to
convert some integer values to floats.  That would require a lot of
time-consuming manipulation, and we need to figure out exactly what
that manipulation is going to look like.
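
For a rough idea of the manipulation involved, here is a sketch of a signed 32-bit integer to float conversion. It assumes the target is IEEE 754 single precision (the thread does not say which float format the 3D engine uses) and it truncates low bits instead of rounding. In hardware, the bit_length step would be a priority encoder or count-leading-zeros.

```python
import struct

def int_to_float_bits(value):
    """Convert a signed 32-bit integer to IEEE 754 single-precision bits.

    Bits that do not fit in the 24-bit significand are truncated.
    """
    if value == 0:
        return 0
    sign = 1 if value < 0 else 0
    mag = -value if value < 0 else value
    msb = mag.bit_length() - 1        # position of the leading one
    exponent = msb + 127              # biased exponent
    if msb > 23:
        mantissa = mag >> (msb - 23)  # too wide: drop low bits
    else:
        mantissa = mag << (23 - msb)  # narrow: left-justify
    mantissa &= 0x7FFFFF              # remove the implicit leading one
    return (sign << 31) | (exponent << 23) | mantissa

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a Python float, for checking."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

The steps are: take the absolute value, find the leading one, shift to normalize, and assemble the fields. Without a find-first-one instruction, the normalize step is the expensive part in software.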

>
> BTW, does MIPS have a shift-and-add instruction?
Not in the list I read. If you drop the LB,SB, and DIV you could maybe
have room enough.
Plus the shift instructions themselves seem to have unused bits in a
lot of cases.
SLL  -- Shift left logical            0000 00-- ---t tttt dddd dhhh hh00 0000
SLLV -- Shift left logical variable   0000 00ss ssst tttt dddd d--- --00 0100
SRA  -- Shift right arithmetic        0000 00-- ---t tttt dddd dhhh hh00 0011
SRL  -- Shift right logical           0000 00-- ---t tttt dddd dhhh hh00 0010
SRLV -- Shift right logical variable  0000 00ss ssst tttt dddd d--- --00 0110

- means unused ...
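
The bit patterns above follow the standard MIPS R-type layout, where s, t, d and h are the rs, rt, rd and shift-amount fields. A small decoder sketch makes the field boundaries explicit; the helper names are made up, and the function codes are the low six bits from the listing.

```python
SHIFT_FUNCT = {
    0b000000: "SLL",
    0b000010: "SRL",
    0b000011: "SRA",
    0b000100: "SLLV",
    0b000110: "SRLV",
}

def encode_rtype(rs, rt, rd, shamt, funct):
    """Pack R-type fields into a 32-bit word (opcode field is zero)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def decode_rtype(word):
    """Split a 32-bit instruction word into its R-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs":     (word >> 21) & 0x1F,
        "rt":     (word >> 16) & 0x1F,
        "rd":     (word >> 11) & 0x1F,
        "shamt":  (word >> 6) & 0x1F,
        "funct":  word & 0x3F,
    }

# SLL r2, r1, 4: shift r1 left by 4 into r2; the rs field is unused.
sll_word = encode_rtype(rs=0, rt=1, rd=2, shamt=4, funct=0b000000)
```

Any field that is unused for a given instruction is five free bits that a custom ISA could reassign.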


Sounds good.  I think we may find ourselves considering some special
purpose instructions too.

> We don't need any byte or 16-bit instructions.  OGA is spec'd out to
> not understand anything but 32-bit pixels anyhow (we'll provide ways
> to pretend to do 8-bit, but that's a separate discussion).  So if the
> CPU needs to process 8-bit words, we'll just have to act like Alpha
> and use extra code.
OK yes, I've read about unaligned access on some early Alphas on the
lkml IIRC.


I think they never did add 8 or 16 bit instructions.  They did add 64's
though.  This caused problems when accessing I/O devices.  The trick
they used was a sort of "sparse" addressing mode, where some of the
address bits encoded the word size.  The Alpha still thought it was
doing 32-bit accesses, but the I/O chipset did some translations
behind its back, so to speak.

> > > I believe the MIPS processor uses the ALU to add the contents of one
> > > register to a short immediate value stored in the instruction, and
> > > that's used as the address.  We should do the same.  That makes it so
> > > that the only memory addressing mode is reg-value + offset.
> >
> > Yes, and immediate values are 16 bits. However, if we have only 512 bytes
> > of RAM then you only really need 9 bits. Only J and JAL take
> > immediates that are greater than 16 bits: 26 bits.
>
> 512 32-bit words.
OK, that's easier to live within. However, since it's offset
from a 32-bit number anyway, you still might have unaligned access,
or a 16 gibibyte instead of a 4 gibibyte addressable memory range.


We won't know anything about byte addresses.  All addresses refer to
32-bit words.  When referring to graphics memory, addresses will refer
to whole 32-bit pixels.  When we do DMA, the C/BE flags will all be
asserted, always.  Our chip will simply have no concept of a smaller
word size.  (Well, technically, the memory controller will handle
writes with PCI byte enables that aren't all asserted, and it will do
it correctly, but that and config space are the limit of it.)

So, for instance, when we're translating VGA 80x25 text mode, we'll
grab two characters at once (character and attributes), and then we'll
process them by shifting, etc.
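
A sketch of that shifting, assuming each 32-bit word holds two 16-bit text cells with the first cell in the low half, and each cell has the character code in its low byte and the attribute in its high byte. That packing is an assumption made for illustration, not something fixed in this thread.

```python
def unpack_text_word(word):
    """Split one 32-bit word into two (character, attribute) pairs.

    Uses only shifts and masks, since the CPU has no byte loads.
    """
    cells = []
    for shift in (0, 16):
        cell = (word >> shift) & 0xFFFF
        char = cell & 0xFF          # character code
        attr = (cell >> 8) & 0xFF   # colour/blink attribute
        cells.append((char, attr))
    return cells

# 'A' with attribute 0x07, followed by 'B' with attribute 0x1F
pairs = unpack_text_word(0x1F420741)
```

Each cell costs one shift and two masks, so the absence of byte instructions adds only a handful of ALU operations per character.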

>
> They can look like memory load/store instructions.
OK, memory-mapped I/O: some accesses go to on-board scratch RAM, others go further away.


Right.

> But there are also
> cases where we might want a single instruction to cause a word to be
> popped straight from one fifo and pushed into another.
OK, I don't know MIPS, but the 8086 had IN and OUT instructions.
They just let you send or receive a register (AL or AX) to or from a port.
If the port was an immediate argument you were limited to 0 to 255; however,
if you used DX you could address ports 0 to 65,535.
How many fifos do you need? Would a splice (port-to-port)
instruction be useful?


Yeah.  I think we'll have an IN that reads from a fifo into a
register, an OUT that writes from a register to a fifo, and an INOUT
whose operands are both memory addresses, one referring to the source
fifo, the other referring to the target.  The INOUT won't store
anything to local registers.
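
The three instructions could behave roughly like the model below. It is a sketch only: the FifoCpu class, the method names, and the fifo addresses are hypothetical, and full/empty handling is left out.

```python
from collections import deque

MASK32 = 0xFFFFFFFF

class FifoCpu:
    def __init__(self, fifos):
        self.regs = [0] * 32
        self.fifos = fifos          # maps a word address to a fifo

    def op_in(self, rd, src_addr):
        # IN: pop one word from the source fifo into a register.
        self.regs[rd] = self.fifos[src_addr].popleft()

    def op_out(self, rs, dst_addr):
        # OUT: push one register's value into the target fifo.
        self.fifos[dst_addr].append(self.regs[rs] & MASK32)

    def op_inout(self, src_addr, dst_addr):
        # INOUT: move one word fifo-to-fifo; no register is written.
        self.fifos[dst_addr].append(self.fifos[src_addr].popleft())
```

A usage example: with fifos = {0x100: deque([1, 2, 3]), 0x101: deque()}, calling op_inout(0x100, 0x101) moves the word 1 across and leaves every register untouched.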

Is it always 32 bits that move?


Yes, and if that's not the case, we pretend it is anyhow.

do you want a count of words to move at a time?


That might be sensible.  We may need some way of determining if a
splice is still active.  However, if we're only ever allowed to splice
less than or equal to the number of available and free words, the time
is deterministic.
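
The bound described here can be sketched as follows, assuming one word moves per cycle. The names max_splice and splice, and the idea of rejecting an oversized count with an error, are illustrative choices rather than anything decided in this thread.

```python
from collections import deque

def max_splice(src, dst, dst_capacity):
    """Largest count that can never stall: limited by available and free words."""
    available = len(src)
    free = dst_capacity - len(dst)
    return min(available, free)

def splice(src, dst, dst_capacity, count):
    """Move count words from src to dst; returns the cycles taken."""
    if count > max_splice(src, dst, dst_capacity):
        raise ValueError("splice count exceeds available or free words")
    for _ in range(count):
        dst.append(src.popleft())
    return count  # one word per cycle, so the time is known up front
```

Because the count is checked against both fifos before starting, the splice can never block partway through, and software knows exactly when it finishes without polling.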

> Also, we need ways to handle writing to full fifos and reading from
> empty ones.  One way would be to have the instruction block, but being
> able to pause the pipeline adds extra logic and complexity that we
> might like to avoid.

OK so do you want an error return or do you want to query for size, or
do you want interrupts? I think being able to query the size of a fifo
would be best.


Definitely query, in any case.  Just another memory address.  We can
have, say, a space of four words for a fifo.  One is the fifo port
itself (read or write, whichever is the case), another is the number
of available words (to read) or free words (to write), and another
would be other status flags (full, empty, etc.).  The fourth address
could be unused.  Or we could arrange things differently -- maybe a
range of write-only, a range of read-only that affects something
(pushes or whatever), and a range of read-only for status.
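
The first arrangement (four words per fifo) might look like this from the CPU's side. The offsets, flag bit positions, and class name are all assumptions for illustration; only the port/count/status/unused split comes from the description above.

```python
from collections import deque

# Word offsets within a fifo's four-word window (hypothetical ordering)
PORT, COUNT, STATUS, UNUSED = 0, 1, 2, 3

# Status flag bits (hypothetical positions)
FLAG_EMPTY = 1 << 0
FLAG_FULL = 1 << 1

class ReadFifoWindow:
    """Four-word memory window onto a read-side fifo."""

    def __init__(self, base, capacity):
        self.base = base
        self.capacity = capacity
        self.data = deque()

    def load(self, addr):
        offset = addr - self.base
        if offset == PORT:
            return self.data.popleft()   # reading the port pops a word
        if offset == COUNT:
            return len(self.data)        # words available to read
        if offset == STATUS:
            flags = 0
            if not self.data:
                flags |= FLAG_EMPTY
            if len(self.data) == self.capacity:
                flags |= FLAG_FULL
            return flags
        return 0                         # fourth word is unused
```

Only the port read has a side effect; the count and status words can be polled freely with ordinary load instructions.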

Here's one way to handle grabbing a bunch of fifo entries at once:
Get the number of available entries and use that to compute a branch
target into an unrolled loop.
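
A sketch of that computed-branch idea follows. Python has no computed jump, so the for loop stands in for falling through the remaining copies of an unrolled body; the unroll factor of 8 and the function names are assumptions.

```python
from collections import deque

UNROLL = 8  # number of copies of the move instruction in the unrolled body

def entry_point(available):
    """Pick the slot to jump to so that exactly `count` copies execute."""
    count = min(available, UNROLL)
    return UNROLL - count, count

def drain(src, dst):
    """Move up to UNROLL words with a single computed branch."""
    start, count = entry_point(len(src))
    # Models execution falling through slots start..UNROLL-1, each of
    # which moves one word (e.g. one INOUT instruction per slot).
    for _slot in range(start, UNROLL):
        dst.append(src.popleft())
    return count
```

With 3 words available the branch lands on slot 5 and the last three copies run; with 8 or more it lands on slot 0 and all eight run. No per-word loop test is needed.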

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Favorite book:  The Design of Everyday Things, Donald A. Norman, ISBN
0-465-06710-7
