Re: [Open-graphics] Designing a CPU

Stephen Pollei Sat, 17 Mar 2007 13:02:07 -0800

On 3/17/07, Timothy Normand Miller <[EMAIL PROTECTED]> wrote:

On 3/17/07, Stephen Pollei <[EMAIL PROTECTED]> wrote:
> On 3/16/07, Timothy Normand Miller <[EMAIL PROTECTED]> wrote:
> > It won't be long before we'll have to design a nanocontroller for OGD1
> > to manage VGA and DMA.  I may be able to just go off and design one
> > myself, but I think that many of you would fancy observing and
> > participating in the design process, and with more brains on it, we'd
> > do a better job.
>
> Sounds very interesting. Do you want an assembler for it as well, or
> do you just want to program it in machine code?


We'll definitely want an assembler.  It'll be too hard to code otherwise.

Sure, and it sounds like an assembler for this would be an easy
assignment; even written from scratch, it would be something like an
end-of-quarter college project.
And if you keep it MIPS-like, then maybe we could even just patch the
GNU assembler (gas) a little bit.
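As a rough illustration of how small the core of such an assembler is, here is a Python sketch of two-pass label resolution. It assumes labels sit on their own lines, `#` starts a comment, and branch offsets are counted in words relative to the instruction after the branch (as on MIPS); instruction encoding is left out, and the function name `resolve_labels` is hypothetical.

```python
def resolve_labels(source):
    """Two-pass label resolution for a MIPS-like assembly listing.

    Pass 1 records the word address of every label; pass 2 replaces label
    operands with numbers. Branches (mnemonics starting with "b" in this
    sketch) get a PC-relative word offset; jumps get an absolute word address.
    """
    labels = {}
    instructions = []
    # Pass 1: strip comments, collect labels and instructions.
    for raw in source.splitlines():
        text = raw.split("#", 1)[0].strip()
        if not text:
            continue
        if text.endswith(":"):
            labels[text[:-1]] = len(instructions)
            continue
        mnemonic, _, rest = text.partition(" ")
        operands = [op.strip() for op in rest.split(",")] if rest else []
        instructions.append((mnemonic.lower(), operands))

    # Pass 2: substitute label addresses.
    resolved = []
    for addr, (mnemonic, operands) in enumerate(instructions):
        out = []
        for op in operands:
            if op in labels and mnemonic.startswith("b"):
                out.append(labels[op] - (addr + 1))   # relative to next instruction
            elif op in labels:
                out.append(labels[op])                # absolute word address
            else:
                out.append(op)
        resolved.append((mnemonic, out))
    return resolved


program = """
loop:
    addi $t0, $t0, -1     # decrement counter
    bne  $t0, $zero, loop
    j    loop
"""
print(resolve_labels(program))
```

A real assembler would add a third step that packs each resolved tuple into a 32-bit word.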


> Since you are basing it on a MIPS design, do you want to at least use
> a subset of the MIPS mnemonics?
Yes, absolutely.

>
> Do you just want to
> copy the way MIPS encodes its instructions?

I figured we'd work our way back to there, yes.  Jim Dinan was
suggesting PISA, I believe.  We just need to be sure not to fixate on
that.  I chose a MIPS arch because it's very simple.  A non-pipelined
design could require almost as much logic, so we might as well
pipeline it.


OK, yes. I have also looked into a few microcontrollers before: PICs,
the i8051, the Zilog Z80, etc.

By PISA, do you mean the Portable Instruction Set Architecture
(http://www.simplescalar.com/) or the Pendulum Instruction Set
Architecture (http://www.cise.ufl.edu/~mpf/manuscript/pisa.ps)?

> > This is also where we need to deal with branches.  If the instruction
> > is a branch, the condition needs to be resolved, and the address needs
> > to be fed back to stage (1).  This is why RISC processors typically
> > have a delayed branch.  The possible branch conditions are reg-value=0
> > and reg-value!=0.
> BEQ -- Branch on equal
> BNE -- Branch on not equal
> BGEZ, BGEZAL, BGTZ, BLEZ, BLTZ, BLTZAL from MIPS are not used?

I'm not sure what those do, but I'm working from the simplest MIPS
model from the textbook.  We need to strike a balance between
functionality and logic area.  We need for the general case to be able
to keep up with the dataflow from PCI at 66MHz.  If the CPU runs at,
say, 100MHz, we can do that if we use lots of unrolled loops for data
movement.  Instructions that don't help us with that are just not
needed.
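To see why unrolling matters, here is a back-of-the-envelope model in Python. It assumes a pipelined CPU that retires one instruction per cycle, 32-bit PCI delivering at most one word per 66 MHz clock, `per_word` instructions per word moved, and two loop-control instructions per iteration (counter update plus branch, with the delay slot filled by useful work). All names and parameter values are assumptions for illustration.

```python
# Assumptions (not from the thread): one instruction per CPU cycle, 32-bit PCI
# delivering at most one word per 66 MHz clock, and a loop-control overhead of
# two instructions (counter update + branch) per iteration.
CPU_MHZ = 100
PCI_MHZ = 66


def cycles_per_word(unroll, per_word=1, loop_overhead=2):
    """Average CPU cycles spent per 32-bit word moved by an unrolled loop."""
    return per_word + loop_overhead / unroll


def min_unroll(per_word=1, loop_overhead=2, limit=64):
    """Smallest unroll factor that keeps up with PCI, or None if none does."""
    budget = CPU_MHZ / PCI_MHZ           # about 1.52 CPU cycles per PCI word
    for unroll in range(1, limit + 1):
        if cycles_per_word(unroll, per_word, loop_overhead) <= budget:
            return unroll
    return None


for unroll in (1, 2, 4, 8):
    print(unroll, cycles_per_word(unroll))
print("one instruction per word:", min_unroll(per_word=1))
print("load + store per word:   ", min_unroll(per_word=2))
```

Under these assumptions, a one-instruction-per-word copy keeps up once the loop is unrolled about four times, while a separate load and store per word never does (2 cycles per word exceeds the roughly 1.52-cycle budget). That is one argument for the single-instruction FIFO-to-FIFO move mentioned further down.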

GEZ is "greater than or equal to zero" (>= 0), GTZ is > 0, LEZ is <= 0,
and LTZ is < 0. AL is "and link", like JAL.
They are not completely orthogonal, but they should be very cheap.
However, you may want to add shift-and-add instructions, so freeing up
some opcode space would be nice. Also, if I had to choose two branch
instructions from the list, I'd probably choose BEQ and BGTZ rather
than BEQ and BNE.
Also, do you need 32 registers, or could you live with 16? Register 0
is usually hardwired to zero, IIRC, and register 31 (or 15) is where
JAL would store the return address.
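The reason the compare-with-zero branches are cheap is that every condition falls out of two signals on the source register: "is zero" (a 32-input NOR) and "is negative" (bit 31). Below is a minimal Python sketch of that condition logic, assuming 32-bit two's-complement registers. BEQ/BNE compare two registers and so need a full equality comparator. The `...AL` variants would use the same conditions and also write the link register.

```python
MASK32 = 0xFFFFFFFF


def zero_and_negative(value):
    """The two cheap signals: all bits zero, and the sign bit (bit 31)."""
    value &= MASK32
    return value == 0, bool(value >> 31)


def branch_taken(op, rs, rt=0):
    """Decide whether a MIPS-style conditional branch is taken."""
    if op == "beq":
        return (rs & MASK32) == (rt & MASK32)
    if op == "bne":
        return (rs & MASK32) != (rt & MASK32)
    zero, negative = zero_and_negative(rs)
    if op == "bgez":
        return not negative                 # rs >= 0
    if op == "bgtz":
        return not negative and not zero    # rs > 0
    if op == "blez":
        return negative or zero             # rs <= 0
    if op == "bltz":
        return negative                     # rs < 0
    raise ValueError(f"unknown branch {op!r}")


print(branch_taken("bgtz", 5), branch_taken("bltz", -1))
```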

> J -- Jump
> JAL -- Jump and link
> JR -- Jump register
> You might not want all of these J's if you want it really simple.

Yeah.  I figured we'd have a CALL-like instruction that puts the
return pointer into a register.  Return would just be to jump to an
address contained in a register.

Sure: JAL and then JR. You can't nest JALs without handling a stack
yourself, because JAL always stores the return address (on MIPS, PC + 8,
just past the delay slot) into register 31.
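Here is a toy interpreter in Python that shows why nested calls need a software stack. It is a sketch under simplifying assumptions that do not come from the thread: a word-addressed PC, no branch delay slot (so the link value is simply the next instruction), 16 registers with r0 hardwired to zero, r15 as the link register, and r14 as a stack pointer into the 512-word scratch RAM. The tuple-based instruction format is hypothetical.

```python
LINK, SP = 15, 14          # assumed register roles (r15 = link, r14 = stack pointer)
MASK32 = 0xFFFFFFFF


def run(program, max_steps=1000):
    """Execute a list of (op, *operands) tuples; return the register file."""
    regs = [0] * 16
    mem = [0] * 512                      # 512-word scratch RAM
    pc = 0
    for _ in range(max_steps):
        op, *args = program[pc]
        next_pc = pc + 1
        if op == "addi":                 # addi rd, rs, imm
            rd, rs, imm = args
            regs[rd] = (regs[rs] + imm) & MASK32
        elif op == "sw":                 # sw rt, offset(rs)
            rt, offset, rs = args
            mem[(regs[rs] + offset) & 0x1FF] = regs[rt]
        elif op == "lw":                 # lw rt, offset(rs)
            rt, offset, rs = args
            regs[rt] = mem[(regs[rs] + offset) & 0x1FF]
        elif op == "jal":                # save return address, then jump
            regs[LINK] = pc + 1
            next_pc = args[0]
        elif op == "jr":                 # jump to the address held in a register
            next_pc = regs[args[0]]
        elif op == "halt":
            return regs
        else:
            raise ValueError(f"unknown op {op!r}")
        regs[0] = 0                      # r0 always reads as zero
        pc = next_pc
    raise RuntimeError("step limit exceeded (runaway program?)")


nested = [
    ("addi", SP, 0, 512),    # 0: empty stack at the top of scratch RAM
    ("addi", 1, 0, 0),       # 1: r1 = 0
    ("jal", 4),              # 2: call outer
    ("halt",),               # 3
    ("addi", SP, SP, -1),    # 4: outer: push the link register...
    ("sw", LINK, 0, SP),     # 5
    ("jal", 10),             # 6: ...because this JAL overwrites it
    ("lw", LINK, 0, SP),     # 7: pop it back
    ("addi", SP, SP, 1),     # 8
    ("jr", LINK),            # 9: return to main
    ("addi", 1, 1, 5),       # 10: inner: r1 += 5
    ("jr", LINK),            # 11: return to outer
]

broken = [
    ("addi", 1, 0, 0),       # 0
    ("jal", 3),              # 1: call outer
    ("halt",),               # 2
    ("jal", 5),              # 3: outer calls inner WITHOUT saving r15
    ("jr", LINK),            # 4: r15 now points here, so this loops forever
    ("addi", 1, 1, 5),       # 5: inner
    ("jr", LINK),            # 6
]

print(run(nested)[1])        # prints 5
```

In `nested`, the outer routine saves r15 before its own JAL and restores it before JR. `broken` omits that, so the outer JR keeps jumping to itself until the step limit trips.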

Also, how big is your program going to be? J uses 26 bits for the
address. If you stay under 64 KiB, then you would only need about 14 bits
(64 KiB is 16 Ki four-byte instruction words), which leaves 12 bits for
something else.

Since our addresses are only 9 bits, that gives us some freedom that
MIPS didn't have.  We should probably reserve 10 or 11 bits to
future-proof it, but since this is very special-purpose, we shouldn't
be afraid to use a completely different ISA in a future product.

Sure, and maybe you can use that to carve out space for your
shift-and-add instructions.
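Here is the field-width arithmetic as a small Python sketch. It assumes 32-bit (4-byte) instruction words and that any spare J-field bits could be repurposed. `address_bits` and `spare_bits` are hypothetical helpers, while `mips_jump_target` follows the standard MIPS rule, included for comparison.

```python
J_FIELD_BITS = 26   # width of the MIPS J/JAL target field


def address_bits(words):
    """Bits needed to give each of `words` instruction words its own address."""
    return max(1, (words - 1).bit_length())


def spare_bits(words, reserve=0):
    """J-field bits left over after addressing `words` instruction words,
    optionally reserving a wider field for future growth."""
    return J_FIELD_BITS - max(address_bits(words), reserve)


def mips_jump_target(pc, instr_index):
    """MIPS rule: the 26-bit word index replaces bits 27..2 of PC+4."""
    return ((pc + 4) & 0xF000_0000) | ((instr_index & 0x3FF_FFFF) << 2)


print(address_bits(64 * 1024 // 4))   # 64 KiB of code = 16384 words
print(spare_bits(16384))
print(spare_bits(512, reserve=11))    # 9-bit addresses, 11 bits reserved
```

With the 9-bit addresses mentioned above and 11 bits reserved for growth, about 15 of the 26 bits would be free for other uses.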

> > (3) ALU
> > Here, the numbers fetched from registers in stage (2) are combined
> > based on an opcode in the instruction.  ALU operations include add,
> > subtract, shift, multiply (using dedicated multiplier logic), and
> > bitwise logical operations.
> ADD -- Add
> ADDI -- Add immediate
> ADDIU -- Add immediate unsigned
> ADDU -- Add unsigned
> AND -- Bitwise and
> ANDI -- Bitwise and immediate
> DIV -- Divide
> DIVU -- Divide unsigned
> MULT, MULTU, OR, ORI, SLL, SLLV, SRA, SRL, SRLV, SUB, SUBU, XOR, XORI
> I suppose you are happy with this list?

Yeah, but ditch the divide.  It's a multi-cycle instruction.  In its
place, we can add some single-cycle instructions that assist with
divides.  Early SPARCs didn't have multiply, but they did have
multiply step instructions.  We can do multiply, though, because the
FPGAs and ASICs have dedicated multiplier circuits.

OK. You can always just make divide a function that you JAL into, too.
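In the spirit of the multiply-step idea, such a function could be built from a single-cycle "divide step" executed 32 times. Below is a minimal Python model of restoring division, assuming a hypothetical step instruction that updates a remainder/quotient register pair. A real implementation would need a 33-bit compare for divisors of 2^31 or more, which Python's unbounded integers hide here.

```python
MASK32 = 0xFFFFFFFF


def div_step(rem, quo, divisor):
    """One hypothetical single-cycle divide step (restoring division).

    Shift the next dividend bit (MSB of quo) into rem, then subtract the
    divisor if it fits and record a 1 in the freed low bit of quo.
    """
    rem = (rem << 1) | (quo >> 31)      # bring down the next dividend bit
    quo = (quo << 1) & MASK32           # make room for the next quotient bit
    if rem >= divisor:
        rem -= divisor
        quo |= 1
    return rem, quo


def udiv32(dividend, divisor):
    """Unsigned 32-bit divide as a software routine: 32 divide steps."""
    if divisor == 0:
        raise ZeroDivisionError("divide by zero")
    rem, quo = 0, dividend & MASK32
    for _ in range(32):
        rem, quo = div_step(rem, quo, divisor)
    return quo, rem                     # (quotient, remainder)


print(udiv32(100, 7))   # prints (14, 2)
```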


BTW, does MIPS have a shift-and-add instruction?

Not in the list I read. If you drop LB, SB, and DIV, you might have
enough room.
Plus, the shift instructions themselves seem to have unused bits in a
lot of cases:
SLL  -- Shift left logical            0000 00-- ---t tttt dddd dhhh hh00 0000
SLLV -- Shift left logical variable   0000 00ss ssst tttt dddd d--- --00 0100
SRA  -- Shift right arithmetic        0000 00-- ---t tttt dddd dhhh hh00 0011
SRL  -- Shift right logical           0000 00-- ---t tttt dddd dhhh hh00 0010
SRLV -- Shift right logical variable  0000 00ss ssst tttt dddd d--- --00 0110

- means unused (must be zero); s = rs, t = rt, d = rd, h = shift amount.
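Here is a short Python sketch that packs those fields using the standard MIPS R-type layout (opcode 0, then rs, rt, rd, shamt, funct). It shows that the immediate shifts leave rs at zero and the variable shifts leave shamt at zero, which is the unused space noted above. The helper names are hypothetical.

```python
# Standard MIPS R-type layout:
#   opcode[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]
FUNCT = {"sll": 0x00, "srl": 0x02, "sra": 0x03, "sllv": 0x04, "srlv": 0x06}


def encode_r(funct, rs=0, rt=0, rd=0, shamt=0):
    """Pack an R-type instruction (opcode field is 0 for SPECIAL)."""
    for name, value in (("rs", rs), ("rt", rt), ("rd", rd), ("shamt", shamt)):
        if not 0 <= value < 32:
            raise ValueError(f"{name} must fit in 5 bits, got {value}")
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_shift(mnemonic, rd, rt, amount):
    """sll/srl/sra take an immediate shift amount (rs field unused);
    sllv/srlv take the amount from register `amount` (shamt field unused)."""
    funct = FUNCT[mnemonic]
    if mnemonic.endswith("v"):
        return encode_r(funct, rs=amount, rt=rt, rd=rd)
    return encode_r(funct, rt=rt, rd=rd, shamt=amount)


print(hex(encode_shift("sll", 8, 9, 4)))     # sll  $t0, $t1, 4
print(hex(encode_shift("srlv", 8, 9, 10)))   # srlv $t0, $t1, $t2
```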


>
> > (4) Memory access and I/O
> > This is the stage where we take an address computed above and read or
> > write our local memory.  Our "local" memory is actually another
> > 512-word block RAM, that we'll use as scratch space.
>
> LB -- Load byte
> LUI -- Load upper immediate
> LW -- Load word
> SB -- Store byte
> SW -- Store word

We don't need any byte or 16-bit instructions.  OGA is spec'd out to
not understand anything but 32-bit pixels anyhow (we'll provide ways
to pretend to do 8-bit, but that's a separate discussion).  So if the
CPU needs to process 8-bit words, we'll just have to act like Alpha
and use extra code.
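For reference, here is a Python sketch of the extra code that implies. With only 32-bit loads and stores, reading a byte is a shift and mask, and writing one is a read-modify-write. The sketch assumes little-endian byte numbering (byte 0 is bits 7..0). These are roughly the operations Alpha's EXTBL, MSKBL, and INSBL instructions provide. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def extract_byte(word, index):
    """Return byte `index` (0 = least significant) of a 32-bit word."""
    return (word >> (8 * index)) & 0xFF


def insert_byte(word, index, byte):
    """Return `word` with byte `index` replaced by `byte`.

    On a word-only CPU this is load word, mask, shift, OR, store word.
    """
    shift = 8 * index
    cleared = word & ~(0xFF << shift) & MASK32   # mask out the old byte
    return cleared | ((byte & 0xFF) << shift)    # merge in the new one


print(hex(insert_byte(0x11223344, 1, 0xAB)))
```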

OK, yes. I've read about unaligned access on some early Alphas on
LKML, IIRC.

> > I believe the MIPS processor uses the ALU to add the contents of one
> > register to a short immediate value stored in the instruction, and
> > that's used as the address.  We should do the same.  That makes it so
> > that the only memory addressing mode is reg-value + offset.
>
> Yes, and immediate values are 16 bits. However, if we have only 512 bytes
> of RAM, then you really only need 9 bits. Only J and JAL take
> immediates larger than 16 bits (26 bits).

512 32-bit words.

OK, that's easier to live within. However, since the offset is added to a
32-bit register value anyway, you still might have unaligned access (with
byte addresses) or a 16 GiB instead of a 4 GiB addressable range (with
word addresses).

This will either make it easier to code the instruction, or give us
more freedom.
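Here is a Python sketch of the reg-value + offset addressing mode. It assumes the MIPS 16-bit signed offset, word (not byte) addresses, and that only the low 9 bits select one of the 512 scratch-RAM words. Wrapping rather than faulting on out-of-range addresses is also an assumption.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of `imm` as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x1_0000 if imm & 0x8000 else imm


def effective_address(base, imm16, addr_bits=9):
    """reg-value + sign-extended 16-bit offset, truncated to the RAM size.

    Addresses are word addresses here, so unaligned access cannot happen.
    """
    full = (base + sign_extend16(imm16)) & 0xFFFF_FFFF
    return full & ((1 << addr_bits) - 1)


print(effective_address(100, 4))        # 104
print(effective_address(100, 0xFFFC))   # offset -4 -> 96
print(effective_address(510, 5))        # wraps within 512 words -> 3
print((2**32 * 4) // 2**30)             # GiB reachable with 32-bit word addresses: 16
```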

>
> > In addition, this is also the stage where we'll want to do other
> > I/O-related operations, such as providing access to real graphics
> > memory and controlling other aspects of the GPU that are accessible by
> > this processor.  We'll make that available, to appear as another
> > 512-word space (or more or less as necessary) or read-only and
> > write-only "memory locations".
>
> OK, I don't know what you want I/O instructions to look like.

They can look like memory load/store instructions.

OK, memory-mapped I/O: some addresses go to the on-board scratch RAM, and
others go further away.
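Here is a minimal Python sketch of that split. It assumes a hypothetical map in which word addresses 0 to 511 hit the scratch RAM and anything above goes to I/O. The `Bus` class and its dictionary-backed I/O space are illustrative only.

```python
SCRATCH_WORDS = 512


class Bus:
    """Word-addressed bus where load/store addresses are decoded by range."""

    def __init__(self):
        self.scratch = [0] * SCRATCH_WORDS
        self.io = {}                      # stand-in for "further away" devices

    def is_io(self, addr):
        return addr >= SCRATCH_WORDS      # i.e. any bit at or above bit 9 set

    def store(self, addr, value):
        value &= 0xFFFF_FFFF
        if self.is_io(addr):
            self.io[addr] = value         # e.g. a GPU control register
        else:
            self.scratch[addr] = value

    def load(self, addr):
        if self.is_io(addr):
            return self.io.get(addr, 0)
        return self.scratch[addr]


bus = Bus()
bus.store(5, 42)       # lands in scratch RAM
bus.store(0x200, 7)    # decoded as I/O
print(bus.load(5), bus.load(0x200))
```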

But there are also
cases where we might want a single instruction to cause a word to be
popped straight from one fifo and pushed into another.

OK, I don't know MIPS, but the 8086 had IN and OUT instructions, which
just let you send a register (AL or AX) to a port or receive one from
it. With an immediate port number you were limited to ports 0 to 255;
if you used DX, you could reach ports 0 to 65,535.
How many FIFOs do you need? Would a splice (port-to-port) instruction
be useful?
Is it always 32 bits that move?
Do you want a count of words to move at a time?

Also, we need ways to handle writing to full fifos and reading from
empty ones.  One way would be to have the instruction block, but being
able to pause the pipeline adds extra logic and complexity that we
might like to avoid.


OK, so do you want an error return, a way to query how full a FIFO is,
or interrupts? I think being able to query how full a FIFO is would
be best.
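Here is a Python sketch of the query-the-level approach, combined with the splice idea. Software reads each FIFO's fill level first and then moves only as many words as are known to fit, so the pipeline never has to stall. The `Fifo` class, `level()`, and `splice()` are hypothetical, and raising an exception stands in for whatever a real design would do on misuse.

```python
from collections import deque


class Fifo:
    """Bounded FIFO that software can poll instead of blocking on."""

    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    def level(self):
        """Words currently stored (a hypothetical memory-mapped status read)."""
        return len(self.items)

    def space(self):
        """Free slots remaining."""
        return self.depth - len(self.items)

    def push(self, word):
        if self.space() == 0:
            raise OverflowError("push to a full FIFO")
        self.items.append(word & 0xFFFF_FFFF)

    def pop(self):
        if not self.items:
            raise IndexError("pop from an empty FIFO")
        return self.items.popleft()


def splice(src, dst, max_words):
    """Port-to-port move: check both levels once, then move only as many
    words as are guaranteed to succeed, so no individual move has to stall."""
    count = min(src.level(), dst.space(), max_words)
    for _ in range(count):
        dst.push(src.pop())
    return count


incoming, outgoing = Fifo(depth=8), Fifo(depth=3)
for word in range(5):
    incoming.push(word)
print(splice(incoming, outgoing, max_words=16))   # 3: limited by outgoing space
print(incoming.level(), outgoing.level())         # 2 3
```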

--
http://dmoz.org/profiles/pollei.html
http://sourceforge.net/users/stephen_pollei/
http://www.orkut.com/Profile.aspx?uid=2455954990164098214
http://stephen_pollei.home.comcast.net/
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
