Eric Smith wrote:
Patrick wrote:
OK, here are some assumptions I made:
Load/store architecture.
Unified instruction, data, and register storage. In other words, the
512 memory locations contain code, data, and registers.
Looking at the instruction count, I think we can certainly use one of
the FPGA 512x36 RAM blocks for the nanocontroller.
The XC3S1500 has 32 of the 18Kbit BlockRAMs, and the XC3S4000 has
96 of them, so it's probably reasonable to allocate several to the
nanocontroller to provide flexibility. And after all, it's an FPGA,
so tweaking the number of BlockRAMs assigned to the nanocontroller
should only be a matter of changing a few lines of RTL.
The block RAMs have only two ports, so you can't use a single one
for code, data, and registers.
For a load/store architecture (that doesn't do both simultaneously),
you might be able to share one block RAM between instructions and
data. But if pipelining requires that the data written by store
instruction n be written at the same time as the data read by load
instruction n+1, then a separate block RAM is needed for data
(or a stall/pipeline bubble).
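
To make the port argument concrete, here is a toy sketch in Python. It is only an illustration: the function names are made up, and it assumes that an instruction fetch occupies one port on every cycle, which the thread does not actually specify.

```python
# Toy model of port pressure on a single dual-port block RAM.
# Assumption (not from the thread): one instruction fetch is issued every cycle.
PORTS_PER_BLOCK_RAM = 2


def ports_needed(fetch: bool, load: bool, store: bool) -> int:
    """Count how many RAM ports are wanted in one clock cycle."""
    return int(fetch) + int(load) + int(store)


def needs_stall_or_second_ram(fetch: bool, load: bool, store: bool) -> bool:
    """True when one dual-port RAM cannot serve all accesses in the cycle."""
    return ports_needed(fetch, load, store) > PORTS_PER_BLOCK_RAM


# Non-overlapped load/store: a fetch plus one data access fits in two ports.
print(needs_stall_or_second_ram(fetch=True, load=True, store=False))
# Pipelined case: store n writes while load n+1 reads and a fetch is in flight.
print(needs_stall_or_second_ram(fetch=True, load=True, store=True))
```

Under those assumptions the first case fits in one block RAM and the second does not, which is the situation that calls for a separate data RAM or a bubble.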
So we drop the pipelining. As Timothy has pointed out, it doesn't have
to be fast. Rather than try to pipeline the nanocontroller (which will
be constantly stalled waiting on card memory anyway), let's go the other
way. Assume 1 instruction every 5 clocks for the nanocontroller. That
should give enough stages to allow for a single read or write per clock
cycle.
Now for the math. Timothy said to expect a 20 clock delay for random
access to card memory. I'm going to assume that is 20 clocks in the
200 MHz domain. With the controller running at 100 MHz, this means it
would have to stall for 10 of its own clocks for each external memory
access. At 149000 memory accesses per screen update, that gives us
1.49M clock cycles for memory access. For the program there are 62
instructions. One set of 13 is looped 64 times (the blit of the
character bitmap). This gives us (62 - 13) + 13 x 64 = 881
instructions per character output, times 2000 characters, or 1.76M
instructions per screen update. At 5 clocks per instruction we get
8.81M clocks per screen update for the instructions. That is a grand
total of 10.3M clocks per screen update, or just under 10 Hz at a
100 MHz controller clock.
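
For anyone who wants to check or tweak the numbers, here is a rough Python sketch of the same estimate. The constants come from the figures above; the function and variable names are just ones I picked for illustration.

```python
# Back-of-the-envelope model of the non-pipelined nanocontroller estimate.
CONTROLLER_HZ = 100_000_000        # 100 MHz controller clock
MEM_ACCESSES_PER_SCREEN = 149_000  # external memory accesses per screen update
STALL_CLOCKS_PER_ACCESS = 10       # 20 clocks at 200 MHz = 10 clocks at 100 MHz
TOTAL_INSTRUCTIONS = 62            # instructions in the program
LOOP_INSTRUCTIONS = 13             # instructions inside the blit loop
LOOP_ITERATIONS = 64               # times the blit loop runs per character
CHARS_PER_SCREEN = 80 * 25         # 2000 characters
CLOCKS_PER_INSTRUCTION = 5         # one instruction every 5 clocks


def instructions_per_char() -> int:
    """Straight-line instructions plus the loop body repeated 64 times."""
    straight_line = TOTAL_INSTRUCTIONS - LOOP_INSTRUCTIONS
    return straight_line + LOOP_INSTRUCTIONS * LOOP_ITERATIONS


def clocks_per_screen(clocks_per_instruction: int = CLOCKS_PER_INSTRUCTION) -> int:
    """Memory stall clocks plus instruction clocks for one screen update."""
    memory_clocks = MEM_ACCESSES_PER_SCREEN * STALL_CLOCKS_PER_ACCESS
    instruction_clocks = (
        instructions_per_char() * CHARS_PER_SCREEN * clocks_per_instruction
    )
    return memory_clocks + instruction_clocks


def refresh_hz() -> float:
    """Screen updates per second at the assumed controller clock."""
    return CONTROLLER_HZ / clocks_per_screen()


print(instructions_per_char())       # instructions per character
print(clocks_per_screen())           # clocks per screen update
print(f"{refresh_hz():.2f} Hz")      # screen updates per second
```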
Of course that cuts in half on an 80x50 screen... Maybe we do need the
pipelining... At 1.2 clocks per instruction (the .2 I tossed in for
pipeline stalls and flushes), an 80x25 screen refresh rate of slightly
over 27 Hz is possible.
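
A quick Python sketch comparing the two cases, in case anyone wants to play with the parameters. The names are made up for illustration, and it assumes the number of memory accesses scales linearly with the character count, which is a guess on my part for the 80x50 case.

```python
# Compare refresh rates for non-pipelined (5 clocks per instruction) and
# pipelined (about 1.2 clocks per instruction) nanocontroller designs.
CONTROLLER_HZ = 100_000_000
MEM_ACCESSES_80X25 = 149_000      # accesses per 80x25 screen update
STALL_CLOCKS_PER_ACCESS = 10      # controller clocks lost per access
INSTRUCTIONS_PER_CHAR = (62 - 13) + 13 * 64  # 881


def refresh_hz(clocks_per_instruction: float, columns: int, rows: int) -> float:
    """Estimated screen updates per second for a given text mode."""
    chars = columns * rows
    # Assumption: memory accesses grow in proportion to the character count.
    mem_accesses = MEM_ACCESSES_80X25 * chars / (80 * 25)
    memory_clocks = mem_accesses * STALL_CLOCKS_PER_ACCESS
    instruction_clocks = INSTRUCTIONS_PER_CHAR * chars * clocks_per_instruction
    return CONTROLLER_HZ / (memory_clocks + instruction_clocks)


for cpi in (5, 1.2):
    for columns, rows in ((80, 25), (80, 50)):
        rate = refresh_hz(cpi, columns, rows)
        print(f"{cpi} clocks/instr, {columns}x{rows}: {rate:.1f} Hz")
```

With these numbers the memory stalls become a large share of the total once the instruction cost drops, so further gains would have to come from the memory side.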
Food for thought.
Patrick M