Eric Smith wrote:

Patrick wrote:
OK, here are some assumptions I made:

Load/store architecture.
Unified instruction, data, and register space.  In other words, the 512 memory
locations contain code, data, and registers.

Looking at the instruction count, I think we can certainly use one of
the FPGA 512x36 RAM blocks for the nanocontroller.

The XC3S1500 has 32 of the 18Kbit BlockRAMs, and the XC3S4000 has
96 of them, so it's probably reasonable to allocate several to the
nanocontroller to provide flexibility.  And after all, it's an FPGA,
so tweaking the number of BlockRAMs assigned to the nanocontroller
should only be a matter of changing a few lines of RTL.
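As a rough illustration of how little changes when the BlockRAM count changes, here is a Python sketch of the address arithmetic for one word-addressed space built from several 512x36 blocks. The function names and the power-of-two restriction are assumptions for illustration, not anything from the thread; in RTL the equivalent would presumably be a single parameter on the memory module.

```python
# Hypothetical sizing helper: NUM_BRAMS-style parameterization of a
# nanocontroller memory built from 512x36 BlockRAMs (18Kbit each).
WORDS_PER_BRAM = 512
BITS_PER_WORD = 36

def address_map(num_brams):
    """Return (address bits, total words, total bits) for num_brams blocks
    concatenated into one word-addressed space (num_brams a power of two)."""
    assert num_brams > 0 and num_brams & (num_brams - 1) == 0
    words = WORDS_PER_BRAM * num_brams
    addr_bits = words.bit_length() - 1   # log2(words) for a power of two
    return addr_bits, words, words * BITS_PER_WORD

def split_address(addr, num_brams):
    """Split a word address into (BlockRAM index, offset within that block)."""
    assert 0 <= addr < WORDS_PER_BRAM * num_brams
    return divmod(addr, WORDS_PER_BRAM)

print(address_map(1))         # (9, 512, 18432)
print(address_map(4))         # (11, 2048, 73728)
print(split_address(1000, 4)) # (1, 488)
```

Going from one block to four only widens the address by two bits; the per-block decode is the same divide/modulo by 512.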

The block RAMs have only two ports, so you can't use a single one
for code, data, and registers.

For a load/store architecture (one that doesn't load and store
simultaneously), you might be able to share one block RAM between
instructions and data.  But if pipelining requires that data written by
store instruction n be written at the same time as data read by load
instruction n+1, then a separate block RAM is needed for data
(or else a stall/pipeline bubble).
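The port conflict described above can be checked with a small Python model. The pipeline timing is an assumption chosen to reproduce the case in the text (a load reads one clock after its fetch, a store writes two clocks after its fetch, one instruction issues per clock, no stalls); the function names are hypothetical.

```python
# Hypothetical pipeline timing, chosen so that store n writes in the same
# clock as load n+1 reads (and instruction n+2 is fetched).
LOAD_READ_DELAY = 1
STORE_WRITE_DELAY = 2

def ram_accesses(program):
    """Map each clock to the list of (kind, instruction index) RAM accesses,
    assuming one instruction issues per clock with no stalls."""
    by_clock = {}
    for i, op in enumerate(program):
        by_clock.setdefault(i, []).append(("fetch", i))
        if op == "load":
            by_clock.setdefault(i + LOAD_READ_DELAY, []).append(("load", i))
        elif op == "store":
            by_clock.setdefault(i + STORE_WRITE_DELAY, []).append(("store", i))
    return by_clock

def overbooked_clocks(by_clock, kinds, ports=2):
    """Clocks at which accesses of the given kinds need more than `ports` ports."""
    return sorted(
        clk for clk, accs in by_clock.items()
        if sum(1 for kind, _ in accs if kind in kinds) > ports
    )

acc = ram_accesses(["store", "load", "add"])
shared = overbooked_clocks(acc, {"fetch", "load", "store"})  # one BlockRAM
split_code = overbooked_clocks(acc, {"fetch"})               # code BlockRAM
split_data = overbooked_clocks(acc, {"load", "store"})       # data BlockRAM
print(shared, split_code, split_data)  # [2] [] []
```

With one shared dual-port block, clock 2 needs three accesses; splitting code and data into two blocks brings every clock back within two ports.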
So we drop the pipelining. As Timothy has pointed out, it doesn't have to be fast. Rather than try to pipeline the nanocontroller (which will be constantly stalled waiting on card memory anyway), let's go the other way. Assume one instruction every 5 clocks for the nanocontroller. That should give enough stages to allow for a single read or write per clock cycle.
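One way the 5-clock sequencing could work is sketched below in Python. The per-instruction stage plans are hypothetical (the thread does not specify them); the point is only that with registers living in the same BlockRAM, each instruction needs about four RAM accesses, and spreading them over 5 clocks keeps the sequencer to at most one access per clock.

```python
# Hypothetical per-instruction RAM schedule for a non-pipelined controller
# that spends 5 clocks on every instruction. Registers live in the same
# BlockRAM as code and data, so reading a register is a RAM read.
# None means the RAM is idle that clock (e.g. the ALU is working).
SCHEDULES = {
    "add":   ["fetch", "read rs1", "read rs2", None, "write rd"],
    "load":  ["fetch", "read rs1 (address)", "read mem", None, "write rd"],
    "store": ["fetch", "read rs1 (address)", "read rs2 (data)", "write mem", None],
}
CLOCKS_PER_INSTRUCTION = 5

def trace(program):
    """Yield (clock, instruction index, RAM access or None), one per clock."""
    clock = 0
    for i, op in enumerate(program):
        steps = SCHEDULES[op]
        assert len(steps) == CLOCKS_PER_INSTRUCTION
        for step in steps:
            yield clock, i, step
            clock += 1

rows = list(trace(["store", "load", "add"]))
for clock, i, step in rows:
    print(clock, i, step)
```

Because each clock carries at most one access, the sequencer only ever uses one port of the block, which avoids the store/load overlap problem entirely at the cost of throughput.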

Now for the math. Timothy said to expect a 20-clock delay for random access to card memory. I'm going to assume that is 20 clocks in the 200 MHz domain, which means the controller, running at 100 MHz, would have to stall for 10 clocks for each external memory access. At 149,000 memory accesses per screen update, that gives us 1.49M clock cycles for memory access. The program has 62 instructions; one set of 13 (the blit of the character bitmap) is looped 64 times, and the other 49 run once. This gives us 49 + (13 * 64) = 881 instructions per character output, times 2000 characters (80x25), or 1.76M instructions per screen update. At 5 clocks per instruction we get 8.81M clocks per screen update for the instructions. That is a grand total of 10.3M clocks per screen update, or just under 10 Hz at a 100 MHz controller clock.
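The same estimate as a small Python model, so the assumptions can be varied. The default figures are the ones quoted above; the function and parameter names are hypothetical, and, as in the text, memory stalls are simply added to instruction time rather than overlapped with it.

```python
# Rough per-frame clock budget for the non-pipelined nanocontroller.

def stall_clocks(latency_fast=20, fast_mhz=200, ctrl_mhz=100):
    """Convert a card-memory latency counted in the fast clock domain
    into controller clocks (20 clocks @ 200 MHz -> 10 clocks @ 100 MHz)."""
    return latency_fast * ctrl_mhz // fast_mhz

def instructions_per_char(total=62, loop_len=13, loop_iters=64):
    """Straight-line instructions run once; the blit loop runs loop_iters times."""
    return (total - loop_len) + loop_len * loop_iters

def frame_clocks(chars=2000, mem_accesses=149_000, cpi=5):
    """Controller clocks for one screen update: instruction time plus
    memory stalls (stalls are not overlapped with execution)."""
    instr = chars * instructions_per_char()
    return instr * cpi + mem_accesses * stall_clocks()

def refresh_hz(clocks, ctrl_hz=100e6):
    """Screen updates per second at the given controller clock."""
    return ctrl_hz / clocks

clocks = frame_clocks()
print(clocks, round(refresh_hz(clocks), 2))  # 10300000 9.71
```

Calling `frame_clocks(chars=4000, mem_accesses=298_000)` models an 80x50 screen, assuming memory traffic scales with the character count.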

Of course, that rate is cut in half on an 80x50 screen... Maybe we do need the pipelining... At 1.2 clocks per instruction (the 0.2 is what I tossed in for pipeline stalls and flushes), an 80x25 screen refresh rate of slightly over 27 Hz is possible.
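Worked through for the 80x25 case, assuming the pipelined design executes the same 881 instructions per character and that the 10-clock memory stalls are still not overlapped with execution:

```latex
\begin{aligned}
N_{\text{instr}} &= 2000 \times 881 = 1\,762\,000 \\
C_{\text{frame}} &= 1.2\,N_{\text{instr}} + 149\,000 \times 10
                  = 2\,114\,400 + 1\,490\,000 = 3\,604\,400 \\
f_{\text{refresh}} &= \frac{100 \times 10^{6}}{3\,604\,400} \approx 27.7\ \text{Hz}
\end{aligned}
```

If an 80x50 screen doubles both terms, this drops to roughly 13.9 Hz.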

Food for thought.

Patrick M
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
