On 11/27/07, Farhan Mohamed Ali <[EMAIL PROTECTED]> wrote:
I won't have much to do during my break either (December 19-January 13).
Actually, I won't have much to do after December 11, I think.
Most things Verilog are fine with me. Maybe you can give a rough idea
of the complexity of each task (which ones are trivial and which are
more substantial)?
Ok, let's see what I can come up with here...
- Modify the PCI controller so that it synthesizes for the Lattice chip
The two main issues are IO drivers/receivers and a meta-coding issue.
In order to make the thing meet timing, I have to trick the
synthesizer into not optimizing the logic the way it normally wants
to. You can tell it all you want that the inputs already have 4 ns of
delay on them coming from the bus, and the P&R will try to take that
into account, but the synthesizer is stupid and completely ignores all of
that. So what I had to do was define a whole bunch of multiplexer
modules that I route things through, and then I tell the synthesizer
not to optimize through those modules. This forces the logic to take
the shape I want. The way I did this was put meta-comments into the
Verilog code that ISE understands.
There are two porting options for this. One is to figure out what the
meta-comments are for Synplicity (or whatever synthesizer we use for
the Lattice part). That would be the easiest. The other would be to
manually instantiate multiplexers in the same way that you can
manually instantiate a block RAM or a multiplier block. Supposedly
the synthesizer will see this as a boundary to optimization and not
optimize across it.
So this involves some research.
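To make the idea concrete, here's a rough sketch of the kind of mux
wrapper I mean, with the attribute written as a Synplify-style
meta-comment. The exact attribute name and placement (syn_hier vs.
syn_keep, comment on the module vs. the instance) is exactly the thing
that needs to be researched against the Lattice flow's docs, so treat
this as a guess, not the answer:

```verilog
// Hypothetical 2:1 mux wrapper used purely as an optimization
// boundary. In the ISE version this carried an XST meta-comment;
// the Synplify equivalent is believed to be syn_hier = "hard",
// but that needs to be confirmed.
module mux2 #(parameter W = 32) (
    input  [W-1:0] a,
    input  [W-1:0] b,
    input          sel,
    output [W-1:0] y
) /* synthesis syn_hier = "hard" */;
    assign y = sel ? b : a;
endmodule
```

The fallback option from above would be to replace the `assign` body
with a hand-instantiated mux primitive, on the theory that the
synthesizer won't optimize across a black-box boundary.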
- Build the two halves of the bridge logic to carry memory/reg
accesses to the Xilinx
For various reasons, we may have to write a new bridge from scratch.
We'll see. We just need to define a protocol that describes how
requests are encoded to travel across the bridge. They're all memory
accesses. With writes, you get an address and a sequence of data.
With reads, you specify an address and a count, and then data comes
back later. If multiple things can go on at the same time, then you
have to arbitrate them (although I think we'll just prevent that).
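None of this is decided yet, but to make the protocol discussion
concrete, here's one possible command-word encoding for a request
crossing the bridge. All the field widths and the layout are
placeholders:

```verilog
// One possible bridge request encoding (field widths are placeholders):
// a command word, followed by `count` data beats for a write, or by
// nothing for a read (the read data comes back later).
//
//   cmd_word = { rw, count[7:0], addr[31:0] }
//
module bridge_req_pack (
    input         rw,        // 1 = write, 0 = read
    input  [7:0]  count,     // number of data beats
    input  [31:0] addr,
    output [40:0] cmd_word
);
    assign cmd_word = { rw, count, addr };
endmodule
```

If we do end up allowing multiple outstanding requests, this is also
where a tag field would go so the two halves can match replies to
requests.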
- Glue PCI to the bridge and the SPI PROM controller
This is just address decode logic. A certain address relative to a
certain BAR is hit, and we need to decide whom to talk to.
- Design logic for the Xilinx to process "engine" register accesses
(so we can configure the memory controller, video controller, etc.)
This is just another block of address decode.
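Since both of these glue tasks are the same kind of address decode,
here's a minimal sketch. The window sizes and the three targets are
made up for illustration; the real map comes out of the BAR layout we
choose:

```verilog
// Sketch of address decode glue: pick a target from an offset
// relative to a BAR. The map below is hypothetical:
//   0x0000-0x3FFF -> bridge to the Xilinx
//   0x4000-0x7FFF -> SPI PROM controller
//   0x8000-0xFFFF -> local registers
module addr_decode (
    input  [31:0] addr,        // address already relative to the BAR
    output        sel_bridge,
    output        sel_sprom,
    output        sel_regs
);
    assign sel_bridge = (addr[15:14] == 2'b00);
    assign sel_sprom  = (addr[15:14] == 2'b01);
    assign sel_regs   =  addr[15];
endmodule
```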
- Design an arbiter that manages competing memory accesses between PCI
(bridge) and video.
This is a scheduler. The module would have multiple ports, one for
each agent wanting to talk to memory. A reader or a writer is an
agent, so if something wanted to do both, we're best off treating it
as two agents. The last interface on the module connects to the
memory controllers.
Those requests come in according to fifo protocol, and we need to have
some kind of priority selection to choose whom to pay attention to.
The way I've done this before is to have a 1-hot "you are allowed"
encoding. If someone is allowed, their requests are paid attention to
and forwarded to the memory controller(s). If not, they're blocked
(fifo full signal asserted). If a higher priority request comes in,
the scheduler can take a few cycles to make the decision and alter the
allowed registers.
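Here's a sketch of that 1-hot "you are allowed" gating for two agents.
The priority-selection logic is deliberately simplistic (agent 1 wins
whenever it wants the bus); in a real version that decision could take
a few cycles, as described above:

```verilog
// Sketch of the 1-hot grant scheme for two agents. An agent that is
// not allowed sees its fifo-full signal asserted, which blocks it.
// The priority policy here (agent 1 always wins) is a placeholder.
module arb2 (
    input            clk,
    input            rst,
    input      [1:0] req,       // request pending from each agent
    output reg [1:0] allowed,   // 1-hot "you are allowed" register
    output     [1:0] fifo_full  // block the agents that aren't allowed
);
    assign fifo_full = ~allowed;   // not granted => appears full upstream

    always @(posedge clk) begin
        if (rst)
            allowed <= 2'b01;      // default owner
        else if (req[1] && !allowed[1])
            allowed <= 2'b10;      // switch to the higher-priority agent
        else if (!req[1] && req[0])
            allowed <= 2'b01;      // fall back when agent 1 goes idle
    end
endmodule
```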
A writer agent is one fifo, where each word is an address with data.
A reader is a pair of fifos, one that takes addresses for requests
and the other that returns the data. The memory controller allows
tags to be passed through so you can tell whose data is coming out,
and thus whose return fifo to write to.
One challenge with this is making sure that return fifos don't fill.
If you request a read, the memory controller just processes it and
returns the data. If you don't take the data, it's gone. You need to
make sure that the number of requests in the request fifo plus the
number that can be outstanding in the memory controller pipeline does
not exceed the number of free entries in the return fifo.
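That accounting boils down to a credit counter per reader. A sketch,
simplified to one data word per read request (a real one would count
`count` words per request); the counter width and fifo depth are
placeholders:

```verilog
// Credit-counter sketch for one reader agent: a read may only be
// issued while the words it would produce still fit in the return
// fifo, counting requests queued plus those in flight in the memory
// controller pipeline. Assumes one data word per request.
module read_credits #(parameter DEPTH = 16) (
    input  clk,
    input  rst,
    input  req_issued,   // a one-word read accepted toward the controller
    input  data_taken,   // agent popped one word from its return fifo
    output can_issue
);
    // Words committed to the return fifo but not yet taken out.
    reg [4:0] outstanding;
    always @(posedge clk) begin
        if (rst) outstanding <= 0;
        else     outstanding <= outstanding + req_issued - data_taken;
    end
    assign can_issue = (outstanding < DEPTH);
endmodule
```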
Also, a really smart scheduler will take into consideration things
like memory row misses. If you have more than one agent with the same
priority, then as long as the agent you're paying attention to stays
on the same memory row, you want to stick with it. If it's going to
cause a row miss (use some heuristic, like assuming a row hit as long
as the lower 4 bits of the row portion of the address stay the same),
you might want to see if some other agent wants to access that same
row. This way, you magically save 20 cycles of delay. It's debatable,
however, how much that wins you, because access patterns can be
anywhere from well-behaved and linear (where row misses can be ignored
at the high level) to completely random, where the chances of two
agents wanting the same row are so low that you might as well just
assume they don't and skip the extra logic.
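The heuristic itself is tiny. A sketch, with the row field's position
in the address (bits [13:10] here) made up, since that depends on the
memory controller's actual address mapping:

```verilog
// Row-hit heuristic sketch: guess "same row" when the low 4 bits of
// the row field match the last address forwarded to memory. The row
// field position (addr[13:10]) is a placeholder for the real map.
module row_hit_guess (
    input         clk,
    input         rst,
    input         accept,   // a request was forwarded to memory
    input  [31:0] addr,
    output        likely_hit
);
    reg [3:0] last_row_lsb;
    always @(posedge clk) begin
        if (rst)         last_row_lsb <= 4'h0;
        else if (accept) last_row_lsb <= addr[13:10];
    end
    assign likely_hit = (addr[13:10] == last_row_lsb);
endmodule
```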
Finally, consider the rate at which an agent makes requests. The
video controller, for example, should have the highest priority.
However, if its request generator is in the video clock domain,
requests may come in more slowly than the memory controller can
process them. If you switch to video immediately, this could cause
excessive row misses from all the switching. Instead, you might want to
wait until that agent has made some minimum number of requests before
switching.
- Design top-level modules for both FPGAs and wrap them with pad rings.
The pad ring is a module that breaks out individual pins. Some I/O
buffers may be instantiated here (if not done already at an inner
module and they're not inferred correctly). Also, this is where you
put clock generators. Pins that constitute busses are grouped
together and connected to the multi-bit ports on the top-level module.
This one is more tedious than anything else. We can provide at least
a partial pad ring.
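For reference, the shape of a pad-ring wrapper looks roughly like
this. Every name here is a placeholder (including the inner-module
stub); the real pin list comes from the board netlist:

```verilog
// Stub core so the sketch is self-contained; the real inner module
// is the actual chip logic.
module xp10_core (
    input         clk,
    inout  [31:0] pci_ad,
    input         pci_frame_n
);
endmodule

// Pad-ring wrapper sketch: break out individual pins, group bus pins
// into multi-bit ports, and put clock generation here.
module chip_pads (
    input         pci_clk_pad,
    inout  [31:0] pci_ad_pad,     // bus pins grouped into one port
    input         pci_frame_n_pad
    // ...remaining pins would be listed here...
);
    wire core_clk;
    // A DCM/PLL clock-generator primitive would be instantiated here;
    // for the sketch, the pad clock just passes through. I/O buffers
    // also go here if they aren't inferred correctly further in.
    assign core_clk = pci_clk_pad;

    xp10_core core (
        .clk         (core_clk),
        .pci_ad      (pci_ad_pad),
        .pci_frame_n (pci_frame_n_pad)
    );
endmodule
```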
Phase 2: Installing HQ
- Design I/O interfaces for HQ that it would use to get access to the
bridge and intercept PCI transactions.
Some discussions have to happen before this, but basically, some fifos
tie into the MEM stage of HQ and connect to other things in the chip.
I described this at length in some earlier emails which we'll have to
dig up. This would also include other control registers, VGA
I/O-space registers, etc.
- Insert HQ into the XP10
Patch it into the glue logic in some sensible way. We'll know how to
do this when the glue logic sans HQ is working.
- Develop test code for HQ and run it both in simulation and in a real device
Phase 3: BIOS ROM
- Get basic BIOS code together. This mostly involves finding out the
format and putting together a skeleton. Without HQ, we can have it do
something simple like program memory and video controllers.
Research. Find web pages and books that document this and write some
assembly code.
- Get started on VGA BIOS code
Phase 4: VGA
- Complete the nanocode for HQ that emulates at least CGA 80x25 text mode.
Understand what process has to be performed and write the code to do
it. Note that HQ is single-threaded, so in the middle of doing
translations, it needs to have explicit subroutine calls sprinkled
about that will look at request fifos from PCI and process them. I've
also described this in detail before.
Did I miss anything?