On 11/27/07, Farhan Mohamed Ali <[EMAIL PROTECTED]> wrote:
> I won't have much to do during my break either (December 19-jan 13).
> Actually i won't have much to do after december 11 i think.
> Most things verilog are fine with me. Maybe you can give a rough idea
> about the complexity of each task (which are the trivial ones and which
> are more substantial)?
Ok, let's see what I can come up with here...

> > - Modify the PCI controller so that it synthesizes for the Lattice chip

The two main issues are IO drivers/receivers and a meta-coding issue. In order to make the thing meet timing, I have to trick the synthesizer into not optimizing the logic the way it normally wants to. You can tell it all you want that the inputs already have 4ns of delay on them from the bus, and the P&R will try to take it into account, but the synthesizer is stupid and completely ignores all of that. So what I had to do was define a whole bunch of multiplexer modules that I route things through, and then tell the synthesizer not to optimize through those modules. This forces the logic to take the shape I want. The way I did this was to put meta-comments that ISE understands into the Verilog code.

There are two porting options for this. One is to figure out what the meta-comments are for Synplicity (or whatever synthesizer we use for the Lattice part). That would be the easiest. The other would be to manually instantiate multiplexers in the same way that you can manually instantiate a block RAM or a multiplier block. Supposedly the synthesizer will see this as a boundary to optimization and not optimize across it. So this involves some research.

> > - Build the two halves of the bridge logic to carry memory/reg
> > accesses to the Xilinx

For various reasons, we may have to write a new bridge from scratch. We'll see. We just need to define a protocol that describes how requests are encoded to travel across the bridge. They're all memory accesses. With writes, you get an address and a sequence of data. With reads, you specify an address and a count, and then data comes back later. If multiple things can go on at the same time, then you have to arbitrate them (although I think we'll just prevent that).

> > - Glue PCI to the bridge and the SPI PROM controller

This is just address decode logic.
A certain address relative to a certain BAR is hit, and we need to decide whom to talk to.

> > - Design logic for the Xilinx to process "engine" register accesses
> > (so we can configure the memory controller, video controller, etc.)

This is just another block of address decode.

> > - Design an arbiter that manages competing memory accesses between PCI
> > (bridge) and video.

This is a scheduler. The module would have multiple ports, one for each agent wanting to talk to memory. A reader or a writer is an agent, so if something wanted to do both, we're best off treating it as two agents. The last interface on the module connects to the memory controllers. Requests come in according to a fifo protocol, and we need some kind of priority selection to choose whom to pay attention to.

The way I've done this before is to have a 1-hot "you are allowed" encoding. If someone is allowed, their requests are paid attention to and forwarded to the memory controller(s). If not, they're blocked (fifo full signal asserted). If a higher-priority request comes in, the scheduler can take a few cycles to make the decision and alter the allowed registers.

A writer agent is one fifo, where each word is an address with data. A reader agent is a pair of fifos: one that takes addresses for requests, and another that returns the data. The memory controller allows tags to be passed through, so you can tell whose data is coming out and therefore which return fifo to write to.

One challenge with this is making sure that return fifos don't fill. If you request a read, the memory controller just processes it and returns the data. If you don't take the data, it's gone. You need to make sure that the number of requests in the request fifo plus the number that can be outstanding in the memory controller pipeline does not exceed the number of free entries in the return fifo. Also, a really smart scheduler will take into consideration things like memory row misses.
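A rough behavioral model, in Python rather than Verilog, of the one-hot "you are allowed" scheme and the return-fifo rule described above, leaving row misses and writer agents aside. Everything in it is an illustrative assumption: the names `ReadAgent`, `Scheduler`, `update_allowed`, `is_blocked`, `issue` and `complete`, the convention that a larger priority number wins, and the use of the agent's index as its tag. It also checks return-fifo space at the moment a read is forwarded to the memory controller, which is only one place to enforce the rule; a real design might enforce it when the request fifo accepts a word instead.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ReadAgent:
    """Hypothetical reader agent: a request port plus its own return fifo."""
    priority: int          # larger number = more important (assumed convention)
    return_depth: int      # capacity of this agent's return fifo, in words
    pending: deque = field(default_factory=deque)   # read addresses waiting at the port
    returned: deque = field(default_factory=deque)  # data not yet taken by the agent
    in_flight: int = 0     # reads forwarded but still inside the memory pipeline

    def return_space_ok(self) -> bool:
        # Only forward a read if its data is guaranteed a slot: reads in the
        # memory pipeline plus data already buffered must fit in the return fifo.
        return self.in_flight + len(self.returned) < self.return_depth


class Scheduler:
    """One-hot "you are allowed" arbiter over a list of ReadAgents (hypothetical)."""

    def __init__(self, agents):
        self.agents = agents
        self.allowed = 0   # one-hot bit vector: bit i set => agent i is allowed

    def update_allowed(self) -> None:
        # Grant the highest-priority agent that has work (ties go to the lowest
        # index). Hardware could spread this decision over a few cycles.
        waiting = [i for i, a in enumerate(self.agents) if a.pending]
        if waiting:
            best = max(waiting, key=lambda i: (self.agents[i].priority, -i))
            self.allowed = 1 << best

    def is_blocked(self, i: int) -> bool:
        # What agent i would see as its "fifo full" signal.
        granted = bool((self.allowed >> i) & 1)
        return not granted or not self.agents[i].return_space_ok()

    def issue(self):
        """Forward at most one read; returns (tag, address) or None."""
        for i, agent in enumerate(self.agents):
            if agent.pending and not self.is_blocked(i):
                agent.in_flight += 1
                return i, agent.pending.popleft()   # agent index doubles as the tag
        return None

    def complete(self, tag: int, data) -> None:
        """Data returned by the memory controller, routed by its tag."""
        agent = self.agents[tag]
        agent.in_flight -= 1
        agent.returned.append(data)
```

For instance, an agent with a depth-2 return fifo can have at most two reads outstanding; a third stays blocked until the agent takes a word out of its return fifo, since data that has arrived but has not been read still occupies a slot.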
If you have more than one agent with the same priority, then as long as the agent you're paying attention to stays on the same memory row, you want to stick with it. If it's going to cause a row miss (use some heuristic, like assuming a row hit as long as the lower 4 bits of the row portion of the address stay the same), you might want to see if some other agent wants to access that same row. This way, you magically save 20 cycles of delay. It's debatable, however, how much that wins you, because access patterns can be anywhere from well-behaved and linear (where row misses can be ignored at the high level) to completely random, where the chances of two agents wanting the same row are so low that you might as well assume they don't and not bother with the extra logic.

Finally, consider the rate at which an agent makes requests. The video controller, for example, should have the highest priority. However, if its request generator is in the video clock domain, requests may come in slower than the memory controller would process them. If you switch to video immediately, the frequent switching back and forth could cause excessive row misses. Instead, you might want to wait until that agent has made some minimum number of requests before switching.

> > - Design top levels modules for both FPGAs and wrap with pad rings.

The pad ring is a module that breaks out individual pins. Some I/O buffers may be instantiated here (if not done already at an inner module and they're not inferred correctly). Also, this is where you put clock generators. Pins that constitute buses are grouped together and connected to the multi-bit ports on the top-level module. This one is more tedious than anything else. We can provide at least a partial pad ring.

> > Phase 2: Installing HQ
> >
> > - Design I/O interfaces for HQ that it would use to get access to the
> > bridge and intercept PCI transactions.
Some discussions have to happen before this, but basically, some fifos tie into the MEM stage of HQ and connect to other things in the chip. I described this at length in some earlier emails which we'll have to dig up. This would also include other control registers, VGA I/O-space registers, etc.

> > - Insert HQ into the XP10
> > - Develop test code for HQ and run it both in simulation and in a real
> > device

Patch it into the glue logic in some sensible way. We'll know how to do this when the glue logic sans HQ is working.

> > Phase 3: BIOS ROM
> >
> > - Get basic BIOS code together. This mostly involves finding out the
> > format and putting together a skeleton. Without HQ, we can have it do
> > something simple like program memory and video controllers.

Research. Find web pages and books that document this and write some assembly code.

> > - Get started on VGA BIOS code
> >
> > Phase 4: VGA
> >
> > - Complete the nanocode for HQ that emulates at least CGA 80x25 text mode.

Understand what process has to be performed and write the code to do it. Note that HQ is single-threaded, so in the middle of doing translations, it needs to have explicit subroutine calls sprinkled about that will look at request fifos from PCI and process them. I've also described this in detail before.

Did I miss anything?

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
