This is some of the email I had mentioned earlier wrt HQ. I had intended to reply to Petter a while back, so I'm doing it now. I've included a lot of material relevant to the structure of the VGA project, so if some people can help me keep track of it, extract it, and paste it into the wiki, that would be great!
Note that all of this is very important. We need to keep track of both the microcode's view of this and the module interface. They're _slightly_ different, because they come from different perspectives about what's going on, even though they resolve to the same functions.

On 9/9/07, Petter Urkedal <[EMAIL PROTECTED]> wrote:
> On 2007-09-08, Timothy Normand Miller wrote:
> > On 9/8/07, Petter Urkedal <[EMAIL PROTECTED]> wrote:
> > > Here is my attempt to refine the I/O ports. For PCI I'm making wild
> > > guesses, so nailing down the ports is just an easy way to expose my
> > > misunderstandings. The point is to gain some understanding of how the
> > > nanocontroller will interact with the PCI controller (and memory).
> > >
> > > Memory read
> > >     in  MEM_READREQ_FREE      Free slots in command pipe.
> > >     out MEM_READREQ_ADDR      First address to read.
> > >     out MEM_READREQ_COUNT     Number of words to read.
> > >     in  MEM_READREPLY_DATA    Data stream from memory.
> > >     in  MEM_READREPLY_AVAIL   Number of words in FIFO.
> >
> > Perfect.
> >
> > BTW, in the symbol name, we may want to add a reminder to the human
> > programmer as to which ones are "trigger" writes. For instance,
> > MEM_READREQ_COUNT triggers the read at the address that was programmed
> > in.

Annotation: This is about the set of I/O ports connected to HQ that are involved in making a read request from main memory, via the bridge. We need to define the bridge interface a bit first. I'm not going to get this exactly right, but I'm doing interfaces today, and this is where we need to get started.

// This is the part of the bridge in the XP10 that is inward-facing to PCI and HQ
// We'll include the out-facing pins soon enough.
module bridge_xp10(
    ...
    // Common for requests
    input  [24:0] req_addr;        // might need separate read and write addresses
    output        busy;            // cannot accept a request

    // Read requests
    input  [6:0]  read_req_count;  // not sure on the max count; 64 for now
    input         do_read;

    // Write requests
    input  [31:0] write_data;
    input  [3:0]  write_bytes;     // byte enable flags
    input         do_write;

    // Read return path
    output [31:0] read_data;       // return of read data
    output        read_valid;
    ...
endmodule

I think the bridge will not have queues. This is one-at-a-time. Other logic will have queues.

> We could use MEM_READREQ_ADDR so that we can allow skipping the count if
> it's the same as the last one, then
>
> Memory read
>     in  MEM_READREQ_FREE       Free slots in command pipe.
>     out MEM_READREQ_COUNT      Number of words to read.
>     out MEM_READREQ_ADDR_TRIG  First address to read.
>     in  MEM_READREPLY_DATA     Data stream from memory.
>     in  MEM_READREPLY_AVAIL    Number of words in FIFO.

So, on the top-level wrapper for HQ, these will look something like:

// Memory read request queue
input  [3:0]  mem_readreq_free;    // Available req entries, used by microcode
output [6:0]  mem_readreq_count;   // Part of the request
output [24:0] mem_readreq_addr;    // Address of first req
output        mem_readreq_enq;     // Enqueue into request queue
input         mem_readreq_full;    // read the fifo interface doc!

// Memory read return queue
input  [31:0] mem_readreply_data;
input         mem_readreply_valid;
input  [6:0]  mem_readreply_count; // How many read words in queue
output        mem_read_deq;

> > I'm not sure that the count has much meaning. Writes are nice in that
> > in some cases like this one, we can just "fire and forget." :)
>
> All this great new terminology ;-)
>
> So now we have,
>
> Memory write
>     out MEM_WRITE_ADDR  Start address.
>     in  MEM_WRITE_FREE  Free slots in output FIFO.
>     out MEM_WRITE_DATA  Data stream to memory.
Annotation: Write requests to main memory via bridge

// Memory write requests
input         mem_write_full;
output        mem_write_enq;
output [24:0] mem_write_addr;
output [31:0] mem_write_data;
input  [6:0]  mem_write_free;  // number of requests that can be queued

Note that since HQ's pipeline cannot be held up, the busy signal doesn't really do anything for us. It's part of a normal fifo interface, but what we'll really be doing is finding out how many free req slots there are and then filling them. If we overfill, we'll lose a request, but that's a coding error.

> And, this is unchanged:
>
> Master read
>     in  PCI_MASTER_READREQ_FREE     Free slots in command pipe.
>     out PCI_MASTER_READREQ_ADDR     Host-mapped address to read.
>     out PCI_MASTER_READREQ_COUNT    Number of words to receive.
>     in  PCI_MASTER_READREPLY_DATA   Data stream from host.
>     in  PCI_MASTER_READREPLY_AVAIL  Number of words in FIFO.

This should be put into the wiki, but we're not going to implement it yet. In fact, that's the case for all of the stuff we've defined in this post.

> > Now that I think about it, I rather prefer the idea of indicating
> > count up front, rather than having to tag the last word. This way,
> > nothing (besides the master, which really needs to know the most and
> > soonest) has to keep track of when the last word is going to come
> > through.
>
> I thought the master needed to know in advance, but now that you mention
> it I remember that PCI allows us to just terminate at will. Is there an
> advantage with not using the count, that we can continue streaming past
> the 2^6 or so limit if we wish?

We don't want to do this, in case the target decides to terminate with retry. If it does, we can't get the address right.

> And for the number of instructions, it
> does not matter. We write to a count before, or we write to another
> register after. If we want to optimise, we replace the count port with
> PCI_MASTER_WRITE_DATA_FINAL.

I think we should go with the count.
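To make the "find out how many free req slots there are and then fill them" discipline concrete, here's a behavioral sketch in Python. This is a model, not microcode or RTL; the queue depth of 64 and the helper names (`WriteReqQueue`, `microcode_flush`) are assumptions for illustration, with only the 7-bit width of mem_write_free taken from the port list above.

```python
# Behavioral model of the memory write request queue and the microcode's
# "sample the free count, then enqueue up to that many" discipline.
from collections import deque

QUEUE_DEPTH = 64  # assumed depth; fits the 7-bit mem_write_free count


class WriteReqQueue:
    def __init__(self, depth=QUEUE_DEPTH):
        self.depth = depth
        self.fifo = deque()

    @property
    def free(self):
        """Models reading the mem_write_free port."""
        return self.depth - len(self.fifo)

    def enq(self, addr, data):
        """Models pulsing mem_write_enq with addr/data driven.
        HQ's pipeline cannot stall, so overfilling would silently lose
        a request; the model treats that as a coding error."""
        assert self.free > 0, "coding error: overfilled write queue"
        self.fifo.append((addr, data))


def microcode_flush(queue, pending):
    """Issue as many pending writes as the queue advertises room for."""
    budget = queue.free  # sample the free count once
    issued = 0
    while pending and issued < budget:
        addr, data = pending.pop(0)
        queue.enq(addr, data)
        issued += 1
    return issued
```

With a 64-entry queue and 70 pending writes, one flush issues exactly 64 requests and leaves 6 pending; the microcode would re-poll the free count later to issue the rest.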
This gives us more freedom in the master to predict when a last entry is going to appear.

> > And moreover, I'm also thinking that perhaps the DMA master should
> > have separate command and write data fifos. This way, some other
> > agent can be filling the data fifo asynchronously. For instance, some
> > data words come in from the memory system, but the master doesn't know
> > what to do with them, so it doesn't do anything, and then the
> > nanocontroller gets around to sending a command to the master, and
> > then it can do something with the data. More opportunity to make
> > things asynchronous.
>
> Great idea! Let's put in a reminder (PCI_MASTER_WRITE_ROUTE). Now, if
> we also replace the count, we have
>
> Master write
>     out PCI_MASTER_WRITE_ADDR        Host-mapped address to write.
>     out PCI_MASTER_WRITE_ROUTE       FIFO routing commands.
>     in  PCI_MASTER_WRITE_FREE        Free words in output FIFO.
>     out PCI_MASTER_WRITE_DATA        Data stream to host.
>     out PCI_MASTER_WRITE_DATA_FINAL  Final data word to host.
>
> Maybe using PCI_MASTER_WRITE_ROUTE should be mandatory, also when
> filling the pipeline with PCI_MASTER_WRITE_DATA. That way, we can make
> sure there's enough data to avoid PCI timeout before triggering the DMA.

I would just call it PCI_MASTER_CMD, because all it does is tell the master to do something. These are the ports I would use:

Master write data:
    out PCI_MASTER_WRITE_DATA       Data word
    out PCI_MASTER_WRITE_BYTES      Byte enables (optional, default to all)
    in  PCI_MASTER_WRITE_DATA_FREE  How many data words we can write out

Master command:
    out PCI_MASTER_CMD       Includes direction (read/write) and count
    in  PCI_MASTER_CMD_FREE  Number of commands we can write

Master read data:
    in  PCI_MASTER_READ_DATA   Data word
    in  PCI_MASTER_READ_COUNT  Number of words available to read

So, this is three queues: two for data, which can be filled as necessary, and one for commands, which control the state machine.
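Here's a behavioral sketch of that three-queue master, again in Python rather than RTL. The class and method names are made up for the model, and the command encoding (a direction/address/count tuple) is an assumption; the three queues correspond to the PCI_MASTER_CMD, PCI_MASTER_WRITE_DATA, and PCI_MASTER_READ_DATA port groups above.

```python
# Behavioral model of the three-queue DMA master: a command queue and
# two data queues that other agents can fill asynchronously. Only the
# word count named in a command is consumed from the write queue;
# anything extra stays queued for a later command.
from collections import deque


class DmaMaster:
    def __init__(self):
        self.cmd_q = deque()    # PCI_MASTER_CMD entries
        self.write_q = deque()  # PCI_MASTER_WRITE_DATA words
        self.read_q = deque()   # PCI_MASTER_READ_DATA words

    def post_cmd(self, direction, addr, count):
        """Models writing PCI_MASTER_CMD: direction plus count."""
        self.cmd_q.append((direction, addr, count))

    def step(self, memory):
        """Run one command to completion against a dict-backed 'bus'.
        In hardware, an underfilled write queue or a full read queue
        would make the state machine insert wait states instead."""
        if not self.cmd_q:
            return
        direction, addr, count = self.cmd_q.popleft()
        if direction == "write":
            for i in range(count):
                memory[addr + i] = self.write_q.popleft()
        else:  # "read"
            for i in range(count):
                self.read_q.append(memory.get(addr + i, 0))
```

Note how data can sit in write_q before any command exists: the memory system can deposit words it has fetched, and the nanocontroller later posts a command telling the master what to do with them.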
If you put extra write data in the write queue, for instance, only the amount you request in the command will get written. This means you can do some things more asynchronously. (If you underfill the queue, or you don't empty the read queue, the state machine will insert wait states on the bus. That's good, but avoid it.)

> > > Target of write (we're reading)
> > >     in  PCI_TARGET_WRITEREQ_ADDR     Target address of write.
> > >     in  PCI_TARGET_WRITEREQ_COUNT    Number of words to receive.
> > >     in  PCI_TARGET_WRITEREPLY_AVAIL  Number of words in FIFO.
> > >     out PCI_TARGET_WRITEREPLY_DATA   Data stream from host.
> >
> > This one's tricky. With the target, we have absolutely no control
> > here. For one thing, we have "config" ports that set whether or not we
> > take PIO accesses. Either they go directly over to the Spartan, or
> > they all come to us. Or perhaps we want to select by BAR. Not sure
> > exactly yet.
> >
> > Now, the only time we do intercept PIO transactions is when we're
> > really going to process them somehow, so the flow control can be as
> > complex as we can afford in the time available.
> >
> > So, basically, I think we could make do with one physical fifo. The
> > target keeps track of addresses for PIO bursts, so we could just push
> > 64-bit entries into a fifo. 4 bits are byte flags. 28 bits are a

(Maybe the address words above should be [27:0] instead of [24:0].)

> > word address (1 GiB max space). 32 bits are data. One way to handle
> > this is to have one I/O port sample (but not dequeue) the
> > flags/address word. The other I/O port grabs the data and dequeues.
> > This way, in the unlikely event that you KNEW what the next address
> > would be, you could just ignore it and grab the data in one cycle.
> >
> > These would be the I/O ports:
> >
> >     in PCI_TARGET_WRITE_COUNT      The number of words in the write queue
> >     in PCI_TARGET_WRITE_ADDRFLAGS  Address and flags for a write data word
> >     in PCI_TARGET_WRITE_DATA       Data of write word

Note: We need to discuss fifo counts. I want PCI's address granularity to be 64 words, but queues are best kept at 16 entries. In some cases, it doesn't matter. If a queue fills, wait states get inserted while we catch up.

The ports in HQ for target write:

input  [6:0]  pci_target_write_count;
input  [31:0] pci_target_write_data;
input  [3:0]  pci_target_write_bytes;
input         pci_target_write_valid;  // queue contains >0 entries
output        pci_target_write_deq;

> Looks good to me.
>
> But this raises a question. I assume we can ignore byte-enables for
> master mode, since we're deciding. For target read it also does not
> matter for non-config space, since we would just be filling in redundant
> information. So is target write the only non-config mode where we care
> about byte-enables?

For master writes, we CAN care about byte enables. For target writes, we MUST care. For reads, we never need to care.

> That also brings up a general question about all addresses. In the
> nanocontroller we have a 32 bit granularity on scratch space. Would it
> not save us code if address ports are all right-shifted 2 bits, and in
> the case we need byte-enable, they are on a separate port?

Doesn't matter too much. If we right-shift the address, that frees up two bits, but what do we do with them? It's not enough space for byte enables. Maybe we should keep everything byte-oriented (even though we ignore the lower two bits), unless we discover that there's some major performance advantage to word addresses, like not having to left-shift counts to add them to addresses. There's a tradeoff between confusing the programmer and other kinds of efficiency. I can buy both approaches, so let's discuss it.

> > You know what? We'll never be able to keep up with that. It's too
> > complicated.
> > There's absolutely no reason why PIO reads have to be
> > fast, ESPECIALLY in the cases where we would actually intercept
> > requests. PIO reads suck, and we should not put gobs of logic into
> > trying to make them not suck. So, no, I think we should handle one at a
> > time, and each individual transaction should be only one word. In
> > this state, the target would be in a mode where it always asserts STOP
> > at the same time as TRDY, on top of the usual timeout mechanism.
> >
> > So here are the ports:
> >
> >     in  PCI_TARGET_READ_PENDING  Is a read pending?
> >     in  PCI_TARGET_READ_ADDR     The one address that is pending
> >     out PCI_TARGET_READ_DATA     Where we write the one data word
>
> I like this simple solution if PIO efficiency is secondary. We would
> probably need interrupts to handle them without timing out.

Alas, interrupts too would be a problem. Controlling the bridge also involves sequences of things, and interrupting those would be a race condition. So we're stuck with having to poll PCI at regular intervals. Here's the interface section on HQ:

input         pci_target_read_pending;
input  [27:0] pci_target_read_addr;
output        pci_target_read_deq;
output [31:0] pci_target_read_data;
output        pci_target_read_valid;  // pulse this with valid data

> > So, if the read times out, the controller is smart enough to recognize
> > that the address of a later retry is the same as the posted one and
> > automatically returns the data (if we've supplied any) or times out
> > (if we haven't supplied any).
>
> That sounded easy at first, but isn't there a race condition here? The
> PCI target sets the address. The nanocode sees the address and starts
> computing the reply. The PCI target has timed out and gotten a new
> request, then updates the address. The nanocode writes the reply to the
> previous address. Hmm, we'll need to let the PCI target also detect
> when the nanocontroller reads the address, and discard any data which
> the nanocode writes prior to reading the last address.
It will always be the same address. This is a requirement of the PCI protocol: I don't think another agent is allowed to come in and make a request while this posted transaction is going on.

> > Note that we could probably combine PENDING and ADDR. It's not a
> > queue. Most of the bits will be the address, and one is a flag
> > indicating if it's valid. The PENDING flag will be cleared whenever
> > we write to the DATA port (which is also not a queue).
>
> I suggest we let ADDR be negative if there is nothing pending, since we
> don't need addresses above 2 GiB. That way, we can use the bneg
> instruction.
>
>     in  PCI_TARGET_READ_ADDR  The one address that is pending, or -1.
>     out PCI_TARGET_READ_DATA  Where we write the one data word

Do we have a bpos instruction? I'd rather keep the flag high-asserted for valid, but putting the flag in the high bit is a good idea. The interface I provided doesn't change for this, although deq and valid are probably the same signal, so we should combine them:

input         pci_target_read_pending;
input  [27:0] pci_target_read_addr;
output        pci_target_read_deq;  // Removes request, asserts response; pulse with valid data
output [31:0] pci_target_read_data;

> > In the microcode we would do well to do some sort of caching, so that
> > we can return data before timeout (by my design, we have only 8 PCI
> > cycles, though). But that's all an implementation detail.
>
> Okay, let's postpone that, since it doesn't affect the ports. It'll
> probably become clearer as we start to write the code.
>
> > Oh, and don't forget PCI_TARGET_INTERCEPT_CONFIG or whatever.
>
> Will this be unified with the target-write ports? Can you type it out?

This is just a configuration register that indicates routing for glue logic to HQ. For instance, a bit will indicate whether some/all PCI transactions go directly to the bridge or route through HQ. This register should be accessible by HQ and also via a PCI config space register (extended config space).
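For the combined PENDING/ADDR register discussed above, here's a small Python sketch of the high-bit encoding on a 32-bit port. The constant and function names are made up for illustration; the 28-bit address width follows the pci_target_read_addr port, and the flag-in-bit-31 test is the same check a bneg/bpos-style instruction would make on the sign bit.

```python
# Sketch of packing a valid flag and a 28-bit word address into one
# 32-bit register, with the flag high-asserted in bit 31.
ADDR_MASK = (1 << 28) - 1  # 28-bit word address
VALID_BIT = 1 << 31        # pending flag in the sign bit

def encode_pending(addr):
    """A read is pending at addr: set the flag, keep the address below it."""
    return VALID_BIT | (addr & ADDR_MASK)

IDLE = 0  # nothing pending (Petter's variant would use -1 here instead)

def is_pending(reg):
    """Test the sign bit, as bneg/bpos would."""
    return bool(reg & VALID_BIT)
```

Either convention gives a one-instruction pending test; the high-asserted flag just keeps "valid" reading as 1 rather than as a negative address.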
--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
