On 9/6/07, Mark <[EMAIL PROTECTED]> wrote:
> Is there any chance you could pound out a quick code sample for the DMA
> case? Since the nanocontroller has to be able to support both this and
> the VGA translation, I'd find it helpful to have an example for both.
> If you have time, etc., of course.

Well, before I do that, let's start with some background. This way, others can participate in the process.

One thing that we would use DMA for is simple memory moves. This doesn't require much smarts.

To begin with, both the target and master can perform only limited-length bursts, because they intentionally have limited-size address counters. For instance, the target will get a 32-bit address in, say 0x00012300, and process a burst, where it internally advances the address by 4 for every word. However, the counter only has 6 bits (64 words, or 256 bytes), so what it does is detect when it would roll over from 0x000123FC to 0x00012400 and terminate the burst. This forces the host controller to restart the burst with the next address. In other words, our PCI target never lets a burst cross a 256-byte boundary.

The counters are limited in this way for the sake of keeping counters small and clock rate up. When we have combined the PCI controller with other logic and see how it performs, we may consider increasing the counter size.

Although the master design isn't quite finished, I intend to impose the same sort of limit. Now, the master really only cares about starting addresses, so in theory, it could handle bursts of any length. You only need to know an address when you initiate a transaction, after which you can burst streams of data all you like. Problems arise, however, when something prematurely terminates the transaction. When that happens, we have two options. One is to have the nanocontroller baby-sit the DMA master, detect when a transaction fails, figure out where to restart, and start it over again. Unfortunately, this means that we can't queue up multiple requests and then go do something else! The alternative, which I like better, is to give the master enough smarts to restart, on its own, transactions that do not cross a 256-byte boundary. It has a counter, so it knows where to resume. This way, we can queue up arbitrary numbers of requests.

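As a rough illustration of the boundary rule above, here is a minimal Python sketch. It is a software model only, not the hardware design. The names `split_bursts` and `resume_point` are made up for this example, and the sketch assumes word-aligned addresses and 4-byte words as in the 0x00012300 example.

```python
BOUNDARY = 256  # bytes per block: 64 words, matching the 6-bit word counter
WORD = 4        # bytes per 32-bit word


def split_bursts(start_addr, word_count):
    """Yield (address, words) bursts, none of which crosses a 256-byte boundary."""
    if start_addr % WORD:
        raise ValueError("start address must be word-aligned")
    addr = start_addr
    remaining = word_count
    while remaining > 0:
        # Words left before the address would roll over into the next block.
        room = (BOUNDARY - addr % BOUNDARY) // WORD
        n = min(room, remaining)
        yield addr, n
        addr += n * WORD
        remaining -= n


def resume_point(start_addr, word_count, words_done):
    """Return (address, words) for restarting a burst cut short after words_done words."""
    if not 0 <= words_done <= word_count:
        raise ValueError("words_done out of range")
    return start_addr + words_done * WORD, word_count - words_done
```

Under these assumptions, a 100-word move starting at 0x00012300 splits into a 64-word burst at 0x00012300 and a 36-word burst at 0x00012400. `resume_point` models the bookkeeping a self-restarting master would need after a premature termination.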
For a download (move from graphics memory to main memory), the nanocontroller would begin by making requests from the graphics memory system, in whatever size is appropriate for the queues and such. It would probably wait until the first word arrived from the memory controller and then request a write transaction from the DMA master. The write transactions would respect the 64-word boundaries. This would be followed by an interleaved succession of read requests to gfx memory and write requests to the master. Following the last write request would be a request to end the transaction. The actual data movement could be baby-sat by the nanocontroller, or we could implement a crossbar that it sets up to connect the fifos automatically.

For an upload, an analogous approach would be taken.

Note that the memory system too would involve short counters, so we can't make individual requests that cross those boundaries. Writes would typically be accompanied by an explicit address, but when you're streaming writes, all you have are data words, and there needs to be a counter that runs on its own for this. The counter would be set anew at each boundary crossing. Read requests would typically be made as an address with a word count. The address counter would be short, and the word count would be small.

In order to manage multiple requests, we want to be able to queue them. So for a streaming write, we would queue up an address with a count. When that count has expired, another request would be dequeued from the write command queue, with its starting address and count. Read requests would be queued up the same way.

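The streaming-write queueing described above can be modeled in a few lines of Python. This is only a sketch under assumptions: the class name `StreamWriter`, its methods, and the dictionary standing in for graphics memory are all hypothetical, and it does not model queue depths or the short-counter boundary limits.

```python
from collections import deque


class StreamWriter:
    """Toy model of a streaming write port fed by a command queue."""

    def __init__(self):
        self.commands = deque()  # pending (address, word_count) commands
        self.addr = 0            # address of the next word to write
        self.left = 0            # words remaining in the current command
        self.memory = {}         # address -> word; stands in for graphics memory

    def enqueue(self, addr, count):
        """Queue a streaming write of `count` words starting at `addr`."""
        if count <= 0:
            raise ValueError("count must be positive")
        self.commands.append((addr, count))

    def write_data(self, word):
        """Accept one data word; the address advances on its own."""
        if self.left == 0:
            # Current count has expired: dequeue the next address and count.
            if not self.commands:
                raise RuntimeError("no streaming write command queued")
            self.addr, self.left = self.commands.popleft()
        self.memory[self.addr] = word
        self.addr += 4
        self.left -= 1
```

For example, after `enqueue(0x100, 2)` and `enqueue(0x200, 1)`, three successive `write_data` calls land at 0x100, 0x104, and 0x200. The data stream carries no addresses at all, which is the point of the self-running counter.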
So, off the top of my head, here are some of the I/O ports we would employ:

READ_REQUEST_ADDR    (write these two to enqueue a read command)
READ_REQUEST_COUNT
READ_DATA            (read this to dequeue)
READ_DATA_COUNT      (how many are in the return queue that can be pulled)
WRITE_DATA           (write these two to enqueue a single write)
WRITE_ADDR
WRITE_STREAM_ADDR    (write these two to set up a streaming write)
WRITE_STREAM_COUNT
WRITE_STREAM_DATA    (a write here enqueues data and advances addr)
WRITE_QUEUE_FREE     (how many free words in the write data queue)
READ_COMMAND_FREE    (how many free read command entries)
WRITE_COMMAND_FREE   (same for writes)

Then there'd be other ports for things like setting up the crossbar and whatnot.

Now, while we may use a crossbar for data moves, the nanocontroller would be directly involved in processing rendering commands. Rendering commands would start with a 32-bit word that describes the rest of the packet. For instance, there would be a packet type, a set of flags to indicate what attribute values are included, and maybe a number indicating how many of the particular rendering operation to do. For instance:

* Packet header that indicates we're going to draw a triangle. It indicates that the triangle is solid and that the vertex color is included. It also indicates that there are two triangles to be drawn.
* Since the vertex color is flagged, that's inserted here.
* Three vertexes
* Three more vertexes
* Next packet....

DMA traditionally involves both ring buffers and linear ones. For simplicity, let's consider only dealing with a linear buffer. The controller gets programmed (by PIO or via a command in the ring buffer) to know whence to fetch and how many words to fetch. It would begin by requesting a few blocks of reads from the DMA master and then set about processing the data. It would interpret the commands and turn those into what the 3D engine would see as ordinary PIO register writes.

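To make the packet walk concrete, here is a small Python sketch of decoding the triangle packet from the list above out of a linear buffer of 32-bit words. Everything about the encoding is an assumption made for illustration: the bit positions (type in bits 31-24, flags in bits 23-8, count in bits 7-0), the type and flag codes, and the two-words-per-vertex size are not defined anywhere in this thread.

```python
PKT_TRIANGLE = 0x01         # hypothetical packet type code
FLAG_SOLID = 0x0001         # hypothetical flag bits
FLAG_VERTEX_COLOR = 0x0002
VERTEX_WORDS = 2            # assumed: one word each for x and y


def make_header(ptype, flags, count):
    """Pack a header word using the assumed layout: type | flags | count."""
    return ((ptype & 0xFF) << 24) | ((flags & 0xFFFF) << 8) | (count & 0xFF)


def packet_words(header):
    """Total words in a triangle packet, header included."""
    flags = (header >> 8) & 0xFFFF
    count = header & 0xFF
    color = 1 if flags & FLAG_VERTEX_COLOR else 0
    return 1 + color + count * 3 * VERTEX_WORDS


def parse_packet(words, pos):
    """Decode one packet at words[pos]; return (packet, position of next packet)."""
    header = words[pos]
    if (header >> 24) & 0xFF != PKT_TRIANGLE:
        raise ValueError("only triangle packets are modeled here")
    end = pos + packet_words(header)
    if end > len(words):
        raise ValueError("packet runs past the fetched data")
    flags = (header >> 8) & 0xFFFF
    pos += 1
    color = None
    if flags & FLAG_VERTEX_COLOR:
        color = words[pos]  # color word follows the header when flagged
        pos += 1
    triangles = []
    for _ in range(header & 0xFF):
        tri = []
        for _ in range(3):
            tri.append(tuple(words[pos:pos + VERTEX_WORDS]))
            pos += VERTEX_WORDS
        triangles.append(tri)
    return {"solid": bool(flags & FLAG_SOLID), "color": color,
            "triangles": triangles}, end
```

`packet_words` is the piece a controller would need in order to know how much data must be on hand before it starts on a packet. A real encoding would of course be fixed by the hardware design rather than by this sketch.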
Between packets, it would request more blocks, at least sufficient for the next command.

-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
