I'm still working on simulating the PCI target, and as part of that test I've written a master. That master will later be rewritten into the one we put into OGD1, so I'm making sure it works properly as well.
I encountered a situation where the master violated bus protocol. One of the things I'm testing is situations where either the master or the target inserts wait states. For those of you interested in the protocol, I can go into detail, but in a nutshell, I was simulating a "slow" bus master, that is, one that inserts wait-state cycles. This corner case is a little hard to fully test, but I think I've come up with an appropriate solution, and it appears to work for reads. Contriving a case for writes may be difficult.

There are potential problems with this, and what I'd like to do is make sure that the situation simply never happens. That means making the microcontroller able to keep up. With 33MHz PCI, that shouldn't be a problem, but with 66MHz, I think it will be. I don't expect the microcontroller to run at more than 100MHz (in the FPGA). That also poses a problem, because there are numerous situations where it literally needs to be doing two things at once, and the branch instruction overhead for a single-threaded processor would be far too much. Instead, I think it would be better to make it interleave two threads. This gives us, effectively, two tightly-coupled 50MHz processors. (Everything but the program counter would be shared.)

So now we have a situation where we cannot supply or consume data as fast as the PCI controller can move it. For reads, the simplest solution is a big (maybe 64-entry) return-data FIFO. For other reasons, we don't want to request overly long bursts anyhow, so that's settled. To keep things moving reasonably well, we'd request bursts of 32 reads: request two bursts, then grab the data for the first one, then request another burst, then grab the data from the second burst, and so on.

Writes are a little harder. Certainly, one solution would be to fill an outgoing data FIFO with all of the writes for the request before allowing the controller to start.
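To make the read-side pattern above concrete, here's a minimal sketch of the double-buffering: keep two 32-word bursts outstanding against the 64-entry return-data FIFO, so a request is always queued while we drain the previous burst. All of the names and the counting model are illustrative, not from the real OGD1 design.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of double-buffered read bursts against a 64-entry FIFO.
 * Two 32-word bursts fit exactly, so we can always have one burst
 * in flight while we consume the other. */
#define BURST_WORDS 32
#define FIFO_DEPTH  64

typedef struct {
    uint32_t addr;        /* next address to request */
    unsigned outstanding; /* words requested but not yet consumed */
    unsigned consumed;    /* total words drained from the FIFO */
} read_stream;

/* Issue a burst only if the FIFO has room for all of it. */
static int request_burst(read_stream *s)
{
    if (s->outstanding + BURST_WORDS > FIFO_DEPTH)
        return 0;                 /* would overflow the FIFO */
    s->addr += BURST_WORDS * 4;   /* 32-bit words */
    s->outstanding += BURST_WORDS;
    return 1;
}

/* Drain one burst's worth of data, freeing FIFO space. */
static void consume_burst(read_stream *s)
{
    s->outstanding -= BURST_WORDS;
    s->consumed += BURST_WORDS;
}

/* Stream total_words (assumed a multiple of 32, at least 64):
 * prime the pipe with two bursts, then alternate consume/request. */
unsigned stream_words(uint32_t base, unsigned total_words)
{
    read_stream s = { base, 0, 0 };
    request_burst(&s);
    request_burst(&s);
    while (s.consumed < total_words) {
        consume_burst(&s);
        if (s.consumed + s.outstanding < total_words)
            request_burst(&s);
    }
    return s.consumed;
}
```

The point of the interleave is that `request_burst` never blocks: by the time we finish draining one burst, the next one has already been filling the FIFO behind it.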
But that means we can't pipeline the writes: generate the write data, fill the FIFO, tell it to go, and then wait around for it to finish. Or, rather than waiting, we could throw in another burst. But it would be much more efficient to be able to push data into the FIFO as we're getting it: when the first word is available, start the transaction, and then just dump words in as they arrive. I'm not entirely sure what a good way to solve this problem is.

But it gets even worse. There's something I haven't bothered to tell you: the master state machine isn't smart enough to restart a transaction after whoever we're writing to has decided to terminate the transaction early. For speed reasons, our own target breaks transactions at 256-byte boundaries. This is primarily so that we don't have to have a counter of more than 6 bits (we may enlarge that later). It is convenient to end the transaction (and 64-word bursts are quite long anyhow) and let the other master begin again. Well, we have to do the same. The basic pattern our "software" code in the microcontroller has to employ goes something like this:

- request a transaction at a certain address (read or write)
- request a certain number of words to be transferred
- check back later to find out if the right number of words got moved
- if so, we're done and can do something else
- otherwise, wipe out everything in the master request FIFO
- recompute a new starting address, at the location where we have to begin again
- request the remaining number of words
- continue the pattern until all of the data is transferred

We can run into a pathological situation where the target of our accesses terminates after every word. This is legal, but it sucks, because we spend all of our time babysitting the master, AND we cannot do any kind of streamlining. The reason to do it this way is that we consider this to be a corner case.
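The restart pattern above boils down to a simple retry loop. Here's a sketch; `pci_transfer()` is a hypothetical stand-in for the real master-request machinery (request FIFO, completion check, and all), modeled as a target that disconnects at 256-byte (64-word) boundaries and reports how many words actually moved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the master: the target disconnects at
 * 256-byte boundaries, so at most the words remaining before the
 * next boundary get moved per attempt. */
static unsigned pci_transfer(uint32_t addr, unsigned words)
{
    unsigned to_boundary = 64 - ((addr / 4) % 64); /* words left in this 256-byte line */
    return words < to_boundary ? words : to_boundary;
}

/* The "software" pattern: request, check how much moved, recompute
 * the restart address, and reissue until everything is transferred.
 * Returns the number of attempts it took. */
unsigned transfer_all(uint32_t addr, unsigned words)
{
    unsigned attempts = 0;
    while (words > 0) {
        unsigned moved = pci_transfer(addr, words);
        /* In the real flow: check back later; if short, wipe the
         * master request FIFO before reissuing. */
        addr  += moved * 4;   /* restart where we left off */
        words -= moved;
        attempts++;
    }
    return attempts;
}
```

In the pathological case where the target terminates after every word, this loop runs once per word, which is exactly the babysitting overhead described above.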
It shouldn't happen very often, so it's better to dedicate some ROM space to some code than to dedicate gates in the chip. We especially want to avoid computing addresses anywhere but in the microcontroller.

There needs to be some discussion about this. There may be good reason to move the restart logic into hardware, for instance; not because it's time-critical, but because a restart is so disruptive to efficient dataflow, and the real cost there is all of the code we have to write to make it work smoothly.

Consider the situation where we're fetching GPU commands from a buffer in the host. One thread would spend all its time babysitting the PCI master. The other thread would be busy processing the read data coming in. This should work out okay, but we need to discuss some fine details to make sure that everything keeps up.

One thing I haven't solved is how to deal with an empty data FIFO. I absolutely don't want to design into the microcontroller anything that allows the pipeline to stall. Somehow, that has to affect flow control with low overhead. But we don't necessarily know about it until several stages down in the pipeline, making it a massive flow-control hazard.

--
Timothy Miller
http://www.cse.ohio-state.edu/~millerti

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
