Since we're dumb and don't have checkpoints or releases (yet), the repository goes through phases during which you can't just download it and simulate it. Right now, aside from the fact that you have to get your own DDR memory model, the RTL is in a state where you can simulate it. We've also fixed a number of PCI bugs.
There's something weird we encountered: memcpy seems to be really evil. We expected PIO read performance to be bad, but it turned out to be far worse than we expected. Analysis showed that within 512-byte regions, the order in which words are fetched is RANDOM, which completely defeats our caching scheme. We made the cache four times larger, which dealt with that problem, but now we find a surprising amount of idle time on the bus: lots of dead cycles between transactions. Interestingly, if we write our own loop that just reads 32-bit words one at a time, it makes no difference to performance, even though that loop is a hell of a lot less complicated than memcpy itself (which has to be, to deal with byte-alignment issues). Nothing in the source code of memcpy gives any indication of why its read ordering is random. Our best guess is that it's a consequence of using out-of-order (OOO) processors, although the sequential 32-bit copy IS sequential.

We're thinking about writing an SSE-based copy routine. None of the PC chipsets are smart enough to consolidate sequential PCI reads into bursts, but if they're not REALLY stupid, then a 128-bit SSE load instruction might at least fetch four 32-bit words at a time. That would take performance from bad to mediocre. Mind you, once we add acceleration, this will only matter for getimage, and even then only as long as we don't have DMA.

-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project

_______________________________________________
Open-graphics mailing list
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
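A minimal C sketch of the two read patterns discussed in the post: a strictly sequential loop that reads one 32-bit word at a time, and the proposed SSE-based variant in which each 128-bit load requests four 32-bit words. The function names `copy_words32` and `copy_words128` are hypothetical and do not come from the Open Graphics code. The SSE version assumes SSE2 intrinsics and a 16-byte-aligned source (such as a mapped PCI BAR). Whether a single 16-byte load actually reaches the bus as one transaction is up to the CPU and the host bridge, so this is an experiment to try, not a guaranteed speedup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics: _mm_load_si128, _mm_storeu_si128 */
#endif

/* Sequential copy: one 32-bit load per word, in strictly ascending address
 * order. `volatile` stops the compiler from merging, widening, or
 * reordering the reads, so each word is a separate access. */
void copy_words32(uint32_t *dst, const volatile uint32_t *src, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];
}

/* Hypothetical SSE variant: each 128-bit load requests four 32-bit words.
 * Assumes src is 16-byte aligned. The remaining nwords % 4 words (or all
 * words when SSE2 is unavailable) are copied with plain 32-bit reads. */
void copy_words128(uint32_t *dst, const volatile uint32_t *src, size_t nwords)
{
    size_t i = 0;
#if defined(__SSE2__)
    assert(((uintptr_t)src & 15u) == 0); /* aligned load requires 16-byte alignment */
    for (; i + 4 <= nwords; i += 4) {
        /* The intrinsic takes a non-volatile pointer, so volatile is cast away here. */
        __m128i v = _mm_load_si128((const __m128i *)(uintptr_t)&src[i]);
        _mm_storeu_si128((__m128i *)(void *)&dst[i], v);
    }
#endif
    for (; i < nwords; i++)
        dst[i] = src[i];
}
```

Timing both functions over the same mapped region would show whether the wider loads actually cut down the dead cycles; if the host bridge splits each 16-byte load into four single-word transactions, the two will look about the same on the bus.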
