In testing and experimenting, I noticed something strange, and I was wondering if anyone could help shed some light on this.
Keep in mind that OGD1 does not support DMA at this time, so everything we do is simple PIO access to the card.

If I run the test "x11perf -putimage500", I get a result of about 99/sec. That translates to a bus throughput of about 94 megabytes/sec. If, on the other hand, I use memcpy, I only get about 24 megabytes/sec.

I've looked at the x.org source, and I just can't see them doing anything special. I don't see any use of inline assembly or processor-specific instructions. They use memcpy for aligned copies and do something more complex for unaligned copies. I tried doing just 32-bit word copies (to try to imitate what they're doing), and even that didn't get me any faster than about 24 or 25 megabytes/sec. What could possibly be making my code so much slower than theirs?

Note: We looked at it with a PCI bus analyzer, and what we see is that the x.org code results in long bursts, while ours results in individual transactions, which would explain why we're so slow. What could the x.org code be doing that would result in bursts, while ours doesn't?

Thanks.

-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
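
P.S. In case it helps anyone reproduce the numbers, here is a minimal sketch of the arithmetic and of the kind of 32-bit word copy described above. It is not the actual test code: the function names are made up for illustration, and 4 bytes per pixel is an assumption (it is what makes 99 images/sec of 500x500 pixels come out to roughly 94 megabytes/sec, counting a megabyte as 2^20 bytes). In the real test, dst would point into the card's PIO-mapped memory; here it is just a pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes moved by one PutImage of the given size.
 * bytes_per_pixel = 4 is an assumption, not something x11perf reports. */
size_t putimage_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Convert "operations per second" into MiB/s (1 MiB = 2^20 bytes). */
double throughput_mib_per_sec(size_t bytes_per_op, double ops_per_sec)
{
    return (double)bytes_per_op * ops_per_sec / (1024.0 * 1024.0);
}

/* Hypothetical 32-bit word copy, one store per word.
 * dst is volatile so the compiler keeps every store, as you would
 * want when the destination is device memory rather than RAM. */
void copy_words32(volatile uint32_t *dst, const uint32_t *src, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];
}
```

For example, throughput_mib_per_sec(putimage_bytes(500, 500, 4), 99.0) comes out to about 94.4, which matches the x11perf figure above.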
