In testing and experimenting, I noticed something strange, and I was
wondering if anyone could help shed some light on this.

Keep in mind that OGD1 does not support DMA at this time, so
everything we do is simple PIO access to the card.  If I use the test
"x11perf -putimage500", I get a result of about 99/sec.  That
translates to a bus throughput of about 94 megabytes/sec.
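As a sanity check on that figure, here is a minimal sketch of the
arithmetic. It assumes the 500x500 image is sent at 4 bytes per pixel
(the depth is my assumption, not something measured) and that a
"megabyte" means 2^20 bytes. The function names are just for
illustration.

```c
/* Bytes moved by one PutImage of a square image.
 * bytes_per_pixel = 4 (32 bpp) is an assumption. */
double putimage_bytes(unsigned side, unsigned bytes_per_pixel)
{
    return (double)side * (double)side * (double)bytes_per_pixel;
}

/* Throughput in megabytes/sec, taking one megabyte as 2^20 bytes. */
double throughput_mib(double bytes_per_op, double ops_per_sec)
{
    return bytes_per_op * ops_per_sec / (1024.0 * 1024.0);
}
```

Calling throughput_mib(putimage_bytes(500, 4), 99.0) gives a value
between 94 and 95, which agrees with the number above under those
assumptions.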

If, on the other hand, I use memcpy in my own code to write to the
card, I only get about 24 megabytes/sec.

I've looked at the source to x.org, and I just can't see them doing
anything special.  I don't see any use of inline assembly or
processor-specific instructions.  They use memcpy for aligned copies
and they do something more complex for unaligned copies.

I tried doing just 32-bit word copies (to try to imitate what they're
doing), and even that didn't get me any faster than about 24 or 25
megabytes/sec.
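For reference, the two copy styles being compared look roughly like
the sketch below. This is not the actual test code or the x.org code;
the function names are made up, and in a real test dst would point
into the card's mapped memory rather than ordinary RAM. It only shows
the shape of the two loops and does not by itself explain the
difference seen on the bus.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy nwords 32-bit words one store at a time.  dst is volatile so
 * the compiler keeps each word store instead of merging or dropping
 * them. */
void copy_words32(volatile uint32_t *dst, const uint32_t *src,
                  size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];
}

/* Plain memcpy of the same data, for comparison. */
void copy_memcpy(void *dst, const void *src, size_t nbytes)
{
    memcpy(dst, src, nbytes);
}
```

Both functions leave the destination with the same contents; any
speed difference comes from how the stores reach the card.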

What could possibly be making my code so much slower than theirs?

Note:  We looked at it with a PCI bus analyzer, and what we see is
that the x.org code results in long bursts, while ours results in
individual transactions, which would explain why we're so slow.  What
could x.org code be doing that would result in bursts, while ours
doesn't?

Thanks.

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
_______________________________________________
Open-graphics mailing list
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
