Timothy Normand Miller wrote:
In testing and experimenting, I noticed something strange, and I was
wondering if anyone could help shed some light on this.

Keep in mind that OGD1 does not support DMA at this time, so
everything we do is simple PIO access to the card.  If I use the test
"x11perf -putimage500", I get a result of about 99/sec.  That
translates to a bus throughput of about 94 megabytes/sec.

If, on the other hand, I use memcpy, I only get about 24 megs/sec.

I've looked at the X.org source, and I just can't see them doing
anything special.  I don't see any use of inline assembly or
processor-specific instructions.  They use memcpy for aligned copies
and they do something more complex for unaligned copies.

I tried doing just 32-bit word copies (to try to imitate what they're
doing), and even that didn't get me any faster than about 24 or 25
megs/sec.

What could possibly be making my code so much slower than theirs?

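I bet it's the MTRR setting. If the card's aperture isn't marked write-combining, the CPU sends every PIO store to the card as a separate small uncached bus transaction instead of merging consecutive stores into bursts.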
For comparison, here are the MTRRs on a system running an Nvidia card with the nv driver:
$ cat /proc/mtrr
reg00: base=0x00000000 (   0MB), size=1024MB: write-back, count=1
reg01: base=0xf0000000 (3840MB), size=  64MB: write-combining, count=1

And this is the line in the X log where the driver sets up that range:
(==) NV(0): Write-combining range (0xf0000000,0x4000000)
_______________________________________________
Open-graphics mailing list
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)