On 8/20/06, Jon Smirl <[EMAIL PROTECTED]> wrote:
On 8/20/06, Timothy Miller <[EMAIL PROTECTED]> wrote:
> If you have submitted a bunch of GPU commands (because you didn't have
> any more to submit), and then part way through the process of DMAing
> those commands, you decide you want to send more, how do you tell the
> GPU to fetch those new commands?
With current hardware there is only one command stream and you have to
wait until it finishes. With DX10 there are multiple command streams.
The commands are usually written into VRAM using shared memory access.
The GPU then reads them from VRAM, avoiding the PCI bus.
Are these reused blocks of commands? That is, are they like scripts
launched repeatedly?
You can use tricks like making the last command a loop. Write your new
commands somewhere else in memory, then use shared VRAM access to
modify the loop instruction with a single memory write. It can be done
safely.
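That loop-patch trick can be sketched in C. Everything here is
illustrative: the OP_LOOP opcode, the 24-bit address field, and the
encoding are assumptions for the sake of the example, not any real GPU's
command format.

```c
#include <stdint.h>

/* Hypothetical opcode: high bit set means "jump to the address in the
 * low 24 bits".  This encoding is an assumption, not a real GPU ISA. */
#define OP_LOOP 0x80000000u

static uint32_t encode_loop(uint32_t target)
{
    return OP_LOOP | (target & 0x00ffffffu);
}

/* loop_slot points (e.g. through a shared VRAM mapping) at the loop
 * command that currently jumps to itself.  The new command block must be
 * fully written out before this single aligned 32-bit store redirects
 * the GPU -- one atomic write is what makes the trick safe. */
static void retarget_loop(volatile uint32_t *loop_slot, uint32_t new_block)
{
    *loop_slot = encode_loop(new_block);
}
```

The key point is that the GPU only ever sees either the old jump-to-self
or the complete new jump, never a half-written command.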
TROZ uses PIO to load commands into a large queue. It gets excellent
throughput, but it ties up the CPU to fill the queue. DMA would make
things all-around more efficient.
DMA is primarily used for image transfer.
That doesn't make sense unless the CPU hardly generates any commands
at all. Do you know how horribly inefficient PIOs are?
I've done graphics drivers that used all PIO and graphics drivers that
used DMA and avoided PIO like the plague. For the same GPU, the DMA
driver's xmark numbers were three times faster than the PIO driver's.
For OGA, DMA for rendering commands is critical since we don't have
hardware T&L.
Don't forget about the problem of cache coherency with the CPU. If you
write the commands into system memory that memory has to be flushed
out of the CPU cache. That can be slower than just writing to VRAM to
begin with.
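For concreteness, the flush step might look something like this in a
user-space sketch. This assumes x86 and a 64-byte cache line; a real
driver would use the kernel's DMA sync API rather than raw clflush
instructions.

```c
#include <immintrin.h>
#include <stddef.h>

/* Flush a command block out of the CPU cache so a subsequent DMA read
 * from system memory sees the data.  x86-specific sketch: the 64-byte
 * line size and direct use of _mm_clflush are assumptions. */
static void flush_for_dma(const void *buf, size_t len)
{
    const char *p = (const char *)buf;
    for (size_t i = 0; i < len; i += 64)
        _mm_clflush(p + i);
    _mm_sfence();  /* order the flushes before kicking off the DMA */
}
```

Each clflush is a round trip to memory, which is where the cost Jon
mentions comes from when the command block is large.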
Depends on how much data has to be flushed, but flushing from L2 to
main memory is always faster than your host interface. Also, isn't
this coherency thing a solved problem? There are only a handful of
cases where that has been a problem for me, and it had to do with the
PCI chipset buffering data, not CPU/memory coherency.
This reminds me, I haven't seen any mention of GART/AGP technology.
The internal GPU DMA engine depends on that to make system memory
appear in its internal address space. The GPU DMA engine always runs
with internal addresses; if those addresses fall in the internally
mapped AGP area, they turn into PCI bus ops.
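The translation Jon describes can be sketched as a simple page-table
lookup. The 4 KiB page size, the flat table of physical page addresses,
and all the names here are illustrative, not any chipset's actual GART
layout.

```c
#include <stdint.h>

#define GART_PAGE_SHIFT 12      /* 4 KiB pages -- a typical, assumed size */
#define GART_PAGE_MASK  0xfffu

/* A GPU-internal address inside the AGP aperture is split into a page
 * index and an offset; the index selects a scattered system-memory page
 * from the GART table, so the aperture looks contiguous to the GPU. */
static uint64_t gart_translate(const uint64_t *gart_table,
                               uint64_t aperture_base,
                               uint64_t gpu_addr)
{
    uint64_t off = gpu_addr - aperture_base;
    return gart_table[off >> GART_PAGE_SHIFT] | (off & GART_PAGE_MASK);
}
```

Accesses that resolve through this table are the ones that become PCI
bus ops; addresses outside the aperture stay local to VRAM.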
Yeah. What does AMD call it? IOMMU or something? Anyhow, I don't
think we have to worry about that too much.
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)