On Wed, 2 Feb 2005 00:20:21 +0100, Rodolphe Ortalo <[EMAIL PROTECTED]> wrote:
> On Tuesday 01 February 2005 23:54, Daniel Phillips wrote:
> [...]
> > What is wrong with memory-mapping the DMA buffer? This would not give
> > access to the DMA control registers themselves.
>
> Nothing per se. In fact, that's what I try to achieve too.
>
> However, if you mmap the direct DMA buffer area straight into user space,
> then only one process can do it. That's pretty X-centric. (It's not a
> criticism, btw, just a remark.)
Yeah. I guess most processes would allocate indirect buffers.

> Also, you need kernel-specific mechanisms to ensure
> that this is locked memory (never goes to swap, never gets remapped) and/or
> to have multiple contiguous pages.

I thought this was simple in Linux.

> Feasible, but maybe not so fancy after all. Plus you also have issues wrt
> starting DMA and waiting for it to end (all via ioctl()).

I would have our kernel driver provide separate calls to initiate DMA and to
sleep on interrupts. Basically, any process should be able to ask to go to
sleep and be woken up based on an interrupt bitmask for the device.

> My idea is more to let the kernel driver reassemble the regular pages
> submitted by (possibly several) processes for DMA execution using the
> indirect DMA mode. That makes the indirect mode the norm, but it seems to me
> it integrates more naturally with the kernel's virtual memory mechanisms.

Like I say, the only problem is when you get lots of little packets. Say
you're just drawing rectangles; the packets are short. Say it takes 10 words
to specify a rectangle-drawing packet. The X server would store those 10 words
in its indirect buffer, then submit that buffer to the driver, which incurs
(a) ioctl overhead, (b) 3 words in the ring buffer, (c) a PIO to update the
queue tail pointer, and (d) a fair amount of latency just getting that PIO
out, because the bus is probably already busy. That's a huge amount of
overhead. If the X server controlled the ring buffer on its own, there would
be no ioctl overhead, and we'd only have to deal with the PIO. Furthermore,
with some sort of timer, the PIO would only happen periodically, reducing the
overhead further. (This is particularly bad with CopyArea, which has no way to
submit lists of copies, making X protocol and CPU processing the dominant
factor in its performance.)
I'll just note that I worked out a fully DMA-driven DDX layer for a particular
GPU not that long ago, and I found all sorts of inefficiencies and ways to get
around them. It was something of a challenge to deal with DMA buffer flushes
(and dual-processor issues) from user space under Solaris, which provides no
straightforward means for user-space processes to do DMA.

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
