On Wed, 2 Feb 2005 00:20:21 +0100, Rodolphe Ortalo
<[EMAIL PROTECTED]> wrote:
> On Tuesday 01 February 2005 23:54, Daniel Phillips wrote:
> [...]
> > What is wrong with memory-mapping the DMA buffer?  This would not give
> > access to the DMA control registers themselves.
> 
> Nothing per se. In fact, that's what I try to achieve too.
> 
> However, if you mmap the direct DMA buffer area straight into userspace,
> then only one process can use it. That's pretty X-centric. (Not a
> criticism, btw, just a remark.)

Yeah.  I guess most processes would allocate indirect buffers.

> Also, you need kernel-specific mechanisms to ensure that the memory is
> locked (never goes to swap, never gets remapped) and/or to obtain
> multiple contiguous pages.

I thought this was simple in Linux.

> Feasible, but maybe not so fancy after all. Plus you also have issues with
> starting DMA and waiting for it to finish (all via ioctl()).

I would have our kernel driver provide separate calls to initiate DMA
and sleep on interrupts.  Basically, any process should be able to ask
to go to sleep and be woken up based on an interrupt bitmask for the
device.

> My idea is more to let the kernel driver reassemble the regular pages
> submitted by (possibly several) processes for DMA execution using the
> indirect DMA mode. That makes the indirect mode the norm, but it seems to me
> it integrates more naturally with the kernel's virtual memory mechanisms.
> 

Like I say, the only problem is when you get lots of little packets. 
Say you're just drawing rectangles.  The packets are short.

Say it takes 10 words to specify a rectangle drawing packet.

The X server would store those 10 words in its indirect buffer, then
it would submit that to the driver which incurs (a) ioctl overhead,
(b) 3 words in the ring buffer, (c) a PIO to change the queue tail
pointer, and (d) a fair amount of latency just getting out that PIO
because the bus is probably already busy.

That's a huge amount of overhead.  If the X server just controlled the
ring buffer on its own, there would be no ioctl overhead, and we'd
only have to deal with the PIO.  Furthermore, if there were some sort
of timer, then the PIO would only happen periodically, further
reducing the overhead.  (This is particularly bad with CopyArea which
doesn't have any way to submit lists of copies, making X protocol and
CPU processing the dominant factor in its performance.)

I'll just note that I worked out a fully DMA-driven DDX layer for a
particular GPU not that long ago, and I found all sorts of
inefficiencies and ways to get around them.  It was something of a
challenge to deal with DMA buffer flushes (and dual-processor issues)
from user space under Solaris, which provides no straightforward means
for user-space processes to do DMA.
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics