On Wednesday 02 February 2005 04:02, Timothy Miller wrote:
[...]
> > However, if you mmap in userspace directly the direct DMA buffer area,
> > then only one process can do it. That's pretty X centric. (It's not a
> > criticism btw, just a remark.).
>
> Yeah. I guess most processes would allocate indirect buffers.

Personally, I think processes should be *obliged* to operate this way.
That would enforce conformance to the virtualization assumptions of
multiuser/multiprocess systems (like the Unix ones), and to me the
benefits seem worth the constraints. But that's just my viewpoint of
course.

> > Also, you need kernel-specific mechanisms to ensure
> > that this is locked memory (never goes to swap, never gets remapped)
> > and/or to have multiple contiguous pages.
>
> I thought this was simple in Linux.

That's possible. X11 did it that way for a long time, on Linux but also
on *BSD. But it interferes with other things in the kernel (the unified
buffer cache for example, or the co-existence of X11 and linuxfb). So
there is a cost (mostly software maintenance and kernel-specific code
issues).

> I would have our kernel driver provide separate calls to initiate DMA
> and sleep on interrupts. Basically, any process should be able to ask
> to go to sleep and be woken up based on an interrupt bitmask for the
> device.

Nice! Though maybe not scalable to many processes. Could the interrupt
be associated with the last address fetched by DMA, so that the kernel
driver can find the initiating process again by a physical->virtual
translation of that address?

> > My idea is more to let the kernel driver reassemble the regular pages
> > submitted by (possible several) processes for DMA execution using the
> > indirect DMA mode. That makes the indirect mode the norm, but it seems to
> > me it integrates more naturally with kernels virtual memory mechanisms.
>
> Like I say, the only problem is when you get lots of little packets.
> Say you're just drawing rectangles. The packets are short.
>
> Say it takes 10 words to specify a rectangle drawing packet.
>
> The X server would store those 10 words in its indirect buffer, then
> it would submit that to the driver which incurs (a) ioctl overhead,
> (b) 3 words in the ring buffer, (c) a PIO to change the queue tail
> pointer, and (d) a fair amount of latency just getting out that PIO
> because the bus is probably already busy.
>
> That's a huge amount of overhead. If the X server just controlled the
> ring buffer on its own, there would be no ioctl overhead, and we'd
> only have to deal with the PIO. Furthermore, if there were some sort
> of timer, then the PIO would only happen periodically, further
> reducing the overhead. (This is particularly bad with CopyArea which
> doesn't have any way to submit lists of copies, making X protocol and
> CPU processing the dominant factor in its performance.)

All this is true. But unless X11 programmers all go nuts at once, that
probably won't happen: X11 will gather multiple requests itself by
copying. Small items coming from multiple processes are the case where
copying performs better than indirect addressing and page-table magic.
The remaining pathological case is full-screen applications that call
XFlush() after every single drawing operation. This can occur (in sane
applications), but it usually indicates that they require special APIs
(video/TV software, for example).

> I'll just note that I worked out a fully DMA DDX layer for a
> particular GPU not that long ago, and I found all sorts of
> inefficiencies and ways to get around them. It was somewhat of a
> challenge to deal with DMA buffer flushes (and dual processor issues)
> from user space under Solaris which provides no straightforward means
> for user space processes to do DMA.

I think I understand. But mmapping all the DMA resources of the hardware
into one single "master" userspace process, even if it is one solution,
may not be the best one.
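
To put rough numbers on the overhead argument quoted above, here is a toy cost model, not driver code. It takes the figures from the quoted example (10 words per rectangle packet, 3 ring-buffer words per indirect submission) and assumes, as a simplification, exactly one ioctl and one PIO write per submission. The names `SubmitCost`, `cost_unbatched` and `cost_batched` are made up for this illustration, and bus latency is not modelled at all.

```python
from dataclasses import dataclass

RECT_PACKET_WORDS = 10     # quoted example: words per rectangle packet
RING_WORDS_PER_SUBMIT = 3  # quoted example: ring words per indirect submission


@dataclass
class SubmitCost:
    ioctls: int         # assumed: one ioctl per submission
    pio_writes: int     # assumed: one tail-pointer PIO per submission
    ring_words: int     # ring-buffer words spent on indirect references
    payload_words: int  # useful drawing-command words
    copied_words: int   # words the server copies to gather requests

    @property
    def ring_overhead_ratio(self) -> float:
        """Ring words spent per payload word."""
        return self.ring_words / self.payload_words


def cost_unbatched(n_packets: int, words: int = RECT_PACKET_WORDS) -> SubmitCost:
    """Every small packet is submitted on its own."""
    return SubmitCost(
        ioctls=n_packets,
        pio_writes=n_packets,
        ring_words=RING_WORDS_PER_SUBMIT * n_packets,
        payload_words=words * n_packets,
        copied_words=0,
    )


def cost_batched(n_packets: int, batch_size: int,
                 words: int = RECT_PACKET_WORDS) -> SubmitCost:
    """Packets are first copied into one buffer, batch_size at a time."""
    submits = -(-n_packets // batch_size)  # ceiling division
    return SubmitCost(
        ioctls=submits,
        pio_writes=submits,
        ring_words=RING_WORDS_PER_SUBMIT * submits,
        payload_words=words * n_packets,
        copied_words=words * n_packets,
    )


if __name__ == "__main__":
    for label, cost in (("unbatched", cost_unbatched(100)),
                        ("batched by 25", cost_batched(100, 25))):
        print(label, cost, round(cost.ring_overhead_ratio, 3))
```

The model only counts events, so it says nothing about how expensive an ioctl is compared with copying a word. It does show the trade being discussed: gathering by copying moves the cost from per-submission overhead to a per-word copy.
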
If the driver can virtualize a DMA resource and provide one to each
userspace process that wants one (as in the indirect mode case), it
would be a definite plus in terms of software.

I do agree that this is not so simple: there are interrupts too. (Apart
from the ones directed to itself, how can the driver know whether an
interrupt is for one process or another? How do we let the userspace
process insert an interrupt request in the command buffer without
bypassing kernel control?) And there are data transfers too. (How to
synchronize the userspace process with the engine? etc.)

But it seems to me no graphics hardware designer has ever been asked to
try to provide elementary hardware mechanisms that could allow some
kernel software to provide such properties. At least for affordable
hardware and simple mechanisms. For once, there is one listening, so
this is a rare opportunity for me to insist heavily. (But I hope I'm not
too heavy. ;-)

Rodolphe
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
