On Mon, 21 Mar 2005 13:51:31 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> This fd is currently just a generic character device and not a socket,
> so DRI would have to be patched along with adding our driver to the X
> tree, which isn't necessarily a bad idea. I can't think of any
> compatibility problem with changing the DRI character device(s) to a
> socket. The quick test is to try it and see if anything breaks.
>
> The DRI socket's job would just be to listen for connections, then
> create the real socket and hand it to the client.
>
> We can probably manage to set up a per-client socket connection within
> the existing DRI framework, and so be able to offer a driver variant
> that works without upgrading X/DRI, for what it's worth. I haven't
> tried this, and I still haven't looked at a lot of DRI code, so I
> can't swear it will work.

I'm not sure I like the idea of using the read/write interface. That'll
most likely involve extra copies. Rather, I prefer to use shared memory
pages, and the GL app sends pointers via an ioctl interface. Zero-copy.

> > > > - Proper scheduling means that we also need proper context
> > > > switching, including preserving all the relevant hardware state,
> > > > i.e. texture, blending, etc. settings. This will be expensive
> > > > unless we figure out a way for the userspace driver to
> > > > communicate "reconfiguration points" in the command stream that
> > > > contains the necessary information to reload state.
> > >
> > > The kernel driver knows which task it got the command submission
> > > from, so it can switch to the correct context.
> >
> > Yes, but it can be expensive. When the kernel switches contexts, it
> > must make sure that all the on-card registers (I know you don't like
> > "register writes", but that's what they are)
>
> Well let me clear something up: I'm not quite that dense ;-) I call
> those the "real" registers, and of course you can't build anything
> useful without them.
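The zero-copy submission scheme mentioned above (client fills shared,
mmap()ed pages and passes only descriptors down via ioctl) might look
roughly like this. This is a sketch only; the struct, field names, and
ioctl number are all invented for illustration, not an actual interface:

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical zero-copy submission descriptor: the GL client mmap()s
 * command pages from the device, builds commands in place, then hands
 * the kernel only an (offset, length) pair -- no data is copied
 * through read()/write(). */
struct ogp_submit {
    uint64_t offset;   /* start of commands within the shared mapping */
    uint64_t length;   /* bytes of valid command data */
};

/* Invented ioctl number; a real driver would pick its own magic. */
#define OGP_IOC_SUBMIT _IOW('G', 0x01, struct ogp_submit)
```

The point of the descriptor-only ioctl is that the command data itself
never crosses the user/kernel boundary by copy; only its location does.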
> The registers that sit somewhere on the engine pipeline are the ones
> we're interested in at the moment. We can't update a pipeline register
> whenever we want; we must arrange to update it at exactly the right
> time: after data already in the pipeline has progressed past that
> stage and before that stage processes any data that relies on the new
> setting.
>
> This requires some clever register updating mechanism, which Timothy
> partly described earlier: there is a queue of register update values
> that runs in parallel with the pipeline and there is some means of
> deciding when to pull a value out of the queue into a particular
> register. This still leaves a bunch of questions, like: how exactly
> are the queue values tagged, and what do you do when more than one
> register needs to be updated at the same pipeline stage?

Since I've done this before, I can answer the question directly. While
it's perfectly valid to look at it the way you've described it, I tend
to think of the register writes going THROUGH the pipeline. Of course,
being hardware, it's also parallel to the pipeline; it's just that
there is synchronization logic tying it to the pipeline stages.

As for numbering, that's easy too: each pipeline stage has a number,
and each register within that stage has a number. N bits for the stage
number plus M bits for the register number gives you an ADDRESS that is
M+N bits long. The only question is whether or not that 2**(M+N) sized
address space is made part of the PIO address space. It would be simple
to do so.

> Anyway, the point of this is, the 'real' registers aren't accessible
> in any simple way, either for writing or reading. So if there is a
> set of registers available via PIO, it is either an indirect interface
> to the real registers or it is completely unsynchronized with the
> pipeline, and it is up to software to compensate, probably by flushing
> the pipeline before any register read or write.
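The stage/register numbering can be sketched concretely. The field
widths here are made up (with N = M = 4 the address space is 2**8 = 256
entries); real hardware would pick whatever widths it needs:

```c
#include <stdint.h>

/* Hypothetical pipeline-register address: the top N bits select the
 * pipeline stage, the bottom M bits select a register within that
 * stage, giving an M+N bit address as described above. */
#define STAGE_BITS 4   /* N: up to 16 pipeline stages */
#define REG_BITS   4   /* M: up to 16 registers per stage */

static inline uint32_t reg_addr(uint32_t stage, uint32_t reg)
{
    return (stage << REG_BITS) | (reg & ((1u << REG_BITS) - 1));
}

static inline uint32_t addr_stage(uint32_t addr)
{
    return addr >> REG_BITS;
}

static inline uint32_t addr_reg(uint32_t addr)
{
    return addr & ((1u << REG_BITS) - 1);
}
```

Mapping that 2**(M+N) space into PIO is then just a matter of decoding
these two fields off the address bus.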
> This is the main reason I think that exposing a drawing register
> interface is worse than useless.

Reads of these registers would be unsynced with the pipeline; that is,
if you do something that would cause a register to be updated, but that
something is held up earlier in the pipeline, reading the register will
give you "old" data.

Note: based on experience, being able to easily read and write engine
registers is ABSOLUTELY VITAL to debugging. I've done lots of special
test cases, where it was necessary to write to registers one at a time.

> For non-DMA control of the card, I suggested emulating the DMA stream
> with an auto-incrementing PIO register. This is an inverted way of
> looking at the problem: instead of seeing the world in terms of PIO
> registers with the DMA interface built on top of them, you see the
> world in terms of the DMA stream, or instruction stream if you like,
> and provide a PIO interface capable of emulating the DMA stream for
> any special situations where DMA isn't possible (though I still
> haven't seen a clear explanation of how this might come up).

Regardless of the fate of the registers appearing in the PIO space,
what you describe is a totally valid and potentially very useful
alternative. Add it to the list!

Normally, you'd have the DMA engine feed fifo A, which is read by
translation logic that then feeds fifo B, which is read by the engine.
The PIO method you describe gives PIO access to fifo A. If we also
allow PIO access directly to registers by number, then that would be
PIO access to fifo B.

> > that reflect OpenGL state (blending, Z test, texturing, texture
> > environments, ...) are correctly preserved.
>
> That is easy: the driver will keep the OpenGL state in host memory.
> It would be perverse to try to read the GL state from the engine
> registers, since the 'real' registers reflect the GL state as of some
> point in the past.
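The two-fifo arrangement above (DMA feeds fifo A, translation logic
drains A into fifo B, the engine consumes B, and PIO can poke fifo A
directly) can be modeled in a few lines. This is a toy software model
of the dataflow, not hardware; sizes and the "translation" step are
placeholders:

```c
#include <stdint.h>
#include <stddef.h>

#define FIFO_LEN 64

/* Simple ring buffer standing in for a hardware FIFO. */
struct fifo {
    uint32_t buf[FIFO_LEN];
    size_t head, tail;   /* head: next read, tail: next write */
};

static int fifo_put(struct fifo *f, uint32_t word)
{
    if ((f->tail + 1) % FIFO_LEN == f->head)
        return -1;   /* full */
    f->buf[f->tail] = word;
    f->tail = (f->tail + 1) % FIFO_LEN;
    return 0;
}

static int fifo_get(struct fifo *f, uint32_t *word)
{
    if (f->head == f->tail)
        return -1;   /* empty */
    *word = f->buf[f->head];
    f->head = (f->head + 1) % FIFO_LEN;
    return 0;
}

/* "PIO access to fifo A": software pokes command words one at a time,
 * exactly as the DMA engine would have streamed them in. */
static int pio_emulate_dma(struct fifo *a, uint32_t word)
{
    return fifo_put(a, word);
}

/* Stand-in for the translation logic between the two FIFOs; real
 * hardware would decode/expand commands here. */
static void translate(struct fifo *a, struct fifo *b)
{
    uint32_t w;
    while (fifo_get(a, &w) == 0)
        (void)fifo_put(b, w);
}
```

Direct PIO access to registers by number would then correspond to
injecting words into fifo B, bypassing the translation stage.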
> On the other hand, if the host remembers the GL state it is available
> immediately and efficiently. It does not make a bit of sense to try
> to use the card as some sort of auxiliary, out-of-sync state memory.
>
> The corollary is, except perhaps for debugging, there is no reason to
> provide read access to the internal pipeline registers. Which ought
> to save a little hardware and certainly will save some documentation.
> It also removes the need to devise a register numbering scheme, which
> is replaced by a queue tagging scheme that can be whatever is
> convenient, and can even change from rev to rev, transparently to the
> driver.

I don't see why the register numbers and register tags can't be exactly
the same numbers.

> > It must also make sure that all the referenced textures, including
> > offscreen rendering targets, are in place and not swapped out.
> >
> > This means that the kernel must keep track of which memory areas
> > each "GPU program" currently references and what all the registers
> > contain. You can't just wave that away.
>
> Yes. This leads us back to the discussion of resource handles I
> think. Perhaps you'd care to weigh in on that?
>
> > Some hardware support for writing/reading register states into a
> > predefined area of video memory could help a lot here, but I don't
> > know if that's feasible.
>
> I don't see why it's needed, since the driver can remember all the
> state it sent to the card. It might be nice to have a way for the
> driver to confirm that its view of drawing state matches the card's
> view, for debugging purposes.
>
> For some non-pipeline state, direct r/w PIO access makes perfect
> sense, for example, the cursor position register.
>
> > And even if it is feasible, the video memory management issues
> > still remain.
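The "driver remembers the state" point amounts to plain state
shadowing: every GL-visible setting has its authoritative copy in host
memory, updates go to both the shadow and the command stream, and
queries never touch the card. A minimal sketch, with invented names and
only a couple of representative fields:

```c
#include <stdint.h>
#include <stdbool.h>

/* Host-side shadow of GL-visible engine state. The engine's own
 * pipeline registers are never read back, since they only reflect the
 * state as of some point in the past. */
struct gl_shadow_state {
    bool     blend_enable;
    uint32_t blend_func;
    bool     depth_test;
    uint32_t depth_func;
};

/* Update: record the value host-side, then queue the register write
 * into the command stream (the queueing itself is elided here). */
static void set_depth_test(struct gl_shadow_state *s, bool on)
{
    s->depth_test = on;
    /* emit_reg_write(REG_DEPTH_TEST, on);  -- into the DMA stream */
}

/* Query: answered entirely from host memory, no card access at all. */
static bool get_depth_test(const struct gl_shadow_state *s)
{
    return s->depth_test;
}
```

Since queries never hit the hardware, whether the engine's internal
registers are readable at all becomes purely a debugging question.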
> > They can obviously be reduced by allowing each GPU program to lock
> > memory in place so that it will not be moved by the video memory
> > manager until a certain part of the program has been executed.
>
> Ideally, the DMA engine would advance its head pointer only after a
> drawing operation has completely cleared the pipeline. But perhaps
> that is too hard to implement in hardware. A reasonable alternative
> is to flush the pipeline before recovering a resource that is known
> to be in use.

I agree with this approach. If you have to shuffle memory, you have to
bite the bullet and do it serially with rendering.

> > > > - In the current DRI design, the kernel module does basically
> > > > everything in a process context. With this design, it'll have
> > > > to do almost everything in interrupt or bottomhalf context.
> > > > This alone brings a number of problems, such as access to the
> > > > calling process' data.
> > >
> > > Fortunately, the virtual pages are resolved to physical when the
> > > process obtains the dma buffer. I don't think we need anything
> > > else from process context.
> >
> > There is some meta information about the GPU program that doesn't
> > fit into the DMA buffer itself. For example, with an in-kernel
> > memory manager, we must communicate which textures and rendering
> > targets the program currently needs and when they become
> > "unlocked".
> >
> > Think of it like this: All userspace clients will need the kernel
> > to issue some direct DMA commands for them: Memory moves, i.e.
> > memory management stuff, and "calls" to indirect DMA. The meta
> > information that is not in DMA memory reflects the direct DMA
> > commands that userspace needs the kernel to issue.
>
> My assumption is, the kernel driver keeps all the necessary state on
> behalf of each client. The client _updates_ the state in process
> context, the kernel driver accesses the state via kernel address.

I think this is the same as what I said before.
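"Doing it serially with rendering" reduces to one invariant the video
memory manager must obey: never move a block while commands that may
still reference it are in flight. A toy sketch of that rule, with all
names invented:

```c
#include <stdbool.h>

/* Minimal model of the engine's pipeline-busy status. */
struct engine {
    bool pipeline_busy;
};

/* Real hardware would wait for the DMA head pointer to pass a fence,
 * or for an explicit pipeline-idle interrupt; here we just clear a
 * flag. */
static void flush_pipeline(struct engine *e)
{
    e->pipeline_busy = false;
}

/* Move a video-memory block only when nothing in flight can still
 * reference its old location; returns false if the caller forgot to
 * flush first. */
static bool safe_move(struct engine *e /*, src, dst, len */)
{
    if (e->pipeline_busy)
        return false;   /* must flush before shuffling memory */
    /* ... copy the block, patch the client's handle table ... */
    return true;
}
```

A driver could relax this with per-resource fences instead of a full
flush, but the serialization requirement itself stays the same.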
:)

> > However, this probably isn't too big a problem: This data shouldn't
> > be too much, so we could just allocate a special page for it in
> > memory.
>
> Indeed.

Even if it is a lot, that's still ok. Someone want to count? :)

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
