On Fri, 28 Jan 2005 19:30:26 +0100, Nicolai Haehnle <[EMAIL PROTECTED]> wrote:
> On Friday 28 January 2005 18:21, Timothy Miller wrote:
> > I want to drop a feature:  The ability for the rendering engine to
> > access host memory.  That is, you can't have a texture in host memory.
> > Instead, DMA can do two things:  (1) fetch drawing commands, and (2)
> > move blocks of memory data in and out of the graphics memory.
> >
> > What this means is that if a texture is not in graphics memory,
> > something has to be swapped out first.
> 
> Sounds reasonable.
> 
> Question: Will DMA copies and rendering (be able to) run in parallel in the
> hardware? I.e. could there potentially be some rendering going on while at
> the same time a texture is being uploaded?

Well, seeing as how the bus can only handle one stream at a time, I
don't see how it matters that much.

If doing the texture DMA is going to overwrite something being used by
the current drawing operation, you'll want to sync the pipeline. 
Otherwise, I guess you can just queue up this DMA request along with
others.
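To make the "sync or just queue" decision concrete, here's a minimal sketch of the check I have in mind. All the names (dma_needs_sync, mem_range) are hypothetical, just to illustrate the idea: before queueing a texture upload, see whether its destination overlaps anything the current drawing operation is using, and only drain the pipeline when it does.

```c
#include <stdbool.h>
#include <stdint.h>

struct mem_range { uint32_t start, len; };

static bool ranges_overlap(struct mem_range a, struct mem_range b)
{
    return a.start < b.start + b.len && b.start < a.start + a.len;
}

/* Returns true if a pipeline sync must precede the upload. */
bool dma_needs_sync(struct mem_range upload,
                    const struct mem_range *in_use, int n_in_use)
{
    for (int i = 0; i < n_in_use; i++)
        if (ranges_overlap(upload, in_use[i]))
            return true;   /* would clobber data the drawing op reads */
    return false;          /* safe to just append to the DMA queue */
}
```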


> 
> > This then makes me think about memory management.  What I would like is

> > unified memory management between GL and X.  We can implement this as a
> > daemon.  The daemon manages both graphic memory and pixmaps/textures
> > which have been swapped out.  In addition, it's good to use a user
> > process for this so that swapped out images can also be swapped to disk
> > (automatically by the VM).
> >
> > Our kernel driver can provide an ioctl interface which allows
> > applications to allocate memory (and when an application dies, the
> > kernel can figure it out and automatically free the resources).  While
> > this would entail some overhead, I don't think it would be so bad.
> >
> > One of the jobs of the memory manager is to make sure that textures are
> > available when they're needed.  Using a LRU algorithm, it can swap
> > textures in and out just before they get used.
> 
> So your idea of a memory manager looks somewhat like this:
> To the memory manager, the main entity is the "block of memory" (I expect
> the size of such a block would always be a multiple of some larger number,
> say 64KB). The manager doesn't really care whether this block contains a
> texture, a depth buffer, or chocolate cake.

Exactly.  BTW, we can do smaller.  It's useful to be able to cache
some relatively small pixmaps.  On the other hand, we want to be able
to do shared memory between the memory manager and other applications,
so we need to manage in some minimum block size, say, the page size on
the hosting processor.
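A rough sketch of that block manager, with invented names: allocations get rounded up to a page-sized granule, and each block carries an LRU stamp the manager can use to pick eviction victims. The manager never cares what's inside the block.

```c
#include <stdint.h>
#include <stddef.h>

#define GRANULE 4096u   /* assume 4 KB host pages */

static uint32_t round_to_granule(uint32_t bytes)
{
    return (bytes + GRANULE - 1) / GRANULE * GRANULE;
}

struct block {
    uint32_t handle;     /* opaque handle given to the client    */
    uint32_t size;       /* always a multiple of GRANULE         */
    uint64_t last_used;  /* monotonic stamp for LRU eviction     */
    int      in_vram;    /* 1 = card memory, 0 = swapped to host */
};

/* Pick the least-recently-used resident block as eviction victim. */
struct block *pick_victim(struct block *blocks, int n)
{
    struct block *victim = NULL;
    for (int i = 0; i < n; i++) {
        if (!blocks[i].in_vram)
            continue;
        if (!victim || blocks[i].last_used < victim->last_used)
            victim = &blocks[i];
    }
    return victim;   /* NULL if nothing is resident */
}
```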

> 
> Applications (this includes 3D clients and the X server itself) can allocate
> blocks, which gives them an opaque handle. 

Well, they do need a handle to track the resource, but the security is
probably good enough that we can also provide the graphics memory
address as well.  Yes, one client can clobber another's graphics
memory, but that's all.

> They can then request that the
> block be in video memory for subsequent rendering operations. There also
> have to be some primitives that allow mmap'ing, reading and writing the
> block's contents. Something (either the kernel or a special dedicated
> daemon) moves blocks in and out of video memory as necessary.

Yeah, we can provide mechanisms to move data, or we can mmap blocks
from graphics memory, do shared memory, etc.

It occurs to me that there might be a pathological case where many
clients start competing for card memory to the point that an active
client's texture gets kicked out between the time it's requested in
and when it gets around to using it.  We may need some sort of locking
mechanism.  When you request to lock a texture, it gets swapped in (if
it isn't already) and gets locked.  When you unlock it, it may get
swapped out.  What happens then is that if we run out of memory, some
clients will block and get sorta queued up.  It sounds bad, but really
the software overhead for this is trivial compared to all the swapping
going on.
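The lock/unlock idea boils down to a pin count. A minimal sketch (all names hypothetical; swap_in is a stand-in for the real swap path, which might block until space frees up):

```c
#include <stdbool.h>

struct texture {
    bool resident;   /* currently in card memory?          */
    int  lock_count; /* >0 means pinned, never evictable   */
};

/* Stand-in for the real swap path, which may block the client. */
static void swap_in(struct texture *t) { t->resident = true; }

void texture_lock(struct texture *t)
{
    if (!t->resident)
        swap_in(t);        /* may block while other clients unlock */
    t->lock_count++;       /* pinned: the manager must not evict it */
}

void texture_unlock(struct texture *t)
{
    t->lock_count--;       /* at zero, the block may be swapped out */
}

/* The memory manager consults this when picking eviction victims. */
bool texture_evictable(const struct texture *t)
{
    return t->resident && t->lock_count == 0;
}
```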

> All this happens via a kernel ioctl interface. So whether there is a user
> process behind all this is really an implementation detail, right? A
> relevant detail, but still - the applications themselves will never know
> the difference.

Yeah.  Processes with root privileges, like X11 and the memory
manager, get full access to the drawing engine, but user clients can
only get at the ioctl interface.  X11 and the memory manager will
STILL use the ioctl interface for plenty of coherency stuff, though.

This is making me think about some way to automatically manage context
switches.  Since a lot of access to the GPU is being centralized
through the ioctl interface, we might be able to find a way for
clients to find out if another process has changed the context and
switch back.  The context for each client can actually be maintained
in software buffers, so when you want to switch TO a context, you just
dump a DMA packet, while switching FROM a context involves... nothing.
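Sketched out, the "switch TO costs one packet, switch FROM costs nothing" scheme looks like this (names invented for illustration): track which client last programmed the GPU, and replay a client's software-shadowed context only when the owner has changed.

```c
#include <stdint.h>
#include <stdbool.h>

struct gpu_state {
    uint32_t current_owner;   /* client id that last programmed the GPU */
};

struct client_ctx {
    uint32_t id;
    uint32_t shadow_regs[8];  /* software copy of this client's context */
};

static int packets_emitted;   /* stands in for real DMA submission */

static void emit_context_packet(const struct client_ctx *c)
{
    (void)c;                  /* real code would dump shadow_regs here */
    packets_emitted++;
}

/* Call before each submission: restores context only when needed. */
bool ensure_context(struct gpu_state *gpu, const struct client_ctx *c)
{
    if (gpu->current_owner == c->id)
        return false;              /* context still ours: no work      */
    emit_context_packet(c);        /* switching TO us: one DMA packet  */
    gpu->current_owner = c->id;
    return true;                   /* switching FROM the old owner was free */
}
```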

> > While OpenGL textures all have to be in graphics memory, X11 pixmaps
> > don't (you can punt to cfb), so they can be kicked out to make room for
> > textures with no problem.  However, there is some amount of complexity
> > involved in giving X access to the data when it's been swapped out into
> > the daemon's host address space.  In that case, the X server can be
> > instructed by the daemon (through a signal handler) to move the pixmap
> > data when a swap-out needs to happen.
> 
> I'm afraid I don't understand. How are pixmaps significantly different from
> textures? I also don't like the idea of creating new special cases for the
> X server - mostly because it removes us further from being able to run X
> without root privileges, but also because it just doesn't seem very
> reasonable: Why shouldn't we be able to accelerate 2D in the X clients in a
> DRI-like fashion?

The X server is, by definition, a privileged process.  You should be
able to trust it to behave correctly and protect the hardware from
errant X clients.

I come from a world where we would (used to) run out of graphics
memory for pixmaps quickly.  Say you're in 24-bit MOX mode on a Raptor
2000.  It's an old product, so it only has 24 megs of memory.  You
have memory for the viewable framebuffer plus ONE screen-sized pixmap.
Everything else has to live in host memory.  As such, our DDX layers
have always had to deal with the fact that a pixmap may be either
"accelerated" (in graphics memory) or not (in host memory), and do the
right thing in all cases.

Things are more complicated with 3D and texturing, because if a
texture is swapped out of card memory, you can't just punt to a
software algorithm to do the rendering.  Well, you can, but it sucks,
because you have to write a software renderer that can handle
everything including doing depth/stencil buffer reads and updates
properly.  In X11, it's just WAY simpler, and you can fall back on CFB
to do rendering to host memory pixmaps for you without it being an
issue.

Another thing to realize is that it's unusual to have more than one GL
client running at one time, but you USUALLY have LOTS of X11 clients. 
They all allocate gobs of pixmaps, and you really CAN run out of
graphics memory, and the user should never have to know that.

Remember that we have experience with multiple different kinds of
graphics cards.  We have ATC cards where pixmaps are allocated to be
2048x2048.  We have medical cards where images can be MUCH BIGGER than
the viewable framebuffer (so you can efficiently pan around in them). 
We also did the PGX32 for Sun, which was a 32 meg (or was it 64 meg?)
console/X11 card that they used in lots of their workstations.  In
each case, we encounter a different kind of memory management load,
and we've developed algorithms to deal with them effectively.  And in
each case, we have to manage pixmaps both in host memory and in card
memory and do it seamlessly.

So, yes, X11 is definitely a special case!

> > This then leads me into some things regarding security.  With texture
> > units being unable to access host memory directly, that plugs one
> > security hole.  In fact, as I see it, there's no reason to map any part
> > of the GPU register set into the user address space.  The OpenGL process
> > can access the graphics memory all it wants without being able to muck
> > with the kernel or other user processes, and it can share memory pages
> > with the memory manager daemon also without causing any trouble.
> > Furthermore, we only ever want the user process to instruct the GPU via
> > DMA, and we can limit the DMA command set so that the worst it can do is
> > corrupt graphics memory data.  While we'll give X11 direct control over
> > DMA, user-space OpenGL will generate command packets and place them
> > appropriately into a DMA buffer, but it will have to SUBMIT those
> > commands via an ioctl.  Then the only thing left is being able to lock
> > up the GPU.  However, as it stands, I believe that the worst that can be
> > done is to make the rasterizer loop for a long time.
> 
> Again, I don't like the idea of special-cases for the X server. Apart from
> that, however, this sounds right.

X11 is a bottleneck, since it's single-threaded and every graphical
client on the box relies on it.  If you don't give it means to be more
efficient and flexible, you'll CRIPPLE it and thereby cripple all X11
clients.
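The submit-via-ioctl model quoted above could be sketched like this (opcode values are invented for illustration): user space builds command packets in a buffer, and the kernel-side handler rejects anything outside the restricted command set before the DMA engine ever sees it.

```c
#include <stdint.h>
#include <stdbool.h>

enum { OP_DRAW = 1, OP_STATE = 2, OP_BLIT = 3 };   /* allowed set */
#define OP_MAX OP_BLIT

struct cmd { uint32_t opcode; uint32_t arg; };

/* Kernel-side check: worst case, a bad client corrupts only its own
 * graphics-memory data, never host memory or other processes. */
bool validate_cmd_buffer(const struct cmd *buf, int n)
{
    for (int i = 0; i < n; i++)
        if (buf[i].opcode < OP_DRAW || buf[i].opcode > OP_MAX)
            return false;   /* reject the whole submission */
    return true;            /* safe to hand to the DMA engine */
}
```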
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)