Re: Clip Lists
On Wed, 2007-11-28 at 06:19 +0100, Soeren Sandmann wrote:

> - When a client wishes to copy something to the frontbuffer (for
>   whatever reason - glXSwapBuffers(), glCopyPixels(), etc), it uses
>   plain old XCopyArea() with the generated pixmap. The X server is
>   then responsible for any clipping necessary.

Using a plain old XCopyArea will make scheduling this in the kernel quite a bit harder; if the kernel knows it's doing a swap buffer, then it can interrupt ongoing rendering and do the copy at higher priority, precisely when the vblank interrupt lands. Plus, you've just added the latency of a pair of context switches to the frame update interval.

Also, placing any user-mode code in the middle of the interrupt->blt logic will occasionally cause tearing on the screen; having the kernel push the blt just-in-time means we'd have reliable swaps (well, down to the context switch time in the graphics hardware).

I like the simplicity, and we'll certainly be wanting the pixmap-from-object API for lots of other fun stuff, but buffer swaps may still need more magic than we can manage at this point.

I also wonder what the effects of a compositing manager are in this environment -- ideally, your 'buffer swap' would be just a renaming of two buffers, and not involve any data copying at all. Keeping all of that hidden behind an abstract API will let us move from copying the data to swizzling pointers without breaking existing apps.

--
[EMAIL PROTECTED]

___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
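To picture the last point: if clients only ever call an abstract swap operation, the implementation can start out copying and later switch to exchanging buffers without the client noticing. The following is a minimal Python sketch under that assumption. The `Drawable` class, its `flip` switch and `present_frame` are hypothetical names made up for illustration and are not part of any DRI or X interface.

```python
class Drawable:
    """Hypothetical drawable with a front and a back buffer."""

    def __init__(self, size, flip=False):
        self.front = bytearray(size)
        self.back = bytearray(size)
        # Assumption: which strategy is used is an implementation detail
        # that clients never see.
        self.flip = flip

    def swap_buffers(self):
        if self.flip:
            # "Swizzling pointers": the two buffers simply trade names.
            self.front, self.back = self.back, self.front
        else:
            # Copying implementation: blit the back buffer to the front.
            self.front[:] = self.back


def present_frame(drawable, pixels):
    """Client code: identical for both implementations."""
    drawable.back[:] = pixels
    drawable.swap_buffers()
    return bytes(drawable.front)
```

One visible difference remains: after a flip the back buffer holds the previous front contents rather than the frame just drawn, so a client must not rely on what the back buffer contains after a swap.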
Clip Lists
"Stephane Marchesin" <[EMAIL PROTECTED]> writes:

> I fail to see how this works with a lockless design. How do you ensure the X
> server doesn't change cliprects between the time it has written those in the
> shared ring buffer and the time the DRI client picks them up and has the
> command fired and actually executed ? Do you lock out the server during that
> time ?

The scheme I have been advocating is this:

- A new extension is added to the X server, with a PixmapFromBufferObject request.

- Clients render into a private back buffer object, for which they used the new extension to generate a pixmap.

- When a client wishes to copy something to the frontbuffer (for whatever reason - glXSwapBuffers(), glCopyPixels(), etc), it uses plain old XCopyArea() with the generated pixmap. The X server is then responsible for any clipping necessary.

This scheme puts all clip list management in the X server. No cliprects in shared memory or in the kernel would be required. And no locking is required since the X server is already processing requests in sequence.

To synchronize with vblank, a new SYNC counter is introduced that records the number of vblanks since some time in the past. The clients can then issue SyncAwait requests before any copy they want synchronized with vblank. This allows the client to do useful processing while it waits, which I don't believe is the case now.

As an additional benefit, the PixmapFromBufferObject request would also be useful as the basis of a better shared-memory feature that allows the memory to be used as target for accelerated rendering, and to be DMA'ed from or mapped into GTT tables.

Soren
Re: DRI2 and lock-less operation
On Wed, Nov 28, 2007 at 12:43:41AM +0100, Stephane Marchesin wrote:
> On 11/27/07, Kristian Høgsberg <[EMAIL PROTECTED]> wrote:
> >
> > On Nov 27, 2007 11:48 AM, Stephane Marchesin
> > <[EMAIL PROTECTED]> wrote:
> > > On 11/22/07, Kristian Høgsberg <[EMAIL PROTECTED]> wrote:
> > ...
> > > > It's all delightfully simple, but I'm starting to reconsider whether
> > > > the "lockless" bullet point is realistic. Note, the drawable lock is
> > > > gone, we always render to private back buffers and do swap buffers in
> > > > the kernel, so I'm "only" concerned with the DRI lock here. The idea
> > > > is that since we have the memory manager and the super-ioctl and the X
> > > > server now can push cliprects into the kernel in one atomic operation,
> > > > we would be able to get rid of the DRI lock. My overall question,
> > > > here is, is that feasible?
> > >
> > > How do you plan to ensure that X didn't change the cliprects after you
> > > emitted them to the DRM ?
> >
> > The idea was that the buffer swap happens in the kernel, triggered by
> > an ioctl. The kernel generates the command stream to execute the swap
> > against the current set of cliprects. The back buffers are always
> > private so the cliprects only come into play when copying from the
> > back buffer to the front buffer. Single buffered visuals are secretly
> > double buffered and implemented the same way.
>
> What if cliprects change between the time you emit them to the fifo and the
> time the blit gets executed by the card ? Do you sync to the card in-drm,
> effectively killing performance ?
>
> > I'm trying to figure now whether it makes more sense to keep cliprects
> > and swapbuffer out of the kernel, which wouldn't change the above
> > much, except the swapbuffer case. I described the idea for swapbuffer
> > in this case in my reply to Thomas: the X server publishes cliprects
> > to the clients through a shared ring buffer, and clients parse the
> > clip rect changes out of this buffer as they need it. When posting a
> > swap buffer request, the buffer head should be included in the
> > super-ioctl so that the kernel can reject stale requests. When that
> > happens, the client must parse the new cliprect info and resubmit an
> > updated swap buffer request.
>
> I fail to see how this works with a lockless design. How do you ensure the X
> server doesn't change cliprects between the time it has written those in the
> shared ring buffer and the time the DRI client picks them up and has the
> command fired and actually executed ? Do you lock out the server during that
> time ?
>
> Stephane

All this is starting to confuse me a bit :) So, if I am right, all clients render to a private buffer and then you want to blit this client buffer into the front buffer. In a composite world it's up to the compositor to do this, so you don't have to care about cliprects. I think we can have some kind of dumb default compositor that would handle this blit directly in the X server. And this compositor might not even use cliprects.

For window|pixmap|... resizing, can the following scheme fit?

Single buffered path (i.e. no back buffer, but the client still renders to a private buffer):
1) X gets the resize event
2) X asks drm to allocate a new drawable with the new size
3) X enqueues a query to drm which copies the current buffer content into the new one and also updates the drawable so further rendering requests will happen in the new buffer
4) X starts using the new buffer when compositing into the scanout buffer

So you might see rendering artifacts, but I guess this has to be expected in single buffered applications where a size change can happen at any time.

Double buffered path:
1) X gets the resize event
2) X allocates a new front buffer and blits the old front buffer into it (I might be wrong and X might not need to preserve content on resize) so X can start blitting using the new buffer size but with old content. X allocates a new back buffer
3) X enqueues a query to drm to change the drawable back buffer

If there is no pending rendering on the current back buffer (old size):
4) drm updates the drawable so the back buffer is the new one (with new size)
Finished with resizing

If there is pending rendering on the current back buffer (old size):
4) On next swapbuffer X blits the back buffer into the front buffer (if there is a need to preserve content), deallocates the back buffer and allocates a new one with the new size (so the front buffer stays the same, just a part of it is updated)
Finished with resizing

So this follows the comment Keith made on the wiki about resizing; I think it's the best strategy, even though if redrawing the window is slow then one might never see proper content while constantly resizing the window (but we can't do much against a broken user ;))

Of course in this scheme you do the blitting from the private client buffer (whether it's the front buffer or the only buffer for a single buffered app) in client space, i.e. in the X server (so no need for cliprects in the kernel).

Note that you b
Re: DRI2 and lock-less operation
On Tue, Nov 27, 2007 at 03:44:48PM -0500, Kristian Høgsberg wrote:
> On Nov 27, 2007 2:30 PM, Keith Packard <[EMAIL PROTECTED]> wrote:
> ...
> > > In both cases, the kernel will need to
> > > know how to activate a given context and the context handle should be
> > > part of the super ioctl arguments.
> >
> > We needn't expose the contexts to user-space, just provide a virtual
> > consistent device and manage contexts in the kernel. We could add the
> > ability to manage contexts from user space for cases where that makes
> > sense (like, perhaps, in the X server where a context per client may be
> > useful).
>
> Oh, right we don't need one per GLContext, just per DRI client, mesa
> handles switching between GL contexts. What about multithreaded
> rendering sharing the same drm fd?
>
> > > I imagine one optimization you could do with a fixed number of contexts
> > > is to assume that losing the context will be very rare, and just fail
> > > the super-ioctl when it happens, and then expect user space to
> > > resubmit with state emission preamble. In fact it may work well for
> > > single context hardware...
> >
> > I recall having the same discussion in the past; have the superioctl
> > fail so that the client needn't constantly compute the full state
> > restore on every submission may be a performance win for some hardware.
> > All that this requires is a flag to the kernel that says 'this
> > submission reinitializes the hardware', and an error return that says
> > 'lost context'.
>
> Exactly.
>
> > > But the super-ioctl is chipset specific and we can decide on the
> > > details there on a chipset to chipset basis. If you have input to how
> > > the super-ioctl for intel should look like to support lockless
> > > operation for current and future intel chipset, I'd love to hear it.
> > > And of course we can version our way out of trouble as a last resort.
> >
> > We should do the lockless and context stuff at the same time; doing
> > re-submit would be a bunch of work in the driver that would be wasted.
>
> Is it that bad? We will still need the resubmit code for older
> chipsets that don't have the hardware context support. The drivers
> already have the code to emit state in case of context loss, that code
> just needs to be repurposed to generate a batch buffer to prepend to
> the rendering code. And the rendering code doesn't need to change
> when resubmitting. Will that work?
>
> > Right now, we're just trying to get 965 running with ttm; once that's
> > limping along, figuring out how to make it reasonable will be the next
> > task, and hardware context support is clearly a big part of that.
>
> Yeah - I'm trying to limit the scope of DRI2 so that we can have
> redirected direct rendering and GLX1.4 in the tree sooner rather than
> later (before the end of the year). Maybe the best way to do that is
> to keep the lock around for now and punt on the lock-less super-ioctl
> if that really blocks on hardware context support. How far back is
> hardware contexts supported?
>
> Kristian

Maybe just accept that only drivers converted to DRI2 will be lockless, i.e. if you are DRI2 you have the super-ioctl and other things like that; anyway, I believe what GLX1.4 gives is a bit useless without a proper driver (TTM at least).

Cheers,
Jerome Glisse
Re: DRI2 and lock-less operation
On Wed, 2007-11-28 at 00:52 +0100, Stephane Marchesin wrote:

> The third case obviously will never require any kind of state
> re-emitting.

Unless you run out of hardware contexts.

--
[EMAIL PROTECTED]
Re: DRI2 and lock-less operation
On Tue, 2007-11-27 at 15:44 -0500, Kristian Høgsberg wrote:

> Oh, right we don't need one per GLContext, just per DRI client, mesa
> handles switching between GL contexts. What about multithreaded
> rendering sharing the same drm fd?

For that, you'd either want 'switch context' ioctls, or context arguments with every request. The former is easy to retrofit, the latter would have to be done right now.

> > All that this requires is a flag to the kernel that says 'this
> > submission reinitializes the hardware', and an error return that says
> > 'lost context'.
>
> Exactly.

I think Keith's comment that knowing how to reinit is effectively the same as computing the reinitialization buffer may be relevant here. I'm not entirely sure this is true; it might be possible to save higher level information about the state than the ring contents. For Intel, I'm not planning on using this anyway, so I suspect Radeon will be the test case here.

> > We should do the lockless and context stuff at the same time; doing
> > re-submit would be a bunch of work in the driver that would be wasted.
>
> Is it that bad? We will still need the resubmit code for older
> chipsets that don't have the hardware context support.

From what I can tell, all Intel chips support MI_SET_CONTEXT, with the possible exception of i830. I'm getting some additional docs for that chip to see what it does, but the i845 docs mention the 'new context support'; dunno if that's new as of i830 or i845...

> The drivers
> already have the code to emit state in case of context loss, that code
> just needs to be repurposed to generate a batch buffer to prepend to
> the rendering code. And the rendering code doesn't need to change
> when resubmitting. Will that work?

With MI_SET_CONTEXT, you should never lose context, so we'd never need to be able to do this.

> Yeah - I'm trying to limit the scope of DRI2 so that we can have
> redirected direct rendering and GLX1.4 in the tree sooner rather than
> later (before the end of the year). Maybe the best way to do that is
> to keep the lock around for now and punt on the lock-less super-ioctl
> if that really blocks on hardware context support. How far back is
> hardware contexts supported?

Hardware context support actually looks really easy to add; it's just a page in GTT per drm client, then an extra instruction in the ring when context switching is required. I'll have to see if the i830 supports it though. The current mechanism actually looks a lot harder to handle; I don't know why the driver didn't use MI_SET_CONTEXT from the very start.

--
[EMAIL PROTECTED]
[Bug 13091] glean abort with "Segmentation fault" in GarbageCollectDRIDrawables
http://bugs.freedesktop.org/show_bug.cgi?id=13091

--- Comment #5 from [EMAIL PROTECTED] 2007-11-27 17:25 PST ---
I've checked in the patch from comment #4. It appears it's not applicable to mesa_7_0_branch.
Re: DRI2 and lock-less operation
On 11/27/07, Kristian Høgsberg <[EMAIL PROTECTED]> wrote:
>
> Yeah - I'm trying to limit the scope of DRI2 so that we can have
> redirected direct rendering and GLX1.4 in the tree sooner rather than
> later (before the end of the year). Maybe the best way to do that is
> to keep the lock around for now and punt on the lock-less super-ioctl
> if that really blocks on hardware context support. How far back is
> hardware contexts supported?

Hmm you can't be binary like that. I think there are 3 classes of devices around:
- no context support at all: old radeon, old intel
- multiple fifos, no hw context switching: newer radeon, newer intel
- multiple fifos, hw context switching: all nvidia

The third case obviously will never require any kind of state re-emitting.

Stephane
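The three classes above, together with Keith Packard's caveat elsewhere in this thread about running out of hardware contexts, can be written down as a small decision function. This is only a sketch: the enum, the function name and the `clients` / `hw_contexts` parameters are assumptions made for illustration, not part of any driver.

```python
from enum import Enum
from typing import Optional


class ContextSupport(Enum):
    NONE = "no context support at all"                 # old radeon, old intel
    MULTI_FIFO = "multiple fifos, no hw switching"     # newer radeon, newer intel
    HW_SWITCH = "multiple fifos, hw context switching" # all nvidia


def may_need_state_reemit(support: ContextSupport,
                          clients: int,
                          hw_contexts: Optional[int] = None) -> bool:
    """Return True if a client may have to re-emit its state.

    hw_contexts is the number of hardware contexts available, or None
    if it is treated as unbounded. Both parameters are assumptions of
    this sketch.
    """
    if support is ContextSupport.HW_SWITCH:
        # Never needed, unless there are more clients than contexts.
        return hw_contexts is not None and clients > hw_contexts
    # Without hardware context switching, state can be lost whenever
    # someone else has used the hardware in between.
    return True
```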
Re: DRI2 and lock-less operation
On 11/27/07, Kristian Høgsberg <[EMAIL PROTECTED]> wrote:
>
> On Nov 27, 2007 11:48 AM, Stephane Marchesin
> <[EMAIL PROTECTED]> wrote:
> > On 11/22/07, Kristian Høgsberg <[EMAIL PROTECTED]> wrote:
> ...
> > > It's all delightfully simple, but I'm starting to reconsider whether
> > > the "lockless" bullet point is realistic. Note, the drawable lock is
> > > gone, we always render to private back buffers and do swap buffers in
> > > the kernel, so I'm "only" concerned with the DRI lock here. The idea
> > > is that since we have the memory manager and the super-ioctl and the X
> > > server now can push cliprects into the kernel in one atomic operation,
> > > we would be able to get rid of the DRI lock. My overall question,
> > > here is, is that feasible?
> >
> > How do you plan to ensure that X didn't change the cliprects after you
> > emitted them to the DRM ?
>
> The idea was that the buffer swap happens in the kernel, triggered by
> an ioctl. The kernel generates the command stream to execute the swap
> against the current set of cliprects. The back buffers are always
> private so the cliprects only come into play when copying from the
> back buffer to the front buffer. Single buffered visuals are secretly
> double buffered and implemented the same way.

What if cliprects change between the time you emit them to the fifo and the time the blit gets executed by the card ? Do you sync to the card in-drm, effectively killing performance ?

> I'm trying to figure now whether it makes more sense to keep cliprects
> and swapbuffer out of the kernel, which wouldn't change the above
> much, except the swapbuffer case. I described the idea for swapbuffer
> in this case in my reply to Thomas: the X server publishes cliprects
> to the clients through a shared ring buffer, and clients parse the
> clip rect changes out of this buffer as they need it. When posting a
> swap buffer request, the buffer head should be included in the
> super-ioctl so that the kernel can reject stale requests. When that
> happens, the client must parse the new cliprect info and resubmit an
> updated swap buffer request.

I fail to see how this works with a lockless design. How do you ensure the X server doesn't change cliprects between the time it has written those in the shared ring buffer and the time the DRI client picks them up and has the command fired and actually executed ? Do you lock out the server during that time ?

Stephane
Re: DRI2 and lock-less operation
In general the problem with the superioctl returning 'fail' is that the client has to then go back in time and figure out what the state preamble would have been at the start of the batchbuffer. Of course the easiest way to do this is to actually precompute the preamble at batchbuffer start time and store it in case the superioctl fails -- in which case, why not pass it to the kernel along with the rest of the batchbuffer and have the kernel decide whether or not to play it?

Keith

----- Original Message -----
From: Kristian Høgsberg <[EMAIL PROTECTED]>
To: Keith Packard <[EMAIL PROTECTED]>
Cc: Jerome Glisse <[EMAIL PROTECTED]>; Dave Airlie <[EMAIL PROTECTED]>; dri-devel@lists.sourceforge.net; Keith Whitwell <[EMAIL PROTECTED]>
Sent: Tuesday, November 27, 2007 8:44:48 PM
Subject: Re: DRI2 and lock-less operation

On Nov 27, 2007 2:30 PM, Keith Packard <[EMAIL PROTECTED]> wrote:
...
> > In both cases, the kernel will need to
> > know how to activate a given context and the context handle should be
> > part of the super ioctl arguments.
>
> We needn't expose the contexts to user-space, just provide a virtual
> consistent device and manage contexts in the kernel. We could add the
> ability to manage contexts from user space for cases where that makes
> sense (like, perhaps, in the X server where a context per client may be
> useful).

Oh, right we don't need one per GLContext, just per DRI client, mesa handles switching between GL contexts. What about multithreaded rendering sharing the same drm fd?

> > I imagine one optimization you could do with a fixed number of contexts
> > is to assume that losing the context will be very rare, and just fail
> > the super-ioctl when it happens, and then expect user space to
> > resubmit with state emission preamble. In fact it may work well for
> > single context hardware...
>
> I recall having the same discussion in the past; have the superioctl
> fail so that the client needn't constantly compute the full state
> restore on every submission may be a performance win for some hardware.
> All that this requires is a flag to the kernel that says 'this
> submission reinitializes the hardware', and an error return that says
> 'lost context'.

Exactly.

> > But the super-ioctl is chipset specific and we can decide on the
> > details there on a chipset to chipset basis. If you have input to how
> > the super-ioctl for intel should look like to support lockless
> > operation for current and future intel chipset, I'd love to hear it.
> > And of course we can version our way out of trouble as a last resort.
>
> We should do the lockless and context stuff at the same time; doing
> re-submit would be a bunch of work in the driver that would be wasted.

Is it that bad? We will still need the resubmit code for older chipsets that don't have the hardware context support. The drivers already have the code to emit state in case of context loss, that code just needs to be repurposed to generate a batch buffer to prepend to the rendering code. And the rendering code doesn't need to change when resubmitting. Will that work?

> Right now, we're just trying to get 965 running with ttm; once that's
> limping along, figuring out how to make it reasonable will be the next
> task, and hardware context support is clearly a big part of that.

Yeah - I'm trying to limit the scope of DRI2 so that we can have redirected direct rendering and GLX1.4 in the tree sooner rather than later (before the end of the year). Maybe the best way to do that is to keep the lock around for now and punt on the lock-less super-ioctl if that really blocks on hardware context support. How far back is hardware contexts supported?

Kristian
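Keith's suggestion at the top of this message, passing the precomputed preamble along with every submission and letting the kernel decide whether to play it, might look roughly like the toy model below. It is a sketch under stated assumptions: `PreambleKernel`, its `submit` signature and the "last client" test are invented for illustration and do not describe a real superioctl.

```python
class PreambleKernel:
    """Toy stand-in for the kernel side of the superioctl (hypothetical)."""

    def __init__(self):
        self.last_client = None
        self.ring = []  # commands that actually reach the hardware

    def submit(self, client, preamble, batch):
        # Play the state preamble only if somebody else used the
        # hardware since this client's previous submission.
        if self.last_client != client:
            self.ring.extend(preamble)
        self.ring.extend(batch)
        self.last_client = client
```

The client never has to reconstruct state after the fact, at the cost of computing and carrying the preamble on every submission, which is the trade-off the fail-and-resubmit variant tries to avoid.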
Re: DRI2 and lock-less operation
On Nov 27, 2007 11:48 AM, Stephane Marchesin <[EMAIL PROTECTED]> wrote:
> On 11/22/07, Kristian Høgsberg <[EMAIL PROTECTED]> wrote:
...
> > It's all delightfully simple, but I'm starting to reconsider whether
> > the "lockless" bullet point is realistic. Note, the drawable lock is
> > gone, we always render to private back buffers and do swap buffers in
> > the kernel, so I'm "only" concerned with the DRI lock here. The idea
> > is that since we have the memory manager and the super-ioctl and the X
> > server now can push cliprects into the kernel in one atomic operation,
> > we would be able to get rid of the DRI lock. My overall question,
> > here is, is that feasible?
>
> How do you plan to ensure that X didn't change the cliprects after you
> emitted them to the DRM ?

The idea was that the buffer swap happens in the kernel, triggered by an ioctl. The kernel generates the command stream to execute the swap against the current set of cliprects. The back buffers are always private so the cliprects only come into play when copying from the back buffer to the front buffer. Single buffered visuals are secretly double buffered and implemented the same way.

I'm trying to figure now whether it makes more sense to keep cliprects and swapbuffer out of the kernel, which wouldn't change the above much, except the swapbuffer case. I described the idea for swapbuffer in this case in my reply to Thomas: the X server publishes cliprects to the clients through a shared ring buffer, and clients parse the clip rect changes out of this buffer as they need it. When posting a swap buffer request, the buffer head should be included in the super-ioctl so that the kernel can reject stale requests. When that happens, the client must parse the new cliprect info and resubmit an updated swap buffer request.

Kristian
Re: DRI2 and lock-less operation
On Nov 27, 2007 10:41 AM, Thomas Hellström <[EMAIL PROTECTED]> wrote:
...
> >>However, there are cases where it is very difficult to use cliprects
> >>from the drm, though I wouldn't say impossible.
> >
> >The idea is to keep the cliprects in the kernel and only render double
> >buffered. The only code that needs to worry about cliprects is swap
> >buffer, either immediate or synced to vblank. What are the cliprect
> >problems you mention?
> >
> >Kristian
> >
> >
> Hi, Kristian.
> Sorry for the late response.
>
> What I'm thinking about is the case where the swapbuffers needs to be
> done by the 3D engine, and with increasingly complex hardware this will
> at the very least mean some sort of pixel-shader code in the kernel, or
> perhaps loaded by the kernel firmware interface as a firmware snippet.

And the kernel will have to somehow communicate the list of clip rects to this firmware/pixel-shader in one way or another, maybe fixing up the code or providing a tri-list buffer or something. I don't really know, but it already sounds too complicated for the kernel, in my opinion.

Another problem is that it's not just swapbuffer - anything that touches the front buffer has to respect the cliprects - glCopyPixels and glXCopySubBufferMESA - and thus has to happen in the kernel.

> If we take Poulsbo as an example, the 2D engine is present and open, and
> swapbuffers can be done using it, but since the 2D- and 3D engines
> operate separately they are synced in software by the TTM memory manager
> fence class arbitration code, and we lose performance since we cannot
> pipeline 3D tasks as we'd want to. If the 3D engine were open, we'd
> still need a vast amount of code in the kernel to be able to just do a
> 3D blit.
>
> This is even more complicated by the fact that we may not be able to
> implement 3D blits in the kernel for IP protection reasons. Note that
> I'm just stating the problem here. I'm not arguing that this should
> influence the DRI2 design.

I understand, but I'm starting to doubt the "cliprects and swapbuffer in the kernel" design anyhow. I wasn't going this route originally, but since we already had that in place for i915 vblank buffer swaps, I figured we might as well go that route. I'm not sure the buffer swap from the vblank tasklet is really necessary to begin with... are there benchmarks showing that the irq->userspace latency was too big for this to work in userspace?

My proposal back at XDS was to have a shared host memory ring buffer where the X server pushes cliprect changes and clients read it out from there, and I still think that's a nice design. In a lockless world, the superioctl arguments optionally include the buffer head (as a timestamp) so that the kernel can detect whether a swap buffer batchbuffer is stale. If it is, meaning the X server published cliprect changes, the submit fails and the client must recompute the batchbuffer for the swap command and resubmit after parsing the new cliprects. When rendering to a private back buffer, clip rects aren't relevant and so the superioctl won't have the buffer head field set and the kernel will never discard it.

I dunno, maybe I'm just rambling...

Kristian
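The stale-request check described above can be sketched as a small simulation. Everything here is an assumption made for illustration: `ClipRing`, `kernel_submit`, `client_swap` and the idea of using the number of published updates as the "head" are invented names, and the model ignores the time between the check and the hardware actually executing the blit.

```python
class StaleCliprects(Exception):
    """Raised by the toy kernel when a swap was built from old cliprects."""


class ClipRing:
    """Toy model of the shared ring: the server appends, clients read."""

    def __init__(self):
        self.updates = []

    @property
    def head(self):
        # The head doubles as a timestamp: it grows with every update.
        return len(self.updates)

    def publish(self, cliprects):
        self.updates.append(list(cliprects))

    def latest(self):
        return self.head, (self.updates[-1] if self.updates else [])


def kernel_submit(ring, batch, head=None):
    """Reject the batch if it carries a head that is no longer current."""
    if head is not None and head != ring.head:
        raise StaleCliprects()
    return batch  # stands in for "queued for execution"


def client_swap(ring, build_swap_batch):
    """Build a swap from the current cliprects; rebuild if rejected."""
    while True:
        head, cliprects = ring.latest()
        batch = build_swap_batch(cliprects)
        try:
            return kernel_submit(ring, batch, head)
        except StaleCliprects:
            continue  # parse the new cliprects and resubmit
```

Rendering to a private back buffer corresponds to calling `kernel_submit` without a head, which the toy kernel never rejects.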
Re: DRI2 and lock-less operation
On Nov 27, 2007 2:30 PM, Keith Packard <[EMAIL PROTECTED]> wrote:
...
> > In both cases, the kernel will need to know how to activate a given context and the context handle should be part of the super ioctl arguments.
>
> We needn't expose the contexts to user-space, just provide a virtual consistent device and manage contexts in the kernel. We could add the ability to manage contexts from user space for cases where that makes sense (like, perhaps, in the X server where a context per client may be useful).

Oh, right, we don't need one per GLContext, just one per DRI client; Mesa handles switching between GL contexts. What about multithreaded rendering sharing the same drm fd?

> > I imagine one optimization you could do with a fixed number of contexts is to assume that losing the context will be very rare, and just fail the super-ioctl when it happens, and then expect user space to resubmit with state emission preamble. In fact it may work well for single context hardware...
>
> I recall having the same discussion in the past; having the superioctl fail so that the client needn't constantly compute the full state restore on every submission may be a performance win for some hardware. All that this requires is a flag to the kernel that says 'this submission reinitializes the hardware', and an error return that says 'lost context'.

Exactly.

> > But the super-ioctl is chipset specific and we can decide on the details there on a chipset to chipset basis. If you have input on what the super-ioctl for intel should look like to support lockless operation for current and future intel chipsets, I'd love to hear it. And of course we can version our way out of trouble as a last resort.
>
> We should do the lockless and context stuff at the same time; doing re-submit would be a bunch of work in the driver that would be wasted.

Is it that bad? We will still need the resubmit code for older chipsets that don't have the hardware context support. The drivers already have the code to emit state in case of context loss; that code just needs to be repurposed to generate a batch buffer to prepend to the rendering code. And the rendering code doesn't need to change when resubmitting. Will that work?

> Right now, we're just trying to get 965 running with ttm; once that's limping along, figuring out how to make it reasonable will be the next task, and hardware context support is clearly a big part of that.

Yeah - I'm trying to limit the scope of DRI2 so that we can have redirected direct rendering and GLX 1.4 in the tree sooner rather than later (before the end of the year). Maybe the best way to do that is to keep the lock around for now and punt on the lock-less super-ioctl if that really blocks on hardware context support. How far back are hardware contexts supported?

Kristian
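The fail-and-resubmit scheme discussed in this message can be sketched as a small Python model. Everything here is an assumption made for illustration: `Kernel`, `LostContext`, the `reinitializes` flag and the `submit` helper are made-up names, batches are plain lists of strings, and the hardware is modelled as holding exactly one client's state at a time.

```python
class LostContext(Exception):
    """Hypothetical 'lost context' error return from the super-ioctl."""


class Kernel:
    """Toy single-context hardware: remembers whose state is loaded."""

    def __init__(self):
        self.owner = None   # client whose state is currently in the hardware
        self.log = []       # (client, batch) pairs that were executed

    def superioctl(self, client, batch, reinitializes=False):
        # Fail unless the client's state is still loaded or the batch
        # carries its own full state emission.
        if self.owner != client and not reinitializes:
            raise LostContext(client)
        self.owner = client
        self.log.append((client, batch))


def submit(kernel, client, rendering, state_preamble):
    """Submit optimistically; on 'lost context' prepend state and resubmit.

    Returns the number of super-ioctl calls it took (1 or 2).
    """
    try:
        kernel.superioctl(client, rendering)
        return 1
    except LostContext:
        # The rendering commands are reused unchanged; only the preamble
        # is added, and the flag tells the kernel the batch is self-contained.
        kernel.superioctl(client, state_preamble + rendering, reinitializes=True)
        return 2
```

With two clients alternating, every submission takes the slow path; with one client submitting repeatedly, only the first one does, which is the case the optimization bets on.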
Re: DRI2 and lock-less operation
On Tue, 2007-11-27 at 14:03 -0500, Kristian Høgsberg wrote:
> As for hardware contexts, I guess there are two cases; hardware that has a fixed set of contexts built in and hardware where a context is just a piece of video memory that you can point to (effectively an unlimited number of contexts).

Yeah, intel hardware uses memory and has no intrinsic limit to the number of contexts; they just need to be pinned in GTT space and given a unique address.

> In both cases, the kernel will need to know how to activate a given context and the context handle should be part of the super ioctl arguments.

We needn't expose the contexts to user-space, just provide a virtual consistent device and manage contexts in the kernel. We could add the ability to manage contexts from user space for cases where that makes sense (like, perhaps, in the X server where a context per client may be useful).

> I imagine one optimization you could do with a fixed number of contexts is to assume that losing the context will be very rare, and just fail the super-ioctl when it happens, and then expect user space to resubmit with state emission preamble. In fact it may work well for single context hardware...

I recall having the same discussion in the past; having the superioctl fail so that the client needn't constantly compute the full state restore on every submission may be a performance win for some hardware. All that this requires is a flag to the kernel that says 'this submission reinitializes the hardware', and an error return that says 'lost context'.

> But the super-ioctl is chipset specific and we can decide on the details there on a chipset to chipset basis. If you have input on what the super-ioctl for intel should look like to support lockless operation for current and future intel chipsets, I'd love to hear it. And of course we can version our way out of trouble as a last resort.

We should do the lockless and context stuff at the same time; doing re-submit would be a bunch of work in the driver that would be wasted. Right now, we're just trying to get 965 running with ttm; once that's limping along, figuring out how to make it reasonable will be the next task, and hardware context support is clearly a big part of that.

-- 
[EMAIL PROTECTED]
[Bug 13409] Killing compiz crashes Xorg
http://bugs.freedesktop.org/show_bug.cgi?id=13409

--- Comment #2 from [EMAIL PROTECTED] 2007-11-27 11:06 PST ---
Once again, does anyone have a clue why there are no line numbers in my stacktraces? I've compiled with the seemingly redundant CFLAGS -g3 -ggdb -O1 so there should be no lack of debugging information.
[Bug 13391] Intermittent Xorg crashes with compiz
http://bugs.freedesktop.org/show_bug.cgi?id=13391

--- Comment #5 from [EMAIL PROTECTED] 2007-11-27 11:04 PST ---
It turns out that attachment 12728 was actually from another bug pertaining to compiz. In this case, compiz crashed, which then triggered another bug in the intel driver, causing Xorg to crash. This is a very reproducible behavior (for me) so I created bug 13409 to track it. Otherwise, I do not believe I've seen a crash like the one in attachment 12719 since the initial incident.
Re: DRI2 and lock-less operation
On Nov 26, 2007 11:15 PM, Keith Packard <[EMAIL PROTECTED]> wrote:
> On Mon, 2007-11-26 at 17:15 -0500, Kristian Høgsberg wrote:
> > -full state
>
> I assume you'll deal with hardware which supports multiple contexts and avoid the need to include full state with each buffer?

As for hardware contexts, I guess there are two cases; hardware that has a fixed set of contexts built in and hardware where a context is just a piece of video memory that you can point to (effectively an unlimited number of contexts). In both cases, the kernel will need to know how to activate a given context and the context handle should be part of the super ioctl arguments. In the case of unlimited contexts, that may be all that's needed. In the case of a fixed number of contexts, you will need to be able to restore state when you have more software contexts in use than you have hardware contexts.

I imagine one optimization you could do with a fixed number of contexts is to assume that losing the context will be very rare, and just fail the super-ioctl when it happens, and then expect user space to resubmit with state emission preamble. In fact it may work well for single context hardware...

But the super-ioctl is chipset specific and we can decide on the details there on a chipset to chipset basis. If you have input on what the super-ioctl for intel should look like to support lockless operation for current and future intel chipsets, I'd love to hear it. And of course we can version our way out of trouble as a last resort.

cheers,
Kristian
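The difference between the two hardware cases can be illustrated with a small Python model of how software contexts might map onto hardware contexts. This is a sketch under stated assumptions: `ContextSlots` is a hypothetical name, and the least-recently-used eviction policy is an assumption chosen for the example; the message above does not say how a driver would pick a victim.

```python
from collections import OrderedDict


class ContextSlots:
    """Toy model: map software contexts onto hardware contexts.

    hw_slots=None models hardware where a context is just a piece of
    memory (effectively unlimited); an integer models a fixed set of
    built-in hardware contexts.
    """

    def __init__(self, hw_slots=None):
        self.hw_slots = hw_slots
        self.resident = OrderedDict()   # context handle -> True, oldest first

    def activate(self, ctx):
        """Return True if ctx's state is still resident in hardware.

        False means the state was never loaded or has been evicted, so
        it must be restored (or the submission failed and resubmitted
        with a state preamble).
        """
        if ctx in self.resident:
            self.resident.move_to_end(ctx)      # mark as most recently used
            return True
        if self.hw_slots is not None and len(self.resident) >= self.hw_slots:
            self.resident.popitem(last=False)   # evict the least recently used
        self.resident[ctx] = True
        return False
```

With `hw_slots=None` a context is only ever "lost" on first use, which matches the remark that passing the context handle may be all that is needed for such hardware.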
[Bug 13391] Intermittent Xorg crashes with compiz
http://bugs.freedesktop.org/show_bug.cgi?id=13391

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #12728|0                           |1
        is obsolete|                            |
[Bug 13409] Killing compiz crashes Xorg
http://bugs.freedesktop.org/show_bug.cgi?id=13409

--- Comment #1 from [EMAIL PROTECTED] 2007-11-27 11:00 PST ---
Created an attachment (id=12747) --> (http://bugs.freedesktop.org/attachment.cgi?id=12747&action=view)
Xorg.log from a server crashed by killing compiz
[Bug 13409] New: Killing compiz crashes Xorg
http://bugs.freedesktop.org/show_bug.cgi?id=13409

Summary: Killing compiz crashes Xorg
Product: Mesa
Version: unspecified
Platform: Other
OS/Version: All
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/DRI/i830
AssignedTo: dri-devel@lists.sourceforge.net
ReportedBy: [EMAIL PROTECTED]

When compiz is killed it takes down Xorg with it while destroying its OpenGL context.
Re: DRI2 and lock-less operation
On 11/22/07, Kristian Høgsberg <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I've been working on the DRI2 implementation recently and I'm starting to get a little confused, so I figured I'd throw a couple of questions out to the list. First off, I wrote up this summary shortly after XDS
>
> http://wiki.x.org/wiki/DRI2
>
> which upon re-reading is still pretty much up to date with what I'm trying to do. The buzzword summary from the page has
>
> * Lockless
> * Always private back buffers
> * No clip rects in DRI driver
> * No DDX driver part
> * Minimal X server part
> * Swap buffer and clip rects in kernel
> * No SAREA
>
> I've implemented the DRI2 xserver module (http://cgit.freedesktop.org/~krh/xserver/log/?h=dri2) and the new drm ioctls that it uses (http://cgit.freedesktop.org/~krh/drm/log/?h=dri2). I did the DDX part for the intel driver and DRI2 initialization consists of doing drmOpen (this is now up to the DDX driver), initializing the memory manager and using it to allocate stuff, and then calling DRI2ScreenInit(), passing in pScreen and the file descriptor. Basically, all of i830_dri.c isn't used in this mode.
>
> It's all delightfully simple, but I'm starting to reconsider whether the "lockless" bullet point is realistic. Note, the drawable lock is gone, we always render to private back buffers and do swap buffers in the kernel, so I'm "only" concerned with the DRI lock here. The idea is that since we have the memory manager and the super-ioctl and the X server now can push cliprects into the kernel in one atomic operation, we would be able to get rid of the DRI lock. My overall question here is: is that feasible?

How do you plan to ensure that X didn't change the cliprects after you emitted them to the DRM?

> I'm trying to figure out how context switches actually work... the DRI lock is overloaded as context switcher, and there is code in the kernel to call out to a chipset specific context switch routine when the DRI lock is taken... but only ffb uses it... So I'm guessing the way context switches work today is that the DRI driver grabs the lock and after potentially updating the cliprects through X protocol, it emits all the state it depends on to the cards. Is the state emission done by just writing out a bunch of registers? Is this how the X server works too, except XAA/EXA acceleration doesn't depend on a lot of state and thus the DDX driver can emit everything for each operation?
>
> How would this work if we didn't have a lock? You can't emit the state and then start rendering without a lock to keep the state in place... If the kernel doesn't restore any state, what's the point of the drm_context_t we pass to the kernel in drmLock? Should the kernel know how to restore state (this ties in to the email from jglisse on state tracking in drm and all the gallium jazz, I guess)? How do we identify state to the kernel, and how do we pass it in in the super-ioctl? Can we add a list of registers to be written and the values? I talked to Dave about it and we agreed that adding a drm_context_t to the super-ioctl would work, but now I'm thinking if the kernel doesn't track any state it won't really work. Maybe

I seem to recall Eric Anholt optimized the emission of radeon state atoms a long time ago, and he got some speed improvements. You'd have to ask him how much though. This could give us a rough idea of the performance impact of emitting full state vs needed state, although it doesn't measure the gain of removing lock contention. I might be totally wrong on this; it dates back a couple of years.

> cross-client state sharing isn't useful for performance as Keith and Roland argue, but if the kernel doesn't restore state when it sees a super-ioctl coming from a different context, who does?

Yes, it probably has to go to the kernel anyway.

Stephane
Re: DRI2 and lock-less operation
On 11/27/07, Keith Packard <[EMAIL PROTECTED]> wrote:
> On Mon, 2007-11-26 at 17:15 -0500, Kristian Høgsberg wrote:
> > -full state
>
> I assume you'll deal with hardware which supports multiple contexts and avoid the need to include full state with each buffer?

That's what we do already for nouveau, and there are no issues with implementing it. Really that's driver-dependent, like the radeon atom emission mechanism is.

Stephane
Re: DRI2 and lock-less operation
On 11/27/07, Thomas Hellström <[EMAIL PROTECTED]> wrote:
> Kristian Høgsberg wrote:
> >On Nov 22, 2007 4:03 AM, Thomas Hellström <[EMAIL PROTECTED]> wrote:
> >...
> >>There are probably various ways to do this, which is another argument for keeping super-ioctls device-specific.
> >>For i915-type hardware, Dave's approach above is probably the most attractive one.
> >>For Poulsbo, all state is always implicitly included, usually as a reference to a buffer object, so we don't really bother about contexts here.
> >>For hardware like the Unichrome, the state is stored in a limited set of registers.
> >>Here the drm can keep a copy of those registers for each context and do a smart update on a context switch.
> >>
> >>However, there are cases where it is very difficult to use cliprects from the drm, though I wouldn't say impossible.
> >
> >The idea is to keep the cliprects in the kernel and only render double buffered. The only code that needs to worry about cliprects is swap buffer, either immediate or synced to vblank. What are the cliprect problems you mention?
> >
> >Kristian
>
> Hi, Kristian.
> Sorry for the late response.
>
> What I'm thinking about is the case where the swapbuffers needs to be done by the 3D engine, and with increasingly complex hardware this will at the very least mean some sort of pixel-shader code in the kernel, or perhaps loaded by the kernel firmware interface as a firmware snippet.
>
> If we take Poulsbo as an example, the 2D engine is present and open, and swapbuffers can be done using it, but since the 2D and 3D engines operate separately they are synced in software by the TTM memory manager fence class arbitration code, and we lose performance since we cannot pipeline 3D tasks as we'd want to. If the 3D engine were open, we'd still need a vast amount of code in the kernel to be able to just do a 3D blit.

Then why don't you do it in user space? You could very well do swapbuffers in the DDX (and cliprects then become a non-issue).

> This is even more complicated by the fact that we may not be able to implement 3D blits in the kernel for IP protection reasons. Note that I'm just stating the problem here. I'm not arguing that this should influence the DRI2 design.

Yes, I don't think we should base the open source DRI design upon specs that have to be kept hidden. Especially if that hardware is not relevant in any way technically.

Stephane
Re: DRI2 and lock-less operation
Kristian Høgsberg wrote:
>On Nov 22, 2007 4:03 AM, Thomas Hellström <[EMAIL PROTECTED]> wrote:
>...
>>There are probably various ways to do this, which is another argument for keeping super-ioctls device-specific.
>>For i915-type hardware, Dave's approach above is probably the most attractive one.
>>For Poulsbo, all state is always implicitly included, usually as a reference to a buffer object, so we don't really bother about contexts here.
>>For hardware like the Unichrome, the state is stored in a limited set of registers.
>>Here the drm can keep a copy of those registers for each context and do a smart update on a context switch.
>>
>>However, there are cases where it is very difficult to use cliprects from the drm, though I wouldn't say impossible.
>
>The idea is to keep the cliprects in the kernel and only render double buffered. The only code that needs to worry about cliprects is swap buffer, either immediate or synced to vblank. What are the cliprect problems you mention?
>
>Kristian

Hi, Kristian.
Sorry for the late response.

What I'm thinking about is the case where the swapbuffers needs to be done by the 3D engine, and with increasingly complex hardware this will at the very least mean some sort of pixel-shader code in the kernel, or perhaps loaded by the kernel firmware interface as a firmware snippet.

If we take Poulsbo as an example, the 2D engine is present and open, and swapbuffers can be done using it, but since the 2D and 3D engines operate separately they are synced in software by the TTM memory manager fence class arbitration code, and we lose performance since we cannot pipeline 3D tasks as we'd want to. If the 3D engine were open, we'd still need a vast amount of code in the kernel to be able to just do a 3D blit.

This is even more complicated by the fact that we may not be able to implement 3D blits in the kernel for IP protection reasons. Note that I'm just stating the problem here. I'm not arguing that this should influence the DRI2 design.

/Thomas
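The "keep a copy of those registers for each context and do a smart update on a context switch" idea quoted above for Unichrome-style hardware can be sketched in a few lines of Python. This is only an illustrative model: `RegisterShadow` and its methods are hypothetical names, registers are plain dictionary keys, and "smart update" is interpreted here as "write only the registers whose value differs from what the hardware currently holds", which is an assumption about what was meant.

```python
class RegisterShadow:
    """Toy model of per-context register copies kept by the drm."""

    def __init__(self):
        self.hw = {}      # register -> value the hardware currently holds
        self.saved = {}   # context -> {register: value} wanted by that context

    def write(self, ctx, reg, value):
        """Record a register value as part of a context's state."""
        self.saved.setdefault(ctx, {})[reg] = value

    def switch_to(self, ctx):
        """Return only the register writes needed to activate ctx."""
        wanted = self.saved.get(ctx, {})
        writes = {reg: val for reg, val in wanted.items()
                  if self.hw.get(reg) != val}
        self.hw.update(writes)   # the hardware now matches ctx for these registers
        return writes
```

Two contexts that share most of their state then cost only a handful of register writes per switch, and switching to the context that is already active costs none.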
[Bug 13391] Intermittent Xorg crashes with compiz
http://bugs.freedesktop.org/show_bug.cgi?id=13391

--- Comment #4 from [EMAIL PROTECTED] 2007-11-27 02:38 PST ---
(In reply to comment #3)
> Can anyone see any reason why my stack traces don't have line numbers?

The backtrace in the log file never provides more information. Use something like gdb.