Re: Initial DRI3000 protocol specs available
On 03/21/2013 03:41 PM, Keith Packard wrote:

>> If you associate an X Fence Sync with your swap operation, the driver
>> has the option to trigger it directly from the client command stream
>> and wake up only the applications waiting for that fence. The
>> compositor, if using GL, could have received the swap notification
>> event and already programmed the response compositing based on it
>> before the swap even completes, and just insert a token to make the
>> GPU or kernel wait for the fence to complete before executing the
>> compositing rendering commands.
>
> Sorry for the long lag; I've been thinking about this quite a bit.

Sorry for the lag in my response as well.

> I went and read through your earlier proposal as well as reading
> through the related GL fence extensions, and I think this is what we
> want in general terms. There are two distinct times of interest here:
>
> 1) When the buffer is free and can be used for another frame.
>
> 2) When the buffer contents are visible on the screen. (This needs
>    some weasel wording so that the system can do things like severely
>    limit background applications or invisible applications.)
>
> Providing X Fence Sync objects for each of these times seems like it
> will give applications what they need.
>
> The second issue is that we must relate these X Fence Sync objects to
> direct rendering so that clients using shared objects outside of the X
> protocol can synchronize their operations with the X server. For that,
> I think we can share a page between application and X server that
> contains all of the necessary fence information along with a pthreads
> semaphore object that can lock access to the fence and also provide a
> way to block until the X Fence Sync object is signaled.

There's some work going on to get something very much like fence sync
objects into the Linux kernel as well, I think starting with something
from the Android kernel trees, and with input from some people working
on fence objects usable to synchronize access to dmabuf objects. It
might make sense to use these primitives rather than shared
memory+pthreads to share the syncs between direct rendering clients and
the X server.

At the very least, it would be nice to have an abstraction where the
implementation details of the sync objects weren't directly exposed by
the API.

> This eliminates the explicit Swap events and replaces them with Sync
> objects. I'll write this up in the extension description and try to
> get some code written in the next couple of weeks.

I like the sound of the overall direction. Looking forward to what you
come up with.

Thanks,
-James

___
xorg-devel@lists.x.org: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: http://lists.x.org/mailman/listinfo/xorg-devel
Re: Initial DRI3000 protocol specs available
James Jones jajo...@nvidia.com writes:

> There's some work going on to get something very much like fence sync
> objects into the Linux kernel as well, I think starting with something
> from the Android kernel trees, and with input from some people working
> on fence objects usable to synchronize access to dmabuf objects.

I'll see if I can't find out what they're up to.

> It might make sense to use these primitives rather than shared
> memory+pthreads to share the syncs between direct rendering clients
> and the X server.

I can start prototyping with shared memory and the current DRI kernel
drivers as that will not require any updates to the kernel -- as DRI
provides serialization guarantees across multiple applications, I can
perform all of the fence operations strictly from user-mode and have
things work correctly on DRI-based hardware. As such, that clearly
means that these would be 'DRI' fences, and not some more general
DMA-BUF fences.

> At the very least, it would be nice to have an abstraction where the
> implementation details of the sync objects weren't directly exposed by
> the API.

I think we can use your existing X Sync fence stuff, or something quite
similar, for an X API. The DRM APIs hide underneath the DRM libraries
(vdpau, vaapi, opengl), and so how that extension works isn't directly
visible to applications.

> I like the sound of the overall direction. Looking forward to what you
> come up with.

I'm typing; the current short-term goal is client-side buffer
allocation, shared memory fences and using CopyArea for the
presentation portion of the work. That neatly divides the design into
the DRM half and the presentation/swap-buffers half.

-- 
keith.pack...@intel.com
Re: Initial DRI3000 protocol specs available
James Jones jajo...@nvidia.com writes:

> If you associate an X Fence Sync with your swap operation, the driver
> has the option to trigger it directly from the client command stream
> and wake up only the applications waiting for that fence. The
> compositor, if using GL, could have received the swap notification
> event and already programmed the response compositing based on it
> before the swap even completes, and just insert a token to make the
> GPU or kernel wait for the fence to complete before executing the
> compositing rendering commands.

Sorry for the long lag; I've been thinking about this quite a bit. I
went and read through your earlier proposal as well as reading through
the related GL fence extensions, and I think this is what we want in
general terms. There are two distinct times of interest here:

1) When the buffer is free and can be used for another frame.

2) When the buffer contents are visible on the screen. (This needs some
   weasel wording so that the system can do things like severely limit
   background applications or invisible applications.)

Providing X Fence Sync objects for each of these times seems like it
will give applications what they need.

The second issue is that we must relate these X Fence Sync objects to
direct rendering so that clients using shared objects outside of the X
protocol can synchronize their operations with the X server. For that,
I think we can share a page between application and X server that
contains all of the necessary fence information along with a pthreads
semaphore object that can lock access to the fence and also provide a
way to block until the X Fence Sync object is signaled.

This eliminates the explicit Swap events and replaces them with Sync
objects. I'll write this up in the extension description and try to get
some code written in the next couple of weeks.

-- 
keith.pack...@intel.com
Re: Initial DRI3000 protocol specs available
On 02/19/2013 07:45 PM, Keith Packard wrote:

> 11.1 GLX
>
> GLX has no direct relation with DRI3, but a direct rendering OpenGL
> application will presumably use both, and target

Was there supposed to be more of this sentence?
Re: Initial DRI3000 protocol specs available
On 02/20/2013 10:50 PM, Keith Packard wrote:

> Stéphane Marchesin stephane.marche...@gmail.com writes:
>> On Wed, Feb 20, 2013 at 12:42 PM, Keith Packard kei...@keithp.com wrote:
>>> Stéphane Marchesin stephane.marche...@gmail.com writes:
>>>> I'm interested in two specific use cases:
>>>> - Swap to an overlay and flip a crtc in an atomic fashion,
>>>
>>> As you may remember, I proposed a bunch of RandR changes to support
>>> per-CRTC pixmaps and atomic mode setting operations a while back.
>>> With hardware now commonly supporting multiple overlays, even that
>>> stuff wouldn't suffice anymore. Off the top of my head, we'd need to
>>> construct some Drawable that represented each overlay, and then
>>> perform a PolySwapRegion operation to synchronously update their
>>> contents from appropriate back buffers.
>>
>> Right, that's what I'm after. If you have a bunch of GL surfaces
>> you're rendering to, a main drawable and 2 overlays, I'd like the
>> ability to swap to arbitrary overlays or to my main surface. Of
>> course the GL extension for that is still TBD, but having the
>> ability in DRI3 would be a nice start.
>
> Having an actual API to design to would be a huge help though; I
> suspect anything we do in advance will just get messed up by the GL
> ARB.
>
>> Well, if you have vsync enabled for your CopyRegion implementation,
>> then you'll need to vsync for each region, right? What I'm after is a
>> "swap all these regions together, vsync only once" type of thing.
>
> Oh. I've been focused on the GL swapbuffers APIs. SwapRegion isn't a
> general CopyRegion operation. I've neglected to write down some of
> the more important semantics which underlie the goals of this work.

EGL_NOK_swap_region (supported by Mesa) allows specifying multiple
subrectangles to swap together:

    EGLAPI EGLBoolean EGLAPIENTRY
    eglSwapBuffersRegionNOK(EGLDisplay dpy, EGLSurface surface,
                            EGLint numRects, const EGLint* rects);

> For SwapRegion, I want to be able to require that the X server always
> be free to just swap the entire contents of the source buffer with
> the destination buffer -- the region is just the 'damaged' area
> within the window; areas outside of that don't *need* to be copied
> from the new buffer, but the client guarantees that the entire buffer
> contains the correct contents for the window and that only the area
> within the specified region differs from the current window contents.
>
> For a compositing manager, you really do want to pull data from all
> of the window pixmaps and paint them into the frame buffer in one
> giant operation. The usual way of doing this is to construct the
> whole next screen frame in a new single image and then use SwapRegion
> to get that onto the screen. Then the individual updates could use a
> sequence of SwapRegion operations to construct that intermediate
> buffer; once that was ready, a single SwapRegion would move that to
> the scanout buffer. Presumably that final SwapRegion would be a
> simple page flip operation in the driver, so it would take no time or
> memory bandwidth.
>
> It might be fun to figure out how to bypass the intermediate back
> buffer though, and for that we'd need some complicated PolySwapRegion
> request that queued up all of the changes in one giant request to be
> executed at the right time, but that seems like something that
> wouldn't match how I imagine compositing managers working today.
Re: Initial DRI3000 protocol specs available
On 02/20/2013 02:46 AM, Chris Wilson wrote:

> On Tue, Feb 19, 2013 at 07:46:22PM -0800, Keith Packard wrote:
>> ┌───
>>     SwapRegion
>>         destination: DRAWABLE
>>         region: REGION
>>         src-off-x, src-off-y: INT16
>>         source: PIXMAP
>>         swap-interval: CARD32
>>         target_msc_hi: CARD32
>>         target_msc_lo: CARD32
>>         divisor_hi: CARD32
>>         divisor_lo: CARD32
>>         remainder_hi: CARD32
>>         remainder_lo: CARD32
>>       ▶
>>         swap_hi: CARD32
>>         swap_lo: CARD32
>>         suggested-x-off, suggested-y-off: INT16
>>         suggested-width, suggested-height: CARD16
>>         idle: LISTofSWAPIDLE
>> └───
>
> What I don't see here is how the client instructs the server to handle
> a missed swap. For example, with the typical use of swap-interval 0,
> divisor = 0, target = current, we can choose to either emit this
> SwapRegion synchronously, or asynchronously (to risk tearing but allow
> the client to catch up to its target framerate). Actually, there isn't
> a mention of whether this should be synchronized to the display at all
> (and how to handle synchronisation across multiple scanouts).
> Applications really want this. We've had several major ISVs chastise
> us for not supporting GLX_EXT_swap_control_tear.
>
> http://www.opengl.org/registry/specs/EXT/glx_swap_control_tear.txt
>
> What happens for a delayed error?
>
> In the reply, swap_hi/swap_lo form a 64-bit swap count value when the
> swap will actually occur (e.g. the last queued swap count + (pending
> swap count * swap interval)). I'm not sure exactly what SBC is meant
> to be. Is it a simple seqno of the SwapRegion in this Drawable's swap
> queue (why then does swap_interval matter), or is it meant to
> correlate with the vblank counter (in which case it is merely a
> predicted value)?
>
> -Chris
Re: Initial DRI3000 protocol specs available
Ian Romanick i...@freedesktop.org writes:

> EGL_NOK_swap_region (supported by Mesa) allows specifying multiple
> subrectangles to swap together.

An X region consists of multiple rectangles, but with a common offset
-- I can't see from this specification whether these rectangles have
that property, or if we need slightly different semantics.

-- 
keith.pack...@intel.com
Re: Initial DRI3000 protocol specs available
On 02/26/2013 01:01 PM, Owen Taylor wrote:

> Sorry for joining the discussion late here. I went through the specs
> and discussion and have various comments and questions.
>
> - Owen
>
> * It would be great if we could figure out a plan to get to the point
>   where the exact same application code is going to work for
>   proprietary and open source drivers. When you get down to the
>   details of swap, this isn't close to the case currently.
>
>   - Because swap is handled client-side in some drivers,
>     INTEL_swap_event is seen as awkward to implement.
>
>   - There is divergence on some basic behaviors, e.g., whether
>     glXSwapBuffers() + glFinish() waits for the swap to complete or
>     not.
>
>   - When rendering with a compositor, the X server is innocent of
>     relevant information about timing and when the application should
>     draw additional new frames. I've been working on handling this via
>     client <-> compositor protocols
>     (https://mail.gnome.org/archives/wm-spec-list/2013-January/msg0.html)
>     but this adds a lot of complexity to the minimal client,
>     especially when a client wants to work both redirected and
>     unredirected.
>
>   I think it would be great if we could sit down and figure out what
>   the Linux-ecosystem API is for this in a way we could give to
>   application authors.
>
> * One significant problem that I have currently is that the default
>   mode for the Intel drivers is to use triple buffering and send back
>   swap events *when rendering the next frame would not block* - that
>   is, immediately. This results in a frame of unnecessary latency.
>   (The too-early events are also missing ust/msc/sbc information.) So
>   I'd like to make sure that we know exactly what SwapComplete means
>   and not have creative reinterpretations based on what works well
>   for one client or another.
>
>   The SwapComplete event is specified as:
>
>     "This event is delivered when a SwapRegion operation completes"
>
>   but the specification of SwapRegion itself is fuzzy enough that I'm
>   unclear exactly what that means.
>
>   - The description of SwapRegion needs to define "swap", since the
>     operation has only a vague resemblance to the English-language
>     meaning of "swap". Maybe call it PresentRegion instead? "Swap" is
>     another one of those overloaded terms in graphics. It's not quite
>     as bad as "normal", but it's pretty close.
>
>   - My interpretation of SwapRegion is that the actual propagation of
>     source to destination is *asynchronous* to the X protocol stream.
>     This is implied by "Schedule a swap..." but probably should be
>     explicitly stated, since it is radically different from other
>     rendering in X.
>
>   - Is the serial in the SwapComplete event synchronous to the
>     protocol stream? E.g., can you assume that any CopyArea from the
>     destination drawable before that serial will get the old
>     contents, and a CopyArea from the destination after that serial
>     will get the new contents?
>
>   - What happens when multiple SwapRegion requests are made with a
>     swap-interval of zero? Are previous ones discarded?
>
>   - Is it an error to render to a non-idle pixmap? Is it an error to
>     pass a non-idle pixmap as the source to SwapRegion?
>
>   - What's the interaction between swap-interval and target-msc, etc.?
>
>   - When a window is redirected, what's the interpretation of
>     swap-interval, target-msc, etc.? Is it that the server performs
>     the operation at the selected blanking interval (as if the window
>     wasn't redirected), and then damage/other events are generated
>     and the server picks it up and renders to the real front buffer
>     at the next opportunity - usually a frame later?
>
> * Do I understand correctly that the idle pixmaps returned from a
>   SwapRegion request are the pixmaps that *will* be idle once the
>   corresponding SwapComplete event is received? If this is correct,
>   what happens if things change before the swap actually happens and
>   what was scheduled as a swap ends up being a copy? Is it sufficient
>   to assume that a ConfigureNotify on a destination window means that
>   all pixmaps passed to previous SwapRegion requests are now idle?
>
> * In the definition of SWAPIDLE you say:
>
>     "If valid is TRUE, swap-hi/swap-lo form a 64-bit swap count value
>     from the SwapRegion request which matches the data that the
>     pixmap currently contains"
>
>   If I'm not misunderstanding things, this is a confusing statement
>   because, leaving aside damage to the front buffer, pixmaps always
>   contain the same contents (whatever the client rendered into them).
>   Is the use of swap-hi/swap-lo to identify the SwapRegion
>   problematic in the case where swaps aren't throttled? Would it be
>   better to use the sequence number of the request? Or is the pixmap
>   itself sufficient?
>
> * What control, if any, will applications have over the number of
>   buffers used - what the behavior will be when an application starts
>   rendering another frame in terms of allocating a new buffer versus
Re: Initial DRI3000 protocol specs available
On 03/07/2013 05:17 PM, Keith Packard wrote:

> James Jones jajo...@nvidia.com writes:
>> There didn't seem to be much interest outside of NVIDIA, so besides
>> fence sync, the ideas are tabled internally ATM.
>
> This shouldn't surprise you though -- no-one else needs this kind of
> synchronization, so it's really hard for anyone to evaluate it. And,
> DRI2 offers 'sufficient' support for the various GL sync extensions.

I was referring to the multi-buffer/tear-free presentation part, not
the synchronization parts. I'm still rather surprised everyone thinks
implicit synchronization is a good idea though. I don't think we're
the only ones that have loosely defined command buffer processing in
HW anymore. Meh.

> So, what I'd like to know is if you think nVidia could take advantage
> of the Swap extension so that nVidia 3D applications could do the
> whole Swap redirect plan? If so, then I'm a lot more interested in
> figuring out how we can get apps using the necessary fencing to
> actually make it work right.

Sorry, I've been ignoring this thread because of the DRI3000 title, so
I missed the point where it defined a generic swap mechanism in X
protocol. From my reading, applications do roughly this in the spec:

    Pixmap pix[N] = MakeListOfPixmaps(N);
    Window win = MakeWindow();
    int n = 0;
    while (1) {
        // Stuff here to ensure pix[n] is idle.
        Render(pix[n]);
        SwapRegion(win, pix[n]);
        n = (n + 1) % N;
    }

I think I saw in one branch of the thread that you might allow
redirecting the swap request out to a composite manager rather than
processing in X. Basically that's what I proposed (and Aaron presented
some of at XDC) a few years ago and got no feedback. However, my full
proposal included:

- Setting the list of pixmaps associated with a window up front, so
  that the composite manager or GL applications could query them and
  do work once to bind them in to GL. This is pretty expensive. With
  your proposal, this could probably be done lazily and tracked in a
  cache-type thing, but if applications wanted to be dumb and generate
  a new pixmap for every frame, nothing is stopping them. Applications
  would do something like:

      Pixmap pix[N];
      XCreateWindowPixmaps(win, N, pix /* out */);

  up front, and composite managers would get an event notifying them
  that win now has those N pixmaps associated with it. Swaps would be
  done by indexing into that array rather than sending the actual
  pixmap ID.

- Using sync object lists in place of all the hard-coded timing
  information. We've never been a fan of the OML-style swap timing
  semantics. It doesn't line up well with our HW. Why not allow
  arbitrary fence objects to dictate when the swapping occurs? Then
  apps that just want a simple vsync can just send a vsync fence. Apps
  that want exact timing can query what types of counters are
  available and get exact timing on different HW that supports
  different timers.

- I had a bunch of GLX proposals to solve that mess.

- Redirecting present operations (or swaps) to the composite manager
  was central to the proposal.

It looks like a lot of the details and pseudo-code didn't make it into
the final public presentation, just a high-level overview. I'll see if
I can dig up more of that. Here's the URL to the presentation. Just
skip all the fence sync parts.

http://people.freedesktop.org/~aplattner/x-presentation-and-synchronization.pdf

Thanks,
-James
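To make the contrast concrete, the two client loops can be sketched
side by side. `XCreateWindowPixmaps` and `SwapIndexedRegion` are
hypothetical names for James's proposal, not real protocol:

```
/* Draft Swap extension: the client names a pixmap on every swap. */
pix[0..N-1] = CreatePixmaps(N)
loop:
    wait until pix[n] is idle
    Render(pix[n])
    SwapRegion(win, pix[n])
    n = (n + 1) mod N

/* James's proposal: the pixmap list is bound to the window up front,
 * and swaps pass an index, so a compositor can bind all N buffers
 * into GL once instead of resolving a pixmap ID per frame. */
XCreateWindowPixmaps(win, N, pix)     /* compositor notified once */
loop:
    wait until pix[n] is idle
    Render(pix[n])
    SwapIndexedRegion(win, n)         /* index, not a pixmap ID */
    n = (n + 1) mod N
```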
Re: Initial DRI3000 protocol specs available
Hi,

On 7 March 2013 17:17, Keith Packard kei...@keithp.com wrote:

> James Jones jajo...@nvidia.com writes:
>> There didn't seem to be much interest outside of NVIDIA, so besides
>> fence sync, the ideas are tabled internally ATM.
>
> This shouldn't surprise you though -- no-one else needs this kind of
> synchronization, so it's really hard for anyone to evaluate it. And,
> DRI2 offers 'sufficient' support for the various GL sync extensions.

At least for TEXTURE_2D, sure. TEXTURE_EXTERNAL still requires
synchronisation (i.e. fences) beyond 'just smash some commands at the
kernel and flush on context switches and we're all good'.

Cheers,
Daniel
Re: Initial DRI3000 protocol specs available
On Fri, Mar 8, 2013 at 3:59 AM, James Jones jajo...@nvidia.com wrote:

> On 03/07/2013 05:17 PM, Keith Packard wrote:
>> James Jones jajo...@nvidia.com writes:
>>> There didn't seem to be much interest outside of NVIDIA, so besides
>>> fence sync, the ideas are tabled internally ATM.
>>
>> This shouldn't surprise you though -- no-one else needs this kind of
>> synchronization, so it's really hard for anyone to evaluate it. And,
>> DRI2 offers 'sufficient' support for the various GL sync extensions.
>
> I was referring to the multi-buffer/tear-free presentation part, not
> the synchronization parts. I'm still rather surprised everyone thinks
> implicit synchronization is a good idea though. I don't think we're
> the only ones that have loosely defined command buffer processing in
> HW anymore. Meh.

It's not that other hw doesn't have that (or even other drivers for
your hw, i.e. nouveau). Serializing through the kernel execution
manager lets the kernel know the expected order of rendering. If
rendering in hw queue A depends on a result from hw queue B (B renders
to a buffer, A textures from the same buffer), the kernel can insert
synchronization primitives to ensure that the A queue doesn't proceed
before the B queue signals the fence. If the A and B queues don't have
any inter-dependencies, no synchronization is necessary and they can
run in parallel or out of order.

Kristian
Re: Initial DRI3000 protocol specs available
On 03/06/2013 10:35 PM, Keith Packard wrote:

> Owen Taylor otay...@redhat.com writes:
>> A complex scheme where the compositor and the server collaborate on
>> the implementation of SwapRegion seems fragile to me, and still
>> doesn't get some details right - like the swap count returned from
>> SwapRegion. What if we made SwapRegion redirectable along the lines
>> of ResizeRedirectMask? Since it would be tricky to block the client
>> calling SwapRegion until the compositor responded, this would
>> probably require removing the reply to SwapRegion and sending
>> everything needed back in events.
>
> When I first read this a week ago, I thought this was a crazy plan;
> but upon reflection, I think this is exactly the right direction.
> I've written up a blog posting in more detail about that here:
>
> http://keithp.com/blogs/composite-swap/
>
>> SwapScheduled - whatever information is available immediately on
>> receipt of SwapRegion
>
> I think this can still be in the reply to SwapRegion itself;
> essentially all we're returning is the swap-hi/swap-lo numbers and a
> suggestion for future buffer allocation sizes. We could place the
> buffer size hints in a separate event, but I don't think they're that
> critical; it's just a hint, and we'll get it right after a couple of
> swaps once the user stops moving the window around anyways.
>
>> SwapIdle - a buffer is returned to the application for rendering
>> SwapComplete - the swap actually happened and we know the
>> msc/sbc/ust triple
>
> Yup. The blog posting suggests how the Complete event might be
> delayed until the Compositor gets the content up onto the screen
> itself.

If I'm understanding this correctly, this requires the X server to
receive a notification from the GPU that the swap is complete so it
can send the SwapComplete event. Is there any chance this could be
done with a Fence instead? The application could specify the fence in
the Swap request, and then use that fence to block further rendering
on the GPU or wait on the fence from the CPU. We typically try to do
the scheduling on the GPU when possible because triggering an
interrupt and waking up the X server burns power and adds latency for
no good reason.

I also think that SwapIdle should *not* be an event. Instead, the
client should mark its pixmap as 'becomes idle upon swap'; on
redirection, the compositor ends up holding the last 'it's not idle
yet' bit, and when it does the 'becomes idle upon swap', then the
buffer goes idle. The client must then tell the server to un-idle the
pixmap, and that request will return whether the contents were
preserved or not. This has to be synchronous or huge races will
persist.

> But I don't know that you need that much granularity. I think
> SwapIdle and SwapComplete are sufficient.

As above, SwapIdle isn't good enough; an explicit un-idle request is
required.

>> Tricky parts:
>>
>> * Not leaking buffers during redirection/unredirection could be
>>   tricky. What if the compositor exits while a client is waiting for
>>   a SwapIdle? An event when swap is redirected/unredirected is
>>   probably necessary.
>
> When the Compositor exits, the X server will know all of the pending
> SwapRegion requests and can 'unredirect' them easily enough. I don't
> want to tell apps when they're getting redirected/unredirected, and I
> don't think it's necessary.

>> * To make this somewhat safe, the concept of idle has to be one of
>>   correct display, not system stability. It can't take down the
>>   system if the compositor sends SwapIdle at the wrong time.

See above.

>> * Because the SBC is a drawable attribute, it's a little complex to
>>   continue having the right value over swap redirection.
>
> When a window is swap-redirected, we say that the SBC is incremented
> by one every time the redirecting client calls SwapRegion, and never
> otherwise. A query is provided for the current value. We could simply
> decouple these values and just have a 'swap count' associated with
> the window which is used to mark pixmap contents when 'UnIdled'.
>
>> * It doesn't make sense to have both the server and the compositor
>>   scheduling stuff. I think you'd specify that once you swap
>>   redirect a window, it gets simple:
>
> Good point. The redirected swap event should contain all of the swap
> parameters so that the Compositor can appropriately schedule the
> window swap with the matching screen swap.
>
> Actually, from the compositor's perspective, the window's front
> buffer doesn't matter, but you probably need to keep it current to
> make screenshot tools, etc., work correctly. My swap redirect plan
> has that pixmap getting swapped at the same time the screen pixmap is
> swapped, so things will look 'right'.
>
>> Is this better than a more collaborative approach where the server
>> and compositor together determine what pixmaps are idle?
>
> Idleness is certainly a joint prospect, but I don't think it's
> cooperative. Instead, a pixmap is idle
Re: Initial DRI3000 protocol specs available
Aaron Plattner aplatt...@nvidia.com writes:

> If I'm understanding this correctly, this requires the X server to
> receive a notification from the GPU that the swap is complete so it
> can send the SwapComplete event. Is there any chance this could be
> done with a Fence instead? The application could specify the fence in
> the Swap request, and then use that fence to block further rendering
> on the GPU or wait on the fence from the CPU.

From what I've heard from application developers, there are two
different operations here:

1) Throttle application rendering to avoid racing ahead of the screen.

2) Keeping the screen up-to-date with simple application changes, but
   not any faster than frame rate.

The SwapComplete event is designed for this second operation. Imagine
a terminal emulator; it doesn't want to draw any faster than frame
rate, but any particular frame can be drawn in essentially zero time.
This application doesn't want to *block* at all; it wants to keep
processing external events, like getting terminal output and user
input events. As I understand it, a HW fence would cause the terminal
emulator to stall down in the driver, blocking processing of all of
the events and terminal output.

For simple application throttling, that wouldn't use these
SwapComplete events; rather, it would use whatever existing mechanisms
exist for blocking rendering to limit the application frame rate.

> We typically try to do the scheduling on the GPU when possible
> because triggering an interrupt and waking up the X server burns
> power and adds latency for no good reason.

Right, we definitely don't want a high-performance application to
block waiting for an X event to arrive before it starts preparing the
next frame.

-- 
keith.pack...@intel.com
Re: Initial DRI3000 protocol specs available
On Thu, 2013-02-28 at 16:55 -0800, Keith Packard wrote:

> > * It would be great if we could figure out a plan to get to the
> >   point where the exact same application code is going to work for
> >   proprietary and open source drivers. When you get down to the
> >   details of swap this isn't close to the case currently.
>
> Agreed -- the problem here is that except for the nVidia closed
> drivers, everything else implicitly serializes device access through
> the kernel, providing a natural way to provide some defined order of
> operations. Failing that, I'd love to know what mechanisms *could*
> work with that design.

I don't think serialization is actually the big issue - although it's annoying to deal with fences that are no-ops for the open source drivers, it's pretty well defined where you have to insert them, and because they are no-ops for the open source drivers, there's little overhead. Notification is more of an issue.

> > - Because swap is handled client-side in some drivers,
> >   INTEL_swap_event is seen as awkward to implement.
>
> I'm not sure what could be done here, other than to have some way for
> the X server to get information about the swap and stuff it into the
> event stream, of course. It could be as simple as having the client
> stuff the event data to the X server itself.

It may be that a focus on redirection makes things easier - once the compositor is involved, we can't get away from X server involvement. The compositor is the main case where the X server can be completely bypassed when swapping. And I'm less concerned about API divergence for the compositor. (Not that I *invite* it...)

> > - There is divergence on some basic behaviors, e.g., whether
> >   glXSwapBuffers() + glFinish() waits for the swap to complete or
> >   not.
>
> glXSwapBuffers is pretty darn explicit in saying that it *does not*
> wait for the swap to complete, and glFinish only promises to
> synchronize the effects of rendering (contents of the frame buffer),
> not the actual swap operation itself. I'm not sure how we're supposed
> to respond when drivers ignore the spec and do their own thing?

I wish the GLX specification was clear enough that we actually knew who was ignoring the spec and doing their own thing... ;-) The GLX specification describes the swap operation as "the contents of the back buffer become the contents of the front buffer" - that seems like an operation on the contents of the frame buffer. But getting into the details here is a bit of a distraction - my goal is to try to get us to convergence so we have only one API with well-defined behaviors.

> > - When rendering with a compositor, the X server is innocent of
> >   relevant information about timing and when the application should
> >   draw additional new frames. I've been working on handling this via
> >   client <=> compositor protocols
>
> With 'Swap', I think the X server should be involved, as it is
> necessary to be able to 'idle' buffers which aren't in use after the
> compositor is done with them.
>
> > I tried to outline a sketch of how that would work before.
> > (https://mail.gnome.org/archives/wm-spec-list/2013-January/msg0.html)
> > But this adds a lot of complexity to the minimal client, especially
> > when a client wants to work both redirected and unredirected.
>
> Right, which is why I think fixing the X server to help here would be
> better.

If the goal is really to obsolete the proposed WM spec changes, rather than just make existing GLX apps work better, then there's quite a bit of stuff to get right. For example, from my perspective, the OML_sync_control UST timestamps are completely insufficient - it's not even defined what the units are for these timestamps!

> > I think it would be great if we could sit down and figure out what
> > the Linux-ecosystem API is for this in a way we could give to
> > application authors.
>
> Ideally, a GL application using simple GLX or EGL APIs would work
> 'perfectly', without the need to use additional X-specific APIs. My
> hope with splitting DRI3000 into separate DRI3 and Swap extensions is
> to provide those same semantics to simple double-buffered 2D
> applications using core X and Render drawing as well, without
> requiring that they be rewritten to use GL, and while providing all of
> the same functionality over the network as local direct rendering
> applications get today.

The GLX APIs have some significant holes and poorly defined aspects. And they don't properly take compositing into account, which is the norm today. So providing those capabilities to 2D apps seems of limited utility.

[...]

> > The SwapComplete event is specified as "This event is delivered when
> > a SwapRegion operation completes", but the specification of
> > SwapRegion itself is fuzzy enough that I'm unclear exactly what that
> > means.
> >
> > - The description of SwapRegion needs to define "swap", since the
> >   operation has only a vague resemblance to the English-language
> >   meaning of the word.
>
> Right, SwapRegion can
Re: Initial DRI3000 protocol specs available
On 03/07/2013 12:49 PM, Keith Packard wrote:

> Aaron Plattner aplatt...@nvidia.com writes:
>
> > If I'm understanding this correctly, this requires the X server to
> > receive a notification from the GPU that the swap is complete so it
> > can send the SwapComplete event. Is there any chance this could be
> > done with a Fence instead? The application could specify the fence
> > in the Swap request, and then use that fence to block further
> > rendering on the GPU or wait on the fence from the CPU.
>
> From what I've heard from application developers, there are two
> different operations here:
>
> 1) Throttling application rendering to avoid racing ahead of the
>    screen.
>
> 2) Keeping the screen up to date with simple application changes, but
>    no faster than the frame rate.
>
> The SwapComplete event is designed for the second operation. Imagine a
> terminal emulator: it doesn't want to draw any faster than the frame
> rate, but any particular frame can be drawn in essentially zero time.
> This application doesn't want to *block* at all; it wants to keep
> processing external events, like terminal output and user input. As I
> understand it, a HW fence would cause the terminal emulator to stall
> down in the driver, blocking processing of all of those events and the
> terminal output.

If you associate an X Fence Sync with your swap operation, the driver has the option to trigger it directly from the client command stream and wake up only the applications waiting for that fence. The compositor, if using GL, could have received the swap notification event and already programmed the response compositing based on it before the swap even completes, and just insert a token to make the GPU or kernel wait for the fence to complete before executing the compositing rendering commands.

Thanks,
-James

> For simple application throttling, that wouldn't use these
> SwapComplete events; rather, it would use whatever existing mechanisms
> exist for blocking rendering to limit the application frame rate.
>
> > We typically try to do the scheduling on the GPU when possible
> > because triggering an interrupt and waking up the X server burns
> > power and adds latency for no good reason.
>
> Right, we definitely don't want a high-performance application to
> block waiting for an X event to arrive before it starts preparing the
> next frame.
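The two throttling styles Keith contrasts can be made concrete with a toy model (this is illustrative Python, not real X protocol code; all class and method names are invented): an event-driven client coalesces damage and submits at most one frame per SwapComplete notification, without ever blocking its event loop.

```python
# Toy model of a SwapComplete-driven client (e.g. a terminal emulator):
# damage is coalesced, a new swap is submitted only when the previous
# one has completed, and the event loop never blocks.

class NonBlockingClient:
    def __init__(self):
        self.pending_damage = 0    # damage accumulated since last swap
        self.swap_in_flight = False
        self.swaps = 0

    def on_damage(self):
        self.pending_damage += 1
        self.maybe_swap()

    def on_swap_complete(self):
        # Notification that the previous swap finished; try again.
        self.swap_in_flight = False
        self.maybe_swap()

    def maybe_swap(self):
        # Draw only when the previous swap has completed; this caps the
        # redraw rate at the frame rate without ever blocking.
        if self.pending_damage and not self.swap_in_flight:
            self.pending_damage = 0
            self.swap_in_flight = True
            self.swaps += 1

client = NonBlockingClient()
for _ in range(10):        # a burst of terminal output within one frame
    client.on_damage()
client.on_swap_complete()  # vblank arrives; one swap picks up the rest
print(client.swaps)        # → 2
```

Ten damage events inside one frame produce only two swaps: one immediately, and one when the completion notice arrives, which is exactly the "no faster than frame rate, but never blocked" behavior described above.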
Re: Initial DRI3000 protocol specs available
On 03/07/2013 01:19 PM, Owen Taylor wrote:

> On Thu, 2013-02-28 at 16:55 -0800, Keith Packard wrote:
>
> > > * It would be great if we could figure out a plan to get to the
> > >   point where the exact same application code is going to work for
> > >   proprietary and open source drivers. When you get down to the
> > >   details of swap this isn't close to the case currently.
> >
> > Agreed -- the problem here is that except for the nVidia closed
> > drivers, everything else implicitly serializes device access through
> > the kernel, providing a natural way to provide some defined order of
> > operations. Failing that, I'd love to know what mechanisms *could*
> > work with that design.

Fence syncs. Note that the original fence sync + multi-buffer proposal solved basically the same problems you're trying to solve here, as well as everything Owen's WM spec updates do, but more generally, and with that, a little more implementation complexity. It included proposals to make minor updates to GLX/EGL as well to tie them in with the newer model. There didn't seem to be much interest outside of NVIDIA, so besides fence sync, the ideas are tabled internally at the moment.

> I don't think serialization is actually the big issue - although it's
> annoying to deal with fences that are no-ops for the open source
> drivers, it's pretty well defined where you have to insert them, and
> because they are no-ops for the open source drivers, there's little
> overhead. Notification is more of an issue.
>
> > > - Because swap is handled client-side in some drivers,
> > >   INTEL_swap_event is seen as awkward to implement.
> >
> > I'm not sure what could be done here, other than to have some way
> > for the X server to get information about the swap and stuff it into
> > the event stream, of course. It could be as simple as having the
> > client stuff the event data to the X server itself.
>
> It may be that a focus on redirection makes things easier - once the
> compositor is involved, we can't get away from X server involvement.
> The compositor is the main case where the X server can be completely
> bypassed when swapping. And I'm less concerned about API divergence
> for the compositor. (Not that I *invite* it...)
>
> > > - There is divergence on some basic behaviors, e.g., whether
> > >   glXSwapBuffers() + glFinish() waits for the swap to complete or
> > >   not.
> >
> > glXSwapBuffers is pretty darn explicit in saying that it *does not*
> > wait for the swap to complete, and glFinish only promises to
> > synchronize the effects of rendering (contents of the frame buffer),
> > not the actual swap operation itself. I'm not sure how we're
> > supposed to respond when drivers ignore the spec and do their own
> > thing?
>
> I wish the GLX specification was clear enough that we actually knew
> who was ignoring the spec and doing their own thing... ;-) The GLX
> specification describes the swap operation as "the contents of the
> back buffer become the contents of the front buffer" - that seems like
> an operation on the contents of the frame buffer.

The GLX spec is plenty clear here. It states: "Subsequent OpenGL commands can be issued immediately, but will not be executed until the buffer swapping has completed..." And glFinish, besides the fact that it counts as a GL command, isn't defined as simply waiting until effects on the framebuffer land. All rendering, client, and server (GL server, not X server) state side effects from previous operations must settle before it returns. SwapBuffers affects all three of those. The same goes for fence syncs with condition GL_SYNC_GPU_COMMANDS_COMPLETE.

So if the drawable swapped is current to the thread calling swap buffers, and that thread issues any other GL commands afterwards, including glFinish, glFenceSync, etc., those commands can't complete until after the swap operation does. For glFinish, that means it can't return. For a fence, the fence won't trigger until the swap finishes. If implementations aren't behaving that way, it's a bug in the implementation. Not to say our implementation doesn't have bugs, but AFAIK, we don't have that one.

Thanks,
-James

> But getting into the details here is a bit of a distraction - my goal
> is to try to get us to convergence so we have only one API with
> well-defined behaviors.
>
> > > - When rendering with a compositor, the X server is innocent of
> > >   relevant information about timing and when the application
> > >   should draw additional new frames. I've been working on handling
> > >   this via client <=> compositor protocols
> >
> > With 'Swap', I think the X server should be involved, as it is
> > necessary to be able to 'idle' buffers which aren't in use after the
> > compositor is done with them.
> >
> > > I tried to outline a sketch of how that would work before.
> > > (https://mail.gnome.org/archives/wm-spec-list/2013-January/msg0.html)
> > > But this adds a lot of complexity to the minimal client,
> > > especially when a client wants to work both redirected and
> > > unredirected.
> >
> > Right, which is why I think fixing the X server to help here would
> > be better.
>
> If the goal is really to obsolete the proposed WM spec changes, rather
> than just make existing GLX apps work better, then there's quite a bit
> of stuff to get right.
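James's ordering argument boils down to in-order command retirement. A toy model of that semantics (illustrative Python; the command names mirror GL calls but this is not a GL implementation) shows why a fence or glFinish issued after SwapBuffers cannot complete before the swap does:

```python
# Toy model of an in-order GL command stream: commands retire strictly
# in issue order, so anything issued after SwapBuffers - glFenceSync,
# glFinish - completes only after the swap itself completes.

class CommandStream:
    def __init__(self):
        self.queue = []       # commands in issue order
        self.completed = []   # commands retired so far, in order

    def issue(self, name):
        self.queue.append(name)

    def complete_through(self, name):
        # Retire commands in order up to and including `name`; in-order
        # execution means nothing later retires before `name` does.
        while self.queue:
            cmd = self.queue.pop(0)
            self.completed.append(cmd)
            if cmd == name:
                break

gl = CommandStream()
gl.issue("draw")
gl.issue("SwapBuffers")
gl.issue("glFenceSync")   # fence issued after the swap
gl.issue("glFinish")

# glFinish cannot return until everything before it - the swap
# included - has completed.
gl.complete_through("glFinish")
assert gl.completed.index("SwapBuffers") < gl.completed.index("glFenceSync")
assert gl.completed[-1] == "glFinish"
print(gl.completed)
```

The assertions encode the claim at issue: the swap retires before the fence triggers, and glFinish returns only after both.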
Re: Initial DRI3000 protocol specs available
James Jones jajo...@nvidia.com writes:

> If you associate an X Fence Sync with your swap operation, the driver
> has the option to trigger it directly from the client command stream
> and wake up only the applications waiting for that fence.

Yeah, right now we're doing some hand-waving about serialization which isn't entirely satisfying.

> The compositor, if using GL, could have received the swap notification
> event and already programmed the response compositing based on it
> before the swap even completes, and just insert a token to make the
> GPU or kernel wait for the fence to complete before executing the
> compositing rendering commands.

We just don't have these issues with the open source drivers, so it's really hard for us to reason about this kind of asynchronous operation. Access to the underlying buffers is mediated by the kernel, which ensures that as long as you serialize kernel calls, you will serialize hw execution as well.

-- keith.pack...@intel.com
Re: Initial DRI3000 protocol specs available
James Jones jajo...@nvidia.com writes:

> There didn't seem to be much interest outside of NVIDIA, so besides
> fence sync, the ideas are tabled internally ATM.

This shouldn't surprise you though -- no one else needs this kind of synchronization, so it's really hard for anyone to evaluate it. And DRI2 offers 'sufficient' support for the various GL sync extensions.

So, what I'd like to know is whether you think nVidia could take advantage of the Swap extension so that nVidia 3D applications could do the whole Swap redirect plan. If so, then I'm a lot more interested in figuring out how we can get apps using the necessary fencing to actually make it work right.

-- keith.pack...@intel.com
Re: Initial DRI3000 protocol specs available
Owen Taylor otay...@redhat.com writes:

> A complex scheme where the compositor and the server collaborate on
> the implementation of SwapRegion seems fragile to me, and still
> doesn't get some details right - like the swap count returned from
> SwapRegion. What if we made SwapRegion redirectable along the lines of
> ResizeRedirectMask? Since it would be tricky to block the client
> calling SwapRegion until the compositor responded, this would probably
> require removing the reply to SwapRegion and sending everything needed
> back in events.

When I first read this a week ago, I thought it was a crazy plan, but upon reflection, I think it's exactly the right direction. I've written up a blog posting in more detail here: http://keithp.com/blogs/composite-swap/

> SwapScheduled - whatever information is available immediately on
> receipt of SwapRegion

I think this can still be in the reply to SwapRegion itself; essentially all we're returning is the swap-hi/swap-lo numbers and a suggestion for future buffer allocation sizes. We could place the buffer size hints in a separate event, but I don't think they're that critical; it's just a hint, and we'll get it right after a couple of swaps once the user stops moving the window around anyway.

> SwapIdle - a buffer is returned to the application for rendering
>
> SwapComplete - the swap actually happened and we know the msc/sbc/ust
> triple

Yup. The blog posting suggests how the Complete event might be delayed until the compositor gets the content up onto the screen itself.

I also think that SwapIdle should *not* be an event. Instead, the client should mark its pixmap as 'becomes idle upon swap'; on redirection, the compositor ends up holding the last 'it's not idle yet' bit, and when it does the 'becomes idle upon swap', the buffer goes idle. The client must then tell the server to un-idle the pixmap, and that request will return whether the contents were preserved or not. This has to be synchronous or huge races will persist.

> But I don't know that you need that much granularity. I think SwapIdle
> and SwapComplete are sufficient.

As above, SwapIdle isn't good enough; an explicit un-idle request is required.

> Tricky parts:
>
> * Not leaking buffers during redirection/unredirection could be
>   tricky. What if the compositor exits while a client is waiting for a
>   SwapIdle? An event when swap is redirected/unredirected is probably
>   necessary.

When the compositor exits, the X server will know all of the pending SwapRegion requests and can 'unredirect' them easily enough. I don't want to tell apps when they're getting redirected/unredirected, and I don't think it's necessary.

> * To make this somewhat safe, the concept of idle has to be one of
>   correct display, not system stability. It can't take down the system
>   if the compositor sends SwapIdle at the wrong time.

See above.

> * Because the SBC is a drawable attribute, it's a little complex to
>   continue having the right value over swap redirection. When a window
>   is swap-redirected, we say that the SBC is incremented by one every
>   time the redirecting client calls SwapRegion, and never otherwise. A
>   query is provided for the current value.

We could simply decouple these values and just have a 'swap count' associated with the window which is used to mark pixmap contents when un-idled.

> * It doesn't make sense to have both the server and the compositor
>   scheduling stuff. I think you'd specify that once you swap-redirect
>   a window, it gets simple:

Good point. The redirected swap event should contain all of the swap parameters so that the compositor can appropriately schedule the window swap with the matching screen swap.

> Actually, from the compositor's perspective, the window's front buffer
> doesn't matter, but you probably need to keep it current to make
> screenshot tools, etc., work correctly.

My swap redirect plan has that pixmap getting swapped at the same time the screen pixmap is swapped, so things will look 'right'.

> Is this better than a more collaborative approach where the server and
> compositor together determine what pixmaps are idle?

Idleness is certainly a joint prospect, but I don't think it's cooperative. Instead, a pixmap is idle when both the application and the compositor say it is idle. SwapRedirect explicitly marks the pixmap as 'not idle' for the compositor, and an explicit 'make it idle' call is required from the compositor.

-- keith.pack...@intel.com
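Keith's "jointly idle" rule can be sketched as a small state machine (illustrative Python with invented names; `mark_swap`, `compositor_release`, and `unidle` are not proposed protocol requests): a pixmap goes idle only when both the application's swap and the compositor's release have happened, and the client must explicitly un-idle it before reuse, learning whether the contents survived.

```python
# Toy model of joint idleness: both parties must release a pixmap
# before it is idle, and reuse requires a synchronous un-idle request.

class SwapPixmap:
    def __init__(self):
        self.app_done = False         # app marked 'becomes idle upon swap'
        self.compositor_done = False  # compositor finished with the buffer
        self.idle = False
        self.contents_preserved = True

    def _check_idle(self):
        if self.app_done and self.compositor_done:
            self.idle = True

    def mark_swap(self):              # client: 'becomes idle upon swap'
        self.app_done = True
        self._check_idle()

    def compositor_release(self):     # compositor: done compositing it
        self.compositor_done = True
        self._check_idle()

    def unidle(self):
        # Synchronous request: hand the pixmap back to the client and
        # report whether its contents were preserved.
        assert self.idle, "racy: pixmap is still referenced"
        self.idle = self.app_done = self.compositor_done = False
        return self.contents_preserved

p = SwapPixmap()
p.mark_swap()
assert not p.idle          # compositor still holds the last reference
p.compositor_release()
assert p.idle              # now both parties agree it is idle
print(p.unidle())          # → True
```

The synchronous `unidle` step is what closes the race Keith describes: the client cannot start rendering until the server has confirmed the buffer is free and reported whether its contents were kept.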
Re: Initial DRI3000 protocol specs available
Owen Taylor otay...@redhat.com writes:

> Sorry for joining the discussion late here. I went through the specs
> and discussion and have various comments and questions.

Thanks for reviewing stuff, of course. You're not late at all :-)

> * It would be great if we could figure out a plan to get to the point
>   where the exact same application code is going to work for
>   proprietary and open source drivers. When you get down to the
>   details of swap this isn't close to the case currently.

Agreed -- the problem here is that except for the nVidia closed drivers, everything else implicitly serializes device access through the kernel, providing a natural way to provide some defined order of operations. Failing that, I'd love to know what mechanisms *could* work with that design.

> - Because swap is handled client-side in some drivers,
>   INTEL_swap_event is seen as awkward to implement.

I'm not sure what could be done here, other than to have some way for the X server to get information about the swap and stuff it into the event stream, of course. It could be as simple as having the client stuff the event data to the X server itself.

> - There is divergence on some basic behaviors, e.g., whether
>   glXSwapBuffers() + glFinish() waits for the swap to complete or not.

glXSwapBuffers is pretty darn explicit in saying that it *does not* wait for the swap to complete, and glFinish only promises to synchronize the effects of rendering (contents of the frame buffer), not the actual swap operation itself. I'm not sure how we're supposed to respond when drivers ignore the spec and do their own thing?

> - When rendering with a compositor, the X server is innocent of
>   relevant information about timing and when the application should
>   draw additional new frames. I've been working on handling this via
>   client <=> compositor protocols

With 'Swap', I think the X server should be involved, as it is necessary to be able to 'idle' buffers which aren't in use after the compositor is done with them.

> I tried to outline a sketch of how that would work before.
> (https://mail.gnome.org/archives/wm-spec-list/2013-January/msg0.html)
> But this adds a lot of complexity to the minimal client, especially
> when a client wants to work both redirected and unredirected.

Right, which is why I think fixing the X server to help here would be better.

> I think it would be great if we could sit down and figure out what the
> Linux-ecosystem API is for this in a way we could give to application
> authors.

Ideally, a GL application using simple GLX or EGL APIs would work 'perfectly', without the need to use additional X-specific APIs. My hope with splitting DRI3000 into separate DRI3 and Swap extensions is to provide those same semantics to simple double-buffered 2D applications using core X and Render drawing as well, without requiring that they be rewritten to use GL, and while providing all of the same functionality over the network as local direct rendering applications get today.

> * One significant problem that I have currently is that the default
>   mode for the Intel drivers is to use triple buffering and send back
>   swap events *when rendering the next frame would not block* - that
>   is, immediately. This results in a frame of unnecessary latency.
>   (The too-early events are also missing ust/msc/sbc information.)

As you noted, there are a whole range of suitable times to tell clients about their buffers:

1) Right after the swap, the client needs to know what happened to each buffer and what the scheduled swap time is.

2) When the buffer usage changes; for DRI2, that's what the Invalidate events are for. With DRI3 as proposed, that's what the reply to SwapRegion contains.

3) When their contents actually appear on the screen.

I suggest that we'll need all three to provide applications enough information to make good drawing choices.

> The SwapComplete event is specified as "This event is delivered when a
> SwapRegion operation completes", but the specification of SwapRegion
> itself is fuzzy enough that I'm unclear exactly what that means.
>
> - The description of SwapRegion needs to define "swap", since the
>   operation has only a vague resemblance to the English-language
>   meaning of the word.

Right, SwapRegion can either be a copy operation or an actual swap. The returned information about idle buffers tells the client what they contain, so I think the only confusion here is over the name of the request?

> - My interpretation of SwapRegion is that the actual propagation of
>   source to destination is *asynchronous* to the X protocol stream.
>   This is implied by "Schedule a swap..." but probably should be
>   explicitly stated, since it is radically different from other
>   rendering in X.

OK, a bit more wording clarifying the inherent asynchronous nature of the Swap operation seems necessary.

> - Is the serial in the SwapComplete event synchronous to the
Re: Initial DRI3000 protocol specs available
Sorry for joining the discussion late here. I went through the specs and discussion and have various comments and questions.

- Owen

* It would be great if we could figure out a plan to get to the point where the exact same application code is going to work for proprietary and open source drivers. When you get down to the details of swap this isn't close to the case currently.

  - Because swap is handled client-side in some drivers, INTEL_swap_event is seen as awkward to implement.

  - There is divergence on some basic behaviors, e.g., whether glXSwapBuffers() + glFinish() waits for the swap to complete or not.

  - When rendering with a compositor, the X server is innocent of relevant information about timing and when the application should draw additional new frames. I've been working on handling this via client <=> compositor protocols (https://mail.gnome.org/archives/wm-spec-list/2013-January/msg0.html), but this adds a lot of complexity to the minimal client, especially when a client wants to work both redirected and unredirected.

  I think it would be great if we could sit down and figure out what the Linux-ecosystem API is for this in a way we could give to application authors.

* One significant problem that I have currently is that the default mode for the Intel drivers is to use triple buffering and send back swap events *when rendering the next frame would not block* - that is, immediately. This results in a frame of unnecessary latency. (The too-early events are also missing ust/msc/sbc information.) So I'd like to make sure that we know exactly what SwapComplete means and not have creative reinterpretations based on what works well for one client or another.

  The SwapComplete event is specified as "This event is delivered when a SwapRegion operation completes", but the specification of SwapRegion itself is fuzzy enough that I'm unclear exactly what that means.

  - The description of SwapRegion needs to define "swap", since the operation has only a vague resemblance to the English-language meaning of the word.

  - My interpretation of SwapRegion is that the actual propagation of source to destination is *asynchronous* to the X protocol stream. This is implied by "Schedule a swap..." but probably should be explicitly stated, since it is radically different from other rendering in X.

  - Is the serial in the SwapComplete event synchronous to the protocol stream? E.g., can you assume that any CopyArea from the destination drawable before that serial will get the old contents, and a CopyArea from the destination after that serial will get the new contents?

  - What happens when multiple SwapRegion requests are made with a swap-interval of zero? Are previous ones discarded?

  - Is it an error to render to a non-idle pixmap? Is it an error to pass a non-idle pixmap as the source to SwapRegion?

  - What's the interaction between swap-interval and target-msc, etc.?

  - When a window is redirected, what's the interpretation of swap-interval, target-msc, etc.? Is it that the server performs the operation at the selected blanking interval (as if the window weren't redirected), and then damage/other events are generated and the server picks it up and renders to the real front buffer at the next opportunity - usually a frame later?

* Do I understand correctly that the idle pixmaps returned from a SwapRegion request are the pixmaps that *will* be idle once the corresponding SwapComplete event is received? If so, what happens if things change before the swap actually happens, and what was scheduled as a swap ends up being a copy? Is it sufficient to assume that a ConfigureNotify on a destination window means that all pixmaps passed to previous SwapRegion requests are now idle?

* In the definition of SWAPIDLE you say: "If valid is TRUE, swap-hi/swap-lo form a 64-bit swap count value from the SwapRegion request which matches the data that the pixmap currently contains." If I'm not misunderstanding things, this is a confusing statement because, leaving aside damage to the front buffer, pixmaps always contain the same contents (whatever the client rendered into them). Is using swap-hi/swap-lo to identify the SwapRegion problematic in the case where swaps aren't throttled? Would it be better to use the sequence number of the request? Or is the pixmap itself sufficient?

* What control, if any, will applications have over the number of buffers used - that is, what the behavior will be when an application starts rendering another frame, in terms of allocating a new buffer versus swapping?

* Do we need to deal with stereo as part of this?
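As an aside on the SWAPIDLE wording questioned above: splitting a 64-bit counter into hi/lo 32-bit fields is standard X wire practice, and recombining them is a shift-and-or (a trivial sketch, assuming the usual hi-word/lo-word convention):

```python
# Recombine the two 32-bit SWAPIDLE fields into the 64-bit swap count.
def swap_count(swap_hi, swap_lo):
    return (swap_hi << 32) | swap_lo

print(swap_count(1, 2))  # → 4294967298  (2**32 + 2)
```

Owen's question still stands either way: whether this count, the request sequence number, or the pixmap XID is the right key for matching idle notifications to requests.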
Re: Initial DRI3000 protocol specs available
A complex scheme where the compositor and the server collaborate on the implementation of SwapRegion seems fragile to me, and still doesn't get some details right - like the swap count returned from SwapRegion.

What if we made SwapRegion redirectable along the lines of ResizeRedirectMask? Since it would be tricky to block the client calling SwapRegion until the compositor responded, this would probably require removing the reply to SwapRegion and sending everything needed back in events. At the most granular, this would be:

  SwapScheduled - whatever information is available immediately on receipt of SwapRegion
  SwapIdle - a buffer is returned to the application for rendering
  SwapComplete - the swap actually happened and we know the msc/sbc/ust triple

But I don't know that you need that much granularity. I think SwapIdle and SwapComplete are sufficient.

Tricky parts:

* Not leaking buffers during redirection/unredirection could be tricky. What if the compositor exits while a client is waiting for a SwapIdle? An event when swap is redirected/unredirected is probably necessary.

* To make this somewhat safe, the concept of idle has to be one of correct display, not system stability. It can't take down the system if the compositor sends SwapIdle at the wrong time.

* Because the SBC is a drawable attribute, it's a little complex to continue having the right value over swap redirection. When a window is swap-redirected, we say that the SBC is incremented by one every time the redirecting client calls SwapRegion, and never otherwise. A query is provided for the current value.

* It doesn't make sense to have both the server and the compositor scheduling stuff. I think you'd specify that once you swap-redirect a window, it gets simple:

  - SwapRegion called by another client - a SwapRegionRequest event is immediately generated.
  - SwapRegion called by the redirecting client - the action (either a swap or a copy) happens immediately.

Actually, from the compositor's perspective, the window's front buffer doesn't matter, but you probably need to keep it current to make screenshot tools, etc., work correctly.

Is this better than a more collaborative approach where the server and compositor together determine what pixmaps are idle? I think it might be a little simpler and more flexible. It also potentially allows a SwapRegion on a client window to be directly turned into a SwapRegion on the root window for a full-screen client. But there probably are a host of difficulties I'm not thinking of :-)

- Owen
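Owen's redirection rule is a simple dispatch decision, sketched here as hypothetical server-side logic (illustrative Python; `handle_swap_region` and the dictionary fields are invented, not protocol): once a window is swap-redirected, SwapRegion from anyone but the redirecting client becomes a SwapRegionRequest event for the compositor, while the redirecting client's own SwapRegion executes immediately.

```python
# Toy dispatch for SwapRegion on a possibly swap-redirected window.
def handle_swap_region(window, caller):
    if window.get("redirector") is None or caller == window["redirector"]:
        # Unredirected window, or the redirecting client itself:
        # perform the swap (or copy) immediately.
        window["swaps"] = window.get("swaps", 0) + 1
        return "executed"
    # Redirected and called by someone else: hand the request to the
    # compositor as an event instead of executing it.
    window.setdefault("events", []).append(("SwapRegionRequest", caller))
    return "redirected"

win = {"redirector": "compositor"}
assert handle_swap_region(win, "app") == "redirected"
assert handle_swap_region(win, "compositor") == "executed"
print(win["events"], win.get("swaps"))
```

This also makes Owen's "only one scheduler" point concrete: after redirection, the server never schedules the window's swaps itself; it only forwards requests to the compositor.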
Re: Initial DRI3000 protocol specs available
Owen Taylor otay...@redhat.com writes:

> A complex scheme where the compositor and the server collaborate on
> the implementation of SwapRegion seems fragile to me, and still
> doesn't get some details right - like the swap count returned from
> SwapRegion.

The question is whether the swap count is the count of swaps on the window or the count of swaps to the final screen image. I think it's just the former, in which case the presence of a compositor doesn't have any effect on the correctness. The effect of the compositor is strictly in holding a reference to the window buffer and potentially delaying the reuse of that buffer within the application.

> What if we made SwapRegion redirectable along the lines of
> ResizeRedirectMask? Since it would be tricky to block the client
> calling SwapRegion until the compositor responded, this would probably
> require removing the reply to SwapRegion and sending everything needed
> back in events. At the most granular, this would be:

We're really trying to avoid using events here -- they're quite messy on the client side, as you must capture them deep within the X library implementation and shovel them over to the correct context within the Mesa library. Using replies makes it all quite simple; the correct context will be present automatically to receive the reply from SwapRegion.

> * Not leaking buffers during redirection/unredirection could be
>   tricky. What if the compositor exits while a client is waiting for a
>   SwapIdle? An event when swap is redirected/unredirected is probably
>   necessary.

With the current pixmap ID referencing technique, that ID will be freed when the compositor exits (I mentioned this would be required before), which will reduce the reference count to the point where the SwapIdle event would get sent.

> Is this better than a more collaborative approach where the server and
> compositor together determine what pixmaps are idle? I think it might
> be a little simpler and more flexible. It also potentially allows a
> SwapRegion on a client window to be directly turned into a SwapRegion
> on the root window for a full-screen client. But there probably are a
> host of difficulties I'm not thinking of :-)

Oh -- having the compositor just take the new window buffer and send a SwapRegion from that to the root window. I hadn't thought of that, as I was still assuming we'd be 'unredirecting' full-screen windows, but this sure looks like a compelling alternative that should simplify the compositor fairly nicely. Let's stick that one on the 'it sure would be nice if this were possible' list and see what it will take to make it work.

-- keith.pack...@intel.com
Re: Initial DRI3000 protocol specs available
On Wed, Feb 20, 2013 at 10:17:56PM -0800, Keith Packard wrote: Chris Wilson ch...@chris-wilson.co.uk writes: You managed to ask yourself the question I was trying to lead you to: how the heck does the compositor learn that the underlying graphics object has changed? It can certainly tell that the underlying contents have changed with Damage events, but as to how it knows that it should do another BufferFromPixmap request, I think that's gonna require another event. Now, the big question is how to deal with compositing managers which don't know about DRI3. I suspect we'll just have to skip the DMA-BUF swapping hack for window pixmaps unless the compositor gives us the OK to do so. If a DRI2 client also grabs the buffer, then we have to fall back to blits. That should be fairly easy to detect and handle. In DRI2 this is through the InvalidateEvent, and the lack of being able to send those from the driver before the Damage is sent is one of the reasons why the current exchange mechanism is broken. Hrm. With the current system, except for override-redirect windows, if the compositor is also the window manager, it should always know when the window pixmap is going to be replaced because that only happens when the window is resized, and the window manager is entirely responsible for making that happen. For override-redirect windows, if the ConfigureNotify event was delivered before the Damage event, then the compositor could know about that as well. We are concerned with the GEM objects backing the Pixmaps, which may be changed at whim by the driver. I guess I'd like to know more about what is broken with the current system for compositors... We cannot perform simple name exchanges currently in DRI2 because the Damage is badly ordered wrt the Invalidate event and there is no coordination between client - server - compositor on when the buffers are reusable by the client. 
In any case, as the underlying DMA-BUF is changing, the compositor is going to need to know that so it can release the old pixmap back to the application, and so it can rewire its own compositing operations to use the new object. It might be nice to have the compositor use persistent names for the various DMA-BUFs that are used for a particular window. I think that means having the compositor hold on to old DMA-BUF window pixmap IDs. That seems tricky though. The alternative will be to have the compositor create/destroy a pixmap ID per frame. Not intolerable, but not optimal. Getting buffer exchanges working in conjunction with the external compositor is more or less as tricky as it gets. The notion that the buffer is kept busy by the compositor and so prevents the DRI3 client from overdrawing it is key. And that naturally leads to the compositor needing to release the old buffer once it is referencing the new post-swap buffer. Right, an easy technique there would be to have it use NameWindowPixmap when it got an event telling it that a new pixmap was in use for the window, and then when it was finished with that pixmap, it could just use FreePixmap to tell the server it was done. That would bump the refcnt down to one in the server, at which point it could queue the pixmap to be sent as 'idle' the next time SwapRegion was called. Serialisation between rendering of the common buffers is definitely s.e.p. I agree that should solve the compositor problem. Ok, cool. So, changes that I think are needed: 1) If someone calls TextureFromPixmap on a window pixmap, we need to suppress the window pixmap swapping hack. Alternatively, we can have the compositor explicitly enable window pixmap swapping. I think we definitely want to support window swapping with DRI3 compositors. DRI2 compositors will just have to continue to force blits. 2) We need to send an event when the buffer underlying a window switches. 
3) We need to be explicit about event ordering between the new window pixmap change notify event and any related Damage. As I see it the challenge is to prevent sending the buffer release (SwapIdle) back to the client before all interested third parties have had a chance to snoop its contents and react. Sketching that out, we need to increment the busy count every time we send an Invalidate and expect the client (compositor) to send a release after they have finished processing the buffer. For fun, imagine a fullscreen redirected Window (because Wine still manages to confuse everybody):

  client            server              compositor
  new drawable ->
                    setup Damage
  show A (Swap) ->
                    invalidate buffer
                    damage ->
                                        show A (Swap)
                    flip A
  show B ->
                    invalidate buffer
                    damage ->
                                        show B
Re: Initial DRI3000 protocol specs available
On 21 February 2013 07:17, Keith Packard kei...@keithp.com wrote: Chris Wilson ch...@chris-wilson.co.uk writes: Hrm. With the current system, except for override-redirect windows, if the compositor is also the window manager, it should always know when There are compositors that are not window managers. I use one because without a compositor xwd -id does not work. Thanks, Michal
Re: Initial DRI3000 protocol specs available
Hi, And for Render, along with passing blobs. Yeah, I can easily imagine doing a PictureFromBuffer as well. Let's focus on Pixmaps for now and get Mesa fixed up. Passing blobs (if this means what I think it does) would be something we are looking forward to for Java's XRender backend. Currently we upload 32x32 alpha masks using XPutImage and most drivers (except SNA/intel) don't cope with this very well. In effect it would help to cure the weakness of XRender when it comes to geometry. Regards, Clemens
Re: Initial DRI3000 protocol specs available
Stéphane Marchesin stephane.marche...@gmail.com writes: With that said, I don't think it's that difficult/different. I can design a GLX extension spec and send a draft, then we can work from there. Yeah, some concrete plan for GL would be really nice to have, at least as a starting point. That is actually not what you want because it is a waste of bandwidth. Since compositors are typically bandwidth limited, you instead want to paint only the relevant sub regions. Those are easy to determine by transforming X damage regions into screen coordinates. Of course, that's what SwapRegion is for -- it will get to pick whether to copy or page flip and let the client know what happened, the region you pass Most non-trivial compositing managers are already using partial update schemes through GLX_MESA_copy_sub_buffer or the GLX_EXT_buffer_age extensions + copies. I don't think it is far fetched to support a list of rectangles instead. A region is already a list of rectangles; the only restriction is that the relative location of all of the source and dest rectangles is the same. This satisfies the goal of doing a damage-based back-front update. -- keith.pack...@intel.com
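The addressing rule Keith describes (a region is a list of destination rectangles, and one shared src-off-x/src-off-y pair relates every destination rectangle to its source rectangle) can be sketched as follows. This is purely illustrative, not actual protocol code.

```python
# Sketch of the SwapRegion addressing rule: destination rectangles come
# from the region, and a single (src-off-x, src-off-y) offset maps each
# one onto the source pixmap.  Rectangles are (x, y, width, height).

def source_rects(region, src_off_x, src_off_y):
    """Map destination rectangles to source rectangles with one shared offset."""
    return [(x + src_off_x, y + src_off_y, w, h) for (x, y, w, h) in region]

damage = [(10, 10, 64, 64), (200, 0, 32, 32)]    # dest rects, window coords
assert source_rects(damage, 0, 0) == damage       # back buffer aligned with window
assert source_rects(damage, -10, -10)[0] == (0, 0, 64, 64)
```

This is why a separate list-of-rectangles request adds little: any damage-based back-to-front update already fits the one-region, one-offset form.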
Re: Initial DRI3000 protocol specs available
Chris Wilson ch...@chris-wilson.co.uk writes: If a DRI2 client also grabs the buffer, then we have to fallback to blits. That should be fairly easy to detect and handle. So, the question is whether the NameWindowPixmap IDs are stable across pixmap replacement. I'm frankly tempted to add a new event to Composite that is sent whenever the window pixmap changes -- that way applications wouldn't have to guess that the pixmap changed whenever the window was resized. This would also provide an opportunity to improve resize performance as the X server could over-allocate window pixmaps during the resize operation, and then shrink them back down once the final size had been selected. We are concerned with the GEM objects backing the Pixmaps, which may be changed at whim by the driver. Huh? We cannot perform simple name exchanges currently in DRI2 because the Damage is badly ordered wrt the Invalidate event and there is no coordination between client - server - compositor on when the buffers are reusable by the client. Right, so we clearly need to pass the backing buffer from application to X server and thence to the compositor. What I'm not sure about is how to name these buffers, and how to scope their lifetime. Here's a quick proposal -- have the X server assign server XIDs to the buffers, and send those XIDs to the compositor in events. Now the compositor is responsible for telling the server (some new 'IdlePixmap' call?) when it finishes with the objects, at which point they can be released back to the application. We'd need some magic to make sure the pixmaps got freed if the compositor crashed, but I think that's easier than trying to figure out how to allocate XIDs in the compositor ID space from within the X server. 
This would replace NameWindowPixmap, and would eliminate the current race conditions between the ConfigureNotify and the NameWindowPixmap call while also providing traceable ownership of the buffer contents:

busy: A
  application: Draw to buffer A
  application: Allocate pixmap ID for buffer A, 'Pixmap Aa'
  application: SwapRegion Pixmap Aa
  X server:    Allocate server ID for Pixmap Aa, 'Pixmap Ax'
  X server:    Send 'window pixmap changed' event 'Pixmap Ax'
  compositor:  Receive event
  compositor:  Convert 'Pixmap Ax' into buffer A using TextureFromPixmap
  compositor:  paint screen using buffer A
  ...

busy: AB
  application: Draw to buffer B
  application: Allocate pixmap ID for buffer B, 'Pixmap Ba'
  application: SwapRegion Pixmap Ba
  X server:    Allocate server ID for Pixmap Ba, 'Pixmap Bx'
  X server:    Send 'window pixmap changed' event 'Pixmap Bx'
  compositor:  Receive event
  compositor:  IdlePixmap Pixmap Ax
  compositor:  Convert 'Pixmap Bx' into buffer B using TextureFromPixmap
  compositor:  paint screen using buffer B

busy: B
  X server:    Mark Pixmap Aa as idle

busy: BC
  application: Draw to buffer C
  application: Allocate pixmap ID for buffer C, 'Pixmap Ca'
  application: SwapRegion Pixmap Ca
  X server:    Reply with Pixmap Aa idle
  X server:    Allocate server ID for Pixmap Ca, 'Pixmap Cx'
  X server:    Send 'window pixmap changed' event 'Pixmap Cx'
  compositor:  Receive event
  compositor:  IdlePixmap Pixmap Bx
  compositor:  Convert 'Pixmap Cx' into buffer C using TextureFromPixmap
  compositor:  paint screen using buffer C

busy: C
  X server:    Mark Pixmap Ba as idle

busy: CA
  application: Draw to buffer A
  application: SwapRegion Pixmap Aa
  X server:    Reply with Pixmap Ba idle
  X server:    Send 'window pixmap changed' event 'Pixmap Ax'
  compositor:  Receive event
  compositor:  IdlePixmap Pixmap Cx
  compositor:  paint screen using buffer A

busy: A
  X server:    Mark Pixmap Ca as idle

At this point, we're in a steady state, using three buffers for the window -- a 'back buffer', a 'front buffer' and an 'idle buffer'. One easy thing for memory usage is to consider idle buffers as suitable for discard in the kernel; that would get us to one pinned buffer in the idle case, although we'd be using three buffers while active. It would be nice to flip between two buffers instead, but there may be compositor rendering traffic in flight using buffer A as the application draws to B and then C. Hrm. 
What we need is for the client to learn that the compositor has marked a buffer idle before it starts drawing; the current design places that information in the reply to SwapRegion,
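The three-buffer steady state laid out in the timeline above can be simulated with a toy model: each SwapRegion takes the newest drawn buffer as the front, the old front becomes idle after the compositor releases it, and the reply to the *next* swap returns that idle buffer to the client. Buffer names and the function name are hypothetical.

```python
# Toy simulation of the back/front/idle steady state described above.
from collections import deque

free = deque(["A", "B", "C"])   # buffers the client may draw into
front = None                    # buffer the compositor currently paints from
pending_idle = []               # idled buffers awaiting the next reply

def swap_region():
    """One SwapRegion round trip: returns the idle list carried in the reply."""
    global front
    reply_idle = pending_idle[:]      # buffers idled since the previous swap
    pending_idle.clear()
    buf = free.popleft()              # the buffer the client just drew into
    if front is not None:
        pending_idle.append(front)    # compositor sends IdlePixmap for old front
    front = buf
    free.extend(reply_idle)           # the reply returns idle buffers to the client
    return reply_idle

assert swap_region() == []            # frame 1: nothing idle yet
assert swap_region() == []            # frame 2: old front not yet reported
assert swap_region() == ["A"]         # steady state: one buffer back per swap
assert swap_region() == ["B"]
```

The two empty replies at startup are exactly why three buffers get allocated before the system reaches equilibrium: the client runs out of fresh buffers before the first idle one comes back.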
Re: Initial DRI3000 protocol specs available
On Tue, Feb 19, 2013 at 7:46 PM, Keith Packard kei...@keithp.com wrote: And here's the Swap extension The Swap Extension Version 1.0 2013-2-14 Keith Packard kei...@keithp.com Intel Corporation 1. Introduction The Swap extension provides GL SwapBuffers semantics to move pixels from a pixmap to a drawable. This can be used by OpenGL implementations or directly by regular applications. 1.1. Acknowledgments Eric Anholt e...@anholt.net Dave Airlie airl...@redhat.com Kristian Høgsberg k...@bitplanet.net ❄ ❄ ❄ ❄ ❄ ❄ ❄ 2. Data Types The server side region support specified in the Xfixes extension version 2 is used in the SwapRegion request. ❄ ❄ ❄ ❄ ❄ ❄ ❄ 4. Errors No errors are defined by the Swap extension. ❄ ❄ ❄ ❄ ❄ ❄ ❄ 5. Events The Swap extension provides a new event, SwapComplete, to signal when a swap operation has finished. ❄ ❄ ❄ ❄ ❄ ❄ ❄ 6. Protocol Types SWAPSELECTMASK { SwapCompleteMask } Used with SwapSelectInput to specify which events a client is to receive. SWAPIDLE { pixmap: PIXMAP valid: BOOL swap-hi: CARD32 swap-lo: CARD32 } This structure contains information about a pixmap which had been used in a SwapRegion request and which the server is now finished with. If valid is TRUE, swap-hi/swap-lo form a 64-bit swap count value from the SwapRegion request which matches the data that the pixmap currently contains. If valid is FALSE, then the contents of the pixmap are undefined. ❄ ❄ ❄ ❄ ❄ ❄ ❄ 7. Extension Initialization The name of this extension is Swap. ┌─── SwapQueryVersion client-major-version: CARD32 client-minor-version: CARD32 ▶ major-version: CARD32 minor-version: CARD32 └─── The client sends the highest supported version to the server and the server sends the highest version it supports, but no higher than the requested version. Major version changes can introduce incompatibilities in existing functionality; minor version changes introduce only backward compatible changes. 
It is the client's responsibility to ensure that the server supports a version which is compatible with its expectations. Backwards compatible changes include the addition of new requests. ❄ ❄ ❄ ❄ ❄ ❄ ❄ 8. Extension Requests ┌─── SwapRegion destination: DRAWABLE region: REGION src-off-x,src-off-y: INT16 source: PIXMAP swap-interval: CARD32 target_msc_hi: CARD32 target_msc_lo: CARD32 divisor_hi: CARD32 divisor_lo: CARD32 remainder_hi: CARD32 remainder_lo: CARD32 ▶ swap_hi: CARD32 swap_lo: CARD32 suggested-x-off,suggested-y-off: INT16 suggested-width,suggested-height: CARD16 idle: LISTofSWAPIDLE └─── Errors: Pixmap, Drawable, Region, Value Schedule a swap of the specified region from the source pixmap to the destination drawable. region specifies the region within the destination to be swapped from the source. src-off-x and src-off-y specify the offset to be added to region to align it with the source pixmap. swap-interval specifies the minimum number of frames since the last SwapRegion request. target_msc_hi/target_msc_lo form a 64-bit value marking the target media stamp count for the swap request. When non-zero, these mark the desired time where the data should be presented. divisor_hi/divisor_lo form a 64-bit value marking the desired media stamp count interval between swaps. remainder_hi/remainder_lo form a 64-bit value marking the desired offset within the divisor_hi/divisor_lo swap interval. In the reply, swap_hi/swap_lo form a 64-bit swap count value when the swap will actually occur (e.g. the last queued swap count + (pending swap count * swap interval)). suggested-width and suggested-height offer a hint as to the best pixmap size to use for full-sized swaps in the future. suggested-x-off and suggested-y-off provide a hint as to where the window contents should be placed within that allocation for future swaps. idle provides a list of pixmaps which were passed in previous SwapRegion requests by this client targeting the same destination. How
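The 64-bit split fields and the target/divisor/remainder scheduling parameters in the request above follow the convention of GLX_OML_sync_control, which these fields mirror. A sketch of how a server might combine them; the exact scheduling policy is of course up to the implementation, and the function names here are illustrative.

```python
# Sketch of the *_hi/*_lo 64-bit split and the OML-style target-MSC rule
# the SwapRegion fields mirror.  Illustrative only.

def join64(hi, lo):
    """Combine a *_hi/*_lo CARD32 pair into one 64-bit value."""
    return (hi << 32) | lo

def effective_target_msc(current, target, divisor, remainder):
    """First MSC at or after both current and target satisfying the divisor rule."""
    msc = max(current, target)
    if divisor == 0:
        return msc
    # advance to the next count congruent to remainder modulo divisor
    if msc % divisor != remainder:
        msc += (remainder - msc) % divisor
    return msc

assert join64(1, 0) == 1 << 32
assert effective_target_msc(100, 0, 0, 0) == 100    # "swap as soon as possible"
assert effective_target_msc(100, 0, 60, 0) == 120   # next multiple of 60
assert effective_target_msc(100, 150, 0, 0) == 150  # explicit target in the future
```

The divisor/remainder pair exists so a client can lock onto a phase of the display refresh (e.g. every other frame) without knowing the absolute MSC in advance.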
Re: Initial DRI3000 protocol specs available
On Tue, Feb 19, 2013 at 07:45:09PM -0800, Keith Packard wrote: ┌─── DRI3BufferFromPixmap pixmap: PIXMAP ▶ depth: CARD8 width, height, stride: CARD16 depth, bpp: CARD8 buffer: FD └─── Errors: Pixmap, Match Pass back a direct rendering object associated with pixmap. Future changes to pixmap will be visible in that direct rendered object. The pixel format and geometry of the buffer are returned along with a file descriptor referencing the underlying direct rendering object. What is the serialization for multiple clients using BufferFromPixmap? (In particular, with a compositor reading from a DRI3 client controlled PixmapFromBuffer.) Do we need an Invalidate for when the GEM object is exchanged for the Pixmap following a Swap (or other external modifications)? Are all operations still implicitly flushed to the GPU before any reply to the Client? Extending this protocol to supersede MIT-SHM would also be useful if it makes the serialization explicit. 11.2 XvMC / Xv It might be nice to be able to reference YUV formatted direct rendered objects from the X server. And for Render, along with passing blobs. -Chris -- Chris Wilson, Intel Open Source Technology Centre
Re: Initial DRI3000 protocol specs available
On Tue, Feb 19, 2013 at 07:46:22PM -0800, Keith Packard wrote: ┌─── SwapRegion destination: DRAWABLE region: REGION src-off-x,src-off-y: INT16 source: PIXMAP swap-interval: CARD32 target_msc_hi: CARD32 target_msc_lo: CARD32 divisor_hi: CARD32 divisor_lo: CARD32 remainder_hi: CARD32 remainder_lo: CARD32 ▶ swap_hi: CARD32 swap_lo: CARD32 suggested-x-off,suggested-y-off: INT16 suggested-width,suggested-height: CARD16 idle: LISTofSWAPIDLE └─── What I don't see here is how the client instructs the server to handle a missed swap. For example, with the typical use of swap-interval 0, divisor = 0, target = current we can choose to either emit this SwapRegion synchronously, or asynchronously (to risk tearing but allow the client to catch up to its target framerate). Actually, there isn't a mention of whether this should be synchronized to the display at all (and how to handle synchronisation across multiple scanouts). What happens for a delayed error? In the reply, swap_hi/swap_lo form a 64-bit swap count value when the swap will actually occur (e.g. the last queued swap count + (pending swap count * swap interval)). I'm not sure exactly what SBC is meant to be. Is it a simple seqno of the SwapRegion in this Drawable's swap queue (why then does swap_interval matter), or is it meant to correlate with the vblank counter (in which case it is merely a predicted value)? -Chris -- Chris Wilson, Intel Open Source Technology Centre
Re: Initial DRI3000 protocol specs available
On 2013-02-19 22:46, Keith Packard wrote: A.3 Protocol Events The Swap extension specifies the SwapComplete event. ┌─── SwapComplete 1 CARD8 type 1 CARD8 extension 2 CARD16 sequenceNumber 4 DRAWABLE drawable 4 CARD32 ust_hi 4 CARD32 ust_lo 4 CARD32 msc_hi 4 CARD32 msc_lo 4 CARD32 sbc_hi 4 CARD32 sbc_lo └─── May I suggest that all new events be Generic Events? One event isn't too bad, but the legacy event space is already crowded. SwapComplete 1 35 GenericEvent 1 CARD8 extension 2 CARD16 sequenceNumber 4 2 length 2 CARD16 evtype 2 unused 4 DRAWABLE drawable 4 CARD32 ust_hi 4 CARD32 ust_lo 4 CARD32 msc_hi 4 CARD32 msc_lo 4 CARD32 sbc_hi 4 CARD32 sbc_lo (I assume extension in the original is a typo. If it isn't and an extra byte of data is needed, it easily fits in the two bytes of unused after evtype). Peter Harris -- Open Text Connectivity Solutions Group http://connectivity.opentext.com/ Research and Development Phone: +1 905 762 6001 phar...@opentext.com Toll Free: 1 877 359 4866
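The Generic Event layout Peter suggests can be checked by packing it: a generic event is a fixed 32-byte event plus `length` * 4 bytes of additional data, so with the six 64-bit timestamp halves the length field is 2 and the wire size is 40 bytes. The field values below are placeholders; only the layout matters.

```python
# Pack the suggested SwapComplete-as-GenericEvent wire layout and verify
# its size: 32 base bytes + length (2) * 4 extra bytes = 40.
import struct

GENERIC_EVENT = 35  # core X11 GenericEvent code

def pack_swap_complete(extension, seq, evtype, drawable, ust, msc, sbc):
    return struct.pack(
        "<BBHIHHIIIIIII",
        GENERIC_EVENT, extension, seq,
        2,              # length: 2 extra 4-byte units beyond the base 32 bytes
        evtype, 0,      # evtype, unused
        drawable,
        ust >> 32, ust & 0xFFFFFFFF,
        msc >> 32, msc & 0xFFFFFFFF,
        sbc >> 32, sbc & 0xFFFFFFFF)

wire = pack_swap_complete(0x80, 1, 0, 0x400001, 123456789, 6000, 42)
assert len(wire) == 32 + 2 * 4   # generic event base + extra data
assert wire[0] == GENERIC_EVENT
```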
Re: Initial DRI3000 protocol specs available
Chris Wilson ch...@chris-wilson.co.uk writes: What is the serialization for multiple clients using BufferFromPixmap? Once they've got a handle to the object, it's really up to them to serialize among themselves. We don't have any control of the underlying direct rendering infrastructure. (In particular, with a compositor reading from a DRI3 client controlled PixmapFromBuffer.) This case is half supported by the Swap semantics -- the pixmap is handed from glxgears to the server with PixmapFromBuffer, then used in a SwapRegion operation. Once given to that, that pixmap is 'busy' until the server releases it back to the client in a future reply to SwapRegion. When the compositor then does a BufferFromPixmap on the same object, we want that pixmap to remain busy until the compositor is done using it. Would it suffice to require that the compositor call FreePixmap to signal that it was done using it? And that leads to another question here -- if we're swapping pixmaps between back buffer and window buffer, how the heck does the compositor learn that the underlying graphics object has changed? Ideally, we'd be able to re-use the same BufferFromPixmap result across multiple frames, but that would mean the compositor would need to be provided new window pixmap IDs. I wonder if we need a Swap event to send new pixmap IDs when the swap happened? That would be pretty easy at least. Those would need to be delivered before any related Damage events so that the compositor would see the new Pixmap ID before it responded to the damage. Do we need an Invalidate for when the GEM object is exchanged for the Pixmap following a Swap (or other external modifications)? I'm afraid I don't understand this question. Are you thinking that pixmap IDs will end up changing which GEM objects they point at? Are all operations still implicitly flushed to the GPU before any reply to the Client? 
Not necessarily flushed to the GPU, of course, but there definitely needs to be some serialization mechanism that multiple DRI clients sharing the same DRI buffer are using, and the X server needs to participate in that serialization mechanism as appropriate for the underlying hardware. That's really up to the specific DRI infrastructure though. We should make this explicit in the DRI3 spec though so that DRI implementations aren't surprised by the requirement again. Extending this protocol to supersede MIT-SHM would also be useful if it makes the serialization explicit. I've hacked up MIT-SHM to use FD passing already. It's nice to have something that can pass *arbitrary* memory mappings instead of just DMA-BUFs. And for Render, along with passing blobs. Yeah, I can easily imagine doing a PictureFromBuffer as well. Let's focus on Pixmaps for now and get Mesa fixed up. -- keith.pack...@intel.com
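The POSIX file-descriptor-passing mechanism DRI3 relies on (SCM_RIGHTS over a Unix-domain socket) can be demonstrated in a few lines, using a pipe as a stand-in for a real DMA-BUF. This needs Linux and Python 3.9+ for `socket.send_fds`/`recv_fds`; it is illustrative only.

```python
# Minimal demonstration of SCM_RIGHTS fd passing, the transport DRI3 uses
# to hand direct rendering buffers between client and server.  A pipe
# stands in for the DMA-BUF.
import os
import socket

server, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

r, w = os.pipe()                                  # stand-in for a DMA-BUF fd
socket.send_fds(client, [b"pixmap"], [w])         # pass the fd with a message

msg, fds, flags, addr = socket.recv_fds(server, 1024, 1)
os.write(fds[0], b"rendered")                     # receiver writes via its copy
assert msg == b"pixmap"
assert os.read(r, 8) == b"rendered"               # both ends share the object
os.close(w)
os.close(fds[0])
os.close(r)
```

The received descriptor is a kernel-level duplicate referring to the same underlying object, which is exactly the property that lets "future changes to pixmap" remain visible through the passed buffer.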
Re: Initial DRI3000 protocol specs available
Stéphane Marchesin stephane.marche...@gmail.com writes: How would you handle atomic swaps? Multiple of these back to back? Do you want to synchronously swap multiple windows? Or swap sections from multiple back buffers? -- keith.pack...@intel.com
Re: Initial DRI3000 protocol specs available
On Wed, Feb 20, 2013 at 12:01 PM, Keith Packard kei...@keithp.com wrote: Stéphane Marchesin stephane.marche...@gmail.com writes: How would you handle atomic swaps? Multiple of these back to back? Do you want to synchronously swap multiple windows? Or swap sections from multiple back buffers? I'm interested in two specific use cases: - Swap to an overlay and flip a crtc in an atomic fashion, - Specify a list of dirty rectangles for a single frame, like what CopyRegion does but with multiple rectangles. Stéphane
Re: Initial DRI3000 protocol specs available
Chris Wilson ch...@chris-wilson.co.uk writes: What I don't see here is how the client instructs the server to handle a missed swap. Right, this first pass was just trying to replicate the DRI2 semantics; figuring out how to improve those seems like a good idea. From what game developers have told us, a missed swap should just tear instead of dropping a frame. It might be nice to inform the client that they're not keeping up with the target frame rate and let them scale stuff back; I'd suggest the SwapComplete event could contain enough information to let them know what actually happened. For example, with the typical use of swap-interval 0, divisor = 0, target = current we can choose to either emit this SwapRegion synchronously, or asynchronously (to risk tearing but allow the client to catch up to its target framerate). I haven't heard anyone asking for us to skip a frame in this case to avoid tearing. Actually, there isn't a mention of whether this should be synchronized to the display at all (and how to handle synchronisation across multiple scanouts). Eric suggested we fix the multi screen problem by making the application figure this out (if they like). I'm thinking we would just add a RandR CRTC to the request, or let the client set it to None if they don't care. What happens for a delayed error? What kind of errors can happen after the request is validated? I'd hope that these cases would be truly exceptional. Realistically, we don't have any place to report these that the DRI library could ever hope to see them reliably. We could report them in an event, but that would need to trust in the kindness of the application to send it along to the library. We could pend the errors and report them in a future SwapRegion reply, but that would presume that the application is continuously rendering frames. In the reply, swap_hi/swap_lo form a 64-bit swap count value when the swap will actually occur (e.g. 
the last queued swap count + (pending swap count * swap interval)). I'm not sure exactly what SBC is meant to be. Is it a simple seqno of the SwapRegion in this Drawable's swap queue (why then does swap_interval matter), or is it meant to correlate with the vblank counter (in which case it is merely a predicted value)? Just like DRI2, it's the planned swap time. Don't be late! -- keith.pack...@intel.com
Re: Initial DRI3000 protocol specs available
Peter Harris phar...@opentext.com writes: May I suggest that all new events be Generic Events? One event isn't too bad, but the legacy event space is already crowded. Yes, of course. I didn't worry too much about the encoding part, I'm afraid :-) -- keith.pack...@intel.com
Re: Initial DRI3000 protocol specs available
Stéphane Marchesin stephane.marche...@gmail.com writes: I'm interested in two specific use cases: - Swap to an overlay and flip a crtc in an atomic fashion, As you may remember, I proposed a bunch of RandR changes to support per-CRTC pixmaps and atomic mode setting operations a while back. With hardware now commonly supporting multiple overlays, even that stuff wouldn't suffice anymore. Off the top of my head, we'd need to construct some Drawable that represented each overlay, and then perform a PolySwapRegion operation to synchronously update their contents from appropriate back buffers. - Specify a list of dirty rectangles for a single frame, like what CopyRegion does but with multiple rectangles. And they're not arranged so that a single region and source offset x/y could be used? I can imagine creating a SwapRectangles request, but I don't know that it would be any better than simply executing multiple SwapRegion requests. -- keith.pack...@intel.com
Re: Initial DRI3000 protocol specs available
On Wed, Feb 20, 2013 at 12:42 PM, Keith Packard kei...@keithp.com wrote: Stéphane Marchesin stephane.marche...@gmail.com writes: I'm interested in two specific use cases: - Swap to an overlay and flip a crtc in an atomic fashion, As you may remember, I proposed a bunch of RandR changes to support per-CRTC pixmaps and atomic mode setting operations a while back. With hardware now commonly supporting multiple overlays, even that stuff wouldn't suffice anymore. Off the top of my head, we'd need to construct some Drawable that represented each overlay, and then perform a PolySwapRegion operation to synchronously update their contents from appropriate back buffers. Right, that's what I'm after. If you have a bunch of GL surfaces you're rendering to, a main drawable and 2 overlays, I'd like the ability to swap to arbitrary overlays or to my main surface. Of course the GL extension for that is still TBD, but having the ability in DRI3 would be a nice start. - Specify a list of dirty rectangles for a single frame, like what CopyRegion does but with multiple rectangles. And they're not arranged so that a single region and source offset x/y could be used? I can imagine creating a SwapRectangles request, but I don't know that it would be any better than simply executing multiple SwapRegion requests. Well, if you have vsync enabled for your CopyRegion implementation, then you'll need to vsync for each region, right? What I'm after is a 'swap all these regions together, vsync only once' type of thing. Stéphane
Re: Initial DRI3000 protocol specs available
On Wed, Feb 20, 2013 at 11:55:56AM -0800, Keith Packard wrote: Chris Wilson ch...@chris-wilson.co.uk writes: Do we need an Invalidate for when the GEM object is exchanged for the Pixmap following a Swap (or other external modifications)? I'm afraid I don't understand this question. Are you thinking that pixmap IDs will end up changing which GEM objects they point at? You managed to ask yourself the question I was trying to lead you to: how the heck does the compositor learn that the underlying graphics object has changed? In DRI2 this is through the InvalidateEvent, and the lack of being able to send those from the driver before the Damage is sent is one of the reasons why the current exchange mechanism is broken. Getting buffer exchanges working in conjunction with the external compositor is more or less as tricky as it gets. The notion that the buffer is kept busy by the compositor and so prevents the DRI3 client from overdrawing it is key. And that naturally leads to the compositor needing to release the old buffer once it is referencing the new post-swap buffer. Serialisation between rendering of the common buffers is definitely s.e.p. I agree that should solve the compositor problem. -Chris -- Chris Wilson, Intel Open Source Technology Centre
Re: Initial DRI3000 protocol specs available
On 02/20/2013 09:27 PM, Keith Packard wrote: Chris Wilson ch...@chris-wilson.co.uk writes: What I don't see here is how the client instructs the server to handle a missed swap. Right, this first pass was just trying to replicate the DRI2 semantics; figuring out how to improve those seems like a good idea. From what game developers have told us, a missed swap should just tear instead of dropping a frame. It might be nice to inform the client that they're not keeping up with the target frame rate and let them scale stuff back; I'd suggest the SwapComplete event could contain enough information to let them know what actually happened. Please make this configurable. Tearing makes sense for a game, but for the kind of scientific apps I do, we don't want it to tear ever, or bad things would happen for us. We need it to just flip the frame delayed but vsync'ed and then the app can figure out via the INTEL_swap_events or glXWaitForSbcOML() that a deadline was missed and what to do to catch up. There's http://www.opengl.org/registry/specs/EXT/glx_swap_control_tear.txt that allows apps to define if they want to tear or vsync on a missed swap deadline. For example, with the typical use of swap-interval 0, divisor = 0, target = current we can choose to either emit this SwapRegion synchronously, or asynchronously (to risk tearing but allow the client to catch up to its target framerate). I haven't heard anyone asking for us to skip a frame in this case to avoid tearing. See above :-). ... In the reply, swap_hi/swap_lo form a 64-bit swap count value when the swap will actually occur (e.g. the last queued swap count + (pending swap count * swap interval)). I'm not sure exactly what SBC is meant to be. Is it a simple seqno of the SwapRegion in this Drawable's swap queue (why then does swap_interval matter), or is it meant to correlate with the vblank counter (in which case it is merely a predicted value)? Just like DRI2, it's the planned swap time. Don't be late! 
SBC in DRI2 is the running count of completed swaps for a drawable, i.e., current swap count + pending swap count, not the planned swap time. Essentially it is a reference to a just-queued swap via sbc = glXSwapBuffersMscOML(...), so you can use sbc as a unique id for that swap in glXWaitForSbcOML() or to match it up with the sbc in a returned INTEL_swap_event. Very useful, so I guess this should stay backwards-compatible.

thanks,
-mario

___ xorg-devel@lists.x.org: X.Org development Archives: http://lists.x.org/archives/xorg-devel Info: http://lists.x.org/mailman/listinfo/xorg-devel
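Mario's description of the SBC bookkeeping can be sketched as follows. This is an illustrative model of the counting rule only (the struct and function names are invented, not from any real driver):

```c
#include <stdint.h>

/* Hypothetical model of DRI2-style SBC bookkeeping. */
typedef struct {
    uint64_t completed; /* swaps already finished */
    uint64_t pending;   /* swaps queued but not yet finished */
} swap_counts;

/* Queue a swap; returns the SBC identifying it, i.e. the running
 * count of swaps (completed + pending) once this one is queued. */
static uint64_t queue_swap(swap_counts *c)
{
    c->pending++;
    return c->completed + c->pending;
}

/* Retire the oldest pending swap; returns the SBC to report in the
 * completion event (e.g. an INTEL_swap_event). */
static uint64_t complete_swap(swap_counts *c)
{
    c->pending--;
    c->completed++;
    return c->completed;
}
```

Because each queued swap gets a unique, monotonically increasing SBC, the value returned at queue time can later be matched against the SBC carried in the completion event, which is exactly the glXWaitForSbcOML() usage Mario describes.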
Re: Initial DRI3000 protocol specs available
On 02/19/13 19:45, Keith Packard wrote:

Here's the spec for DRI3:

The DRI3 Extension
Version 1.0
2013-2-19
Keith Packard kei...@keithp.com
Intel Corporation

1. Introduction

The DRI3 extension provides mechanisms to translate between direct rendered buffers and X pixmaps. When combined with the Swap extension, a complete direct rendering solution for OpenGL is provided. The direct rendered buffers are passed across the protocol via standard POSIX file descriptor passing mechanisms. On Linux, these buffers are DMA-BUF objects.

1.1. Acknowledgments

Eric Anholt e...@anholt.net
Dave Airlie airl...@redhat.com
Kristian Høgsberg k...@bitplanet.net

❄ ❄ ❄ ❄ ❄ ❄ ❄

2. Data Types

The DRI3 extension uses the RandR extension Provider to select among multiple GPUs on a single screen.

❄ ❄ ❄ ❄ ❄ ❄ ❄

4. Errors

No errors are defined by the DRI3 extension.

❄ ❄ ❄ ❄ ❄ ❄ ❄

5. Events

No events are defined by the DRI3 extension.

❄ ❄ ❄ ❄ ❄ ❄ ❄

6. Protocol Types

DRI3DRIVER { DRI3DriverDRI DRI3DriverVDPAU }

These values describe the type of driver the client will want to load. The server sends back the name of the driver to use for the screen in question.

❄ ❄ ❄ ❄ ❄ ❄ ❄

7. Extension Initialization

The name of this extension is DRI3 (third time is the charm?).

┌───
    DRI3QueryVersion
        client-major-version: CARD32
        client-minor-version: CARD32
      ▶
        major-version: CARD32
        minor-version: CARD32
└───

The client sends the highest supported version to the server and the server sends the highest version it supports, but no higher than the requested version. Major version changes can introduce incompatibilities in existing functionality; minor version changes introduce only backward compatible changes. It is the client's responsibility to ensure that the server supports a version which is compatible with its expectations. Backward compatible changes include the addition of new requests.

❄ ❄ ❄ ❄ ❄ ❄ ❄

8.
Extension Requests

┌───
    DRI3Open
        drawable: DRAWABLE
        driverType: DRI3DRIVER
        provider: PROVIDER
      ▶
        driver: STRING
        device: FD
└───
Errors: Drawable, Value, Match

This requests that the X server open the direct rendering device associated with drawable, driverType and RandR provider. The provider must support SourceOutput or SourceOffload. The direct rendering library used to implement the specified driverType is returned in the driver value. The file descriptor for the device is returned in FD.

┌───
    DRI3PixmapFromBuffer
        pixmap: PIXMAP
        drawable: DRAWABLE
        width, height, stride: CARD16

Why is there a stride here if all it is is an indirect way of calculating a total size? If the total size is what the server cares about, then it seems like the client should just send that. Not all tiled formats fit nicely into a height * stride = total equation with stride being an integer.

        depth, bpp: CARD8
        buffer: FD
└───
Errors: Alloc, Drawable, IDChoice, Value, Match

Creates a pixmap for the direct rendering object associated with buffer. width, height and stride specify the geometry (in pixels) of the underlying buffer. The pixels within the buffer may not be arranged in a simple linear fashion, but the total byte size of the buffer must be height * stride * bpp / 8. Precisely how any additional information about the buffer is shared is outside the scope of this extension.

If buffer cannot be used with the screen associated with drawable, a Match error is returned. If depth or bpp are not supported by the screen, a Value error is returned.

┌───
    DRI3BufferFromPixmap
        pixmap: PIXMAP
      ▶
        width, height, stride: CARD16
        depth, bpp: CARD8
        buffer: FD
└───
Errors: Pixmap, Match

Pass back a direct rendering object associated with pixmap. Future changes to pixmap will be visible in that direct rendered object. The pixel format and geometry of the buffer are returned along with a file descriptor referencing the underlying direct rendering object.

❄ ❄ ❄ ❄ ❄ ❄ ❄

9.
Extension Events

The DRI3 extension defines no events.

❄ ❄ ❄ ❄ ❄ ❄ ❄

10. Extension Versioning
Re: Initial DRI3000 protocol specs available
Chris Wilson ch...@chris-wilson.co.uk writes:

You managed to ask yourself the question I was trying to lead to: how the heck does the compositor learn that the underlying graphics object has changed?

It can certainly tell that the underlying contents have changed with Damage events, but as to how it knows that it should do another BufferFromPixmap request, I think that's gonna require another event. Now, the big question is how to deal with compositing managers which don't know about DRI3. I suspect we'll just have to skip the DMA-BUF swapping hack for window pixmaps unless the compositor gives us the OK to do so.

In DRI2 this is through the InvalidateEvent, and the lack of being able to send those from the driver before the Damage is sent is one of the reasons why the current exchange mechanism is broken.

Hrm. With the current system, except for override-redirect windows, if the compositor is also the window manager, it should always know when the window pixmap is going to be replaced, because that only happens when the window is resized, and the window manager is entirely responsible for making that happen. For override-redirect windows, if the ConfigureNotify event was delivered before the Damage event, then the compositor could know about that as well. I guess I'd like to know more about what is broken with the current system for compositors...

In any case, as the underlying DMA-BUF is changing, the compositor is going to need to know that so it can release the old pixmap back to the application, and so it can rewire its own compositing operations to use the new object. It might be nice to have the compositor use persistent names for the various DMA-BUFs that are used for a particular window. I think that means having the compositor hold on to old DMA-BUF window pixmap IDs. That seems tricky though. The alternative will be to have the compositor create/destroy a pixmap ID per frame. Not intolerable, but not optimal.
Getting buffer exchanges working in conjunction with the external compositor is more or less as tricky as it gets. The notion that the buffer is kept busy by the compositor and so prevents the DRI3 client from overdrawing it is key. And that naturally leads to the compositor needing to release the old buffer once it is referencing the new post-swap buffer.

Right, an easy technique there would be to have it use NameWindowPixmap when it got an event telling it that a new pixmap was in use for the window, and then when it was finished with that pixmap, it could just use FreePixmap to tell the server it was done. That would bump the refcnt down to one in the server, at which point it could queue the pixmap to be sent as 'idle' the next time SwapRegion was called.

Serialisation between rendering of the common buffers is definitely s.e.p. I agree that should solve the compositor problem.

Ok, cool. So, changes that I think are needed:

1) If someone calls TextureFromPixmap on a window pixmap, we need to suppress the window pixmap swapping hack. Alternatively, we can have the compositor explicitly enable window pixmap swapping.

2) We need to send an event when the buffer underlying a window switches.

3) We need to be explicit about event ordering between the new window pixmap change notify event and any related Damage.

-- keith.pack...@intel.com
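The NameWindowPixmap/FreePixmap reference-counting dance described above can be sketched as a toy model (the struct and helper names here are invented for illustration, not server code): the server holds one reference to the window pixmap, the compositor's NameWindowPixmap adds a second, and when FreePixmap drops the count back to one, the pixmap becomes eligible to be reported idle on the next SwapRegion.

```c
#include <stdbool.h>

/* Toy model of the pixmap lifetime described in the thread. */
typedef struct {
    int refcnt;        /* server reference plus any client names */
    bool idle_queued;  /* ready to be reported in a SwapRegion reply */
} pixmap;

/* Compositor takes a name for the window pixmap: one more reference. */
static void name_window_pixmap(pixmap *p)
{
    p->refcnt++;
}

/* Compositor is done with the old pixmap; when only the server's
 * reference remains, queue it as 'idle' for the next SwapRegion. */
static void free_pixmap(pixmap *p)
{
    p->refcnt--;
    if (p->refcnt == 1)
        p->idle_queued = true;
}
```

The point of the design is that the client never learns about the compositor directly; it simply sees the pixmap come back in the idle list once every outstanding reference has been dropped.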
Re: Initial DRI3000 protocol specs available
Aaron Plattner aplatt...@nvidia.com writes:

Why is there a stride here if all it is is an indirect way of calculating a total size? If the total size is what the server cares about, then it seems like the client should just send that.

I don't need the size, I need the stride. I could just assert that width/height are sufficient to compute the stride on both sides of the wire, but we've had image format adventures in the past of this sort and I'd like to make sure we have sufficient information to reconstruct the precise layout. The i915 kernel driver can tell me the tiling format of a GEM buffer, but it refuses to hand back the stride, so I stuck this into the protocol as it will be useful for at least that chip.

Not all tiled formats fit nicely into a height * stride = total equation with stride being an integer.

Sure, I'd imagine it'd be something like (tiles-wide * tile-width * tiles-high * tile-height) where tiles-wide and tiles-high are big enough to cover the specified image size. I'd obviously prefer to not just pass a pile of driver-specific data in this request, and 'stride' feels like it's skating close to that edge.

-- keith.pack...@intel.com
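The two sizing rules under discussion — linear height * stride versus Keith's tile-aligned formula — can be sketched as follows. The tile dimensions used in the example are made-up values, not any particular GPU's format:

```c
#include <stdint.h>

/* Linear layout: total bytes = height * stride, where stride is the
 * byte pitch of one row (width * bpp / 8, possibly rounded up by the
 * driver). */
static uint64_t linear_size(uint32_t height, uint32_t stride)
{
    return (uint64_t)height * stride;
}

/* Tiled layout, per the formula above: round width and height up to
 * whole tiles, i.e. (tiles-wide * tile-width) * (tiles-high *
 * tile-height) * bytes-per-pixel, with tiles-wide/tiles-high just
 * big enough to cover the image. */
static uint64_t tiled_size(uint32_t width, uint32_t height,
                           uint32_t tile_w, uint32_t tile_h, uint32_t cpp)
{
    uint64_t tiles_wide = (width  + tile_w - 1) / tile_w;
    uint64_t tiles_high = (height + tile_h - 1) / tile_h;
    return tiles_wide * tile_w * tiles_high * tile_h * cpp;
}
```

Aaron's objection is visible in the second function: for a 1000-pixel-wide image with 128-pixel tiles, the effective pitch is 1024 pixels, which no integer CARD16 stride in a height * stride equation would describe exactly.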
Re: Initial DRI3000 protocol specs available
Stéphane Marchesin stephane.marche...@gmail.com writes:

On Wed, Feb 20, 2013 at 12:42 PM, Keith Packard kei...@keithp.com wrote:

Stéphane Marchesin stephane.marche...@gmail.com writes: I'm interested in two specific use cases: - Swap to an overlay and flip a crtc in an atomic fashion,

As you may remember, I proposed a bunch of RandR changes to support per-CRTC pixmaps and atomic mode setting operations a while back. With hardware now commonly supporting multiple overlays, even that stuff wouldn't suffice anymore. Off the top of my head, we'd need to construct some Drawable that represented each overlay, and then perform a PolySwapRegion operation to synchronously update their contents from appropriate back buffers.

Right, that's what I'm after. If you have a bunch of GL surfaces you're rendering to, a main drawable and 2 overlays, I'd like the ability to swap to arbitrary overlays or to my main surface. Of course the GL extension for that is still TBD, but having the ability in DRI3 would be a nice start.

Having an actual API to design to would be a huge help though; I suspect anything we do in advance will just get messed up by the GL ARB.

Well, if you have vsync enabled for your CopyRegion implementation, then you'll need to vsync for each region, right? What I'm after is a "swap all these regions together, vsync only once" type of thing.

Oh. I've been focused on the GL swapbuffers APIs. SwapRegion isn't a general CopyRegion operation. I've neglected to write down some of the more important semantics which underlie the goals of this work.
For SwapRegion, I want to be able to require that the X server always be free to just swap the entire contents of the source buffer with the destination buffer -- the region is just the 'damaged' area within the window; areas outside of that don't *need* to be copied from the new buffer, but the client guarantees that the entire buffer contains the correct contents for the window and that only the area within the specified region differs from the current window contents.

For a compositing manager, you really do want to pull data from all of the window pixmaps and paint them into the frame buffer in one giant operation. The usual way of doing this is to construct the whole next screen frame in a new single image and then use SwapRegion to get that onto the screen. Then the individual updates could use a sequence of SwapRegion operations to construct that intermediate buffer; once that was ready, a single SwapRegion would move that to the scanout buffer. Presumably that final SwapRegion would be a simple page flip operation in the driver, so it would take no time or memory bandwidth.

It might be fun to figure out how to bypass the intermediate back buffer though, and for that we'd need some complicated PolySwapRegion request that queued up all of the changes in one giant request to be executed at the right time, but that seems like something that wouldn't match how I imagine compositing managers working today.

-- keith.pack...@intel.com
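The SwapRegion contract Keith describes — the server may either copy just the region or exchange whole buffers, with identical visible results — can be illustrated with a toy one-dimensional model. Everything here is invented for illustration; the point is only that the client's guarantee (the source is fully correct, and only pixels inside the region differ from the window) makes the two server strategies equivalent:

```c
#include <string.h>

#define W 8 /* toy window width, in pixels */

/* Server option 1: copy only the damaged region [r0, r1). */
static void swap_copy(int win[W], const int src[W], int r0, int r1)
{
    for (int i = r0; i < r1; i++)
        win[i] = src[i];
}

/* Server option 2: exchange (take) the entire source buffer. */
static void swap_exchange(int win[W], const int src[W])
{
    memcpy(win, src, sizeof(int) * W);
}
```

If the client's guarantee is violated (the source differs from the window outside the region), the two strategies diverge — which is exactly why the spec needs to state the guarantee explicitly.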
Re: Initial DRI3000 protocol specs available
Mario Kleiner mario.kleiner...@gmail.com writes:

On 02/20/2013 09:27 PM, Keith Packard wrote: Chris Wilson ch...@chris-wilson.co.uk writes: What I don't see here is how the client instructs the server to handle a missed swap.

Right, this first pass was just trying to replicate the DRI2 semantics; figuring out how to improve those seems like a good idea. From what game developers have told us, a missed swap should just tear instead of dropping a frame. It might be nice to inform the client that they're not keeping up with the target frame rate and let them scale stuff back; I'd suggest the SwapComplete event could contain enough information to let them know what actually happened.

Please make this configurable. Tearing makes sense for a game, but for the kind of scientific apps I do, we don't want it to ever tear, or bad things would happen for us. We need it to just flip the frame delayed but vsync'ed, and then the app can figure out via INTEL_swap_events or glXWaitForSbcOML() that a deadline was missed and what to do to catch up. There's http://www.opengl.org/registry/specs/EXT/glx_swap_control_tear.txt, which allows apps to define whether they want to tear or vsync on a missed swap deadline.

Oh, that's ugly -- it uses negative values for the interval. We could cook up some suitable 'missed frame' mode parameter to make it match that extension, and then create a similar EGL extension.

SBC in DRI2 is the running count of completed swaps for a drawable, i.e., current swap count + pending swap count, not the planned swap time. Essentially it is a reference to a just-queued swap via sbc = glXSwapBuffersMscOML(...), so you can use sbc as a unique id for that swap in glXWaitForSbcOML() or to match it up with the sbc in a returned INTEL_swap_event. Very useful, so I guess this should stay backwards-compatible.

That's definitely the plan -- we need whatever parameters are necessary to implement the relevant GL specs. If we can't do that, it's not useful to anyone.
-- keith.pack...@intel.com
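The negative-interval convention from EXT_swap_control_tear that Keith finds ugly can be sketched as a small interpreter. The struct is hypothetical, but the sign convention follows the extension: a negative swap interval means "sync to vblank, but if the swap is late, tear instead of waiting for the next frame", with the absolute value as the minimum interval:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical decoded form of an EXT_swap_control_tear interval. */
struct swap_policy {
    int  min_interval;  /* minimum frames between swaps */
    bool tear_if_late;  /* adaptive vsync: tear on a missed deadline */
};

/* Decode the signed interval per the extension's convention. */
static struct swap_policy interpret_interval(int interval)
{
    struct swap_policy p;
    p.tear_if_late = interval < 0;
    p.min_interval = abs(interval);
    return p;
}
```

Mario's request — never tear, just flip late — corresponds to a non-negative interval here; a separate 'missed frame' mode parameter, as Keith suggests, would express the same choice without overloading the sign bit.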
Re: Initial DRI3000 protocol specs available
On Wed, Feb 20, 2013 at 10:50 PM, Keith Packard kei...@keithp.com wrote:

Stéphane Marchesin stephane.marche...@gmail.com writes:

On Wed, Feb 20, 2013 at 12:42 PM, Keith Packard kei...@keithp.com wrote:

Stéphane Marchesin stephane.marche...@gmail.com writes: I'm interested in two specific use cases: - Swap to an overlay and flip a crtc in an atomic fashion,

As you may remember, I proposed a bunch of RandR changes to support per-CRTC pixmaps and atomic mode setting operations a while back. With hardware now commonly supporting multiple overlays, even that stuff wouldn't suffice anymore. Off the top of my head, we'd need to construct some Drawable that represented each overlay, and then perform a PolySwapRegion operation to synchronously update their contents from appropriate back buffers.

Right, that's what I'm after. If you have a bunch of GL surfaces you're rendering to, a main drawable and 2 overlays, I'd like the ability to swap to arbitrary overlays or to my main surface. Of course the GL extension for that is still TBD, but having the ability in DRI3 would be a nice start.

Having an actual API to design to would be a huge help though; I suspect anything we do in advance will just get messed up by the GL ARB.

I don't think we need to involve the ARB just yet; the copy sub buffer is a MESA extension and never went to the ARB, so I don't see that as a problem. With that said, I don't think it's that difficult/different. I can design a GLX extension spec and send a draft, then we can work from there.

Well, if you have vsync enabled for your CopyRegion implementation, then you'll need to vsync for each region, right? What I'm after is a "swap all these regions together, vsync only once" type of thing.

Oh. I've been focused on the GL swapbuffers APIs. SwapRegion isn't a general CopyRegion operation. I've neglected to write down some of the more important semantics which underlie the goals of this work.
For SwapRegion, I want to be able to require that the X server always be free to just swap the entire contents of the source buffer with the destination buffer -- the region is just the 'damaged' area within the window; areas outside of that don't *need* to be copied from the new buffer, but the client guarantees that the entire buffer contains the correct contents for the window and that only the area within the specified region differs from the current window contents.

For a compositing manager, you really do want to pull data from all of the window pixmaps and paint them into the frame buffer in one giant operation.

That is actually not what you want, because it is a waste of bandwidth. Since compositors are typically bandwidth limited, you instead want to paint only the relevant sub-regions. Those are easy to determine by transforming X damage regions into screen coordinates. Most non-trivial compositing managers are already using partial update schemes through GLX_MESA_copy_sub_buffer or the GLX_EXT_buffer_age extension + copies. I don't think it is far-fetched to support a list of rectangles instead.

Stéphane

The usual way of doing this is to construct the whole next screen frame in a new single image and then use SwapRegion to get that onto the screen. Then the individual updates could use a sequence of SwapRegion operations to construct that intermediate buffer; once that was ready, a single SwapRegion would move that to the scanout buffer. Presumably that final SwapRegion would be a simple page flip operation in the driver, so it would take no time or memory bandwidth.

It might be fun to figure out how to bypass the intermediate back buffer though, and for that we'd need some complicated PolySwapRegion request that queued up all of the changes in one giant request to be executed at the right time, but that seems like something that wouldn't match how I imagine compositing managers working today.
-- keith.pack...@intel.com
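The buffer-age partial-update scheme Stéphane refers to boils down to a simple rule: if the back buffer's age is N (its contents are from N frames ago), the compositor must repaint the union of the damage from the last N frames; an age of 0 means the contents are undefined and everything must be repainted. A sketch using a bitmask over screen tiles to stand in for damage regions (purely illustrative):

```c
#include <stdint.h>

#define MAX_FRAMES 8

/* damage[i] is a bitmask of screen tiles damaged i+1 frames ago.
 * Returns the tiles that must be repainted into a buffer of the
 * given age, per the GLX_EXT_buffer_age repaint rule. */
static uint32_t repaint_mask(const uint32_t damage[MAX_FRAMES],
                             int buffer_age)
{
    if (buffer_age == 0 || buffer_age > MAX_FRAMES)
        return 0xFFFFFFFFu; /* unknown contents: repaint everything */

    uint32_t mask = 0;
    for (int i = 0; i < buffer_age; i++)
        mask |= damage[i];  /* union of damage since this buffer */
    return mask;
}
```

This is why a bandwidth-limited compositor prefers buffer age plus copies over full-frame swaps: in the steady state the repaint mask covers only the windows that actually changed.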
Re: Initial DRI3000 protocol specs available
Here's the spec for DRI3:

The DRI3 Extension
Version 1.0
2013-2-19
Keith Packard kei...@keithp.com
Intel Corporation

1. Introduction

The DRI3 extension provides mechanisms to translate between direct rendered buffers and X pixmaps. When combined with the Swap extension, a complete direct rendering solution for OpenGL is provided. The direct rendered buffers are passed across the protocol via standard POSIX file descriptor passing mechanisms. On Linux, these buffers are DMA-BUF objects.

1.1. Acknowledgments

Eric Anholt e...@anholt.net
Dave Airlie airl...@redhat.com
Kristian Høgsberg k...@bitplanet.net

❄ ❄ ❄ ❄ ❄ ❄ ❄

2. Data Types

The DRI3 extension uses the RandR extension Provider to select among multiple GPUs on a single screen.

❄ ❄ ❄ ❄ ❄ ❄ ❄

4. Errors

No errors are defined by the DRI3 extension.

❄ ❄ ❄ ❄ ❄ ❄ ❄

5. Events

No events are defined by the DRI3 extension.

❄ ❄ ❄ ❄ ❄ ❄ ❄

6. Protocol Types

DRI3DRIVER { DRI3DriverDRI DRI3DriverVDPAU }

These values describe the type of driver the client will want to load. The server sends back the name of the driver to use for the screen in question.

❄ ❄ ❄ ❄ ❄ ❄ ❄

7. Extension Initialization

The name of this extension is DRI3 (third time is the charm?).

┌───
    DRI3QueryVersion
        client-major-version: CARD32
        client-minor-version: CARD32
      ▶
        major-version: CARD32
        minor-version: CARD32
└───

The client sends the highest supported version to the server and the server sends the highest version it supports, but no higher than the requested version. Major version changes can introduce incompatibilities in existing functionality; minor version changes introduce only backward compatible changes. It is the client's responsibility to ensure that the server supports a version which is compatible with its expectations. Backward compatible changes include the addition of new requests.

❄ ❄ ❄ ❄ ❄ ❄ ❄

8.
Extension Requests

┌───
    DRI3Open
        drawable: DRAWABLE
        driverType: DRI3DRIVER
        provider: PROVIDER
      ▶
        driver: STRING
        device: FD
└───
Errors: Drawable, Value, Match

This requests that the X server open the direct rendering device associated with drawable, driverType and RandR provider. The provider must support SourceOutput or SourceOffload. The direct rendering library used to implement the specified driverType is returned in the driver value. The file descriptor for the device is returned in FD.

┌───
    DRI3PixmapFromBuffer
        pixmap: PIXMAP
        drawable: DRAWABLE
        width, height, stride: CARD16
        depth, bpp: CARD8
        buffer: FD
└───
Errors: Alloc, Drawable, IDChoice, Value, Match

Creates a pixmap for the direct rendering object associated with buffer. width, height and stride specify the geometry (in pixels) of the underlying buffer. The pixels within the buffer may not be arranged in a simple linear fashion, but the total byte size of the buffer must be height * stride * bpp / 8. Precisely how any additional information about the buffer is shared is outside the scope of this extension.

If buffer cannot be used with the screen associated with drawable, a Match error is returned. If depth or bpp are not supported by the screen, a Value error is returned.

┌───
    DRI3BufferFromPixmap
        pixmap: PIXMAP
      ▶
        width, height, stride: CARD16
        depth, bpp: CARD8
        buffer: FD
└───
Errors: Pixmap, Match

Pass back a direct rendering object associated with pixmap. Future changes to pixmap will be visible in that direct rendered object. The pixel format and geometry of the buffer are returned along with a file descriptor referencing the underlying direct rendering object.

❄ ❄ ❄ ❄ ❄ ❄ ❄

9. Extension Events

The DRI3 extension defines no events.

❄ ❄ ❄ ❄ ❄ ❄ ❄

10. Extension Versioning

The DRI3 extension is adapted from the DRI2 extension.

1.0: First published version

❄ ❄ ❄ ❄ ❄ ❄ ❄

11.
Relationship with other extensions

As an extension designed to support other extensions, there are naturally some interactions with other extensions.

11.1 GLX

GLX has no direct relation with DRI3, but a direct rendering OpenGL
Re: Initial DRI3000 protocol specs available
And here's the Swap extension:

The Swap Extension
Version 1.0
2013-2-14
Keith Packard kei...@keithp.com
Intel Corporation

1. Introduction

The Swap extension provides GL SwapBuffers semantics to move pixels from a pixmap to a drawable. This can be used by OpenGL implementations or directly by regular applications.

1.1. Acknowledgments

Eric Anholt e...@anholt.net
Dave Airlie airl...@redhat.com
Kristian Høgsberg k...@bitplanet.net

❄ ❄ ❄ ❄ ❄ ❄ ❄

2. Data Types

The server side region support specified in the Xfixes extension version 2 is used in the SwapRegion request.

❄ ❄ ❄ ❄ ❄ ❄ ❄

4. Errors

No errors are defined by the Swap extension.

❄ ❄ ❄ ❄ ❄ ❄ ❄

5. Events

The Swap extension provides a new event, SwapComplete, to signal when a swap operation has finished.

❄ ❄ ❄ ❄ ❄ ❄ ❄

6. Protocol Types

SWAPSELECTMASK { SwapCompleteMask }

Used with SwapSelectInput to specify which events a client is to receive.

SWAPIDLE {
    pixmap: PIXMAP
    valid: BOOL
    swap-hi: CARD32
    swap-lo: CARD32
}

This structure contains information about a pixmap which had been used in a SwapRegion request and which the server is now finished with. If valid is TRUE, swap-hi/swap-lo form a 64-bit swap count value from the SwapRegion request which matches the data that the pixmap currently contains. If valid is FALSE, then the contents of the pixmap are undefined.

❄ ❄ ❄ ❄ ❄ ❄ ❄

7. Extension Initialization

The name of this extension is Swap.

┌───
    SwapQueryVersion
        client-major-version: CARD32
        client-minor-version: CARD32
      ▶
        major-version: CARD32
        minor-version: CARD32
└───

The client sends the highest supported version to the server and the server sends the highest version it supports, but no higher than the requested version. Major version changes can introduce incompatibilities in existing functionality; minor version changes introduce only backward compatible changes. It is the client's responsibility to ensure that the server supports a version which is compatible with its expectations.
Backward compatible changes include the addition of new requests.

❄ ❄ ❄ ❄ ❄ ❄ ❄

8. Extension Requests

┌───
    SwapRegion
        destination: DRAWABLE
        region: REGION
        src-off-x, src-off-y: INT16
        source: PIXMAP
        swap-interval: CARD32
        target_msc_hi: CARD32
        target_msc_lo: CARD32
        divisor_hi: CARD32
        divisor_lo: CARD32
        remainder_hi: CARD32
        remainder_lo: CARD32
      ▶
        swap_hi: CARD32
        swap_lo: CARD32
        suggested-x-off, suggested-y-off: INT16
        suggested-width, suggested-height: CARD16
        idle: LISTofSWAPIDLE
└───
Errors: Pixmap, Drawable, Region, Value

Schedule a swap of the specified region from the source pixmap to the destination drawable. region specifies the region within the destination to be swapped from the source. src-off-x and src-off-y specify the offset to be added to region to align it with the source pixmap. swap-interval specifies the minimum number of frames since the last SwapRegion request.

target_msc_hi/target_msc_lo form a 64-bit value marking the target media stamp count for the swap request. When non-zero, these mark the desired time when the data should be presented. divisor_hi/divisor_lo form a 64-bit value marking the desired media stamp count interval between swaps. remainder_hi/remainder_lo form a 64-bit value marking the desired offset within the divisor_hi/divisor_lo swap interval.

In the reply, swap_hi/swap_lo form a 64-bit swap count value for when the swap will actually occur (e.g. the last queued swap count + (pending swap count * swap interval)). suggested-width and suggested-height offer a hint as to the best pixmap size to use for full-sized swaps in the future. suggested-x-off and suggested-y-off provide a hint as to where the window contents should be placed within that allocation for future swaps. idle provides a list of pixmaps which were passed in previous SwapRegion requests by this client targeting the same destination.

┌───
    SwapSelectInput
        drawable: DRAWABLE
        mask: SETofSWAPSELECTMASK
└───
Errors: Drawable

Selects which Swap events are to be delivered to the
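The hi/lo CARD32 pairs in the request above encode 64-bit values with a simple shift-and-or, and the target/divisor/remainder triple follows the OML_sync_control scheduling rule (if divisor is zero, swap at the first msc >= target; otherwise swap at the first msc >= target whose remainder modulo divisor matches). A sketch of both, as a reading of the semantics rather than normative code:

```c
#include <stdint.h>

/* Reassemble a 64-bit protocol value from its hi/lo CARD32 halves. */
static uint64_t pack64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* OML_sync_control-style target rule: the msc at which the swap
 * becomes eligible, given the current msc and the request fields. */
static uint64_t compute_swap_msc(uint64_t current, uint64_t target,
                                 uint64_t divisor, uint64_t remainder)
{
    uint64_t msc = current > target ? current : target;
    if (divisor == 0)
        return msc;              /* swap at the first msc >= target */
    while (msc % divisor != remainder)
        msc++;                   /* advance to the matching phase */
    return msc;
}
```

The divisor/remainder form lets a client lock its swaps to a sub-multiple of the refresh rate (for example, every tenth frame at a fixed phase) without knowing the absolute frame count in advance.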