Re: Initial DRI3000 protocol specs available

2013-04-02 Thread James Jones



On 03/21/2013 03:41 PM, Keith Packard wrote:

James Jones jajo...@nvidia.com writes:


If you associate an X Fence Sync with your swap operation, the driver
has the option to trigger it directly from the client command stream and
wake up only the applications waiting for that fence.  The compositor,
if using GL, could have received the swap notification event and already
programmed the response compositing based on it before the swap even
completes, and just insert a token to make the GPU or kernel wait for
the fence to complete before executing the compositing rendering
commands.


Sorry for the long lag; I've been thinking about this quite a bit.


Sorry for the lag in my response as well.


I went and read through your earlier proposal as well as reading through
the related GL fence extensions and I think this is what we want in
general terms. There are two distinct times of interest here:

  1) When the buffer is free and can be used for another frame.

  2) When the buffer contents are visible on the screen. (This
 needs some weasel wording so that the system can do things like
 severely limit background applications or invisible applications.)

Providing X Fence Sync objects for each of these times seems like it
will give applications what they need.

The second issue is that we must relate these X Fence Sync objects to
direct rendering so that clients using shared objects outside of the X
protocol can synchronize their operations with the X server. For that, I
think we can share a page between application and X server that contains
all of the necessary fence information along with a pthreads semaphore
object that can lock access to the fence and also provide a way to block
until the X Fence Sync object is signaled.


There's some work going on to get something very much like fence sync 
objects into the Linux kernel as well, I think starting with something 
from the Android kernel trees, and with input from some people working 
on fence objects usable to synchronize access to dmabuf objects.  It 
might make sense to use these primitives rather than shared 
memory+pthreads to share the syncs between direct rendering clients and 
the X server.  At the very least, it would be nice to have an 
abstraction where the implementation details of the sync objects weren't 
directly exposed by the API.



This eliminates the explicit Swap events and replaces them with Sync
objects.

I'll write this up in the extension description and try to get some code
written in the next couple of weeks.



I like the sound of the overall direction.  Looking forward to what you 
come up with.


Thanks,
-James
___
xorg-devel@lists.x.org: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: http://lists.x.org/mailman/listinfo/xorg-devel


Re: Initial DRI3000 protocol specs available

2013-04-02 Thread Keith Packard
James Jones jajo...@nvidia.com writes:

 There's some work going on to get something very much like fence sync 
 objects into the Linux kernel as well, I think starting with something 
 from the Android kernel trees, and with input from some people working 
 on fence objects usable to synchronize access to dmabuf objects.

I'll see if I can't find out what they're up to.

 It might make sense to use these primitives rather than shared
 memory+pthreads to share the syncs between direct rendering clients
 and the X server.

I can start prototyping with shared memory and the current DRI kernel
drivers as that will not require any updates to the kernel -- as DRI
provides serialization guarantees across multiple applications, I can
perform all of the fence operations strictly from user-mode and have
things work correctly on DRI-based hardware. As such, that clearly means
that these would be 'DRI' fences, and not some more general DMA-BUF
fences.

 At the very least, it would be nice to have an 
 abstraction where the implementation details of the sync objects weren't 
 directly exposed by the API.

I think we can use your existing X Sync fence stuff, or something quite
similar, for an X API. The DRM APIs hide underneath the DRM libraries
(vdpau, vaapi, opengl), and so how that extension works isn't directly
visible to applications.

 I like the sound of the overall direction.  Looking forward to what you 
 come up with.

I'm typing; the current short-term goal is client-side buffer
allocation, shared memory fences and using CopyArea for the presentation
portion of the work. That neatly divides the design into the DRM half
and the presentation/swap-buffers half.

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-03-21 Thread Keith Packard
James Jones jajo...@nvidia.com writes:

 If you associate an X Fence Sync with your swap operation, the driver 
 has the option to trigger it directly from the client command stream and 
 wake up only the applications waiting for that fence.  The compositor, 
 if using GL, could have received the swap notification event and already 
 programmed the response compositing based on it before the swap even 
 completes, and just insert a token to make the GPU or kernel wait for 
 the fence to complete before executing the compositing rendering
 commands.

Sorry for the long lag; I've been thinking about this quite a bit. I
went and read through your earlier proposal as well as reading through
the related GL fence extensions and I think this is what we want in
general terms. There are two distinct times of interest here:

 1) When the buffer is free and can be used for another frame.

 2) When the buffer contents are visible on the screen. (This
needs some weasel wording so that the system can do things like
severely limit background applications or invisible applications.)

Providing X Fence Sync objects for each of these times seems like it
will give applications what they need.

The second issue is that we must relate these X Fence Sync objects to
direct rendering so that clients using shared objects outside of the X
protocol can synchronize their operations with the X server. For that, I
think we can share a page between application and X server that contains
all of the necessary fence information along with a pthreads semaphore
object that can lock access to the fence and also provide a way to block
until the X Fence Sync object is signaled.
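
As a rough illustration of the shared-page idea, here is a minimal sketch, assuming a process-shared mutex and condition variable stand in for the semaphore mentioned above. The struct layout and the shm_fence_* names are hypothetical and not part of any X or DRI API. A real client/server pair would presumably map a file descriptor passed between them; the anonymous mapping here is only visible to the creating process and its children.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical contents of the page shared by the client and the X server. */
struct shm_fence {
    pthread_mutex_t lock;      /* serializes access to the fence state */
    pthread_cond_t  wake;      /* lets waiters block until triggered */
    bool            triggered;
};

/* Map a shared page and set up process-shared primitives in it. */
struct shm_fence *shm_fence_create(void)
{
    struct shm_fence *f = mmap(NULL, sizeof *f, PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (f == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t ma;
    pthread_condattr_t ca;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&f->lock, &ma);
    pthread_cond_init(&f->wake, &ca);
    pthread_mutexattr_destroy(&ma);
    pthread_condattr_destroy(&ca);
    f->triggered = false;
    return f;
}

/* Signal the fence and wake every waiter. */
void shm_fence_trigger(struct shm_fence *f)
{
    pthread_mutex_lock(&f->lock);
    f->triggered = true;
    pthread_cond_broadcast(&f->wake);
    pthread_mutex_unlock(&f->lock);
}

/* Block until the fence has been triggered. */
void shm_fence_await(struct shm_fence *f)
{
    pthread_mutex_lock(&f->lock);
    while (!f->triggered)
        pthread_cond_wait(&f->wake, &f->lock);
    pthread_mutex_unlock(&f->lock);
}

/* Non-blocking check of the fence state. */
bool shm_fence_query(struct shm_fence *f)
{
    pthread_mutex_lock(&f->lock);
    bool t = f->triggered;
    pthread_mutex_unlock(&f->lock);
    return t;
}

/* Return the fence to the untriggered state for reuse. */
void shm_fence_reset(struct shm_fence *f)
{
    pthread_mutex_lock(&f->lock);
    f->triggered = false;
    pthread_mutex_unlock(&f->lock);
}
```

The query path never blocks, which matters for clients that only want to poll; only shm_fence_await() sleeps.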

This eliminates the explicit Swap events and replaces them with Sync
objects.

I'll write this up in the extension description and try to get some code
written in the next couple of weeks.

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-03-11 Thread Ian Romanick

On 02/19/2013 07:45 PM, Keith Packard wrote:


11.1 GLX

GLX has no direct relation with DRI3, but a direct rendering OpenGL
application will presumably use both, and target


Was there supposed to be more of this sentence?



Re: Initial DRI3000 protocol specs available

2013-03-11 Thread Ian Romanick

On 02/20/2013 10:50 PM, Keith Packard wrote:

Stéphane Marchesin stephane.marche...@gmail.com writes:


On Wed, Feb 20, 2013 at 12:42 PM, Keith Packard kei...@keithp.com wrote:

Stéphane Marchesin stephane.marche...@gmail.com writes:


I'm interested in two specific use cases:
- Swap to an overlay and flip a crtc in an atomic fashion,


As you may remember, I proposed a bunch of RandR changes to support
per-CRTC pixmaps and atomic mode setting operations a while back. With
hardware now commonly supporting multiple overlays, even that stuff
wouldn't suffice anymore.

Off the top of my head, we'd need to construct some Drawable that
represented each overlay, and then perform a PolySwapRegion operation to
synchronously update their contents from appropriate back buffers.


Right, that's what I'm after. If you have a bunch of GL surfaces
you're rendering to, a main drawable and 2 overlays, I'd like the
ability to swap to arbitrary overlays or to my main surface. Of course
the GL extension for that is still TBD, but having the ability in DRI3
would be a nice start.


Having an actual API to design to would be a huge help though; I suspect
anything we do in advance will just get messed up by the GL ARB.


Well, if you have vsync enabled for your CopyRegion implementation,
then you'll need to vsync for each region right? What I'm after is a
swap all these regions together, vsync only once type of thing.


Oh. I've been focused on the GL swapbuffers APIs. SwapRegion isn't a
general CopyRegion operation. I've neglected to write down some of the
more important semantics which underlie the goals of this work.


EGL_NOK_swap_region (supported by Mesa) allows specifying multiple 
subrectangles to swap together.


EGLAPI EGLBoolean EGLAPIENTRY eglSwapBuffersRegionNOK(EGLDisplay dpy, 
EGLSurface surface, EGLint numRects, const EGLint* rects);



For SwapRegion, I want to be able to require that the X server always be
free to just swap the entire contents of the source buffer with the
destination buffer -- the region is just the 'damaged' area within the
window; areas outside of that don't *need* to be copied from the new
buffer, but the client guarantees that the entire buffer contain the
correct contents for the window and that only the area within the
specified region differs from the current window contents.
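
To make the guarantee concrete, here is a toy sketch, assuming tiny byte buffers and a single rectangle in place of a full X region. The names present_by_copy and present_by_swap are hypothetical. It shows that, if the client keeps its promise about the source buffer, copying only the damaged area and exchanging the whole buffers leave the window with the same contents.

```c
#include <string.h>

enum { W = 4, H = 4 };              /* toy buffer size (assumption) */

struct rect { int x, y, w, h; };    /* one rectangle standing in for the region */

/* Option 1: copy only the damaged area from the new buffer into the window. */
void present_by_copy(unsigned char window[H][W],
                     unsigned char source[H][W], struct rect damage)
{
    for (int y = damage.y; y < damage.y + damage.h; y++)
        memcpy(&window[y][damage.x], &source[y][damage.x], (size_t)damage.w);
}

/* Option 2: exchange the two buffers outright, as a page flip would. */
void present_by_swap(unsigned char (**window)[W], unsigned char (**source)[W])
{
    unsigned char (*tmp)[W] = *window;
    *window = *source;
    *source = tmp;
}
```

Either path is only valid because everything outside the damaged rectangle already matches; a client that leaves stale pixels there would see them appear whenever the server picks the swap path.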

For a compositing manager, you really do want to pull data from all of
the window pixmaps and paint them into the frame buffer in one giant
operation. The usual way of doing this is to construct the whole next
screen frame in a new single image and then use SwapRegion to get that
onto the screen. Then the individual updates could use a sequence of
SwapRegion operations to construct that intermediate buffer; once that
was ready, a single SwapRegion would move that to the scanout buffer.

Presumably that final SwapRegion would be a simple page flip operation
in the driver, so it would take no time or memory bandwidth.

It might be fun to figure out how to bypass the intermediate back buffer
though, and for that we'd need some complicated PolySwapRegion request
that queued up all of the changes in one giant request to be executed at
the right time, but that seems like something that wouldn't match how I
imagine compositing managers working today.



Re: Initial DRI3000 protocol specs available

2013-03-11 Thread Ian Romanick

On 02/20/2013 02:46 AM, Chris Wilson wrote:

On Tue, Feb 19, 2013 at 07:46:22PM -0800, Keith Packard wrote:

┌───
 SwapRegion
destination: DRAWABLE
region: REGION
src-off-x,src-off-y: INT16
source: PIXMAP
swap-interval: CARD32
target_msc_hi: CARD32
target_msc_lo: CARD32
divisor_hi: CARD32
divisor_lo: CARD32
remainder_hi: CARD32
remainder_lo: CARD32
   ▶
swap_hi: CARD32
swap_lo: CARD32
suggested-x-off,suggested-y-off: INT16
suggested-width,suggested-height: CARD16
idle: LISTofSWAPIDLE
└───


What I don't see here is how the client instructs the server to
handle a missed swap. For example, with the typical use of
   swap-interval > 0, divisor = 0, target <= current
we can choose to either emit this SwapRegion synchronously, or
asynchronously (to risk tearing but allow the client catch up to its
target framerate). Actually, there isn't a mention of whether this
should be synchronized to the display at all (and how to handle
synchronisation across multiple scanouts).


Applications really want this.  We've had several major ISVs chastise us 
for not supporting GLX_EXT_swap_control_tear.


http://www.opengl.org/registry/specs/EXT/glx_swap_control_tear.txt
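
For readers who have not looked at that extension, here is a small sketch of one reading of its rule: a negative interval means "synchronize normally, but if the frame is already late, swap at once and accept tearing". The function and enum names are hypothetical, and frames_since_last_swap is assumed to be measured at the moment the new frame is ready.

```c
#include <stdlib.h>

enum swap_mode {
    SWAP_IMMEDIATE,   /* not synchronized to the display, may tear */
    SWAP_AT_VBLANK    /* synchronized, waits for the blanking interval */
};

/* Decide how to present a frame that has just become ready. */
enum swap_mode choose_swap_mode(int interval, int frames_since_last_swap)
{
    if (interval == 0)
        return SWAP_IMMEDIATE;          /* no throttling requested */

    /* Negative interval: the deadline was missed, so let the client
     * catch up instead of waiting for another full frame. */
    if (interval < 0 && frames_since_last_swap >= abs(interval))
        return SWAP_IMMEDIATE;

    return SWAP_AT_VBLANK;
}
```

With interval = -1 an application that keeps up never tears, and one that falls behind tears instead of dropping to half the refresh rate.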


What happens for a delayed error?


In the reply, swap_hi/swap_lo form a 64-bit swap count value
when the swap will actually occur (e.g.  the last queued swap
count + (pending swap count * swap interval)).
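
The hi/lo pairs and the example formula in that text can be sketched as follows. The helper names are hypothetical; only the arithmetic is taken from the quoted spec wording.

```c
#include <stdint.h>

/* Combine two CARD32 halves (e.g. swap_hi/swap_lo) into one 64-bit value. */
uint64_t card64_join(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Split a 64-bit value back into the two CARD32 halves sent on the wire. */
void card64_split(uint64_t value, uint32_t *hi, uint32_t *lo)
{
    *hi = (uint32_t)(value >> 32);
    *lo = (uint32_t)(value & 0xffffffffu);
}

/* The example from the spec text: last queued swap count plus
 * (pending swap count * swap interval). */
uint64_t predicted_swap_count(uint64_t last_queued, uint64_t pending,
                              uint32_t swap_interval)
{
    return last_queued + pending * swap_interval;
}
```

Note that the result depends on the swap interval, which is what makes the value look like a prediction tied to the vblank counter and not a plain sequence number.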


I'm not sure exactly what SBC is meant to be. Is it a simple seqno of
the SwapRegion in this Drawable's swap queue (why then does
swap_interval matter), or is it meant to correlate with the vblank
counter (in which case it is merely a predicted value)?
-Chris



Re: Initial DRI3000 protocol specs available

2013-03-11 Thread Keith Packard
Ian Romanick i...@freedesktop.org writes:

 EGL_NOK_swap_region (supported by Mesa) allows specifying multiple 
 subrectangles to swap together.

An X region consists of multiple rectangles, but with a common offset --
I can't see from this specification whether these rectangles have that
property, or if we need slightly different semantics.

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-03-11 Thread Ian Romanick

On 02/26/2013 01:01 PM, Owen Taylor wrote:

Sorry for joining the discussion late here. I went through the specs and
discussion and have various comments and questions.

- Owen

* It would be great if we could figure out a plan to get to the
   point where the exact same application code is going to work for
   proprietary and open source drivers. When you get down to the details
   of swap this isn't close to the case currently.

   - Because swap is handled client side in some drivers, INTEL_swap_event
 is seen as awkward to implement.

   - There is divergence on some basic behaviors, e.g.,  whether
 glXSwapBuffers() glFinish() waits for the swap to complete or not.

   - When rendering with a compositor, the X server is innocent of
 relevant information about timing and when the application should
 draw additional new frames. I've been working on handling this
 via client <=> compositor protocols

(https://mail.gnome.org/archives/wm-spec-list/2013-January/msg0.html)

 But this adds a lot of complexity to the minimal client, especially
 when a client wants to work both redirected and unredirected.

   I think it would be great if we could sit down and figure out what
   the Linux-ecosystem API is for this in a way we could give to
   application authors.

* One significant problem that I have currently is that the default mode
   for the Intel drivers is to use triple buffering and send back swap
   events *when rendering the next frame would not block* - that is,
   immediately. This results in a frame of unnecessary latency.
   (The too-early events are also missing ust/msc/sbc information.)

   So I'd like to make sure that we know exactly what SwapComplete means
   and not have creative reinterpretations based on what works well
   for one client or another.

   The SwapComplete event is specified as - "This event is delivered
   when a SwapRegion operation completes" - but the specification
   of SwapRegion itself is fuzzy enough that I'm unclear exactly what
   that means.

   - The description of SwapRegion needs to define "swap" since the
 operation has only a vague resemblance to the English-language
 meaning of "swap".


Maybe call it PresentRegion instead?  Swap is another one of those 
overloaded terms in graphics.  It's not quite as bad as normal, but 
it's pretty close.



   - My interpretation of SwapRegion is that the actual propagation of
 source to destination is *asynchronous* to the X protocol stream.
 This is implied by "Schedule a swap..." but probably should be
 explicitly stated, since it is radically different from other
 rendering in X.

   - Is the serial in the SwapComplete event synchronous to the protocol
 stream? E.g., can you assume that any CopyArea from the destination
 drawable before that serial will get the old contents, and a
 CopyArea from the destination after that serial will get the new
 contents?

   - What happens when multiple SwapRegion requests are made with a
 swap-interval of zero. Are previous ones discarded?

   - Is it an error to render to a non-idle pixmap? Is it an error to
 pass a non-idle pixmap as the source to SwapRegion?

   - What's the interaction between swap-interval and target-msc, etc?

   - When a window is redirected, what's the interpretation of
 swap-interval, target-msc, etc? Is it that the server performs the
 operation at the selected blanking interval (as if the window
 wasn't redirected), and then damage/other events are generated
 and the server picks it up and renders to the real front buffer
 at the next opportunity - usually a frame later?

* Do I understand correctly that the idle pixmaps returned from
   a SwapRegion request are the pixmaps that *will* be idle once the
   corresponding SwapComplete event is received?

   If this is correct, what happens if things change before the swap
   actually happens and what was scheduled as a swap ends up being
   a copy? Is it sufficient to assume that a ConfigureNotify on a
   destination window means that all pixmaps passed to previous
   SwapRegion requests are now idle?

* In the definition of SWAPIDLE you say:

 If valid is TRUE, swap-hi/swap-lo form a 64-bit
 swap count value from the SwapRegion request which matches the
 data that the pixmap currently contains

  If I'm not misunderstanding things, this is a confusing statement
  because, leaving aside damage to the front buffer, pixmaps always
  contain the same contents (whatever the client rendered into it.)

  Is the use of swap-hi/swap-lo to identify the SwapRegion problematic
  in the case where swaps aren't throttled? Would it be better to use
  the sequence number of the request? Or is the pixmap itself sufficient?

* What control, if any, will applications have over the number of
   buffers used - what the behavior will be when an application starts
   rendering another frame in terms of allocating a new buffer versus
   

Re: Initial DRI3000 protocol specs available

2013-03-08 Thread James Jones

On 03/07/2013 05:17 PM, Keith Packard wrote:

James Jones jajo...@nvidia.com writes:


There didn't seem to be much interest outside of NVIDIA, so
besides fence sync, the ideas are tabled internally ATM.


This shouldn't surprise you though -- no-one else needs this kind of
synchronization, so it's really hard for anyone to evaluate it. And,
DRI2 offers 'sufficient' support for the various GL sync extensions.


I was referring to the multi-buffer/tear-free presentation part, not the 
synchronization parts.  I'm still rather surprised everyone thinks 
implicit synchronization is a good idea though.  I don't think we're the 
only ones that have loosely defined command buffer processing in HW 
anymore.  Meh.



So, what I'd like to know is if you think nVidia could take advantage of
the Swap extension so that nVidia 3D applications could do the whole
Swap redirect plan? If so, then I'm a lot more interested in figuring
out how we can get apps using the necessary fencing to actually make it
work right.


Sorry, I've been ignoring this thread because of the DRI3000 title, so I 
missed the point where it defined a generic swap mechanism in X 
protocol.  From my reading, applications do roughly this in the spec:


Pixmap pix[N] = MakeListOfPixmaps(N);
Window win = MakeWindow();
int n = 0;

while (1) {
  // Stuff here to ensure pix[n] is idle.
  Render(pix[n]);
  SwapRegion(win, pix[n]);
  n = (n + 1) % N;  // Advance to the next pixmap in the ring.
}

I think I saw in one branch of the thread that you might allow 
redirecting the swap request out to a composite manager rather than 
processing in X.


Basically that's what I proposed (and Aaron presented some of at XDC) a 
few years ago and got no feedback.  However, my full proposal included:


-setting the list of pixmaps associated with a window up front, so that 
the composite manager or GL applications could query them and do work 
once to bind them in to GL.  This is pretty expensive.  With your 
proposal, this could probably be done lazily and tracked in a cache-type 
thing, but if applications wanted to be dumb and generate a new pixmap 
for every frame, nothing is stopping them.  Applications would do 
something like:


Pixmap pix[N];
XCreateWindowPixmaps(win, N, pix /* out */);

up front, and composite managers would get an event notifying them that 
win now has those N pixmaps associated with it.  Swaps would be done 
by indexing into that array rather than sending the actual pixmap ID.
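
A minimal sketch of that index-based scheme, assuming a fixed set of three pixmaps per window and the simplification that the previously presented pixmap becomes idle as soon as a new one is presented. The struct and function names are hypothetical; XCreateWindowPixmaps above is likewise a proposed call, not an existing Xlib function.

```c
#include <stdbool.h>

enum { N = 3 };                     /* pixmaps per window (assumption) */

struct window_pixmaps {
    unsigned long id[N];            /* stand-ins for the Pixmap XIDs */
    bool          busy[N];          /* true while the server may still read it */
    int           front;            /* index last presented, -1 if none */
};

/* Fill in the list once, up front, as the proposal describes. */
void window_pixmaps_init(struct window_pixmaps *wp, unsigned long first_id)
{
    for (int i = 0; i < N; i++) {
        wp->id[i] = first_id + (unsigned long)i;
        wp->busy[i] = false;
    }
    wp->front = -1;
}

/* Return the index of an idle pixmap to render into, or -1 if none. */
int window_pixmaps_acquire(const struct window_pixmaps *wp)
{
    for (int i = 0; i < N; i++)
        if (!wp->busy[i])
            return i;
    return -1;
}

/* Present by index; the pixmap it displaces goes idle (simplification). */
void window_pixmaps_present(struct window_pixmaps *wp, int index)
{
    wp->busy[index] = true;
    if (wp->front >= 0 && wp->front != index)
        wp->busy[wp->front] = false;
    wp->front = index;
}
```

Because the set is fixed, a compositor could bind each pixmap into GL once and reuse the binding for every later swap that names the same index.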


-Using sync object lists in place of all the hard-coded timing 
information.  We've never been a fan of the OML style swap timing 
semantics.  It doesn't line up well with our HW.  Why not allow 
arbitrary fence objects to dictate when the swapping occurs?  Then apps 
that just want a simple vsync can just send a vsync fence.  Apps that 
want exact timing can query what types of counters are available and get 
exact timing on different HW that supports different timers.


-I had a bunch of GLX proposals to solve that mess.

-Redirecting present operations (or swaps) to the composite manager 
was central to the proposal.


It looks like a lot of the details and pseudo-code didn't make it in the 
final public presentation, just a high-level overview.  I'll see if I 
can dig up more of that.  Here's the URL to the presentation.  Just skip 
all the fence sync parts.


http://people.freedesktop.org/~aplattner/x-presentation-and-synchronization.pdf

Thanks,
-James


Re: Initial DRI3000 protocol specs available

2013-03-08 Thread Daniel Stone
Hi,

On 7 March 2013 17:17, Keith Packard kei...@keithp.com wrote:

 James Jones jajo...@nvidia.com writes:

  There didn't seem to be much interest outside of NVIDIA, so
  besides fence sync, the ideas are tabled internally ATM.

 This shouldn't surprise you though -- no-one else needs this kind of
 synchronization, so it's really hard for anyone to evaluate it. And,
 DRI2 offers 'sufficient' support for the various GL sync extensions.


At least for TEXTURE_2D, sure.  TEXTURE_EXTERNAL still requires
synchronisation (i.e. fences) beyond 'just smash some commands at the
kernel and flush on context switches and we're all good'.

Cheers,
Daniel

Re: Initial DRI3000 protocol specs available

2013-03-08 Thread Kristian Høgsberg
On Fri, Mar 8, 2013 at 3:59 AM, James Jones jajo...@nvidia.com wrote:
 On 03/07/2013 05:17 PM, Keith Packard wrote:

 James Jones jajo...@nvidia.com writes:

 There didn't seem to be much interest outside of NVIDIA, so
 besides fence sync, the ideas are tabled internally ATM.


 This shouldn't surprise you though -- no-one else needs this kind of
 synchronization, so it's really hard for anyone to evaluate it. And,
 DRI2 offers 'sufficient' support for the various GL sync extensions.

 I was referring to the multi-buffer/tear-free presentation part, not the
 synchronization parts.  I'm still rather surprised everyone thinks implicit
 synchronization is a good idea though.  I don't think we're the only ones
 that have loosely defined command buffer processing in HW anymore.  Meh.

It's not that other hw don't have that (or even other drivers for your
hw, ie nouveau).  Serializing through the kernel execution manager
lets the kernel know the expected order of rendering.  If
rendering in hw queue A depends on a result from a hw queue B (B
renders to buffer, A textures from same buffer),  the kernel can
insert synchronization primitives to ensure that the A queue doesn't
proceed before the B queue signals the fence.  If A and B queues don't
have any inter-dependencies, no synchronization is necessary and they
can run in parallel or out of order.
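
The bookkeeping behind that can be sketched roughly as below, assuming each buffer remembers which queue last wrote it and at what sequence number. All names are hypothetical and no real kernel interface is being described; the point is only that a wait is inserted when a buffer crosses queues and never otherwise.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-buffer record kept by the (hypothetical) execution manager. */
struct buffer {
    int      last_writer_queue;     /* -1 if never written */
    uint64_t last_write_seqno;
};

/* What the submitting queue must wait for before running, if anything. */
struct sync_point {
    bool     needed;
    int      queue;
    uint64_t seqno;
};

/* Record that a job on 'queue' with sequence number 'seqno' writes buf. */
void submit_write(struct buffer *buf, int queue, uint64_t seqno)
{
    buf->last_writer_queue = queue;
    buf->last_write_seqno = seqno;
}

/* A job on 'queue' is about to read buf: is a cross-queue wait required? */
struct sync_point submit_read(const struct buffer *buf, int queue)
{
    struct sync_point sp = { false, -1, 0 };

    /* Work within one queue is already ordered, so only a different
     * writer queue forces synchronization. */
    if (buf->last_writer_queue >= 0 && buf->last_writer_queue != queue) {
        sp.needed = true;
        sp.queue = buf->last_writer_queue;
        sp.seqno = buf->last_write_seqno;
    }
    return sp;
}
```

Here queue B rendering to a buffer that queue A then textures from yields a wait on B's sequence number, while two queues touching unrelated buffers get none.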

Kristian


Re: Initial DRI3000 protocol specs available

2013-03-07 Thread Aaron Plattner

On 03/06/2013 10:35 PM, Keith Packard wrote:

Owen Taylor otay...@redhat.com writes:


A complex scheme where the compositor and the server collaborate on the
implementation of SwapRegion seems fragile to me, and still doesn't get
some details right - like the swap count returned from SwapRegion.

What if we made SwapRegion redirectable along the lines of
ResizeRedirectMask? Since it would be tricky to block the client calling
SwapRegion until the compositor responded, this would probably require
removing the reply to SwapRegion and sending everything needed back in
events.


When I first read this a week ago, I thought this was a crazy plan; but
upon reflection, I think this is exactly the right direction. I've
written up a blog posting in more detail about that here:

 http://keithp.com/blogs/composite-swap/


  SwapScheduled - whatever information is available immediately on
  receipt of SwapRegion


I think this can still be in the reply to SwapRegion itself; essentially
all we're returning is the swap-hi/swap-lo numbers and a suggestion for
future buffer allocation sizes. We could place the buffer size hints in
a separate event, but I don't think they're that critical; it's just a
hint, and we'll get it right after a couple of swaps once the user stops
moving the window around anyways.


  SwapIdle  - a buffer is returned to the application for rendering
  SwapComplete  - the swap actually happened and we know the
  msc/sbc/ust triple


Yup. The blog posting suggests how the Complete event might be delayed
until the Compositor gets the content up onto the screen itself.


If I'm understanding this correctly, this requires the X server to 
receive a notification from the GPU that the swap is complete so it can 
send the SwapComplete event.  Is there any chance this could be done 
with a Fence instead?  The application could specify the fence in the 
Swap request, and then use that fence to block further rendering on the 
GPU or wait on the fence from the CPU.  We typically try to do the 
scheduling on the GPU when possible because triggering an interrupt and 
waking up the X server burns power and adds latency for no good reason.



I also think that SwapIdle should *not* be an event. Instead, the client
should mark its pixmap as 'becomes idle upon swap'; on redirection, the
compositor ends up holding the last 'it's not idle yet' bit, and when it
does the 'becomes idle upon swap', then the buffer goes idle.

The client must then tell the server to un-idle the pixmap, and that
request will return whether the contents were preserved or not. This has
to be synchronous or huge races will persist.
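
One way to picture that lifecycle is the small state machine below. It is a sketch under assumptions: the field and function names are hypothetical, and the idea that the server may discard an idle pixmap's contents is inferred from the un-idle request reporting whether they were preserved.

```c
#include <stdbool.h>

struct pixmap_state {
    bool idle_on_swap;     /* client marked it 'becomes idle upon swap' */
    bool idle;             /* nobody is displaying or reading it */
    bool contents_valid;   /* false once the contents have been discarded */
};

/* Client side: mark the pixmap when handing it over for presentation. */
void pixmap_mark_idle_on_swap(struct pixmap_state *p)
{
    p->idle_on_swap = true;
}

/* The swap that displaces this pixmap has happened. */
void pixmap_swapped_out(struct pixmap_state *p)
{
    if (p->idle_on_swap) {
        p->idle = true;
        p->idle_on_swap = false;
    }
}

/* Server side: contents may only be thrown away while the pixmap is idle. */
void server_discard(struct pixmap_state *p)
{
    if (p->idle)
        p->contents_valid = false;
}

/* Synchronous un-idle request; the return value is the reply telling the
 * client whether the old contents survived. */
bool pixmap_unidle(struct pixmap_state *p)
{
    bool preserved = p->contents_valid;
    p->idle = false;
    p->contents_valid = true;   /* the client owns and refills it from here */
    return preserved;
}
```

Making un-idle a round trip is what closes the race: once the reply arrives the pixmap is no longer idle, so the discard path cannot fire underneath the client.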


But I don't know that you need that much granularity. I think SwapIdle
and SwapComplete are sufficient.


As above, SwapIdle isn't good enough, an explicit un-idle request is required.


Tricky parts:

  * Not leaking buffers during redirection/unredirection could be tricky.
What if the compositor exits while a client is waiting for a
SwapIdle? An event when swap is redirected/unredirected is probably
necessary.


When the Compositor exits, the X server will know all of the pending
SwapRegion requests and can 'unredirect' them easily enough.

I don't want to tell apps when they're getting redirected/unredirected,
and I don't think it's necessary.


  * To make this somewhat safe, the concept of idle has to be one of
correct display not system stability. It can't take down the system
if the compositor sends SwapIdle at the wrong time.


See above.


  * Because the SBC is a drawable attribute it's a little complex to
continue having the right value over swap redirection.

 When a window is swap-redirected, we say that the SBC is
 incremented by one every time the redirecting client calls
 SwapRegion, and never otherwise. A query is provided for the
 current value.


We could simply decouple these values and just have a 'swap count'
associated with the window which is used to mark pixmap contents when
'UnIdled'.


  * It doesn't make sense to have both the server and the compositor
scheduling stuff. I think you'd specify that once you swap
redirect a window, it gets simple:


Good point. The redirected swap event should contain all of the swap
parameters so that the Compositor can appropriately schedule the window
swap with the matching screen swap.


Actually, from the compositor's perspective, the window's front
buffer doesn't matter, but you probably need to keep it current
to make screenshot tools, etc, work correctly.


My swap redirect plan has that pixmap getting swapped at the same time
the screen pixmap is swapped, so things will look 'right'.


Is this better than a more collaborative approach where the server and
compositor together determine what pixmaps are idle?


Idleness is certainly a joint prospect, but I don't think it's
cooperative. Instead, a pixmap is idle 

Re: Initial DRI3000 protocol specs available

2013-03-07 Thread Keith Packard
Aaron Plattner aplatt...@nvidia.com writes:

 If I'm understanding this correctly, this requires the X server to 
 receive a notification from the GPU that the swap is complete so it can 
 send the SwapComplete event.  Is there any chance this could be done 
 with a Fence instead?  The application could specify the fence in the 
 Swap request, and then use that fence to block further rendering on the 
 GPU or wait on the fence from the CPU.

From what I've heard from application developers, there are two
different operations here:

 1) Throttle application rendering to avoid racing ahead of the screen

 2) Keeping the screen up-to-date with simple application changes, but
not any faster than frame rate.

The SwapComplete event is designed for this second operation. Imagine a
terminal emulator; it doesn't want to draw any faster than frame rate,
but any particular frame can be drawn in essentially zero time. This
application doesn't want to *block* at all, it wants to keep processing
external events, like getting terminal output and user input events. As
I understand it, a HW fence would cause the terminal emulator to stall
down in the driver, blocking processing of all of the events and
terminal output.

For simple application throttling, that wouldn't use these SwapComplete
events, rather it would use whatever existing mechanisms exist for
blocking rendering to limit the application frame rate.
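
The terminal emulator case can be sketched as a tiny event-driven state machine. This is an illustration under assumptions: the names are hypothetical, and setting swap_pending stands in for actually issuing a SwapRegion request. Output events only mark the window dirty, and a new frame is drawn only when no swap is outstanding, so the application never blocks and never draws faster than SwapComplete events arrive.

```c
#include <stdbool.h>

struct term {
    bool dirty;          /* output has arrived since the last frame */
    bool swap_pending;   /* a swap was issued, SwapComplete not yet seen */
    int  frames_drawn;
};

/* Draw at most one frame, and only if nothing is in flight. */
static void term_maybe_draw(struct term *t)
{
    if (t->dirty && !t->swap_pending) {
        t->dirty = false;
        t->swap_pending = true;   /* stands in for issuing SwapRegion */
        t->frames_drawn++;
    }
}

/* Terminal output or user input arrived; never blocks. */
void term_on_output(struct term *t)
{
    t->dirty = true;
    term_maybe_draw(t);
}

/* SwapComplete event arrived from the X server. */
void term_on_swap_complete(struct term *t)
{
    t->swap_pending = false;
    term_maybe_draw(t);
}
```

Any amount of output arriving during one frame collapses into a single redraw when the SwapComplete event comes in.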

 We typically try to do the scheduling on the GPU when possible because
 triggering an interrupt and waking up the X server burns power and
 adds latency for no good reason.

Right, we definitely don't want a high-performance application to block
waiting for an X event to arrive before it starts preparing the next
frame.

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-03-07 Thread Owen Taylor
On Thu, 2013-02-28 at 16:55 -0800, Keith Packard wrote:

  * It would be great if we could figure out a plan to get to the
point where the exact same application code is going to work for
proprietary and open source drivers. When you get down to the details
of swap this isn't close to the case currently.
 
 Agreed -- the problem here is that except for the nVidia closed drivers,
 everything else implicitly serializes device access through the kernel,
 providing a natural way to provide some defined order of
 operations. Failing that, I'd love to know what mechanisms *could* work
 with that design.

I don't think serialization is actually the big issue - although it's
annoying to deal with fences that are no-ops for the open source
drivers, it's pretty well defined where you have to insert them, and
because they are no-ops for the open source drivers, there's little
overhead.

Notification is more of an issue.

- Because swap is handled client side in some drivers, INTEL_swap_event
  is seen as awkward to implement.
 
 I'm not sure what could be done here, other than to have some way for
 the X server to get information about the swap and stuff it into the
 event stream, of course. It could be as simple as having the client
 stuff the event data to the X server itself.

It may be that a focus on redirection makes things easier - once the
compositor is involved, we can't get away from X server involvement. The
compositor is the main case where the X server can be completely
bypassed when swapping. And I'm less concerned about API divergence for
the compositor. (Not that I *invite* it...)

- There is divergence on some basic behaviors, e.g., whether
  glXSwapBuffers() followed by glFinish() waits for the swap to
  complete or not.
 
 glXSwapBuffers is pretty darn explicit in saying that it *does not* wait
 for the swap to complete, and glFinish only promises to synchronize the
 effects of rendering (contents of the frame buffer), not the actual
 swap operation itself. I'm not sure how we're supposed to respond when
 drivers ignore the spec and do their own thing?

I wish the GLX specification was clear enough so we actually knew who
was ignoring the spec and doing their own thing... ;-) The GLX
specification describes the swap operation as "the contents of the back
buffer become the contents of the front buffer" ... that seems like an
operation on the contents of the frame buffer.

But getting into the details here is a bit of a distraction - my goal is
to try to get us to convergence so we have only one API with well
defined behaviors.

- When rendering with a compositor, the X server is innocent of
  relevant information about timing and when the application should
  draw additional new frames. I've been working on handling this
  via client <=> compositor protocols
 
 With 'Swap', I think the X server should be involved as it is necessary
 to be able to 'idle' buffers which aren't in use after the
 compositor is done with them. I tried to outline a sketch of how that
 would work before.
 
  (https://mail.gnome.org/archives/wm-spec-list/2013-January/msg0.html)
 
  But this adds a lot of complexity to the minimal client, especially
  when a client wants to work both redirected and unredirected.
 
 Right, which is why I think fixing the X server to help here would be better.

If the goal is really to obsolete the proposed WM spec changes, rather
than just make existing GLX apps work better, then there's quite a bit
of stuff to get right. For example, from my perspective, the
OML_sync_control defined UST timestamps are completely insufficient -
it's not even defined what the units are for these timestamps!

I think it would be great if we could sit down and figure out what
the Linux-ecosystem API is for this in a way we could give to
application authors.
 
 Ideally, a GL application using simple GLX or EGL APIs would work
 'perfectly', without the need to use additional X-specific APIs. My hope
 with splitting DRI3000 into separate DRI3 and Swap extensions is to
 provide those same semantics to simple double-buffered 2D applications
 using core X and Render drawing as well, without requiring that they be
 rewritten to use GL, and while providing all of the same functionality
 over the network as local direct rendering applications get today.

The GLX APIs have some significant holes and poorly defined aspects. And
they don't properly take compositing into account, which is the norm
today. So providing those capabilities to 2D apps seems of limited
utility.

[...]

The SwapComplete event is specified as "This event is delivered
when a SwapRegion operation completes", but the specification
of SwapRegion itself is fuzzy enough that I'm unclear exactly what
that means.
 
- The description of SwapRegion needs to define "swap", since the
  operation has only a vague resemblance to the English-language
  meaning of "swap".
 
 Right, SwapRegion can 

Re: Initial DRI3000 protocol specs available

2013-03-07 Thread James Jones

On 03/07/2013 12:49 PM, Keith Packard wrote:


Aaron Plattner aplatt...@nvidia.com writes:


If I'm understanding this correctly, this requires the X server to
receive a notification from the GPU that the swap is complete so it can
send the SwapComplete event.  Is there any chance this could be done
with a Fence instead?  The application could specify the fence in the
Swap request, and then use that fence to block further rendering on the
GPU or wait on the fence from the CPU.


 From what I've heard from application developers, there are two
different operations here:

  1) Throttle application rendering to avoid racing ahead of the screen

  2) Keeping the screen up-to-date with simple application changes, but
 not any faster than frame rate.

The SwapComplete event is designed for this second operation. Imagine a
terminal emulator; it doesn't want to draw any faster than frame rate,
but any particular frame can be drawn in essentially zero time. This
application doesn't want to *block* at all, it wants to keep processing
external events, like getting terminal output and user input events. As
I understand it, a HW fence would cause the terminal emulator to stall
down in the driver, blocking processing of all of the events and
terminal output.


If you associate an X Fence Sync with your swap operation, the driver 
has the option to trigger it directly from the client command stream and 
wake up only the applications waiting for that fence.  The compositor, 
if using GL, could have received the swap notification event and already 
programmed the response compositing based on it before the swap even 
completes, and just insert a token to make the GPU or kernel wait for 
the fence to complete before executing the compositing rendering commands.
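
To make the ordering concrete, here is a rough sketch under stated assumptions: a toy scheduler with one in-order command queue per client, where a `wait` token stalls only its own queue until the fence is signaled. `Fence`, `run_gpu`, and the command names are hypothetical and do not correspond to any driver or X Sync API.

```python
class Fence:
    """Toy fence: starts untriggered, is triggered by a 'signal' command."""

    def __init__(self):
        self.triggered = False


def run_gpu(queues):
    """Execute per-client command queues round-robin; return the execution log."""
    log = []
    cursors = {name: 0 for name in queues}
    progress = True
    while progress:
        progress = False
        for name, cmds in queues.items():
            i = cursors[name]
            if i >= len(cmds):
                continue
            op, arg = cmds[i]
            if op == "wait" and not arg.triggered:
                continue  # this queue stalls; other queues keep running
            if op == "signal":
                arg.triggered = True
            log.append((name, op))
            cursors[name] = i + 1
            progress = True
    return log
```

In this model the compositor can queue its `wait` and `composite` commands before the client's swap has executed; the compositing work simply does not run until the client's stream signals the fence, and no CPU-side wakeup is involved.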


Thanks,
-James


Simple application throttling wouldn't use these SwapComplete events;
rather, it would use whatever mechanisms already exist for blocking
rendering to limit the application frame rate.


We typically try to do the scheduling on the GPU when possible because
triggering an interrupt and waking up the X server burns power and
adds latency for no good reason.


Right, we definitely don't want a high-performance application to block
waiting for an X event to arrive before it starts preparing the next
frame.




Re: Initial DRI3000 protocol specs available

2013-03-07 Thread James Jones

On 03/07/2013 01:19 PM, Owen Taylor wrote:

On Thu, 2013-02-28 at 16:55 -0800, Keith Packard wrote:


* It would be great if we could figure out a plan to get to the
   point where the exact same application code is going to work for
   proprietary and open source drivers. When you get down to the details
   of swap this isn't close to the case currently.


Agreed -- the problem here is that except for the nVidia closed drivers,
everything else implicitly serializes device access through the kernel,
providing a natural way to provide some defined order of
operations. Failing that, I'd love to know what mechanisms *could* work
with that design.


Fence syncs.  Note the original fence sync + multi-buffer proposal 
solved basically the same problems you're trying to solve here, as well 
as everything Owen's WM spec updates do, but more generally, and with 
that, a little more implementation complexity.  It included proposals to 
make minor updates to GLX/EGL as well to tie them in with the newer 
model.  There didn't seem to be much interest outside of NVIDIA, so 
besides fence sync, the ideas are tabled internally ATM.



I don't think serialization is actually the big issue - although it's
annoying to deal with fences that are no-ops for the open source
drivers, it's pretty well defined where you have to insert them, and
because they are no-ops for the open source drivers, there's little
overhead.

Notification is more of an issue.


   - Because swap is handled client side in some drivers, INTEL_swap_event
 is seen as awkward to implement.


I'm not sure what could be done here, other than to have some way for
the X server to get information about the swap and stuff it into the
event stream, of course. It could be as simple as having the client
stuff the event data to the X server itself.


It may be that a focus on redirection makes things easier - once the
compositor is involved, we can't get away from X server involvement. The
compositor is the main case where the X server can be completely
bypassed when swapping. And I'm less concerned about API divergence for
the compositor. (Not that I *invite* it...)


   - There is divergence on some basic behaviors, e.g., whether
 glXSwapBuffers() followed by glFinish() waits for the swap to
 complete or not.


glXSwapBuffers is pretty darn explicit in saying that it *does not* wait
for the swap to complete, and glFinish only promises to synchronize the
effects of rendering (contents of the frame buffer), not the actual
swap operation itself. I'm not sure how we're supposed to respond when
drivers ignore the spec and do their own thing?


I wish the GLX specification was clear enough so we actually knew who
was ignoring the spec and doing their own thing... ;-) The GLX
specification describes the swap operation as "the contents of the back
buffer become the contents of the front buffer" ... that seems like an
operation on the contents of the frame buffer.


The GLX spec is plenty clear here.  It states:

"Subsequent OpenGL commands can be issued immediately, but will not be
executed until the buffer swapping has completed..."


And glFinish, besides the fact that it counts as a GL command, isn't 
defined as simply waiting until effects on the framebuffer land.  All 
rendering, client, and server (GL server, not X server) state side 
effects from previous operations must settle before it returns. 
SwapBuffers affects all three of those.  Same for fence syncs with 
condition GL_SYNC_GPU_COMMANDS_COMPLETE.


So if the drawable swapped is current to the thread calling swap 
buffers, and they issue any other GL commands afterwards, including 
glFinish, glFenceSync, etc., those commands can't complete until after 
the swap operation does.  For glFinish, that means it can't return.  For 
fence, the fence won't trigger until the swap finishes.  If 
implementations aren't behaving that way, it's a bug in the 
implementation.  Not to say our implementation doesn't have bugs, but 
AFAIK, we don't have that one.
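
As an illustration of that reading, a small timeline model follows. It is only a sketch: the in-order stream, the `VBLANK_PERIOD` value, the integer time units, and the command names are all assumptions made for the example, not driver behavior.

```python
import math

VBLANK_PERIOD = 16  # hypothetical refresh interval, in arbitrary integer time units


def completion_times(commands, start=0):
    """Return (name, completion_time) pairs for an in-order command stream.

    Each command is (name, cost). A 'swap' completes on the first vblank
    boundary strictly after all earlier work has finished; every later
    command, including 'finish' and 'fence', completes no earlier than that.
    """
    t = start
    out = []
    for name, cost in commands:
        if name == "swap":
            t = math.ceil((t + 1) / VBLANK_PERIOD) * VBLANK_PERIOD
        else:
            t += cost
        out.append((name, t))
    return out
```

With `[("draw", 5), ("swap", 0), ("fence", 0), ("finish", 0)]`, the fence and the finish both complete at the same time as the swap, never before it.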


Thanks,
-James


But getting into the details here is a bit of a distraction - my goal is
to try to get us to convergence so we have only one API with well
defined behaviors.


   - When rendering with a compositor, the X server is innocent of
 relevant information about timing and when the application should
 draw additional new frames. I've been working on handling this
 via client <=> compositor protocols


With 'Swap', I think the X server should be involved as it is necessary
to be able to 'idle' buffers which aren't in use after the
compositor is done with them. I tried to outline a sketch of how that
would work before.


(https://mail.gnome.org/archives/wm-spec-list/2013-January/msg0.html)

 But this adds a lot of complexity to the minimal client, especially
 when a client wants to work both redirected and unredirected.


Right, which is why I think fixing the X server to help here would be better.


If the goal is really to obsolete the proposed WM 

Re: Initial DRI3000 protocol specs available

2013-03-07 Thread Keith Packard
James Jones jajo...@nvidia.com writes:

 If you associate an X Fence Sync with your swap operation, the driver 
 has the option to trigger it directly from the client command stream and 
 wake up only the applications waiting for that fence.

Yeah, right now we're doing some hand-waving about serialization which
isn't entirely satisfying.

 The compositor, 
 if using GL, could have received the swap notification event and already 
 programmed the response compositing based on it before the swap even 
 completes, and just insert a token to make the GPU or kernel wait for 
 the fence to complete before executing the compositing rendering
 commands.

We just don't have these issues with the open source drivers, so it's
really hard for us to reason about this kind of asynchronous
operation. Access to the underlying buffers is mediated by the kernel
which ensures that as long as you serialize kernel calls, you will
serialize hw execution as well.

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-03-07 Thread Keith Packard
James Jones jajo...@nvidia.com writes:

 There didn't seem to be much interest outside of NVIDIA, so 
 besides fence sync, the ideas are tabled internally ATM.

This shouldn't surprise you though -- no-one else needs this kind of
synchronization, so it's really hard for anyone to evaluate it. And,
DRI2 offers 'sufficient' support for the various GL sync extensions.

So, what I'd like to know is if you think nVidia could take advantage of
the Swap extension so that nVidia 3D applications could do the whole
Swap redirect plan? If so, then I'm a lot more interested in figuring
out how we can get apps using the necessary fencing to actually make it
work right.

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-03-06 Thread Keith Packard
Owen Taylor otay...@redhat.com writes:

 A complex scheme where the compositor and the server collaborate on the
 implementation of SwapRegion seems fragile to me, and still doesn't get
 some details right - like the swap count returned from SwapRegion.

 What if we made SwapRegion redirectable along the lines of
 ResizeRedirectMask? Since it would be tricky to block the client calling
 SwapRegion until the compositor responded, this would probably require
 removing the reply to SwapRegion and sending everything needed back in
 events.

When I first read this a week ago, I thought this was a crazy plan; but
upon reflection, I think this is exactly the right direction. I've
written up a blog posting in more detail about that here:

http://keithp.com/blogs/composite-swap/

  SwapScheduled - whatever information is available immediately on
  receipt of SwapRegion

I think this can still be in the reply to SwapRegion itself; essentially
all we're returning is the swap-hi/swap-lo numbers and a suggestion for
future buffer allocation sizes. We could place the buffer size hints in
a separate event, but I don't think they're that critical; it's just a
hint, and we'll get it right after a couple of swaps once the user stops
moving the window around anyways.

  SwapIdle  - a buffer is returned to the application for rendering
  SwapComplete  - the swap actually happened and we know the
  msc/sbc/ust triple

Yup. The blog posting suggests how the Complete event might be delayed
until the Compositor gets the content up onto the screen itself.

I also think that SwapIdle should *not* be an event. Instead, the client
should mark its pixmap as 'becomes idle upon swap'; on redirection, the
compositor ends up holding the last 'it's not idle yet' bit, and when it
does the 'becomes idle upon swap', then the buffer goes idle.

The client must then tell the server to un-idle the pixmap, and that
request will return whether the contents were preserved or not. This has
to be synchronous or huge races will persist.
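
A rough sketch of how that idle/un-idle handshake could behave, under several assumptions: the `Pixmap` and `Server` classes, the `holders` set, and the `(granted, preserved)` return shape are hypothetical, and refusing an un-idle on a pixmap that is still held is a guess at behavior the proposal does not spell out.

```python
class Pixmap:
    """Toy pixmap carrying the state an un-idle request would report on."""

    def __init__(self):
        self.holders = set()   # parties still holding a 'not idle yet' bit
        self.preserved = True  # False once something else overwrote the contents


class Server:
    def swap_region(self, pixmap, redirected):
        # The client marked the pixmap 'becomes idle upon swap'. Under
        # redirection the compositor ends up holding the last busy bit.
        pixmap.holders = {"compositor"} if redirected else set()

    def compositor_done(self, pixmap, clobbered=False):
        # The compositor performs its own 'becomes idle upon swap'.
        pixmap.holders.discard("compositor")
        if clobbered:
            pixmap.preserved = False

    def un_idle(self, pixmap):
        """Synchronous request: returns (granted, contents_preserved)."""
        if pixmap.holders:
            return (False, None)  # assumed behavior while still in use
        pixmap.holders = {"client"}
        preserved = pixmap.preserved
        pixmap.preserved = True   # the client owns the contents again
        return (True, preserved)
```

Because `un_idle` is a request with a reply, the client learns in a single round trip both that it owns the buffer again and whether it must redraw from scratch, which is the race an asynchronous SwapIdle event would leave open.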

 But I don't know that you need that much granularity. I think SwapIdle
 and SwapComplete are sufficient.

As above, SwapIdle isn't good enough, an explicit un-idle request is required.

 Tricky parts:

  * Not leaking buffers during redirection/unredirection could be tricky.
What if the compositor exits while a client is waiting for a
SwapIdle? An event when swap is redirected/unredirected is probably
necessary.

When the Compositor exits, the X server will know all of the pending
SwapRegion requests and can 'unredirect' them easily enough.

I don't want to tell apps when they're getting redirected/unredirected,
and I don't think it's necessary.

  * To make this somewhat safe, the concept of idle has to be one of
correct display not system stability. It can't take down the system
if the compositor sends SwapIdle at the wrong time.

See above.

  * Because the SBC is a drawable attribute it's a little complex to
continue having the right value over swap redirection.

 When a window is swap-redirected, we say that the SBC is
 incremented by one every time the redirecting client calls
 SwapRegion, and never otherwise. A query is provided for the
 current value.

We could simply decouple these values and just have a 'swap count'
associated with the window which is used to mark pixmap contents when
'UnIdled'.

  * It doesn't make sense to have both the server and the compositor
scheduling stuff. I think you'd specify that once you swap
redirect a window, it gets simple:

Good point. The redirected swap event should contain all of the swap
parameters so that the Compositor can appropriately schedule the window
swap with the matching screen swap.

Actually, from the compositor's perspective, the window's front
buffer doesn't matter, but you probably need to keep it current
to make screenshot tools, etc, work correctly.

My swap redirect plan has that pixmap getting swapped at the same time
the screen pixmap is swapped, so things will look 'right'.

 Is this better than a more collaborative approach where the server and
 compositor together determine what pixmaps are idle?

Idleness is certainly a joint prospect, but I don't think it's
cooperative. Instead, a pixmap is idle when both application and
compositor say it is idle. The SwapRedirect explicitly marks the pixmap
as 'not idle' for the compositor, and an explicit 'make it idle' call is
required by the compositor.

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-02-28 Thread Keith Packard
Owen Taylor otay...@redhat.com writes:

 Sorry for joining the discussion late here. I went through the specs and
 discussion and have various comments and questions.

Thanks for reviewing stuff, of course. You're not late at all :-)

 * It would be great if we could figure out a plan to get to the
   point where the exact same application code is going to work for
   proprietary and open source drivers. When you get down to the details
   of swap this isn't close to the case currently.

Agreed -- the problem here is that except for the nVidia closed drivers,
everything else implicitly serializes device access through the kernel,
providing a natural way to provide some defined order of
operations. Failing that, I'd love to know what mechanisms *could* work
with that design.

   - Because swap is handled client side in some drivers, INTEL_swap_event
 is seen as awkward to implement.

I'm not sure what could be done here, other than to have some way for
the X server to get information about the swap and stuff it into the
event stream, of course. It could be as simple as having the client
stuff the event data to the X server itself.

   - There is divergence on some basic behaviors, e.g., whether
 glXSwapBuffers() followed by glFinish() waits for the swap to
 complete or not.

glXSwapBuffers is pretty darn explicit in saying that it *does not* wait
for the swap to complete, and glFinish only promises to synchronize the
effects of rendering (contents of the frame buffer), not the actual
swap operation itself. I'm not sure how we're supposed to respond when
drivers ignore the spec and do their own thing?

   - When rendering with a compositor, the X server is innocent of
 relevant information about timing and when the application should
 draw additional new frames. I've been working on handling this
 via client <=> compositor protocols

With 'Swap', I think the X server should be involved as it is necessary
to be able to 'idle' buffers which aren't in use after the
compositor is done with them. I tried to outline a sketch of how that
would work before.

 (https://mail.gnome.org/archives/wm-spec-list/2013-January/msg0.html)

 But this adds a lot of complexity to the minimal client, especially
 when a client wants to work both redirected and unredirected.

Right, which is why I think fixing the X server to help here would be better.

   I think it would be great if we could sit down and figure out what
   the Linux-ecosystem API is for this in a way we could give to
   application authors.

Ideally, a GL application using simple GLX or EGL APIs would work
'perfectly', without the need to use additional X-specific APIs. My hope
with splitting DRI3000 into separate DRI3 and Swap extensions is to
provide those same semantics to simple double-buffered 2D applications
using core X and Render drawing as well, without requiring that they be
rewritten to use GL, and while providing all of the same functionality
over the network as local direct rendering applications get today.

 * One significant problem that I have currently is that the default mode
   for the Intel drivers is to use triple buffering and send back swap
   events *when rendering the next frame would not block* - that is,
   immediately. This results in a frame of unnecessary latency.
   (The too-early events are also missing ust/msc/sbc information.)

As you noted, there are a whole range of suitable times to tell clients
about their buffers:

 1) Right after the swap, the client needs to know what happened to each
buffer and what the scheduled swap time is.

 2) When the buffer usage changes; for DRI2, that's what the Invalidate
events are for. With DRI3 as proposed, that's what the reply to the
SwapRegion contains.

 3) When their contents actually appear on the screen.

I suggest that we'll need all three to provide applications enough
information to make good drawing choices.

   The SwapComplete event is specified as "This event is delivered
   when a SwapRegion operation completes", but the specification
   of SwapRegion itself is fuzzy enough that I'm unclear exactly what
   that means.

   - The description of SwapRegion needs to define "swap", since the
 operation has only a vague resemblance to the English-language
 meaning of "swap".

Right, SwapRegion can either be a copy operation or an actual swap. The
returned information about idle buffers tells the client what they
contain, so I think the only confusion here is over the name of the request?

   - My interpretation of SwapRegion is that the actual propagation of
 source to destination is *asynchronous* to the X protocol stream.
 This is implied by "Schedule a swap..." but probably should be
 explicitly stated, since it is radically different from other
 rendering in X.

Ok, a bit more wording clarifying the inherent asynchronous nature of
the Swap operation seems necessary.

   - Is the serial in the SwapComplete event synchronous to the 

Re: Initial DRI3000 protocol specs available

2013-02-26 Thread Owen Taylor
Sorry for joining the discussion late here. I went through the specs and
discussion and have various comments and questions.

- Owen

* It would be great if we could figure out a plan to get to the
  point where the exact same application code is going to work for
  proprietary and open source drivers. When you get down to the details
  of swap this isn't close to the case currently.

  - Because swap is handled client side in some drivers, INTEL_swap_event
    is seen as awkward to implement.

  - There is divergence on some basic behaviors, e.g., whether
    glXSwapBuffers() followed by glFinish() waits for the swap to
    complete or not.

  - When rendering with a compositor, the X server is innocent of
    relevant information about timing and when the application should
    draw additional new frames. I've been working on handling this
    via client <=> compositor protocols

(https://mail.gnome.org/archives/wm-spec-list/2013-January/msg0.html)

But this adds a lot of complexity to the minimal client, especially
when a client wants to work both redirected and unredirected. 

  I think it would be great if we could sit down and figure out what
  the Linux-ecosystem API is for this in a way we could give to
  application authors.
  
* One significant problem that I have currently is that the default mode
  for the Intel drivers is to use triple buffering and send back swap
  events *when rendering the next frame would not block* - that is,
  immediately. This results in a frame of unnecessary latency.
  (The too-early events are also missing ust/msc/sbc information.)

  So I'd like to make sure that we know exactly what SwapComplete means
  and not have creative reinterpretations based on what works well
  for one client or another.

  The SwapComplete event is specified as "This event is delivered
  when a SwapRegion operation completes", but the specification
  of SwapRegion itself is fuzzy enough that I'm unclear exactly what
  that means.

  - The description of SwapRegion needs to define "swap", since the
    operation has only a vague resemblance to the English-language
    meaning of "swap".

  - My interpretation of SwapRegion is that the actual propagation of
source to destination is *asynchronous* to the X protocol stream.
This is implied by "Schedule a swap..." but probably should be
explicitly stated, since it is radically different from other
rendering in X.

  - Is the serial in the SwapComplete event synchronous to the protocol
stream? E.g., can you assume that any CopyArea from the destination
drawable before that serial will get the old contents, and a
CopyArea from the destination after that serial will get the new
contents?

  - What happens when multiple SwapRegion requests are made with a
swap-interval of zero? Are previous ones discarded?

  - Is it an error to render to a non-idle pixmap? Is it an error to
pass a non-idle pixmap as the source to SwapRegion?

  - What's the interaction between swap-interval and target-msc, etc?

  - When a window is redirected, what's the interpretation of 
swap-interval, target-msc, etc? Is it that the server performs the
operation at the selected blanking interval (as if the window
wasn't redirected), and then damage/other events are generated
and the server picks it up and renders to the real front buffer
at the next opportunity - usually a frame later?

* Do I understand correctly that the idle pixmaps returned from
  a SwapRegion request are the pixmaps that *will* be idle once the
  corresponding SwapComplete event is received?

  If this is correct, what happens if things change before the swap
  actually happens and what was scheduled as a swap ends up being
  a copy? Is it sufficient to assume that a ConfigureNotify on a
  destination window means that all pixmaps passed to previous
  SwapRegion requests are now idle?

* In the definition of SWAPIDLE you say:

If valid is TRUE, swap-hi/swap-lo form a 64-bit
swap count value from the SwapRegion request which matches the
data that the pixmap currently contains

 If I'm not misunderstanding things, this is a confusing statement
 because, leaving aside damage to the front buffer, pixmaps always
 contain the same contents (whatever the client rendered into it.)

 Is the use of swap-hi/swap-lo to identify the SwapRegion problematic
 in the case where swaps aren't throttled? Would it be better to use
 the sequence number of the request? Or is the pixmap itself sufficient?

* What control, if any, will applications have over the number of
  buffers used - what the behavior will be when an application starts
  rendering another frame in terms of allocating a new buffer versus
  swapping?

* Do we need to deal with stereo as part of this?
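
Returning to the SWAPIDLE definition quoted above: the swap-hi/swap-lo pair can be joined into a single 64-bit swap count as sketched here. The function names are hypothetical, and treating each field as an unsigned 32-bit value is an assumption about the wire format, not something the quoted text states.

```python
def combine_swap_count(swap_hi, swap_lo):
    """Join two 32-bit protocol fields into one 64-bit swap count."""
    if not (0 <= swap_hi < 2**32 and 0 <= swap_lo < 2**32):
        raise ValueError("swap-hi and swap-lo must each fit in 32 bits")
    return (swap_hi << 32) | swap_lo


def split_swap_count(count):
    """Inverse of combine_swap_count: return (swap_hi, swap_lo)."""
    return (count >> 32) & 0xFFFFFFFF, count & 0xFFFFFFFF
```

A client comparing swap counts should compare the combined value, since swap-lo alone wraps after 2**32 swaps.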




Re: Initial DRI3000 protocol specs available

2013-02-26 Thread Owen Taylor
A complex scheme where the compositor and the server collaborate on the
implementation of SwapRegion seems fragile to me, and still doesn't get
some details right - like the swap count returned from SwapRegion.

What if we made SwapRegion redirectable along the lines of
ResizeRedirectMask? Since it would be tricky to block the client calling
SwapRegion until the compositor responded, this would probably require
removing the reply to SwapRegion and sending everything needed back in
events. At the most granular, this would be:

 SwapScheduled - whatever information is available immediately on
 receipt of SwapRegion
 SwapIdle  - a buffer is returned to the application for rendering
 SwapComplete  - the swap actually happened and we know the
 msc/sbc/ust triple

But I don't know that you need that much granularity. I think SwapIdle
and SwapComplete are sufficient.

Tricky parts:

 * Not leaking buffers during redirection/unredirection could be tricky.
   What if the compositor exits while a client is waiting for a
   SwapIdle? An event when swap is redirected/unredirected is probably
   necessary.

 * To make this somewhat safe, the concept of idle has to be one of
   correct display not system stability. It can't take down the system
   if the compositor sends SwapIdle at the wrong time.

 * Because the SBC is a drawable attribute it's a little complex to
   continue having the right value over swap redirection.

When a window is swap-redirected, we say that the SBC is
incremented by one every time the redirecting client calls
SwapRegion, and never otherwise. A query is provided for the
current value.

 * It doesn't make sense to have both the server and the compositor
   scheduling stuff. I think you'd specify that once you swap
   redirect a window, it gets simple:

   - SwapRegion called by another client - SwapRegionRequest is
 immediately generated.
   - SwapRegion called by redirecting client - action (either a
 swap or a copy) happens immediately.

   Actually, from the compositor's perspective, the window's front
   buffer doesn't matter, but you probably need to keep it current
   to make screenshot tools, etc, work correctly.
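
The dispatch rule and the SBC rule from the list above can be sketched together as follows. This is a toy model only: the `Window` class, its fields, and the string return values are hypothetical, and incrementing the SBC in the unredirected case is an assumption made so the example is complete.

```python
class Window:
    def __init__(self, redirecting_client=None):
        self.redirecting_client = redirecting_client  # None means not swap-redirected
        self.sbc = 0      # swap buffer count, a drawable attribute
        self.front = None # current front-buffer contents
        self.events = []  # events queued for the redirecting client


def swap_region(window, caller, pixmap):
    """Dispatch a SwapRegion according to who is calling."""
    if window.redirecting_client is None or caller == window.redirecting_client:
        # Unredirected, or the redirecting client itself: act immediately.
        window.front = pixmap
        window.sbc += 1
        return "swapped"
    # Any other client: generate a SwapRegionRequest for the compositor
    # and leave both the front buffer and the SBC untouched.
    window.events.append(("SwapRegionRequest", caller, pixmap))
    return "redirected"
```

Under this rule the SBC of a redirected window advances only when the compositor replays the swap, so an application's own SwapRegion calls never change it directly.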

Is this better than a more collaborative approach where the server and
compositor together determine what pixmaps are idle? I think it might be
a little simpler and more flexible. It also potentially allows a
SwapRegion on a client window to be directly turned to a SwapRegion on
the root window for a full-screen client. But there probably are a host
of difficulties I'm not thinking of :-)

- Owen




Re: Initial DRI3000 protocol specs available

2013-02-26 Thread Keith Packard
Owen Taylor otay...@redhat.com writes:

 A complex scheme where the compositor and the server collaborate on the
 implementation of SwapRegion seems fragile to me, and still doesn't get
 some details right - like the swap count returned from SwapRegion.

The question is whether the SwapCount is the count of swaps on the
window, or the count of swaps to the final screen image. I think it's
just the former, in which case the presence of a compositor doesn't have
any effect on the correctness.

The effect of the compositor is strictly in holding a reference to the
window buffer and potentially delaying the reuse of that buffer within
the application.

 What if we made SwapRegion redirectable along the lines of
 ResizeRedirectMask? Since it would be tricky to block the client calling
 SwapRegion until the compositor responded, this would probably require
 removing the reply to SwapRegion and sending everything needed back in
 events. At the most granular, this would be:

We're really trying to avoid using events here -- they're quite messy on
the client side as you must capture them deep within the X library
implementation and shovel them over to the correct context within the
Mesa library. Using replies makes it all quite simple; the correct
context will be present automatically to receive the reply from SwapRegion.

  * Not leaking buffers during redirection/unredirection could be tricky.
What if the compositor exits while a client is waiting for a
SwapIdle? An event when swap is redirected/unredirected is probably
necessary.

With the current pixmap ID referencing technique, that ID will be freed
when the compositor exits (I mentioned this would be required before)
which will reduce the reference count to the point where the SwapIdle
event would get sent.
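A minimal sketch of that cleanup path, assuming the server tracks which outside clients hold an ID for each buffer. The class and method names are hypothetical; the only behaviour taken from the discussion is that a client's IDs are freed when it exits and that dropping the last outside reference lets SwapIdle go out.

```python
class Server:
    """Toy X server tracking which clients hold an ID for each buffer."""

    def __init__(self):
        self.holders = {}       # buffer -> set of client names holding an ID
        self.idle_events = []   # buffers for which SwapIdle has been sent

    def name_pixmap(self, client, buffer):
        self.holders.setdefault(buffer, set()).add(client)

    def free_pixmap(self, client, buffer):
        holders = self.holders.get(buffer, set())
        holders.discard(client)
        if not holders:
            # Last outside reference gone: the buffer can go back to its owner.
            self.idle_events.append(buffer)

    def client_exit(self, client):
        # IDs owned by an exiting client are freed by the server, so the
        # references drop without any explicit request from that client.
        for buffer in sorted(self.holders):
            if client in self.holders[buffer]:
                self.free_pixmap(client, buffer)


server = Server()
server.name_pixmap("compositor", "buffer A")
server.client_exit("compositor")
```

After the compositor exits, `server.idle_events` holds "buffer A", so a client waiting on that buffer is not left hanging.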

 Is this better than a more collaborative approach where the server and
 compositor together determine what pixmaps are idle? I think it might be
 a little simpler and more flexible. It also potentially allows a
 SwapRegion on a client window to be directly turned to a SwapRegion on
 the root window for a full-screen client. But there probably are a host
 of difficulties I'm not thinking of :-)

Oh. Having the compositor just take the new window buffer and send a
SwapRegion from that to the root window. I hadn't thought of that, as
I was still assuming we'd be 'unredirecting' full screen windows, but
this sure looks like a compelling alternative plan that should simplify
the compositor fairly nicely.

Let's stick that one on the 'it sure would be nice if this were
possible' list and see what it will take to make it work.

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-02-21 Thread Chris Wilson
On Wed, Feb 20, 2013 at 10:17:56PM -0800, Keith Packard wrote:
 Chris Wilson ch...@chris-wilson.co.uk writes:
 
  You managed to ask yourself the question I was leading to: how the heck
  does the compositor learn that the underlying graphics object has
  changed?
 
 It can certainly tell that the underlying contents have changed with
 Damage events, but as to how it knows that it should do another
 BufferFromPixmap request, I think that's gonna require another
 event.
 
 Now, the big question is how to deal with compositing managers which
 don't know about DRI3. I suspect we'll just have to skip the DMA-BUF
 swapping hack for window pixmaps unless the compositor gives us the OK
 to do so.

If a DRI2 client also grabs the buffer, then we have to fall back to blits.
That should be fairly easy to detect and handle.
 
  In DRI2 this is through the InvalidateEvent, and the lack of
  being able to send those from the driver before the Damage is sent is
  one of the reasons why the current exchange mechanism is broken.
 
 Hrm. With the current system, except for override-redirect windows, if
 the compositor is also the window manager, it should always know when
 the window pixmap is going to be replaced because that only happens when
 the window is resized, and the window manager is entirely responsible
 for making that happen. For override-redirect windows, if the
 ConfigureNotify event was delivered before the Damage event, then the
 compositor could know about that as well.

We are concerned with the GEM objects backing the Pixmaps, which may
be changed at whim by the driver.

 I guess I'd like to know more about what is broken with the current
 system for compositors...

We cannot perform simple name exchanges currently in DRI2 because the
Damage is badly ordered wrt the Invalidate event and there is no
coordination between client - server - compositor on when the
buffers are reusable by the client.

 In any case, as the underlying DMA-BUF is changing, the compositor is
 going to need to know that so it can release the old pixmap back to the
 application, and so it can rewire its own compositing operations to use
 the new object.
 
 It might be nice to have the compositor use persistent names for the
 various DMA-BUFs that are used for a particular window. I think that
 means having the compositor hold on to old DMA-BUF window pixmap
 IDs. That seems tricky though. The alternative will be to have the
 compositor create/destroy a pixmap ID per frame. Not intolerable, but
 not optimal.
 
  Getting buffer exchanges working in conjunction with the external
  compositor is more or less as tricky as it gets. The notion that the
  buffer is kept busy by the compositor and so prevents the DRI3 client
  from overdrawing it is key. And that naturally leads to the compositor
  needing to release the old buffer once it is referencing the new
  post-swap buffer.
 
 Right, an easy technique there would be to have it use NameWindowPixmap
 when it got an event telling it that a new pixmap was in use for the
 window, and then when it was finished with that pixmap, it could just
 use FreePixmap to tell the server it was done. That would bump the
 refcnt down to one in the server, at which point it could queue the
 pixmap to be sent as 'idle' the next time SwapRegion was called.
 
  Serialisation between rendering of the common buffers
  is definitely s.e.p. I agree that should solve the compositor problem.
 
 Ok, cool.
 
 So, changes that I think are needed:
 
  1) If someone calls TextureFromPixmap on a window pixmap, we need to
 suppress the window pixmap swapping hack. Alternatively, we can have
 the compositor explicitly enable window pixmap swapping.

I think we definitely want to support window swapping with DRI3
compositors. DRI2 compositors will just have to continue to force blits.

  2) We need to send an event when the buffer underlying a window switches.
 
  3) We need to be explicit about event ordering between the new window
 pixmap change notify event and any related Damage.

As I see it the challenge is to prevent sending the buffer release
(SwapIdle) back to the client before all interested third parties have
had a chance to snoop its contents and react. Sketching that out we
need to increment the busy count every time we send an Invalidate and
expect the client (compositor) to send a release after they have
finished processing the buffer.

For fun, imagine a fullscreen redirected Window (because Wine still
manages to confuse everybody):

client                  server                  compositor

new drawable - setup Damage
show A  (Swap) -
  invalidate buffer  damage -
-  show A (Swap)
   
  flip A
show B 
  invalidate buffer  damage -
-  show B
  

Re: Initial DRI3000 protocol specs available

2013-02-21 Thread Michal Suchanek
On 21 February 2013 07:17, Keith Packard kei...@keithp.com wrote:
 Chris Wilson ch...@chris-wilson.co.uk writes:

 Hrm. With the current system, except for override-redirect windows, if
 the compositor is also the window manager, it should always know when

There are compositors that are not window managers. I use one because
without a compositor xwd -id does not work.

Thanks

Michal


Re: Initial DRI3000 protocol specs available

2013-02-21 Thread Clemens Eisserer
Hi,

 And for Render, along with passing blobs.

 Yeah, I can easily imagine doing a PictureFromBuffer as well. Let's
 focus on Pixmaps for now and get Mesa fixed up.

Passing blobs (if this means what I think it does) would be something
we are looking forward to for Java's XRender backend. Currently we upload
32x32 alpha masks using XPutImage and most drivers (except SNA/intel)
don't cope with this very well.
In effect it would help to cure the weakness of XRender when it comes
to geometry.

Regards, Clemens


Re: Initial DRI3000 protocol specs available

2013-02-21 Thread Keith Packard
Stéphane Marchesin stephane.marche...@gmail.com writes:

 With that said, I don't think it's that difficult/different. I can
 design a GLX extension spec and send a draft, then we can work from
 there.

Yeah, some concrete plan for GL would be really nice to have, at least
as a starting point.

 That is actually not what you want because it is a waste of bandwidth.
 Since compositors are typically bandwidth limited, you instead want to
 paint only the relevant sub regions. Those are easy to determine by
 transforming X damage regions into screen coordinates.

Of course, that's what SwapRegion is for -- it will get to pick whether
to copy or page flip and let the client know what happened, the region
you pass

 Most non-trivial compositing managers are already using partial update
 schemes through GLX_MESA_copy_sub_buffer or the GLX_EXT_buffer_age
 extensions + copies. I don't think it is far fetched to support a list
 of rectangles instead.

A region is already a list of rectangles; the only restriction is that
the relative location of all of the source and dest rectangles is the
same. This satisfies the goal of doing a damage-based back-front
update.
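To make the restriction concrete, here is a small sketch in Python. The function name and the tuple representation of a rectangle are made up for illustration; the one thing taken from the draft is that a single src-off-x/src-off-y pair is added to the whole region.

```python
def source_rects(region, src_off_x, src_off_y):
    """Map destination rectangles to source rectangles.

    region is a list of (x, y, width, height) tuples in destination
    coordinates. One offset is applied to every rectangle, so the relative
    placement of source and destination rectangles is the same throughout.
    """
    return [(x + src_off_x, y + src_off_y, w, h) for (x, y, w, h) in region]


damage = [(0, 0, 64, 16), (100, 40, 32, 32)]
mapped = source_rects(damage, 8, 4)
```

Two disjoint damage rectangles go through in one request; what a region cannot express is a different offset per rectangle.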

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-02-21 Thread Keith Packard
Chris Wilson ch...@chris-wilson.co.uk writes:

 If a DRI2 client also grabs the buffer, then we have to fall back to blits.
 That should be fairly easy to detect and handle.

So, the question is whether the NameWindowPixmap IDs are stable across
pixmap replacement. I'm frankly tempted to add a new event to Composite
that is sent whenever the window pixmap changes -- that way applications
wouldn't have to guess that the pixmap changed whenever the window was
resized. This would also provide an opportunity to improve resize
performance as the X server could over-allocate window pixmaps during
the resize operation, and then shrink them back down once the final size
had been selected.

 We are concerned with the GEM objects backing the Pixmaps, which may
 be changed at whim by the driver.

Huh?

 We cannot perform simple name exchanges currently in DRI2 because the
 Damage is badly ordered wrt the Invalidate event and there is no
 coordination between client - server - compositor on when the
 buffers are reusable by the client.

Right, so we clearly need to pass the backing buffer from application to
X server and thence to the compositor. What I'm not sure about is how to
name these buffers, and how to scope their lifetime.

Here's a quick proposal -- have the X server assign server XIDs to the
buffers, and send those XIDs to the compositor in events. Now the
compositor is responsible for telling the server (some new 'IdlePixmap'
call?) when it finishes with the objects, at which point they can be
released back to the application.

We'd need some magic to make sure the pixmaps got freed if the
compositor crashed, but I think that's easier than trying to figure out
how to allocate XIDs in the compositor ID space from within the X
server.

This would replace NameWindowPixmap, and would eliminate the current
race conditions between the ConfigureNotify and the NameWindowPixmap
call while also providing traceable ownership of the buffer contents:

busy  application   X server        compositor

A     Draw to buffer A
      Allocate pixmap ID for buffer A, 'Pixmap Aa'
      SwapRegion Pixmap Aa

                    Allocate server ID for Pixmap Aa, 'Pixmap Ax'
                    Send 'window pixmap changed' event 'Pixmap Ax'

                                    Receive event
                                    Convert 'Pixmap Ax' into buffer A
                                    using TextureFromPixmap
                                    paint screen using buffer A
                                    ...
AB    Draw to buffer B
      Allocate pixmap ID for buffer B, 'Pixmap Ba'
      SwapRegion Pixmap Ba

                    Allocate server ID for Pixmap Ba, 'Pixmap Bx'
                    Send 'window pixmap changed' event 'Pixmap Bx'

                                    Receive event
                                    IdlePixmap Pixmap Ax
                                    Convert 'Pixmap Bx' into buffer B
                                    using TextureFromPixmap
                                    paint screen using buffer B

B                   Mark Pixmap Aa as idle

BC    Draw to buffer C
      Allocate pixmap ID for buffer C, 'Pixmap Ca'
      SwapRegion Pixmap Ca
                    Reply with Pixmap Aa idle
                    Allocate server ID for Pixmap Ca, 'Pixmap Cx'
                    Send 'window pixmap changed' event 'Pixmap Cx'

                                    Receive event
                                    IdlePixmap Pixmap Bx
                                    Convert 'Pixmap Cx' into buffer C
                                    using TextureFromPixmap
                                    paint screen using buffer C

C                   Mark Pixmap Ba as idle

CA    Draw to buffer A
      SwapRegion Pixmap Aa
                    Reply with Pixmap Ba idle
                    Send 'window pixmap changed' event 'Pixmap Ax'

                                    Receive event
                                    IdlePixmap Pixmap Cx
                                    paint screen using buffer A

A                   Mark Pixmap Ca as idle

At this point, we're in a steady state, using three buffers for
the window -- a 'back buffer', a 'front buffer' and an 'idle buffer'.
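The trace can be replayed with a few lines of Python. This is only a sketch of the proposal: the class is hypothetical, the compositor is folded into the server object for brevity, and it assumes the compositor releases its previous pixmap as soon as it receives the 'window pixmap changed' event.

```python
class Server:
    """Toy server that reports idle pixmaps in the SwapRegion reply."""

    def __init__(self):
        self.idle_queue = []            # pixmaps marked idle, not yet reported
        self.compositor_current = None  # pixmap the compositor is painting from

    def swap_region(self, pixmap):
        # The reply carries whatever became idle before this request arrived.
        reply_idle, self.idle_queue = self.idle_queue, []
        # 'window pixmap changed' event: the compositor switches to the new
        # pixmap and releases the one it was using (IdlePixmap).
        previous, self.compositor_current = self.compositor_current, pixmap
        if previous is not None:
            self.idle_queue.append(previous)
        return reply_idle


server = Server()
replies = [server.swap_region(p) for p in ["A", "B", "C", "A"]]
```

The first two replies are empty, the third reports A idle and the fourth reports B idle, matching the trace. Since no buffer has been reported idle by the time the application wants to draw its third frame, it has to allocate buffer C.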

One easy thing for memory usage is to consider idle buffers as suitable
for discard in the kernel; that would get us to one pinned buffer in the
idle case, although we'd be using three buffers while active.

It would be nice to flip between two buffers instead, but there may be
compositor rendering traffic in flight using buffer A as the application
draws to B and then C.

Hrm. What we need is for the client to learn that the compositor has
marked a buffer idle before it starts drawing; the current design places
that information in the reply to SwapRegion, 

Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Stéphane Marchesin
On Tue, Feb 19, 2013 at 7:46 PM, Keith Packard kei...@keithp.com wrote:

 And here's the Swap extension

   The Swap Extension
   Version 1.0
   2013-2-14

 Keith Packard
   kei...@keithp.com
   Intel Corporation

 1. Introduction

 The Swap extension provides GL SwapBuffers semantics to move pixels
 from a pixmap to a drawable. This can be used by OpenGL
 implementations or directly by regular applications.

 1.1. Acknowledgments

 Eric Anholt e...@anholt.net
 Dave Airlie airl...@redhat.com
 Kristian Høgsberg k...@bitplanet.net

  ❄ ❄ ❄  ❄  ❄ ❄ ❄

 2. Data Types

 The server side region support specified in the Xfixes extension
 version 2 is used in the SwapRegion request.

  ❄ ❄ ❄  ❄  ❄ ❄ ❄

 4. Errors

 No errors are defined by the Swap extension.

  ❄ ❄ ❄  ❄  ❄ ❄ ❄

 5. Events

 The Swap extension provides a new event, SwapComplete, to signal when
 a swap operation has finished.

  ❄ ❄ ❄  ❄  ❄ ❄ ❄


 6. Protocol Types

 SWAPSELECTMASK { SwapCompleteMask }

 Used with SwapSelectInput to specify which events a client is
 to receive.

 SWAPIDLE {
 pixmap: PIXMAP
 valid: BOOL
 swap-hi: CARD32
 swap-lo: CARD32
 }

 This structure contains information about a pixmap which had
 been used in a SwapRegion request and which the server is now
 finished with. If valid is TRUE, swap-hi/swap-lo form a 64-bit
 swap count value from the SwapRegion request which matches the
 data that the pixmap currently contains. If valid is FALSE,
 then the contents of the pixmap are undefined.
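For reference, a short sketch of how a client might join and split the hi/lo halves used throughout the draft. The helper names are made up; the only thing taken from the spec is that two CARD32 fields form one 64-bit value with the "hi" field as the upper half.

```python
def join64(hi, lo):
    """Combine two 32-bit protocol fields into one 64-bit count."""
    return ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)


def split64(value):
    """Split a 64-bit count into (hi, lo) 32-bit protocol fields."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


swap_count = join64(1, 2)
hi, lo = split64(swap_count)
```

The same pair of helpers would apply to target_msc, divisor, remainder and the ust/msc/sbc fields of SwapComplete.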

  ❄ ❄ ❄  ❄  ❄ ❄ ❄

 7. Extension Initialization

 The name of this extension is Swap.

 ┌───
 SwapQueryVersion
 client-major-version:   CARD32
 client-minor-version:   CARD32
   ▶
 major-version:  CARD32
 minor-version:  CARD32
 └───

 The client sends the highest supported version to the server
 and the server sends the highest version it supports, but no
 higher than the requested version. Major version changes can
 introduce incompatibilities in existing functionality; minor
 version changes introduce only backward compatible changes.
 It is the client's responsibility to ensure that the server
 supports a version which is compatible with its expectations.

 Backwards compatible changes include the addition of new
 requests.
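The negotiation rule in SwapQueryVersion amounts to taking the lower of the two versions. A minimal sketch, with a hypothetical function name and versions modelled as (major, minor) tuples:

```python
def negotiate(client_version, server_version):
    """Return the version the server would reply with.

    Versions are (major, minor) tuples; Python compares tuples major first,
    so min() picks the highest version that does not exceed either side.
    """
    return min(client_version, server_version)


reply_old_server = negotiate((1, 2), (1, 0))
reply_new_server = negotiate((1, 0), (2, 3))
```

The client still has to check that the reply is a version it can actually speak, as the text notes.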


  ❄ ❄ ❄  ❄  ❄ ❄ ❄

 8. Extension Requests

 ┌───
 SwapRegion
 destination: DRAWABLE
 region: REGION
 src-off-x,src-off-y: INT16
 source: PIXMAP
 swap-interval: CARD32
 target_msc_hi: CARD32
 target_msc_lo: CARD32
 divisor_hi: CARD32
 divisor_lo: CARD32
 remainder_hi: CARD32
 remainder_lo: CARD32
   ▶
 swap_hi: CARD32
 swap_lo: CARD32
 suggested-x-off,suggested-y-off: INT16
 suggested-width,suggested-height: CARD16
 idle: LISTofSWAPIDLE
 └───
 Errors: Pixmap, Drawable, Region, Value

 Schedule a swap of the specified region from the source pixmap
 to the destination drawable.

 region specifies the region within the destination to be
 swapped from the source.

 src-off-x and src-off-y specify the offset to be added to
 region to align it with the source pixmap.

 swap-interval specifies the minimum number of frames since the
 last SwapRegion request.

 target_msc_hi/target_msc_lo form a 64-bit value marking the
 target media stamp count for the swap request. When non-zero,
 these mark the desired time where the data should be
 presented.

 divisor_hi/divisor_lo form a 64-bit value marking the desired
 media stamp count interval between swaps.

 remainder_hi/remainder_lo form a 64-bit value marking the
 desired offset within the divisor_hi/divisor_lo swap interval.

 In the reply, swap_hi/swap_lo form a 64-bit swap count value
 when the swap will actually occur (e.g.  the last queued swap
 count + (pending swap count * swap interval)).

 suggested-width and suggested-height offer a hint as to the
 best pixmap size to use for full-sized swaps in the
 future. suggested-x-off and suggested-y-off provide a hint as
 to where the window contents should be placed within that
 allocation for future swaps.

 idle provides a list of pixmaps which were passed in previous
 SwapRegion requests by this client targeting the same destination.
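The draft does not spell out how target, divisor and remainder interact. One plausible reading, sketched below, borrows the rules of glXSwapBuffersMscOML from GLX_OML_sync_control, which DRI2 follows; treating SwapRegion the same way is an assumption, and the function name is hypothetical. It also assumes remainder < divisor whenever divisor is non-zero.

```python
def scheduled_msc(current, target, divisor, remainder):
    """MSC at which the swap would happen under the assumed OML-style rules."""
    if current < target:
        # Target still in the future: swap when the MSC reaches it.
        return target
    if divisor == 0:
        # Already at or past the target and no divisor: swap right away.
        return current
    # Otherwise: first MSC strictly after `current` with
    # msc % divisor == remainder.
    candidate = current + 1
    return candidate + (remainder - candidate) % divisor


future_target = scheduled_msc(current=5, target=8, divisor=0, remainder=0)
every_fourth = scheduled_msc(current=10, target=0, divisor=4, remainder=1)
```

With divisor 4 and remainder 1, a request arriving at MSC 10 lands on MSC 13, the next count that satisfies the modulus.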

How 

Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Chris Wilson
On Tue, Feb 19, 2013 at 07:45:09PM -0800, Keith Packard wrote:
 ┌───
 DRI3BufferFromPixmap
   pixmap: PIXMAP
   ▶
   depth: CARD8
   width, height, stride: CARD16
   depth, bpp: CARD8
   buffer: FD
 └───
   Errors: Pixmap, Match
 
   Pass back a direct rendering object associated with
   pixmap. Future changes to pixmap will be visible in that
   direct rendered object.
 
   The pixel format and geometry of the buffer are returned along
   with a file descriptor referencing the underlying direct
   rendering object.

What is the serialization for multiple clients using BufferFromPixmap?
(In particular, with a compositor reading from a DRI3 client controlled
PixmapFromBuffer.) Do we need an Invalidate for when the GEM object is
exchanged for the Pixmap following a Swap (or other external
modifications)? Are all operations still implicitly flushed to the GPU
before any reply to the Client?

Extending this protocol to supersede MIT-SHM would also be useful if it
makes the serialization explicit.

 11.2 XvMC / Xv

 It might be nice to be able to reference YUV formatted direct rendered
 objects from the X server.

And for Render, along with passing blobs.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Chris Wilson
On Tue, Feb 19, 2013 at 07:46:22PM -0800, Keith Packard wrote:
 ┌───
 SwapRegion
   destination: DRAWABLE
   region: REGION
   src-off-x,src-off-y: INT16
   source: PIXMAP
   swap-interval: CARD32
   target_msc_hi: CARD32
   target_msc_lo: CARD32
   divisor_hi: CARD32
   divisor_lo: CARD32
   remainder_hi: CARD32
   remainder_lo: CARD32
   ▶   
   swap_hi: CARD32
   swap_lo: CARD32
   suggested-x-off,suggested-y-off: INT16
   suggested-width,suggested-height: CARD16
   idle: LISTofSWAPIDLE
 └───

What I don't see here is how the client instructs the server to
handle a missed swap. For example, with the typical use of
  swap-interval  0, divisor = 0, target = current
we can choose to either emit this SwapRegion synchronously, or
asynchronously (to risk tearing but allow the client to catch up to its
target framerate). Actually, there isn't a mention of whether this
should be synchronized to the display at all (and how to handle
synchronisation across multiple scanouts).

What happens for a delayed error?

   In the reply, swap_hi/swap_lo form a 64-bit swap count value
   when the swap will actually occur (e.g.  the last queued swap
   count + (pending swap count * swap interval)).

I'm not sure exactly what SBC is meant to be. Is it a simple seqno of
the SwapRegion in this Drawable's swap queue (why then does
swap_interval matter), or is it meant to correlate with the vblank
counter (in which case it is merely a predicted value)?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Peter Harris
On 2013-02-19 22:46, Keith Packard wrote:
 A.3 Protocol Events
 
 The Swap extension specifies the SwapComplete event.
 
 ┌───
 SwapComplete
   1   CARD8       type
   1   CARD8       extension
   2   CARD16      sequenceNumber
   4   DRAWABLE    drawable
   4   CARD32      ust_hi
   4   CARD32      ust_lo
   4   CARD32      msc_hi
   4   CARD32      msc_lo
   4   CARD32      sbc_hi
   4   CARD32      sbc_lo
 └───

May I suggest that all new events be Generic Events? One event isn't too
bad, but the legacy event space is already crowded.

SwapComplete
1   35          GenericEvent
1   CARD8       extension
2   CARD16      sequenceNumber
4   2           length
2   CARD16      evtype
2               unused
4   DRAWABLE    drawable
4   CARD32      ust_hi
4   CARD32      ust_lo
4   CARD32      msc_hi
4   CARD32      msc_lo
4   CARD32      sbc_hi
4   CARD32      sbc_lo

(I assume extension in the original is a typo. If it isn't and an
extra byte of data is needed, it easily fits in the two bytes of
unused after evtype).
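As a sanity check on the layout, here is a sketch that packs the suggested event with Python's struct module. Little-endian byte order, the extension opcode 140, and the function name are assumptions made for the example; the field order and sizes are the ones listed above.

```python
import struct

# type, extension, sequenceNumber, length, evtype, unused, drawable,
# then ust/msc/sbc as hi/lo pairs. "<" means little-endian with no padding.
SWAP_COMPLETE = struct.Struct("<BBHIHHI6I")

GENERIC_EVENT = 35


def pack_swap_complete(extension, sequence, evtype, drawable, ust, msc, sbc):
    def halves(value):
        return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF

    # length counts 4-byte units beyond the first 32 bytes of the event.
    length = (SWAP_COMPLETE.size - 32) // 4
    return SWAP_COMPLETE.pack(GENERIC_EVENT, extension, sequence, length,
                              evtype, 0, drawable,
                              *halves(ust), *halves(msc), *halves(sbc))


event = pack_swap_complete(extension=140, sequence=7, evtype=0,
                           drawable=0x400001, ust=1 << 33, msc=120, sbc=60)
```

The packed event is 40 bytes, which is 32 bytes plus two extra 4-byte units, so the computed length field agrees with the 2 in the layout.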

Peter Harris
-- 
   Open Text Connectivity Solutions Group
Peter Harris                    http://connectivity.opentext.com/
Research and Development        Phone: +1 905 762 6001
phar...@opentext.com            Toll Free: 1 877 359 4866

Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Keith Packard
Chris Wilson ch...@chris-wilson.co.uk writes:

 What is the serialization for multiple clients using BufferFromPixmap?

Once they've got a handle to the object, it's really up to them to
serialize among themselves. We don't have any control of the underlying
direct rendering infrastructure.

 (In particular, with a compositor reading from a DRI3 client controlled
 PixmapFromBuffer.)

This case is half supported by the Swap semantics -- the pixmap is
handed from glxgears to the server with PixmapFromBuffer, then used in a
SwapRegion operation. Once given to that, that pixmap is 'busy' until
the server releases it back to the client in a future reply to
SwapRegion.

When the compositor then does a BufferFromPixmap on the same object, we
want that pixmap to remain busy until the compositor is done using
it. Would it suffice to require that the compositor call FreePixmap to
signal that it was done using it?

And that leads to another question here -- if we're swapping pixmaps
between back buffer and window buffer, how the heck does the compositor
learn that the underlying graphics object has changed? Ideally, we'd be
able to re-use the same BufferFromPixmap result across multiple frames,
but that would mean the compositor would need to be provided new window
pixmap IDs.

I wonder if we need a Swap event to send new pixmap IDs when the
swap happens? That would be pretty easy at least. Those would need to
be delivered before any related Damage events so that the compositor
would see the new Pixmap ID before it responded to the damage.

 Do we need an Invalidate for when the GEM object is
 exchanged for the Pixmap following a Swap (or other external
 modifications)?

I'm afraid I don't understand this question. Are you thinking that
pixmap IDs will end up changing which GEM objects they point at?

 Are all operations still implicitly flushed to the GPU
 before any reply to the Client?

Not necessarily flushed to the GPU, of course, but there definitely
needs to be some serialization mechanism that multiple DRI clients
sharing the same DRI buffer are using, and the X server needs to
participate in that serialization mechanism as appropriate for the
underlying hardware.

That's really up to the specific DRI infrastructure though.

We should make this explicit in the DRI3 spec though so that DRI
implementations aren't surprised by the requirement again.

 Extending this protocol to supersede MIT-SHM would also be useful if it
 makes the serialization explicit.

I've hacked up MIT-SHM to use FD passing already. It's nice to have
something that can pass *arbitrary* memory mappings instead of just DMA-BUFs.

 And for Render, along with passing blobs.

Yeah, I can easily imagine doing a PictureFromBuffer as well. Let's
focus on Pixmaps for now and get Mesa fixed up.

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Keith Packard
Stéphane Marchesin stephane.marche...@gmail.com writes:

 How would you handle atomic swaps? Multiple of these back to back?

Do you want to synchronously swap multiple windows? Or swap sections
from multiple back buffers?

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Stéphane Marchesin
On Wed, Feb 20, 2013 at 12:01 PM, Keith Packard kei...@keithp.com wrote:
 Stéphane Marchesin stephane.marche...@gmail.com writes:

 How would you handle atomic swaps? Multiple of these back to back?

 Do you want to synchronously swap multiple windows? Or swap sections
 from multiple back buffers?

I'm interested in two specific use cases:
- Swap to an overlay and flip a crtc in an atomic fashion,
- Specify a list of dirty rectangles for a single frame, like what
CopyRegion does but with multiple rectangles.

Stéphane


Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Keith Packard
Chris Wilson ch...@chris-wilson.co.uk writes:

 What I don't see here is how the client instructs the server to
 handle a missed swap.

Right, this first pass was just trying to replicate the DRI2 semantics;
figuring out how to improve those seems like a good idea.

From what game developers have told us, a missed swap should just tear
instead of dropping a frame. It might be nice to inform the client that
they're not keeping up with the target frame rate and let them scale
stuff back; I'd suggest the SwapComplete event could contain enough
information to let them know what actually happened.

 For example, with the typical use of
   swap-interval  0, divisor = 0, target = current
 we can choose to either emit this SwapRegion synchronously, or
 asynchronously (to risk tearing but allow the client to catch up to its
 target framerate).

I haven't heard anyone asking for us to skip a frame in this case to
avoid tearing.

 Actually, there isn't a mention of whether this
 should be synchronized to the display at all (and how to handle
 synchronisation across multiple scanouts).

Eric suggested we fix the multi screen problem by making the application
figure this out (if they like). I'm thinking we would just add a RandR
CRTC to the request, or let the client set it to None if they don't care.

 What happens for a delayed error?

What kind of errors can happen after the request is validated? I'd hope
that these cases would be truly exceptional. Realistically, we don't
have any place to report these that the DRI library could ever hope to
see them reliably. We could report them in an event, but that would
need to rely on the kindness of the application to send it along to the
library. We could pend the errors and report them in a future SwapRegion
reply, but that would presume that the application is continuously
rendering frames.

  In the reply, swap_hi/swap_lo form a 64-bit swap count value
  when the swap will actually occur (e.g.  the last queued swap
  count + (pending swap count * swap interval)).

 I'm not sure exactly what SBC is meant to be. Is it a simple seqno of
 the SwapRegion in this Drawable's swap queue (why then does
 swap_interval matter), or is it meant to correlate with the vblank
 counter (in which case it is merely a predicted value)?

Just like DRI2, it's the planned swap time. Don't be late!

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Keith Packard
Peter Harris phar...@opentext.com writes:

 May I suggest that all new events be Generic Events? One event isn't too
 bad, but the legacy event space is already crowded.

Yes, of course. I didn't worry too much about the encoding part, I'm
afraid :-)

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Keith Packard
Stéphane Marchesin stephane.marche...@gmail.com writes:

 I'm interested in two specific use cases:
 - Swap to an overlay and flip a crtc in an atomic fashion,

As you may remember, I proposed a bunch of RandR changes to support
per-CRTC pixmaps and atomic mode setting operations a while back. With
hardware now commonly supporting multiple overlays, even that stuff
wouldn't suffice anymore.

Off the top of my head, we'd need to construct some Drawable that
represented each overlay, and then perform a PolySwapRegion operation to
synchronously update their contents from appropriate back buffers.

 - Specify a list of dirty rectangles for a single frame, like what
 CopyRegion does but with multiple rectangles.

And they're not arranged so that a single region and source offset x/y
could be used?

I can imagine creating a SwapRectangles request, but I don't know that
it would be any better than simply executing multiple SwapRegion
requests.

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Stéphane Marchesin
On Wed, Feb 20, 2013 at 12:42 PM, Keith Packard kei...@keithp.com wrote:
 Stéphane Marchesin stephane.marche...@gmail.com writes:

 I'm interested in two specific use cases:
 - Swap to an overlay and flip a crtc in an atomic fashion,

 As you may remember, I proposed a bunch of RandR changes to support
 per-CRTC pixmaps and atomic mode setting operations a while back. With
 hardware now commonly supporting multiple overlays, even that stuff
 wouldn't suffice anymore.

 Off the top of my head, we'd need to construct some Drawable that
 represented each overlay, and then perform a PolySwapRegion operation to
 synchronously update their contents from appropriate back buffers.

Right, that's what I'm after. If you have a bunch of GL surfaces
you're rendering to, a main drawable and 2 overlays, I'd like the
ability to swap to arbitrary overlays or to my main surface. Of course
the GL extension for that is still TBD, but having the ability in DRI3
would be a nice start.


 - Specify a list of dirty rectangles for a single frame, like what
 CopyRegion does but with multiple rectangles.

 And they're not arranged so that a single region and source offset x/y
 could be used?

 I can imagine creating a SwapRectangles request, but I don't know that
 it would be any better than simply executing multiple SwapRegion
 requests.

Well, if you have vsync enabled for your CopyRegion implementation,
then you'll need to vsync for each region right? What I'm after is a
swap all these regions together, vsync only once type of thing.

Stéphane


Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Chris Wilson
On Wed, Feb 20, 2013 at 11:55:56AM -0800, Keith Packard wrote:
 Chris Wilson ch...@chris-wilson.co.uk writes:
  Do we need an Invalidate for when the GEM object is
  exchanged for the Pixmap following a Swap (or other external
  modifications)?
 
 I'm afraid I don't understand this question. Are you thinking that
 pixmap IDs will end up changing which GEM objects they point at?

You managed to ask yourself the question I was trying to lead to: how the heck
does the compositor learn that the underlying graphics object has
changed? In DRI2 this is through the InvalidateEvent, and the lack of
being able to send those from the driver before the Damage is sent is
one of the reasons why the current exchange mechanism is broken.

Getting buffer exchanges working in conjunction with the external
compositor is more or less as tricky as it gets. The notion that the
buffer is kept busy by the compositor and so prevents the DRI3 client
from overdrawing it is key. And that naturally leads to the compositor
needing to release the old buffer once it is referencing the new
post-swap buffer. Serialisation between rendering of the common buffers
is definitely s.e.p. I agree that should solve the compositor problem.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Mario Kleiner

On 02/20/2013 09:27 PM, Keith Packard wrote:

Chris Wilson ch...@chris-wilson.co.uk writes:


What I don't see here is how the client instructs the server to
handle a missed swap.

Right, this first pass was just trying to replicate the DRI2 semantics;
figuring out how to improve those seems like a good idea.

 From what game developers have told us, a missed swap should just tear
instead of dropping a frame. It might be nice to inform the client that
they're not keeping up with the target frame rate and let them scale
stuff back; I'd suggest the SwapComplete event could contain enough
information to let them know what actually happened.


Please make this configurable. Tearing makes sense for a game, but for 
the kind of scientific apps i do, we don't want it to tear ever, or bad 
things would happen for us. We need it to just flip the frame delayed 
but vsync'ed and then the app can figure out via the INTEL_swap_events 
or glXWaitForSbcOML() that a deadline was missed and what to do to catch up.


There's 
http://www.opengl.org/registry/specs/EXT/glx_swap_control_tear.txt that 
allows apps to define if they want to tear or vsync on a missed swap 
deadline.



For example, with the typical use of
   swap-interval  0, divisor = 0, target = current
we can choose to either emit this SwapRegion synchronously, or
asynchronously (to risk tearing but allow the client to catch up to its
target framerate).

I haven't heard anyone asking for us to skip a frame in this case to
avoid tearing.


See above :-).

...

In the reply, swap_hi/swap_lo form a 64-bit swap count value
when the swap will actually occur (e.g.  the last queued swap
count + (pending swap count * swap interval)).


I'm not sure exactly what SBC is meant to be. Is it a simple seqno of
the SwapRegion in this Drawable's swap queue (why then does
swap_interval matter), or is it meant to correlate with the vblank
counter (in which case it is merely a predicted value)?

Just like DRI2, it's the planned swap time. Don't be late!


SBC in DRI2 is the running count of completed swaps for a drawable, i.e., 
current swap count + pending swap count, not the planned swap time. 
Essentially a reference to a just-queued swap via sbc = 
glXSwapBuffersMscOML(...), so you can use sbc as a unique id for that 
swap in glXWaitForSbcOML() or to match it up with the sbc in a returned 
INTEL_swap_event. Very useful, so I guess this should stay 
backwards-compatible.
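
To make the bookkeeping concrete, here is a minimal Python sketch of an SBC that works as "completed swaps + pending swaps" and doubles as a unique id for each queued swap. The class and method names are hypothetical; they are not part of DRI2, GLX, or the proposed Swap extension.

```python
class SwapQueue:
    """Toy model of a per-drawable swap buffer count (SBC)."""

    def __init__(self):
        self.completed = 0  # swaps that have already reached the screen
        self.pending = 0    # swaps queued but not yet completed

    def queue_swap(self):
        # The id handed back counts every completed swap plus every
        # pending one, including the swap being queued right now.
        self.pending += 1
        return self.completed + self.pending

    def complete_one(self):
        # Retire the oldest pending swap and report its SBC, which is
        # what a swap-complete event would carry in this model.
        if self.pending == 0:
            raise RuntimeError("no swap pending")
        self.pending -= 1
        self.completed += 1
        return self.completed
```

A client could keep the value returned by `queue_swap()` and later match it against the value reported on completion, which is the role the sbc plays with glXWaitForSbcOML() and swap events.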


thanks,
-mario





Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Aaron Plattner

On 02/19/13 19:45, Keith Packard wrote:



Here's the spec for DRI3:

  The DRI3 Extension
  Version 1.0
  2013-2-19

Keith Packard
  kei...@keithp.com
  Intel Corporation

1. Introduction

The DRI3 extension provides mechanisms to translate between direct
rendered buffers and X pixmaps. When combined with the Swap extension,
a complete direct rendering solution for OpenGL is provided.

The direct rendered buffers are passed across the protocol via
standard POSIX file descriptor passing mechanisms. On Linux, these
buffers are DMA-BUF objects.

1.1. Acknowledgments

Eric Anholt e...@anholt.net
Dave Airlie airl...@redhat.com
Kristian Høgsberg k...@bitplanet.net

 ❄ ❄ ❄  ❄  ❄ ❄ ❄

2. Data Types

The DRI3 extension uses the RandR extension Provider to select among
multiple GPUs on a single screen.

 ❄ ❄ ❄  ❄  ❄ ❄ ❄

4. Errors

No errors are defined by the DRI3 extension.

 ❄ ❄ ❄  ❄  ❄ ❄ ❄

5. Events

No events are defined by the DRI3 extension.

 ❄ ❄ ❄  ❄  ❄ ❄ ❄

6. Protocol Types

DRI3DRIVER { DRI3DriverDRI
 DRI3DriverVDPAU }

These values describe the type of driver the client will want
to load.  The server sends back the name of the driver to use
for the screen in question.

 ❄ ❄ ❄  ❄  ❄ ❄ ❄

7. Extension Initialization

The name of this extension is DRI3 (third time is the charm?).

┌───
 DRI3QueryVersion
client-major-version:   CARD32
client-minor-version:   CARD32
   ▶
major-version:  CARD32
minor-version:  CARD32
└───

The client sends the highest supported version to the server
and the server sends the highest version it supports, but no
higher than the requested version. Major version changes can
introduce incompatibilities in existing functionality; minor
version changes introduce only backward-compatible changes.
It is the client's responsibility to ensure that the server
supports a version which is compatible with its expectations.

Backward-compatible changes include the addition of new
requests.


 ❄ ❄ ❄  ❄  ❄ ❄ ❄

8. Extension Requests

┌───
 DRI3Open
drawable: DRAWABLE
driverType: DRI3DRIVER
provider: PROVIDER
   ▶
driver: STRING
device: FD
└───
Errors: Drawable, Value, Match

This requests that the X server open the direct rendering
device associated with drawable, driverType and RandR
provider. The provider must support SourceOutput or SourceOffload.

The direct rendering library used to implement the specified
driverType is returned in the driver value. The file
descriptor for the device is returned in device.

┌───
 DRI3PixmapFromBuffer
pixmap: PIXMAP
drawable: DRAWABLE
width, height, stride: CARD16


Why is there a stride here if all it is is an indirect way of 
calculating a total size?  If the total size is what the server cares 
about, then it seems like the client should just send that.


Not all tiled formats fit nicely into a height * stride = total 
equation with stride being an integer.



depth, bpp: CARD8
buffer: FD
└───
Errors: Alloc, Drawable, IDChoice, Value, Match

Creates a pixmap for the direct rendering object associated
with buffer. width, height and stride specify the geometry (in
pixels) of the underlying buffer. The pixels within the buffer
may not be arranged in a simple linear fashion, but the total
byte size of the buffer must be height * stride * bpp /
8. Precisely how any additional information about the buffer
is shared is outside the scope of this extension.

If buffer cannot be used with the screen associated with
drawable, a Match error is returned.

If depth or bpp are not supported by the screen, a Value error
is returned.

┌───
 DRI3BufferFromPixmap
pixmap: PIXMAP
   ▶
width, height, stride: CARD16
depth, bpp: CARD8
buffer: FD
└───
Errors: Pixmap, Match

Pass back a direct rendering object associated with
pixmap. Future changes to pixmap will be visible in that
direct rendered object.

The pixel format and geometry of the buffer are returned along
with a file descriptor referencing the underlying direct
rendering object.

 ❄ ❄ ❄  ❄  ❄ ❄ ❄

9. Extension Events

The DRI3 extension defines no events.

 ❄ ❄ ❄  ❄  ❄ ❄ ❄

10. Extension Versioning


Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Keith Packard
Chris Wilson ch...@chris-wilson.co.uk writes:

 You managed to ask yourself the question I was trying to lead to: how the heck
 does the compositor learn that the underlying graphics object has
 changed?

It can certainly tell that the underlying contents have changed with
Damage events, but as to how it knows that it should do another
BufferFromPixmap request, I think that's gonna require another
event.

Now, the big question is how to deal with compositing managers which
don't know about DRI3. I suspect we'll just have to skip the DMA-BUF
swapping hack for window pixmaps unless the compositor gives us the OK
to do so.

 In DRI2 this is through the InvalidateEvent, and the lack of
 being able to send those from the driver before the Damage is sent is
 one of the reasons why the current exchange mechanism is broken.

Hrm. With the current system, except for override-redirect windows, if
the compositor is also the window manager, it should always know when
the window pixmap is going to be replaced because that only happens when
the window is resized, and the window manager is entirely responsible
for making that happen. For override-redirect windows, if the
ConfigureNotify event was delivered before the Damage event, then the
compositor could know about that as well.

I guess I'd like to know more about what is broken with the current
system for compositors...

In any case, as the underlying DMA-BUF is changing, the compositor is
going to need to know that so it can release the old pixmap back to the
application, and so it can rewire its own compositing operations to use
the new object.

It might be nice to have the compositor use persistent names for the
various DMA-BUFs that are used for a particular window. I think that
means having the compositor hold on to old DMA-BUF window pixmap
IDs. That seems tricky though. The alternative will be to have the
compositor create/destroy a pixmap ID per frame. Not intolerable, but
not optimal.

 Getting buffer exchanges working in conjunction with the external
 compositor is more or less as tricky as it gets. The notion that the
 buffer is kept busy by the compositor and so prevents the DRI3 client
 from overdrawing it is key. And that naturally leads to the compositor
 needing to release the old buffer once it is referencing the new
 post-swap buffer.

Right, an easy technique there would be to have it use NameWindowPixmap
when it got an event telling it that a new pixmap was in use for the
window, and then when it was finished with that pixmap, it could just
use FreePixmap to tell the server it was done. That would bump the
refcnt down to one in the server, at which point it could queue the
pixmap to be sent as 'idle' the next time SwapRegion was called.
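
The reference-counting idea in the paragraph above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `ToyServer`, `ServerPixmap`, and their methods are hypothetical names, and a real server would track far more state.

```python
class ServerPixmap:
    def __init__(self, name):
        self.name = name
        self.refcnt = 1  # the server's own reference to the buffer


class ToyServer:
    def __init__(self):
        self.idle_queue = []  # pixmaps to report as idle on the next swap

    def name_window_pixmap(self, pixmap):
        # The compositor takes a reference, keeping the buffer busy.
        pixmap.refcnt += 1

    def free_pixmap(self, pixmap):
        # The compositor is done; once only the server's reference is
        # left, the pixmap can be handed back to the application.
        pixmap.refcnt -= 1
        if pixmap.refcnt == 1:
            self.idle_queue.append(pixmap.name)

    def swap_region(self):
        # Return the accumulated idle list, as the reply's 'idle' field
        # would, and start a fresh list for the next request.
        idle, self.idle_queue = self.idle_queue, []
        return idle
```

In this model the application never sees a buffer in the idle list while the compositor still holds its reference, which is the property the discussion is after.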

 Serialisation between rendering of the common buffers
 is definitely s.e.p. I agree that should solve the compositor problem.

Ok, cool.

So, changes that I think are needed:

 1) If someone calls TextureFromPixmap on a window pixmap, we need to
suppress the window pixmap swapping hack. Alternatively, we can have
the compositor explicitly enable window pixmap swapping.

 2) We need to send an event when the buffer underlying a window switches.

 3) We need to be explicit about event ordering between the new window
pixmap change notify event and any related Damage.

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Keith Packard
Aaron Plattner aplatt...@nvidia.com writes:

 Why is there a stride here if all it is is an indirect way of 
 calculating a total size?  If the total size is what the server cares 
 about, then it seems like the client should just send that.

I don't need the size, I need the stride. I could just assert that
width/height are sufficient to compute the stride on both sides of the
wire, but we've had image format adventures in the past of this sort and
I'd like to make sure we have sufficient information to reconstruct the
precise layout.

The i915 kernel driver can tell me the tiling format of a GEM buffer,
but it refuses to hand back the stride, so I stuck this into the
protocol as it will be useful for at least that chip.

 Not all tiled formats fit nicely into a height * stride = total 
 equation with stride being an integer.

Sure, I'd imagine it'd be something like (tiles-wide * tile-width *
tiles-high * tile-height) where tiles-wide and tiles-high are big enough
to cover the specified image size.
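
As a rough illustration of that formula, the Python sketch below rounds the image up to whole tiles and multiplies out the covered area in pixels. The function name and the tile dimensions used in the example are assumptions chosen for illustration, not values taken from any driver.

```python
def tiled_buffer_pixels(width, height, tile_width, tile_height):
    """Pixels covered by the smallest whole-tile grid holding the image."""
    tiles_wide = -(-width // tile_width)    # ceiling division
    tiles_high = -(-height // tile_height)  # ceiling division
    return tiles_wide * tile_width * tiles_high * tile_height


# A 100x50 image on hypothetical 32x8 tiles needs a 4x7 tile grid.
padded = tiled_buffer_pixels(100, 50, 32, 8)
```

When the image dimensions are exact multiples of the tile size, the result equals width * height; otherwise it is larger, which is why width and height alone may not describe the allocation.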

I'd obviously prefer to not just pass a pile of driver-specific data in
this request, and 'stride' feels like it's skating close to that
edge.

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Keith Packard
Stéphane Marchesin stephane.marche...@gmail.com writes:

 On Wed, Feb 20, 2013 at 12:42 PM, Keith Packard kei...@keithp.com wrote:
 Stéphane Marchesin stephane.marche...@gmail.com writes:

 I'm interested in two specific use cases:
 - Swap to an overlay and flip a crtc in an atomic fashion,

 As you may remember, I proposed a bunch of RandR changes to support
 per-CRTC pixmaps and atomic mode setting operations a while back. With
 hardware now commonly supporting multiple overlays, even that stuff
 wouldn't suffice anymore.

 Off the top of my head, we'd need to construct some Drawable that
 represented each overlay, and then perform a PolySwapRegion operation to
 synchronously update their contents from appropriate back buffers.

 Right, that's what I'm after. If you have a bunch of GL surfaces
 you're rendering to, a main drawable and 2 overlays, I'd like the
 ability to swap to arbitrary overlays or to my main surface. Of course
 the GL extension for that is still TBD, but having the ability in DRI3
 would be a nice start.

Having an actual API to design to would be a huge help though; I suspect
anything we do in advance will just get messed up by the GL ARB.

 Well, if you have vsync enabled for your CopyRegion implementation,
 then you'll need to vsync for each region right? What I'm after is a
 swap all these regions together, vsync only once type of thing.

Oh. I've been focused on the GL swapbuffers APIs. SwapRegion isn't a
general CopyRegion operation. I've neglected to write down some of the
more important semantics which underlie the goals of this work.

For SwapRegion, I want to be able to require that the X server always be
free to just swap the entire contents of the source buffer with the
destination buffer -- the region is just the 'damaged' area within the
window; areas outside of that don't *need* to be copied from the new
buffer, but the client guarantees that the entire buffer contains the
correct contents for the window and that only the area within the
specified region differs from the current window contents.

For a compositing manager, you really do want to pull data from all of
the window pixmaps and paint them into the frame buffer in one giant
operation. The usual way of doing this is to construct the whole next
screen frame in a new single image and then use SwapRegion to get that
onto the screen. Then the individual updates could use a sequence of
SwapRegion operations to construct that intermediate buffer; once that
was ready, a single SwapRegion would move that to the scanout buffer.

Presumably that final SwapRegion would be a simple page flip operation
in the driver, so it would take no time or memory bandwidth.

It might be fun to figure out how to bypass the intermediate back buffer
though, and for that we'd need some complicated PolySwapRegion request
that queued up all of the changes in one giant request to be executed at
the right time, but that seems like something that wouldn't match how I
imagine compositing managers working today.

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Keith Packard
Mario Kleiner mario.kleiner...@gmail.com writes:

 On 02/20/2013 09:27 PM, Keith Packard wrote:
 Chris Wilson ch...@chris-wilson.co.uk writes:

 What I don't see here is how the client instructs the server to
 handle a missed swap.
 Right, this first pass was just trying to replicate the DRI2 semantics;
 figuring out how to improve those seems like a good idea.

  From what game developers have told us, a missed swap should just tear
 instead of dropping a frame. It might be nice to inform the client that
 they're not keeping up with the target frame rate and let them scale
 stuff back; I'd suggest the SwapComplete event could contain enough
 information to let them know what actually happened.

 Please make this configurable. Tearing makes sense for a game, but for 
 the kind of scientific apps i do, we don't want it to tear ever, or bad 
 things would happen for us. We need it to just flip the frame delayed 
 but vsync'ed and then the app can figure out via the INTEL_swap_events 
 or glXWaitForSbcOML() that a deadline was missed and what to do to
 catch up.


 There's 
 http://www.opengl.org/registry/specs/EXT/glx_swap_control_tear.txt that 
 allows apps to define if they want to tear or vsync on a missed swap 
 deadline.

Oh, that's ugly -- uses negative values for the interval. We could cook
up some suitable 'missed frame' mode parameter to make it match that
extension, and then create a similar EGL extension.
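
For reference, the negative-interval encoding could be unpacked into an explicit pair as sketched below, which is one way a separate 'missed frame' parameter might be derived from it. The function name and the returned tuple layout are hypothetical.

```python
def decode_swap_interval(interval):
    """Split a signed swap interval into (min_frames, tear_if_late).

    A negative interval is read as "use abs(interval) frames, but allow
    a late swap to happen unsynchronized (tear) instead of waiting".
    """
    return abs(interval), interval < 0
```

With this split, an application that must never tear passes a non-negative interval, while a game that prefers tearing over a dropped frame passes the negated value.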

 SBC in DRI2 is the running count of completed swaps for a drawable, i.e., 
 current swap count + pending swap count, not the planned swap time. 
 Essentially a reference to a just-queued swap via sbc = 
 glXSwapBuffersMscOML(...), so you can use sbc as a unique id for that 
 swap in glXWaitForSbcOML() or to match it up with the sbc in a returned 
 INTEL_swap_event. Very useful, so I guess this should stay 
 backwards-compatible.

That's definitely the plan -- we need whatever parameters are necessary
to implement the relevant GL specs. If we can't do that, it's not useful
to anyone.

-- 
keith.pack...@intel.com



Re: Initial DRI3000 protocol specs available

2013-02-20 Thread Stéphane Marchesin
On Wed, Feb 20, 2013 at 10:50 PM, Keith Packard kei...@keithp.com wrote:
 Stéphane Marchesin stephane.marche...@gmail.com writes:

 On Wed, Feb 20, 2013 at 12:42 PM, Keith Packard kei...@keithp.com wrote:
 Stéphane Marchesin stephane.marche...@gmail.com writes:

 I'm interested in two specific use cases:
 - Swap to an overlay and flip a crtc in an atomic fashion,

 As you may remember, I proposed a bunch of RandR changes to support
 per-CRTC pixmaps and atomic mode setting operations a while back. With
 hardware now commonly supporting multiple overlays, even that stuff
 wouldn't suffice anymore.

 Off the top of my head, we'd need to construct some Drawable that
 represented each overlay, and then perform a PolySwapRegion operation to
 synchronously update their contents from appropriate back buffers.

 Right, that's what I'm after. If you have a bunch of GL surfaces
 you're rendering to, a main drawable and 2 overlays, I'd like the
 ability to swap to arbitrary overlays or to my main surface. Of course
 the GL extension for that is still TBD, but having the ability in DRI3
 would be a nice start.

 Having an actual API to design to would be a huge help though; I suspect
 anything we do in advance will just get messed up by the GL ARB.

I don't think we need to involve the ARB just yet, the copy sub buffer
is a MESA extension and never went to the ARB. So I don't see that as
a problem.

With that said, I don't think it's that difficult/different. I can
design a GLX extension spec and send a draft, then we can work from
there.


 Well, if you have vsync enabled for your CopyRegion implementation,
 then you'll need to vsync for each region right? What I'm after is a
 swap all these regions together, vsync only once type of thing.

 Oh. I've been focused on the GL swapbuffers APIs. SwapRegion isn't a
 general CopyRegion operation. I've neglected to write down some of the
 more important semantics which underlie the goals of this work.

 For SwapRegion, I want to be able to require that the X server always be
 free to just swap the entire contents of the source buffer with the
 destination buffer -- the region is just the 'damaged' area within the
 window; areas outside of that don't *need* to be copied from the new
 buffer, but the client guarantees that the entire buffer contains the
 correct contents for the window and that only the area within the
 specified region differs from the current window contents.

 For a compositing manager, you really do want to pull data from all of
 the window pixmaps and paint them into the frame buffer in one giant
 operation.

That is actually not what you want because it is a waste of bandwidth.
Since compositors are typically bandwidth limited, you instead want to
paint only the relevant sub regions. Those are easy to determine by
transforming X damage regions into screen coordinates.

Most non-trivial compositing managers are already using partial update
schemes through GLX_MESA_copy_sub_buffer or the GLX_EXT_buffer_age
extensions + copies. I don't think it is far fetched to support a list
of rectangles instead.
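
To show what "transforming X damage regions into screen coordinates" amounts to in the simplest case, here is a small Python sketch. It assumes an untransformed window (translation only) and rectangles given as (x, y, width, height) tuples; the function name is hypothetical.

```python
def damage_to_screen(rects, win_x, win_y, screen_w, screen_h):
    """Translate window-relative damage rects and clip them to the screen."""
    out = []
    for x, y, w, h in rects:
        # Translate by the window origin, then clamp to screen bounds.
        x0 = max(x + win_x, 0)
        y0 = max(y + win_y, 0)
        x1 = min(x + win_x + w, screen_w)
        y1 = min(y + win_y + h, screen_h)
        if x1 > x0 and y1 > y0:  # drop rects that fall fully off-screen
            out.append((x0, y0, x1 - x0, y1 - y0))
    return out
```

The resulting list is the kind of multi-rectangle update a single request could carry, so that all of the rectangles are presented together with one vsync wait.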

Stéphane

 The usual way of doing this is to construct the whole next
 screen frame in a new single image and then use SwapRegion to get that
 onto the screen. Then the individual updates could use a sequence of
 SwapRegion operations to construct that intermediate buffer; once that
 was ready, a single SwapRegion would move that to the scanout buffer.

 Presumably that final SwapRegion would be a simple page flip operation
 in the driver, so it would take no time or memory bandwidth.

 It might be fun to figure out how to bypass the intermediate back buffer
 though, and for that we'd need some complicated PolySwapRegion request
 that queued up all of the changes in one giant request to be executed at
 the right time, but that seems like something that wouldn't match how I
 imagine compositing managers working today.

 --
 keith.pack...@intel.com


Re: Initial DRI3000 protocol specs available

2013-02-19 Thread Keith Packard

Here's the spec for DRI3:

  The DRI3 Extension
  Version 1.0
  2013-2-19
  
Keith Packard
  kei...@keithp.com
  Intel Corporation

1. Introduction

The DRI3 extension provides mechanisms to translate between direct
rendered buffers and X pixmaps. When combined with the Swap extension,
a complete direct rendering solution for OpenGL is provided.

The direct rendered buffers are passed across the protocol via
standard POSIX file descriptor passing mechanisms. On Linux, these
buffers are DMA-BUF objects.

1.1. Acknowledgments

Eric Anholt e...@anholt.net
Dave Airlie airl...@redhat.com
Kristian Høgsberg k...@bitplanet.net

 ❄ ❄ ❄  ❄  ❄ ❄ ❄ 

2. Data Types

The DRI3 extension uses the RandR extension Provider to select among
multiple GPUs on a single screen.

 ❄ ❄ ❄  ❄  ❄ ❄ ❄ 

4. Errors

No errors are defined by the DRI3 extension.

 ❄ ❄ ❄  ❄  ❄ ❄ ❄ 

5. Events

No events are defined by the DRI3 extension.

 ❄ ❄ ❄  ❄  ❄ ❄ ❄ 

6. Protocol Types

DRI3DRIVER { DRI3DriverDRI
 DRI3DriverVDPAU }

These values describe the type of driver the client will want
to load.  The server sends back the name of the driver to use
for the screen in question.

 ❄ ❄ ❄  ❄  ❄ ❄ ❄ 

7. Extension Initialization

The name of this extension is DRI3 (third time is the charm?).

┌───
DRI3QueryVersion
client-major-version:   CARD32
client-minor-version:   CARD32
  ▶
major-version:  CARD32
minor-version:  CARD32
└───

The client sends the highest supported version to the server
and the server sends the highest version it supports, but no
higher than the requested version. Major version changes can
introduce incompatibilities in existing functionality; minor
version changes introduce only backward-compatible changes.
It is the client's responsibility to ensure that the server
supports a version which is compatible with its expectations.

Backward-compatible changes include the addition of new
requests.
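
The negotiation rule above can be sketched as follows. This is an illustrative Python model, not protocol code: versions are (major, minor) tuples and both function names are hypothetical.

```python
def negotiate_version(client, server):
    """Version the server replies with: its highest supported version,
    but no higher than what the client requested."""
    return min(client, server)  # tuples compare major first, then minor


def is_compatible(negotiated, required):
    """Client-side check: same major version, and a minor version that
    includes every request the client relies on."""
    return negotiated[0] == required[0] and negotiated[1] >= required[1]
```

The second function reflects the point that the compatibility decision rests with the client: the server only reports a version, and the client decides whether it can work with it.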


 ❄ ❄ ❄  ❄  ❄ ❄ ❄ 

8. Extension Requests

┌───
DRI3Open
drawable: DRAWABLE
driverType: DRI3DRIVER
provider: PROVIDER
  ▶
driver: STRING
device: FD
└───
Errors: Drawable, Value, Match

This requests that the X server open the direct rendering
device associated with drawable, driverType and RandR
provider. The provider must support SourceOutput or SourceOffload.

The direct rendering library used to implement the specified
driverType is returned in the driver value. The file
descriptor for the device is returned in device.
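
The "standard POSIX file descriptor passing" that carries the device FD can be illustrated with SCM_RIGHTS ancillary data on a Unix-domain socket. The Python sketch below is not the X wire protocol: it uses a socket pair in place of the X connection, a pipe in place of the device node, and a made-up driver name, all of which are assumptions for the demonstration.

```python
import array
import os
import socket


def send_with_fd(sock, payload, fd):
    """Send payload plus one file descriptor as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_with_fd(sock, bufsize=64):
    """Receive a payload and at most one passed file descriptor."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    return data, list(fds)


def demo():
    server_end, client_end = socket.socketpair(
        socket.AF_UNIX, socket.SOCK_STREAM)
    read_fd, write_fd = os.pipe()  # stand-in for an opened device
    try:
        # "Server" replies with a driver name and the descriptor.
        send_with_fd(server_end, b"example_driver", read_fd)
        name, fds = recv_with_fd(client_end)
        # The received descriptor refers to the same open file.
        os.write(write_fd, b"ping")
        echoed = os.read(fds[0], 4)
        os.close(fds[0])
        return name, echoed
    finally:
        os.close(read_fd)
        os.close(write_fd)
        server_end.close()
        client_end.close()
```

The receiver gets a new descriptor number that refers to the same underlying open file, which is what lets a client use a device or buffer that the server opened on its behalf.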

┌───
DRI3PixmapFromBuffer
pixmap: PIXMAP
drawable: DRAWABLE
width, height, stride: CARD16
depth, bpp: CARD8
buffer: FD
└───
Errors: Alloc, Drawable, IDChoice, Value, Match

Creates a pixmap for the direct rendering object associated
with buffer. width, height and stride specify the geometry (in
pixels) of the underlying buffer. The pixels within the buffer
may not be arranged in a simple linear fashion, but the total
byte size of the buffer must be height * stride * bpp /
8. Precisely how any additional information about the buffer
is shared is outside the scope of this extension.

If buffer cannot be used with the screen associated with
drawable, a Match error is returned.

If depth or bpp are not supported by the screen, a Value error
is returned.
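
The size rule stated for DRI3PixmapFromBuffer can be written out as a small Python helper. It assumes, as the text says, that stride is measured in pixels; the function name is hypothetical.

```python
def buffer_size_bytes(height, stride, bpp):
    """Total buffer size required by the rule height * stride * bpp / 8."""
    bits = height * stride * bpp
    if bits % 8:
        raise ValueError("buffer size is not a whole number of bytes")
    return bits // 8


# A 1080-row buffer with a 1920-pixel stride at 32 bits per pixel.
size = buffer_size_bytes(1080, 1920, 32)
```

A server could compare this value against the actual size of the object behind the file descriptor before accepting the request, although the draft does not say which error a mismatch would produce.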

┌───
DRI3BufferFromPixmap
pixmap: PIXMAP
  ▶
width, height, stride: CARD16
depth, bpp: CARD8
buffer: FD
└───
Errors: Pixmap, Match

Pass back a direct rendering object associated with
pixmap. Future changes to pixmap will be visible in that
direct rendered object.

The pixel format and geometry of the buffer are returned along
with a file descriptor referencing the underlying direct
rendering object.

 ❄ ❄ ❄  ❄  ❄ ❄ ❄ 

9. Extension Events

The DRI3 extension defines no events.

 ❄ ❄ ❄  ❄  ❄ ❄ ❄

10. Extension Versioning

The DRI3 extension is adapted from the DRI2 extension.

1.0: First published version

 ❄ ❄ ❄  ❄  ❄ ❄ ❄


11. Relationship with other extensions

As an extension designed to support other extensions, there are
naturally some interactions with other extensions.

11.1 GLX

GLX has no direct relation with DRI3, but a direct rendering OpenGL

Re: Initial DRI3000 protocol specs available

2013-02-19 Thread Keith Packard

And here's the Swap extension

  The Swap Extension
  Version 1.0
  2013-2-14
  
Keith Packard
  kei...@keithp.com
  Intel Corporation

1. Introduction

The Swap extension provides GL SwapBuffers semantics to move pixels
from a pixmap to a drawable. This can be used by OpenGL
implementations or directly by regular applications.

1.1. Acknowledgments

Eric Anholt e...@anholt.net
Dave Airlie airl...@redhat.com
Kristian Høgsberg k...@bitplanet.net

 ❄ ❄ ❄  ❄  ❄ ❄ ❄ 

2. Data Types

The server side region support specified in the Xfixes extension
version 2 is used in the SwapRegion request.

 ❄ ❄ ❄  ❄  ❄ ❄ ❄ 

4. Errors

No errors are defined by the Swap extension.

 ❄ ❄ ❄  ❄  ❄ ❄ ❄ 

5. Events

The Swap extension provides a new event, SwapComplete, to signal when
a swap operation has finished.

 ❄ ❄ ❄  ❄  ❄ ❄ ❄ 


6. Protocol Types

SWAPSELECTMASK { SwapCompleteMask }

Used with SwapSelectInput to specify which events a client is
to receive.

SWAPIDLE {
pixmap: PIXMAP
valid: BOOL
swap-hi: CARD32
swap-lo: CARD32
}

This structure contains information about a pixmap which had
been used in a SwapRegion request and which the server is now
finished with. If valid is TRUE, swap-hi/swap-lo form a 64-bit
swap count value from the SwapRegion request which matches the
data that the pixmap currently contains. If valid is FALSE,
then the contents of the pixmap are undefined.
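
Because the protocol carries 64-bit counters as two CARD32 halves, a client has to join and split them. A minimal Python sketch, with hypothetical helper names:

```python
MASK32 = 0xFFFFFFFF


def join_card32(hi, lo):
    """Combine swap-hi/swap-lo style halves into one 64-bit value."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def split_card64(value):
    """Split a 64-bit value into (hi, lo) CARD32 halves."""
    return (value >> 32) & MASK32, value & MASK32
```

The same pair of helpers would apply to the target_msc, divisor, and remainder fields of SwapRegion, which use the same hi/lo layout.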

 ❄ ❄ ❄  ❄  ❄ ❄ ❄ 

7. Extension Initialization

The name of this extension is Swap.

┌───
SwapQueryVersion
client-major-version:   CARD32
client-minor-version:   CARD32
  ▶
major-version:  CARD32
minor-version:  CARD32
└───

The client sends the highest supported version to the server
and the server sends the highest version it supports, but no
higher than the requested version. Major version changes can
introduce incompatibilities in existing functionality; minor
version changes introduce only backward-compatible changes.
It is the client's responsibility to ensure that the server
supports a version which is compatible with its expectations.

Backward-compatible changes include the addition of new
requests.


 ❄ ❄ ❄  ❄  ❄ ❄ ❄ 

8. Extension Requests

┌───
SwapRegion
destination: DRAWABLE
region: REGION
src-off-x,src-off-y: INT16
source: PIXMAP
swap-interval: CARD32
target_msc_hi: CARD32
target_msc_lo: CARD32
divisor_hi: CARD32
divisor_lo: CARD32
remainder_hi: CARD32
remainder_lo: CARD32
  ▶ 
swap_hi: CARD32
swap_lo: CARD32
suggested-x-off,suggested-y-off: INT16
suggested-width,suggested-height: CARD16
idle: LISTofSWAPIDLE
└───
Errors: Pixmap, Drawable, Region, Value

Schedule a swap of the specified region from the source pixmap
to the destination drawable.

region specifies the region within the destination to be
swapped from the source.

src-off-x and src-off-y specify the offset to be added to
region to align it with the source pixmap.

swap-interval specifies the minimum number of frames since the
last SwapRegion request.

target_msc_hi/target_msc_lo form a 64-bit value marking the
target media stream counter (MSC) value for the swap request. When
non-zero, these mark the desired time where the data should be
presented.

divisor_hi/divisor_lo form a 64-bit value marking the desired
media stream counter interval between swaps.

remainder_hi/remainder_lo form a 64-bit value marking the
desired offset within the divisor_hi/divisor_lo swap interval.

In the reply, swap_hi/swap_lo form a 64-bit swap count value
when the swap will actually occur (e.g.  the last queued swap
count + (pending swap count * swap interval)).

suggested-width and suggested-height offer a hint as to the
best pixmap size to use for full-sized swaps in the
future. suggested-x-off and suggested-y-off provide a hint as
to where the window contents should be placed within that
allocation for future swaps.

idle provides a list of pixmaps which were passed in previous
SwapRegion requests by this client targeting the same destination.
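
The draft does not spell out how target, divisor, and remainder combine. The Python sketch below assumes they follow the scheduling rule of glXSwapBuffersMscOML in GLX_OML_sync_control, which this request is meant to support; that assumption, and the function name, are not taken from the draft itself.

```python
def next_swap_msc(current_msc, target_msc, divisor, remainder):
    """Earliest MSC value at which the swap may occur (assumed OML rule)."""
    if current_msc < target_msc:
        # Target still in the future: swap when the counter reaches it.
        return target_msc
    if divisor == 0:
        # Target already reached and no divisor: swap as soon as possible.
        return current_msc
    # Otherwise wait for the first MSC >= current_msc whose value
    # modulo divisor equals remainder.
    return current_msc + (remainder - current_msc) % divisor
```

For example, with the counter at 100, a target of 50, a divisor of 8, and a remainder of 3, the first qualifying counter value is 107.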

┌───
SwapSelectInput
drawable: DRAWABLE
mask: SETofSWAPSELECTMASK
└───
Errors: Drawable

Selects which Swap events are to be delivered to the