Re: [RFC] DRI2 synchronization and swap bits
On Sun, 8 Nov 2009 08:16:51 +0100 Mario Kleiner mario.klei...@tuebingen.mpg.de wrote:

> My proposal to use a spinlock was probably rather stupid. Because of the call chain glXGetSyncValuesOML() -> I830DRI2GetMSC() -> drmWaitVBlank() -> drm_wait_vblank() -> drm_vblank_count(), if multiple clients call glXGetSyncValuesOML() frequently, e.g., in a polling loop, I assume this could cause quite a bit of contention on a spinlock that must be acquired with minimal delay from the vblank irq handler. According to http://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO, critical sections protected by spinlock_t are preemptible if one uses a realtime kernel with the preempt-rt patches applied, something I'd expect my users to do frequently. Maybe I'm overlooking something, but this sounds unhealthy if it happens while drm_vblank_count() holds the lock?

It's not ideal, but I don't think it would cause many problems in practice. The lock hold times will be very small, so contention shouldn't be a big problem, but as you say, if we find issues we can use a lockless method.

> Btw., when looking at the code in drm_irq.c in the current linux-next tree, I saw that drm_handle_vblank_events() does an e->event.sequence = seq; assignment with the current seq vblank number when retiring an event, but the special shortcut path in drm_queue_vblank_event(), which retires events immediately without queuing them if the requested vblank number has been reached or exceeded already, does not do an update e->event.sequence = seq; with the most recent seq vblank number that triggered this early retirement. This looks inconsistent to me; could this be a bug?

It uses vblwait->request.sequence, which should be the right number. The timestamp is definitely off though...

> The simple seqlock implementation might be too simple though, and a ringbuffer that holds multiple hundred recent vblank timestamp samples might be better.

I wonder if it would be sufficient to just track the last timestamp.
Any callers interested in the last event would get a precise timestamp. Callers looking at past events could be returned a calculated value based on the time difference between the last two events?

> The problem is the accuracy of glXGetMscRateOML(). This value - basically the duration of a video refresh interval - gets calculated from the current video mode timing, i.e., dotclock, HTotal and VTotal. This value is only useful for userspace applications like my toolkit under the assumption that both the dotclock of the GPU and the current system clock (TSC / HPET / APIC timer / ...) are perfectly accurate and drift-free. In reality, both clocks are imperfect and drift against each other, so the returned nominal value of glXGetMscRateOML() is almost always slightly wrong/inaccurate wrt. system time as used by userspace applications. Our app therefore determines the real refresh duration with a calibration loop of multiple seconds duration at startup. This works OK, but it increases startup time, can't account for slow clock drift over the course of a session (because I can't recalibrate during a session), and the calibration is also imperfect due to the timing noise (preemption, scheduling jitter, wakeup latency after swapbuffers, etc.) that affects a measurement loop in userspace. A better approach would be for Linux to measure the current video refresh interval over a certain time window, e.g., computing a moving average over a few seconds. This could be done if the vblank timestamps were logged into a ringbuffer. The ringbuffer would allow lock-free readout of the most recent vblank timestamp from drm_vblank_count(). At the same time, the system could look at all samples in the ringbuffer to compute the real duration of a video refresh interval as an average over the deltas between samples, and provide an accurate, current estimate for glXGetMscRateOML() that would be better than anything we can do in userspace.
That would be even better than just using the last difference, and shouldn't add too much more code. On some configurations the refresh rate will change too (for power saving reasons it may be reduced and then increased again when activity occurs), so that would have to be accounted for. Presumably we wouldn't care about the reduced rate, since it implies clients aren't actively using the display.

> The second problem is how to reinitialize the current vblank timestamp in drm_update_vblank_count() when vblank interrupts get reenabled after they've been disabled for a long period of time. One generic way to reinitialize would be to calculate the elapsed time since the last known vblank timestamp from the computed vblank count diff, by multiplying the count with the known duration of the video refresh interval. In that case an accurate estimate of glXGetMscRateOML() would be important, so a ringbuffer with samples would probably help.
Re: [RFC] DRI2 synchronization and swap bits
On Nov 2, 2009, at 5:35 PM, Jesse Barnes wrote:

> Thanks a lot for taking time to go through this stuff, it's exactly the kind of feedback I was hoping for.

Hello again! I'm relieved that I didn't screw up and annoy you already with my first post, so I'll continue to test my boundaries ;-)

> Doing the wakeups within a millisecond should definitely be possible, I don't expect the context switch between display server and client would be *that* high of a cost (but as I said I'll benchmark).

I don't expect that either. My comment was just to reinforce that very low latency matters for at least our class of applications. I'm currently benchmarking our toolkit wrt. timing precision and latency on a few different machine/gpu/os combinations. In case such numbers from other implementations (Linux with the proprietary drivers, OS/X, Windows) are interesting to you, let me know.

> > I don't like this idea of entirely fake numbers and would like to vote for a solution that is as close as possible to the non-redirected case.
>
> The raw numbers will always be exposed to the compositor and probably to applications via an opt-out mechanism (to be defined still, we don't even have the extra compositor protocol defined).

Happy to hear that. Unreliable UST timestamps would make the whole OML_sync_control extension almost useless for us, and probably for other applications that require good sync, e.g., between video and audio streams, so I'd ask you politely for improvements here.

> Definitely; these are just bugs, I certainly didn't design it to behave this way! :)

Assumed that :). Currently 1.5% of our users are on Linux and I'd love to persuade a few more to adopt Linux in the next year. I just realized that helping to improve the Linux graphics stack in areas that matter to us makes more sense than what I've been doing for all operating systems for years - trying to cope with limitations and driver bugs by use of weird hacks in our userspace application, which is the best I can do on OS/X and Windows.
> > I guess one (simple from the viewpoint of a non-kernel hacker?) way would be to always timestamp the vblank in the drm_handle_vblank() routine, immediately after incrementing the vblank_count, probably protecting both the timestamp acquisition and the vblank increment with one spinlock, so both get updated atomically? Then one could maybe extend drm_vblank_count() to read out and return the vblank count and the corresponding timestamp simultaneously under protection of the lock? Or any other way to provide the timestamp together with the vblank count in an atomic fashion to the calling code in drm_queue_vblank_event() and drm_handle_vblank_events()?
>
> Yep, that would work and should be a fairly easy change.

I spent a bit more time thinking about this; I also read about the available synchronization primitives and started to code the following possible implementation. Again, apologies if I'm stating the totally obvious, or stuff that's been done or planned already.

My proposal to use a spinlock was probably rather stupid. Because of the call chain glXGetSyncValuesOML() -> I830DRI2GetMSC() -> drmWaitVBlank() -> drm_wait_vblank() -> drm_vblank_count(), if multiple clients call glXGetSyncValuesOML() frequently, e.g., in a polling loop, I assume this could cause quite a bit of contention on a spinlock that must be acquired with minimal delay from the vblank irq handler. According to http://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO, critical sections protected by spinlock_t are preemptible if one uses a realtime kernel with the preempt-rt patches applied, something I'd expect my users to do frequently. Maybe I'm overlooking something, but this sounds unhealthy if it happens while drm_vblank_count() holds the lock? A lockless method would be the better solution. Seqlocks seem to be a good fit?
There's only one writer per crtc, drm_handle_vblank() (and occasionally drm_update_vblank_count(), if vblank irqs get reenabled), which only writes infrequently (60 - 200 times per second on typical displays). There can be many readers which can read very frequently, and the data structure to be read is relatively simple and free of pointers, so this fits the model of seqlocks. Maybe one could even do with the versions that don't disable irqs, i.e., write_seqlock() instead of write_seqlock_irqsave()? Documentation says that one should use the irq-safe versions if the seqlock might be accessed from an interrupt handler. I looked at the implementation of seqlocks and, as far as I can see, deadlock can only happen if a writer gets preempted by an irq handler that then tries to either write or read the seqlock itself? But a _vblank_seqlock would only get write-accessed from the interrupt handler for a given crtc. The only other place of write access, drm_update_vblank_count(), gets called infrequently and under spin_lock_irqsave(dev->vbl_lock, irqflags);
Re: [RFC] DRI2 synchronization and swap bits
On Sun, 1 Nov 2009 21:46:45 +0100 Mario Kleiner mario.klei...@tuebingen.mpg.de wrote:

> I read this RFC and I'm very excited about the prospect of having well working support for the OML_sync_control extension in DRI2 on Linux/X11. I've been hoping for this to happen for years, so a big thank you in advance! This is why I hope to provide some input from the perspective of future power-users of functions like glXGetSyncValuesOML(), glXSwapBuffersMscOML() and glXWaitForSbcOML(). I'm the co-developer of a popular free-software toolkit (Psychtoolbox) that is used mostly in the neuroscience / cognitive science community by scientists to find out how the different senses (visual, auditory, haptic, ...) work and how they work together. Our requirements for graphics are often much more demanding than those of a videogame, a typical VR environment or a mediaplayer.

Thanks a lot for taking time to go through this stuff, it's exactly the kind of feedback I was hoping for.

> Our users often have very strict requirements for scheduling frame-accurate and tear-free visual stimulus display, synchronizing bufferswaps across display heads, and low-latency returns from swap completion. Often they need swap-completion timestamps which are available with the shortest possible delay after a successful swap and accurately tied to the vblank at which scanout of a swapped frame started. The need for timestamps with sub-millisecond accuracy is not uncommon. Therefore, well working OML_sync_control support would be basically a dream come true and a very compelling feature for Linux as a platform for cognitive science.

Doing the wakeups within a millisecond should definitely be possible, I don't expect the context switch between display server and client would be *that* high of a cost (but as I said I'll benchmark).

> 2. On the CompositePage in the DRM Wiki, there is this comment: "...It seems that composited apps should never need to know about real world screen vblank issues, ..."
> "When dealing with a redirected window it seems it would be acceptable to come up with an entirely fake number for all existing extensions that care about vblanks." I don't like this idea of entirely fake numbers and would like to vote for a solution that is as close as possible to the non-redirected case. Most of our applications run in non-redirected, full-screen, undecorated, page-flipped windows, i.e., without a compositor being involved. I can think of a couple of future use cases, though, where reasonably well working redirected/composited windows would be very useful for us, but only if we get meaningful timestamps and vblank counts that are tied to the actual display onset.

The raw numbers will always be exposed to the compositor and probably to applications via an opt-out mechanism (to be defined still, we don't even have the extra compositor protocol defined).

> 3. The Wiki also mentions: "The direct rendered cases outlined in the implementation notes above are complete, but there's a bug in the async glXSwapBuffers that sometimes causes clients to hang after swapping rather than continue." Looking through the code of http://cgit.freedesktop.org/~jbarnes/xf86-video-intel/tree/src/i830_dri.c?id=a0e2e624c47516273fa3d260b86d8c293e2519e4 I can see that in I830DRI2SetupSwap() and I830DRI2SetupWaitMSC(), in the if (divisor == 0) { ... } path, the functions return after DRM_VBLANK_EVENT submission without assigning *event_frame = vbl.reply.sequence; This looks problematic to me, as the xserver later submits event_frame in the call to DRI2AddFrameEvent() inside DRI2SwapBuffers() as a cookie to find the right events for clients to wait on. Could this be a reason for clients hanging after swap? I found a few other spots where I either misunderstood something or there are small bugs. What is the appropriate way to report these?

This list is fine, thanks for checking it out. I'll fix that up.

> 4.
> According to the spec, the different OML_sync_control functions return a UST timestamp which is supposed to reflect the exact time when the MSC last incremented, i.e., at the start of scanout of a new video frame. SBC and MSC are supposed to increment atomically/simultaneously at swap completion, so the UST in the (UST, SBC, MSC) triplet is supposed to mark the time of transition of either MSC, or MSC and SBC, at swap completion. This makes a lot of sense to me; it is exactly the type of timestamp that our toolkit critically depends on. Ideally the UST timestamp should be corrected to reflect start of scanout, but a UST that is consistently taken at vblank interrupt time would do as well. In the current implementation this is *not* the semantic we'd get for UST timestamps. The I830DRI2GetMSC() call uses drmWaitVBlank() and its returned vbl.reply.tval_sec and vbl.reply.tval_usec values for computing UST.
Re: [RFC] DRI2 synchronization and swap bits
On Fri, Oct 30, 2009 at 1:42 PM, Eric Anholt e...@anholt.net wrote:

> On Fri, 2009-10-30 at 10:59 -0700, Jesse Barnes wrote:
> > I've put up some trees (after learning my lesson about working in the main tree) with the latest DRI2 sync/swap bits: git://git.freedesktop.org/home/jbarnes/xserver master branch, git://git.freedesktop.org/home/jbarnes/mesa master branch. They include support for some new DRI2 requests (proto for which is in the dri2-swapbuffers branch of dri2proto), including DRI2SwapBuffers, DRI2GetMSC, DRI2WaitMSC and DRI2WaitSBC. These allow us to support GLX extensions like SGI_video_sync, OML_swap_control and SGI_swap_interval. There have been a few comments about the protocol so far: 1) DRI2SwapBuffers a) Concern about doing another round trip to fetch new buffers following the swap. I think this is a valid concern; we could potentially respond from the swap with the new buffers, but this would make some memory saving optimizations more difficult (e.g. freeing buffers if no drawing comes in for a short time after the swap).
>
> You're doing one round-trip anyway, and if users are concerned about the second one, go use XCB already. (We need to go fix Mesa to do that.)

DRI2SwapBuffers is a one-way request, but it's required to follow up with a DRI2GetBuffers. So it's only one round trip whether we use XCB or not.

cheers, Kristian

___ xorg-devel mailing list xorg-devel@lists.x.org http://lists.x.org/mailman/listinfo/xorg-devel
Re: [RFC] DRI2 synchronization and swap bits
Hello everybody,

My name is Mario Kleiner and I'm new to this list, so I apologize beforehand should I violate some rules of netiquette, state the totally obvious, or if this post is somehow considered off-topic or way too long. Please tell me if so, and how to do better next time. First some background on why I am posting, then some proposals more to the point of this RFC.

I read this RFC and I'm very excited about the prospect of having well working support for the OML_sync_control extension in DRI2 on Linux/X11. I've been hoping for this to happen for years, so a big thank you in advance! This is why I hope to provide some input from the perspective of future power-users of functions like glXGetSyncValuesOML(), glXSwapBuffersMscOML() and glXWaitForSbcOML(). I'm the co-developer of a popular free-software toolkit (Psychtoolbox) that is used mostly in the neuroscience / cognitive science community by scientists to find out how the different senses (visual, auditory, haptic, ...) work and how they work together. Our requirements for graphics are often much more demanding than those of a videogame, a typical VR environment or a mediaplayer.

Our users often have very strict requirements for scheduling frame-accurate and tear-free visual stimulus display, synchronizing bufferswaps across display heads, and low-latency returns from swap completion. Often they need swap-completion timestamps which are available with the shortest possible delay after a successful swap and accurately tied to the vblank at which scanout of a swapped frame started. The need for timestamps with sub-millisecond accuracy is not uncommon. Therefore, well working OML_sync_control support would be basically a dream come true and a very compelling feature for Linux as a platform for cognitive science.
I spent the last 12 hours reading the CompositeSwap page at the DRI Wiki, Jesse Barnes' git tree, and the drivers/gpu/drm/drm_irq.c file in the linux-next git tree at kernel.org, which I assume (correctly?) is the current state of the art wrt. the DRM, and have some thoughts or wishes.

1. Wrt. "2) DRI2WaitMSC/SBC a) Concern about blocking the client on the server side as opposed to a client side wait":

I'm not sure about the extra latency involved in blocking the client on the server side instead of a client side wait, but I can assure you that for our applications, 1 millisecond of extra delay between swap completion and unblocking can make a significant difference. Quite often certain actions need to be triggered in sync with swap completion. Examples are starting recording equipment for brain activity (fMRI, EEG, MEG, eye-trackers) or other physiological responses, starting sound playback or recording, sending trigger packets over a network, driving special digital/analog I/O boards, driving motion simulators, etc. So low-latency unblocking would be much appreciated from our side.

2. On the CompositePage in the DRM Wiki, there is this comment: "...It seems that composited apps should never need to know about real world screen vblank issues, ... When dealing with a redirected window it seems it would be acceptable to come up with an entirely fake number for all existing extensions that care about vblanks."

I don't like this idea of entirely fake numbers and would like to vote for a solution that is as close as possible to the non-redirected case. Most of our applications run in non-redirected, full-screen, undecorated, page-flipped windows, i.e., without a compositor being involved. I can think of a couple of future use cases, though, where reasonably well working redirected/composited windows would be very useful for us, but only if we get meaningful timestamps and vblank counts that are tied to the actual display onset.

3.
The Wiki also mentions: "The direct rendered cases outlined in the implementation notes above are complete, but there's a bug in the async glXSwapBuffers that sometimes causes clients to hang after swapping rather than continue." Looking through the code of http://cgit.freedesktop.org/~jbarnes/xf86-video-intel/tree/src/i830_dri.c?id=a0e2e624c47516273fa3d260b86d8c293e2519e4 I can see that in I830DRI2SetupSwap() and I830DRI2SetupWaitMSC(), in the if (divisor == 0) { ... } path, the functions return after DRM_VBLANK_EVENT submission without assigning *event_frame = vbl.reply.sequence; This looks problematic to me, as the xserver later submits event_frame in the call to DRI2AddFrameEvent() inside DRI2SwapBuffers() as a cookie to find the right events for clients to wait on. Could this be a reason for clients hanging after swap? I found a few other spots where I either misunderstood something or there are small bugs. What is the appropriate way to report these?

4. According to the spec, the different OML_sync_control functions return a UST timestamp which is supposed to reflect the exact time when the MSC last incremented, i.e., at the start of scanout of a new video frame.
Re: [RFC] DRI2 synchronization and swap bits
On Fri, 30 Oct 2009 19:15:17 -0700 Keith Packard kei...@keithp.com wrote:

> Excerpts from Jesse Barnes's message of Fri Oct 30 10:59:08 -0700 2009:
> > These allow us to support GLX extensions like SGI_video_sync, OML_swap_control and SGI_swap_interval.
>
> Let's get the protocol nailed down before we go into detailed code review. Besides, you need to rebase -i to get rid of the broken versions.

Yeah, some merging/splitting of the commits is in order before merging it upstream.

> > There have been a few comments about the protocol so far: 1) DRI2SwapBuffers a) Concern about doing another round trip to fetch new buffers following the swap.
>
> Do we want to deal with stereo here?

I think the protocol is sufficient for that; it just requests the swap, so for stereo buffers both would be swapped.

> > I think this is a valid concern, we could potentially respond from the swap with the new buffers, but this would make some memory saving optimizations more difficult (e.g. freeing buffers if no drawing comes in for a short time after the swap).
>
> Hrm. Ideally, we'd send back new buffer IDs but delay creation until someone accessed them. That would require kernel magic to create an un-realized buffer, but perhaps avoiding an explicit round trip per swap would be worth it?

I don't see how we can avoid the round trip entirely without sharing some state between the server and client (i.e. re-introducing the SAREA). I'll do some benchmarking of the proposed buffer freeing and see how bad it is.

> > 2) DRI2WaitMSC/SBC a) Concern about blocking the client on the server side as opposed to a client side wait.
>
> So, some kind of cookie that you'd pass to the kernel for the wait instead of just blocking in the server? I can see a lot of uses for this kind of mechanism beyond X, which makes it somewhat more interesting to contemplate in this case.

Oh, we never block the server. The protocol here just tells the server when the client should be awakened again by passing a cookie.
The open question is whether the server should be putting the client to sleep and waking it back up, or whether in the direct rendered case the client gets a cookie from the server and sleeps itself (in the AIGLX case the server has to handle things regardless).

> > The implementation tries to avoid blocking the clients at all for swap requests, only blocking them on wait requests that are specified to cause blocking. This should allay the concerns raised in the page flipping thread about unnecessary blocking of clients (that's left as an implementation detail for the drivers supporting these new functions).
>
> Do we have a driver which does this the 'right' way yet?

The i915 page flipping code does this correctly, by marking the buffers in question busy and blocking the client in the kernel if it tries to access a busy buffer. For the windowed swap case we may have to block the client less nicely if we end up blitting between back and front. (GL fences could make this better.)

Jesse
[RFC] DRI2 synchronization and swap bits
I've put up some trees (after learning my lesson about working in the main tree) with the latest DRI2 sync/swap bits:

git://git.freedesktop.org/home/jbarnes/xserver master branch
git://git.freedesktop.org/home/jbarnes/mesa master branch

They include support for some new DRI2 requests (proto for which is in the dri2-swapbuffers branch of dri2proto), including DRI2SwapBuffers, DRI2GetMSC, DRI2WaitMSC and DRI2WaitSBC. These allow us to support GLX extensions like SGI_video_sync, OML_swap_control and SGI_swap_interval.

There have been a few comments about the protocol so far:

1) DRI2SwapBuffers
   a) Concern about doing another round trip to fetch new buffers following the swap. I think this is a valid concern; we could potentially respond from the swap with the new buffers, but this would make some memory saving optimizations more difficult (e.g. freeing buffers if no drawing comes in for a short time after the swap).

2) DRI2WaitMSC/SBC
   a) Concern about blocking the client on the server side as opposed to a client side wait. I'm a bit less worried about this, since these types of waits typically aren't very latency sensitive. Doing the wait on the client side is definitely possible if really desired, though it would split the request for the DRM event and its consumption across the server/client boundary (we'd have to return the sequence number to look for to the client, then have it poll the DRM fd).

The implementation tries to avoid blocking the clients at all for swap requests, only blocking them on wait requests that are specified to cause blocking. This should allay the concerns raised in the page flipping thread about unnecessary blocking of clients (that's left as an implementation detail for the drivers supporting these new functions).

As mentioned above, these changes require new dri2proto, but they also need libdrm master and airlied's drm-next branch, which has the vblank event bits that the new server code depends on.
Thanks, -- Jesse Barnes, Intel Open Source Technology Center
Re: [RFC] DRI2 synchronization and swap bits
On Fri, 2009-10-30 at 10:59 -0700, Jesse Barnes wrote:

> I've put up some trees (after learning my lesson about working in the main tree) with the latest DRI2 sync/swap bits: git://git.freedesktop.org/home/jbarnes/xserver master branch, git://git.freedesktop.org/home/jbarnes/mesa master branch. They include support for some new DRI2 requests (proto for which is in the dri2-swapbuffers branch of dri2proto), including DRI2SwapBuffers, DRI2GetMSC, DRI2WaitMSC and DRI2WaitSBC. These allow us to support GLX extensions like SGI_video_sync, OML_swap_control and SGI_swap_interval. There have been a few comments about the protocol so far: 1) DRI2SwapBuffers a) Concern about doing another round trip to fetch new buffers following the swap. I think this is a valid concern, we could potentially respond from the swap with the new buffers, but this would make some memory saving optimizations more difficult (e.g. freeing buffers if no drawing comes in for a short time after the swap).

You're doing one round-trip anyway, and if users are concerned about the second one, go use XCB already. (We need to go fix Mesa to do that.)

-- Eric Anholt e...@anholt.net eric.anh...@intel.com
Re: [RFC] DRI2 synchronization and swap bits
On Fri, 30 Oct 2009 11:42:06 -0700 Eric Anholt e...@anholt.net wrote:

> On Fri, 2009-10-30 at 10:59 -0700, Jesse Barnes wrote:
> > I've put up some trees (after learning my lesson about working in the main tree) with the latest DRI2 sync/swap bits: git://git.freedesktop.org/home/jbarnes/xserver master branch, git://git.freedesktop.org/home/jbarnes/mesa master branch. They include support for some new DRI2 requests (proto for which is in the dri2-swapbuffers branch of dri2proto), including DRI2SwapBuffers, DRI2GetMSC, DRI2WaitMSC and DRI2WaitSBC. These allow us to support GLX extensions like SGI_video_sync, OML_swap_control and SGI_swap_interval. There have been a few comments about the protocol so far: 1) DRI2SwapBuffers a) Concern about doing another round trip to fetch new buffers following the swap. I think this is a valid concern, we could potentially respond from the swap with the new buffers, but this would make some memory saving optimizations more difficult (e.g. freeing buffers if no drawing comes in for a short time after the swap).
>
> You're doing one round-trip anyway, and if users are concerned about the second one, go use XCB already. (We need to go fix Mesa to do that.)

Yeah, I don't think it's a huge deal, but every context switch we add is that much more overhead (especially for low end platforms).

-- Jesse Barnes, Intel Open Source Technology Center
Re: [RFC] DRI2 synchronization and swap bits
Excerpts from Jesse Barnes's message of Fri Oct 30 10:59:08 -0700 2009:

> These allow us to support GLX extensions like SGI_video_sync, OML_swap_control and SGI_swap_interval.

Let's get the protocol nailed down before we go into detailed code review. Besides, you need to rebase -i to get rid of the broken versions.

> There have been a few comments about the protocol so far: 1) DRI2SwapBuffers a) Concern about doing another round trip to fetch new buffers following the swap.

Do we want to deal with stereo here?

> I think this is a valid concern, we could potentially respond from the swap with the new buffers, but this would make some memory saving optimizations more difficult (e.g. freeing buffers if no drawing comes in for a short time after the swap).

Hrm. Ideally, we'd send back new buffer IDs but delay creation until someone accessed them. That would require kernel magic to create an un-realized buffer, but perhaps avoiding an explicit round trip per swap would be worth it? We can even make the Xlib API asynchronous in this case; it just requires a bit of hackery to post an async reply handler and then a function to collect the async reply data.

> 2) DRI2WaitMSC/SBC a) Concern about blocking the client on the server side as opposed to a client side wait.

So, some kind of cookie that you'd pass to the kernel for the wait instead of just blocking in the server? I can see a lot of uses for this kind of mechanism beyond X, which makes it somewhat more interesting to contemplate in this case.

> The implementation tries to avoid blocking the clients at all for swap requests, only blocking them on wait requests that are specified to cause blocking. This should allay the concerns raised in the page flipping thread about unnecessary blocking of clients (that's left as an implementation detail for the drivers supporting these new functions).

Do we have a driver which does this the 'right' way yet?
-- keith.pack...@intel.com