Re: [RFC v2] Wayland presentation extension (video protocol)
On Mon, 24 Feb 2014 23:25:18 +0100 Mario Kleiner mario.kleiner...@gmail.com wrote:

On 21/02/14 09:36, Pekka Paalanen wrote:

...

Atm. i have to verify on specific X-Server / ddx / Linux kernel versions, at least for the most common setups i care about, because there's always potential for bugs, so doing that for a few compositors would be the same thing. The toolkit itself has a lot of paranoid startup and runtime checks, so it can detect various bugs and alert the user, which then hopefully alerts me. Users with high timing precision demands also perform independent verification with external hardware equipment, e.g., photo-diodes to attach to monitors etc., as another layer of testing. That equipment is often just not practical for production use, so they may test after each hardware/os/driver/toolkit upgrade, but then trust the system between configuration changes.

Ok, sounds like you got it covered. :-)

...

You could have your desktop session on a different VT than the experiment program, and switch between them. Or have multiple outputs, some dedicated to the experiment, others to the desktop. The same for input devices. Or, if your infrastructure allows, have X11, Wayland, and direct DRM/KMS backends choosable at runtime. But yes, it starts getting complicated.

Always depends on how easy that is for the user. E.g., if there's one dual-head gpu installed in the machine, i don't know if it would be easily possible to have one display output controlled by a compositor, but the other output controlled directly by a gbm/drm/kms client. In the past only one client could be a drm master on a drm device file.

In the glorious future it should be possible, but I believe there is still lots of work to do before it's a reality. I think there is (was?) work going on in splitting a card into several KMS nodes by heads in the kernel. The primary use case of that is multi-seat: one machine with several physical seats for users.

Anyway, as soon as my average user has to start touching configuration files, it gets complicated. Especially if different distros use different ways of doing it.

Yeah, configuration is an issue.

[chop]

Thank you for explaining your use case and user base at length, it really makes me understand where you come from. I think. :-)

On Thu, 20 Feb 2014 04:56:02 +0100 Mario Kleiner mario.kleiner...@gmail.com wrote:

On 17/02/14 14:12, Pekka Paalanen wrote:

On Mon, 17 Feb 2014 01:25:07 +0100 Mario Kleiner mario.kleiner...@gmail.com wrote:

...

Yes, makes sense. The drm/kms timestamps are defined as "first pixel of frame leaves the graphics card's output connector", aka start of active scanout. In the time of CRT monitors, that was approximately "turns to light". In my field, CRT monitors are still very popular and actively hunted for because of that very well defined timing behaviour. Or very expensive displays which have custom made panel or dlp controllers with well defined timing, specifically for research use.

Ah, I didn't know that the DRM vblank timestamps were defined that way, very cool. The definition is that of OML_sync_control, so that spec could be implemented in as conformant a way as possible.

In practice only the kms-drivers with the high precision timestamping (i915, radeon, nouveau) will do precisely that. Other drivers just take a timestamp at vblank irq time, so it's somewhere after vblank onset and could be off in case of delayed irq execution, preemption etc.

I'm happy that you do not see the "turns to light" definition as problematic.

It was one of the open questions, whether to use "turns to light" or the OML_sync_control "first pixel going out of the gfx card connector". "Turns to light" is what we ideally want, but the OML_sync_control definition is the best approximation of that if the latency of the display itself is unknown - and spot on in a world of CRT monitors. Cool.

It still leaves us the problem that if a monitor lies about its latency, and a compositor uses that information, it'll be off, and no-one would know unless they actually measured it with some special equipment. Good thing that your professional users know to measure it.

If the compositor knew the precise monitor latency, it could add that as a constant offset to those timestamps. Do you know of reliable ways to get this info from any common commercial display equipment? Apple's OSX has API in their CoreVideo framework for getting that number, and i implement it in the OSX backend of my toolkit, but i haven't ever seen that function returning anything other than "undefined" from any display.

I believe the situation is like with any such information blob (EDID, BIOS tables, ...): hardware manufacturers just scribble something that usually works for the usual cases of Microsoft Windows, and otherwise it's garbage or not set. So, I think in theory there was some spec that allows to
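A minimal sketch of the constant-offset correction discussed above, assuming the panel's input-to-light latency has been measured externally (e.g. with a photo-diode rig); the function and the per-output latency value are purely illustrative, not any existing API:

#include <stdint.h>

/* flip_ns: start of active scanout, as DRM/KMS reports it.
 * panel_latency_ns: externally measured, hypothetical value; a compositor
 * that trusted it could fold it into the presented timestamp so that
 * "presented" approximates turns-to-light. */
static uint64_t turns_to_light_ns(uint64_t flip_ns, uint64_t panel_latency_ns)
{
        return flip_ns + panel_latency_ns;
}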
Re: [RFC v2] Wayland presentation extension (video protocol)
On 21/02/14 09:36, Pekka Paalanen wrote:

On Fri, 21 Feb 2014 06:40:02 +0100 Mario Kleiner mario.kleiner...@gmail.com wrote:

On 20/02/14 12:07, Pekka Paalanen wrote:

Hi Mario,

Ok, now i am magically subscribed. Thanks to the moderator!

Cool, I can start trimming out parts of the email. :-)

I have replies to your comments below, but while reading what you said, I started wondering whether Wayland would be good for you after all. It seems that your timing sensitive experiment programs and you and your developers put a great deal of effort into
- detecting the hardware and drivers,
- determining how the display server works, so that you can
- try to make it do exactly what you want, and
- detect if it still does not do exactly like you want it and bail, while also
- trying to make sure you get the right timing feedback from the kernel unmangled.

Yes. It's trust but verify. If i know that the api / protocol is well defined and suitable for my purpose, and have verified that at least the reference compositor implements the protocol correctly, then i can at least hope that all other compositors are also implemented correctly, so stuff should work as expected. And i can verify that at least some subset of compositors really works, and try to submit bug reports or patches if they don't.

I don't think we can make the Wayland protocol definition strict enough that you could just rely on other compositors implementing it the same way as Weston. We don't really want to restrict the implementations too much with generic protocol interfaces. Therefore I think you will need to test and validate not only every compositor, but possibly also their different releases.

Depends what you mean with strict enough. Well defined is good enough. E.g., the level in your presentation extension RFC is good enough, because it defines how the compositor should treat the target presentation timestamps precisely enough, so i as a client implementer know how to specify the timestamps to get well defined behavior on a compositor that implements the protocol correctly, on a system that is configured properly and not overloaded. If a compositor doesn't conform, i can always try to bug the developers or submit patches myself to fix it. If otoh the protocol were too vague, any behavior of the compositor would be consistent with the spec, and i'd not even have a point submitting a bug report, or i wouldn't know in the first place how to code my client.

Or if your protocol specifies timestamps with nanosecond precision and recommends that they should have an accuracy of <= 1 msec and define the moment when pixels turn to light, that's fine. If it didn't define what the timestamps actually mean, or didn't recommend any good minimum precision, that would be troublesome.

Atm. i have to verify on specific X-Server / ddx / Linux kernel versions, at least for the most common setups i care about, because there's always potential for bugs, so doing that for a few compositors would be the same thing. The toolkit itself has a lot of paranoid startup and runtime checks, so it can detect various bugs and alert the user, which then hopefully alerts me. Users with high timing precision demands also perform independent verification with external hardware equipment, e.g., photo-diodes to attach to monitors etc., as another layer of testing. That equipment is often just not practical for production use, so they may test after each hardware/os/driver/toolkit upgrade, but then trust the system between configuration changes.

Adding something special and optional for compositors to implement, with very strict implementation requirements, would be possible, but with the caveat of not everyone implementing it.

Yes, that would be a problem. My hope is that as different compositors get implemented, and the mandatory protocol is well defined and reasonable in its requests for precision etc., people will implement it that way. I'd also hope that if the reference compositor has a well working and accurate implementation of such stuff, then other compositors would mostly stay close to the implementation of the reference compositor if feasible, if only to save the developers of those compositors some time and headaches and maintenance overhead.

Sounds like the display server is a huge source of problems to you, but I am not quite sure how running on top of a display server benefits you. Your experiment programs want to be in precise control, get accurate timings, and they are always fullscreen. Your users / test subjects never switch away from the program while it's running, you don't need windowing or multi-tasking, AFAIU, nor any of the application interoperability features that are the primary features of a display server.

They are fullscreen and timing sensitive in probably 95% of all typical application cases during actual production use while experiments are run. But some applications need the toolkit to
Re: [RFC v2] Wayland presentation extension (video protocol)
On Fri, 21 Feb 2014 06:40:02 +0100 Mario Kleiner mario.kleiner...@gmail.com wrote:

On 20/02/14 12:07, Pekka Paalanen wrote:

Hi Mario,

Ok, now i am magically subscribed. Thanks to the moderator!

Cool, I can start trimming out parts of the email. :-)

I have replies to your comments below, but while reading what you said, I started wondering whether Wayland would be good for you after all. It seems that your timing sensitive experiment programs and you and your developers put a great deal of effort into
- detecting the hardware and drivers,
- determining how the display server works, so that you can
- try to make it do exactly what you want, and
- detect if it still does not do exactly like you want it and bail, while also
- trying to make sure you get the right timing feedback from the kernel unmangled.

Yes. It's trust but verify. If i know that the api / protocol is well defined and suitable for my purpose, and have verified that at least the reference compositor implements the protocol correctly, then i can at least hope that all other compositors are also implemented correctly, so stuff should work as expected. And i can verify that at least some subset of compositors really works, and try to submit bug reports or patches if they don't.

I don't think we can make the Wayland protocol definition strict enough that you could just rely on other compositors implementing it the same way as Weston. We don't really want to restrict the implementations too much with generic protocol interfaces. Therefore I think you will need to test and validate not only every compositor, but possibly also their different releases.

Adding something special and optional for compositors to implement, with very strict implementation requirements, would be possible, but with the caveat of not everyone implementing it.

Sounds like the display server is a huge source of problems to you, but I am not quite sure how running on top of a display server benefits you. Your experiment programs want to be in precise control, get accurate timings, and they are always fullscreen. Your users / test subjects never switch away from the program while it's running, you don't need windowing or multi-tasking, AFAIU, nor any of the application interoperability features that are the primary features of a display server.

They are fullscreen and timing sensitive in probably 95% of all typical application cases during actual production use while experiments are run. But some applications need the toolkit to present in regular windows and GUI thingys, a few even need compositing to combine my windows with windows of other apps. Some setups run multi-display, where some displays are used for fullscreen stimulus presentation to the tested person, but another display may be used for control/feedback or during debugging by the experimenter, in which case the regular desktop GUI and UI of the scripting environment is needed on that display. One popular case during debugging is having a half-transparent fullscreen window for stimulus presentation, but behind that window the whole regular GUI with the code editor and debugger in the background, so one can set breakpoints etc. - the window is made transparent for mouse and keyboard input, so users can interact with the editor.

So in most cases i need a display server running, because i sometimes need compositing, and i often need a fully functional GUI during the at least 50% of the work time where users are debugging and testing their code and also don't want to be separated from their e-mail clients and web browsers etc. during that time.

You could have your desktop session on a different VT than the experiment program, and switch between them. Or have multiple outputs, some dedicated to the experiment, others to the desktop. The same for input devices. Or, if your infrastructure allows, have X11, Wayland, and direct DRM/KMS backends choosable at runtime. But yes, it starts getting complicated.

Why not take the display server completely out of the equation? I understand that some years ago it would probably not have been feasible and X11 was the de facto interface to do any graphics. However, it seems you are already married to DRM/KMS so that you get accurate timing feedback, so why not port your experiment programs (the framework) directly on top of DRM/KMS instead of Wayland?

Yes and no. DRM/KMS will be the most often used one, and is the best bet if i need timing control, and it's the one i'm most familiar with. I also want to keep the option of running on other backends if timing is not of much importance, or if it can be improved on them, should the need arise.

With Mesa EGL and GBM, you can still use hardware accelerated OpenGL if you want to, but you will also be in explicit control of when you push that rendered buffer into KMS for display. Software
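A rough sketch of that direct path with Mesa EGL/GBM and libdrm, for concreteness; real code needs mode-setting, error handling, and caching of the DRM framebuffer per buffer object, so treat this as the shape of the API rather than a working program:

#include <gbm.h>
#include <EGL/egl.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* After GL rendering into a GBM-backed EGL surface, hand the finished
 * buffer to KMS ourselves instead of going through a display server. */
static void present_frame(int drm_fd, uint32_t crtc_id,
                          EGLDisplay dpy, EGLSurface surf,
                          struct gbm_surface *gs)
{
        eglSwapBuffers(dpy, surf);            /* finish the GL frame */

        struct gbm_bo *bo = gbm_surface_lock_front_buffer(gs);
        uint32_t fb_id;
        drmModeAddFB(drm_fd, gbm_bo_get_width(bo), gbm_bo_get_height(bo),
                     24, 32, gbm_bo_get_stride(bo),
                     gbm_bo_get_handle(bo).u32, &fb_id);

        /* The flip completes at the next vblank; drmHandleEvent() then
         * delivers the page_flip_handler with the accurate sec/usec
         * timestamp, where gbm_surface_release_buffer(gs, bo) follows. */
        drmModePageFlip(drm_fd, crtc_id, fb_id,
                        DRM_MODE_PAGE_FLIP_EVENT, bo);
}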
Re: [RFC v2] Wayland presentation extension (video protocol)
Hi Mario,

I have replies to your comments below, but while reading what you said, I started wondering whether Wayland would be good for you after all. It seems that your timing sensitive experiment programs and you and your developers put a great deal of effort into
- detecting the hardware and drivers,
- determining how the display server works, so that you can
- try to make it do exactly what you want, and
- detect if it still does not do exactly like you want it and bail, while also
- trying to make sure you get the right timing feedback from the kernel unmangled.

Sounds like the display server is a huge source of problems to you, but I am not quite sure how running on top of a display server benefits you. Your experiment programs want to be in precise control, get accurate timings, and they are always fullscreen. Your users / test subjects never switch away from the program while it's running, you don't need windowing or multi-tasking, AFAIU, nor any of the application interoperability features that are the primary features of a display server.

Why not take the display server completely out of the equation? I understand that some years ago it would probably not have been feasible and X11 was the de facto interface to do any graphics. However, it seems you are already married to DRM/KMS so that you get accurate timing feedback, so why not port your experiment programs (the framework) directly on top of DRM/KMS instead of Wayland?

With Mesa EGL and GBM, you can still use hardware accelerated OpenGL if you want to, but you will also be in explicit control of when you push that rendered buffer into KMS for display. Software rendering by direct pixel poking is also possible, and at the end you just push that buffer to KMS as usual too. You do not need any graphics card specific code, it is all abstracted in the public APIs offered by Mesa and libdrm, e.g. GBM. The new libinput should make hooking into input devices much less painful, etc. All this thanks to Wayland, because on Wayland, there is no single "the server" like the X.org X server is. There will be lots of servers, and each one needs the same infrastructure you would need to run without a display server.

No display server obfuscating your view to the hardware, no compositing manager fiddling with your presentation, and most likely no random programs hogging the GPU at random times. Would the trade-off not be worth it? Of course your GUI tools and apps could continue using a display server and would probably like to be ported to be Wayland compliant, I'm just suggesting this for the sensitive experiment programs. Would this be possible for your infrastructure?

On Thu, 20 Feb 2014 04:56:02 +0100 Mario Kleiner mario.kleiner...@gmail.com wrote:

On 17/02/14 14:12, Pekka Paalanen wrote:

On Mon, 17 Feb 2014 01:25:07 +0100 Mario Kleiner mario.kleiner...@gmail.com wrote:

Hello Pekka,

i'm not yet subscribed to wayland-devel, and a bit short on time atm., so i'll take a shortcut via direct e-mail for some quick feedback for your Wayland presentation extension v2.

Hi Mario,

I'm very happy to hear from you! I have seen your work fly by on the dri-devel@ (IIRC) mailing list, and when I was writing the RFCv2 email, I was thinking whether I should personally CC you. Sorry I didn't. I will definitely include you on v3. I hope you don't mind me adding wayland-devel@ to CC, your feedback is much appreciated and backs up my design nicely.

;-) Hi again, still not subscribed, but maybe the server accepts the e-mail to wayland-devel anyway, as i'm subscribed to other xorg lists? I got an "invalid captcha" error when trying to subscribe, even though no captcha was ever presented to me? You may need to cc wayland-devel again for me. I guess the web interface is still down.

Emailing wayland-devel-subscr...@lists.freedesktop.org should do, if I recall the address correctly. And someone might process the moderation queue, too. No worries anyway, I won't cut anything from you out, so it's all copied below.

1. Wrt. an additional preroll_feedback request http://lists.freedesktop.org/archives/wayland-devel/2014-January/013014.html, essentially the equivalent of glXGetSyncValuesOML(), that would be very valuable to us.

...

Indeed, the preroll_feedback request was modeled to match glXGetSyncValuesOML. Do you need to be able to call GetSyncValues at any time and have it return ASAP? Do you call it continuously, and even between frames?

Yes, at any time, even between frames, with a return asap. This is driven by the specific needs of user code, e.g., to poll for a vblank, or to establish a baseline of current (msc, ust) for timing stuff relative to it. Psychtoolbox is an extension to a scripting language, so usercode often decides how this is used. Internally to the toolkit these calls are used on X11/GLX to translate target timestamps into target vblank counts for glXSwapBuffersMscOML(),
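For reference, the GLX call being discussed can be polled at any time; standard OML_sync_control usage, resolving the entry point through glXGetProcAddress since it is an extension:

#include <stdint.h>
#include <GL/glx.h>
#include <GL/glxext.h>

/* Returns immediately with the current (ust, msc, sbc) triple: a baseline
 * for computing target_msc values for glXSwapBuffersMscOML(). */
static Bool query_sync_values(Display *dpy, GLXDrawable drawable,
                              int64_t *ust, int64_t *msc, int64_t *sbc)
{
        static PFNGLXGETSYNCVALUESOMLPROC get_sync_values;

        if (!get_sync_values)
                get_sync_values = (PFNGLXGETSYNCVALUESOMLPROC)
                        glXGetProcAddress((const GLubyte *)"glXGetSyncValuesOML");

        return get_sync_values(dpy, drawable, ust, msc, sbc);
}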
Re: [RFC v2] Wayland presentation extension (video protocol)
On 20/02/14 12:07, Pekka Paalanen wrote:

Hi Mario,

Ok, now i am magically subscribed. Thanks to the moderator!

I have replies to your comments below, but while reading what you said, I started wondering whether Wayland would be good for you after all. It seems that your timing sensitive experiment programs and you and your developers put a great deal of effort into
- detecting the hardware and drivers,
- determining how the display server works, so that you can
- try to make it do exactly what you want, and
- detect if it still does not do exactly like you want it and bail, while also
- trying to make sure you get the right timing feedback from the kernel unmangled.

Yes. It's trust but verify. If i know that the api / protocol is well defined and suitable for my purpose, and have verified that at least the reference compositor implements the protocol correctly, then i can at least hope that all other compositors are also implemented correctly, so stuff should work as expected. And i can verify that at least some subset of compositors really works, and try to submit bug reports or patches if they don't.

Sounds like the display server is a huge source of problems to you, but I am not quite sure how running on top of a display server benefits you. Your experiment programs want to be in precise control, get accurate timings, and they are always fullscreen. Your users / test subjects never switch away from the program while it's running, you don't need windowing or multi-tasking, AFAIU, nor any of the application interoperability features that are the primary features of a display server.

They are fullscreen and timing sensitive in probably 95% of all typical application cases during actual production use while experiments are run. But some applications need the toolkit to present in regular windows and GUI thingys, a few even need compositing to combine my windows with windows of other apps. Some setups run multi-display, where some displays are used for fullscreen stimulus presentation to the tested person, but another display may be used for control/feedback or during debugging by the experimenter, in which case the regular desktop GUI and UI of the scripting environment is needed on that display. One popular case during debugging is having a half-transparent fullscreen window for stimulus presentation, but behind that window the whole regular GUI with the code editor and debugger in the background, so one can set breakpoints etc. - the window is made transparent for mouse and keyboard input, so users can interact with the editor.

So in most cases i need a display server running, because i sometimes need compositing, and i often need a fully functional GUI during the at least 50% of the work time where users are debugging and testing their code and also don't want to be separated from their e-mail clients and web browsers etc. during that time.

Why not take the display server completely out of the equation? I understand that some years ago it would probably not have been feasible and X11 was the de facto interface to do any graphics. However, it seems you are already married to DRM/KMS so that you get accurate timing feedback, so why not port your experiment programs (the framework) directly on top of DRM/KMS instead of Wayland?

Yes and no. DRM/KMS will be the most often used one, and is the best bet if i need timing control, and it's the one i'm most familiar with. I also want to keep the option of running on other backends if timing is not of much importance, or if it can be improved on them, should the need arise.

With Mesa EGL and GBM, you can still use hardware accelerated OpenGL if you want to, but you will also be in explicit control of when you push that rendered buffer into KMS for display. Software rendering by direct pixel poking is also possible, and at the end you just push that buffer to KMS as usual too. You do not need any graphics card specific code, it is all abstracted in the public APIs offered by Mesa and libdrm, e.g. GBM. The new libinput should make hooking into input devices much less painful, etc. All this thanks to Wayland, because on Wayland, there is no single "the server" like the X.org X server is. There will be lots of servers, and each one needs the same infrastructure you would need to run without a display server. No display server obfuscating your view to the hardware, no compositing manager fiddling with your presentation, and most likely no random programs hogging the GPU at random times. Would the trade-off not be worth it?

I thought about EGL/GBM etc. as a last resort for especially demanding cases, timing-wise. But given that the good old X-Server was good enough for almost everything so far, i'd expect Wayland to perform as good as or better timing-wise. If that turns out to be true, it would be good enough for hopefully almost all use cases, with all the benefits of compositing and GUI support when
Re: [RFC v2] Wayland presentation extension (video protocol)
On Mon, 17 Feb 2014 03:23:40 +0000 Zhang, Xiong Y xiong.y.zh...@intel.com wrote:

On Thu, 2014-01-30 at 17:35 +0200, Pekka Paalanen wrote:

Hi, it's time for a take two on the Wayland presentation extension.

1. Introduction

The v1 proposal is here: http://lists.freedesktop.org/archives/wayland-devel/2013-October/011496.html

In v2 the basic idea is the same: you can queue frames with a target presentation time, and you can get accurate presentation feedback. All the details are new, though. The re-design started from the wish to handle resizing better, preferably without clearing the buffer queue. All the changed details are probably too much to describe here, so it is maybe better to look at this as a new proposal. It still does build on Frederic's work, and everyone who commented on it. Special thanks to Axel Davy for his counter-proposal and fighting with me on IRC. :-)

Some highlights:
- Accurate presentation feedback is possible also without queueing.
- You can queue also EGL-based rendering, and get presentation feedback if you want. Also EGL can do this internally, too, as long as EGL and the app do not try to use queueing at the same time.
- More detailed presentation feedback to better allow predicting future display refreshes.
- If wl_viewport is used, neither video resolution changes nor surface (window) size changes alone require clearing the queue. Video can continue playing even during resizes.

Sorry, I can't understand this. Could you explain this more? What's the current problem for resizing a window? How will you resolve it in the presentation extension?

Hi,

the presentation extension adds a buffer queue to wl_surfaces. The compositor autonomously takes buffers from that queue, and applies them to the surface when it deems the time is right. The client is notified about this only after the fact.

Without wl_viewport, the problem with resizing is that for the client to do a guaranteed glitch-free resize, it has to:
1. Discard_queue, so that the compositor stops processing the queue.
2. Wait for the feedback to arrive, so that the client knows exactly which buffer (frame) is on screen.
3. Re-draw the frame in the new size.
4. Start queueing frames in the new size again.

As you see, that algorithm requires the client and the server to synchronize by flushing out the whole queue and re-queueing everything, because the surface size is determined from the buffer dimensions. Step 2 is also important to avoid going backwards in the video; that is, accidentally showing a frame from an earlier time than what the compositor already put on screen.

If the client wants to change either the surface size or the buffer size (well, the same thing really, without wl_viewport), it has to do this synchronization dance. It will have a high risk of causing a jerk in the video playback. Using a sub-surface with the proper sub-surface resizing algorithm, and re-queueing in reverse chronological order, would mitigate the problem, but still have a considerable risk of producing a jerk in playback.

The solution I propose, that comes from using wl_viewport, is that a client can resize the surface, or can resize the buffers, independently, without having to discard_queue. If the client needs to synchronize surface state updates to a particular buffer (frame), then it still needs to do the synchronization dance. However, video content is usually (right?) of fixed size and then gets scaled to the right size for display. Video players would use wl_viewport for that anyway.

The added benefit explicitly enabled by the split into surface vs. buffer state in the presentation protocol is that you can do surface (window) resizing without any synchronization wrt. the buffer queue. The compositor will always scale the buffer to the right size for the window. You do not need to send discard_queue or know what buffer is currently presented. And you don't need to re-draw the buffers either when the window size changes.

OTOH, if the window size is supposed to stay the same while the video resolution changes (for instance, QoS with a live video stream), the client still does not need to synchronize: just keep on queueing buffers like it always does, and the compositor takes care of the appropriate scaling. (When the compositor takes care of scaling with wl_viewport, it may also be able to use overlay hardware to do superior scaling quality with little cost.)

Therefore, I designed the presentation extension to have this, at first perhaps rather surprising, explicit split between surface and buffer state to specifically enable this co-operation with wl_viewport.

Note that this extension now re-specifies what happens when a normal wl_surface.commit is not preceded by a wl_surface.attach! I think I need to really emphasize that on a follow-up email.

Thanks, pq
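A sketch of the client side of that decoupling, assuming the wl_scaler/wl_viewport protocol of this era with wayland-scanner-generated bindings (the request names follow the wl_scaler v2 proposal; check them against your tree):

/* Keep queueing fixed-size video buffers; only the destination follows
 * the window size, so a resize never touches the buffer queue. */
static void setup_video_viewport(struct wl_scaler *scaler,
                                 struct wl_surface *surface,
                                 int32_t win_w, int32_t win_h)
{
        struct wl_viewport *vp = wl_scaler_get_viewport(scaler, surface);

        /* Source rectangle in buffer coordinates: the whole (here assumed
         * 1280x720) video frame. This is buffer state, so it travels with
         * each queued buffer. */
        wl_viewport_set_source(vp,
                               wl_fixed_from_int(0), wl_fixed_from_int(0),
                               wl_fixed_from_int(1280), wl_fixed_from_int(720));

        /* Destination size in surface coordinates: surface state, applied
         * on commit regardless of what sits in the queue. */
        wl_viewport_set_destination(vp, win_w, win_h);
        wl_surface_commit(surface);
}

The split the email describes is visible here: on a resize only set_destination and a commit are needed, while the queued frames and their set_source state remain valid.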
Re: [RFC v2] Wayland presentation extension (video protocol)
On Mon, 17 Feb 2014 01:25:07 +0100 Mario Kleiner mario.kleiner...@gmail.com wrote:

Hello Pekka,

i'm not yet subscribed to wayland-devel, and a bit short on time atm., so i'll take a shortcut via direct e-mail for some quick feedback for your Wayland presentation extension v2.

Hi Mario,

I'm very happy to hear from you! I have seen your work fly by on the dri-devel@ (IIRC) mailing list, and when I was writing the RFCv2 email, I was thinking whether I should personally CC you. Sorry I didn't. I will definitely include you on v3. I hope you don't mind me adding wayland-devel@ to CC, your feedback is much appreciated and backs up my design nicely. ;-)

I'm the main developer of a FOSS toolkit called Psychtoolbox-3 (www.psychtoolbox.org) which is used a lot by neuro-scientists for presentation of visual and auditory stimuli, mostly for basic and clinical brain research. As you can imagine, very good audio-video sync and precisely timed and time stamped visual stimulus presentation is very important to us. We currently use X11 + GLX + OML_sync_control extension to present OpenGL rendered frames precisely on the classic X-Server + DRI2 + DRM/KMS. We need frame accurate presentation and sub-millisecond accurate presentation timestamps. For this reason i was quite a bit involved in development and testing of DRI2/DRM/KMS functionality related to page-flipping and time stamping. One of the next steps for me will be to implement backends into Psychtoolbox for DRI3/Present and for Wayland, so i was very happy when i stumbled over your RFC for the Wayland presentation extension. Core Wayland protocol seems to only support millisecond accurate 32-bit timestamps, which are not good enough for many neuro-science applications.

So first off, thank you for your work on this! Right, and those 32-bit timestamps are not even guaranteed to be related to any particular output refresh cycle wrt. what you have presented before.

Anyway, i read the mailing list thread and had some look over the proposed patches in your git branch, and this mostly looks very good to me :). I haven't had time to test any of this, or so far to play even with basic Wayland/Weston, but i thought i'd give you some quick feedback from the perspective of a future power-user who will certainly use this extension heavily for very timing sensitive applications.

Excellent!

1. Wrt. an additional preroll_feedback request http://lists.freedesktop.org/archives/wayland-devel/2014-January/013014.html, essentially the equivalent of glXGetSyncValuesOML(), that would be very valuable to us.

To answer the question you had on #dri-devel on this: DRM/KMS implements an optional driver hook that, if implemented by a KMS driver, allows getting the vblank count and timestamps at any time with almost microsecond precision, even if vblank irqs were disabled for extended periods of time, without the need to wait for the next vblank irq. This is implemented on intel-kms and radeon-kms for all Intel/AMD/ATI gpus since around October 2010, or Linux 2.6.35 iirc. It will be supported for nouveau / NVidia desktop starting with the upcoming Linux 3.14. I have verified with external high precision measurement equipment that those timestamps for vblank and kms-pageflip completion are accurate down to better than 20 usecs on those drivers.

If a kms driver doesn't implement the hook, as is currently the case for afaik all embedded gpu drivers, then the vblank count will be reported instantaneously, but the vblank timestamp will be reported as zero until the first vblank irq has happened, if irqs were turned off. For reference: http://lxr.free-electrons.com/source/drivers/gpu/drm/drm_irq.c#L883

Indeed, the preroll_feedback request was modeled to match glXGetSyncValuesOML. Do you need to be able to call GetSyncValues at any time and have it return ASAP? Do you call it continuously, and even between frames? Or would it be enough for you to present a dummy frame, and just wait for the presentation feedback as usual? Since you are asking, I guess this is not enough.

We could take it even further if you need to monitor the values continuously. We could add a protocol interface where you could subscribe to an event, per-surface or maybe per-output, whenever the vblank counter increases. If the compositor is not in a continuous repaint loop, it could use a timer to approximately sample the values (ask DRM), or whatever. The benefit would be that we would avoid a roundtrip for each query, as the compositor is streaming the events to your app. Do you need the values so often that this would be worth it?

For special applications this would be ok, as in those cases power consumption is not an issue.

2. As far as the decision boundary for your presentation target timestamps: fwiw, the only case i know of is the NV_present_video extension, and the way i expose it to user code in my toolkit
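The kernel hook Mario describes is reachable from userspace through drmWaitVBlank(): a relative wait for zero vblanks returns immediately with the current count and timestamp. A sketch (secondary-CRTC selection shown in its simple pre-HIGH_CRTC form of this era):

#include <string.h>
#include <stdint.h>
#include <xf86drm.h>

static int get_vblank_now(int drm_fd, int use_secondary_crtc,
                          uint32_t *count, uint64_t *ust_usec)
{
        drmVBlank vbl;

        memset(&vbl, 0, sizeof vbl);
        vbl.request.type = DRM_VBLANK_RELATIVE;   /* "0 frames from now" */
        if (use_secondary_crtc)
                vbl.request.type |= DRM_VBLANK_SECONDARY;
        vbl.request.sequence = 0;                 /* => pure query */

        if (drmWaitVBlank(drm_fd, &vbl))
                return -1;

        *count = vbl.reply.sequence;
        *ust_usec = (uint64_t)vbl.reply.tval_sec * 1000000 +
                    vbl.reply.tval_usec;
        return 0;
}

On drivers with the high-precision hook (i915, radeon, nouveau) the reply timestamp is accurate even with vblank irqs disabled; on others, as the email notes, it may be zero until the first irq fires.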
Re: [RFC v2] Wayland presentation extension (video protocol)
On Thu, 2014-01-30 at 17:35 +0200, Pekka Paalanen wrote:

Hi, it's time for a take two on the Wayland presentation extension.

1. Introduction

The v1 proposal is here: http://lists.freedesktop.org/archives/wayland-devel/2013-October/011496.html

In v2 the basic idea is the same: you can queue frames with a target presentation time, and you can get accurate presentation feedback. All the details are new, though. The re-design started from the wish to handle resizing better, preferably without clearing the buffer queue. All the changed details are probably too much to describe here, so it is maybe better to look at this as a new proposal. It still does build on Frederic's work, and everyone who commented on it. Special thanks to Axel Davy for his counter-proposal and fighting with me on IRC. :-)

Some highlights:
- Accurate presentation feedback is possible also without queueing.
- You can queue also EGL-based rendering, and get presentation feedback if you want. Also EGL can do this internally, too, as long as EGL and the app do not try to use queueing at the same time.
- More detailed presentation feedback to better allow predicting future display refreshes.
- If wl_viewport is used, neither video resolution changes nor surface (window) size changes alone require clearing the queue. Video can continue playing even during resizes.

Sorry, I can't understand this. Could you explain this more? What's the current problem for resizing a window? How will you resolve it in the presentation extension?

thanks

Thanks, pq
Re: [RFC v2] Wayland presentation extension (video protocol)
I think I was wrong to say the display time would always be >= the time the client specifies. It would be rounded just like you are saying: the target time would be rounded to the nearest output frame start time, and thus could be earlier.

I tend to think of a frame as covering a period of time. I.e., I don't say it is centered at a given time; instead I tend to think of it as *starting* at a time and having a length. Therefore I see your scheme as "you are required to add T/2 or you will be early". I believe it would be less confusing to describe the algorithm as start times rather than middle points, primarily because it will line up your 'P' points with the green lines, and because it makes it easier to talk about actual wall time (i.e. the client cannot do anything about the past, so this is a time that always describes a start). But I also believe the result will be an identical algorithm, so if you think otherwise I don't really see a problem.

Special effects are rather inconsistent. For sound and computed motion, and often for motion blur, they tend to think of the time being at the start of the frame. But keyframed animation and tracking tend to think of the time as being in the middle of a frame, primarily because the user wants to place something at a point, and not have to set two keys whose average is that point.

Pekka Paalanen wrote:

Ok, so what you are suggesting here is that we should change the whole design to always have presentation come late, with a mean delay of half the refresh period (T/2) and the amount of delay being between 0 and T. Just like you say, then the client will have to arbitrarily guess and subtract T/2 always, to be able to target the vblank at a P. Also note that since presentation feedback reports the time P, not P - T/2, the clients really need to do the subtraction instead of just queueing updates with target time t = P + n * T.
Re: [RFC v2] Wayland presentation extension (video protocol)
On Wed, 12 Feb 2014 11:27:12 -0800 Bill Spitzak spit...@gmail.com wrote:

I think I was wrong to say the display time would always be >= the time the client specifies. It would be rounded just like you are saying: the target time would be rounded to the nearest output frame start time, and thus could be earlier.

I tend to think of a frame as covering a period of time. I.e., I don't say it is centered at a given time; instead I tend to think of it as *starting* at a time and having a length.

That is exactly how I think of it: a frame period begins at time P[n]. The tick of P[n] is not in the middle of a frame period, it is at the *beginning* of the frame period. The same for the T ticks, they mark the intended begin time, IOW the presentation time.

Therefore I see your scheme as "you are required to add T/2 or you will be early".

No.

I believe it would be less confusing to describe the algorithm as start times rather than middle points, primarily because it will line

It *is* described as start times.

up your 'P' points with the green lines, and because it makes it

The green lines are NOT frame boundaries. The green lines give the range that gets rounded to a particular starting time P. Guess I should not have made them vertical, but triangles with the peak at the tick P[n].

easier to talk about actual wall time (i.e. the client cannot do anything about the past, so this is a time that always describes a start). But I also believe the result will be an identical algorithm, so if you think otherwise I don't really see a problem.

Special effects are rather inconsistent. For sound and computed motion, and often for motion blur, they tend to think of the time being at the start of the frame. But keyframed animation and tracking tend to think of the time as being in the middle of a frame, primarily because the user wants to place something at a point, and not have to set two keys whose average is that point.

All timestamps are starting times already. The only thing being at a mid-point (a green line) is where the result of the rounding switches from one P to another.

- pq

Pekka Paalanen wrote:

Ok, so what you are suggesting here is that we should change the whole design to always have presentation come late, with a mean delay of half the refresh period (T/2) and the amount of delay being between 0 and T. Just like you say, then the client will have to arbitrarily guess and subtract T/2 always, to be able to target the vblank at a P. Also note that since presentation feedback reports the time P, not P - T/2, the clients really need to do the subtraction instead of just queueing updates with target time t = P + n * T.
Re: [RFC v2] Wayland presentation extension (video protocol)
On Mon, 10 Feb 2014 09:23:12 -0600 Jason Ekstrand ja...@jlekstrand.net wrote:

On Mon, Feb 10, 2014 at 3:53 AM, Pekka Paalanen ppaala...@gmail.com wrote:

On Sat, 8 Feb 2014 15:23:29 -0600 Jason Ekstrand ja...@jlekstrand.net wrote:

Pekka,

First off, I think you've done a great job over-all. I think it will both cover most cases and work well. I've got a few comments below.

Thank you for the review. :-) Replies below.

On Thu, Jan 30, 2014 at 9:35 AM, Pekka Paalanen ppaala...@gmail.com wrote:

Hi, it's time for a take two on the Wayland presentation extension.

1. Introduction

The v1 proposal is here: http://lists.freedesktop.org/archives/wayland-devel/2013-October/011496.html

In v2 the basic idea is the same: you can queue frames with a target presentation time, and you can get accurate presentation feedback. All the details are new, though. The re-design started from the wish to handle resizing better, preferably without clearing the buffer queue.

...

My one latent concern is that I still don't think we're entirely handling the case that QtQuick wants. What they want is to do their rendering a few frames in advance in case of CPU/GPU jitter. Technically, this extension handles this by the client simply doing a good job of guessing presentation times on a one-per-frame basis. However, it doesn't allow for any damage tracking. In the case of QtQuick they want a linear queue of buffers where no buffer ever gets skipped. In this case, you could do damage tracking by allowing it to accumulate from one frame to another, and you get all of the damage-tracking advantages that you had before. I'm not sure how much this matters, but it might be worth thinking about it.

Does it really want to display *every* frame regardless of time? It doesn't matter that if a deadline is missed, the animation slows down rather than jumps to keep up with intended velocity?

That is my understanding of how it works now. I *think* they figure the compositor isn't the bottleneck and that it will get its 60 FPS. That said, I don't actually work on QtQuick. I'm just trying to make sure they don't get completely left out in the cold.

Axel has a good point: cannot this be done just client side, with immediate updates based on frame callbacks?

Probably not. They're using GLES and EGL, so they can't draw early and just stash the buffer.

Oh yeah, I just realized that. I hope Axel's suggestion works out, or that they actually want the timestamp queue semantics rather than present-every-frame-really-no-skipping. But if they want the every-frame semantics, then I think that needs to be another extension. And then the interactions with immediate commits and timestamp queued updates get fun. If there is a problem in using frame callbacks for that, that is more likely a problem in the compositor's frame scheduling than the protocol.

The problem with damage tracking, why I did not take damage as queued state, is that it is given in surface coordinates. This will become a problem during resizes, where the surface size changes, and wl_viewport is used to decouple the content from the surface space.

The separation makes sense.

If we queue damage, we basically need to queue also surface resizes. Without wl_viewport this is what happens automatically, as surface size is taken from the buffer size. However, in the proposed design, the purpose of wl_viewport is to decouple the surface size from the buffer size, so that they can change independently.

The use case is live video: if you resize the window, you don't want to redo the video frames, because that would likely cause a glitch. Also, if the video resolution changes on the fly, by e.g. stream quality control, you don't need to do anything extra to keep the window at the old size.

Damage is a property of the content update, yes, but we have it defined in surface coordinates, so when surface and buffer sizes change asynchronously, the damage region would be incorrect. The downside is indeed that we lose damage information for queued buffers. This is a deliberate design choice, since the extension was designed primarily for video, where usually the whole surface gets damaged.

Yeah, I think you made the right call on this one. Queueing buffers in a completely serial fashion really does seem to be a special case. Trying to do damage tracking for an arbitrary queue would very quickly get insane. Plus all the other problems you mentioned.

But, I guess we could add another request, presentation.damage, to give the damage region in buffer coordinates. Would it be worth it?

Well, I don't think the damage tracking would get particularly nasty. You queue damage with the buffers, apply the damage (and convert to surface space) when
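If damage were queued in buffer coordinates as suggested, the apply-time conversion to surface space would be a simple scale by the viewport ratio. A sketch, ignoring buffer_transform and buffer_scale for brevity:

#include <stdint.h>

struct rect { int32_t x, y, w, h; };

/* Map buffer-coordinate damage into surface coordinates using the
 * viewport source rect (buffer px) and destination size (surface px)
 * in effect when the queued buffer is applied, not when it was queued. */
static struct rect damage_to_surface(struct rect d, struct rect src,
                                     int32_t dst_w, int32_t dst_h)
{
        struct rect s;

        s.x = (d.x - src.x) * dst_w / src.w;
        s.y = (d.y - src.y) * dst_h / src.h;
        /* Round sizes up so scaled damage never misses a pixel. */
        s.w = (d.w * dst_w + src.w - 1) / src.w;
        s.h = (d.h * dst_h + src.h - 1) / src.h;
        return s;
}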
Re: [RFC v2] Wayland presentation extension (video protocol)
On Mon, 10 Feb 2014 12:20:00 -0800 Bill Spitzak spit...@gmail.com wrote:

Pekka Paalanen wrote:

This algorithm aims to start showing an update between t-T/2 and t+T/2, which means that presentation may occur a little early or late, with an average of zero. Another option would be to show the update between t and t+T, which would mean that presentation would be always late, with an average of T/2.

I think there would be a lot less confusion if this was described using the t,t+1 model. I think in your diagram it would move the 'P' line .5 to the right so they line up with the green lines, and all the red arrows would tilt to the right. It makes no difference to the result (the same frames are selected) but I think it makes it a lot easier to describe. Another reason is that media starts at time 0, not time -.5*frame.

Hmm, I'm not sure. The green lines are not frame boundaries, they are decision boundaries. In the picture, the points P are the exact times when a framebuffer flip happens, which means that the hardware starts to scan out a new image. Each image is shown for the interval P[n]..P[n+1], not the interval between green lines. So the green lines only divide the queue axis into intervals that get assigned to a particular P.

Both axes are the same time axis, with units of nanoseconds, which are not marked. The black ticks on both axes denote when a new frame begins. Did we have some confusion here?

- pq
Re: [RFC v2] Wayland presentation extension (video protocol)
On Tue, 11 Feb 2014 12:03:41 -0800 Bill Spitzak spit...@gmail.com wrote:

I think in absolute time you are right, the P points do not move. Instead everything else would move left by .5, resulting in the same image I described, with the vertical arrows always tilting to the right. The green decision lines move left by .5 to align with the P points, since the decision is strictly whether a T is between two P. The T points move left by .5 because the client will have to timestamp everything .5 earlier. I still feel it would be less confusing to avoid negative numbers and to match what I am pretty certain is mainstream usage of integer timestamps on frames.

Ok, so what you are suggesting here is that we should change the whole design to always have presentation come late, with a mean delay of half the refresh period (T/2) and the amount of delay being between 0 and T. Just like you say, then the client will have to arbitrarily guess and subtract T/2 always, to be able to target the vblank at a P. Also note that since presentation feedback reports the time P, not P - T/2, the clients really need to do the subtraction instead of just queueing updates with target time t = P + n * T.

To avoid this hassle of "subtract T/2 or you will always be late", I deliberately chose to have the mean delay zero, which means that compared to a given target timestamp, presentation may occur between t - T/2 and t + T/2, i.e. half a period early or late. And this was not even my idea, Axel Davy pointed it out to me when I was first going for the always-late case.

So this is not just a technicality in drawing a diagram; no, this would affect how clients need to be programmed. And doing it your way would IMO be more complicated than my way, as you need to account for the T/2 instead of using the presentation feedback timestamps directly and extrapolating with an integer multiple of the refresh period. The timestamps may be integers, but they are in nanoseconds, so dividing the refresh period by 2 is not a problem at all. Besides, in your suggestion, the clients would need to compute T/2, and we would have to document "always subtract T/2 from your target timestamps so you can on average have zero latency with presentation compared to your actual target times".

I just want that if a client estimates that a vblank will happen at time t, and it queues a frame with target time t well in advance, it will get presented at the vblank at time t, not the vblank *after* t. Your suggestion of t,t+1 may make sense if we queued updates with target time given in MSC, but here we use UST-like clock values and can do better, because the values are not limited to integers (integer multiples of the refresh period). This is probably the source of your uneasiness: all prior art that I know of uses frame counters, not a clock, so yes, this is a different design. Why have this different design? That question I answered in my original RFCv2 email, in section "3. Why UST all the way?".

Thanks, pq

Pekka Paalanen wrote:

On Mon, 10 Feb 2014 12:20:00 -0800 Bill Spitzak spit...@gmail.com wrote:

Pekka Paalanen wrote:

This algorithm aims to start showing an update between t-T/2 and t+T/2, which means that presentation may occur a little early or late, with an average of zero. Another option would be to show the update between t and t+T, which would mean that presentation would be always late, with an average of T/2.

I think there would be a lot less confusion if this was described using the t,t+1 model. I think in your diagram it would move the 'P' line .5 to the right so they line up with the green lines, and all the red arrows would tilt to the right. It makes no difference to the result (the same frames are selected) but I think it makes it a lot easier to describe. Another reason is that media starts at time 0, not time -.5*frame.

Hmm, I'm not sure. The green lines are not frame boundaries, they are decision boundaries. In the picture, the points P are the exact times when a framebuffer flip happens, which means that the hardware starts to scan out a new image. Each image is shown for the interval P[n]..P[n+1], not the interval between green lines. So the green lines only divide the queue axis into intervals that get assigned to a particular P. Both axes are the same time axis, with units of nanoseconds, which are not marked. The black ticks on both axes denote when a new frame begins. Did we have some confusion here?

- pq
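The zero-mean rule is one line of integer arithmetic; with nanosecond timestamps there is no precision concern in halving T. A sketch, where base is any known past flip time and T the refresh period:

#include <stdint.h>

/* Round target t to the nearest predicted flip P = base + n*T, so
 * presentation lands within [t - T/2, t + T/2] with zero mean error,
 * rather than always 0..T late. */
static uint64_t choose_flip_time(uint64_t base, uint64_t T, uint64_t t)
{
        uint64_t n = (t > base) ? (t - base + T / 2) / T : 0;

        return base + n * T;
}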
Re: [RFC v2] Wayland presentation extension (video protocol)
On Sat, 8 Feb 2014 15:23:29 -0600 Jason Ekstrand ja...@jlekstrand.net wrote:

Pekka,

First off, I think you've done a great job over-all. I think it will both cover most cases and work well. I've got a few comments below.

Thank you for the review. :-) Replies below.

On Thu, Jan 30, 2014 at 9:35 AM, Pekka Paalanen ppaala...@gmail.com wrote:

Hi, it's time for a take two on the Wayland presentation extension.

1. Introduction

The v1 proposal is here: http://lists.freedesktop.org/archives/wayland-devel/2013-October/011496.html

In v2 the basic idea is the same: you can queue frames with a target presentation time, and you can get accurate presentation feedback. All the details are new, though. The re-design started from the wish to handle resizing better, preferably without clearing the buffer queue. All the changed details are probably too much to describe here, so it is maybe better to look at this as a new proposal. It still does build on Frederic's work, and everyone who commented on it. Special thanks to Axel Davy for his counter-proposal and fighting with me on IRC. :-)

Some highlights:
- Accurate presentation feedback is possible also without queueing.
- You can queue also EGL-based rendering, and get presentation feedback if you want. Also EGL can do this internally, too, as long as EGL and the app do not try to use queueing at the same time.
- More detailed presentation feedback to better allow predicting future display refreshes.
- If wl_viewport is used, neither video resolution changes nor surface (window) size changes alone require clearing the queue. Video can continue playing even during resizes.

...

<interface name="presentation" version="1">
  <description summary="timed presentation related wl_surface requests">

    The main features of this interface are accurate presentation timing feedback, and queued wl_surface content updates to ensure smooth video playback while maintaining audio/video synchronization. Some features use the concept of a presentation clock, which is defined in the presentation.clock_id event.

    Requests 'feedback' and 'queue' can be regarded as additional wl_surface methods. They are part of the double-buffered surface state update mechanism, where other requests first set up the state and then wl_surface.commit atomically applies the state into use. In other words, wl_surface.commit submits a content update.

    Interface wl_surface has requests to set surface related state and buffer related state, because there is no separate interface for buffer state alone. Queueing requires separating the surface state from the buffer state, and buffer state can be queued while surface state cannot.

    Buffer state includes the wl_buffer from wl_surface.attach, the state assigned by wl_surface requests frame, set_buffer_transform and set_buffer_scale, and any buffer-related state from extensions, for instance wl_viewport.set_source. This state is inherent to the buffer and the content update, rather than the surface.

    Surface state includes all other state associated with wl_surfaces, like the x,y arguments of wl_surface.attach, input and opaque regions, damage, and extension state like wl_viewport.destination. In general, anything expressed in surface local coordinates is better as surface state.

    The standard way of posting new content to a surface, using the wl_surface requests damage, attach, and commit, is called immediate content submission. This happens when a presentation.queue request has not been sent since the last wl_surface.commit.

    The new way of posting a content update is a queued content update submission. This happens on a wl_surface.commit when a presentation.queue request has been sent since the last wl_surface.commit.

    Queued content updates do not get applied immediately in the compositor but are pushed to a queue on receiving the wl_surface.commit. The queue is ordered by the submission target timestamp. Each item in the queue contains the wl_buffer, the target timestamp, and all the buffer state as defined above. All the queued state is taken from the pending wl_surface state at the time of the commit, exactly like an immediate commit would have taken it. For instance on a queueing commit, the pending buffer is queued and no buffer is pending afterwards. The stored values of the x,y parameters of wl_surface.attach are reset to zero, but they also are not queued; queued content updates do not carry the attach offsets. All other surface state (that is not queued), e.g. damage,
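The client-side shape of a queued update is then just the usual attach/commit with a queue request in between. A sketch assuming wayland-scanner-generated bindings for this RFC; the argument list of presentation.queue is my reading of the proposal, not a settled signature:

/* Queue one video frame for presentation at the given target time,
 * split into hi/lo seconds and nanoseconds as Wayland protocols do. */
static void queue_frame(struct presentation *pres,
                        struct wl_surface *surface,
                        struct wl_buffer *buffer,
                        uint32_t tv_sec_hi, uint32_t tv_sec_lo,
                        uint32_t tv_nsec)
{
        /* Buffer state: travels with the queued content update. */
        wl_surface_attach(surface, buffer, 0, 0);

        /* Mark this commit as a queued submission with a target time. */
        presentation_queue(pres, surface, tv_sec_hi, tv_sec_lo, tv_nsec);

        /* Pushes (buffer + buffer state + target) onto the queue instead
         * of applying immediately; surface state is not queued. */
        wl_surface_commit(surface);
}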
Re: [RFC v2] Wayland presentation extension (video protocol)
On Sat, 8 Feb 2014 15:39:57 -0600 Jason Ekstrand ja...@jlekstrand.net wrote: On Wed, Feb 5, 2014 at 1:32 AM, Pekka Paalanen ppaala...@gmail.com wrote: On Thu, 30 Jan 2014 17:35:17 +0200 Pekka Paalanen ppaala...@gmail.com wrote: Hi, it's time for a take two on the Wayland presentation extension. 1. Introduction The v1 proposal is here: http://lists.freedesktop.org/archives/wayland-devel/2013-October/011496.html In v2 the basic idea is the same: you can queue frames with a target presentation time, and you can get accurate presentation feedback. All the details are new, though. The re-design started from the wish to handle resizing better, preferably without clearing the buffer queue. ... interface name=presentation version=1 request name=destroy type=destructor request name=feedback request name=queue request name=discard_queue event name=clock_id /interface interface name=presentation_feedback version=1 request name=destroy type=destructor event name=sync_output event name=presented event name=discarded /interface Hi, another random thought; should we support queueing frames (buffers) in reverse chronological order? It's not really possible with the scheduling algorithm I wrote down in the spec. There is no timestamp associated with the currently showing content, which means that if you queue a frame with a target timestamp in the past, it will replace the current content, even if the current content was originally queued with a later target timestamp. I wonder if we could define, that the current content effectively has the timestamp of when it was presented, and all queued updates with an earlier target timestamp will be discarded. That should work, right? Now, is there a corner case... output update has been submitted to hardware but has not been presented yet, which means the content in flight has no timestamp determined yet... but we won't update the output again before the update in flight has completed, which gives the presented timestamp for the was-in-flight current content. If we do need the timestamp for content in flight, we could use the target timestamp it had when queued, or the timestamp the compositor is targeting. Since clients have a choice between queued and immediate updates, I guess using the compositor's target timestamp would be better defined. Opinions? I agree. The current frame (or frame for the currently pending flip) should be treated as if it has a timestamp of its expected next presentation time. I'm still not completely understanding the algorithm for presenting stuff correctly, but it should be possible for the client to sneak a frame in for the next present. I need some time at my chalkboard to try and get my head wrapped around all this. Maybe it would be good if you made a couple of little timeline pictures to go with it? Hi, I took the advice and drew a picture, attached. The picture is only about choosing the right update from the queue for the coming output refresh, and ignores the surface's existing content. A compositor has a queue with updates T1..T5. The current time is indicated in the picture, and the next output refresh will be P4. From the queue, T4 will be selected, and T1, T2, and T3 are discarded. Both T3 and T4 target the same flip, so the one with the latter timestamp wins. T5 is too far in the future. The decision boundaries are at the middle point between the two refresh times. I drew this without rechecking my spec wording, this is what I meant there IIRC. 
The already presented content update's timestamp can only be in the past, so it does not affect the drawn situation. However, if the queue contained only T2, and the current content was presented at P2, then T2 would be presented at P4. This could happen if the client queues T2 after P3 has passed[1]. OTOH, if the queue had only T2 and the current content was at P3, then T2 should be discarded. Right? This is what I need to fix in the spec wording.

I hope this clarifies the algorithm. Does it make sense?

This algorithm aims to start showing an update between t-T/2 and t+T/2, which means that presentation may occur a little early or late, with an average of zero. Another option would be to show the update between t and t+T, which would mean that presentation would always be late, with an average of T/2. I would assume that zero mean would be better, but this investigation does not consider the jitter when trying to present at a constant frequency that may or may not be the same as the output refresh frequency. Once we get all this implemented, we can tune the algorithm based on experimental data.

I also expect that Weston may need to start delaying its repaints rather than repaint immediately after finish_frame, for clients that use immediate content updates.
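In code form, the per-surface selection could look roughly like this; a sketch only, reusing the hypothetical queued_update struct from earlier in the thread and nanosecond timestamps:

    #include <stdint.h>
    #include <wayland-server.h>

    /* Pick the queued update to show at the predicted presentation time
     * next_present_ns, given the output refresh period period_ns. Take
     * the update with the highest target no later than half a frame
     * period after next_present_ns; if all updates are already late,
     * this naturally picks the latest of them. Returns NULL when every
     * queued update is still too far in the future (like T5 above). */
    static struct queued_update *
    pick_update(struct wl_list *queue, /* ascending by target */
                uint64_t next_present_ns, uint64_t period_ns)
    {
        struct queued_update *u, *best = NULL;

        wl_list_for_each(u, queue, link) {
            uint64_t t = (uint64_t)u->target.tv_sec * 1000000000ULL +
                         u->target.tv_nsec;

            if (t <= next_present_ns + period_ns / 2)
                best = u;   /* later entries have higher targets */
            else
                break;      /* everything beyond is for a later flip */
        }
        /* The caller discards all queue entries earlier than 'best'. */
        return best;
    }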
Re: [RFC v2] Wayland presentation extension (video protocol)
On Mon, Feb 10, 2014 at 3:53 AM, Pekka Paalanen ppaala...@gmail.com wrote: On Sat, 8 Feb 2014 15:23:29 -0600 Jason Ekstrand ja...@jlekstrand.net wrote: Pekka, First off, I think you've done a great job over-all. I think it will both cover most cases and work well. I've got a few comments below. Thank you for the review. :-) Replies below.

On Thu, Jan 30, 2014 at 9:35 AM, Pekka Paalanen ppaala...@gmail.com wrote: Hi, it's time for a take two on the Wayland presentation extension.

1. Introduction

The v1 proposal is here: http://lists.freedesktop.org/archives/wayland-devel/2013-October/011496.html

In v2 the basic idea is the same: you can queue frames with a target presentation time, and you can get accurate presentation feedback. All the details are new, though. The re-design started from the wish to handle resizing better, preferably without clearing the buffer queue. All the changed details are probably too much to describe here, so it is maybe better to look at this as a new proposal. It still does build on Frederic's work, and on the comments from everyone. Special thanks to Axel Davy for his counter-proposal and fighting with me on IRC. :-)

Some highlights:

- Accurate presentation feedback is possible also without queueing.
- You can also queue EGL-based rendering, and get presentation feedback if you want. EGL can do this internally, too, as long as EGL and the app do not try to use queueing at the same time.
- More detailed presentation feedback to better allow predicting future display refreshes.
- If wl_viewport is used, neither video resolution changes nor surface (window) size changes alone require clearing the queue. Video can continue playing even during resizes.

... interface name=presentation version=1 description summary=timed presentation related wl_surface requests The main features of this interface are accurate presentation timing feedback, and queued wl_surface content updates to ensure smooth video playback while maintaining audio/video synchronization. Some features use the concept of a presentation clock, which is defined in the presentation.clock_id event.

Requests 'feedback' and 'queue' can be regarded as additional wl_surface methods. They are part of the double-buffered surface state update mechanism, where other requests first set up the state and then wl_surface.commit atomically applies the state into use. In other words, wl_surface.commit submits a content update.

Interface wl_surface has requests to set surface-related state and buffer-related state, because there is no separate interface for buffer state alone. Queueing requires separating surface state from buffer state: buffer state can be queued, while surface state cannot.

Buffer state includes the wl_buffer from wl_surface.attach, the state assigned by the wl_surface requests frame, set_buffer_transform and set_buffer_scale, and any buffer-related state from extensions, for instance wl_viewport.set_source. This state is inherent to the buffer and the content update, rather than the surface.

Surface state includes all other state associated with wl_surfaces, like the x,y arguments of wl_surface.attach, input and opaque regions, damage, and extension state like wl_viewport.destination. In general, anything expressed in surface-local coordinates is better as surface state.

The standard way of posting new content to a surface using the wl_surface requests damage, attach, and commit is called immediate content submission.
This happens when a presentation.queue request has not been sent since the last wl_surface.commit.

The new way of posting a content update is a queued content update submission. This happens on a wl_surface.commit when a presentation.queue request has been sent since the last wl_surface.commit. Queued content updates do not get applied immediately in the compositor but are pushed to a queue on receiving the wl_surface.commit. The queue is ordered by the submission target timestamp. Each item in the queue contains the wl_buffer, the target timestamp, and all the buffer state as defined above.

All the queued state is taken from the pending wl_surface state at the time of the commit, exactly like an immediate commit would have taken it. For instance, on a queueing commit, the pending buffer is queued and no buffer is pending afterwards. The stored values of the x,y parameters of wl_surface.attach are reset to zero, but they also are not queued; queued content updates do not carry the attach offsets.
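A client-side sketch of the two submission paths, using the function names wayland-scanner would plausibly generate from this XML. Note that the argument list of presentation_queue() is my guess, since the full request signature is not quoted here:

    #include <stdint.h>
    #include <wayland-client.h>

    /* Immediate content submission: plain attach/damage/commit. */
    static void
    submit_immediate(struct wl_surface *surf, struct wl_buffer *buf)
    {
        wl_surface_attach(surf, buf, 0, 0);
        wl_surface_damage(surf, 0, 0, INT32_MAX, INT32_MAX);
        wl_surface_commit(surf);
    }

    /* Queued content update: a presentation.queue request before the
     * commit turns this commit into a queued one. Attach offsets must
     * be zero, and damage is surface state, so it is not sent here. */
    static void
    submit_queued(struct presentation *pres, struct wl_surface *surf,
                  struct wl_buffer *buf, uint32_t tv_sec, uint32_t tv_nsec)
    {
        wl_surface_attach(surf, buf, 0, 0);
        presentation_queue(pres, surf, tv_sec, tv_nsec); /* hypothetical */
        wl_surface_commit(surf);  /* pushes buffer + target to the queue */
    }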
Re: [RFC v2] Wayland presentation extension (video protocol)
Pekka Paalanen wrote: This algorithm aims to start showing an update between t-T/2 and t+T/2, which means that presentation may occur a little early or late, with an average of zero. Another option would be to show the update between t and t+T, which would mean that presentation would always be late, with an average of T/2.

I think there would be a lot less confusion if this was described using the t,t+1 model. I think in your diagram it would move the 'P' line .5 to the right so they line up with the green lines, and all the red arrows would tilt to the right. It makes no difference to the result (the same frames are selected), but I think it makes it a lot easier to describe. Another reason is that media starts at time 0, not at time -.5*frame.
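In symbols (my notation; T is the refresh period, P_i the flip times), the relabeling described here is just a shift of reference points:

    % Midpoint model: the update with target t is shown at flip P_i when
    %   t \in [\,P_i - \tfrac{T}{2},\; P_i + \tfrac{T}{2}\,).
    % Defining the shifted marks B_i \equiv P_i + \tfrac{T}{2}, this reads
    %   t \in [\,B_{i-1},\; B_i\,),
    % a plain interval of length T: the "t, t+1" form. The same frames
    % are selected; only the decision boundaries are relabeled.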
Re: [RFC v2] Wayland presentation extension (video protocol)
More comments! On Fri, Jan 31, 2014 at 7:29 AM, Pekka Paalanen ppaala...@gmail.com wrote: On Thu, 30 Jan 2014 17:35:17 +0200 Pekka Paalanen ppaala...@gmail.com wrote: The v1 proposal is here: http://lists.freedesktop.org/archives/wayland-devel/2013-October/011496.html In v2 the basic idea is the same: you can queue frames with a target presentation time, and you can get accurate presentation feedback. All the details are new, though. The re-design started from the wish to handle resizing better, preferably without clearing the buffer queue.

... interface name=presentation_feedback version=1 description summary=presentation time feedback event A presentation_feedback object returns the feedback information about a wl_surface content update becoming visible to the user. One object corresponds to one content update submission (wl_surface.commit), queued or immediate. There are two possible outcomes: the content update may be presented to the user, in which case the presentation timestamp is delivered. Otherwise, the content update is discarded, and the user never had a chance to see it before it was superseded or the surface was destroyed. Once a presentation_feedback object has delivered an event, it becomes inert, and should be destroyed by the client. /description

request name=destroy type=destructor description summary=destroy presentation feedback object The object is destroyed. If a feedback event has not been delivered yet, it is cancelled. /description /request

event name=sync_output description summary=presentation synchronized to this output As presentation can be synchronized to only one output at a time, this event tells which output it was. This event is only sent prior to the presented event. As clients may bind to the same global wl_output multiple times, this event is sent for each bound instance that matches the synchronized output. If a client has not bound to the right wl_output global at all, this event is not sent. /description arg name=output type=object interface=wl_output summary=presentation output/ /event

event name=presented description summary=the content update was displayed The associated content update was displayed to the user at the indicated time (tv_sec, tv_nsec). For the interpretation of the timestamp, see the presentation.clock_id event. The timestamp corresponds to the time when the content update turned into light the first time on the surface's main output. Compositors may approximate this from the framebuffer flip completion events from the system, and the latency of the physical display path if known.

This event is preceded by all related sync_output events telling which output's refresh cycle the feedback corresponds to, i.e. the main output for the surface. Compositors are recommended to choose the output containing the largest part of the wl_surface, or to keep the output they previously chose. Having a stable presentation output association helps clients to predict future output refreshes (vblank).

Argument 'refresh' gives the compositor's prediction of how many nanoseconds after tv_sec, tv_nsec the very next output refresh may occur. This is to further aid clients in predicting future refreshes, i.e., estimating the timestamps targeting the next few vblanks. If such a prediction cannot usefully be made, the argument is zero.

The 64-bit value combined from seq_hi and seq_lo is the value of the output's vertical retrace counter when the content update was first scanned out to the display.
This value must be compatible with the definition of MSC in the GLX_OML_sync_control specification. Note that if the display path has a non-zero latency, the time instant specified by this counter may differ from the timestamp's.

If the output does not have a constant refresh rate, explicit video mode switches excluded, then the refresh argument must be zero. If the output does not have a concept of vertical retrace or a refresh cycle, or the output device is self-refreshing without a way to query the refresh count, then the arguments seq_hi and seq_lo must be zero. /description

arg name=tv_sec type=uint summary=seconds part of the presentation timestamp/ arg name=tv_nsec type=uint summary=nanoseconds part of the presentation timestamp/ arg name=refresh type=uint summary=nanoseconds till next refresh/ arg name=seq_hi type=uint summary=high 32 bits of refresh counter/ arg name=seq_lo type=uint summary=low 32 bits of refresh counter/
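As a sketch of consuming this event on the client side; the listener function signature follows what wayland-scanner would generate from the XML above, the destroy call and the state struct are my assumptions:

    #include <stdint.h>

    struct frame_clock {
        uint64_t last_present_ns;   /* presentation clock, in ns */
        uint64_t last_msc;          /* 64-bit refresh counter */
        uint64_t next_refresh_ns;   /* 0 when no prediction given */
    };

    static void
    feedback_presented(void *data, struct presentation_feedback *fb,
                       uint32_t tv_sec, uint32_t tv_nsec, uint32_t refresh,
                       uint32_t seq_hi, uint32_t seq_lo)
    {
        struct frame_clock *clk = data;

        /* Timestamp on the presentation clock (see presentation.clock_id). */
        clk->last_present_ns = (uint64_t)tv_sec * 1000000000ULL + tv_nsec;

        /* MSC-compatible counter, per GLX_OML_sync_control. */
        clk->last_msc = ((uint64_t)seq_hi << 32) | seq_lo;

        /* Predicted very next refresh; zero means no useful prediction. */
        clk->next_refresh_ns =
            refresh ? clk->last_present_ns + refresh : 0;

        /* The object is inert after delivering an event; destroy it. */
        presentation_feedback_destroy(fb);  /* hypothetical binding */
    }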
Re: [RFC v2] Wayland presentation extension (video protocol)
On Wed, Feb 5, 2014 at 1:32 AM, Pekka Paalanen ppaala...@gmail.com wrote: On Thu, 30 Jan 2014 17:35:17 +0200 Pekka Paalanen ppaala...@gmail.com wrote: Hi, it's time for a take two on the Wayland presentation extension.

1. Introduction

The v1 proposal is here: http://lists.freedesktop.org/archives/wayland-devel/2013-October/011496.html

In v2 the basic idea is the same: you can queue frames with a target presentation time, and you can get accurate presentation feedback. All the details are new, though. The re-design started from the wish to handle resizing better, preferably without clearing the buffer queue.

... interface name=presentation version=1 request name=destroy type=destructor request name=feedback request name=queue request name=discard_queue event name=clock_id /interface interface name=presentation_feedback version=1 request name=destroy type=destructor event name=sync_output event name=presented event name=discarded /interface

Hi, another random thought; should we support queueing frames (buffers) in reverse chronological order? It's not really possible with the scheduling algorithm I wrote down in the spec. There is no timestamp associated with the currently showing content, which means that if you queue a frame with a target timestamp in the past, it will replace the current content, even if the current content was originally queued with a later target timestamp. I wonder if we could define that the current content effectively has the timestamp of when it was presented, and all queued updates with an earlier target timestamp will be discarded. That should work, right?

Now, is there a corner case... output update has been submitted to hardware but has not been presented yet, which means the content in flight has no timestamp determined yet... but we won't update the output again before the update in flight has completed, which gives the presented timestamp for the was-in-flight current content. If we do need the timestamp for content in flight, we could use the target timestamp it had when queued, or the timestamp the compositor is targeting. Since clients have a choice between queued and immediate updates, I guess using the compositor's target timestamp would be better defined. Opinions?

I agree. The current frame (or the frame for the currently pending flip) should be treated as if it has a timestamp of its expected next presentation time. I'm still not completely understanding the algorithm for presenting stuff correctly, but it should be possible for the client to sneak a frame in for the next present. I need some time at my chalkboard to try and get my head wrapped around all this. Maybe it would be good if you made a couple of little timeline pictures to go with it? --Jason Ekstrand

I think I should fix it like that. Isn't queueing (writing into the audio scanout buffer) audio samples in reverse chronological order the proper method to update audio content on the fly with minimal umm... latency? Wonder if some video-like playback would benefit from a similar algorithm, which minimizes latency(?) or the difference to wall time, at the cost of possibly skipping the older of the new updates. Errm, to avoid shifting the content on the time axis. Or something.

Thanks, pq
Re: [RFC v2] Wayland presentation extension (video protocol)
Hi, On 08/02/2014, Jason Ekstrand wrote: For each surface with queued content updates and matching main output, the compositor picks the update with the highest timestamp no later than a half frame period after the predicted presentation time. The intent is to pick the content update whose target timestamp as rounded to the output refresh period granularity matches the same display update as the compositor is targeting, while not displaying any content update more than a

I'm not really following 100% here. It's not your fault, this is just a terribly awkward sort of thing to try and put into English. It sounds to me like the following: if P0 is the time of the next present and P1 is the time of the one after that, you look for the largest thing less than the average of P0 and P1. Is this correct?

Why go for the average? The client is going to have to adjust anyway. If you target t, and P0 and P1 are possible pageflip times: if P0 <= t < (1/2)P0 + (1/2)P1, then you take the pageflip at P0; if (1/2)P0 + (1/2)P1 <= t < P1, then you take the pageflip at P1. That way the length of the intersection of the interval (t, t + time between two pageflips) and the 'time interval at which it is displayed' is maximized.

half frame period too early. If all the updates in the queue are already late, the highest timestamp update is taken regardless of how late it is. Once an update in a queue has been chosen, all remaining updates with an earlier timestamp in the queue are discarded.

Ok, I think what you are saying works. Again, it's difficult to parse, but these things always are.

My one latent concern is that I still don't think we're entirely handling the case that QtQuick wants. What they want is to do their rendering a few frames in advance in case of CPU/GPU jitter. Technically, this extension handles this by the client simply doing a good job of guessing presentation times on a one-per-frame basis. However, it doesn't allow for any damage tracking. In the case of QtQuick they want a linear queue of buffers where no buffer ever gets skipped. In this case, you could do damage tracking by allowing it to accumulate from one frame to another, and you get all of the damage-tracking advantages that you had before. I'm not sure how much this matters, but it might be worth thinking about it.

If they really want to work that way, why not do this queueing client-side? It doesn't need to be done in the compositor.

Hope that helps, --Jason Ekstrand

Axel Davy
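For reference, this rule written out formally (my notation, reconstructed from the inequalities above, with T = P1 - P0):

    \text{flip}(t) =
    \begin{cases}
      P_0 & P_0 \le t < \frac{P_0 + P_1}{2} \\
      P_1 & \frac{P_0 + P_1}{2} \le t < P_1
    \end{cases}
    \qquad\Longrightarrow\qquad
    |\,\text{flip}(t) - t\,| \le \frac{T}{2}

That is, the chosen flip is the one nearest the target, so the real presentation time never differs from t by more than half a refresh period.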
Re: [RFC v2] Wayland presentation extension (video protocol)
OK, that makes more sense. Thank you for stating it in terms of intervals. I still need to think about it a bit more.

On Feb 8, 2014 4:14 PM, Axel Davy axel.d...@ens.fr wrote: On 08/02/2014, Axel Davy wrote: Hi, On 08/02/2014, Jason Ekstrand wrote: For each surface with queued content updates and matching main output, the compositor picks the update with the highest timestamp no later than a half frame period after the predicted presentation time. The intent is to pick the content update whose target timestamp as rounded to the output refresh period granularity matches the same display update as the compositor is targeting, while not displaying any content update more than a

I'm not really following 100% here. It's not your fault, this is just a terribly awkward sort of thing to try and put into English. It sounds to me like the following: if P0 is the time of the next present and P1 is the time of the one after that, you look for the largest thing less than the average of P0 and P1. Is this correct?

Why go for the average? The client is going to have to adjust anyway. If you target t, and P0 and P1 are possible pageflip times: if P0 <= t < (1/2)P0 + (1/2)P1, then you take the pageflip at P0; if (1/2)P0 + (1/2)P1 <= t < P1, then you take the pageflip at P1. That way the length of the intersection of the interval (t, t + time between two pageflips) and the 'time interval at which it is displayed' is maximized.

Well, it isn't really the reason why this was chosen (else one might say it would be better to maximize the overlap with (t - T/2, t + T/2), with T the time between two pageflips). The reason is more that you want to minimize the difference between the time when the pageflip happens and t, so that the real presentation time and t do not differ by more than T/2.

Axel Davy
Re: [RFC v2] Wayland presentation extension (video protocol)
On Thu, 30 Jan 2014 17:35:17 +0200 Pekka Paalanen ppaala...@gmail.com wrote: Hi, it's time for a take two on the Wayland presentation extension.

1. Introduction

The v1 proposal is here: http://lists.freedesktop.org/archives/wayland-devel/2013-October/011496.html

In v2 the basic idea is the same: you can queue frames with a target presentation time, and you can get accurate presentation feedback. All the details are new, though. The re-design started from the wish to handle resizing better, preferably without clearing the buffer queue.

... interface name=presentation version=1 request name=destroy type=destructor request name=feedback request name=queue request name=discard_queue event name=clock_id /interface interface name=presentation_feedback version=1 request name=destroy type=destructor event name=sync_output event name=presented event name=discarded /interface

Hi, another random thought; should we support queueing frames (buffers) in reverse chronological order? It's not really possible with the scheduling algorithm I wrote down in the spec. There is no timestamp associated with the currently showing content, which means that if you queue a frame with a target timestamp in the past, it will replace the current content, even if the current content was originally queued with a later target timestamp. I wonder if we could define that the current content effectively has the timestamp of when it was presented, and all queued updates with an earlier target timestamp will be discarded. That should work, right?

Now, is there a corner case... output update has been submitted to hardware but has not been presented yet, which means the content in flight has no timestamp determined yet... but we won't update the output again before the update in flight has completed, which gives the presented timestamp for the was-in-flight current content. If we do need the timestamp for content in flight, we could use the target timestamp it had when queued, or the timestamp the compositor is targeting. Since clients have a choice between queued and immediate updates, I guess using the compositor's target timestamp would be better defined. Opinions?

I think I should fix it like that. Isn't queueing (writing into the audio scanout buffer) audio samples in reverse chronological order the proper method to update audio content on the fly with minimal umm... latency? Wonder if some video-like playback would benefit from a similar algorithm, which minimizes latency(?) or the difference to wall time, at the cost of possibly skipping the older of the new updates. Errm, to avoid shifting the content on the time axis. Or something.

Thanks, pq
Re: [RFC v2] Wayland presentation extension (video protocol)
On Thu, 30 Jan 2014 17:35:17 +0200 Pekka Paalanen ppaala...@gmail.com wrote: The v1 proposal is here: http://lists.freedesktop.org/archives/wayland-devel/2013-October/011496.html In v2 the basic idea is the same: you can queue frames with a target presentation time, and you can get accurate presentation feedback. All the details are new, though. The re-design started from the wish to handle resizing better, preferably without clearing the buffer queue.

... interface name=presentation_feedback version=1 description summary=presentation time feedback event A presentation_feedback object returns the feedback information about a wl_surface content update becoming visible to the user. One object corresponds to one content update submission (wl_surface.commit), queued or immediate. There are two possible outcomes: the content update may be presented to the user, in which case the presentation timestamp is delivered. Otherwise, the content update is discarded, and the user never had a chance to see it before it was superseded or the surface was destroyed. Once a presentation_feedback object has delivered an event, it becomes inert, and should be destroyed by the client. /description

request name=destroy type=destructor description summary=destroy presentation feedback object The object is destroyed. If a feedback event has not been delivered yet, it is cancelled. /description /request

event name=sync_output description summary=presentation synchronized to this output As presentation can be synchronized to only one output at a time, this event tells which output it was. This event is only sent prior to the presented event. As clients may bind to the same global wl_output multiple times, this event is sent for each bound instance that matches the synchronized output. If a client has not bound to the right wl_output global at all, this event is not sent. /description arg name=output type=object interface=wl_output summary=presentation output/ /event

event name=presented description summary=the content update was displayed The associated content update was displayed to the user at the indicated time (tv_sec, tv_nsec). For the interpretation of the timestamp, see the presentation.clock_id event. The timestamp corresponds to the time when the content update turned into light the first time on the surface's main output. Compositors may approximate this from the framebuffer flip completion events from the system, and the latency of the physical display path if known.

This event is preceded by all related sync_output events telling which output's refresh cycle the feedback corresponds to, i.e. the main output for the surface. Compositors are recommended to choose the output containing the largest part of the wl_surface, or to keep the output they previously chose. Having a stable presentation output association helps clients to predict future output refreshes (vblank).

Argument 'refresh' gives the compositor's prediction of how many nanoseconds after tv_sec, tv_nsec the very next output refresh may occur. This is to further aid clients in predicting future refreshes, i.e., estimating the timestamps targeting the next few vblanks. If such a prediction cannot usefully be made, the argument is zero.

The 64-bit value combined from seq_hi and seq_lo is the value of the output's vertical retrace counter when the content update was first scanned out to the display. This value must be compatible with the definition of MSC in the GLX_OML_sync_control specification.
Note that if the display path has a non-zero latency, the time instant specified by this counter may differ from the timestamp's.

If the output does not have a constant refresh rate, explicit video mode switches excluded, then the refresh argument must be zero. If the output does not have a concept of vertical retrace or a refresh cycle, or the output device is self-refreshing without a way to query the refresh count, then the arguments seq_hi and seq_lo must be zero. /description

arg name=tv_sec type=uint summary=seconds part of the presentation timestamp/ arg name=tv_nsec type=uint summary=nanoseconds part of the presentation timestamp/ arg name=refresh type=uint summary=nanoseconds till next refresh/ arg name=seq_hi type=uint summary=high 32 bits of refresh counter/ arg name=seq_lo type=uint summary=low 32 bits of refresh counter/
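A small client-side helper illustrating the intended use of the refresh argument; a sketch only, under the assumption of a constant refresh rate, with names of my own:

    #include <stdint.h>

    /* Estimate the presentation-clock timestamp of the Nth upcoming
     * refresh (frames_ahead >= 1), from the last presented event's
     * tv_sec/tv_nsec converted to nanoseconds and its refresh argument.
     * Returns 0 when the compositor provided no prediction. */
    static uint64_t
    predict_target_ns(uint64_t last_present_ns, uint32_t refresh_ns,
                      unsigned frames_ahead)
    {
        if (refresh_ns == 0)
            return 0;
        return last_present_ns + (uint64_t)frames_ahead * refresh_ns;
    }

A video player could feed such estimates, minus any client-side rendering margin, as the target timestamps for presentation.queue when scheduling the next few frames.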