Hi James,

I'm not a video expert, so excuse me if my view is too simplistic. What I write is what I'd call common sense, and I'm sure the literature has specialized algorithms for exactly the things we need, like prediction.
But I don't want to tie the protocol to any particular algorithm. The protocol should be as simple as possible while carrying all the hard facts. Below I argue that I do not see what other facts we could offer beyond the ones already mentioned, and I believe those facts are sufficient for a sophisticated algorithm in a client to estimate everything necessary.

On Thu, 17 Oct 2013 21:34:02 +0100
James Courtier-Dutton <james.dut...@gmail.com> wrote:

> The key point I was trying to make is that the media player needs to be
> able to predict when frames will be displayed.

Yes, the *player* needs to be able to predict, and we aim to give it the historical information to do exactly that.

> All its prediction calculations will be based on a fixed scanout rate.

Why only a fixed scanout rate? Why can't you dynamically adapt, over time, to the latency between the requested and realized presentation times? There are lots of ways to predict from past measurements while accounting for uncertainty; a Kalman filter is the first that comes to my mind, but I would guess there are better methods for deadline/cycle scenarios like this one.

> If the scanout rate changes, all bets are off, and we have to start again
> from scratch trying to predict the scanouts again.
> So, with regards to the scanout rate changing, we will need an event to
> tell us that so we can redo the prediction calculations.

Yes, we have thought about such an event, and it seems a good idea.

> So, if the scanouts are T=100 and then T=200, we would have to have somehow
> predicted that, and as a result set the
> buffer1, T=100
> buffer2, T=200
> or in actual fact, to allow for inaccuracy the media player would set:
> buffer1, T = 80
> buffer2, T = 180
> This will then ensure that buffer1 would be displayed at T=100, and buffer2
> would be displayed at T=200.
>
> But we would only know to set T=80 for buffer1 if we had been able to
> predict when the actual frame will be.
> So, I guess I am still asking for an API that would return these predicted
> values from the graphics driver.

I don't think drivers predict anything, do they? Right now, drivers only tell you what happened (the presentation timestamp), if even that. AFAIK, the most popular drivers do not currently even have queues. You can schedule *one* page flip at a time, and you cannot even cancel it later if you wanted to, from what I've heard.

In the future I'm hoping we will have drivers where you can actually queue buffers for hardware overlays, so the buffer queue does not need to live entirely in userspace at the mercy of scheduling hiccups and system load. But even then I can't imagine drivers doing any prediction; they would just try to meet the requested presentation times. If drivers predict, they will do it internally, only to make sure the requested presentation time is realized. *How* it is realized is a good question (never show before the requested time? make sure the frame is on screen at the requested time? or...)

The compositor is similar to the drivers, in my opinion. If it tries to predict anything, it will do so only to be able to meet the requested presentation times. Who knows, maybe predictions in the compositor and drivers would actually make prediction in the clients worse by introducing more variables. In any case, I do not see the compositor being responsible for predicting times for clients' buffers.

In my view, it is the clients' responsibility to predict in whatever way they want, based on the hard facts of the past given by the compositor and drivers. The client (player) is the only one that knows everything relevant, like the original video frame rate and the audio position, and whether it prefers showing frames early or late, when to skip, etc.

If you are concerned about video start, where you do not yet have measurements to predict from, then yes, all you have is the current monitor framerate.
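To make concrete what I mean by the client predicting from past hard facts, here is a toy sketch in Python. All the names are invented for illustration; this is not any real Wayland, driver, or player API. It estimates the scanout period from the realized presentation timestamps alone:

```python
# Toy sketch of client-side prediction from "hard facts" only: the
# realized presentation timestamps reported back by the compositor.
# All names here are invented; nothing below is a real API.

class ClientPredictor:
    def __init__(self, nominal_period):
        self.period = nominal_period  # start from the monitor's nominal rate
        self.last = None              # last realized presentation time

    def presented(self, realized):
        """Feed back one realized presentation timestamp."""
        if self.last is not None:
            # Exponential smoothing as a crude stand-in for a Kalman
            # filter: absorbs jitter while tracking slow drift.
            self.period += 0.1 * ((realized - self.last) - self.period)
        self.last = realized

    def next_scanouts(self, now, n=2):
        """Predict the next n scanout times after 'now' -- the values
        the driver was asked for, computed in the client instead."""
        first = self.last if self.last is not None else now
        while first <= now:
            first += self.period
        return [first + i * self.period for i in range(n)]

# Before the first frame, all the client has is the nominal refresh
# rate; each presented frame then locks the estimate onto the real
# phase and period of the scanout cycle.
p = ClientPredictor(nominal_period=16)
for ts in (1000, 1017, 1033, 1050):
    p.presented(ts)
print(p.next_scanouts(1055))
```

The player would then pick requested presentation times slightly before these predicted scanouts, by whatever margin its own policy prefers.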
But after the first frame has been presented, you get the first presentation timestamp, which should allow you to adapt to the phase of the scanout cycle, right? And of course you repeat that "coarse calibration" every time the scanout rate (monitor refresh rate) changes.

While the presentation loop is running, you could theoretically adjust the gap between the requested and realized presentation timestamps. If the gap gets too small and frames start to miss their intended scanout cycle, you see it in the realized timestamps and can back off. It is trial and error, but I'm hoping it will take only a few frames at the beginning to reach a steady state and lock the gap length, after which the predictor in the player should be in sync.

> Also, If buffer2 has T=200. What is the 200 being compared to in order to
> decide to display the buffer or not?
> This will not be the timestamp on the presentation callback will it?

Yes, it will, if you mean clock domains. There needs to be a way to have all timestamps related to frames in the same clock domain, of course. An idea we also discussed but didn't mention yet is for the compositor to tell clients which clock domain it uses for all frame timing purposes. Then clients can ask the kernel directly for the current time if they need it, and relate it to the audio clock.

It is a good question what the requested presentation time actually means. Should the system guarantee that the frame is already on screen at that time, or that it is not shown before that time? Luckily, the realized presentation time can be defined exactly, e.g. as the time when the hardware starts to scan out the new image. The realized time is the one that matters, and the specific meaning of the requested time is not so important, since a client can adapt. No?

> There is also the matter of clock sync. Is T=100 referenced to the system
> monotonic clock or some other clock.
> Video and Audio sync is not achieved by comparing Video and Audio time
> stamps.
> You have a global system monotonic clock, and sync Video to the system
> clock, and sync Audio to the system clock.
> A good side effect of this is that Audio and Video are then in sync.
> The advantage of this syncing to the system monotonic clock is that you can
> then run Video and Audio in separate thread, different CPUs, whenever, but
> they will always be in sync.

Some GStreamer people have told me that they actually prefer to use the audio clock as the master clock and infer everything else from it. Anyway, yes, clock domains are important.

I would like to know if I have misunderstood something in this whole video thing.

Thanks,
pq

PS. please use reply-to-all.

_______________________________________________
wayland-devel mailing list
wayland-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/wayland-devel