I'll just add a couple of things to what Dave has said. There are two sources of latency in a video codec. The first is out-of-order encoding - which is what Dave was talking about. This causes latency because a decoder has to wait to assemble all the missing pieces before being able to display anything.
You do out-of-order coding because it's more efficient. For example, if
you have 3 frames F1 F2 F3, you can code F1 without reference to
anything ("Intra" coding), then code F3 by predicting from F1, and then
code F2 by predicting from _both_ F1 and F3. This causes delay, because
in coded order you have F1 F3 F2, and you have to wait for F2 _and_ F3
before you can decode and display F2. But this is more efficient than
making coded order the same as display order, since by being able to
predict from both sides for F2 you make the number of bits for F2
_much_ smaller. So the low-latency Schro mode that Dave is talking
about would code everything in display order F1 F2 F3, with Fn
predicted from Fn-1. This has an efficiency cost of maybe 10% or more
over other ways of arranging the pictures.

There is a second cause of latency, though, and that is bit-rate
variation. For example, the source pictures might arrive every 40 ms,
but because the pictures are encoded to different sizes (because of the
different amounts of information they contain), in the coded stream
they might take 12 ms, 62 ms, 38 ms ... so you can't just display each
picture when it arrives, or the stream will be really juddery. The
pictures have to be buffered so as to get a smooth output stream.

This is done by having a buffer model at the encoder and decoder. This
is a "leaky bucket", where bits enter the bucket at the decoder in a
constant stream, and pictures, corresponding to variable-sized dollops,
are pulled out. You can operate Schro in a Constant Bit Rate mode with
this buffer model. The smaller the buffer, the lower the delay, but the
worse the quality. So the long and the short of it is that lower delay
means lower quality.

In typical video streaming scenarios, it's probably the buffering delay
that dominates, actually - that's why video on YouTube says
"Buffering ..." at the start. The good news is that in Schro's
low-delay mode ("P only coding"), the amount of information in each
encoded picture is roughly constant, so a smallish buffer of around
0.5 sec or maybe even less should be OK. Note that the default is
actually much larger, however: typical internet streaming services
might have buffers 10 seconds long or even longer.
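To put some numbers on the leaky-bucket argument, here is a minimal
sketch in plain C - the picture sizes and bit rate are made up for
illustration, and this is not Schro's actual rate-control code - showing
how variable picture sizes on a constant-rate channel force a start-up
buffering delay:

  /* Leaky-bucket sketch: bits arrive at a constant rate; a picture
   * can only be displayed once all of its bits have arrived.
   * Decode time is ignored. */
  #include <stdio.h>

  int main(void)
  {
      const double bitrate_bps  = 2.0e6;  /* constant channel: 2 Mbit/s */
      const double frame_gap_ms = 40.0;   /* source picture every 40 ms */
      /* Made-up coded picture sizes in bits; real sizes vary with content. */
      const double coded_bits[] = { 240e3, 60e3, 95e3, 310e3, 70e3 };
      const int n = (int)(sizeof coded_bits / sizeof coded_bits[0]);

      double arrival_ms    = 0.0;  /* when picture i's last bit arrives */
      double min_buffer_ms = 0.0;  /* start-up delay for smooth playout */

      for (int i = 0; i < n; i++) {
          arrival_ms += coded_bits[i] / bitrate_bps * 1000.0;
          double capture_ms = i * frame_gap_ms;  /* when it was shot */
          double lag_ms = arrival_ms - capture_ms;
          if (lag_ms > min_buffer_ms)
              min_buffer_ms = lag_ms;
          printf("picture %d: fully received %.1f ms after capture\n",
                 i, lag_ms);
      }
      /* Delaying playout by min_buffer_ms never stalls: picture i is
       * shown at min_buffer_ms + i * frame_gap_ms >= its arrival time. */
      printf("minimum start-up buffering: %.1f ms\n", min_buffer_ms);
      return 0;
  }

With these invented numbers, the big 310-kbit picture forces roughly
230 ms of start-up buffering. Shrinking the buffer below that means the
encoder has to clamp the large pictures down to fit, which is exactly
where the quality cost comes from.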
regards

Thomas

>-----Original Message-----
>From: David Schleef [mailto:d...@entropywave.com]
>Sent: 27 March 2009 06:19
>To: Norbert Kubiec
>Cc: schrodinger-devel@lists.sourceforge.net
>Subject: Re: [Schrodinger-devel] Few questions about Dirac video codec
>
>On Thu, Mar 26, 2009 at 11:07:32PM +0100, Norbert Kubiec wrote:
>> The question of latency was not unfounded. Have you heard about
>> OnLive? They use a new interactive video compression algorithm.
>> Latency through the algorithm is just 1 ms instead of the 0.5- to
>> 0.75-second lag inherent in conventional compression algorithms
>> used in corporate video conferencing solutions, for example.
>
>Glad to hear that you totally bought the marketing speak. :)
>
>Rather than respond to your questions directly, I'll talk randomly
>about how low-latency video codecs work.
>
>One key point about low-latency video encoding is that the output
>bits that represent a pixel have to exist somewhere in the bitstream
>between the time the encoder gets that pixel from the camera and
>N ms later, where N is the latency.
>
>One method of very low-latency compression works on a scanline basis.
>An example is the low-delay profile of Dirac. A camera reads out a
>few scan lines (say, 16), the encoder compresses them, and then sends
>those bits out over ethernet or ASI or whatever. The latency is on
>the order of a few scan lines, say 16*2 + a small number. Why 16*2?
>Because it takes 16 lines' worth of time to read in the 16-line
>chunk, and the encoder then spends the time it takes to read in the
>next chunk encoding the first chunk and sending it out over the wire.
>Simultaneously, the decoder reads in the data and decodes. Then,
>during the third set of 16 lines, the decoder scans out the
>uncompressed lines. So the decoder scans out line 0 as the camera is
>scanning out line 32. Real encoders need a bit of extra time for
>synchronization, so 32 lines is the theoretical ideal. Of course, in
>a real system there is network latency, but we'll make someone else
>worry about that. 32 lines works out to be about 1 ms for 1080p at
>30 frames per second, depending on exactly which system you're using.
>Compression ratios are purposefully low, since you can't spread
>around worst-case bits at all, and because this kind of compression
>is only really useful for studio work.
>
>Note that cameras with a few-scanline latency start at USD 10,000,
>and an encoder/decoder pair for DiracPro is about USD 4,000, IIRC.
>This is not the kind of technology you roll out in a consumer
>product.
>
>Another method is similar, but uses an entire frame instead of a few
>scan lines. In this case, you get a theoretical latency of 2 frames,
>or about 60 ms for 30 fps video. I've seen companies advertising
>encoder/decoder pairs that claim 70 ms latency (of course, without
>any network latency), and I can pretty much believe this number.
>Again, you can't get away with cheap hardware -- my DV camera has an
>internal latency somewhere between 90 and 120 ms, and HDV cameras are
>much worse.
>
>In a frame-based low-latency system, it's much more realistic to use
>motion compensation, in which you use the previous one or two frames
>as reference pictures. Since the general point of using motion
>compensation is to decrease the bit rate, this causes compression
>artifacts immediately after scene changes that clear up after a few
>frames, which is very characteristic of the technique.
>
>Due to the way that Dirac puts together pictures, the non-low-delay
>profiles of Dirac have an approximate latency of 4 pictures for a
>simple implementation, although you can decrease this to nearly 2
>pictures with more complex algorithms. Schroedinger implements the
>simple algorithm, and with suitable modifications (it does not do
>this by default) you can get close to 4 frames of latency. Schro's
>implementation of the Low-Delay profile is also 4 frames, since it
>uses the same code.
>
>Entropy Wave has implementations of the more complex algorithm for
>the Simple and Intra profiles, as well as an actual low-delay
>implementation of the Low-Delay profile, with latencies that are very
>near the theoretical ones. These are not open source. Unfortunately,
>since all the code that can currently use these codecs is
>frame-based, there's very little advantage over Schroedinger unless
>you write a bunch of custom code.
>
>
>dave...
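P.S. Dave's 32-line figure above is easy to sanity-check. A quick
back-of-the-envelope sketch, assuming 1080 active lines at 30 frames/s
and ignoring blanking intervals:

  /* Scanline-latency check: how long do 2 x 16 scan lines take at
   * 1080p30? Assumes active lines only; blanking is ignored. */
  #include <stdio.h>

  int main(void)
  {
      const double fps          = 30.0;
      const double active_lines = 1080.0;
      const int    chunk        = 16;   /* scan lines per coded chunk */

      double line_ms = 1000.0 / (fps * active_lines); /* time per line */
      /* One chunk to read in, one more to encode/ship/decode it. */
      double latency_ms = 2.0 * chunk * line_ms;

      printf("time per scan line: %.1f us\n", line_ms * 1000.0);
      printf("16*2-line latency:  %.2f ms\n", latency_ms);
      return 0;
  }

This prints a per-line time of about 31 microseconds and a 16*2-line
latency of about 0.99 ms - the "about 1 ms" figure quoted above.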