Re: [whatwg] Codecs for audio and video
On Tue, Feb 02, 2010 at 02:46:40PM +1300, Chris Double wrote:
> On 02/02/10 06:05, Chris McCormick wrote:
> > I think I speak for all procedural audio people when I say, can't we get the browsers to allow sample-block access to audio?
>
> Dave Humphrey has been working on adding an API to do this to Firefox. He's been blogging about it here: http://vocamus.net/dave/?cat=25

Perfect! I can't wait to see where this leads.

Chris.
---
http://mccormick.cx
Re: [whatwg] Codecs for audio and video
On Sun, Aug 09, 2009 at 08:29:28PM +1000, Silvia Pfeiffer wrote:
> On Sun, Aug 9, 2009 at 7:20 PM, Chris McCormick <ch...@mccormick.cx> wrote:
> > Hi Silvia,
> > On Sun, Aug 09, 2009 at 11:16:01AM +1000, Silvia Pfeiffer wrote:
> > > On Sun, Aug 9, 2009 at 3:15 AM, Chris McCormick <ch...@mccormick.cx> wrote:
> > > > On Wed, Jul 08, 2009 at 09:24:42AM -0700, Charles Pritchard wrote:
> > > > > There are two use cases that I think are important: a codec implementation (let's use Vorbis), and an accessibility implementation, working with a canvas element.
> > > > Here are a few more use-cases that many people would consider just as important:
> > > > * Browser based music software and synthesis toys.
> > > > * New types of 'algorithmic' music like that pioneered by Brian Eno.
> > > > * Browser based games which want to use procedural audio instead of pre-rendered sound effects.
> > > Why don't you just implement an example in javascript to show off what you're talking about and make a use case for having it implemented inside the browsers?
> > Yes, you are right, I should definitely do that. What is the normal process for that: write some code, post it up on my website, and then post here with a link? Is that sufficient to get the attention of the browser implementors?
> I would think so. Not automatically, of course, but it would go a long way.
> > By 'implement an example in javascript' do you mean that I should implement an example of what I wish the browsers could do, or implement an actual reference vector library that the browsers could use? The former I can see myself doing, but the latter has been on my TODO list long enough for me to know that I won't get it done any time soon. :/
> The former. Do it in javascript even if it is very slow. It just needs to demonstrate the idea and how useful it is for browser users.

Hi Silvia,

Whilst I haven't had the time to do this myself, I did hear about the perfect example use-case for what I was getting at. Someone required a very small Flash applet just to do the last JavaScript-to-audio bit of synthesis. Everything else was done in JavaScript:

http://stockholm.musichackday.org/index.php?page=Webloop

"Since almost no browser is able to output sound directly from javascript, I currently use a small flash applet to push the sound to your speakers, I hope you don't mind."

I think I speak for all procedural audio people when I say, can't we get the browsers to allow sample-block access to audio?

Best regards,

Chris.
---
http://mccormick.cx
Re: [whatwg] Codecs for audio and video
On Tue, Feb 2, 2010 at 4:05 AM, Chris McCormick <ch...@mccormick.cx> wrote:
> On Sun, Aug 09, 2009 at 08:29:28PM +1000, Silvia Pfeiffer wrote:
> [...]
>
> Hi Silvia,
>
> Whilst I haven't had the time to do this myself, I did hear about the perfect example use-case for what I was getting at. Someone required a very small Flash applet just to do the last JavaScript-to-audio bit of synthesis. Everything else was done in JavaScript:
>
> http://stockholm.musichackday.org/index.php?page=Webloop
>
> "Since almost no browser is able to output sound directly from javascript, I currently use a small flash applet to push the sound to your speakers, I hope you don't mind."
>
> I think I speak for all procedural audio people when I say, can't we get the browsers to allow sample-block access to audio?

Sounds like a solid argument to me. But I'm not the one who counts. :-)

Cheers,
Silvia.
Re: [whatwg] Codecs for audio and video
On Feb 1, 2010, at 14:02, Silvia Pfeiffer wrote:
> On Tue, Feb 2, 2010 at 4:05 AM, Chris McCormick <ch...@mccormick.cx> wrote:
> > I think I speak for all procedural audio people when I say, can't we get the browsers to allow sample-block access to audio?
>
> Sounds like a solid argument to me. But I'm not the one who counts. :-)

I think that the browser vendors are currently trying to nail basic HTML5 multimedia, then accessibility, then they'll take a breath and ask what's next. But do you have ideas as to how?

David Singer
Multimedia and Software Standards, Apple Inc.
Re: [whatwg] Codecs for audio and video
On 02/02/10 06:05, Chris McCormick wrote:
> I think I speak for all procedural audio people when I say, can't we get the browsers to allow sample-block access to audio?

Dave Humphrey has been working on adding an API to do this to Firefox. He's been blogging about it here: http://vocamus.net/dave/?cat=25

Chris.
--
http://bluishcoder.co.nz
Re: [whatwg] Codecs for audio and video
I'm sorry - we cannot say anything about our plans at this point.

Thanks,
Jeremy

On Mon, Aug 10, 2009 at 5:30 PM, Nils Dagsson Moskopp <nils-dagsson-mosk...@dieweltistgarnichtso.net> wrote:
> On Tuesday, 11.08.2009, 00:44 +0100, Sam Kuper wrote:
> > In recent news, Google may be about to open source On2 codecs, perhaps creating a route out of the HTML 5 video codec deadlock: http://www.theregister.co.uk/2009/08/06/google_vp6_open_source/
>
> At this point, this seems to be pure speculation. Maybe Google representatives can chime in on this issue?
>
> --
> Nils Dagsson Moskopp
> http://dieweltistgarnichtso.net
Re: [whatwg] Codecs for audio and video
2009/8/11 Nils Dagsson Moskopp <nils-dagsson-mosk...@dieweltistgarnichtso.net>:
> On Tuesday, 11.08.2009, 00:44 +0100, Sam Kuper wrote:
> > In recent news, Google may be about to open source On2 codecs, perhaps creating a route out of the HTML 5 video codec deadlock: http://www.theregister.co.uk/2009/08/06/google_vp6_open_source/
>
> At this point, this seems to be pure speculation. Maybe Google representatives can chime in on this issue?

I think it would be entirely reasonable to let Google get on with what they're doing on their schedule and count our chickens precisely when they hatch ;-) But given the results the Xiph/Mozilla/Wikimedia team have managed to get with the Thusnelda encoder for Theora - comparable to H.264 - a released, open, unencumbered codec with a big company defending its freedom could get very good indeed in reasonable order.

- d.
Re: [whatwg] Codecs for audio and video
Hi Sam,

On Sun, Aug 09, 2009 at 03:23:15PM +0100, Sam Dutton wrote:
> As an aside to Chris McCormick's comments, I wonder if it might also be useful/possible/appropriate (or not) to provide access to media data in the way that the ActionScript computeSpectrum function does: http://livedocs.adobe.com/flash/9.0/ActionScriptLangRefV3/flash/media/SoundMixer.html#computeSpectrum%28%29
>
> Sample visualization using Canvas with computeSpectrum: http://www2.nihilogic.dk/labs/canvas_music_visualization/

If the set of signal vector operators came with an FFT operator, which they definitely should, then this would be taken care of in a way which would look pretty similar to that API.

Best,
Chris.
---
http://mccormick.cx
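[Editor's note: for readers unfamiliar with computeSpectrum, the transform operator Chris is asking for would expose something like the following. This is a naive O(N²) DFT in plain JavaScript, for illustration only; a real implementation would use an FFT, and the function name is made up for this sketch.]

```javascript
// Naive DFT: returns the magnitude spectrum of a real-valued sample block.
// Only meant to illustrate what a built-in transform operator would expose;
// it is far too slow for real use.
function magnitudeSpectrum(samples) {
  var n = samples.length;
  var mags = new Array(n / 2);
  for (var k = 0; k < n / 2; k++) {
    var re = 0, im = 0;
    for (var t = 0; t < n; t++) {
      var angle = 2 * Math.PI * k * t / n;
      re += samples[t] * Math.cos(angle);
      im -= samples[t] * Math.sin(angle);
    }
    mags[k] = Math.sqrt(re * re + im * im);
  }
  return mags;
}

// A pure sine at bin 3 of a 64-sample block should peak at index 3,
// which is exactly the kind of data a canvas visualizer would draw.
var block = [];
for (var t = 0; t < 64; t++) block.push(Math.sin(2 * Math.PI * 3 * t / 64));
var spectrum = magnitudeSpectrum(block);
```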
Re: [whatwg] Codecs for audio and video
In recent news, Google may be about to open source On2 codecs, perhaps creating a route out of the HTML 5 video codec deadlock: http://www.theregister.co.uk/2009/08/06/google_vp6_open_source/
Re: [whatwg] Codecs for audio and video
Perhaps Google will finally be able to break this horrible deadlock by doing just that. :)

On Mon, Aug 10, 2009 at 6:44 PM, Sam Kuper <sam.ku...@uclmail.net> wrote:
> In recent news, Google may be about to open source On2 codecs, perhaps creating a route out of the HTML 5 video codec deadlock: http://www.theregister.co.uk/2009/08/06/google_vp6_open_source/
Re: [whatwg] Codecs for audio and video
On Tuesday, 11.08.2009, 00:44 +0100, Sam Kuper wrote:
> In recent news, Google may be about to open source On2 codecs, perhaps creating a route out of the HTML 5 video codec deadlock: http://www.theregister.co.uk/2009/08/06/google_vp6_open_source/

At this point, this seems to be pure speculation. Maybe Google representatives can chime in on this issue?

--
Nils Dagsson Moskopp
http://dieweltistgarnichtso.net
Re: [whatwg] Codecs for audio and video
Hi Silvia,

On Sun, Aug 09, 2009 at 11:16:01AM +1000, Silvia Pfeiffer wrote:
> On Sun, Aug 9, 2009 at 3:15 AM, Chris McCormick <ch...@mccormick.cx> wrote:
> > On Wed, Jul 08, 2009 at 09:24:42AM -0700, Charles Pritchard wrote:
> > > There are two use cases that I think are important: a codec implementation (let's use Vorbis), and an accessibility implementation, working with a canvas element.
> > Here are a few more use-cases that many people would consider just as important:
> > * Browser based music software and synthesis toys.
> > * New types of 'algorithmic' music like that pioneered by Brian Eno.
> > * Browser based games which want to use procedural audio instead of pre-rendered sound effects.
> Why don't you just implement an example in javascript to show off what you're talking about and make a use case for having it implemented inside the browsers?

Yes, you are right, I should definitely do that. What is the normal process for that: write some code, post it up on my website, and then post here with a link? Is that sufficient to get the attention of the browser implementors?

By 'implement an example in javascript' do you mean that I should implement an example of what I wish the browsers could do, or implement an actual reference vector library that the browsers could use? The former I can see myself doing, but the latter has been on my TODO list long enough for me to know that I won't get it done any time soon. :/

Chris.
---
http://mccormick.cx
Re: [whatwg] Codecs for audio and video
On Sun, Aug 9, 2009 at 7:20 PM, Chris McCormick <ch...@mccormick.cx> wrote:
> Hi Silvia,
> [...]
> Yes, you are right, I should definitely do that. What is the normal process for that: write some code, post it up on my website, and then post here with a link? Is that sufficient to get the attention of the browser implementors?

I would think so. Not automatically, of course, but it would go a long way.

> By 'implement an example in javascript' do you mean that I should implement an example of what I wish the browsers could do, or implement an actual reference vector library that the browsers could use? The former I can see myself doing, but the latter has been on my TODO list long enough for me to know that I won't get it done any time soon. :/

The former. Do it in javascript even if it is very slow. It just needs to demonstrate the idea and how useful it is for browser users.

Regards,
Silvia.
Re: [whatwg] Codecs for audio and video
As an aside to Chris McCormick's comments, I wonder if it might also be useful/possible/appropriate (or not) to provide access to media data in the way that the ActionScript computeSpectrum function does: http://livedocs.adobe.com/flash/9.0/ActionScriptLangRefV3/flash/media/SoundMixer.html#computeSpectrum%28%29

Sample visualization using Canvas with computeSpectrum: http://www2.nihilogic.dk/labs/canvas_music_visualization/

Sam Dutton

> Message: 1
> Date: Sun, 9 Aug 2009 11:16:01 +1000
> From: Silvia Pfeiffer <silviapfeiff...@gmail.com>
> Subject: Re: [whatwg] Codecs for audio and video
> To: Chris McCormick <ch...@mccormick.cx>
> Cc: whatwg@lists.whatwg.org
>
> On Sun, Aug 9, 2009 at 3:15 AM, Chris McCormick <ch...@mccormick.cx> wrote:
> [... Chris's full DSP vector-processor proposal snipped ...]
Re: [whatwg] Codecs for audio and video
On Wed, Jul 08, 2009 at 09:24:42AM -0700, Charles Pritchard wrote:
> There are two use cases that I think are important: a codec implementation (let's use Vorbis), and an accessibility implementation, working with a canvas element.

Here are a few more use-cases that many people would consider just as important:

* Browser based music software and synthesis toys.
* New types of 'algorithmic' music like that pioneered by Brian Eno.
* Browser based games which want to use procedural audio instead of pre-rendered sound effects.

I'd like to reiterate the previously expressed sentiment that only implementing pre-rendered audio playback is like having a browser that only supports static images loaded from the server instead of animations and canvas tags.

What is really needed is a DSP vector processor which runs outside of ECMAScript, but with a good API so that ECMAScript can talk to it directly. Examples of reference software, mostly open source, which do this type of thing follow:

* Csound
* SuperCollider
* Pure Data
* Nyquist
* ChucK
* Steinberg VSTs

I am going to use the terms signal vector, audio buffer, and array interchangeably below. Four major types of synthesis would be useful, but they are pretty much isomorphic, so any one of them could be implemented as a base-line:

* Wavetable (implement vector write/read/lookup operators)
* FM/AM (implement vector + and * operators)
* Subtractive (implement unit delay from which you can build filters)
* Frequency domain (implement FFT and back again)

Of these, I feel that wavetable synthesis should be the first type of synthesis to be implemented, since most of the code for manipulating audio buffers is already going to be in the browsers, and exposing those buffers shouldn't be hugely difficult. Basically what this would take is ensuring some things about the audio tag:

* Supports playback of arbitrarily small buffers.
* Seamlessly loops those small buffers.
* Allows read/write access to those buffers from ECMAScript.

Given the above, the other types of synthesis are possible, albeit slowly. For example, FM/AM synthesis is possible by adding/multiplying vectors of sine data together into a currently looping audio buffer. Subtractive synthesis is possible by adding delayed versions of the data in the buffer to itself. Frequency domain synthesis is possible by analysing the data in the buffer with FFT (and reverse FFT) and writing back new data.

I see this API as working as previously posted by Charles Pritchard, but with the following extra possibility:

<audio id='mybuffer'>
buffer = document.getElementById("mybuffer");
// here myfunc is a function which will change
// the audio buffer each time the buffer loops
buffer.loopCallback = myfunc;
buffer.loop = true;
buffer.play();

Of course, ECMAScript is probably going to be too slow in the short term, so moving forward it would be great if there was a library/API which can do the following vector operations in the background at a speed faster than doing them directly, element by element, inside ECMAScript (a bit like Python's Numeric module). All inputs and outputs are signal vectors/audio tag buffers:

* + - add two signal vectors (2 inputs, 1 output)
* * - multiply two signal vectors (2 inputs, 1 output)
* z - delay a signal vector by a customisable number of samples (2 inputs, 1 output)
* read - do a table lookup (1 input, 1 output)
* write - do a table write (2 inputs, 1 output)
* copy - memcpy a signal vector (1 input, 1 output)
* fft - do a fast fourier transform (1 input, 2 outputs)
* rfft - do a reverse fast fourier transform (2 inputs, 1 output)

It would be so great if it was possible to unify the above into an API that looked and worked something like this:

<audio id='mybuffer'>
outbuffer = document.getElementById("mybuffer");
b = new AudioBuffer(64);
for (x = 0; x < 64; x++)
    b[x] = Math.sin(x / 64 * Math.PI);
// inside the loopCallback do a vector multiplication of the data
// in our buffer with a sine wave we created earlier.
outbuffer.multiply(b);

I hope this email is not too obvious and helps clarify thinking rather than confusing things. As a game developer and music software developer I look forward to making dynamic audio applications on the web!

Best regards,

Chris.
---
http://mccormick.cx
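[Editor's note: to make the proposal concrete, the vector operators listed above can be sketched as ordinary JavaScript functions over numeric arrays. The function names below are illustrative only, not a proposed binding, and this covers just the +, *, z, and read operators.]

```javascript
// Illustrative plain-JavaScript versions of some of the proposed
// signal-vector operators. Vectors are plain arrays of numbers;
// the names are invented for this sketch and are not a real API.
function vadd(a, b) {            // + : add two signal vectors
  return a.map(function (v, i) { return v + b[i]; });
}
function vmul(a, b) {            // * : multiply two signal vectors
  return a.map(function (v, i) { return v * b[i]; });
}
function vdelay(a, samples) {    // z : delay a vector by n samples, zero-padded
  var out = [];
  for (var i = 0; i < a.length; i++) {
    out.push(i < samples ? 0 : a[i - samples]);
  }
  return out;
}
function vread(table, indices) { // read : table lookup by integer index vector
  return indices.map(function (i) { return table[i]; });
}

// A crude filter built from the unit-delay operator, as the proposal
// suggests for subtractive synthesis: add a delayed copy of the
// buffer to itself.
function onePole(a) {
  return vadd(a, vdelay(a, 1));
}
```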
Re: [whatwg] Codecs for audio and video
Chris McCormick wrote:
> Of course, the ECMA script is probably going to be too slow in the short term, so moving forward it would be great if there was a library/API which can do the following vector operations in the background at a speed faster than doing them directly, element by element inside ECMAscript (a bit like Python's Numeric module).
> [list of operators snipped]

I'm sort of wondering what the performance of these would actually be if implemented directly in ECMAScript, before we decide that's too slow and start looking for alternate solutions. Do you happen to have any sample implementations? What size arrays are we talking about here?

I just did a quick test in SpiderMonkey, and adding two arrays of integers with 441000 elements each (so 10s of 44.1kHz audio; the time includes allocating the sum array and all that) element-by-element like so:

var a3 = new Array(size);
for (var j = 0; j < size; ++j) {
  a3[j] = a1[j] + a2[j];
}

takes about 25ms on my computer. Multiplication takes about 35ms. Duplicating an array takes about 20ms. This is without any of the in-progress optimizations for type-specializing arrays, etc.

What sort of performance are we looking for here?

-Boris
Re: [whatwg] Codecs for audio and video
Hi Boris,

On Sat, Aug 08, 2009 at 02:15:19PM -0400, Boris Zbarsky wrote:
> [...]
> I'm sort of wondering what the performance of these would actually be if implemented directly in ECMAScript, before we decide that's too slow and start looking for alternate solutions. Do you happen to have any sample implementations? What size arrays are we talking about here?
> [...]
> What sort of performance are we looking for here?

It's a bit of an open-ended, how-long-is-a-piece-of-string sort of question, in that you will generally make synthesizers which require less CPU than what your computer is able to provide, for the obvious reason that they won't work otherwise. So the real answer is that you want the DSP system to go absolutely as fast as possible, I guess, so that you can squeeze as much synthesis out as possible.

I'll throw some numbers out there anyway. A game with procedural audio, or a synth, or a piece of algorithmic music might contain between tens and tens of thousands of such vector operations per frame, and buffers might be between 10ms and 100ms, i.e. vector sizes of 441 samples to 4410 samples. So you could do some simple synthesis with pure Javascript if it was able to loop through say 100 arrays of 4410 samples each, doing vector operations on those arrays, in under 100ms.

Hope this helps a little bit.

Chris.
---
http://mccormick.cx
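[Editor's note: the target Chris states - 100 vector operations over 4410-sample buffers in under 100ms - can be measured with a rough, self-contained sketch like this. The structure and names are mine, not from the thread, and results will vary by machine and engine.]

```javascript
// Rough benchmark of the stated target: loop through 100 arrays of
// 4410 samples (100ms of 44.1kHz audio each), doing one element-wise
// vector operation per array, and time the whole pass.
function makeBuffer(n) {
  var buf = new Array(n);
  for (var i = 0; i < n; i++) buf[i] = Math.sin(i / n * 2 * Math.PI);
  return buf;
}

var buffers = [];
for (var i = 0; i < 100; i++) buffers.push(makeBuffer(4410));

var start = Date.now();
var acc = makeBuffer(4410);
for (var i = 0; i < 100; i++) {
  var b = buffers[i];
  for (var j = 0; j < 4410; j++) acc[j] += b[j]; // element-wise vector add
}
var elapsed = Date.now() - start;
```

If `elapsed` comes in comfortably below 100ms, simple synthesis at this scale would be feasible in pure script by Chris's own estimate.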
Re: [whatwg] Codecs for audio and video
On Sun, Aug 9, 2009 at 3:15 AM, Chris McCormick <ch...@mccormick.cx> wrote:
> On Wed, Jul 08, 2009 at 09:24:42AM -0700, Charles Pritchard wrote:
> > There are two use cases that I think are important: a codec implementation (let's use Vorbis), and an accessibility implementation, working with a canvas element.
> Here are a few more use-cases that many people would consider just as important:
> * Browser based music software and synthesis toys.
> * New types of 'algorithmic' music like that pioneered by Brian Eno.
> * Browser based games which want to use procedural audio instead of pre-rendered sound effects.
> [... rest of the DSP vector-processor proposal snipped ...]

Why don't you just implement an example in javascript to show off what you're talking about and make a use case for having it implemented inside the browsers?

Cheers,
Silvia.
Re: [whatwg] Codecs for audio and video
Chris McCormick wrote: It's a bit of an open ended how-long-is-a-piece-of-string sort of a question in that you will generally make synthesizers which require less CPU than what your computer is able to provide, for the obvious reason that they won't work otherwise. So the real answer is that you want the DSP system to go absolutely as fast as possible I guess, so that you can squeeze as much synthesis out as possible. OK, sure. But you indicated that ECMAScript would be unacceptable for the use cases, period So you could do some simple synthesis with pure Javascript if it was able to loop through say 100 arrays of 4410 samples each, doing vector operations on those arrays, in under 100ms. See attached HTML file. It creates 103 arrays of length 4410, then goes through them starting with array #4 and sets array n to the product of arrays n-1 and n-2 plus array n-3. If I initialize the arrays with random 16-bit integers, the numbers I see are like so (on a year-old laptop): Gecko (build that is pretty close to Firefox 3.6a1): Setup: 96ms Vector ops: 43ms Webkit nightly (in Safari, so using SFX): Setup: 170ms Vector ops: 41ms I don't have a Chrome build around to test how well V8 would do on this testcase, and Opera 10 beta is giving me numbers about 20x slower than the above. Firefox 3 gives numbers about 5x slower than the above. I can't test IE over here easily (it's a Mac laptop), but I would expect it to be somewhat slower than the above numbers too. So I can certainly see how you would feel that ECMAScript is not fast enough for this sort of thing: until recently it wasn't, for the most part. But at this point it seems to be ok, if not great (I have about 2x headroom for your target numbers for simple sound synthesis over here; someone with an older computer would have less, and more complicated operations might take more time). And I can tell that at least in Gecko's case there's ongoing work to make array access faster... 
So it seems fairly likely to me that all UAs will end up with ECMAScript implementations fast enough to do what you want here sooner than all UAs would implement a brand-new set of functionality. Certainly that's the case for Safari and Gecko-based browsers. :) -Boris
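[The micro-benchmark Boris describes can be sketched in plain JavaScript like so. This is a reconstruction from his description, not the attached HTML file; timings will vary by machine and engine.]

```javascript
// Reconstruction of the described test: 103 arrays of 4410 random
// 16-bit integers; starting with array #4 (index 3), set array n to
// (array n-1 * array n-2) + array n-3, elementwise.
const N_ARRAYS = 103;
const LEN = 4410;

let t0 = Date.now();
const arrays = [];
for (let i = 0; i < N_ARRAYS; i++) {
  const a = new Array(LEN);
  for (let j = 0; j < LEN; j++) a[j] = (Math.random() * 65536 - 32768) | 0;
  arrays.push(a);
}
console.log('Setup: ' + (Date.now() - t0) + 'ms');

t0 = Date.now();
for (let n = 3; n < N_ARRAYS; n++) {
  const out = arrays[n];
  const a = arrays[n - 1], b = arrays[n - 2], c = arrays[n - 3];
  for (let j = 0; j < LEN; j++) out[j] = a[j] * b[j] + c[j];
}
console.log('Vector ops: ' + (Date.now() - t0) + 'ms');
```

[At ~100ms per 100 arrays of 4410 samples, the "vector ops" time is the figure to compare against Chris's real-time synthesis budget.]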
Re: [whatwg] Codecs for audio and video
On Wed, 08 Jul 2009 18:24:42 +0200, Charles Pritchard ch...@jumis.com wrote: On 7/8/09 2:20 AM, Philip Jagenstedt wrote: On Tue, 07 Jul 2009 22:45:41 +0200, Charles Pritchard ch...@jumis.com wrote: At some point, a Blob / Stream API could make things like this easier. If the idea is to write a Vorbis decoder in JavaScript that would be quite cool in a way, but for vendors already implementing Vorbis it wouldn't really add anything. A pure JS-implementation of any modern audio codec would probably be a ridiculous amount of code and slow, so I doubt it would be that useful in practice. Well I'd like to disagree, and reiterate my prior arguments. Vorbis decoders have been written in ActionScript and in Java. They are not ridiculous, in size, nor in CPU usage. They can play audio streams, smoothly, and the file size is completely tolerable. And the idea is codec neutrality, a Vorbis decoder is just one example. OK, I won't make any assumptions of the size/speed of such an implementation until I see one. Well, again, there exist implementations running on Sun/Oracle's Java VM and the Flash VM. These two use byte-code packaging, so the file size is under 100k, deflated ECMAScript source would also weigh under 100k. Transcoding lossy data is a sub-optimal solution. Allowing for arbitrary audio codecs is a worthwhile endeavor. ECMAScript can detect if playback is too slow. I want to point this out again. While there is some struggle to define a standard codec (so we might be spared the burden of so very many encoders), there is a very large supply of already-encoded media in the wild. I've recently worked on a project that required a difficult to obtain/install codec. Open source descriptions were available, and if it was an option, I certainly would have paid to have the codec written in ECMAScript, and delivered it with the media files. 
In that particular case, paying someone to write a decoder for one particular, minority codec, would have been cheaper, and more correct, than paying for the transcoding of 60 gigs of low bit-rate audio. Most media formats are lossy, making their current format, whatever the encumbrance, the best solution. Yes, re-encoding always lowers the quality, so this use case is something I would agree with. Additionally, in some cases, the programmer could work around broken codec implementations. It's forward-looking, it allows real backward compatibility and interoperability across browsers. canvas allows for arbitrary, programmable video; audio should allow for programmable audio. Then, we can be codec neutral in our media elements. While stressing that I don't think this should go into the spec until there's a proof-of-concept implementation that does useful stuff, is the idea to set audio.src=new MySynthesizer() and play()? (MySynthesizer would need to implement some standard interface.) You also have the question of push vs pull, i.e. does the audio source request data from the synthesizer when needed, or does the synthesizer need to run a loop pushing audio data? Well, we really need to define what useful stuff is, you know, to set that bar. It really doesn't matter if you and I agree on what is useful. If one browser implements an audio synthesis interface and it's good enough, others will follow and the spec work will begin. There are two use cases that I think are important: a codec implementation (let's use Vorbis), and an accessibility implementation, working with a canvas element. I don't know what would qualify for accessibility. A topographical map, which makes a lower- or higher-pitched hum based on elevation (surrounding the pointer), is an example. On that same line of thinking, a hum of varying intensity signaling proximity to a clickable element (we're still talking about canvas) might be useful. 
If there is no sound in the right-channel, there are no elements to be clicked on, to the right of the pointer. If it is a low-sound, then the element is rather far away. Site developers still need to put in the work. With a buffered audio API, they'll at least have the option to do so. Can we come to an agreement as to what would constitute a reasonable proof of concept? This is meant to allow canvas to be more accessible to the visually impaired. Obviously, audio src tags could be used in many cases with canvas, so our test-case should be one where audio src would be insufficient. Both of these use cases can be accomplished with a raw audio buffer. They do not need native channel mixing, nor toDataURL support. In the long term, I think those two options would be nice, but in the short term, would just cause delays in adoption. As Robert has said, there are much more important things to work on ( https://bugzilla.mozilla.org/show_bug.cgi?id=490705 ). I think at this point, the model should play buffered bytes as they are made available
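[The proximity-hum idea above can be prototyped with nothing more than a sine oscillator. The following is an illustrative sketch: the function name, the distance-to-pitch mapping, and the pan law are assumptions of ours, not anything proposed in the thread.]

```javascript
// Illustrative mapping: closer on-screen elements hum at a higher pitch,
// and horizontal offset pans the tone left/right. Renders one block of
// stereo samples that a raw audio buffer API could then play.
function proximityHum(dx, dy, sampleRate = 44100, blockLen = 4410) {
  const dist = Math.hypot(dx, dy);
  const freq = Math.max(110, 880 - dist);            // assumed pitch mapping, in Hz
  const pan = Math.max(-1, Math.min(1, dx / 300));   // -1 = hard left, +1 = hard right
  const left = new Float32Array(blockLen);
  const right = new Float32Array(blockLen);
  for (let i = 0; i < blockLen; i++) {
    const s = Math.sin(2 * Math.PI * freq * i / sampleRate);
    // A silent right channel then means: nothing clickable to the right.
    left[i] = s * (1 - pan) / 2;
    right[i] = s * (1 + pan) / 2;
  }
  return { left, right, freq };
}
```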
Re: [whatwg] Codecs for audio and video
On Tue, 07 Jul 2009 22:45:41 +0200, Charles Pritchard ch...@jumis.com wrote: On 7/7/09 1:10 PM, Philip Jagenstedt wrote: On Tue, 07 Jul 2009 17:52:29 +0200, Charles Pritchard ch...@jumis.com wrote: Philip Jagenstedt wrote: For all of the simpler use cases you can already generate sounds yourself with a data uri. For example, this is 2 samples of silence: data:audio/wav;base64,UklGRigAAABXQVZFZm10IBABAAEARKwAAIhYAQACABAAZGF0YQQA. Yes, you can use this method, and with the current audio tag and autobuffer it may work to some degree. It does not produce smooth transitions. At some point, a Blob / Stream API could make things like this easier. If the idea is to write a Vorbis decoder in JavaScript that would be quite cool in a way, but for vendors already implementing Vorbis it wouldn't really add anything. A pure JS implementation of any modern audio codec would probably be a ridiculous amount of code, and slow, so I doubt it would be that useful in practice. Well, I'd like to disagree, and reiterate my prior arguments. Vorbis decoders have been written in ActionScript and in Java. They are not ridiculous, in size nor in CPU usage. They can play audio streams smoothly, and the file size is completely tolerable. And the idea is codec neutrality; a Vorbis decoder is just one example. OK, I won't make any assumptions about the size/speed of such an implementation until I see one. For some use cases you could use 2 audio elements in tandem, mixing new sound to a new data URI when the first is nearing the end (although sync can't be guaranteed with the current API). But yes, there are things which can only be done by a streaming API integrating into the underlying media framework. Yes, the current API is inadequate. data: encoding is insufficient. Here's the list of proposed features right out of a comment block in the spec: This list of features can be written without a spec, using canvas, using a raw data buffer, and using ECMAScript. 
A few of these features may need hardware level support, or a fast computer. The audio tag would be invisible, and the canvas tag would provide the user interface. Your use cases probably fall under audio filters and synthesis. I expect that attention will turn to gradually more complex use cases when the basic API we have now is implemented and stable cross-browser and cross-platform. Yes, some of these use cases qualify as filters, some qualify as synthesis. I'm proposing that simple filters and synthesis can be accomplished with modern ECMAScript virtual machines and a raw data buffer. My use cases are qualified to current capabilities. Apart from those use cases, I'm proposing that a raw data buffer will allow for codec neutrality. There are dozens of minor audio codecs, some simpler than others, some low bitrate, that could be programmed in ECMAScript and would run just fine with modern ECMAScript VMs. Transcoding lossy data is a sub-optimal solution. Allowing for arbitrary audio codecs is a worthwhile endeavor. ECMAScript can detect if playback is too slow. Additionally, in some cases, the programmer could work-around broken codec implementations. It's forward-looking, it allows real backward compatibility and interoperability across browsers. canvas allows for arbitrary, programmable video, audio should allow for programmable audio. Then, we can be codec neutral in our media elements. While stressing that I don't think this should go into the spec until there's a proof-of-concept implementation that does useful stuff, is the idea to set audio.src=new MySynthesizer() and play()? (MySynthesizer would need to implement some standard interface.) You also have the question of push vs pull, i.e. does the audio source request data from the synthesizer when needed or does the synthesizer need to run a loop pushing audio data? -- Philip Jägenstedt Core Developer Opera Software
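[For a sense of scale on the "minor codecs in ECMAScript" claim: a G.711 mu-law decoder, one of the simplest codecs in wide use, fits in a handful of lines. This sketch follows the published G.711 decode formula and is offered only to illustrate the point.]

```javascript
// G.711 mu-law decode: one encoded byte -> one signed 16-bit linear sample.
function muLawDecode(byte) {
  const BIAS = 0x84;                  // 132, the mu-law bias
  const u = ~byte & 0xff;             // mu-law stores the bitwise complement
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const sample = (((mantissa << 3) + BIAS) << exponent) - BIAS;
  return sign ? -sample : sample;
}

// Decoding a whole received frame is then a single map over the bytes:
const frame = [0x00, 0x80, 0xff].map(muLawDecode);
```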
Re: [whatwg] Codecs for audio and video
On 7/8/09 2:20 AM, Philip Jagenstedt wrote: On Tue, 07 Jul 2009 22:45:41 +0200, Charles Pritchard ch...@jumis.com wrote: At some point, a Blob / Stream API could make things like this easier. If the idea is to write a Vorbis decoder in JavaScript that would be quite cool in a way, but for vendors already implementing Vorbis it wouldn't really add anything. A pure JS-implementation of any modern audio codec would probably be a ridiculous amount of code and slow, so I doubt it would be that useful in practice. Well I'd like to disagree, and reiterate my prior arguments. Vorbis decoders have been written in ActionScript and in Java. They are not ridiculous, in size, nor in CPU usage. They can play audio streams, smoothly, and the file size is completely tolerable. And the idea is codec neutrality, a Vorbis decoder is just one example. OK, I won't make any assumptions of the size/speed of such an implementation until I see one. Well, again, there exist implementations running on Sun/Oracle's Java VM and the Flash VM. These two use byte-code packaging, so the file size is under 100k, deflated ECMAScript source would also weigh under 100k. Transcoding lossy data is a sub-optimal solution. Allowing for arbitrary audio codecs is a worthwhile endeavor. ECMAScript can detect if playback is too slow. I want to point this out again. While there is some struggle to define a standard codec (so we might be spared the burden of so very many encoders), there is a very large supply of already-encoded media in the wild. I've recently worked on a project that required a difficult to obtain/install codec. Open source descriptions were available, and if it was an option, I certainly would have paid to have the codec written in ECMAScript, and delivered it with the media files. In that particular case, paying someone to write a decoder for one particular, minority codec, would have been cheaper, and more correct, than paying for the transcoding of 60 gigs of low bit-rate audio. 
Most media formats are lossy, making their current format, whatever the encumbrance, the best solution. Additionally, in some cases, the programmer could work-around broken codec implementations. It's forward-looking, it allows real backward compatibility and interoperability across browsers. canvas allows for arbitrary, programmable video, audio should allow for programmable audio. Then, we can be codec neutral in our media elements. While stressing that I don't think this should go into the spec until there's a proof-of-concept implementation that does useful stuff, is the idea to set audio.src=new MySynthesizer() and play()? (MySynthesizer would need to implement some standard interface.) You also have the question of push vs pull, i.e. does the audio source request data from the synthesizer when needed or does the synthesizer need to run a loop pushing audio data? Well we really need to define what useful stuff is, you know, to set that bar. There are two use cases that I think are important: a codec implementation (let's use Vorbis), and an accessibility implementation, working with a canvas element. I don't know what would qualify for accessibility. A topographical map, which makes a lower or higher pitched hum, based on elevation (surrounding the pointer), is an example. On that same line of thinking, a hum of varying intensity signaling proximity to a clickable element, (we're still talking about canvas) might be useful. If there is no sound in the right-channel, there are no elements to be clicked on, to the right of the pointer. If it is a low-sound, then the element is rather far away. Site developers still need to put in the work. With a buffered audio API, they'll at least have the option to do so. Can we come to an agreement as to what would constitute a reasonable proof of concept? This is meant to allow canvas to be more accessible to the visually impaired. 
Obviously, audio src tags could be used in many cases with canvas, so our test-case should be one where audio src would be insufficient. Both of these use cases can be accomplished with a raw audio buffer. They do not need native channel mixing, nor toDataURL support. In the long term, I think those two options would be nice, but in the short term, would just cause delays in adoption. As Robert has said, there are much more important things to work on ( https://bugzilla.mozilla.org/show_bug.cgi?id=490705 ). I think at this point, the model should play buffered bytes as they are made available (if the buffer has anything, start playing it). I believe the buffered attribute can be used by the ECMAScript loop to detect how much data is buffered, and whether it should continue decoding or take other actions. The buffered audio API should be handled by the media API in a way similar to streaming Web radio. There should be an origin-clean flag, for future use. One might theoretically add audio
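[The push model described above ("play buffered bytes as they are made available") could look something like the following in script. AudioFifo and its methods are hypothetical names for illustration, not a proposed interface; the decoder loop would push blocks and poll buffered() to decide whether to keep decoding.]

```javascript
// Hypothetical FIFO between a script decoder (push) and playback (pull).
class AudioFifo {
  constructor() { this.blocks = []; this.queued = 0; }
  push(block) { this.blocks.push(block); this.queued += block.length; }
  // The "buffered" amount a decoder loop could poll before producing more.
  buffered() { return this.queued; }
  // Consume up to n samples for playback, crossing block boundaries.
  pull(n) {
    const out = [];
    while (out.length < n && this.blocks.length) {
      const b = this.blocks[0];
      const take = Math.min(n - out.length, b.length);
      for (let i = 0; i < take; i++) out.push(b[i]);
      if (take === b.length) this.blocks.shift();
      else this.blocks[0] = b.slice(take);
      this.queued -= take;
    }
    return out;
  }
}
```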
Re: [whatwg] Codecs for audio and video
On Tue, 07 Jul 2009 03:44:25 +0200, Charles Pritchard ch...@jumis.com wrote: Ian Hickson wrote: On Mon, 6 Jul 2009, Charles Pritchard wrote: Ian Hickson wrote: On Mon, 6 Jul 2009, Charles Pritchard wrote: This is on the list of things to consider in a future version. At this point I don't really want to add new features yet because otherwise we'll never get the browser vendors caught up to implementing the same spec. :-) Consider a programmable audio element as a priority. Could you elaborate on what your use cases are? Is it just the ability to manually decode audio tracks? Some users could manually decode a Vorbis audio stream. I'm interested in altering pitch and pre-mixing channels. I believe some of these things are explored in CSS already. There are accessibility cases, for the visually impaired, and I think that they will be better explored. If you could elaborate on these use cases that would be really useful. How do you envisage using these features on Web pages? Use a sound of varying pitch to hint to a user the location of their mouse (is it hovering over a button, is it x/y pixels away from the edge of the screen, how close is it to the center). Alter the pitch of a sound to make a very cheap midi instrument. Pre-mix a few generated sounds, because the client processor is slow. Alter the pitch of an actual audio recording, and pre-mix it, to give different-sounding voices to pre-recorded readings of a single text, as has been tried for male/female sound fonts. Support very simple audio codecs, and programmable synthesizers. The API must support a playback buffer:

putAudioBuffer( in AudioData ) [ Error if AudioData properties are not supported ]
createAudioData( in sample hz, bits per sample, length ) [ Error if properties are not supported ]
AudioData ( sampleHz, bitsPerSample, length, AudioDataArray )
AudioDataArray ( length, IndexGetter, IndexSetter )

8 bits per property. I think that's about it. 
(There has been some discussion of supporting an audio canvas before, but a lack of compelling use cases has really been the main blocker. Without a good understanding of the use cases, it's hard to design an API.) For all of the simpler use cases you can already generate sounds yourself with a data uri. For example, this is 2 samples of silence: data:audio/wav;base64,UklGRigAAABXQVZFZm10IBABAAEARKwAAIhYAQACABAAZGF0YQQA. It might be worthwhile implementing the API you want as a JavaScript library and see if you can actually do useful things with it. If the use cases are compelling and require native browser support to be performant enough, perhaps it could go into a future version of HTML. -- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Codecs for audio and video -- informative note?
On Mon, Jul 6, 2009 at 6:01 PM, David Gerard dger...@gmail.com wrote: 2009/7/6 Jim Jewett jimjjew...@gmail.com: As of 2009, there is no single efficient codec which works on all modern browsers. Content producers are encouraged to supply the video in both Theora and H.264 formats, as per the following example A spec that makes an encumbered format a SHOULD is unlikely to be workable for those content providers, e.g. Wikimedia, who don't have the money, and won't, on principle, put up stuff in a format rendered radioactive by known enforced patents. Your wording presumes a paid Web all the way through. According to http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Licensing, the W3C will not approve a Recommendation if it is aware that Essential Claims exist which are not available on Royalty-Free terms. So, until the time that H.264 is available royalty-free, I do not see how it can be included - in particular since there is a royalty-free alternative. Regards, Silvia.
Re: [whatwg] Codecs for audio and video
Philip Jagenstedt wrote: For all of the simpler use cases you can already generate sounds yourself with a data uri. For example, this is 2 samples of silence: data:audio/wav;base64,UklGRigAAABXQVZFZm10IBABAAEARKwAAIhYAQACABAAZGF0YQQA. Yes, you can use this method, and with the current audio tag and autobuffer it may work to some degree. We've used the data:audio/midi technique, and we've experimented with audio/wav; using the data: injection work-around does not currently work all that well. It does not produce smooth transitions. We can use raw encoding instead of base64 to save on cpu cycles, but it's still quite hackish. It might be worthwhile implementing the API you want as a JavaScript library and see if you can actually do useful things with it. If the use cases are compelling and require native browser support to be performant enough, perhaps it could go into a future version of HTML. Overall, we cannot make near-real-time effects, nor jitter-free compositions. We've used wav and midi in a JavaScript library, using the data: url technique. The data: injection technique is inefficient; it's not workable. Opera has been championing Xiph codecs on this list. There are ActionScript and Java Vorbis players developed using the most basic of APIs. Isn't that use-case compelling enough? -Charles
Re: [whatwg] Codecs for audio and video
On Tue, 07 Jul 2009 17:52:29 +0200, Charles Pritchard ch...@jumis.com wrote: Philip Jagenstedt wrote: For all of the simpler use cases you can already generate sounds yourself with a data uri. For example, this is 2 samples of silence: data:audio/wav;base64,UklGRigAAABXQVZFZm10IBABAAEARKwAAIhYAQACABAAZGF0YQQA. Yes, you can use this method, and with the current audio tag and autobuffer it may work to some degree. We've used the data:audio/midi technique, and we've experimented with audio/wav; using the data: injection work-around does not currently work all that well. It does not produce smooth transitions. We can use raw encoding instead of base64 to save on cpu cycles, but it's still quite hackish. It might be worthwhile implementing the API you want as a JavaScript library and see if you can actually do useful things with it. If the use cases are compelling and require native browser support to be performant enough, perhaps it could go into a future version of HTML. Overall, we cannot make near-real-time effects, nor jitter-free compositions. We've used wav and midi in a JavaScript library, using the data: url technique. The data: injection technique is inefficient; it's not workable. Opera has been championing Xiph codecs on this list. There are ActionScript and Java Vorbis players developed using the most basic of APIs. Isn't that use-case compelling enough? If the idea is to write a Vorbis decoder in JavaScript that would be quite cool in a way, but for vendors already implementing Vorbis it wouldn't really add anything. A pure JS implementation of any modern audio codec would probably be a ridiculous amount of code, and slow, so I doubt it would be that useful in practice. For some use cases you could use 2 audio elements in tandem, mixing new sound to a new data URI when the first is nearing the end (although sync can't be guaranteed with the current API). 
But yes, there are things which can only be done by a streaming API integrating into the underlying media framework. Here's the list of proposed features right out of a comment block in the spec:

* frame forward / backwards / step(n) while paused
* hasAudio, hasVideo, hasCaptions, etc
* per-frame control: get current frame; set current frame
* queue of content
  - pause current stream and insert content at front of queue to play immediately
  - pre-download another stream
  - add stream(s) to play at end of current stream
  - pause playback upon reaching a certain time
  - playlists, with the ability to get metadata out of them (e.g. xspf)
* control over closed captions:
  - enable, disable, select language
  - event that sends caption text to script
* in-band metadata and cue points to allow:
  - Chapter markers that synchronize to playback (without having to poll the playhead position)
  - Annotations on video content (i.e., pop-up video)
  - General custom metadata store (ratings, etc.)
* notification of chapter labels changing on the fly:
  - onchapterlabelupdate, which has a time and a label
* cue points that trigger at fixed intervals, so that e.g. animation can be synced with the video
* general meta data, implemented as getters (don't expose the whole thing)
  - getMetadata(key: string, language: string) = HTMLImageElement or string
  - onmetadatachanged (no context info)
* external captions support (request from John Foliot)
* video: applying CSS filters
* an event to notify people of when the video size changes (e.g. for chained Ogg streams of multiple independent videos)
* balance and 3D position audio
* audio filters
* audio synthesis
* feedback to the script on how well the video is playing
  - frames per second?
  - skipped frames per second?
  - an event that reports playback difficulties?
  - an arbitrary quality metric?
* bufferingRate/bufferingThrottled (see v3BUF)
* events for when the user agent's controls get shown or hidden, so that the author's controls can get out of the way of the UA's

Your use cases probably fall under audio filters and synthesis. I expect that attention will turn to gradually more complex use cases when the basic API we have now is implemented and stable cross-browser and cross-platform. -- Philip Jägenstedt Core Developer Opera Software
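[Of the features listed above, the fixed-interval cue points reduce to a small piece of pure logic. As an illustration (the function name is ours): given the previous and current playhead positions, the cues that fired in between are:]

```javascript
// Which fixed-interval cue points fired between two playhead positions?
// A cue exactly at prevTime is assumed to have fired already.
function firedCues(prevTime, curTime, interval) {
  const fired = [];
  for (let n = Math.floor(prevTime / interval) + 1; n * interval <= curTime; n++) {
    fired.push(n * interval);
  }
  return fired;
}
```

[A UA could run this against the playhead on each tick and dispatch one event per returned cue, saving pages from polling currentTime themselves.]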
Re: [whatwg] Codecs for audio and video
On 7/7/09 1:10 PM, Philip Jagenstedt wrote: On Tue, 07 Jul 2009 17:52:29 +0200, Charles Pritchard ch...@jumis.com wrote: Philip Jagenstedt wrote: For all of the simpler use cases you can already generate sounds yourself with a data uri. For example, this is 2 samples of silence: data:audio/wav;base64,UklGRigAAABXQVZFZm10IBABAAEARKwAAIhYAQACABAAZGF0YQQA. Yes, you can use this method, and with the current audio tag and autobuffer it may work to some degree. It does not produce smooth transitions. At some point, a Blob / Stream API could make things like this easier. If the idea is to write a Vorbis decoder in JavaScript that would be quite cool in a way, but for vendors already implementing Vorbis it wouldn't really add anything. A pure JS implementation of any modern audio codec would probably be a ridiculous amount of code, and slow, so I doubt it would be that useful in practice. Well, I'd like to disagree, and reiterate my prior arguments. Vorbis decoders have been written in ActionScript and in Java. They are not ridiculous, in size nor in CPU usage. They can play audio streams smoothly, and the file size is completely tolerable. And the idea is codec neutrality; a Vorbis decoder is just one example. For some use cases you could use 2 audio elements in tandem, mixing new sound to a new data URI when the first is nearing the end (although sync can't be guaranteed with the current API). But yes, there are things which can only be done by a streaming API integrating into the underlying media framework. Yes, the current API is inadequate. data: encoding is insufficient. Here's the list of proposed features right out of a comment block in the spec: This list of features can be written without a spec, using canvas, using a raw data buffer, and using ECMAScript. A few of these features may need hardware level support, or a fast computer. The audio tag would be invisible, and the canvas tag would provide the user interface. 
Your use cases probably fall under audio filters and synthesis. I expect that attention will turn to gradually more complex use cases when the basic API we have now is implemented and stable cross-browser and cross-platform. Yes, some of these use cases qualify as filters, some qualify as synthesis. I'm proposing that simple filters and synthesis can be accomplished with modern ECMAScript virtual machines and a raw data buffer. My use cases are qualified to current capabilities. Apart from those use cases, I'm proposing that a raw data buffer will allow for codec neutrality. There are dozens of minor audio codecs, some simpler than others, some low bitrate, that could be programmed in ECMAScript and would run just fine with modern ECMAScript VMs. Transcoding lossy data is a sub-optimal solution. Allowing for arbitrary audio codecs is a worthwhile endeavor. ECMAScript can detect if playback is too slow. Additionally, in some cases, the programmer could work-around broken codec implementations. It's forward-looking, it allows real backward compatibility and interoperability across browsers. canvas allows for arbitrary, programmable video, audio should allow for programmable audio. Then, we can be codec neutral in our media elements.
Re: [whatwg] Codecs for audio and video -- informative note?
2009/7/6 Jim Jewett jimjjew...@gmail.com: As of 2009, there is no single efficient codec which works on all modern browsers. Content producers are encouraged to supply the video in both Theora and H.264 formats, as per the following example A spec that makes an encumbered format a SHOULD is unlikely to be workable for those content providers, e.g. Wikimedia, who don't have the money, and won't, on principle, put up stuff in a format rendered radioactive by known enforced patents. Your wording presumes a paid Web all the way through. - d.
Re: [whatwg] Codecs for audio and video -- informative note?
The spec (at least from what I know) wants to create a unified experience; we don't want users to have a different experience from browser to browser, nor do developers want to implement hacks for every browser. If no common ground can be reached, then maybe no common ground is better than forced common ground; who knows, it may spark new ideas that we haven't thought of yet. On Mon, Jul 6, 2009 at 1:18 PM, Aryeh Gregor simetrical+...@gmail.com wrote: On Mon, Jul 6, 2009 at 4:01 AM, David Gerard dger...@gmail.com wrote: A spec that makes an encumbered format a SHOULD is unlikely to be workable for those content providers, e.g. Wikimedia, who don't have the money, and won't, on principle, put up stuff in a format rendered radioactive by known enforced patents. That's why should is not the same as must. Those who have a good reason not to do it can decline to do it. -- - Adam Shannon ( http://ashannon.us )
Re: [whatwg] Codecs for audio and video
Ian Hickson wrote: On Thu, 2 Jul 2009, Charles Pritchard wrote: I'd like to see canvas support added to the video tag (it's as natural as img). video elements can be painted onto canvas elements already; did you have something more in mind? This is sufficient. video can be used with drawImage, and the original video element can be hidden. and enable the audio tag to accept raw data (lpcm), just as the canvas tag accepts raw data (bitmap). This is on the list of things to consider in a future version. At this point I don't really want to add new features yet because otherwise we'll never get the browser vendors caught up to implementing the same spec. :-) Consider a programmable audio element as a priority. Apple has said they will not carry the Vorbis codec in their distributions. I don't see any objection to them carrying a programmable audio element. With a raw data api, Vorbis enthusiasts may share their codec. Otherwise, HTML 5 is codec-naive. Programmable audio is already supported in Flash and Java, with their wide audience. WebIDL is described in ECMAScript and Java. I believe this is an achievable compromise. It's already available in ActionScript and Java, with their associated VMs installed on well over 90% of web browsers, with open source development tools and compilers. And add an event handler for when subtitles are enabled / disabled. This is on the list for the next version. (Generally speaking, control over subtitles is one of the very next features that will be added.) Meanwhile, using a canvas tag, drawImage(HTMLVideoElement) and fillText are sufficient. I'd think that FLAC would make more sense than PCM-in-Wave, as a PNG analog. I encourage you to suggest this straight to the relevant vendors. As it stands, HTML5 is codec-neutral. Perhaps it's not necessary. There's no need to standardize on an audio codec. It can be handled in a device-independent manner, provided the player for that codec can be expressed in ECMAScript+Java (see: WebIDL). 
I'd like to see a font analog in audio as well. Canvas supports the font attribute; audio could certainly support sound fonts. Use a generated pitch if your platform can't or doesn't store sound fonts. This seems like an issue for the CSS or SVG working groups, who are working on font-related technologies. Should it gather any momentum, this would be an attribute which HTMLMediaElement tags should be aware of. For example: <audio style="-ext-audio-font: tone|url(arbitrary.url)" />. Should the audio source be a container format, such as midi, it would interpret the css property. User agents should provide controls to enable the manual selection of fallback content. There's no reason the UA couldn't provide an override mechanism to select an alternative source if the UA vendor so desires. I was considering the current deadlock over the h.264 and theora codecs. An explicit statement encourages UA developers to adopt good practices. Many non-technical users will want to know why there is a black screen (or still image), even though they can hear the audio. This section works well: User agents that cannot render the video may instead make the element represent a link to an external video playback utility or to the video data itself.
Re: [whatwg] Codecs for audio and video
On Mon, 6 Jul 2009, Charles Pritchard wrote: This is on the list of things to consider in a future version. At this point I don't really want to add new features yet because otherwise we'll never get the browser vendors caught up to implementing the same spec. :-) Consider a programmable audio element as a priority. Could you elaborate on what your use cases are? Is it just the ability to manually decode audio tracks? I'd like to see a font analog in audio as well. Canvas supports the font attribute, audio could certainly support sound fonts. Use a generated pitch if your platform can't or doesn't store sound fonts. This seems like an issue for the CSS or SVG working groups, who are working on font-related technologies. Should it gather any momentum, this would be an attribute which HTMLMediaElement tags should be aware of. For example: audio style=-ext-audio-font: tone|url(arbitrary.url) /. Should the audio source be a container format, such as midi, it would interpret the css property. If the CSS working group go down this road, I shall make sure HTML5 is compatible with what the CSS working group define. User agents should provide controls to enable the manual selection of fallback content. There's no reason the UA couldn't provide an override mechanism to select an alternative source if the UA vendor so desires. I was considering the current dead-lock about the situation of h.264 and theora codecs. An explicit statement encourages UA-developers to adopt good practices. I encourage you to contact the user agents directly. I generally try to avoid specifying UI in the specification. Cheers, -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Codecs for audio and video
Ian Hickson wrote: On Mon, 6 Jul 2009, Charles Pritchard wrote: This is on the list of things to consider in a future version. At this point I don't really want to add new features yet because otherwise we'll never get the browser vendors caught up to implementing the same spec. :-) Consider a programmable audio element as a priority. Could you elaborate on what your use cases are? Is it just the ability to manually decode audio tracks? Some users could manually decode a Vorbis audio stream. I'm interested in altering pitch and pre-mixing channels. I believe some of these things are explored in CSS already. There are accessibility cases, for the visually impaired, and I think that they will be better explored. -Charles
Re: [whatwg] Codecs for audio and video
On Mon, 6 Jul 2009, Charles Pritchard wrote: Ian Hickson wrote: On Mon, 6 Jul 2009, Charles Pritchard wrote: This is on the list of things to consider in a future version. At this point I don't really want to add new features yet because otherwise we'll never get the browser vendors caught up to implementing the same spec. :-) Consider a programmable audio element as a priority. Could you elaborate on what your use cases are? Is it just the ability to manually decode audio tracks? Some users could manually decode a Vorbis audio stream. I'm interested in altering pitch and pre-mixing channels. I believe some of these things are explored in CSS already. There are accessibility cases, for the visually impaired, and I think that they will be better explored. If you could elaborate on these use cases that would be really useful. How do you envisage using these features on Web pages? (There has been some discussion of supporting an audio canvas before, but a lack of compelling use cases has really been the main blocker. Without a good understanding of the use cases, it's hard to design an API.) -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Codecs for audio and video
Ian Hickson wrote: On Mon, 6 Jul 2009, Charles Pritchard wrote: Ian Hickson wrote: On Mon, 6 Jul 2009, Charles Pritchard wrote: This is on the list of things to consider in a future version. At this point I don't really want to add new features yet because otherwise we'll never get the browser vendors caught up to implementing the same spec. :-) Consider a programmable audio element as a priority. Could you elaborate on what your use cases are? Is it just the ability to manually decode audio tracks? Some users could manually decode a Vorbis audio stream. I'm interested in altering pitch and pre-mixing channels. I believe some of these things are explored in CSS already. There are accessibility cases, for the visually impaired, and I think that they will be better explored. If you could elaborate on these use cases that would be really useful. How do you envisage using these features on Web pages? Use a sound of varying pitch to hint to a user the location of their mouse (is it hovering over a button, is it x/y pixels away from the edge of the screen, how close is it to the center). Alter the pitch of a sound to make a very cheap MIDI instrument. Pre-mix a few generated sounds, because the client processor is slow. Alter the pitch of an actual audio recording, and pre-mix it, to give different sounding voices to pre-recorded readings of a single text, as has been tried for male/female sound fonts. Support very simple audio codecs, and programmable synthesizers. The API must support a playback buffer:
putAudioBuffer(in AudioData) [ Error if AudioData properties are not supported ]
createAudioData(in sampleHz, in bitsPerSample, in length) [ Error if properties are not supported. ]
AudioData(sampleHz, bitsPerSample, length, AudioDataArray)
AudioDataArray(length, IndexGetter, IndexSetter)
8 bits per property. I think that's about it.
(There has been some discussion of supporting an audio canvas before, but a lack of compelling use cases has really been the main blocker. Without a good understanding of the use cases, it's hard to design an API.)
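To make the proposed buffer interface concrete, here is a minimal sketch in plain JavaScript. Everything here is hypothetical: createAudioData, putAudioBuffer, and the AudioData shape are the names proposed in the message above, not any shipping browser API; the tone generator just shows what a caller would do with such a buffer.

```javascript
// Hypothetical mock of the AudioData API sketched above. No browser
// implements these functions; this only demonstrates the shape of the
// interface together with a trivial sine-tone generator.

function createAudioData(sampleHz, bitsPerSample, length) {
  if (bitsPerSample !== 8 && bitsPerSample !== 16) {
    throw new Error("properties are not supported");
  }
  return {
    sampleHz: sampleHz,
    bitsPerSample: bitsPerSample,
    length: length,
    samples: new Float32Array(length), // normalized to [-1, 1]
  };
}

// Fill an AudioData buffer with a sine tone at freqHz.
function generateTone(audioData, freqHz) {
  for (let i = 0; i < audioData.length; i++) {
    audioData.samples[i] =
      Math.sin((2 * Math.PI * freqHz * i) / audioData.sampleHz);
  }
  return audioData;
}

// Stand-in for the proposed playback entry point: a real UA would hand
// the buffer to the audio hardware; here we just validate its shape and
// report how many seconds of audio were queued.
function putAudioBuffer(audioData) {
  if (!(audioData.samples instanceof Float32Array)) {
    throw new Error("audiodata properties are not supported");
  }
  return audioData.length / audioData.sampleHz;
}

const data = generateTone(createAudioData(44100, 16, 44100), 440);
const seconds = putAudioBuffer(data); // one second of A440
```

The point of the sketch is how small the required surface is: a buffer constructor, a fill loop owned by the page, and a single queueing call.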
Re: [whatwg] Codecs for audio and video
On Jul 6, 2009, at 6:08 PM, Ian Hickson wrote: On Mon, 6 Jul 2009, Charles Pritchard wrote: This is on the list of things to consider in a future version. At this point I don't really want to add new features yet because otherwise we'll never get the browser vendors caught up to implementing the same spec. :-) Consider a programmable audio element as a priority. Could you elaborate on what your use cases are? Is it just the ability to manually decode audio tracks? I actually have thought about this. Having an ability to post-process, mix, or generate audio content manually is useful for certain classes of games and applications. --Oliver
Re: [whatwg] Codecs for audio and video
Could you elaborate on what your use cases are? Is it just the ability to manually decode audio tracks? I actually have thought about this. Having an ability to post-process, mix, or generate audio content manually is useful for certain classes of games and applications. There's been some discussion about this type of functionality in this Mozilla bug too: https://bugzilla.mozilla.org/show_bug.cgi?id=490705 Chris. -- http://www.bluishcoder.co.nz
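The mix/pre-mix use case mentioned here is easy to illustrate over raw sample buffers. A minimal sketch in plain JavaScript, assuming nothing beyond typed arrays (mixBuffers is a made-up helper, not a browser API): sum two normalized buffers and hard-clip the result.

```javascript
// Mix two equal-length sample buffers (normalized floats in [-1, 1])
// by summing and hard-clipping -- the simplest form of the premixing
// use case discussed above. No audio API is assumed.

function mixBuffers(a, b) {
  if (a.length !== b.length) throw new Error("length mismatch");
  const out = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = Math.max(-1, Math.min(1, a[i] + b[i]));
  }
  return out;
}

const left = Float32Array.from([0.5, -0.8, 0.9]);
const right = Float32Array.from([0.5, -0.8, 0.3]);
const mixed = mixBuffers(left, right); // [1, -1, 1] after clipping
```

A real mixer would attenuate rather than clip, but even this shows why sample-level access is the only primitive the platform needs to provide; everything else can live in script.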
Re: [whatwg] Codecs for audio and video
Audible mouse feedback is an OS thing, not an HTML thing. I would rather have programmatic access to the MIDI synthesizer than simulate it with a beep. How do you detect that the client mixer is too slow? Why can't you just get the premixed jingles from the server? Isn't the reading voice a CSS thing? Isn't sound transformation hard enough to deserve a complete API? I think allowing playing with binary audio data is not going to help most programmers, who do not have the slightest idea of how to deal with it. Imagine a Canvas interface with PutPixel only. IMHO, Chris
Re: [whatwg] Codecs for audio and video -- informative note?
On Sun, Jul 5, 2009 at 8:02 PM, Jim Jewett jimjjew...@gmail.com wrote: Ian Hickson wrote: | video does support fallback, so in practice you can just use Theora and | H.264 and cover all bases. Could you replace the codec section with at least an informative note to this effect? Something like, As of 2009, there is no single efficient codec which works on all modern browsers. Content producers are encouraged to supply the video in both Theora and H.264 formats, as per the following example (If there is an older royalty-free format that is universally supported, then please mention that as well, as it will still be sufficient for some types of videos, such as crude animations.) The browser vendors were not able to implement the same codec (because of patents and copyrights), so no codec could be chosen. ( http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-July/ ) -jJ -- - Adam Shannon ( http://ashannon.us )
Re: [whatwg] Codecs for audio and video -- informative note?
On Sun, 5 Jul 2009, Jim Jewett wrote: Ian Hickson wrote: | video does support fallback, so in practice you can just use Theora and | H.264 and cover all bases. Could you replace the codec section with at least an informative note to this effect? Something like, As of 2009, there is no single efficient codec which works on all modern browsers. Content producers are encouraged to supply the video in both Theora and H.264 formats, as per the following example I might add some text along these lines once it's clearer what the implementations have actually deployed. Right now, video implementations are still quite young. (If there is an older royalty-free format that is universally supported, then please mention that as well, as it will still be sufficient for some types of videos, such as crude animations.) There isn't, as far as I know. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
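The example markup such an informative note would presumably point to is the ordinary multi-source fallback pattern the spec already allows. A sketch (the filenames are placeholders; the type/codecs strings are the conventional ones for Ogg Theora/Vorbis and baseline H.264/AAC):

```html
<!-- The UA plays the first source it can decode; the inner link is
     fallback content for user agents without video support at all. -->
<video controls>
  <source src="clip.ogv" type='video/ogg; codecs="theora, vorbis"'>
  <source src="clip.mp4" type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
  <a href="clip.ogv">Download the video</a>
</video>
```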
Re: [whatwg] Codecs for audio and video
On Fri, 3 Jul 2009, Silvia Pfeiffer wrote: The codecs issue is still being actively worked on by a number of parties, as it has been for over a year now; the removal of the sections from the spec doesn't affect this. I still hold high hopes that one day we can find a solution to this that all vendors are willing to implement. The bit that would be important to keep is the list of requirements on a baseline codec. The only real requirement is that all the relevant implementors be willing to ship support for the codec. I think pretty much everyone understands the requirements that the various parties have for that (though it varies from vendor to vendor). -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Codecs for audio and video
On Thu, 2 Jul 2009, Charles Pritchard wrote: I'd like to see canvas support added to the video tag (it's as natural as img). video elements can be painted onto canvas elements already; did you have something more in mind? and enable the audio tag to accept raw data (LPCM), just as the canvas tag accepts raw data (bitmap). This is on the list of things to consider in a future version. At this point I don't really want to add new features yet because otherwise we'll never get the browser vendors caught up to implementing the same spec. :-) add raw pixel support to video (via CanvasRenderingContext2D). Not really sure what this would look like. Can't you just do that with canvas already? And add an event handler when subtitles are enabled / disabled. This is on the list for the next version. (Generally speaking, control over subtitles is one of the very next features that will be added.) On Tue, 30 Jun 2009, Dr. Markus Walther wrote: Having removed everything else in these sections, I figured there wasn't that much value in requiring PCM-in-Wave support. However, I will continue to work with browser vendors directly and try to get a common codec at least for audio, even if that is just PCM-in-Wave. I'd think that FLAC would make more sense than PCM-in-Wave, as a PNG analog. I encourage you to suggest this straight to the relevant vendors. As it stands, HTML5 is codec-neutral. I'd like to see a font analog in audio as well. Canvas supports the font attribute, audio could certainly support sound fonts. Use a generated pitch if your platform can't or doesn't store sound fonts. This seems like an issue for the CSS or SVG working groups, who are working on font-related technologies. User agents should provide controls to enable the manual selection of fallback content. There's no reason the UA couldn't provide an override mechanism to select an alternative source if the UA vendor so desires.
User agents should provide an activation behavior, when fallback content is required, detailing why the primary content could not be used. There's no reason why UAs can't do that now. The spec doesn't prohibit such UI. Many non-technical users will want to know why there is a black screen (or still image), even though they can hear the audio. This kind of situation is of course why we really want to have a common codec. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Codecs for audio and video
--- On Mon, 6/29/09, Ian Hickson i...@hixie.ch wrote: 2. The remaining H.264 baseline patents owned by companies who are not willing to license them royalty-free expire, leading to H.264 support being available without license fees. = H.264 becomes the de facto codec for the Web. This could be a while. I took the US patents listed at http://www.mpegla.com/avc/avc-patentlist.cfm in http://www.mpegla.com/avc/avc-att1.pdf and checked their expiration dates. The last one expires in 2028. There may be other patents that are not listed, and some that are listed may not actually be essential, so the real date for H.264 being royalty-free could be different. Here is the complete list:
Patent: 7,292,636 Filed: 02 mar 2004 Granted: 06 nov 2007 Expiration: 02 mar 2024 Summary: Using order value for processing a video picture Notes: http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=7,292,636
Patent: RE35,093 Filed: 03 dec 1990 Granted: 09 mar 1993 Expiration: 03 dec 2010 Summary: Systems and methods for coding even fields of interlaced video sequences Notes: Reissue of 05193004 filed 09 dec 1994 granted 21 nov 1995 http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=RE35,093
Patent: 7,388,916 Filed: 07 jul 2001 Granted: 17 jun 2008 Expiration: 07 jul 2021 Summary: Water ring scanning apparatus and method, and apparatus and method for encoding/decoding video sequences using the same Notes: http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=7,388,916
Patent: 4,796,087 Filed: 01 jun 1987 Granted: 03 jan 1989 Expiration: 01 jun 2007 Summary: Process for coding by transformation for the transmission of picture signals Notes: http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=4,796,087
Patent: 6,894,628 Filed: 17 jul 2003 Granted: 17 may 2005 Expiration: 17 jul 2023 Summary: Apparatus and methods for entropy-encoding or entropy-decoding using an initialization of context variables Notes: http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=6,894,628
Patent: 6,900,748 Filed: 17 jul 2003 Granted: 31 may 2005 Expiration: 17 jul 2023 Summary: Method and apparatus for binarization and arithmetic coding of a data value Notes: http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=6,900,748
Patent: 7,088,271 Filed: 03 may 2005 Granted: 08 aug 2006 Expiration: 03 may 2025 Summary: Method and apparatus for binarization and arithmetic coding of a data value Notes: http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=7,088,271
Patent: 6,943,710 Filed: 04 dec 2003 Granted: 13 sep 2005 Expiration: 04 dec 2023 Summary: Method and arrangement for arithmetic encoding and decoding binary states and a corresponding computer program and a corresponding computer-readable storage medium Notes: http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=6,943,710
Patent: 7,286,710 Filed: 01 oct 2003 Granted: 23 oct 2007 Expiration: 01 oct 2023 Summary: Coding of a syntax element contained in a pre-coded video signal Notes: http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=7,286,710
Patent: 7,379,608 Filed: 04 dec 2003 Granted: 27 may 2008 Expiration: 04 dec 2023 Summary: Arithmetic coding for transforming video and picture data units Notes: http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=7,379,608
Patent: 7,496,143 Filed: 27 dec 2004 Granted: 24 feb 2009 Expiration: 27 dec 2024 Summary: Method and arrangement for coding transform coefficients in picture and/or video coders and decoders and a corresponding computer program and a corresponding computer-readable storage medium Notes: http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=7,496,143
Patent: 5,235,618 Filed: 06 nov 1990 Granted: 10 aug 1993 Expiration: 06 nov 2010 Summary: Video signal coding apparatus, coding method used in the video signal coding apparatus and video signal coding transmission system having the video signal coding apparatus Notes: http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=5,235,618
Patent: 4,849,812 Filed: 24 feb 1988 Granted: 18 jul 1989 Expiration: 24 feb 2008 Summary: Television system in which digitized picture signals subjected to a transform coding are transmitted from an encoding station to a decoding station Notes: http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=4,849,812
Patent: 5,021,879 Filed: 24 sep 1990 Granted: 04 jun 1991 Expiration: 24 sep 2010 Summary: System for transmitting video pictures Notes: http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=5,021,879
Patent: 5,128,758 Filed: 02 jun 1989 Granted: 07 jul 1992 Expiration: 02 jun 2009 Summary: Method and apparatus for digitally processing a high definition television augmentation signal Notes:
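For reference, the expiration dates in the list can be sanity-checked against the usual rule of thumb that a modern US utility patent expires 20 years after its filing date. A rough sketch in JavaScript (this deliberately ignores pre-1995 filings such as the reissue above, terminal disclaimers, and term adjustments, so it is an approximation only):

```javascript
// Approximate a US utility patent's expiration as filing date + 20
// years. This matches the post-June-1995 rule only; older filings and
// term adjustments are intentionally out of scope for this sketch.

function expirationFromFiling(filedIso) {
  const d = new Date(filedIso + "T00:00:00Z");
  d.setUTCFullYear(d.getUTCFullYear() + 20);
  return d.toISOString().slice(0, 10); // YYYY-MM-DD
}

const exp = expirationFromFiling("2004-03-02"); // patent 7,292,636
// exp === "2024-03-02", matching the expiration in the list above
```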
Re: [whatwg] Codecs for audio and video
On Wed, 01 Jul 2009 19:01:02 +0200, Philip Jägenstedt phil...@opera.com wrote: On Wed, 01 Jul 2009 18:29:17 +0200, Peter Kasting pkast...@google.com wrote: On Wed, Jul 1, 2009 at 2:41 AM, Anne van Kesteren ann...@opera.com wrote: On Tue, 30 Jun 2009 21:39:05 +0200, Peter Kasting pkast...@google.com wrote: There is no other reason to put a codec in the spec -- the primary reason to spec a behavior (to document vendor consensus) does not apply. Some vendors agreed, and some objected violently is not consensus. The vendor consensus line of argument seems like a very dangerous slippery slope. It would mean that whenever a vendor refuses to implement something it has to be taken out of the specification. I.e. giving a single vendor veto power over the documentation of the Web Platform. Not good at all in my opinion. I am merely echoing Hixie; from his original email in this thread: At the end of the day, the browser vendors have a very effective absolute veto on anything in the browser specs, You mean they have the power to derail a spec? They have the power to not implement the spec, turning the spec from a useful description of implementations into a work of fiction. That's something I would have considered before the advent of Mozilla Firefox. Mozilla also has the power of veto here. For example, if we required that the browsers implement H.264, and Mozilla did not, then the spec would be just as equally fictional as it would be if today we required Theora. My sole goal was to try and point out that the situation with codecs is not equivalent to past cases where vendors merely _hadn't implemented_ part of the spec; in this case vendors have _actively refused_ to implement support for various codecs (Apple with Theora and Mozilla(/Opera?) with H.264). PK That is correct, we consider H.264 to be incompatible with the open web platform due to its patent licensing. 
For the time being we will support Ogg Vorbis/Theora, which is the best option patent-wise and neck-and-neck with the competition in quality per bit (especially with recent encoder improvements). We would love to see it as the baseline for HTML5, but in the absence of that we hope that the web community will push it hard enough that it becomes the de-facto standard. A private email has correctly pointed out that neck-and-neck is exaggerating Theora's quality when compared to a state of the art H.264 encoder. It's worth pointing out though that in the immediate future Theora/Vorbis will also be competing against FLV (H.263/MP3), where it compares favorably (especially for audio). Some relevant reading: http://web.mit.edu/xiphmont/Public/theora/demo7.html http://people.xiph.org/~greg/video/ytcompare/comparison.html http://www-cs-faculty.stanford.edu/~nick/theora-soccer/ Previously, I have watched the videos from Greg's ~499kbit/sec comparison of H.264 (as encoded by YouTube) and Theora, and it is clear that Theora has more block artifacts and noise overall. However, in the H.264 version, when the bunny hole is zoomed in, the bunny flicks with a period of about 1 sec, similar to the effect of a keyframe in a dark scene (I haven't inspected if these were actually keyframes). I'd argue that for this clip, which you prefer depends on how you like your artifacts and if you're watching in fullscreen or not (block artifacts are more noticeable when magnified, while a keyframe flick is noticeable also at 1:1 magnification). However, this comparison was only made to counter a claim by Chris DiBona and doesn't show that Theora is neck-and-neck with H.264 in the general case. It does however show that it is neck-and-neck with what YouTube produces. I haven't watched the videos of the soccer comparison, but assume that the conclusions are roughly correct.
H.264 is a more modern codec than Theora and will outperform it if the best encoders from both sides are used; I don't want to imply anything else. Still, Theora is certainly good enough at this point that the deciding factor is no longer about technical quality, but about finding a common baseline that everyone can support. -- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Codecs for audio and video
whatwg-requ...@lists.whatwg.org wrote: Message: 3 Date: Tue, 30 Jun 2009 04:50:31 + (UTC) From: Ian Hickson i...@hixie.ch Subject: [whatwg] Codecs for audio and video To: WHATWG wha...@whatwg.org Message-ID: pine.lnx.4.62.0906292331380.1...@hixie.dreamhostps.com After an inordinate amount of discussions, both in public and privately, on the situation regarding codecs for video and audio in HTML5, I have reluctantly come to the conclusion that there is no suitable codec that all vendors are willing to implement and ship. There are many considerations when supporting video codecs: licensing, hardware, complexity, cost. Can the standard simply address video containers (Ogg, MKV, AVI)? Each container is fairly easy to implement, and codecs can be identified within the container. Vendors can decide on their own what to do with that information. I'd appreciate it if the container formats were supported, so I can get a nice message saying: Your codec, Vorbis, is not supported by your computer without a plugin, and/or have the ability to right click, hit properties, and get some data even without supporting the codec. I've come across a few wav and avi files where I've needed to download the file, and download a helper application, simply to identify what codec I need to install. If the browser had at least [properly] supported the container, it would have saved me quite a bit of time and effort. Better to say: Codec XXX not installed instead of: I don't know what it is, good luck with that. I realize the codec= attribute provides hints, but it's a pain for the document author. As far as I know, there are no patent issues for popular container formats. -Charles I have therefore removed the two subsections in the HTML5 spec in which codecs would have been required, and have instead left the matter undefined, as has in the past been done with other features like img and image formats, embed and plugin APIs, or Web fonts and font formats.
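Charles's point about identifying the container without decoding the codec is cheap to implement: each of the containers he names starts with a well-known magic number (OggS for Ogg, the EBML header 1A 45 DF A3 for Matroska, and RIFF....AVI for AVI). A sketch in JavaScript; the signatures are the real magic numbers, but sniffContainer itself is an illustrative helper, not a browser API:

```javascript
// Identify a media container from its leading bytes -- the kind of
// "I know the box, even if not the codec inside it" check described
// above. Only the container is sniffed; codec identification would
// require parsing further into the stream.

function sniffContainer(bytes) {
  const ascii = (off, s) =>
    [...s].every((ch, i) => bytes[off + i] === ch.charCodeAt(0));
  if (ascii(0, "OggS")) return "Ogg";
  if (bytes[0] === 0x1a && bytes[1] === 0x45 &&
      bytes[2] === 0xdf && bytes[3] === 0xa3) return "Matroska";
  if (ascii(0, "RIFF") && ascii(8, "AVI ")) return "AVI";
  if (ascii(0, "RIFF") && ascii(8, "WAVE")) return "WAV";
  return "unknown";
}

const oggHeader = new Uint8Array(
  [0x4f, 0x67, 0x67, 0x53, 0, 0, 0, 0, 0, 0, 0, 0]);
sniffContainer(oggHeader); // "Ogg"
```

With this much, a browser could already produce the "Codec XXX not installed" message Charles asks for, at least down to the container level.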
Re: [whatwg] Codecs for audio and video
I understand that people are disappointed that we can't require Theora support. I am disappointed in the lack of progress on this issue also. On Tue, 30 Jun 2009, Silvia Pfeiffer wrote: On Tue, Jun 30, 2009 at 2:50 PM, Ian Hicksoni...@hixie.ch wrote: I considered requiring Ogg Theora support in the spec, since we do have three implementations that are willing to implement it, but it wouldn't help get us true interoperability, since the people who are willing to implement it are willing to do so regardless of the spec, and the people who aren't are not going to be swayed by what the spec says. Inclusion of a required baseline codec into a standard speaks more loudly than you may think. It provides confidence - confidence that an informed choice has been made as to the best solution in a given situation. Confidence to Web developers, confidence to hosting providers, confidence also (but less so, since they are gatekeepers in this situation) to Browser Vendors. Words in the spec only provide this confidence because that confidence is well-placed. If we start making the spec say things that don't represent reality -- like everyone agrees that you should use Theora -- then that confidence will erode in all parts of the spec. In my opinion, including a baseline codec requirement into a W3C specification that is not supported by all Browser Vendors is much preferable over an unclear situation Having a baseline codec requirement that is not supported by all Browser Vendors _is_ an unclear situation. Adding the requirement doesn't stop that. In fact, it is a tradition of HTML to have specifications that are only supported by a limited set of Browser Vendors and only over time increasingly supported by all - e.g. how long did it take for all Browser vendors to accept css2, and many of the smaller features of html4 such as fixed positioning? CSS2.1 and HTML5 were reactions to the exact problem you describe in CSS2 and HTML4.
I firmly believe that making the decision to give up on baseline codecs is repeating a mistake made and repeatedly cited as a mistake on the lack of specification of a baseline format for images - which is one of the reasons why it took years to have two baseline image codecs available in all browsers. We could try the other route for a change and see if standards can actually make a difference to adoption. We basically tried the idea of requiring something that browser vendors didn't agree with, with XHTML2. IMHO that is a far bigger mistake than not specifying something that people don't agree to implement. Going forward, I see several (not mutually exclusive) possibilities, all of which will take several years: 1. Ogg Theora encoders continue to improve. Off-the-shelf hardware Ogg Theora decoder chips become available. Google ships support for the codec for long enough without getting sued that Apple's concern regarding submarine patents is reduced. = Theora becomes the de facto codec for the Web. This to me is a defeat of the standardisation process. Standards are not there to wait for the market to come up with a de-facto standard. They are there to provide confidence to the larger market about making a choice - no certainty of course, but just that much more confidence that it matters. Actually HTML5 is largely built on the idea of speccing the de-facto standards, either long after they were implemented, or in tandem with them being implemented. Very little of HTML5 has been ahead of implementations. 2. The remaining H.264 baseline patents owned by companies who are not willing to license them royalty-free expire, leading to H.264 support being available without license fees. = H.264 becomes the de facto codec for the Web. That could take many years. Yup. Your main argument against Theora is a recent email stating that YouTube could not be run using Theora. No; my only argument against Theora is that Apple won't implement it.
No, that was just a tactical withdrawal. This e-mail here is the one that admits defeat. :-) First rule of standards: never give up! My first rule of standards, if I have one, would be spec reality. Including a Theora prescription and having it only partially supported would at least give a large part of the world an interoperable platform, and that's all HTML has traditionally been. The same argument could be made for H.264. At the end of the day, if the relevant implementors (in this case browser vendors) aren't willing to all implement the same codec, then the codecs will just have to compete on their own merits. This isn't HTML5's fight. On Tue, 30 Jun 2009, Jeff McAdams wrote: But Most people agreed, and one or two vendors objected violently probably is. Just because one or two people are really loud, doesn't mean that there isn't consensus. The WHATWG doesn't work on consensus.
Re: [whatwg] Codecs for audio and video
On Fri, Jul 3, 2009 at 9:26 AM, Ian Hicksoni...@hixie.ch wrote: Going forward, I see several (not mutually exclusive) possibilities, all of which will take several years: 1. Ogg Theora encoders continue to improve. Off-the-shelf hardware Ogg Theora decoder chips become available. Google ships support for the codec for long enough without getting sued that Apple's concern regarding submarine patents is reduced. = Theora becomes the de facto codec for the Web. This to me is a defeat of the standardisation process. Standards are not there to wait for the market to come up with a de-facto standard. They are there to provide confidence to the larger market about making a choice - no certainty of course, but just that much more confidence that it matters. Actually HTML5 is largely built on the idea of speccing the de-facto standards, either long after they were implemented, or in tandem with them being implemented. Very little of HTML5 has been ahead of implementations. What about Internet Explorer, the browser with the largest market share? Basically all of HTML5 is ahead of being implemented in IE. I don't think that argument holds. No; my only argument against Theora is that Apple won't implement it. What about XiphQT? It doesn't matter that Apple doesn't natively support Theora - the software exists to provide the support. Therefore, the argument that Apple doesn't support Theora doesn't hold up. It's not Apple that matters, but their browser. Safari and Webkit have Theora support. There is an implementation. In fact, I have not heard Apple object violently to an inclusion of Theora into the specification as baseline codec. I have only heard them object to a native implementation in Safari for submarine patent threat reasons. Including a Theora prescription and having it only partially supported would at least give a large part of the world an interoperable platform and that's all HTML has traditionally been. The same argument could be made for H.264.
Except that H.264 does not meet any of the royalty-free requirements that W3C has for standardising technology. You've said a couple of things that I perceive as contradictory, here. You've said that you want to help bring about interoperability, but then you've also said that you're only documenting what it is that the browser makers are implementing. There is room in the standards bodies world for both goals, and both goals, at times, are valid and beneficial. But, if your intent is to help bring about interoperability, *real* interoperability, then I think it's pretty clear that the way forward involves specifying a baseline codec. The way forward involves getting to the point where we can specify a baseline codec, yes. But we do that _after_ everyone agrees on a baseline codec. While it may appear that I write stuff in the spec and the browser vendors then agree to it, in practice it's usually the other way around. That I can understand. But in this case, you should leave the paragraph in the spec that states the need for a baseline codec, since the situation hasn't changed and we are still striving for a baseline codec. Regards, Silvia.
Re: [whatwg] Codecs for audio and video
On Fri, 3 Jul 2009, Silvia Pfeiffer wrote: Actually HTML5 is largely built on the idea of speccing the de-facto standards, either long after they were implemented, or in tandem with them being implemented. Very little of HTML5 has been ahead of implementations. What about Internet Explorer, the browser with the largest market share? Basically all of HTML5 is ahead of being implemented in IE. I don't think that argument holds. Not every part of HTML5 is implemented by everyone at the same time. There's always going to be some areas that are ahead of simple implementations, as different vendors focus on different things. Plenty in HTML5 was implemented by IE first, though -- drag and drop, contentEditable, XHR (now in a separate spec), etc. No; my only argument against Theora is that Apple won't implement it. What about XiphQT? It doesn't matter that Apple doesn't natively support Theora - the software exists to provide the support. Does XiphQT ship with Safari? If not, then it doesn't count as part of the implementation. Therefore, the argument that Apple doesn't support Theora doesn't hold up. It's not Apple that matters, but their browser. Safari and Webkit have Theora support. There is an implementation. Safari on my iPod Touch certainly doesn't, even with all the will in the world on the behalf of the user to install third-party software. In fact, I have not heard Apple object violently to an inclusion of Theora into the specification as baseline codec. I have only heard them object to a native implementation in Safari for submarine patent threat reasons. Their objection to implementing the requirement if it is put in the spec is what is holding up the spec requiring it. That I can understand. But in this case, you should leave the paragraph in the spec that states the need for a baseline codec, since the situation hasn't changed and we are still striving for a baseline codec.
I'm not holding up the spec just because we haven't found a codec to use with the spec. This working group can override me on this if it is the desire of the group, but in the meantime, I'm trying to drive down to Last Call by October, and part of that is going through open issues and either resolving them, or admitting that they can't be resolved by then and moving on. The alternative is to deadlock, and that is worse. -- Ian Hickson, http://ln.hixie.ch/ -- 'Things that are impossible just take longer.'
Re: [whatwg] Codecs for audio and video
On Fri, Jul 3, 2009 at 10:19 AM, Ian Hickson i...@hixie.ch wrote: On Fri, 3 Jul 2009, Silvia Pfeiffer wrote: That I can understand. But in this case, you should leave the paragraph in the spec that states the need for a baseline codec, since the situation hasn't changed and we are still striving for a baseline codec. I'm not holding up the spec just because we haven't found a codec to use with the spec. This working group can override me on this if it is the desire of the group, but in the meantime, I'm trying to drive down to Last Call by October and part of that is going through open issues and either resolving them, or admitting that they can't be resolved by then and moving on. The alternative is to deadlock, and that is worse. OK, that answers my earlier question of 'why now'. That action I can understand. But then I would say we should have a collection of things that we need to work on for a second version of HTML5, which obviously includes the baseline codec question and also audio/video a11y, and probably many more things I am not aware of. Is there such a list somewhere? Since all these things are now being pulled out of the spec, they need to continue living somewhere else. Regards, Silvia.
Re: [whatwg] Codecs for audio and video
On Fri, 3 Jul 2009, Silvia Pfeiffer wrote: That action I can understand. But then I would say we should have a collection of things that we need to work on for a second version of HTML5, which obviously includes the baseline codec question and also audio/video a11y, and probably many more things I am not aware of. Is there such a list somewhere? Since all these things are now being pulled out of the spec, they need to continue living somewhere else. The spec itself documents these in comments in the source. Search for v2 in the spec source. The codecs issue is still being actively worked on by a number of parties, as it has been for over a year now; the removal of the sections from the spec doesn't affect this. I still hold high hopes that one day we can find a solution to this that all vendors are willing to implement. -- Ian Hickson, http://ln.hixie.ch/ -- 'Things that are impossible just take longer.'
Re: [whatwg] Codecs for audio and video
On Fri, Jul 3, 2009 at 10:36 AM, Ian Hickson i...@hixie.ch wrote: On Fri, 3 Jul 2009, Silvia Pfeiffer wrote: That action I can understand. But then I would say we should have a collection of things that we need to work on for a second version of HTML5, which obviously includes the baseline codec question and also audio/video a11y, and probably many more things I am not aware of. Is there such a list somewhere? Since all these things are now being pulled out of the spec, they need to continue living somewhere else. The spec itself documents these in comments in the source. Search for v2 in the spec source. Ok, cool. The codecs issue is still being actively worked on by a number of parties, as it has been for over a year now; the removal of the sections from the spec doesn't affect this. I still hold high hopes that one day we can find a solution to this that all vendors are willing to implement. The bit that would be important to keep is the list of requirements on a baseline codec. Cheers, Silvia.
Re: [whatwg] Codecs for audio and video
I'd like to see some progress on these two tags. I'd like people to consider that Vorbis can be implemented in virtual machines (Java, Flash) which support raw PCM data. Theora is no different. I'd like to see canvas support added to the video tag (it's as natural as img), and I'd like the audio tag to accept raw data (LPCM), just as the canvas tag accepts raw data (bitmap). Then you can support any codec you create, as well as use system codecs. You can't make the impossible happen (no HD video on an old 300 MHz machine), but you'd have the freedom to do the improbable. Add raw PCM and sound-font support to audio, and add raw pixel support to video (via CanvasRenderingContext2D). And add an event handler for when subtitles are enabled/disabled. I have further, more specific comments below, and at the end of the e-mail, two additions to the standard. Ian Hickson wrote: I understand that people are disappointed that we can't require Theora support. I am disappointed in the lack of progress on this issue also. On Tue, 30 Jun 2009, Dr. Markus Walther wrote: Having removed everything else in these sections, I figured there wasn't that much value in requiring PCM-in-Wave support. However, I will continue to work with browser vendors directly and try to get a common codec at least for audio, even if that is just PCM-in-Wave. I'd think that FLAC would make more sense than PCM-in-Wave, as a PNG analog. Consider the canvas element. PNG implementations may be broken. Internally, canvas accepts a raw byte array (a 32-bit bitmap) and allows a string-based export of a compressed bitmap, as a base64-encoded 32-bit PNG. The audio element should accept a raw byte array (32 bits per sample, LPCM) and allow a similar export of a base64-encoded file, perhaps using FLAC. Canvas can currently be used to render unsupported image formats (and mediate unsupported image containers); it's been proven with ActionScript that a virtual machine can also support otherwise unsupported audio codecs.
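The canvas/audio symmetry Charles describes can be sketched in a few lines of JavaScript. This is a minimal sketch, not a shipping API: the `writeSamples()` call at the end is hypothetical (no browser exposed raw-PCM writes to the audio element at the time), while the LPCM buffer generation itself is plain arithmetic of the kind a procedural-audio page would run.

```javascript
// Generate one second of signed 16-bit LPCM for a 440 Hz sine tone --
// the kind of raw buffer the proposed audio API would accept, the
// counterpart of the raw bitmap that canvas already accepts.
function makeSineLPCM(freq, sampleRate, seconds) {
  const n = Math.floor(sampleRate * seconds);
  const samples = new Int16Array(n);
  for (let i = 0; i < n; i++) {
    // Scale sin() output into the signed 16-bit range [-32767, 32767].
    samples[i] = Math.round(32767 * Math.sin((2 * Math.PI * freq * i) / sampleRate));
  }
  return samples;
}

const pcm = makeSineLPCM(440, 44100, 1);
// Hypothetical, by analogy with canvas getImageData()/toDataURL():
//   document.querySelector('audio').writeSamples(pcm, 44100);
```

The design point mirrors canvas exactly: a raw in-memory buffer for scripted synthesis, plus (as the post suggests) a compressed base64 export, for which FLAC would play the role PNG plays for canvas.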
I'd like to see a font analog in audio as well. Canvas supports the font attribute; audio could certainly support sound fonts. Use a generated pitch if your platform can't or doesn't store sound fonts. Please, please do so - I was shocked to read that PCM-in-Wave as the minimal 'consensus' container for audio is under threat of removal, too. There seems to be some confusion between codecs and containers. WAV, OGG, AVI and MKV are containers; OSC is another. Codecs are a completely separate matter. It's very clear that Apple will not distribute the Vorbis and Theora codecs with their software packages. It's likely that Apple would like to use a library they don't have to document, as required by most open-source licenses, and they see no current reason to invest money into writing a new one. Apple supports many chipsets and many content agreements; it would be costly. I see no reason why Apple could not support the OGG container. That said, I see no reason why a list of containers needs to be in the HTML5 spec. On Thu, 2 Jul 2009, Charles Pritchard wrote: Can the standard simply address video containers (OGG, MKV, AVI)? Each container is fairly easy to implement, and codecs can be identified within the container. Vendors can decide on their own what to do with that information. The spec does document how to distinguish containers via MIME type. Beyond that I'm not sure what we can do. The video element does support fallback, so in practice you can just use Theora and H.264 and cover all bases. I'd like to see this added to audio and video: User agents should provide controls to enable the manual selection of fallback content. User agents should provide an activation behavior, when fallback content is required, detailing why the primary content could not be used. Many non-technical users will want to know why there is a black screen (or still image), even though they can hear the audio. -Charles
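The "use Theora and H.264 and cover all bases" fallback Ian mentions is normally written with nested source elements, and the selection logic a browser applies can be sketched as a small function. A sketch only: `pickSource` takes a canPlayType-style probe (the real one lives on HTMLMediaElement and returns 'probably', 'maybe', or the empty string), injected here so the logic runs outside a browser; the file names are illustrative.

```javascript
// Pick the first source whose type the probe claims it can play,
// mirroring how a user agent walks <source> children of <video>.
function pickSource(sources, canPlayType) {
  for (const src of sources) {
    if (canPlayType(src.type) !== '') return src.url; // '' means "cannot play"
  }
  return null; // nothing playable: show the fallback content
}

// Markup equivalent:
//   <video controls>
//     <source src="clip.ogv" type='video/ogg; codecs="theora, vorbis"'>
//     <source src="clip.mp4" type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
//     Text explaining why the video could not be played.
//   </video>
const sources = [
  { url: 'clip.ogv', type: 'video/ogg; codecs="theora, vorbis"' },
  { url: 'clip.mp4', type: 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"' },
];

// Simulate a browser that handles only MP4 (e.g. Safari without XiphQT):
const mp4Only = type => (type.indexOf('video/mp4') === 0 ? 'probably' : '');
const chosen = pickSource(sources, mp4Only);
// chosen === 'clip.mp4'
```

When neither type is playable, the function returns null, which corresponds to the case Charles raises: the user sees fallback content and, ideally, an explanation of why the primary content could not be used.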
Re: [whatwg] Codecs for audio and video
Do you have an idea of how to introduce fallback support for browsers that don't even support canvas? How will they be expected to implement a base64 string when they skip the element's attributes? Might an img tag work with the src set to the same base64 string? Or would that contradict the point of allowing video, audio, and canvas the extra abilities? Because if people can just use the img tag, which is more comfortable to them, why would they feel the urge to switch? On Thu, Jul 2, 2009 at 8:51 PM, Charles Pritchard ch...@jumis.com wrote: I'd like to see some progress on these two tags. I'd like people to consider that Vorbis can be implemented in virtual machines (Java, Flash) which support raw PCM data. Theora is no different. I'd like to see canvas support added to the video tag (it's as natural as img), and I'd like the audio tag to accept raw data (LPCM), just as the canvas tag accepts raw data (bitmap). Then you can support any codec you create, as well as use system codecs. You can't make the impossible happen (no HD video on an old 300 MHz machine), but you'd have the freedom to do the improbable. Add raw PCM and sound-font support to audio, and add raw pixel support to video (via CanvasRenderingContext2D). And add an event handler for when subtitles are enabled/disabled. I have further, more specific comments below, and at the end of the e-mail, two additions to the standard. Ian Hickson wrote: I understand that people are disappointed that we can't require Theora support. I am disappointed in the lack of progress on this issue also. On Tue, 30 Jun 2009, Dr. Markus Walther wrote: Having removed everything else in these sections, I figured there wasn't that much value in requiring PCM-in-Wave support. However, I will continue to work with browser vendors directly and try to get a common codec at least for audio, even if that is just PCM-in-Wave. I'd think that FLAC would make more sense than PCM-in-Wave, as a PNG analog. Consider the canvas element.
PNG implementations may be broken. Internally, canvas accepts a raw byte array (a 32-bit bitmap) and allows a string-based export of a compressed bitmap, as a base64-encoded 32-bit PNG. The audio element should accept a raw byte array (32 bits per sample, LPCM) and allow a similar export of a base64-encoded file, perhaps using FLAC. Canvas can currently be used to render unsupported image formats (and mediate unsupported image containers); it's been proven with ActionScript that a virtual machine can also support otherwise unsupported audio codecs. I'd like to see a font analog in audio as well. Canvas supports the font attribute; audio could certainly support sound fonts. Use a generated pitch if your platform can't or doesn't store sound fonts. Please, please do so - I was shocked to read that PCM-in-Wave as the minimal 'consensus' container for audio is under threat of removal, too. There seems to be some confusion between codecs and containers. WAV, OGG, AVI and MKV are containers; OSC is another. Codecs are a completely separate matter. It's very clear that Apple will not distribute the Vorbis and Theora codecs with their software packages. It's likely that Apple would like to use a library they don't have to document, as required by most open-source licenses, and they see no current reason to invest money into writing a new one. Apple supports many chipsets and many content agreements; it would be costly. I see no reason why Apple could not support the OGG container. That said, I see no reason why a list of containers needs to be in the HTML5 spec. On Thu, 2 Jul 2009, Charles Pritchard wrote: Can the standard simply address video containers (OGG, MKV, AVI)? Each container is fairly easy to implement, and codecs can be identified within the container. Vendors can decide on their own what to do with that information. The spec does document how to distinguish containers via MIME type. Beyond that I'm not sure what we can do.
The video element does support fallback, so in practice you can just use Theora and H.264 and cover all bases. I'd like to see this added to audio and video: User agents should provide controls to enable the manual selection of fallback content. User agents should provide an activation behavior, when fallback content is required, detailing why the primary content could not be used. Many non-technical users will want to know why there is a black screen (or still image), even though they can hear the audio. -Charles -- - Adam Shannon ( http://ashannon.us )
Re: [whatwg] Codecs for audio and video
On Wed, Jul 1, 2009 at 2:35 PM, Maciej Stachowiak m...@apple.com wrote: However, it's quite clear from even a cursory investigation that H.264 ASICs are available from multiple vendors. This would not be the case if they weren't shipping in high-volume products. As I'm sure you know, ASICs have fairly high up-front costs, so they need volume to be cost-effective. It's a chicken-and-egg problem then. Once there is volume in Theora (that is, uptake), the vendors will adapt their hardware to support it. But we will not adopt Theora because we require hardware support. I think requiring hardware support is therefore an unfair requirement - when H.264 was being standardised, no hardware support (i.e. ASICs) was available either. As far as I know, there are currently no commercially available ASICs for Ogg Theora video decoding. (Searching Google for Theora ASIC finds some claims that technical aspects of the Theora codec would make it hard to implement in ASIC form and/or difficult to run on popular DSPs, but I do not have the technical expertise to evaluate the merit of these claims.) ... Silvia implied that mass-market products just have general-purpose hardware that could easily be used to decode a variety of codecs rather than true hardware support for specific codecs, and to the best of my knowledge, that is not the case. I have no deep knowledge in this space, but I have spoken to people who do, and was quoting their basic statement. Even if there is no vendor right now who produces an ASIC for Theora, the components of the Theora codec are not fundamentally different from the components of other DCT-based codecs. Therefore, ASICs that were built for other DCT-based codecs may well be adaptable by the ASIC vendor to support Theora. Even if this would need to be done by the chip vendor, it's not a fundamental obstacle.
I think the real issue around the hardware support requirement is *not* whether there are existing ASICs for Theora and whether they are commercially available and used. These can be developed where necessary - and indeed such new challenges are a good thing for the market. Instead, the real issue is what you mentioned above: the statement that technical aspects of the Theora codec make it hard to implement in ASIC form and/or difficult to run on popular DSPs. If this were the case, it would indeed pose a strong obstacle to the use of Theora. However, unless I see a detailed technical description of why it is impossible or very hard to implement ASICs for Theora, I believe this is just another urban myth. I'd be very happy for anyone knowledgeable to prove or bust this myth and clarify the situation. Regards, Silvia.
Re: [whatwg] Codecs for audio and video
I'm not sure I have much useful information to add to this discussion, but I wanted to address a few points: On Jun 30, 2009, at 10:54 PM, Gregory Maxwell wrote: Then please don't characterize it as 'it won't work' when the situation is 'it would work, but would probably have unacceptable battery life on the hardware we are shipping'. I don't believe I ever said it won't work or made any claim along those lines. All I said was that some products use dedicated hardware for H.264, and no such hardware is available for Theora. There was an implication that this claim was a smokescreen because really it was all just programmable hardware; that is not the case. The battery life question is a serious and important one, but it's a categorically different one from 'can it work at all'. (In particular because many people wouldn't consider the battery life implications of a rarely used fallback format to be especially relevant to their own development.) If Theora is only going to be a rarely used fallback format, then it doesn't seem like a great candidate to mandate in external specs. Indeed, others have argued that inclusion in the HTML5 spec would drive far greater popularity. If it's going to be widely used, it needs power-efficient implementations on mobile. Battery life is a very important consideration for mobile devices. To give a concrete data point, the iPhone 3G S can deliver 10 hours of video playback on a full charge. It's not very persuasive to say that availability of hardware implementations is unimportant because, even though battery life will be considerably worse, video will still more or less function. On Jun 30, 2009, at 11:03 PM, Silvia Pfeiffer wrote: It's a chicken-and-egg problem then. Once there is volume in Theora (that is, uptake), the vendors will adapt their hardware to support it. But we will not adopt Theora because we require hardware support.
I think requiring hardware support is therefore an unfair requirement - when H.264 was being standardised, no hardware support (i.e. ASICs) was available either. I believe the wide availability of H.264 hardware is in part because H.264 was developed through an open standards process that included the relevant stakeholders. In addition, H.264 was included in standards such as Blu-ray, HD DVD and 3GPP. This created built-in demand for hardware implementations. I believe hardware implementations were available before H.264 saw significant deployment for Web content. It's not clear if a similar virtuous cycle would repeat for other codecs. It might happen, but it took a lot of industry coordination around H.264 to build the ecosystem around it that exists today. So I don't think it's reasonable to assume that hardware implementations will just appear. Regards, Maciej
Re: [whatwg] Codecs for audio and video
Maciej Stachowiak wrote: So I don't think it's reasonable to assume that hardware implementations will just appear. The dire need for ASIC hardwired-style implementations of Theora hasn't been demonstrated either. H.264 has much higher computational complexity; it may be interesting to consider whether using less-rigid DSPs (or even the already available DSP extensions of widespread mobile processors) gives good enough results for Theora. Given there's less to compute, one may very well live with a lower energy efficiency per operation. Maik
Re: [whatwg] Codecs for audio and video
2009/6/30 Robert O'Callahan rob...@ocallahan.org: If we are going to allow individual vendors to exert veto power, at least let's make them accountable. Let's require them to make public statements with justifications instead of passing secret notes to Hixie. +1 Particularly when the reason for the claim (e.g. Google's YouTube claim) is then firmly proven not to be factually based. - d.
Re: [whatwg] Codecs for audio and video
[Maciej, sorry for sending this to you twice] 2009/7/1 Maciej Stachowiak m...@apple.com: It's not clear if a similar virtuous cycle would repeat for other codecs. Might happen, but it took a lot of industry coordination around H.264 to build the ecosystem around it that exists today. So I don't think it's reasonable to assume that hardware implementations will just appear. Even without any apparent industry coordination around Vorbis, many portable music players (not the ones produced by Apple, admittedly) can play Ogg audio files. Note that many of them do *not* say this on the tin: e.g. my cheap no-name MP4 player is advertised as being able to play only MP3 and WMA audio and AMV video files, but it also supports Ogg/Vorbis just fine. When Vorbis files reached a small critical mass a few years ago, many hardware manufacturers, without much fanfare, started supporting it. Having a free implementation with a very liberal licence may have helped. This player is also a good example of how some DSPs can be used for different tasks: its DSP is exactly the same one that was used only for MP3 decoding in earlier players, but in these new models it also decodes the video part of AMV (a modified MJPEG). Obviously I'm not suggesting that this particular model can also decode Theora (the main CPU is an 8-bit Z80 at 60 MHz max). Anyway, I think the spec can be made more informative for web authors by pointing out (in a non-normative section) the fact that there's one and only one format that can be played by all browsers that support the video element: Ogg/Theora+Vorbis. Safari can play Theora if the Xiph QuickTime component is installed, while Firefox cannot play H.264. -- Lino Mastrodomenico
Re: [whatwg] Codecs for audio and video
On Tue, 30 Jun 2009 21:39:05 +0200, Peter Kasting pkast...@google.com wrote: There is no other reason to put a codec in the spec -- the primary reason to spec a behavior (to document vendor consensus) does not apply. Some vendors agreed, and some objected violently is not consensus. The vendor consensus line of argument seems like a very dangerous slippery slope. It would mean that whenever a vendor refuses to implement something it has to be taken out of the specification. I.e. giving a single vendor veto power over the documentation of the Web Platform. Not good at all in my opinion. -- Anne van Kesteren http://annevankesteren.nl/
Re: [whatwg] Codecs for audio and video
On Wed, Jul 1, 2009 at 4:41 AM, Anne van Kesteren ann...@opera.com wrote: On Tue, 30 Jun 2009 21:39:05 +0200, Peter Kasting pkast...@google.com wrote: There is no other reason to put a codec in the spec -- the primary reason to spec a behavior (to document vendor consensus) does not apply. Some vendors agreed, and some objected violently is not consensus. The vendor consensus line of argument seems like a very dangerous slippery slope. It would mean that whenever a vendor refuses to implement something, it has to be taken out of the specification. I.e. giving a single vendor veto power over the documentation of the Web Platform. Not good at all in my opinion. -- Anne van Kesteren http://annevankesteren.nl/ We had a vendor majority, right? Of the four major vendors of browsers participating (Mozilla, Google, Opera, Apple), three have committed to including the codecs in one form or another for usage with the video and audio tags (Mozilla, Opera, Google). I agree it would have been better with a full consensus, but the fact is that all of these companies look towards their own goals. Mozilla wants to move towards its vision of the Open Web (which I personally agree with), Opera said some time back that they plan to support it, Google is fence-sitting by including ffmpeg due to their ideal of being universal (and doing a good job of it too), and Apple's vision of an Apple-centric world means they use the MPEG-4 stuff, because it fits more with their current offerings of iPod, iPhone, Macs, and the Apple TV without exerting more effort to comply. We would never get everyone to agree. However, Apple didn't shut the Theora stuff out entirely. They left it open through QuickTime, which is fine. If we put the Theora and Vorbis recommendation through, and the browsers implement it, then pressure would eventually make Apple somehow concede and do something about the situation.
At the very least, add the XiphQT codec pack to the listing of codecs when QuickTime errors out and opens Safari to a list of codecs. If we push through it with the codecs, I don't think there will be too much of a problem at all.
Re: [whatwg] Codecs for audio and video
On Wed, Jul 1, 2009 at 8:54 AM, Gregory Maxwell gmaxw...@gmail.com wrote: There are mass-market products that do this. Specifically, the Palm Pre is OMAP3, the N810 is OMAP2. These have conventional DSPs with publicly available toolchains. Hrm, I worked on the N810 (nowhere near the DSP, thankfully, although I did get to hear people crying). The technical existence of a generic DSP does not a useful implementation make. So, can you please give me the URL for a useful DSP codec I can install on my N810 (it runs Maemo 4.1 Diablo, and no, asking me to install your own custom platform just to play video isn't OK)? Hypothetically, just getting the stuff the vendor offers working in a shape which is shippable is depressingly hard. I really hate the double and triple standards espoused here. For lack of a better reference, however, I trust that you're capable of finding hundreds (as a Google search [battery life claims] did for me): http://www.techradar.com/news/computing-components/lawsuits-planned-over-laptop-battery-life-claims-612614 and adding the word 'cell' leads to: http://reviews.cnet.com/cell-phone-battery-life-charts I have nothing to do with the 5800 and don't have any idea what it is, but it was on the first page of results, so: http://www.wirelessinfo.com/content/Nokia-5800-Cell-Phone-Review/Battery-Life.htm In summary, it said that getting 75% of the claimed battery life was respectable (not stellar). I think it's safe to say that consumers really do care about battery life (and at least with laptops are starting to complain violently). I have no idea about purchasing costs (again, we work on software), but I think people will accept that the cost of an FPGA is orders of magnitude higher than that of an ASIC, and not commercially viable in comparison. Let's consider a different position.
If you heard that a hardware vendor had a product which could decode a video format called QuperVide, and they provided an open-source implementation, but they had a patent on another (better) technique for decoding QuperVide which they used in their ASIC, would you support them in their bid to mandate QuperVide as a required codec for a standard (e.g. HTML5 video)? I'd hope that most people would say that it's unfair to mandate such a standard, which would give the QuperVide vendor a sales advantage in its market (hardware ASICs). Would you say: oh, that's OK, we can standardize on this, and then 5 years from now we'll have an open-source hardware implementation that's as good? You could do that, but for the intervening years anyone who sells hardware and wants to support QuperVide will have three choices: 1. Pay the QuperVide vendor's fees for its ASICs and get tolerable battery life. 2. Pay for an extra DSP or a faster CPU+bus and pay a penalty in terms of battery life. 3. Pay engineers to try to develop a competing hardware ASIC which doesn't run afoul of the QuperVide vendor's patent. I also don't like how people enjoy a good run of corporation hunting. First you go after Microsoft. Then you go after Google. Then you go after Apple. And yes, you've already hunted Nokia, a couple of times, but I can't remember when in the sequence it was. I guess that it's sort of open season for corporation hunting, and maybe Nokia is currently slightly out of season. Actually, it sounds like Congress has opened up a session there, so maybe you're just politely waiting in line. Mozilla has sketches for adding pluggable support to its video module too, but seems reluctant to work on it, as doing that would distract from their open-web position. Sadly, Apple gets no points for being pluggable on the desktop (QuickTime has an open API). If I were to complain about Mozilla not being open, they'd claim: oh, we're open source, anyone can contribute.
That isn't true, by the way: if I were to write a pluggable module for video, their benevolent dictator has every right to veto it. And sure, I can maintain my fork, but just like Linus with Linux, they have every right to change their APIs regularly to make my life hell. Note: Microsoft, like Apple, has pluggable APIs, and again, like Apple, people don't care, and just say: ooh, they're a bad company, they won't play with us. Microsoft, like Opera, like Apple, like every other company in the world, is busy working on things. Often quietly (and yes, Mozilla does things quietly too). Opera has a policy (like Apple, like Microsoft, like Nokia, ...) of not announcing things until they announce them. Microsoft presumably has a roadmap and freezes features at a certain point in time; most companies do. Sometimes groups have to rewrite or reorganize large portions of their code, and can't fix certain things until then. Gecko, the View Manager, table rewrites, the HTML5 parser: lots of these things happen with Mozilla. Heck, the Acid tests traditionally are such that the first release of Mozilla after the release of an Acid test can't possibly pass it, because of the schedule. And while people
Re: [whatwg] Codecs for audio and video
On Wed, Jul 1, 2009 at 1:58 PM, King InuYasha ngomp...@gmail.com wrote: We had a vendor majority, right? Who is we, and which vendor do you represent? Please don't use the royal we.
Re: [whatwg] Codecs for audio and video
2009/7/1 Maik Merten maikmer...@googlemail.com Maciej Stachowiak wrote: So I don't think it's reasonable to assume that hardware implementations will just appear. The dire need of ASIC hardwired-style implementations for Theora hasn't been demonstrated either. H.264 has much higher computational complexity, it may be interesting to consider if using less-rigid DSPs (or even the already available DSP extensions of widespread mobile processors) gives good enough results for Theora. Given there's less to compute one may very well live with a lower energy efficiency per operation. For information only (I haven't investigated these in any depth), note the references to Theora in the following pages: http://wiki.openmoko.org/wiki/Snapshot_review#Media_Player http://wiki.openmoko.org/wiki/Wish_List_-_Hardware:FPGA#AT91CAP9S500A_.28ARM9_.2B_FPGA-port.29
Re: [whatwg] Codecs for audio and video
Only one point to touch on here, and perhaps it's a bit off-topic; if it is, I apologize. Maciej Stachowiak wrote: I believe the wide availability of H.264 hardware is in part because H.264 was developed through an open standards process that included the relevant stakeholders. Saying that ITU-T includes relevant stakeholders, implying that it includes *all* relevant stakeholders, is a joke. End-users are relevant stakeholders in standards development, and ITU-T excludes participation from end-users. -- Jeff McAdams je...@iglou.com
Re: [whatwg] Codecs for audio and video
timeless wrote: I also don't like how people enjoy a good run of corporation hunting. First you go after Microsoft. Then you go after Google. Then you go after Apple. Many (most?) corporations choose to operate under a heavy veil of secrecy (*particularly* Apple). That choice is also a choice to open themselves up to these criticisms. These corporations have to take the good with the bad. If they chose to operate with greater transparency, then they would almost certainly come in for less criticism. I have exactly zero sympathy for Apple, MS, and Google for the criticisms they have received. If they choose to operate in secrecy, then they choose to be the target of these criticisms. Suck it up. -- Jeff McAdams je...@iglou.com
Re: [whatwg] Codecs for audio and video
2009/7/1 timeless timel...@gmail.com: I have no idea about purchasing costs (again, we work on software), but I think people will accept that the cost of an FPGA is orders of magnitude higher than that of an ASIC, and not commercially viable in comparison. I think we can all agree that an FPGA is being used only for development and debugging of a Theora hardware decoder (http://www.students.ic.unicamp.br/~ra031198/theora_hardware/), while the final design will be burned to an ASIC if/when there's commercial demand for it. Sadly, Apple gets no points for being pluggable on the desktop (QuickTime has an open API). If I were to complain about Mozilla not being open, they'd claim: oh, we're open source, anyone can contribute. That isn't true, by the way: if I were to write a pluggable module for video, their benevolent dictator has every right to veto it. I fear that wide support for pluggable codecs for the video element may end up putting end users in a codec hell. If there's only one or at most two supported video formats/codecs, then it's obviously the responsibility of the websites to correctly encode their videos. But if Firefox starts supporting any codec that happens to be installed on the system, many small and medium websites will probably start using videos in a lot of different formats (whatever works on the web developer's computer), and the burden of finding and installing the correct codecs for each site will shift to the end user. Not good for interoperability or non-x86 platforms, and a good opportunity for spreading malware via trojan codecs. So please, browser vendors, don't do this. The only exception is Safari, since it wouldn't otherwise support Theora. -- Lino Mastrodomenico
Re: [whatwg] Codecs for audio and video
Regarding the fear of Trojan codecs: it would help if third-party codec plug-ins could be sandboxed so that they cannot access anything beyond what they need in order to do their job, and only via an API provided by the host. IMHO, Chris
Re: [whatwg] Codecs for audio and video
On Wed, Jul 1, 2009 at 4:12 PM, Kristof Zelechovskigiecr...@stegny.2a.pl wrote: Regarding the fear of Trojan codecs: it would help if third-party plug-ins for codecs could be sandboxed so that they cannot have access to anything they do not have to access in order to do their job, and only via an API provided by the host. Historically, people who want to write codec support want to do it in DSP land on devices, and in DSP land on devices you can kill devices very dead (among other interesting things). Sandboxing that way is basically impossible. Now if you rule out DSP, you still have all the other problems (general proliferation of codecs, finding them, keeping them sandboxed, bandwidth to network/video). I'm not saying I don't look forward to this; they're just notes.
Re: [whatwg] Codecs for audio and video
That would have to be done by each browser, not the spec. Some vendors would include their own plugins that were safe, so they may not feel the need to sandbox them (even though they should). On Wed, Jul 1, 2009 at 8:12 AM, Kristof Zelechovski giecr...@stegny.2a.plwrote: Regarding the fear of Trojan codecs: it would help if third-party plug-ins for codecs could be sandboxed so that they cannot have access to anything they do not have to access in order to do their job, and only via an API provided by the host. IMHO, Chris -- - Adam Shannon ( http://ashannon.us )
Re: [whatwg] Codecs for audio and video
Clearly allowing a third-party codec to reprogram a hardware DSP would be one of the silliest things to do. (If it turns out that I cannot answer to something important from now on, assume I am banned for using an offensive word.) Chris
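[Editorial illustration] The host-mediated plug-in API proposed in this sub-thread could look roughly like the following sketch. All names here are hypothetical, and it deliberately covers only the software case, not DSP-land: the plugin never sees files, sockets, or devices, only byte buffers handed over by the host, which validates everything that crosses the boundary.

```python
# Hypothetical sketch of a host-mediated codec plugin API.
# The plugin only transforms compressed buffers into frames; the host
# owns all I/O and validates whatever the plugin hands back.

class DecodedFrame:
    def __init__(self, width: int, height: int, pixels: bytes):
        self.width = width
        self.height = height
        self.pixels = pixels  # raw grayscale pixel bytes in this toy model

class CodecPlugin:
    """Interface every third-party codec would have to implement."""
    def decode(self, compressed: bytes) -> DecodedFrame:
        raise NotImplementedError

class IdentityCodec(CodecPlugin):
    """Toy 'codec' for demonstration: treats the input as an 8x8 frame."""
    def decode(self, compressed: bytes) -> DecodedFrame:
        return DecodedFrame(8, 8, compressed)

def host_decode(plugin: CodecPlugin, data: bytes) -> DecodedFrame:
    # The host checks the frame before using it, so a buggy or malicious
    # plugin cannot smuggle malformed dimensions back across the boundary.
    frame = plugin.decode(data)
    if len(frame.pixels) != frame.width * frame.height:
        raise ValueError("plugin returned a malformed frame")
    return frame

frame = host_decode(IdentityCodec(), bytes(64))
print(frame.width, frame.height, len(frame.pixels))  # 8 8 64
```

In a real browser the boundary would also involve process isolation and resource limits; the sketch only shows the narrow-API idea itself.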
Re: [whatwg] Codecs for audio and video
On Wed, Jul 1, 2009 at 2:41 AM, Anne van Kesteren ann...@opera.com wrote: On Tue, 30 Jun 2009 21:39:05 +0200, Peter Kasting pkast...@google.com wrote: There is no other reason to put a codec in the spec -- the primary reason to spec a behavior (to document vendor consensus) does not apply. Some vendors agreed, and some objected violently is not consensus. The vendor consensus line of argument seems like a very dangerous slippery slope. It would mean that whenever a vendor refuses to implement something it has to be taken out of the specification. I.e. giving a single vendor veto power over the documentation of the Web Platform. Not good at all in my opinion. I am merely echoing Hixie; from his original email in this thread: At the end of the day, the browser vendors have a very effective absolute veto on anything in the browser specs. You mean they have the power to derail a spec? They have the power to not implement the spec, turning the spec from a useful description of implementations into a work of fiction. That's something I would have considered before the advent of Mozilla Firefox. Mozilla also has the power of veto here. For example, if we required that the browsers implement H.264, and Mozilla did not, then the spec would be just as fictional as it would be if today we required Theora. My sole goal was to try to point out that the situation with codecs is not equivalent to past cases where vendors merely _hadn't implemented_ part of the spec; in this case vendors have _actively refused_ to implement support for various codecs (Apple with Theora and Mozilla(/Opera?) with H.264). PK
Re: [whatwg] Codecs for audio and video
2009/7/1 Jeff McAdams je...@iglou.com timeless wrote: I also don't like how people enjoy a good run of corporation hunting. First you go after Microsoft. Then you go after Google. Then you go after Apple. Many (most?) corporations choose to operate under a heavy veil of secrecy (*particularly* Apple). That choice is also a choice to open themselves up to these criticisms. These corporations have to take the good with the bad. If they chose to operate with greater transparency, they would almost certainly come in for less criticism. I have exactly zero sympathy for Apple, MS, and Google for the criticisms they have received. If they choose to operate in secrecy, then they choose to be the target of these criticisms. I'm not asking for sympathy, but I also don't think the characterization of Google as operating in secrecy is fair. There's a large number of people from the Google Chrome team participating on WHATWG and trying to contribute openly to these discussions. We're operating as an open source project, and trying to be as open as possible. At the same time, Google is a company whose purpose (as with any company) is to make money. YouTube is a separate team and not an open source project; I don't think it's reasonable to expect all of Google to suddenly release all of its information that has legitimate business reasons for staying company-internal. We've made what statements we can make, and I don't honestly think it reasonable to expect more. Suck it up. -- Jeff McAdams je...@iglou.com
Re: [whatwg] Codecs for audio and video
2009/7/1 Ian Fette (イアンフェッティ) ife...@google.com: all of Google to suddenly release all of its information that has legitimate business reasons for staying company-internal. We've made what statements we can make, and I don't honestly think it reasonable to expect more. I think it is reasonable to expect Google to address the fact that their stated reasons have been demonstrated false, however. They have notably failed to do so. Is Chris DiBona still reading? Either "Oh, sorry, I was completely wrong" or "You're wrong, and here's why" would go a long way toward restoring any trust in Google on this matter. - d.
Re: [whatwg] Codecs for audio and video
On Wed, 01 Jul 2009 18:29:17 +0200, Peter Kasting pkast...@google.com wrote: On Wed, Jul 1, 2009 at 2:41 AM, Anne van Kesteren ann...@opera.com wrote: On Tue, 30 Jun 2009 21:39:05 +0200, Peter Kasting pkast...@google.com wrote: There is no other reason to put a codec in the spec -- the primary reason to spec a behavior (to document vendor consensus) does not apply. Some vendors agreed, and some objected violently is not consensus. The vendor consensus line of argument seems like a very dangerous slippery slope. It would mean that whenever a vendor refuses to implement something it has to be taken out of the specification. I.e. giving a single vendor veto power over the documentation of the Web Platform. Not good at all in my opinion. I am merely echoing Hixie; from his original email in this thread: At the end of the day, the browser vendors have a very effective absolute veto on anything in the browser specs. You mean they have the power to derail a spec? They have the power to not implement the spec, turning the spec from a useful description of implementations into a work of fiction. That's something I would have considered before the advent of Mozilla Firefox. Mozilla also has the power of veto here. For example, if we required that the browsers implement H.264, and Mozilla did not, then the spec would be just as fictional as it would be if today we required Theora. My sole goal was to try to point out that the situation with codecs is not equivalent to past cases where vendors merely _hadn't implemented_ part of the spec; in this case vendors have _actively refused_ to implement support for various codecs (Apple with Theora and Mozilla(/Opera?) with H.264). PK That is correct; we consider H.264 to be incompatible with the open web platform due to its patent licensing. 
For the time being we will support Ogg Vorbis/Theora, which is the best option patent-wise and neck-and-neck with the competition on quality-per-bit (especially with recent encoder improvements). We would love to see it as the baseline for HTML5, but in the absence of that we hope that the web community will push it hard enough that it becomes the de facto standard. -- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Codecs for audio and video
Ian Fette (イアンフェッティ) wrote: 2009/7/1 Jeff McAdams je...@iglou.com mailto:je...@iglou.com timeless wrote: I also don't like how people enjoy a good run of corporation hunting. First you go after Microsoft. Then you go after Google. Then you go after Apple. Many (most?) corporations choose to operate under a heavy veil of secrecy (*particularly* Apple). That choice is also a choice to open themselves up to these criticisms. These corporations have to take the good with the bad. If they chose to operate with greater transparency, they would almost certainly come in for less criticism. I have exactly zero sympathy for Apple, MS, and Google for the criticisms they have received. If they choose to operate in secrecy, then they choose to be the target of these criticisms. I'm not asking for sympathy, but I also don't think the characterization of Google as operating in secrecy is fair. There's a large number of people from the Google Chrome team participating on WHATWG and trying to contribute openly to these discussions. We're operating as an open source project, and trying to be as open as possible. At the same time, Google is a company whose purpose (as with any company) is to make money. YouTube is a separate team and not an open source project; I don't think it's reasonable to expect all of Google to suddenly release all of its information that has legitimate business reasons for staying company-internal. We've made what statements we can make, and I don't honestly think it reasonable to expect more. I don't disagree with you on any of that, really. I said you (Google, and others) have made a choice, corporately, on how open and transparent to be. Certainly Google is less secretive than many other corporations as a whole, and seemingly the Chrome team is considerably more open than most of the rest of Google, even. Nonetheless, as a whole, Google is a corporation and they have made a business decision to remain secretive on at least certain things. 
I do think that's a reasonable decision to make, and I might very well make the same decision in your shoes. My point was only to say that part and parcel of that decision are actions that tend to lead to criticisms of the company as a whole, criticisms that Mozilla gets less of because they are more open. I won't exactly hold Mozilla up as a paragon of openness and transparency, but they are better than Google, just as Google is better than MS, and I would even argue that MS is better than Apple. I understand that you have said what you can say, and that's fantastic, and truthfully, I don't really expect more. That doesn't, however, mean that I'm going to cease criticism of the stated positions. As to the comparison between the Chrome and YouTube groups: I wish that the YouTube portion of the company were more engaged here, as they are clearly a relevant party to the discussion. Again, I understand that as a business decision they may choose not to, but my understanding of that doesn't mean I'm not going to criticize them for it. -- Jeff McAdams je...@iglou.com
Re: [whatwg] Codecs for audio and video
I don't believe Chris was speaking in any official capacity for YT or Google any more than I am. I think it is inappropriate to conflate his opinion of the matter with Google's. I have not seen _any_ official statement from Google regarding codec quality. As an aside, I think taking the available recent public comparisons as definitive proof that Theora is (or is not!) comparable to H.264 is inappropriate (and goes further than the Theora developers have). Codec comparison is tricky and broad, and a definitive comparison (which I have not performed) would require a large variety of types/quality of input, compressed with many different option choices, and compared on both subjective and objective criteria. It would also include coverage of issues like how much buffer is needed to ensure continuous play, whether the quality can be dynamically degraded, storage space and CPU usage required on the encoding side, device support (current and projected), etc. Or, to simplify: you're oversimplifying in your declarations that one codec is as good as another. PK On Jul 1, 2009 9:55 AM, David Gerard dger...@gmail.com wrote: 2009/7/1 Ian Fette (イアンフェッティ) ife...@google.com: all of Google to suddenly release all of its information that has legitimate business reasons f... I think it is reasonable to expect Google to address the fact that their stated reasons have been demonstrated false, however. They have notably failed to do so. Is Chris DiBona still reading? Either "Oh, sorry, I was completely wrong" or "You're wrong, and here's why" would go a long way toward restoring any trust in Google on this matter. - d.
Re: [whatwg] Codecs for audio and video
On Wed, 01 Jul 2009 18:29:17 +0200, Peter Kasting pkast...@google.com wrote: On Wed, Jul 1, 2009 at 2:41 AM, Anne van Kesteren ann...@opera.com wrote: The vendor consensus line of argument seems like a very dangerous slippery slope. It would mean that whenever a vendor refuses to implement something it has to be taken out of the specification. I.e. giving a single vendor veto power over the documentation of the Web Platform. Not good at all in my opinion. I am merely echoing Hixie; from his original email in this thread: At the end of the day, the browser vendors have a very effective absolute veto on anything in the browser specs. You mean they have the power to derail a spec? They have the power to not implement the spec, turning the spec from a useful description of implementations into a work of fiction. That's something I would have considered before the advent of Mozilla Firefox. Mozilla also has the power of veto here. For example, if we required that the browsers implement H.264, and Mozilla did not, then the spec would be just as fictional as it would be if today we required Theora. I disagree with the characterization Ian makes here, as I believe being royalty-free is very important for the formats we actively deploy to the Web, and as such H.264 is not an option. My sole goal was to try to point out that the situation with codecs is not equivalent to past cases where vendors merely _hadn't implemented_ part of the spec; in this case vendors have _actively refused_ to implement support for various codecs (Apple with Theora and Mozilla(/Opera?) with H.264). Somehow I doubt that if e.g. Opera vetoed the video element it would actually be removed from the specification. And if that were the case I would consider it very bad, as I mentioned in my initial email in this thread. -- Anne van Kesteren http://annevankesteren.nl/
Re: [whatwg] Codecs for audio and video
On Wed, Jul 1, 2009 at 12:14 PM, Anne van Kesterenann...@opera.com wrote: On Wed, 01 Jul 2009 18:29:17 +0200, Peter Kasting pkast...@google.com wrote: On Wed, Jul 1, 2009 at 2:41 AM, Anne van Kesteren ann...@opera.com wrote: The vendor consensus line of argument seems like a very dangerous slippery slope. It would mean that whenever a vendor refuses to implement something it has to be taken out of the specification. I.e. giving a single vendor veto power over the documentation of the Web Platform. Not good at all in my opinion. I am merely echoing Hixie; from his original email in this thread: At the end of the day, the browser vendors have a very effective absolute veto on anything in the browser specs. You mean they have the power to derail a spec? They have the power to not implement the spec, turning the spec from a useful description of implementations into a work of fiction. That's something I would have considered before the advent of Mozilla Firefox. Mozilla also has the power of veto here. For example, if we required that the browsers implement H.264, and Mozilla did not, then the spec would be just as fictional as it would be if today we required Theora. I disagree with the characterization Ian makes here, as I believe being royalty-free is very important for the formats we actively deploy to the Web, and as such H.264 is not an option. Agreed. (Has anyone seriously proposed H.264 as a standard for the web?) The only arguments against Theora have been:
* Too poor quality to be workable.
* Risk of hidden/unknown patents.
* Doesn't have hardware decoders.
I think the first bullet has been demonstrated to be false. The relative quality between Theora and H.264 is still being debated, but the arguments are over a few percent here or there. Arguments that Theora is simply not good enough seem based on poor or outdated information at this point. The second bullet I don't buy. First of all because that argument applies to absolutely everything we do. 
While video is particularly bad, there is simply always a risk of unknown software patents. Second, two big browser companies, with a third on the way, have at this point deemed it safe enough to risk implementing, so presumably they have done some amount of research into existing public patents. And that's on top of every company that has shipped Theora support in contexts other than browsers. Submarine patents are of course still a problem, but no more so for video than for other technologies as far as I can tell. The third applies to basically anything other than a very short list of codecs, none of which, as far as I know, are interesting for one reason or another. If that is a requirement then we might as well pack up and give up on making video an integral part of the open web. If a codec is going to have a chance to become popular enough to make hardware vendors get behind it, we need to take the first step. Hardware vendors are not going to. / Jonas
Re: [whatwg] Codecs for audio and video
On Wed, Jul 1, 2009 at 4:06 PM, Jonas Sickingjo...@sicking.cc wrote: [snip] I think the first bullet has been demonstrated to be false. The relative quality between Theora and H.264 is still being debated, but the arguments are over a few percent here or there. Arguments that Theora is simply not good enough seem based on poor or outdated information at this point. I'm commenting here because I don't want my own posts to be a source of misinformation. Depending on how and what you compare, it's more than a few percent. It turns out that H.264 as used in many places on the web is within spitting distance of the newer Theora encoder, due to encode-side and decode-side computational complexity, compatibility concerns, and the selection of encoder software. For these same reasons there are many 'older' formats still in wide use which Theora clearly outperforms. The reality of what people are using puts the lie to broad claims that Theora is generally unusable because it under-performs the best available H.264 encoders in the lab. Different uses and organizations will have different requirements, which is a good reason why HTML5 never required solutions to support only one codec. I do not doubt that there are uses for which Theora is clearly inferior, because of the mixture of tolerance for licensing, computational load, intolerance for bitrate, requirements to operate at bits-per-pixel levels below the range that Theora operates well at, etc., but it is an enormous jump to go from "there are some uses" to applying the claim to the general case, or to go from "it needs some more bitrate to achieve equivalent subjective quality" to remarks that the bitrate inflation would endanger the Internet. It was this kind of over-generalization that my commentary on Theora quality was targeting. 
(And it should be absolutely unsurprising that at the limit Theora does somewhat worse than H.264 in terms of quality per bit: it's an older, less CPU-hungry design which is, from 50,000 ft, almost a strict subset of H.264.) At the same time, we have clearly defined cases where H.264/AAC is absolutely unacceptable. Not merely inferior, but completely unworkable due to the licensing issues. Different uses and organizations will have different requirements. Different codecs will be superior depending on your requirements. Which is a good reason why HTML5 never required solutions to support only one codec. But what I think is key is that the inclusion of Theora as a baseline should do nothing to inhibit the parties which are already invested in H.264, or who have particular requirements which make it especially attractive, from continuing to offer and use it. The advantage of a baseline isn't necessarily that it's the best at anything in particular, but that it's workable and mostly universal. If, when talking about a baseline, you find yourself debating details of efficiency versus the state of the art, you've completely missed the point. This is a field which is still undergoing rapid development. Even if codec science were to see no improvements, we would still see the state of the art advance tremendously in the next few years simply due to increasing tolerance for CPU-hungry techniques invented many years ago but still under-used. Anything we use today is going to look pretty weak compared to the options available 10 years from now. It's important for a codec to be efficient, but the purpose of the baseline is to be compatible. As such the relevant arguments should be largely limited to workability, of which efficiency is only one part. It was suggested here that MJPEG be added as a baseline. I considered this as an option for Wikipedia video support some years ago, before we had the Theora-in-Java playback working. 
I quickly determined that it was unworkable for over-the-web use because of the bitrate: we're talking about on the order of 10x the required bitrate of Theora before considering the audio (which would also be 10x the bitrate of Vorbis). At least for general public web use, I think the hard workability threshold could fairly be set as: can a typical consumer broadband connection stream a 'web resolution' (i.e. somewhat sub-standard-definition) video in real time with decent quality? Even though that's a fairly vague criterion, it seems clear that Ogg/Theora is well inside this limit while MJPEG is well outside it. Obviously different parties will have different demands. As far as I'm concerned the spec might as well recommend a lossless codec as MJPEG; at least lossless has advantages for the set of applications which are completely insensitive to bitrate.
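[Editorial illustration] The workability threshold described above can be made concrete with a back-of-the-envelope check. The bitrates below are rough figures quoted elsewhere in this thread (about 0.5 Mbit/s for Ogg Theora/Vorbis at web resolution, roughly 10x that for MJPEG plus uncompressed PCM audio), and the 2 Mbit/s downlink is an assumed modest 2009-era consumer connection, not a measurement.

```python
# Rough check: does a codec's "web resolution" bitrate fit a typical
# consumer broadband downlink? All figures are illustrative assumptions.

BROADBAND_DOWNLINK_MBPS = 2.0  # assumed modest consumer connection

codec_bitrate_mbps = {
    "Ogg Theora + Vorbis": 0.5,  # figure quoted later in this thread
    "MJPEG + PCM": 5.0 + 1.4,    # ~10x Theora video, plus uncompressed audio
}

for codec, rate in codec_bitrate_mbps.items():
    ok = rate <= BROADBAND_DOWNLINK_MBPS
    verdict = "streams in real time" if ok else "cannot stream in real time"
    print(f"{codec}: {rate:.1f} Mbit/s -> {verdict}")
```

Under these assumptions Theora sits comfortably inside the limit while MJPEG is several times over it, which matches the "well inside"/"well outside" framing above.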
Re: [whatwg] Codecs for audio and video
On Tue, Jun 30, 2009 at 4:50 PM, Ian Hickson i...@hixie.ch wrote: - has off-the-shelf decoder hardware chips available I don't think this should be a requirement. As written, this requirement primarily means "need to be able to build devices today that play back with minimal power consumption". Obviously this is desirable, but why is it *necessary* for a baseline codec? Why would a vendor refuse to support a format because of high power consumption? It seems to me that using up power can't be worse than refusing to play the content at all. Does Apple block iPhone apps because they max out the CPU while they're running? It seems to me that this requirement forces HTML5 to merely document the codec preferences of device vendors. I think HTML5 could be proactive without being obnoxious, by recommending that Theora be supported wherever possible. That would encourage the consumption and production of Theora content, which would increase pressure on device vendors to support it well, which would increase the chances of us all getting a codec with good universal support. Rob -- He was pierced for our transgressions, he was crushed for our iniquities; the punishment that brought us peace was upon him, and by his wounds we are healed. We all, like sheep, have gone astray, each of us has turned to his own way; and the LORD has laid on him the iniquity of us all. [Isaiah 53:5-6]
Re: [whatwg] Codecs for audio and video
--- On Wed, 7/1/09, Gregory Maxwell gmaxw...@gmail.com wrote: It was suggested here that MJPEG be added as a baseline. I considered this as an option for Wikipedia video support some years ago before we had the Theora in Java playback working. I quickly determined that it was unworkable for over-the-web use because of the bitrate: we're talking about on the order of 10x the required bitrate of Theora before considering the audio (which would also be 10x the bitrate of Vorbis). Mozilla already supports Motion JPEG for the image tag (but not for the video tag so far as I know). Basically, right now if you want a video file that will play on QuickTime, Media Player and GStreamer's good set of plugins, the best option is Motion JPEG. I have mailed CDs with MJPEG video and PCM audio, and you can fit ~15 minutes of this at ~TV quality. For ~TV-quality video and audio (240 x 320 pixels, 30 fps) we are talking something like (if you have better numbers, point them out to me):
5 Mbit/s MJPEG video with PCM audio
1-2 Mbit/s MPEG-1
0.5 Mbit/s Ogg Vorbis
My suggestion (and I am not particularly serious) was: [(H.264 OR Theora) AND Motion JPEG]. If you care about bandwidth more than licensing fees, you provide both H.264 and Theora. If you care more about licensing costs, you can provide Theora and Motion JPEG. I don't think that enshrining this in the spec is a very good idea, however, since it is a somewhat poor compromise. I can envision a future where a year from now Apple still has not added Theora support, but Mozilla has added GStreamer support, and suddenly Motion JPEG is the 'best' baseline codec, and the de facto video support is [(H.264 OR Theora) AND Motion JPEG]. As far as I'm concerned the spec might as well recommend a lossless codec as MJPEG; at least lossless has advantages for the set of applications which are completely insensitive to bitrate. What lossless codecs might be available without patent problems?
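[Editorial illustration] The "~15 minutes of MJPEG + PCM on a CD" figure above checks out arithmetically, assuming a ~700 MB CD (decimal megabytes), the 5 Mbit/s MJPEG figure from the message, and CD-quality PCM audio (44.1 kHz, 16-bit stereo, about 1.4 Mbit/s):

```python
# Back-of-the-envelope check of the "~15 minutes of MJPEG + PCM on a CD" claim.

CD_CAPACITY_MB = 700                 # assumed CD size, decimal megabytes
MJPEG_MBPS = 5.0                     # video bitrate quoted in the message
PCM_MBPS = 44100 * 16 * 2 / 1e6      # CD-quality stereo PCM ~= 1.41 Mbit/s

total_mbps = MJPEG_MBPS + PCM_MBPS   # ~6.4 Mbit/s combined
seconds = CD_CAPACITY_MB * 8 / total_mbps
print(round(seconds / 60, 1))        # ~14.6 minutes, consistent with "~15"
```

The same arithmetic with the 0.5 Mbit/s Ogg figure would give roughly ten times the duration, which is the 10x bitrate gap the message describes.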
Re: [whatwg] Codecs for audio and video
On Tue, 30 Jun 2009, Matthew Gregan wrote: Is there any reason why PCM in a Wave container has been removed from HTML 5 as a baseline for audio? Having removed everything else in these sections, I figured there wasn't that much value in requiring PCM-in-Wave support. However, I will continue to work with browser vendors directly and try to get a common codec at least for audio, even if that is just PCM-in-Wave. The reason for not selecting a video codec doesn't seem to have much weight when considering Ogg Vorbis as a required audio codec. Unfortunately, the reasons don't really matter at the end of the day. If they don't implement it, they don't implement it. :-( -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Codecs for audio and video
Even if Apple decides to implement Ogg Theora, iPod users will still get QuickTime served and get a better rendering because the common codec is the failsafe solution and will be specified as the last one. This phenomenon is expected to happen for any platform, not just Apple's. I cannot see how this effect can be perceived as diminishing the significance of the HTML specification, however. I believe proprietary codecs will always be better than public domain codecs, until hardware development makes this question irrelevant, because this application requires a large investment in research. I understand that the reason for rejecting MPEG-1 as a fallback mechanism is that the servers will not serve it because of increased bandwidth usage, right? Cheers, Chris
Re: [whatwg] Codecs for audio and video
On Tue, 30 Jun 2009, Kristof Zelechovski wrote: I understand that the reason for rejecting MPEG-1 as a fallback mechanism is that the servers will not serve it because of increased bandwidth usage, right? Right. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Codecs for audio and video
Hi Ian, I have just posted a detailed reply to your email on public-html (http://lists.w3.org/Archives/Public/public-html/2009Jun/0830.html), so let me not repeat myself, but only address the things that I haven't already addressed there. On Tue, Jun 30, 2009 at 2:50 PM, Ian Hicksoni...@hixie.ch wrote: I considered requiring Ogg Theora support in the spec, since we do have three implementations that are willing to implement it, but it wouldn't help get us true interoperability, since the people who are willing to implement it are willing to do so regardless of the spec, and the people who aren't are not going to be swayed by what the spec says. Inclusion of a required baseline codec in a standard speaks more loudly than you may think. It provides confidence - confidence that an informed choice has been made as to the best solution in a given situation. Confidence to Web developers, confidence to hosting providers, confidence also (but less so, since they are gatekeepers in this situation) to browser vendors. In my opinion, including a baseline codec requirement in a W3C specification that is not supported by all browser vendors is much preferable to an unclear situation, where people are forced to gather their own information about a given situation and make a decision on what to choose based on potentially self-interested and one-sided reasons/recommendations. In fact, it is a tradition of HTML to have specifications that are only supported by a limited set of browser vendors and only over time increasingly supported by all - e.g. how long did it take for all browser vendors to accept CSS2, and many of the smaller features of HTML4 such as fixed positioning? I firmly believe that giving up on baseline codecs repeats a mistake made before and repeatedly cited as a mistake: the lack of specification of a baseline format for images - which is one of the reasons why it took years to have two baseline image codecs available in all browsers. 
We could try the other route for a change and see if standards can actually make a difference to adoption. Going forward, I see several (not mutually exclusive) possibilities, all of which will take several years: 1. Ogg Theora encoders continue to improve. Off-the-shelf hardware Ogg Theora decoder chips become available. Google ships support for the codec for long enough without getting sued that Apple's concern regarding submarine patents is reduced. = Theora becomes the de facto codec for the Web. This to me is a defeat of the standardisation process. Standards are not there to wait for the market to come up with a de facto standard. They are there to provide confidence to the larger market about making a choice - no certainty, of course, but just that much more confidence that it matters. 2. The remaining H.264 baseline patents owned by companies who are not willing to license them royalty-free expire, leading to H.264 support being available without license fees. = H.264 becomes the de facto codec for the Web. That could take many years. I would encourage proponents of particular codecs to attempt to address the points listed above, as eventually I expect one codec will emerge as the common codec, but not before it fulfills all these points: OK, let me try to address these for Theora. The replies for Vorbis are simply yes to each of these points. - is implementable without cost and distributable by anyone Theora is. - has off-the-shelf decoder hardware chips available Hardware decoding for video means that there are software libraries available that use specific hardware in given chips to optimise decoding. It is not a matter of hardware vendors inventing new hardware to support Theora, but of somebody implementing code to take advantage of available hardware on specific platforms. This is already starting to happen, and will increasingly happen if Theora becomes the baseline codec. 
- is used widely enough to justify the extra patent exposure This is a double requirement: firstly one has to quantify the extra patent exposure, and independent of that is wide uptake. We are now seeing wide uptake happening for Theora, with Dailymotion, Wikimedia, Archive.org and many small and medium-size video platforms (such as thevideobay, metavid, pad.me) taking it up. As for the extra patent exposure: with every month that goes by, this is shrinking. And obviously many players have already decided that the extra patent exposure of Theora is acceptable, since three browser vendors are already supporting Theora natively. - has a quality-per-bit high enough for large volume sites Your main argument against Theora is a recent email stating that YouTube could not be run using Theora. Several experiments with the current Theora encoder version have demonstrated that this statement was based on misinformation and not on fact. Until I see fact that confirms that YouTube would indeed
Re: [whatwg] Codecs for audio and video
Thank you, Ian, for the summary. I just wanted to say that we're not happy with the situation. We continue to monitor it, to take what action we can, and we continue to hope that we will, at some time, find a solution that reaches consensus. -- David Singer Multimedia Standards, Apple Inc.
Re: [whatwg] Codecs for audio and video
Ian Hickson wrote: on the situation regarding codecs for video and audio in HTML5, I have reluctantly come to the conclusion that there is no suitable codec that all vendors are willing to implement and ship. I have therefore removed the two subsections in the HTML5 spec in which codecs would have been required, and have instead left the matter undefined, as has in the past been done with other features like img and image formats, embed and plugin APIs, or Web fonts and font formats. The current situation is as follows: Apple refuses to implement Ogg Theora in QuickTime by default (as used by Safari), citing lack of hardware support and an uncertain patent landscape. Google has implemented H.264 and Ogg Theora in Chrome, but cannot provide the H.264 codec license to third-party distributors of Chromium, and has indicated a belief that Ogg Theora's quality-per-bit is not yet suitable for the volume handled by YouTube. Opera refuses to implement H.264, citing the obscene cost of the relevant patent licenses. Mozilla refuses to implement H.264, as they would not be able to obtain a license that covers their downstream distributors. Microsoft has not commented on their intent to support video at all.

Short summary: Theora is supported by everyone but Apple and Microsoft; H.264 can only be supported (in theory) by Apple, Google and Microsoft because of patent licensing. Patent licensing issues aside, H.264 would be a better baseline codec than Theora.

I considered requiring Ogg Theora support in the spec, since we do have three implementations that are willing to implement it, but it wouldn't help get us true interoperability, since the people who are willing to implement it are willing to do so regardless of the spec, and the people who aren't are not going to be swayed by what the spec says.

I don't know about Microsoft, but Apple has displayed willingness to implement what specifications say (see http://acid3.acidtests.org/ for example). 
By W3C standards a spec can get REC status if it has at least two implementations, and we already have three. The current HTML 5 spec already has stuff not implemented by every vendor; why should video be different? I'd suggest one of the two choices (I prefer the first one):

(1) Specify Theora as the baseline codec. Hopefully it will be tested by an Acid4 test (or by some other popular test) and Apple will either implement it regardless of the assumed patent risks or find the actual patent owners and acquire the required licenses for Theora to be implemented by Apple. In the future, if Apple implements Theora, then perhaps even Microsoft will do so, too.

(2) Specify {Theora or H.264} as the baseline. That way all vendors that have displayed any interest in video could implement the spec. Authors would be required to provide the video in both formats to be sure that any spec-compliant user agent is able to display the content, but at least there would be some real target set by the spec. However, I think that this just moves the H.264 patent licensing issue from browser vendors to content authors: if you believe that you cannot decode H.264 without a proper patent license, there's no way you could encode H.264 content without the very same license. As a result, many authors will not be able to provide an H.264 variant -- and as a result Theora would become the de facto standard in the future. -- Mikko
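The {Theora or H.264} option above relies on the user agent walking an ordered list of sources and playing the first type it supports. A minimal sketch of that selection logic, with the helper names, file names, and MIME strings being illustrative rather than any real browser API:

```javascript
// Sketch of the resource-selection behaviour an author relies on when
// offering both formats: walk the source list in document order and pick
// the first type the agent reports it can play. canPlayType mimics
// HTMLMediaElement.canPlayType, returning "", "maybe", or "probably".
function pickSource(sources, canPlayType) {
  for (const s of sources) {
    if (canPlayType(s.type) !== "") return s.src;
  }
  return null; // no playable source; fallback content would be shown
}

// An author targeting the {Theora or H.264} baseline lists both variants:
const sources = [
  { src: "clip.ogv", type: 'video/ogg; codecs="theora, vorbis"' },
  { src: "clip.mp4", type: 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"' },
];

// A Theora-only agent picks the Ogg file:
const theoraOnly = (t) => (t.startsWith("video/ogg") ? "probably" : "");
console.log(pickSource(sources, theoraOnly)); // → "clip.ogv"

// An H.264-only agent skips it and picks the MP4:
const h264Only = (t) => (t.startsWith("video/mp4") ? "probably" : "");
console.log(pickSource(sources, h264Only)); // → "clip.mp4"
```

This also illustrates Mikko's point: the scheme only works if the author can actually produce both encodings, which is where the H.264 licensing question lands on content authors.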
Re: [whatwg] Codecs for audio and video
Ian Hickson wrote: I considered requiring Ogg Theora support in the spec, since we do have three implementations that are willing to implement it, but it wouldn't help get us true interoperabiliy, since the people who are willing to implement it are willing to do so regardless of the spec, and the people who aren't are not going to be swayed by what the spec says.

Ian, first off, thank you for your efforts to this point; your patience in the face of conflicting opinions has been awe-inspiring (and I'll certainly include my messages in the set of those requiring patience from you). I feel I have to disagree a bit with what you say above, though. Yes, clearly publishing the spec with a baseline codec specified isn't *sufficient* for definitively "get[ting] us true interoperabiliy" [sic], but it certainly does *help* get us true interoperability, in two ways that I can think of off the top of my head.

First, there is some inherent pressure for implementing the spec. Again, some parties have indicated that it is not enough to get them to do so, but it does eliminate their ability to claim adherence to this standard when others are doing so. (Well, to truthfully claim, anyway. I don't think any of the parties involved here are unscrupulous enough to claim compliance when they don't actually comply because of the lack of this codec support, but other, non-engaged parties certainly might.) Specifying a baseline codec takes away a marketing bullet point that could otherwise be used to sell a product while hurting interoperability.

Second, it gives us (people like me) an extra tool to go back to vendors and say, "Hey, please support HTML5, it's important to me, and the video tag, with the correct baseline codec support, is important to me." Leaving the baseline codec unspecified takes away a lot of our leverage, as customers of companies that have said they won't support this, to push on them. 
(I, personally, as a single data point, use a Mac, and mostly to this point use Safari, but have already made sure I've gotten the Firefox 3.5-mumble-not-yet-released that has the video tag support so that I can begin making use of it to some degree, and plan to do so more in the future.)

Certainly you, of all people, can appreciate the benefits to interoperability that we've seen through publication of the ACID tests. No, they aren't full compliance tests, but look at the public pressure that has been brought to bear on browser makers by the public's awareness of them. Look at how much better the interoperability has gotten over the same period. No, it's still not perfect, by a long shot, but at least now we're moving in the right direction. Give us, the end users, the tools we need to help bring that pressure for interoperability to bear on the browser makers.

There is one thing that I'm not quite clear on, however. You've said a couple of things that I perceive as contradictory here. You've said that you want to help bring about interoperability, but then you've also said that you're only documenting what it is that the browser makers are implementing. There is room in the standards-bodies world for both goals, and both, at times, are valid and beneficial. But if your intent is to help bring about interoperability, *real* interoperability, then I think it's pretty clear that the way forward involves specifying a baseline codec. Leaving such an important point of interoperability completely up to the whims of people out there seems unwise here (I look at MS's latest attempt at supporting ODF as a great example of how interoperability can actually be harmed, even by a compliant implementation, when important parts of guidelines to interoperability are left out... there are plenty more examples). 
I think it's nearly imperative that important points of interoperability contention such as this be specified, else it gives unscrupulous developers the ability to intentionally worsen interoperability and to make the spec considerably less valuable by developing an implementation that is compliant, but not interoperable with anyone else ("Oh, I implemented video using animated gifs"... yes, it's absurd, but someone could, at least in theory, claim compliance that way). I would also point out that scrupulous developers could unintentionally worsen interoperability in the same way. By allowing this opening, end-users see browsers that have the HTML5 stamp (figuratively), but their browsing experience suffers and they start to lose faith in the spec as actually meaning anything useful regarding the reliability of their browsing experience.

Again, thank you for your efforts, and add me to the camp of believing that the baseline codec is vitally important, even without all of the browser makers being willing (at least initially) to support it. -- Jeff McAdams je...@iglou.com
Re: [whatwg] Codecs for audio and video
--- On Tue, 6/30/09, Mikko Rantalainen mikko.rantalai...@peda.net wrote: (2) Specify {Theora or H.264} as the baseline. That way all vendors that have displayed any interest for video could implement the spec. Authors would be required to provide the video in both formats to be sure that any spec compliant user agent is able to display the content, but at least there would be some real target set by the spec. However, I think that this just moves the H.264 patent licensing issue from browser vendors to content authors: if you believe that you cannot decode H.264 without proper patent license there's no way you could encode H.264 content without the very same license. As a result, many authors will not be able to provide H.264 variant -- and as a result the Theora would become de facto standard in the future. -- Mikko

Specify {Theora or H.264} AND {Motion JPEG}. That way there is a fallback mechanism when you care more about compatibility than bandwidth and don't want to deal with the hassle of the H.264 patents. Sometimes compatibility is more important than bandwidth. (HTML is a common method of putting content on CD-ROMs.) Josh Cogliati
Re: [whatwg] Codecs for audio and video
On Tue, Jun 30, 2009 at 12:50 AM, Ian Hicksoni...@hixie.ch wrote: Finally, what is Google/YouTube's official position on this? As I understand it, based on other posts to this mailing list in recent days: Google ships both H.264 and Theora support in Chrome; YouTube only supports H.264, and is unlikely to use Theora until the codec improves substantially from its current quality-per-bit. It would be good to understand what the threshold for acceptability is here; earlier reports on this mailing list have indicated that (on at least the tested content) Theora can produce quality-per-bit that is quite comparable to that of H.264 as employed by YouTube. As one organization investing, and invested, in the success of Theora, Mozilla would be very glad to know so that we can help reach that target. Can one of the Google representatives here get a statement from YouTube about the technical threshold here? I think it could have significant impact on the course of video on the web; perhaps more than SHOULD language in HTML5 here. I personally believe that putting codec requirements in the specification could have significant market effects, because it would take advantage of general market pressure for standards compliance. As an example, if you put it in HTML5 then you could put it in ACID4, and the ACID tests have historically been quite influential in driving browser implementation choices. Theora could get the same boost NodeIterator has seen, I daresay to greater positive impact on the web. Mike
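Shaver's question about a "technical threshold" for quality-per-bit can be made concrete. One rough way to frame such comparisons is bits per pixel: normalise the bitrate by the number of pixels delivered per second, then ask which codec reaches acceptable quality at the lower figure. The resolutions and bitrates below are illustrative examples only, not YouTube's actual encoding parameters:

```javascript
// bits per pixel = bitrate / (pixels per second). A lower bpp at the same
// perceived quality means better compression efficiency.
function bitsPerPixel(bitrateKbps, width, height, fps) {
  return (bitrateKbps * 1000) / (width * height * fps);
}

// A hypothetical 480p stream at 800 kbps:
console.log(bitsPerPixel(800, 854, 480, 25).toFixed(3)); // → "0.078"

// Halving the bitrate halves the bpp; a more efficient codec can afford
// the lower figure for the same subjective quality:
console.log(bitsPerPixel(400, 854, 480, 25).toFixed(3)); // → "0.039"
```

A stated threshold of this kind ("acceptable quality at N bpp for our typical content") is exactly what would let the Theora encoder developers know what target to hit.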
Re: [whatwg] Codecs for audio and video
On Tue, Jun 30, 2009 at 5:31 AM, Mikko Rantalainen mikko.rantalai...@peda.net wrote: [snip] Patent licensing issues aside, H.264 would be a better baseline codec than Theora.

I don't know that I necessarily agree there. H.264 achieves better efficiency (quality/bitrate) than Theora, but it does so with greater peak computational complexity and memory requirements on the decoder. This isn't really a fault in H.264; it's just a natural consequence of codec development. Compression efficiency will always be strongly correlated to computational load. So, I think there would be an argument today for including something else as a *baseline* even in the absence of licensing. (Though the growth of computational power will probably moot this in the 15-20 years it will take for H.264 to become licensing-clear.) Of course there are profiles, but they create a lot of confusion: people routinely put out files that others have a hard time playing. Of course, were it not for the licensing, Theora wouldn't exist, but there would likely be many other codec alternatives with differing CPU/bandwidth/quality tradeoffs. I just wanted to make the point that there are other considerations which have been ignored simply because the licensing issue is so overwhelmingly significant, but if it weren't, we'd still have many things to discuss.

The subject does bring me to a minor nit on Ian's decent state-of-affairs message: one of the listed problems is lack of hardware support. I think Ian may unwittingly be falling for a common misconception. This is a real issue, but it's being misdescribed: it would be more accurately and clearly stated as "lack of software support on embedded devices". Although people keep using the word "hardware" in this context, I believe that 999 times out of 1000 they are mistaken in doing so. As far as I, or anyone I've spoken to, can tell, no one is actually doing H.264 decode directly in silicon, at least no one with a web browser. 
The closest thing to that I see are small microcoded DSPs which you buy pre-packaged with codec software and ready to go. I'm sure someone can correct me if I'm mistaken. There are a number of reasons for this, such as the rapid pace of codec development vs ASIC design horizons, and the mode-switching-heavy nature of modern codecs (H.264 supports many mixtures of block sizes, for example) simply requiring a lot of chip real estate if implemented directly in hardware. In some cases the DSP is proprietary and not sufficiently open for other software. But at least in the mobile device market it appears to be the norm to use an off-the-shelf general-purpose DSP.

This is a very important point, because "the hardware doesn't support it" sounds like an absolute deal-breaker, while "no one has bothered porting Theora to the TMS320c64x DSP embedded in the OMAP3 CPU used in this handheld device" is an obviously surmountable problem. In the future, when someone says "no hardware support", it would be helpful to find out if they are talking about actual hardware support, or just something they're calling hardware because it's some mysterious DSP running a vendor-blob that they themselves aren't personally responsible for programming... or if they are just regurgitating common wisdom. Cheers!
Re: [whatwg] Codecs for audio and video
On Tue, Jun 30, 2009 at 10:43 AM, Gregory Maxwellgmaxw...@gmail.com wrote: No one has bothered porting Theora to the TMS320c64x DSP embedded in the OMAP3 CPU used in this handheld device is an obviously surmountable problem. Unless I'm mistaken about the DSP in question, that work is in fact underway, and should bear fruit in the next handful of months. Mike
Re: [whatwg] Codecs for audio and video
Ian Hickson wrote: On Tue, 30 Jun 2009, Matthew Gregan wrote: Is there any reason why PCM in a Wave container has been removed from HTML 5 as a baseline for audio? Having removed everything else in these sections, I figured there wasn't that much value in requiring PCM-in-Wave support. However, I will continue to work with browser vendors directly and try to get a common codec at least for audio, even if that is just PCM-in-Wave. Please, please do so - I was shocked to read that PCM-in-Wave as the minimal 'consensus' container for audio is under threat of removal, too. Frankly, I don't understand why audio was drawn into this. Is there any patent issue with PCM-in-Wave? If not, then IMHO the decision should be orthogonal to video. -- Markus
Re: [whatwg] Codecs for audio and video
Gregory Maxwell wrote: PCM in wav is useless for many applications: you're not going to do streaming music with it, for example. It would work fine for sound effects...

The world in which web browsers live is quite a bit bigger than internet and ordinary consumer use combined... Browser-based intranet applications for companies working with professional audio or speech are but one example. Please see my earlier contributions to this list for more details.

but it still is more code to support, a lot more code in some cases depending on how the application is layered, even though PCM wav itself is pretty simple. And what exactly does PCM wav mean? Float samples? 24-bit integers? 16-bit? 8-bit? ulaw? Big-endian? 2 channels? 8 channels? Is a correct duration header mandatory?

To give one specific point in this matrix: 16-bit integer samples, little-endian, 1 channel, correct duration header not mandatory. This is relevant in practice in what we do. I can't speak for others.

It would be misleading to name a 'partial baseline'. If the document can't manage to make a complete workable recommendation, why make one at all?

I disagree. Why insist on perfection here? In my view, the whole of HTML 5 as discussed here is about reasonable compromises that can be supported now or pretty soon. As the browsers which already support PCM wav (e.g. Safari, Firefox) show, it isn't impossible to get this right. Regards, -- Markus
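To make concrete how little is actually underspecified in the "matrix" above, here is a minimal sketch of the specific point Markus names (16-bit integer samples, little-endian, one channel), building just the canonical 44-byte RIFF/WAVE header. Field offsets and values follow the standard header layout; this is an illustration, not any browser's implementation:

```javascript
// Build the 44-byte canonical WAV header for 16-bit little-endian
// integer PCM, mono. Every multi-byte field is little-endian (the
// `true` flag on each DataView write).
function wavHeader(numSamples, sampleRate) {
  const numChannels = 1, bitsPerSample = 16;
  const blockAlign = numChannels * bitsPerSample / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;
  const dataSize = numSamples * blockAlign;
  const buf = new ArrayBuffer(44);
  const v = new DataView(buf);
  const ascii = (off, s) => { for (let i = 0; i < s.length; i++) v.setUint8(off + i, s.charCodeAt(i)); };
  ascii(0, "RIFF");
  v.setUint32(4, 36 + dataSize, true);  // RIFF chunk size
  ascii(8, "WAVE");
  ascii(12, "fmt ");
  v.setUint32(16, 16, true);            // fmt chunk size
  v.setUint16(20, 1, true);             // audio format 1 = integer PCM
  v.setUint16(22, numChannels, true);
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, byteRate, true);
  v.setUint16(32, blockAlign, true);
  v.setUint16(34, bitsPerSample, true);
  ascii(36, "data");
  v.setUint32(40, dataSize, true);      // the "duration" Gregory asks about
  return buf;
}

// One second of audio at 44.1 kHz:
const header = new DataView(wavHeader(44100, 44100));
console.log(header.getUint16(20, true)); // → 1  (integer PCM)
console.log(header.getUint16(34, true)); // → 16 (bits per sample)
```

The entire ambiguity Gregory raises lives in three header fields (format code, channel count, bits per sample), which is part of why Markus argues that pinning down one point in the matrix is cheap.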
Re: [whatwg] Codecs for audio and video
Assuming bandwidth will increase with technological advance, it seems unreasonable that the bandwidth issue is allowed to block fallback solutions such as PCM within a specification that is expected to live longer than three years from now. IMHO, Chris
Re: [whatwg] Codecs for audio and video
On Tue, Jun 30, 2009 at 12:50 AM, Ian Hickson i...@hixie.ch wrote: I considered requiring Ogg Theora support in the spec, since we do have three implementations that are willing to implement it, but it wouldn't help get us true interoperability, since the people who are willing to implement it are willing to do so regardless of the spec, and the people who aren't are not going to be swayed by what the spec says.

Why can't you make support for Theora and Vorbis a "should" requirement? That wouldn't be misleading, especially if worded right. It would serve as a hint to future implementers who might not be familiar with this whole sordid format war. It would also hopefully help put more emphasis on Ogg and get more authors to view lack of Ogg support as a deficiency or bug to be worked around, thus encouraging implementers to support it. It's only about two lines total -- what's the downside? Proselytism is a valid reason to add material to the spec, right? Certainly I recall you mentioning that in the case of alt text -- you didn't want to allow alt text to be omitted in general lest it discourage authors from using it. I think it's clear that of the two contenders for video, Theora is a much closer fit to HTML 5's goal of open standards and deserves whatever support is possible without sacrificing other goals (like accuracy).
Re: [whatwg] Codecs for audio and video
Peter Kasting wrote: As a contributor to multiple browsers, I think it's important to note the distinctions between cases like Acid3 (where IIRC all tests were supposed to test specs that had been published with no dispute for 5 years), much of HTML5 (where items not yet implemented generally have agreement-on-principle from various vendors) and this issue, where vendors have publicly refused to implement particular cases. Particular specs in the first two cases represent vendor consensus, and when vendors discover problems during implementation the specs are changed. This is not a case where vendor consensus is currently possible (despite the apparently naive beliefs on the part of some who think the vendors are merely ignorant and need education on the benefits of codec x or y), and "just put it in the spec to apply pressure" is not a reasonable response.

I don't know that anyone has suggested putting it in the spec *only* to apply pressure to vendors. Certainly that is an added "bonus" (I'll put that in quotes because not everyone will consider that a positive thing), and certainly doing so will achieve the goal of applying pressure. I agree that putting it in the spec *only* to apply pressure to vendors is not reasonable, but considering it an additional reason to put it in the spec is quite reasonable. -- Jeff McAdams je...@iglou.com
Re: [whatwg] Codecs for audio and video
2009/6/30 Peter Kasting pkast...@google.com On Jun 30, 2009 2:17 AM, Sam Kuper sam.ku...@uclmail.net wrote: 2009/6/30 Silvia Pfeiffer silviapfeiff...@gmail.com On Tue, Jun 30, 2009 at 2:50 PM, Ian Hicksoni...@hixie.ch wrote: I considered requiring Og... Right. Waiting for all vendors to support the specified codec would be like waiting for them all to be Acid3 compliant. Better to specify how browsers should behave (especially if it's how most of them will behave), and let the stragglers pick up the slack in their own time under consumer pressure. Sam As a contributor to multiple browsers, I think it's important to note the distinctions between cases like Acid3 (where IIRC all tests were supposed to test specs that had been published with no dispute for 5 years), much of HTML5 (where items not yet implemented generally have agreement-on-principle from various vendors) and this issue, where vendors have publicly refused to implement particular cases. [...] I'd question, based on the following statements, whether your memory of Acid3 is correct: Controversially, [Acid3] includes several elements from the CSS2 recommendation that were later removed in CSS2.1 but reintroduced in W3C CSS3 working drafts that have not made it to candidate recommendations yet.[1] The following standards are tested by Acid3: [...] * SMIL 2.1 (subtests 75-76) [...][1] SMIL 2.1 became a W3C Recommendation in December 2005.[2] [1] http://en.wikipedia.org/wiki/Acid3 [2] http://en.wikipedia.org/wiki/Synchronized_Multimedia_Integration_Language#SMIL_2.1 So, there is some precedent for the W3C to publish specs/tests, expecting browser vendors to catch up with them further down the line. Sam
Re: [whatwg] Codecs for audio and video
On Wed, Jul 1, 2009 at 7:15 AM, Peter Kasting pkast...@google.com wrote: As a contributor to multiple browsers, I think it's important to note the distinctions between cases like Acid3 (where IIRC all tests were supposed to test specs that had been published with no dispute for 5 years), much of HTML5 (where items not yet implemented generally have agreement-on-principle from various vendors) and this issue, where vendors have publicly refused to implement particular cases. Particular specs in the first two cases represent vendor consensus, and when vendors discover problems during implementation the specs are changed.

It's not true that all the specs tested in Acid3 represented vendor consensus. For example, a lot of browser people were skeptical of the value of SVG Animation (SMIL), but it was added to Acid3. That was a clear example of something being implemented primarily because of pressure from specifications and tests. It's true, though, that no one flat-out refused to implement it, so that situation isn't quite the same. Personally I think it's appropriate to use specs to exert some pressure. We've always done it. Flat-out refusal of a vendor to implement something is a problem, but I assume there are limits to how much we allow that to affect the process. If Microsoft suddenly announces they hate HTML5 and won't implement any of it, would we just throw it all out? If we are going to allow individual vendors to exert veto power, at least let's make them accountable. Let's require them to make public statements with justifications instead of passing secret notes to Hixie. Rob -- He was pierced for our transgressions, he was crushed for our iniquities; the punishment that brought us peace was upon him, and by his wounds we are healed. We all, like sheep, have gone astray, each of us has turned to his own way; and the LORD has laid on him the iniquity of us all. [Isaiah 53:5-6]
Re: [whatwg] Codecs for audio and video
* I didn't say 5 years from Rec status.
* Acid3 was meant to be an illustrative example of a case where the test itself was not intentionally introducing new behavior or attempting to force consensus on unwilling vendors, not a perfect analogy to something.

PK

On Jun 30, 2009 12:36 PM, Sam Kuper sam.ku...@uclmail.net wrote: 2009/6/30 Peter Kasting pkast...@google.com On Jun 30, 2009 2:17 AM, Sam Kuper sam.ku...@uclmail.net wrote: 2009/6/30 Silvia Pfeiffe... As a contributor to multiple browsers, I think it's important to note the distinctions between cases like Acid3 (where IIRC all tests were supposed to test specs that had been published with no dispute for 5 years), much of HTML5 (where items not yet implemented generally have agreement-on-principle from various vendors) and this issue, where vendors have publicly refused to implement particular cases. [...] I'd question, based on the following statements, whether your memory of Acid3 is correct: Controversially, [Acid3] includes several elements from the CSS2 recommendation that were later removed in CSS2.1 but reintroduced in W3C CSS3 working drafts that have not made it to candidate recommendations yet.[1] The following standards are tested by Acid3: [...] * SMIL 2.1 (subtests 75-76) [...][1] SMIL 2.1 became a W3C Recommendation in December 2005.[2] [1] http://en.wikipedia.org/wiki/Acid3 [2] http://en.wikipedia.org/wiki/Synchronized_Multimedia_Integration_Language#SMIL_2.1 So, there is some precedent for the W3C to publish specs/tests, expecting browser vendors to catch up with them further down the line. Sam
Re: [whatwg] Codecs for audio and video
Peter Kasting wrote: There is no other reason to put a codec in the spec -- the primary reason to spec a behavior (to document vendor consensus) does not apply.

"Some vendors agreed, and some objected violently" is not consensus. But "Most people agreed, and one or two vendors objected violently" probably is. Just because one or two people are really loud doesn't mean that there isn't consensus. I'm not saying that this is the case here, but it is possible. Also, I find the focus on vendors to the exclusion of other stakeholders a bit concerning. -- Jeff McAdams je...@iglou.com