Re: [whatwg] Remove addCueRange/removeCueRanges
Hi,

I see no reason why they should not be applicable to data URIs when it is obvious that the data URI is a media file. This has not yet been discussed, but would be an obvious use case.

OK. That would be welcome - although there could be syntactic problems as to where to place fragment parameters. BTW: did the start and end attribute implementations that you refer to cover the data scheme, too? Yes, on Apple's Safari at the time. I had a working prototype which no longer works.

Or if you really wanted to do it in javascript, you'd only need to reload the resource:

Of course we want to do this dynamically in JavaScript - IMHO it would be the norm, not the exception, to select fragments based on user input. Precomputed fragments are of limited use. I don't quite understand why the dynamic case is so often underrepresented in these discussions...

http://open.bbc.co.uk/rad/demos/html5/rdtv/episode2/index.html This example from the BBC shows how to dynamically jump to fragments based on user input by setting the currentTime of the video. I don't see a difference between using currentTime and using start and end. Precision is influenced more strongly by the temporal resolution of the decoding pipeline than by the polling resolution for currentTime.

I agree regarding 'start'. W.r.t. 'end', the difference is quite simple IMHO: when treated as a declarative description of where audio has to stop, it is up to the UA to implement it correctly (there could be some friendly browser competition as to who is most accurate...). I see no practical reason why this could not be done sample-accurately for media types such as 16-bit PCM WAVE that support sample-accurate work. 'currentTime', on the other hand, has no such semantics. So you would be looking at a bewildering array of factors influencing temporal resolution, including JS machine speed, and you could not make any guarantees to your customer.
In speech applications, sub-millisecond resolution matters as to whether you hear an audible artifact or not. If your task were, say, to remove such artifacts from the signal, a 'currentTime'-based solution in all likelihood would not give reproducible results.

I doubt the previous implementations of start and end gave you a 3-sample-accurate resolution even for wav files.

'end' was quite inaccurate in Safari, but this may be due to a buffering issue - to this day, Safari's audio latency when doing something like onmouseover='audio.play();' is much higher than Firefox's. So, to my mind it seems a solvable UA implementation issue, not a problem with semantics.

Kind regards,
-- Markus

_ SVOX AG, Baslerstr. 30, CH-8048 Zürich, Switzerland Dr. Markus Walther, Software Engineer Speech Technology Tel.: +41 43 544 06 36 Fax: +41 43 544 06 01 Mail: walt...@svox.com This e-mail message contains confidential information which is for the use of the addressee(s) only. Please notify the sender by return e-mail or call us immediately, if you erroneously received this e-mail.
Re: [whatwg] Remove addCueRange/removeCueRanges
Max Romantschuk wrote:

I'll chime in here, having done extensive work with audio and video codecs. With current codec implementations, getting sample- or frame-accurate resolution is largely a pipe dream (outside of the realm of platforms dedicated to content production and playback). Especially for video there can be several seconds between keyframes, with frame-accurate jumps requiring complex buffering tricks.

Quick feedback: I completely agree it is illusory to guarantee sample accuracy across codecs, and I never meant to imply such a requirement. The much weaker goal I would propose is to support at least one simple lossless audio format in this regard (I am not qualified to comment on the video case). Simple means 'simple to generate, simple to decode', and PCM WAVE meets these requirements, so it would be an obvious candidate. For that candidate at least, I think one could give sample-accurate implementations of subinterval selection - tons of audio applications demonstrate this is possible.

-- Markus
Re: [whatwg] Remove addCueRange/removeCueRanges
Hi,

The .start/.end properties were dropped in favor of media fragments, which the Media Fragments Working Group is producing a spec for.

Who decided this? Has this decision been made public on this list?

It will be something like http://www.example.com/movie.mov#t=12.33,21.16

var audioObject = new Audio();
audioObject.src = 'data:audio/x-wav;base64,UklGRiIAAABXQVZFZm10IBABAAEAIlYAAESsAAACABAAZGF0Yf7///8A';
// play entire audio
audioObject.play();
// play (0.54328,0.72636) media fragment ?

See http://www.w3.org/2008/01/media-fragments-wg.html and http://www.w3.org/2008/WebVideo/Fragments/wiki/Syntax#Examples

Did you look at these yourself? I couldn't find anything that approaches a spec of comparable quality to the WHATWG's in these pages. Is there any provision for the dynamic case, where you want to change the media fragment after it has been loaded, with zero server interaction, and working for data URIs as well?

Actually, out of curiosity: could gapless concatenation of several audio objects be added as well, e.g.

audioObject1.append(audioObject2)

or even

audioObject.join([audioObject1, audioObject2, ..., audioObjectN])

There has been much discussion about audio canvas APIs, and I trust this could fit into that scope.

As the 'inventor' of the term, I am of course familiar with the discussion - here I was merely adding an item to the wishlist.

View source at http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#video and search for v2 and you'll find some of these ideas.

Could these be lifted from hidden HTML comments to something with better visibility somehow?

-- Markus
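For the static case, the proposed syntax is simple enough to sketch as a URL helper (a sketch against the draft #t=start,end form cited above; the function name is made up, and whether a UA would honor a fragment change without re-downloading is exactly the open question raised here):

```javascript
// Hypothetical helper: replace a URL's fragment with a temporal
// media fragment per the draft #t=start,end syntax.
function withTimeFragment(url, startSec, endSec) {
  const base = url.split('#')[0]; // drop any existing fragment
  return base + '#t=' + startSec + ',' + endSec;
}

console.log(withTimeFragment('http://www.example.com/movie.mov', 12.33, 21.16));
// → http://www.example.com/movie.mov#t=12.33,21.16
```

Note this only constructs a string; it says nothing about the dynamic, data-URI, or zero-server-interaction cases asked about above.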
Re: [whatwg] Remove addCueRange/removeCueRanges
Silvia,

2009/8/13 Dr. Markus Walther walt...@svox.com:

please note that with cue ranges removed, the last HTML 5 method to perform audio subinterval selection is gone.

Not quite. You can always use the video.currentTime property in JavaScript to jump directly to a time offset in a video. And in your JavaScript you can check this property until it arrives at your determined end time. So there is a way to do this even now.

How can a polling approach that monitors currentTime meet any halfway-decent accuracy requirements? E.g. to be accurate to 1-3 samples at 22050 Hz sampling frequency? I doubt your approach could fulfill this.

To my mind, the current turn of events simply suggests allowing the start/end attributes back into the WHATWG spec, eased by the fact that there were already browser implementations of them.

-- Markus
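The accuracy objection is easy to put numbers on. A back-of-the-envelope sketch (the 10 ms timer is an assumed best case; real setInterval granularity in browsers of the time was often worse):

```javascript
// Samples elapsing between successive currentTime polls.
function samplesPerPoll(pollIntervalMs, sampleRateHz) {
  return (pollIntervalMs * sampleRateHz) / 1000;
}

// Assumed 10 ms polling timer against 22050 Hz audio: ~220 samples
// of uncertainty per poll - two orders of magnitude worse than the
// 1-3 sample target mentioned above.
console.log(samplesPerPoll(10, 22050)); // 220.5
```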
Re: [whatwg] Remove addCueRange/removeCueRanges
Silvia Pfeiffer wrote:

2009/8/14 Dr. Markus Walther walt...@svox.com:

Hi,

The .start/.end properties were dropped in favor of media fragments, which the Media Fragments Working Group is producing a spec for.

Who decided this? Has this decision been made public on this list?

It will be something like http://www.example.com/movie.mov#t=12.33,21.16

var audioObject = new Audio();
audioObject.src = 'data:audio/x-wav;base64,UklGRiIAAABXQVZFZm10IBABAAEAIlYAAESsAAACABAAZGF0Yf7///8A';
// play entire audio
audioObject.play();
// play (0.54328,0.72636) media fragment ?

Not in this way. In fact, the new way will be much, much simpler and does not require javascript.

With the code snippet given I was pointing out that it is not obvious (to me at least) how the proposed media fragment solution covers data URIs. If it is not meant to cover them, it is limited in a way that the solution it seeks to replace is not.

Or if you really wanted to do it in javascript, you'd only need to reload the resource:

Of course we want to do this dynamically in JavaScript - IMHO it would be the norm, not the exception, to select fragments based on user input. Precomputed fragments are of limited use. I don't quite understand why the dynamic case is so often underrepresented in these discussions...

-- Markus
Re: [whatwg] Remove addCueRange/removeCueRanges
Hi,

please note that with cue ranges removed, the last HTML 5 method to perform audio subinterval selection is gone. AFAIK, when support for the 'start' and 'end' attributes was dropped, it was noted on this list that cue ranges would provide a replacement to dynamically select, say, a 3-second range from a 1-hour audio source.

So, if cue ranges will indeed be dropped, could browser vendors and standards people consider putting 'start' and 'end' back in, just like Safari had it for a while (albeit buggy)?

Actually, out of curiosity: could gapless concatenation of several audio objects be added as well, e.g.

audioObject1.append(audioObject2)

or even

audioObject.join([audioObject1, audioObject2, ..., audioObjectN])

Just my 2c.

--Markus

Philip Jägenstedt wrote:

Hi,

We would like to request that addCueRange/removeCueRanges be dropped from the spec before going into Last Call. We are not satisfied with it and want to see it replaced with a solution that includes (scriptless) timed text (a.k.a. captions/subtitles). I don't think that this will be finished in time for Last Call, however, because we need implementor experience to write a good spec. However, we have no intention of implementing both cue ranges and their replacement, so it is better if the spec doesn't provide any solution for now. I have been briefly in contact with other browser vendors, and while I cannot speak for them here, I hope those that agree will chime in if necessary.
Re: [whatwg] Codecs for audio and video
Ian Hickson wrote:

On Tue, 30 Jun 2009, Matthew Gregan wrote: Is there any reason why PCM in a Wave container has been removed from HTML 5 as a baseline for audio?

Having removed everything else in these sections, I figured there wasn't that much value in requiring PCM-in-Wave support. However, I will continue to work with browser vendors directly and try to get a common codec at least for audio, even if that is just PCM-in-Wave.

Please, please do so - I was shocked to read that PCM-in-Wave as the minimal 'consensus' container for audio is under threat of removal, too. Frankly, I don't understand why audio was drawn into this. Is there any patent issue with PCM-in-Wave? If not, then IMHO the decision should be orthogonal to video.

-- Markus
Re: [whatwg] Codecs for audio and video
Gregory Maxwell wrote:

PCM in wav is useless for many applications: you're not going to do streaming music with it, for example. It would work fine for sound effects...

The world in which web browsers live is quite a bit bigger than the internet and ordinary consumer use combined... Browser-based intranet applications for companies working with professional audio or speech are but one example. Please see my earlier contributions to this list for more details.

but it still is more code to support, a lot more code in some cases depending on how the application is layered, even though PCM wav itself is pretty simple. And what exactly does PCM wav mean? Float samples? 24-bit integers? 16-bit? 8-bit? ulaw? Big-endian? 2 channel? 8 channel? Is a correct duration header mandatory?

To give one specific point in this matrix: 16-bit integer samples, little-endian, 1 channel, correct duration header not mandatory. This is relevant in practice in what we do. I can't speak for others.

It would be misleading to name a 'partial baseline'. If the document can't manage to make a complete workable recommendation, why make one at all?

I disagree. Why insist on perfection here? In my view, the whole of HTML 5 as discussed here is about reasonable compromises that can be supported now or pretty soon. As the browsers which already support PCM wav (e.g. Safari, Firefox) show, it isn't impossible to get this right.

Regards,
-- Markus
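The specific point in the matrix named above can be pinned down mechanically. A sketch of a header check for exactly that profile (offsets follow the canonical 44-byte RIFF/WAVE layout; assuming the 'fmt ' chunk sits directly after the RIFF header, which holds for simple writers but not for every file in the wild):

```javascript
// Check a WAV header for: PCM, 16-bit integer samples, 1 channel.
// WAV files are little-endian by definition, so no endianness field
// needs checking.
function isPcm16Mono(bytes) {
  const ascii = (off, n) =>
    String.fromCharCode(...bytes.slice(off, off + n));
  const u16 = off => bytes[off] | (bytes[off + 1] << 8); // little-endian
  return ascii(0, 4) === 'RIFF' &&
         ascii(8, 4) === 'WAVE' &&
         u16(20) === 1 &&   // audio format 1 = PCM
         u16(22) === 1 &&   // 1 channel
         u16(34) === 16;    // 16 bits per sample
}
```

Note the function deliberately ignores the data-chunk length field, matching the 'correct duration header not mandatory' clause above.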
Re: [whatwg] video tag : loop for ever
Silvia Pfeiffer wrote:

I believe your use case of creating an audio editor through using the audio tag is a bit far-fetched. I don't think it lends itself to that kind of functionality.

Your belief is fine with me - you haven't seen the prototype running on Safari ;-)

You would not use the img tag to implement a picture editor either.

This is a non-compelling analogy that I already discussed on the list - IMHO it's simply a matter of taste whether to proliferate HTML elements or extend the API of an existing element a bit. For what it's worth, people _could_ have extended img instead of going for canvas ...

As for start/end attributes - I still believe that a javascript API for changing start and end times for playback is much more appropriate than changing attributes and expecting the media framework to react to the changed attribute values. If the main use case for you is dynamic and not static, then you should have an interface with direct access to the video controls (i.e. directly run a function) rather than going through an attribute indirection (i.e. change state, which needs to trigger a function). Note that this does not imply a roundtrip to the server.

My use cases are neutral to the finer points you raise here - I would simply do a pause() before setting start/end to new values and calling play() again. If necessary, there could be restrictions in the spec on resetting those values during playback, or no guarantees from the UA against audible glitches. Either way is fine, just not dropping start/end altogether.

--Markus
[whatwg] Web-based dynamic audio apps - WAS: Re: video tag : loop for ever
Eric Carlson wrote:

Imagine e.g. an audio editor in a browser and the task play this selection of the oscillogram... Why should such use cases be left to the Flash 10 crowd (http://www.adobe.com/devnet/flash/articles/dynamic_sound_generation.html)? I for one want to see them become possible with open web standards!

I am anxious to see audio-related web apps appear too, I just don't think that including 'start' and 'end' attributes will make them significantly easier to write.

I did a tiny prototype of the above use case - an audio editor in a browser - and it would have been significantly easier to write, had Apple's Safari not had a bad and still unfixed bug in the implementation of 'end' (http://bugs.webkit.org/show_bug.cgi?id=19305).

In addition, cutting down on the number of HTTP transfers is generally advocated as a performance booster, so the ability to play sections of a larger media file using only client-side means might be of independent interest.

The 'start' and 'end' attributes, as currently defined in the spec, only limit the portion of a file that is played - not the portion of a file that is downloaded.

I know that, but for me that's not the issue at all. The issue is _latency_. How long from a user action to audible playback - that's what's relevant to any end user. You can't do responsive audio manipulation in the browser without fast, low-latency client-side computation. All the server-side proposals miss this crucial point. For another use case, consider web-based tools for DJs, for mixing and combining audio clips. There are a lot of clips on the web, but if manipulating them is not realtime enough, people won't care. For another use case, consider web-based games with dynamic audio, etc.

Robert O'Callahan wrote:

On Fri, Oct 17, 2008 at 5:24 AM, Dr. Markus Walther [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: Imagine e.g. an audio editor in a browser and the task play this selection of the oscillogram... Why should such use cases be left to the Flash 10 crowd (http://www.adobe.com/devnet/flash/articles/dynamic_sound_generation.html)?

If people go in that direction they won't be using cue ranges etc., they'll be using dynamic audio generation, which deserves its own API.

And I proposed the beginnings of such an API in several postings on this list under the topic 'audio canvas', but it seemingly met with little interest. Now Flash 10 has some of the things I proposed... maybe that's a louder voice?

OK, in principle you could use audio with data:audio/wav, but that would be crazy. Then again, this is the Web, so of course people will do that.

I did exactly that in my tiny audio-editor prototype for proof-of-concept purposes - I guess I must be crazy :-) Actually it was partly a workaround for browser bugginess, see above. Give me an API with at least

float getSample(long samplePosition)
putSample(long samplePosition, float sampleValue)
play(long samplePositionStart, unsigned long numSamples)

and sanity will be restored ;-) The current speed race w.r.t. the fastest JavaScript on the planet will then take care of the rest.

Silvia Pfeiffer wrote:

Linking to a specific time point or section in a media file is not something that needs to be solved by HTML. It is in fact a URI issue and is being developed by the W3C Media Fragments working group. If you use a URI such as http://example.com/mediafile.ogv#time=12-30 in the src attribute of the video element, you will not even have to worry about start and end attributes for the video element.

Unless media fragments can (a) be set dynamically for an already downloaded media file _without triggering re-download_, (b) have time specifications accurate to the individual sample in the audio case, (c) be finished by the W3C quickly enough, and (d) be taken seriously by browsers implementing the W3C recommendation, they are not an alternative for my use cases. It's all about dynamic audio and the future. By the time the spec hits the market, static media will not be the only thing on the web anymore.

Jonas Sicking wrote:

The problem with relying on cues is that audio plays a lot faster than we can guarantee that cue-callbacks will happen. So if you, for example, create an audio file with a lot of sound effects back to back, it is possible that a fraction of a second of the next sound will play before the cue-callback is able to stop it.

If I understand this correctly, cue-callback delay would potentially make it impossible to have the precise audio intervals needed for the above use cases. But _then_ replacing 'start' and 'end' with cue ranges and 'currentTime' is NOT possible, because they are no longer guaranteed to be equivalent in terms of precision. It seems the arguments are converging more towards keeping 'start' and 'end' in the spec.

-- Markus
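The three-method API asked for in this thread is small enough to mock in today's JavaScript against a plain Float32Array (everything here is hypothetical: the class name is made up, and play() merely returns the selected range, since sample-accurate playback is precisely the missing UA support):

```javascript
// Mock of the proposed getSample/putSample/play primitives.
class SampleBuffer {
  constructor(numSamples) {
    this.samples = new Float32Array(numSamples);
  }
  getSample(samplePosition) {
    return this.samples[samplePosition];
  }
  putSample(samplePosition, sampleValue) {
    this.samples[samplePosition] = sampleValue;
  }
  play(samplePositionStart, numSamples) {
    // Placeholder: return the selected range; a real implementation
    // would hand these samples to the audio output.
    return this.samples.subarray(samplePositionStart,
                                 samplePositionStart + numSamples);
  }
}

// Higher-level editor operations (the cut/silence/fade family
// discussed in the 'audio canvas' thread) reduce to loops over
// putSample, e.g. a linear fade-out:
function fadeOut(buf, start, len) {
  for (let i = 0; i < len; i++) {
    buf.putSample(start + i, buf.getSample(start + i) * (1 - i / len));
  }
}
```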
Re: [whatwg] video tag : loop for ever
Eric Carlson wrote:

On Oct 15, 2008, at 8:31 PM, Chris Double wrote: On Thu, Oct 16, 2008 at 4:07 PM, Eric Carlson [EMAIL PROTECTED] wrote: However I also think that playing just a segment of a media file will be a common use-case, so I don't think we need start and end either. How would you emulate end via JavaScript in a reasonably accurate manner?

With a cue point. If I have a WAV audio file and I want to start and stop between specific points? For example, a transcript of the audio may provide the ability to play a particular section of the transcript. If you use a script-based controller instead of the one provided by the UA, you can easily limit playback to whatever portion of the file you want:

SetTime: function(time) {
  // Clamp the requested time to the [_minTime, _maxTime] interval.
  this.elem.currentTime = (time < this._minTime) ? this._minTime
                        : (time > this._maxTime ? this._maxTime : time);
}

IMHO, using 'currentTime' and cue ranges is - while technically possible - a more cumbersome and roundabout way to delimit a single audio interval than just using 'start' and 'end' attributes. I advocate keeping the simple way to do it, with 'start' and 'end', in the spec. Also, since you just showed how it can be implemented using cue ranges and currentTime, having a second, simpler interface (for the case of a single interval) should be cheap in terms of implementation cost, if you plan to implement the other one anyway.

I agree that it is more work to implement a custom controller, but it seems a reasonable requirement given that this is likely to be a relatively infrequent usage pattern.

How do you know this will be infrequent? Or do you think that people will frequently want to limit playback to a section of a media file?

Yes, I think so - if people include those folks working with professional audio/speech/music production. More specifically the innovative ones among those, who would like to see audio-related web apps appear. Imagine e.g. an audio editor in a browser and the task play this selection of the oscillogram...
Why should such use cases be left to the Flash 10 crowd (http://www.adobe.com/devnet/flash/articles/dynamic_sound_generation.html)? I for one want to see them become possible with open web standards! In addition, cutting down on the number of HTTP transfers is generally advocated as a performance booster, so the ability to play sections of a larger media file using only client-side means might be of independent interest.

-- Markus
Re: [whatwg] Audio canvas?
I think an interesting approach for an audio canvas would be to allow you both to manipulate audio data directly (through a getSampleData/putSampleData-type interface) and to build up an audio filter graph, with some predefined filters/generators and with the ability to write filters in javascript. Would make for some interesting possibilities, esp. if it's able to take audio as input.

I entirely agree. In my own proposal so far I only mentioned simple time-domain ops (cut/add silence/fade) as filters - what would be the filters/generators on your wishlist?

-- Markus
Re: [whatwg] Audio canvas?
ddailey wrote:

I recall a little app called soundEdit (I think) that ran on the Mac back in the mid 1980's. I think it was shareware (at least it was ubiquitous). The editing primitives were fairly cleanly defined and had a reasonable metaphoric correspondence to the familiar drawing actions. There was a thing where you could grab a few seconds of sound, copy it and paste it; you could drag and drop; you could invert (by just subtracting each of the tones from a ceiling); you could reverse (by inverting the time axis). You could even go in with your mouse and drag formants around. It was pretty cool. It would not be a major task for someone to standardize such an interface, and I believe any patents would be expired by now.

No need to go to particular _applications_ for inspiration when libraries developed with some generality in mind (e.g. http://www.speech.kth.se/snack/man/snack2.2/tcl-man.html) can serve as inspiration already. A carefully chosen subset of Snack might be a good start.

David

- Original Message - From: Dave Singer [EMAIL PROTECTED] To: whatwg@lists.whatwg.org Sent: Wednesday, July 16, 2008 2:25 PM Subject: Re: [whatwg] Audio canvas?

At 20:18 +0200 16/07/08, Dr. Markus Walther wrote: get/setSample(samplePoint t, sampleValue v, channel c). For the sketched use case - in-browser audio editor - functions on sample regions from {cut/add silence/amplify/fade} would be nice and were mentioned as an extended possibility, but that is optional. I don't understand the reference to MIDI, because my use case has no connection to musical notes; it's about arbitrary audio data, on which MIDI has nothing to say.

get/set sample are 'drawing primitives' that are the equivalent of getting/setting a single pixel in images. Yes, you can draw anything a pixel at a time, but it's mighty tedious.
You might want to lay down a tone, or some noise, or shape the sound with an envelope, or do a whole host of other operations at a higher level than sample-by-sample, just as canvas supports drawing lines, shapes, and so on. That's all I meant by the reference to MIDI.

I see. However, to repeat what I said previously: audio =/= music. The direction you're hinting at would truly justify inventing a new element, since it sounds like it's specialized to synthesized music. But that's a pretty narrow subset of what audio encompasses.

Regarding the tediousness of doing things one sample at a time, I agree, but maybe it's not as bad as it sounds. It depends on how fast JavaScript gets, and SquirrelFish is a very promising step (since the developers acknowledge they learnt the lessons from Lua, the next acceleration step could be to copy ideas from LuaJIT, the extremely fast Lua-to-machine-code JIT compiler). If it gets fast enough, client-side libraries could do amazing stuff using sample-at-a-time primitives. Still, as I suggest above, a few higher-level methods could be useful.

-- Markus
[whatwg] Audio canvas?
I have noted an asymmetry between canvas and audio: canvas supports loading of ready-made images _and_ pixel manipulation (get/putImageData); audio supports loading of ready-made audio but _not_ sample manipulation.

With browser JavaScript getting faster all the time (SquirrelFish...), audio manipulation in the browser is within reach, if supported by rich enough built-in objects. Minimally, sample-accurate methods would be needed to

- get/set a sample value v at sample point t on channel c of the audio
- play a region from sample point t1 to sample point t2

(Currently, everything is specified using absolute time, so rounding errors might prevent sample-accurate work.) More powerful methods might cut/add silence/amplify/fade portions of audio in a sample-accurate way. It would be OK if this support were somewhat restricted, e.g. only for certain uncompressed audio formats such as PCM WAVE.

Question: what do people think about making audio more like canvas as sketched above?

-- Markus
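The rounding worry in the parenthesis is easy to demonstrate: a sample index converted to seconds and then coarsened to, say, millisecond precision (as a time-valued field might be) no longer identifies the same sample. A sketch with illustrative numbers:

```javascript
// Convert between sample index and absolute time at a given rate.
function toSeconds(sample, rate) { return sample / rate; }
function toSample(seconds, rate) { return Math.round(seconds * rate); }

const rate = 22050;

// A full-precision float round-trip is fine...
console.log(toSample(toSeconds(12345, rate), rate)); // 12345

// ...but truncating the time to millisecond precision loses samples:
const t = Math.floor(toSeconds(12345, rate) * 1000) / 1000;
console.log(toSample(t, rate)); // 12326 - 19 samples early
```

Addressing by sample index, as proposed above, sidesteps the problem entirely.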
Re: [whatwg] Audio canvas?
My understanding of HTMLMediaElement is that the currentTime, volume and playbackRate properties can be modified live. So in a way Audio is already like Canvas: the developer modifies things on the go. There are no automated animations/transitions as in SVG, for instance. Doing a cross-fade in Audio is done exactly the same way as in Canvas.

That's not what I described, however. Canvas allows access to the most primitive element with which an image is composed, the pixel. Audio does not allow access to the sample, which is the equivalent of the pixel in the sound domain. That's a severe limitation. Using tricks with data URIs and a known simple audio format such as PCM WAVE is no real substitute, because JavaScript strings are immutable. It is unclear to me why content is still often seen as static by default - if desktop apps are moved to the browser, images and sound will increasingly be generated and modified on the fly, client-side.

And if you're thinking special effects (e.g. delay, chorus, flanger, band pass, ...), remember that with Canvas, advanced effects require trickery and compositing multiple Canvas elements.

I have use cases in mind like an in-browser audio editor for music or speech applications (think 'CoolEdit/Audacity in a browser'), where doing everything server-side would be prohibitive due to the amount of network traffic.

--Markus
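The data-URI trick dismissed above as 'no real substitute' looks like this in practice - a sketch that writes the canonical 44-byte header for 16-bit mono PCM around raw samples (the clamping and the audio/x-wav MIME type are choices made here, not requirements):

```javascript
// Encode float samples in [-1, 1] as a minimal 16-bit mono PCM WAVE file.
function encodeWav(samples, sampleRate) {
  const n = samples.length, dataLen = n * 2;
  const buf = new ArrayBuffer(44 + dataLen);
  const v = new DataView(buf);
  const str = (off, s) => {
    for (let i = 0; i < s.length; i++) v.setUint8(off + i, s.charCodeAt(i));
  };
  str(0, 'RIFF'); v.setUint32(4, 36 + dataLen, true); str(8, 'WAVE');
  str(12, 'fmt '); v.setUint32(16, 16, true);  // fmt chunk size
  v.setUint16(20, 1, true);                    // audio format 1 = PCM
  v.setUint16(22, 1, true);                    // mono
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, sampleRate * 2, true);       // byte rate
  v.setUint16(32, 2, true);                    // block align
  v.setUint16(34, 16, true);                   // bits per sample
  str(36, 'data'); v.setUint32(40, dataLen, true);
  for (let i = 0; i < n; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    v.setInt16(44 + i * 2, s * 0x7fff, true);
  }
  return new Uint8Array(buf);
}
```

In a browser one would then set audio.src = 'data:audio/x-wav;base64,' + btoa(String.fromCharCode(...encodeWav(samples, rate))) - which is exactly where string immutability bites: every single-sample edit means re-encoding the whole URI.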
Re: [whatwg] Audio canvas?
Thanks for all the feedback so far!

Dave Singer wrote:

As others have pointed out, I think you're asking for a new element, where you can 'draw' audio as well as pre-load it, just like canvas, where you can load pictures and also draw them. This is not the audio element, any more than canvas is the img element.

Not sure I agree. Your line of reasoning in general leads to a proliferation of elements, whereas my proposal to extend audio makes that same element more powerful. I guess it's more a matter of aesthetics which approach is better.

It's an interesting idea, but you'd have to answer 'what are your drawing primitives', and so on. More, when creating visual content, you are drawing on spatial axes, whereas in audio you are creating or modifying samples, which lie on a temporal axis.

I agree, and I think I pointed that out already in my initial posting.

I'm guessing that something like MIDI would provide the drawing primitives, but overall this idea would seem to need a lot of working out...

Again, in that initial posting I was quite specific about an initial set of 'drawing' primitives - audio-manipulation primitives - minimally get/setSample(samplePoint t, sampleValue v, channel c). For the sketched use case - an in-browser audio editor - functions on sample regions from {cut/add silence/amplify/fade} would be nice and were mentioned as an extended possibility, but that is optional. I don't understand the reference to MIDI, because my use case has no connection to musical notes; it's about arbitrary audio data, on which MIDI has nothing to say.

-- Markus