Re: [whatwg] re-thinking cue ranges
On Wed, 9 Jul 2008, Dave Singer wrote: One of the features proposed for the next version of the video API is chapter markers and other embedded timed metadata, with corresponding callbacks for authors to hook into. Would that resolve the problem you mention? It may be that if we can define a way to embed cue-range-generating meta-data in the media resource, with an abstract 'api' to get it out, we'd deal with the only add by script issue here, yes. Ok. Overall, we remain concerned that typically it is the media author who would define what the ranges are, not really the page or particularly the script author. Media authors tend not to be happy writing scripts. I totally agree, but that's what the in-media annotations, and future APIs that deal with them, are for. JavaScript is really the only concern from HTML5's point of view; if other languages become relevant, they should get specially-crafted APIs for them when it comes to this kind of issue. The problem is that the current API more or less requires use of closures and currying except for trivial cases. We don't think that is good API design even for languages that have them. Perhaps at the very least a cookie could be passed? Done. Secondly this mechanism is not very powerful. You can't do anything else with the ranges besides receiving callbacks and removing them. You can't modify them. They are not visible to scripts or CSS. You can't link to them. You can't link out from them. I'm not sure what it would really mean to link to or from a range, unless you turned the entire video into a link, in which case you can just wrap the video in an a href= element for the duration of the range, using script. Linking into a cue-range would be using its beginning or end as a seek point, or its duration as a restricted view of the media (only show me cue-range called InTheBathroom). 
Linking out of a cue-range would be establishing a click-through URL that would be dispatched directly if the user clicked on the media during that range (dispatched without script). We agree that neither of these should be in scope now, but it would be nice to have a framework that could be extended to cover these, in future. Jumping into a point of video is supported using other aspects of the API (setting 'currentTime'); looping a certain part similarly has a dedicated API ('loopStart' etc). I don't know that we'd ever want to use the cue ranges for those purposes. I don't really understand the use cases. Thirdly, a script is somewhat strange place to define the ranges. A set of ranges usually relates closely to some particular piece of media content. The same set of ranges rarely makes much sense in the context of some other content. It seems that ranges should be defined or supplied along with the media content. For in-band data, callbacks for chapter markers as mentioned earlier seem like the best solution. For out-of-band data, if the ranges are just intended to trigger script, I don't think we gain much from providing a way to mark up ranges semi- declaratively as opposed to just having HTML-based media players define their own range markup and have them implement it using this API. It wouldn't be especially hard. This seems to conflict with the answer (1) above, doesn't it? How so? Fourth, this kind of callback API is pretty strange creature in the HTML specification. The only other callback APIs are things like setTimeout() and the new SQL API which don't have associated elements. Events are the callback mechanism for everything else. Events use callbacks themselves, so it's not that unusual. I don't really think events would be a good interface for this. Consistency is good, but if one can come up with a better API, it's better to use that than just be consistent for the sake of it. 
It does seem strange that events are right in the spatial domain (mouse enter/exit), but not in the temporal domain. Yet the basic semantic of the english word event, let alone the web meaning, is pretty well exactly matched by what is happening here -- crossing a temporal boundary! Events are well-known and design uniformity suggests that they be used, if nothing else. An event is fired whenever a cue range is entered or exited (timeupdate), but I really don't think events are appropriate for the cue ranges themselves. To start with, it would decouple the range registration from the event registration. It would also mean losing the ability to register event listeners for cue ranges of a particular class rather than all of them. I'm also not sure we really want the whole capture/bubble mechanism for these callbacks, not to mention the ability for one callback to cancel another one, etc. Events just seem like a very blunt and heavy weapon for this task. In SMIL the equivalent
Re: [whatwg] re-thinking cue ranges
On Tue, 2008-07-22 at 09:58 +, Ian Hickson wrote:
> On Wed, 9 Jul 2008, Dave Singer wrote:
> On Sat, 12 Jul 2008, Philip Jägenstedt wrote:
> > Like Dave, I am not terribly enthusiastic about the current cue ranges
> > spec, which strikes me as adding a fair amount of complexity and yet
> > doesn't solve the basic use case in a straightforward manner.
>
> What are the use cases you think are basic? It's unclear to me what
> isn't being solved. Here's one use case, a slide deck:

The most obvious use case in my mind is displaying captions/subtitles.

> > I agree that proper events make a lot of sense here instead of
> > callbacks. We could use some new event -- CueEvent maybe -- which would
> > actually include the start and stop times and a reference to the target
> > HTMLMediaElement. I might suggest a modified addCueRange which takes a
> > data argument which is also passed along in the event object.
>
> Does the identifier argument address this sufficiently?

Yes, it makes sense and should eliminate the need for closures in most cases.

-- 
Philip Jägenstedt
Opera Software
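For readers following along, here is a minimal sketch of why an identifier (or data) argument removes the need for one closure pair per range. It is not the HTML5 draft API: the class name CueTracker, its update() method and the argument order of addCueRange() are all hypothetical, and the media element is replaced by plain numbers so the snippet runs outside a browser.

```javascript
// Hypothetical helper, not part of any HTML5 draft: it tracks which cue
// ranges contain the current playback time and reports enter/exit together
// with the range's identifier and data.
class CueTracker {
  constructor(onEnter, onExit) {
    this.ranges = [];         // entries of the form { id, start, end, data }
    this.active = new Set();  // ids of ranges that contained the last time seen
    this.onEnter = onEnter;
    this.onExit = onExit;
  }

  addCueRange(id, start, end, data) {
    this.ranges.push({ id, start, end, data });
  }

  // Call with the media element's currentTime, e.g. from a timeupdate listener.
  update(currentTime) {
    for (const range of this.ranges) {
      const inside = currentTime >= range.start && currentTime < range.end;
      const wasInside = this.active.has(range.id);
      if (inside && !wasInside) {
        this.active.add(range.id);
        this.onEnter(range);
      } else if (!inside && wasInside) {
        this.active.delete(range.id);
        this.onExit(range);
      }
    }
  }
}

// One shared pair of handlers serves every range; the id tells them apart.
const log = [];
const tracker = new CueTracker(
  (range) => log.push(`enter ${range.id}`),
  (range) => log.push(`exit ${range.id}`)
);
tracker.addCueRange("slide1", 0, 5, { src: "slide1.png" });
tracker.addCueRange("slide2", 5, 10, { src: "slide2.png" });

// Simulated playback positions; the last one stands for a seek backwards.
[1, 6, 2].forEach((time) => tracker.update(time));
console.log(log);
```

Because the handlers receive the range itself, a slide deck or caption display only needs to look at range.id or range.data, whichever the author finds more convenient.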
Re: [whatwg] re-thinking cue ranges
On Tue, 22 Jul 2008, Philip Jägenstedt wrote:
> > What are the use cases you think are basic? It's unclear to me what
> > isn't being solved. Here's one use case, a slide deck:
>
> The most obvious use case in my mind is displaying captions/subtitles.

I'd much, much rather subtitles were done by the user agent natively, based on captions actually included in the video data. We shouldn't rely on authors to provide accessibility features.

Having said that, changing the code I gave in my last e-mail to support captions is pretty trivial -- simply add an exit callback that empties the current subtitles display. The rest is basically the same.

-- 
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
Re: [whatwg] re-thinking cue ranges
On Tue, 2008-07-22 at 22:00 +, Ian Hickson wrote:
> On Tue, 22 Jul 2008, Philip Jägenstedt wrote:
> > > What are the use cases you think are basic? It's unclear to me what
> > > isn't being solved. Here's one use case, a slide deck:
> >
> > The most obvious use case in my mind is displaying captions/subtitles.
>
> I'd much, much rather subtitles were done by the user agent natively,
> based on captions actually included in the video data. We shouldn't rely
> on authors to provide accessibility features.

Given how unreliable embedded subtitles tend to be in desktop media players (at least in my experience), I think it's very likely that someone will write a JavaScript SRT parser library to use with this API rather than hoping that the embedded subtitles can be reliably detected in all the different combinations of media frameworks and browsers. I guess standardizing on an embedded caption/subtitle format might be possible after we have actually decided on baseline codecs, though...

> Having said that, changing the code I gave in my last e-mail to support
> captions is pretty trivial -- simply add an exit callback that empties
> the current subtitles display. The rest is basically the same.

Indeed, I expect that some would even abuse the id parameter to pass the caption text directly, although that isn't very elegant.

-- 
Philip Jägenstedt
Opera Software
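To make the "JavaScript SRT parser library" idea concrete, here is a rough sketch of what such a library might do. It assumes well-formed SubRip input (a counter line, a "start --> end" line, then the caption text, with blocks separated by blank lines); SRT has no formal specification, so real files may need more forgiving handling. The function names parseSrt, srtTimeToSeconds and captionAt are made up for this illustration.

```javascript
// Convert an SRT timestamp such as "00:01:02,500" to seconds.
function srtTimeToSeconds(stamp) {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!match) throw new Error(`Bad SRT timestamp: ${stamp}`);
  const [, h, m, s, ms] = match;
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Split SRT text into cue objects; assumes every block is well formed.
function parseSrt(text) {
  return text
    .replace(/\r\n?/g, "\n")   // normalise line endings
    .trim()
    .split(/\n{2,}/)           // blank lines separate the blocks
    .map((block) => {
      const lines = block.split("\n");
      const [startStamp, endStamp] = lines[1].split("-->");
      return {
        id: lines[0].trim(),
        start: srtTimeToSeconds(startStamp),
        end: srtTimeToSeconds(endStamp),
        text: lines.slice(2).join("\n"),
      };
    });
}

// Return the caption to display at a given playback time, or "" if none.
function captionAt(cues, currentTime) {
  const cue = cues.find((c) => currentTime >= c.start && currentTime < c.end);
  return cue ? cue.text : "";
}

const sample =
  "1\n00:00:01,000 --> 00:00:04,000\nHello there.\n\n" +
  "2\n00:00:05,500 --> 00:00:07,250\nSecond line,\nwrapped.\n";
const cues = parseSrt(sample);
console.log(captionAt(cues, 2));
```

Each parsed cue could then be registered as a cue range, with the enter callback showing cue.text and the exit callback emptying the display, which avoids stuffing the caption text into the id parameter.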
Re: [whatwg] re-thinking cue ranges
Hi Philip, Dave, all,

I agree with Philip and Dave that we need a simple way for video authors to include the cue ranges concept in video. As one of the authors of Annodex, I have been meaning to look over the HTML5 video element for a while and examine how its details work - sorry for my late contribution.

In Annodex we created a simple XML markup language called CMML (for Continuous Media Markup Language) that would turn time-continuous data such as audio and video into Web-style documents with the ability to define temporal segments (or events or cues or clips - call them whatever you prefer), attach a description and meta data to them, attach an outgoing hyperlink to them, and address these segments directly through URLs. If this feels almost like a web page, then that's exactly what we intended to achieve.

In addition to this author-controlled creation of cue ranges, we also allowed for the creation of temporal hyperlinks, which would point directly to a time-defined (dynamic) segment inside a video. This is now being examined more closely in the new W3C Media Fragments Working Group http://www.w3.org/2008/01/media-fragments-wg.html . But I digress...

Taking the definition of cue ranges out of HTML and including it in the media content itself, while providing a similarly simple markup language to create the segmentation, makes a lot of sense. Above everything else, it reduces the complexity of the HTML specification and puts the definition of the segmentation into the hands of the person who would create it: the video content author.

But you want to stay flexible with the segmentation, since it may be needed in multiple representations:

* you may want to have it burnt into the video such that every copy of the video retains the segmentation created by the author - for this case we created a representation of CMML that is a binary interleave of the original video file and the CMML, temporally multiplexed into it such that the right cues are aligned with the video data they refer to. The multiplexing is done to allow for live streaming of such cues with the video within one network connection. This is what we called an Annodex stream (annotated and indexed video).

* you may want to keep your cues and associated data in a database and only create the CMML and/or the Annodex stream upon a user request. This is similar to the dynamic creation of a Web page from a database.

* or you may indeed want to keep one or more cue range segmentations in separate CMML files alongside the original video file, to make the cues and annotations for a video available separately from the video (e.g. for use by a search engine crawler). Imagine Google could index deeply inside a video because the cues and annotations of the video are made available in a standard crawlable format.

In such a scenario, all you need to do in the video element is create a set of JavaScript API calls that can directly make use of the information defined in the CMML file, as is demonstrated in this video: http://au.youtube.com/watch?v=LbWb1dkvm0s

The code for this demo is available here: http://svn.annodex.net/browser_plugin/trunk/test/test.html .

Notice how the problem of addressing cues has been taken totally out of the JavaScript API - all we do in JavaScript is address time offsets. The semantics of the time offsets are stored in the annotations, which can be retrieved using their own JavaScript API call.

Cheers,
Silvia.
Re: [whatwg] re-thinking cue ranges
Just to add some of my thoughts on cue ranges.

Like Dave, I am not terribly enthusiastic about the current cue ranges spec, which strikes me as adding a fair amount of complexity and yet doesn't solve the basic use case in a straightforward manner. If I were a content author and looked at the available options to display subtitles, I would probably simply add a timeupdate event listener and use e.target.currentTime to decide on an action to take. While lexical closures are fun and useful, depending on them isn't terribly nice to those who don't have experience with functional programming (you can use ECMAScript without realizing that it's a functional language, so it doesn't count).

I agree that proper events make a lot of sense here instead of callbacks. We could use some new event -- CueEvent maybe -- which would actually include the start and stop times and a reference to the target HTMLMediaElement. I might suggest a modified addCueRange which takes a data argument which is also passed along in the event object.

If we support external annotations we need some open format for this which all browsers can support. I'm not very familiar with SMIL, but it looks like a Swiss army knife. Perhaps http://www.annodex.net/ is also worth a closer look: CMML is an HTML-like markup language for time-continuous data such as audio/video. Then there's the new http://www.w3.org/2008/01/media-annotations-wg.html which has a relevant-sounding name, but I'm not sure it really is.

Finally, has any browser implemented the current cue ranges API yet? If not, it's not too late to come up with something that we can all feel a bit happier about.

Philip

On Wed, 2008-07-09 at 10:37 +0200, Dave Singer wrote: OK, some comments back on the cue range design. Sorry for the summer-vacation-induced delay in response! At 1:00 + 12/06/08, Ian Hickson wrote: In the current HTML5 draft cue ranges are available using a DOM API. This way of doing ranges is less than ideal. First of all, it is hard to use.
The ranges must be added by script, can't be supplied with the media, and the callbacks are awkward to handle. The only way to identify the range a received callback applies to is by creating not one but two separate functions for each range: one for enter, one for exit. While creating functions on-demand is easy in JavaScript it does fall under advanced techniques that most authors will be unfamiliar with. One of the features proposed for the next version of the video API is chapter markers and other embedded timed metadata, with corresponding callbacks for authors to hook into. Would that resolve the problem you mention? It may be that if we can define a way to embed cue-range-generating meta-data in the media resource, with an abstract 'api' to get it out, we'd deal with the only add by script issue here, yes. The others, not so much. Using elements makes ranges identifiable, traversable and modifiable by using familiar APIs and concepts. However it is true that there are other ways to get some of the same functionality. Unless the elements have some non-scripting functionality (like linking) the case is perhaps not totally compelling. Instantiating ranges from custom markup using script is a possibility. Overall, we remain concerned that typically it is the media author who would define what the ranges are, not really the page or particularly the script author. Media authors tend not to be happy writing scripts. This kind of feature is also not available in all languages that might provide access to the DOM API. JavaScript is really the only concern from HTML5's point of view; if other languages become relevant, they should get specially-crafted APIs for them when it comes to this kind of issue. The problem is that the current API more or less requires use of closures and currying except for trivial cases. We don't think that is good API design even for languages that have them. Perhaps at the very least a cookie could be passed? 
Secondly this mechanism is not very powerful. You can't do anything else with the ranges besides receiving callbacks and removing them. You can't modify them. They are not visible to scripts or CSS. You can't link to them. You can't link out from them. I'm not sure what it would really mean to link to or from a range, unless you turned the entire video into a link, in which case you can just wrap the video in an a href= element for the duration of the range, using script. Linking into a cue-range would be using its beginning or end as a seek point, or its duration as a restricted view of the media (only show me cue-range called InTheBathroom). Linking out of a cue-range would be establishing a click-through URL that would be dispatched directly if the user clicked on the media during that range (dispatched without script). We agree that neither of these should be in
Re: [whatwg] re-thinking cue ranges
OK, some comments back on the cue range design. Sorry for the summer-vacation-induced delay in response! At 1:00 + 12/06/08, Ian Hickson wrote: In the current HTML5 draft cue ranges are available using a DOM API. This way of doing ranges is less than ideal. First of all, it is hard to use. The ranges must be added by script, can't be supplied with the media, and the callbacks are awkward to handle. The only way to identify the range a received callback applies to is by creating not one but two separate functions for each range: one for enter, one for exit. While creating functions on-demand is easy in JavaScript it does fall under advanced techniques that most authors will be unfamiliar with. One of the features proposed for the next version of the video API is chapter markers and other embedded timed metadata, with corresponding callbacks for authors to hook into. Would that resolve the problem you mention? It may be that if we can define a way to embed cue-range-generating meta-data in the media resource, with an abstract 'api' to get it out, we'd deal with the only add by script issue here, yes. The others, not so much. Using elements makes ranges identifiable, traversable and modifiable by using familiar APIs and concepts. However it is true that there are other ways to get some of the same functionality. Unless the elements have some non-scripting functionality (like linking) the case is perhaps not totally compelling. Instantiating ranges from custom markup using script is a possibility. Overall, we remain concerned that typically it is the media author who would define what the ranges are, not really the page or particularly the script author. Media authors tend not to be happy writing scripts. This kind of feature is also not available in all languages that might provide access to the DOM API. 
JavaScript is really the only concern from HTML5's point of view; if other languages become relevant, they should get specially-crafted APIs for them when it comes to this kind of issue. The problem is that the current API more or less requires use of closures and currying except for trivial cases. We don't think that is good API design even for languages that have them. Perhaps at the very least a cookie could be passed? Secondly this mechanism is not very powerful. You can't do anything else with the ranges besides receiving callbacks and removing them. You can't modify them. They are not visible to scripts or CSS. You can't link to them. You can't link out from them. I'm not sure what it would really mean to link to or from a range, unless you turned the entire video into a link, in which case you can just wrap the video in an a href= element for the duration of the range, using script. Linking into a cue-range would be using its beginning or end as a seek point, or its duration as a restricted view of the media (only show me cue-range called InTheBathroom). Linking out of a cue-range would be establishing a click-through URL that would be dispatched directly if the user clicked on the media during that range (dispatched without script). We agree that neither of these should be in scope now, but it would be nice to have a framework that could be extended to cover these, in future. Thirdly, a script is somewhat strange place to define the ranges. A set of ranges usually relates closely to some particular piece of media content. The same set of ranges rarely makes much sense in the context of some other content. It seems that ranges should be defined or supplied along with the media content. For in-band data, callbacks for chapter markers as mentioned earlier seem like the best solution. 
For out-of-band data, if the ranges are just intended to trigger script, I don't think we gain much from providing a way to mark up ranges semi- declaratively as opposed to just having HTML-based media players define their own range markup and have them implement it using this API. It wouldn't be especially hard. This seems to conflict with the answer (1) above, doesn't it? Fourth, this kind of callback API is pretty strange creature in the HTML specification. The only other callback APIs are things like setTimeout() and the new SQL API which don't have associated elements. Events are the callback mechanism for everything else. Events use callbacks themselves, so it's not that unusual. I don't really think events would be a good interface for this. Consistency is good, but if one can come up with a better API, it's better to use that than just be consistent for the sake of it. It does seem strange that events are right in the spatial domain (mouse enter/exit), but not in the temporal domain. Yet the basic semantic of the english word event, let alone the web meaning, is pretty well exactly matched by what is happening here -- crossing a temporal boundary! Events are well-known and design uniformity suggests
Re: [whatwg] re-thinking cue ranges
At 14:20 +1000 23/05/08, Silvia Pfeiffer wrote: Hi Dave, If the W3C standardises time ranges through a URI approach, would there still be a need to have a specification in the DOM or the HTML code? I think the two have different purposes and use-cases, don't they? I am talking about this planned activity http://www.w3.org/2008/01/media-fragments-wg.html and a scheme akin to the one mentioned here http://www.w3.org/2001/tag/doc/hash-in-url#id2261226 or specified here http://annodex.net/TR/draft-pfeiffer-temporal-fragments-03.html or here http://www.chiariglione.org/mpeg/technologies/mp21-fid/index.htm.htm. The idea is that if you specify the fragment of the media in the URL (e.g. in the src attribute of the video tag), there is no need to handle it anywhere in the HTML code itself. I am wondering about the use case for the timerange tag that you are suggesting - could you explain? As I see it (and others may have other ideas), * with fragments in the URL, I can identify sub-sets of a resource that I would like to present or select; * with timeranges in the markup, I can get notified when 'interesting parts' of the resource are being presented. An example use-case for timeranges is if you want to flip an HTML explanatory frame alongside a video. As the subject of the video changes, or the scene, you want to put different explanatory material in the adjacent frame. Timeranges make that easy; if you have N pages of explanation, which each apply to a sub-section of the video, make N timeranges and have the enter event of each flip in the appropriate explanation. Note that this works even with seeking, the way it's defined. There are, of course, other use cases. Does this help? Best Regards, Silvia. On Fri, May 23, 2008 at 4:53 AM, Dave Singer [EMAIL PROTECTED] wrote: WARNING: this email is sent to both the WhatWG and W3C Public HTML list, as it is a proposal. Please be careful about where you reply/follow-up to. 
The editors may have a preference (and if they do, I hope they express it). The following discussion is also in the attached proposal, but reproduced here for convenience. * * * * * * In the current HTML5 draft cue ranges are available using a DOM API. This way of doing ranges is less than ideal. First of all, it is hard to use. The ranges must be added by script, can't be supplied with the media, and the callbacks are awkward to handle. The only way to identify the range a received callback applies to is by creating not one but two separate functions for each range: one for enter, one for exit. While creating functions on-demand is easy in JavaScript it does fall under advanced techniques that most authors will be unfamiliar with. This kind of feature is also not available in all languages that might provide access to the DOM API. Secondly this mechanism is not very powerful. You can't do anything else with the ranges besides receiving callbacks and removing them. You can't modify them. They are not visible to scripts or CSS. You can't link to them. You can't link out from them. Thirdly, a script is somewhat strange place to define the ranges. A set of ranges usually relates closely to some particular piece of media content. The same set of ranges rarely makes much sense in the context of some other content. It seems that ranges should be defined or supplied along with the media content. Fourth, this kind of callback API is pretty strange creature in the HTML specification. The only other callback APIs are things like setTimeout() and the new SQL API which don't have associated elements. Events are the callback mechanism for everything else. In SMIL the equivalent concept is the area element which is used like this: video src=http://www.example.org/CoolStuff; area id=area1 begin=0s end=5s/ area id=area2 begin=5s end=10s/ /video This kind of approach has several advantages. * Ranges are defined as part of the document, in the context of a particular media stream. 
* This uses events, a more flexible and more appropriate callback mechanism. * The callbacks have a JavaScript object associated with them, namely a DOM element, which carries information about the range. The main disadvantage is the relative difficulty of creating ranges from JavaScript since it requires creating elements and giving them attributes. Some sort of shortcut interface could be provided, of course, perhaps similar to the existing API. The SMIL definition is perhaps a little broad and also the name is not ideal, if the element is primarily used for generating events vs. linking. We would like to suggest a timerange element that can be used as a child of the video and audio elements. Note that there is an existing concept called timeranges in the HTML5 specification; a new name needs to be found for one or the other. The event listeners should probably be added to HTMLElement where other listener
Re: [whatwg] re-thinking cue ranges
Hi Dave, If the W3C standardises time ranges through a URI approach, would there still be a need to have a specification in the DOM or the HTML code? I am talking about this planned activity http://www.w3.org/2008/01/media-fragments-wg.html and a scheme akin to the one mentioned here http://www.w3.org/2001/tag/doc/hash-in-url#id2261226 or specified here http://annodex.net/TR/draft-pfeiffer-temporal-fragments-03.html or here http://www.chiariglione.org/mpeg/technologies/mp21-fid/index.htm.htm. The idea is that if you specify the fragment of the media in the URL (e.g. in the src attribute of the video tag), there is no need to handle it anywhere in the HTML code itself. I am wondering about the use case for the timerange tag that you are suggesting - could you explain? Best Regards, Silvia. On Fri, May 23, 2008 at 4:53 AM, Dave Singer [EMAIL PROTECTED] wrote: WARNING: this email is sent to both the WhatWG and W3C Public HTML list, as it is a proposal. Please be careful about where you reply/follow-up to. The editors may have a preference (and if they do, I hope they express it). The following discussion is also in the attached proposal, but reproduced here for convenience. * * * * * * In the current HTML5 draft cue ranges are available using a DOM API. This way of doing ranges is less than ideal. First of all, it is hard to use. The ranges must be added by script, can't be supplied with the media, and the callbacks are awkward to handle. The only way to identify the range a received callback applies to is by creating not one but two separate functions for each range: one for enter, one for exit. While creating functions on-demand is easy in JavaScript it does fall under advanced techniques that most authors will be unfamiliar with. This kind of feature is also not available in all languages that might provide access to the DOM API. Secondly this mechanism is not very powerful. You can't do anything else with the ranges besides receiving callbacks and removing them. 
You can't modify them. They are not visible to scripts or CSS. You can't link to them. You can't link out from them.

Thirdly, a script is a somewhat strange place to define the ranges. A set of ranges usually relates closely to some particular piece of media content. The same set of ranges rarely makes much sense in the context of some other content. It seems that ranges should be defined or supplied along with the media content.

Fourth, this kind of callback API is a pretty strange creature in the HTML specification. The only other callback APIs are things like setTimeout() and the new SQL API, which don't have associated elements. Events are the callback mechanism for everything else.

In SMIL the equivalent concept is the area element, which is used like this:

    <video src="http://www.example.org/CoolStuff">
      <area id="area1" begin="0s" end="5s"/>
      <area id="area2" begin="5s" end="10s"/>
    </video>

This kind of approach has several advantages.

* Ranges are defined as part of the document, in the context of a particular media stream.
* This uses events, a more flexible and more appropriate callback mechanism.
* The callbacks have a JavaScript object associated with them, namely a DOM element, which carries information about the range.

The main disadvantage is the relative difficulty of creating ranges from JavaScript, since it requires creating elements and giving them attributes. Some sort of shortcut interface could be provided, of course, perhaps similar to the existing API.

The SMIL definition is perhaps a little broad and also the name is not ideal, if the element is primarily used for generating events vs. linking. We would like to suggest a timerange element that can be used as a child of the video and audio elements. Note that there is an existing concept called timeranges in the HTML5 specification; a new name needs to be found for one or the other.

The event listeners should probably be added to HTMLElement where other listener attributes are. (You should be able to capture events everywhere, not just on target.)

-- 
David Singer
Apple/QuickTime
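As an illustration only, the event-based design proposed above might behave roughly like the following sketch. Nothing here is specified anywhere: the event names "rangeenter" and "rangeexit", the RangeEvent and RangedMedia classes, and setCurrentTime() are all invented for the example, and only clock values in plain seconds (such as "5s") are handled. The standard EventTarget and Event interfaces stand in for DOM elements so the snippet can run without a document.

```javascript
// Hypothetical event type carrying the range it refers to.
class RangeEvent extends Event {
  constructor(type, range) {
    super(type);
    this.range = range; // the range descriptor the event is about
  }
}

// Parse a SMIL-like clock value limited to plain seconds, e.g. "5s" gives 5.
function parseSeconds(value) {
  const match = /^(\d+(?:\.\d+)?)s$/.exec(value);
  if (!match) throw new Error(`Unsupported clock value: ${value}`);
  return Number(match[1]);
}

// Stand-in for a media element whose child ranges were declared in markup.
class RangedMedia extends EventTarget {
  constructor(rangeAttributes) {
    super();
    this.ranges = rangeAttributes.map(({ id, begin, end }) => ({
      id,
      begin: parseSeconds(begin),
      end: parseSeconds(end),
      active: false,
    }));
  }

  // Stand-in for the media clock advancing or the user seeking.
  setCurrentTime(time) {
    for (const range of this.ranges) {
      const inside = time >= range.begin && time < range.end;
      if (inside !== range.active) {
        range.active = inside;
        const type = inside ? "rangeenter" : "rangeexit";
        this.dispatchEvent(new RangeEvent(type, range));
      }
    }
  }
}

// The same ranges as in the SMIL-style markup example.
const media = new RangedMedia([
  { id: "area1", begin: "0s", end: "5s" },
  { id: "area2", begin: "5s", end: "10s" },
]);

// A single listener handles every range; event.range says which one fired.
const seen = [];
media.addEventListener("rangeenter", (event) => seen.push(event.range.id));

media.setCurrentTime(7); // seek straight into the second range
media.setCurrentTime(1); // then seek back into the first
console.log(seen);
```

In a real document the events would presumably be dispatched at the range elements themselves and be observable from ancestors, which is the "capture events everywhere" point; this flat sketch does not model that.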