Re: [whatwg] Thoughts on video accessibility
On Thu, 16 Jul 2009 07:58:30 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote:

Hi Ian,

Great to see the new efforts to move the subtitle/caption/karaoke issues forward! I actually have a contract with Mozilla starting this month to help solve this, so I am more than grateful that you have proposed some ideas in this space.

On Thu, Jul 16, 2009 at 9:38 AM, Ian Hickson i...@hixie.ch wrote: On Sat, 27 Dec 2008, Silvia Pfeiffer wrote:

1. Timed text in the resource itself (or linked from the resource itself), rendered as part of the video automatically by the user agent.

For case 1, the practical implications are that browser vendors will have to develop support for a large variety of text codecs, each one providing different functionalities.

I would hope that as with a video codec, we can standardise on a single subtitle format, ideally some simple media-independent combination of SRT and LRC [1]. It's difficult to solve this problem without a standard codec, though.

I have myself thought about creating a new format to address the needs for time-aligned text in audio/video. However, the problem with creating a new format is that you start from scratch and already widespread formats are not supported. I can see that your proposed format is trying to be backwards compatible with SRT, so at least it would work for the large number of existing srt file collections. I am still skeptical, in particular because there are no authoring systems for this format around. But I would be curious what others think about your proposed SRT-LRC mix.

There are already more formats than you could possibly want on the scale between SRT (dumb text) and complex XML formats like DFXP or USF (used in Matroska). In my layman's opinion both extremes make sense, but I'm rather skeptical of anything in between.

In fact, the easiest solution would be if that particular format was really only HTML.

IMHO that would be absurd. HTML means scripting, embedded videos, an unbelievably complex rendering system, complex parsing, etc.; plus, what's more, it doesn't even support timing yet, so we'd have to add all the timing and karaoke features on top of it. Requiring that video players embed a timed HTML renderer just to render subtitles is like saying that we should ship Microsoft Word with every DVD player, to handle the user input when the user wants to type in a new chapter number to jump to.

I agree, it cannot be a format that contains all the complexity of HTML. It would only support a subpart of HTML that is relevant, plus the addition of timing - and in this case it is indeed a new format. I have therefore changed my mind since I sent that email in Dec 08 and am hoping we can do it with existing formats.

I think that eventually we will want timing/synchronization in HTML for synchronizing multiple video or audio tracks. As far as I can tell no browser wants to implement the addCueRange API (removing this should be the topic of a separate mail), so we really need to re-think this part, and I think that timed text plays an important part here.

In particular, I have taken an in-depth look at the latest specification from the Timed Text working group, which has put years of experimentation and decades of experience into developing DFXP. You can see my review of DFXP here: http://blog.gingertech.net/2009/06/28/a-review-of-the-w3c-timed-text-authoring-format/ . I think it is both too flexible in a lot of ways, but also too restrictive in others. However, it is a well formulated format that is also getting market traction.
In addition, it is possible to formulate profiles to add missing functionality.

If we want a quick and dirty hack, srt itself is probably the best solution. If we want a well thought-out solution, DFXP is probably a better idea. I am currently experimenting with these and will be able to share something soon for further discussion.

3. Timed text stored in a separate file, which is then parsed by the user agent and rendered as part of the video automatically by the browser.

Maybe we should consider solving this differently. Either we could encapsulate into the video container upon download. Or we could create a zip-file or tarball upon download. I'd just find it a big mistake to ignore the majority use case in the standard, which is why I proposed the text elements inside the video tag.

If browser vendors are willing to merge subtitles and video files when saving them, that would be great. Is this easy to do?

My suggestion was really about doing this server-side, which we have already implemented years ago in the Annodex project for Ogg Theora/Vorbis. However, it is also possible to do this in the browser: in the case of Ogg, the browser just needs to have a multiplexing library installed as well as a means to encode the subtitle file (which I like to call a text codec). Since it's text, it's nowhere near as complex as encoding audio or video and just consists of light-weight packaging code.
Re: [whatwg] Thoughts on video accessibility
Thanks for the analysis, but two pieces of feedback:

1) Though sub-titles and captions are the most common accessibility issue for audio/video content, they are not the only one. There are people:
-- who cannot see, and need audio description of video
-- who cannot hear, and prefer sign language
-- who have vision issues and prefer high or low contrast video
-- who have audio issues and prefer audio that lacks background music, noise, etc.
This is only a partial list. Note that some content is only available with open captions (aka burned-in). Clearly sub-optimal, but better than nothing.

2) I think the environment can and should help select and configure type-1 resources, where it can. It shouldn't always need to be a manual step by the user interacting with the media player. That is, I don't see why we cannot have the markup express "this source is better for people who have accessibility need X" (probably as a media query). However, media queries are CSS, not HTML...

-- David Singer Multimedia Standards, Apple Inc.
Re: [whatwg] Thoughts on video accessibility
On Thu, Jul 16, 2009 at 10:31 PM, David Singer sin...@apple.com wrote:

Thanks for the analysis, but two pieces of feedback:

1) Though sub-titles and captions are the most common accessibility issue for audio/video content, they are not the only one. There are people:
-- who cannot see, and need audio description of video
-- who cannot hear, and prefer sign language
-- who have vision issues and prefer high or low contrast video
-- who have audio issues and prefer audio that lacks background music, noise, etc.
This is only a partial list. Note that some content is only available with open captions (aka burned-in). Clearly sub-optimal, but better than nothing.

Agreed. Plus there is time-aligned textual markup that is not just subtitles, captions, lyrics and karaoke: much is being talked about timed metadata these days, and clickable regions, as well as spatial and temporal notes. The lowest hanging fruit for such time-aligned text are, however, indeed subtitles and captions.

2) I think the environment can and should help select and configure type-1 resources, where it can. It shouldn't always need to be a manual step by the user interacting with the media player. That is, I don't see why we cannot have the markup express "this source is better for people who have accessibility need X" (probably as a media query). However, media queries are CSS, not HTML...

Would you mind providing an example that demonstrates the use of media queries? I cannot currently imagine what that could look like and how it could work. Feel free to use CSS in addition to any required HTML (and JavaScript?). Since I cannot imagine what that would look like and how it could work, I cannot start to understand it as an alternative.

Thanks, Silvia.
Re: [whatwg] Thoughts on video accessibility
On Thu, Jul 16, 2009 at 6:28 PM, Philip Jägenstedt phil...@opera.com wrote: On Thu, 16 Jul 2009 07:58:30 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote:

3. Timed text stored in a separate file, which is then parsed by the user agent and rendered as part of the video automatically by the browser.

Maybe we should consider solving this differently. Either we could encapsulate into the video container upon download. Or we could create a zip-file or tarball upon download. I'd just find it a big mistake to ignore the majority use case in the standard, which is why I proposed the text elements inside the video tag.

If browser vendors are willing to merge subtitles and video files when saving them, that would be great. Is this easy to do?

My suggestion was really about doing this server-side, which we have already implemented years ago in the Annodex project for Ogg Theora/Vorbis. However, it is also possible to do this in the browser: in the case of Ogg, the browser just needs to have a multiplexing library installed as well as a means to encode the subtitle file (which I like to call a text codec). Since it's text, it's nowhere near as complex as encoding audio or video and just consists of light-weight packaging code. So, yes, it is totally possible to have the browsers create a binary video file that has the subtitles encapsulated that were previously only accessible as referenced text files behind a separate URL. The only issue I see is the baseline codec issue: every browser that wants to support multiple media formats has to implement this multiplexing and text encoding for every media encapsulation format differently, which is annoying and increases complexity. It's however generally a small amount of complexity compared to the complexity created by having to support multiple codecs.

I disagree, remuxing files would be much more of an implementation burden than supporting multiple codecs, at least if a format-agnostic media framework is used (be that internal or external to the browser). Remuxing would require you to support/create parts of the media framework that you otherwise aren't using, i.e. parsers, muxers, file writers and the plugging together of these (which unlike decoding isn't automatic in any framework I've seen).

The point that I was trying to make is that if one had to implement it for only one encapsulation format, it would be simple and a small piece of dedicated code. However, if one has to be format-agnostic, it indeed requires supporting parts of a media framework that are not needed for demuxing and decoding. So, yes, I agree with you: in the general case it might create extraneous complexity in a browser.

Anything is doable of course, but I think this is really something that is best done server-side using specialized tools.

I agree with this. This can be a special service that some servers would offer who want to allow their users to share single video files that contain their timed text within. It would be interesting to hear back from the browser vendors about how easily the subtitles could be kept with the video in a way that survives reuse in other contexts.

I think that in the case of external subtitles the browser could simply save them alongside the video. It is my experience that media players have much more robust support for external subtitles (like SRT) than for internal subtitles, so this is my preferred option (plus it's easier).

Agreed: this would be the fallback for content downloaded from servers that do not offer the special muxing capability.

In fact, such a separate handling of composed content through multiple files is nothing new to HTML: all Web pages that I download from the Internet require me to download each component of the Web page separately: the images, the text, the css, the javascript. (Worse even if the text is created e.g. through a database query.) I agree with Philip that the separate handling of subtitle files and media files is not as much of an issue as it may seem.

Regards, Silvia.
Re: [whatwg] Thoughts on video accessibility
At 23:28 +1000 16/07/09, Silvia Pfeiffer wrote:

2) I think the environment can and should help select and configure type-1 resources, where it can. It shouldn't always need to be a manual step by the user interacting with the media player. That is, I don't see why we cannot have the markup express "this source is better for people who have accessibility need X" (probably as a media query). However, media queries are CSS, not HTML...

Would you mind providing an example that demonstrates the use of media queries? I cannot currently imagine what that could look like and how it could work. Feel free to use CSS in addition to any required HTML (and JavaScript?). Since I cannot imagine what that would look like and how it could work, I cannot start to understand it as an alternative.

Sure, using a deliberately vague way of writing the media queries:

  <video blah blah ...>
    <source src="xx-O.ers" media="want-captions" />
    <source src="xx-N.ers" media="not want-captions" />
  </video>

xx-O has open (burned-in) captions and uses the same codecs etc. It gets selected if the user says they want captions; otherwise xx-N (no captions) is selected.

  <video blah blah ...>
    <source src="xx-S.ers" media="want-sign-language" />
    <source src="xx.ers" />
  </video>

xx-S has a sign-language overlay capability. It gets selected for those users expressing a positive preference for sign language; otherwise we don't waste the bandwidth loading that, and we load the plain xx file. It may be that the media part of the UA also detects this user preference and automatically enables sign language in xx-S.

Basically, I think we should have a framework which attempts to select and configure the appropriate source, so we get it right most of the time by default. This (accessibility) is a subject that covers multiple groups, of course...

-- David Singer Multimedia Standards, Apple Inc.
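To see how this idea might sit in actual media-query grammar, one could imagine a user-preference media feature alongside today's device features. The following is purely a hypothetical sketch - no "captions" media feature exists in any current specification, and the names are illustrative only:

  <video controls>
    <source src="movie-cc.ogv" media="all and (captions)" />
    <source src="movie.ogv" />
  </video>

The first source would be chosen only for users whose environment reports a captions preference; everyone else falls through to the plain file.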
Re: [whatwg] Thoughts on video accessibility
On Thu, Jul 16, 2009 at 11:56 PM, David Singer sin...@apple.com wrote: At 23:28 +1000 16/07/09, Silvia Pfeiffer wrote:

2) I think the environment can and should help select and configure type-1 resources, where it can. It shouldn't always need to be a manual step by the user interacting with the media player. That is, I don't see why we cannot have the markup express "this source is better for people who have accessibility need X" (probably as a media query). However, media queries are CSS, not HTML...

Would you mind providing an example that demonstrates the use of media queries? I cannot currently imagine what that could look like and how it could work. Feel free to use CSS in addition to any required HTML (and JavaScript?). Since I cannot imagine what that would look like and how it could work, I cannot start to understand it as an alternative.

Sure, using a deliberately vague way of writing the media queries:

  <video blah blah ...>
    <source src="xx-O.ers" media="want-captions" />
    <source src="xx-N.ers" media="not want-captions" />
  </video>

xx-O has open (burned-in) captions and uses the same codecs etc. It gets selected if the user says they want captions; otherwise xx-N (no captions) is selected.

  <video blah blah ...>
    <source src="xx-S.ers" media="want-sign-language" />
    <source src="xx.ers" />
  </video>

xx-S has a sign-language overlay capability. It gets selected for those users expressing a positive preference for sign language; otherwise we don't waste the bandwidth loading that, and we load the plain xx file. It may be that the media part of the UA also detects this user preference and automatically enables sign language in xx-S.

I just noticed that the media attribute is already part of the source element definition in HTML5. I wonder which browsers have implemented this attribute.

After having looked at http://www.w3.org/TR/css3-mediaqueries/, my understanding is that media queries specify the different presentation media that the html page's different stylesheets were built for, and thus allow choosing between these stylesheets through the link element and its media attribute, which is where the query goes. Also, IIUC, the list of presentation media is currently restricted to 'aural', 'braille', 'handheld', 'print', 'projection', 'screen', 'tty', 'tv', and 'all', and the queries cover only the features width, height, device-width, device-height, orientation, aspect-ratio, device-aspect-ratio, color, color-index, monochrome, resolution, scan, and grid.

This is different for the source elements though: instead of specifying different presentation media and choosing between stylesheets, the media attribute specifies different user requirements and chooses between video source files. This makes it independent from CSS, IIUC.

Is the intention to extend the specification of media queries to include generic means of selecting between alternative files to load into an HTML page? Is there a W3C activity that actually extends the media queries to audio and video files?

If this is the case, it could also be used for the associated text elements that Ian and I discussed earlier in this thread. The alternatives there would be based on a combination of languages and the different categories of time-aligned text. The language would choose between different text files to load, and the text category would choose between different default styles to apply. I can imagine that that would work, but has anyone started extending existing media query specifications for this yet?

Regards, Silvia.
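For comparison, this is roughly how a media query drives stylesheet selection today, per the CSS3 Media Queries draft cited above (a minimal sketch; the filenames are made up):

  <link rel="stylesheet" media="screen and (max-device-width: 480px)" href="handheld.css">
  <link rel="stylesheet" media="screen and (min-device-width: 481px)" href="desktop.css">

The UA evaluates each query against the presentation environment and applies the matching stylesheet. The open question in this thread is whether the same mechanism can be stretched to express user needs (captions, sign language) rather than device features (width, color).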
Re: [whatwg] Thoughts on video accessibility
On Sat, 27 Dec 2008, Calogero Alex Baldacchino wrote:

A flying thought: why not think also of a further option for embedding everything in a sort of all-in-one html page generated on the fly when downloading, making of it a global container for video and text to be consumed by UAs (while maintaining the opportunity to download a video as a separate file, of course)?

For instance, the video itself might become the base64-encoded (or otherwise acceptably encoded) value of a data-* attribute (or a more specific attribute) to be decoded by a script (itself generated on the fly) and served to the video engine as a javascript: url in place of the video src (or, perhaps better, the UA might do that itself by supporting the data: protocol as a valid source for the video, or a fragid pointing to an element following the closing </video> tag, perhaps a plaintext element or something else, containing the encoded video); while text elements might wrap the corresponding timed text file, to be embedded into the page as bare text, similarly to script code -- if a certain format contained <text> tags, those might be changed into &lt;text&gt; or similar (or perhaps the file content might be encoded as well) to avoid conflicts with html tags.

Of course, it's a first-glance idea, and needs further considerations on its reliability (e.g. such an html page perhaps shouldn't be the source set for a video in another page, and an option should be provided to extract the embedded content; seeking might require a sequential decoding to reach a desired point, and so on).

This idea seems out of scope for HTML5; it can already be done using features like multipart/related or data: URLs.

-- Ian Hickson, http://ln.hixie.ch/ "Things that are impossible just take longer."
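For concreteness, a data: URL inlines the whole encoded media resource into the markup itself. A minimal sketch, with the base64 payload truncated (a real Ogg file would encode to a very long string):

  <video src="data:video/ogg;base64,T2dnUw..." controls></video>

This only works in user agents that accept data: URLs as media sources, and base64 inflates the payload by roughly a third over the raw file size, which is part of why it is rarely attractive for video.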
Re: [whatwg] Thoughts on video accessibility
On Sat, 27 Dec 2008, Silvia Pfeiffer wrote:

6. Timed text stored in a separate file, which is then fetched and parsed by the Web page, and which is then rendered by the Web page.

For case 6, while it works for deaf people, we actually create an accessibility nightmare for blind people and their web developers. There is no standard means for a screen reader to identify that a particular part in the DOM is actually text related to the video and supposed to be displayed with the video (through a screenreader or a braille reader).

As far as I can tell, that's exactly what ARIA is for.

Such functionality would need to be implemented through javascript by every single site that wanted to provide audio annotations.

Right.

It's also a nightmare for search engines, since there is no clear way of identifying a specific text as video-related and using it as such to extend knowledge about the video.

Embedding subtitles inside the video file is certainly the best option overall, for both accessibility and for automated analysis, yes.

1. Timed text in the resource itself (or linked from the resource itself), rendered as part of the video automatically by the user agent.

For case 1, the practical implications are that browser vendors will have to develop support for a large variety of text codecs, each one providing different functionalities.

I would hope that as with a video codec, we can standardise on a single subtitle format, ideally some simple media-independent combination of SRT and LRC [1]. It's difficult to solve this problem without a standard codec, though.

In fact, the easiest solution would be if that particular format was really only HTML.

IMHO that would be absurd. HTML means scripting, embedded videos, an unbelievably complex rendering system, complex parsing, etc.; plus, what's more, it doesn't even support timing yet, so we'd have to add all the timing and karaoke features on top of it. Requiring that video players embed a timed HTML renderer just to render subtitles is like saying that we should ship Microsoft Word with every DVD player, to handle the user input when the user wants to type in a new chapter number to jump to.

But strategically can we keep our options open towards using such a format in HTML5?

As far as I can tell, HTML5 doesn't preclude any particular direction for subtitling.

And now to option 3:

3. Timed text stored in a separate file, which is then parsed by the user agent and rendered as part of the video automatically by the browser.

This would make authoring subtitles somewhat easier, but would typically lose the benefits of subtitles surviving when the video file is extracted. It would also involve a distinct increase in implementation and language complexity. We would also have to pick a timed text format, or add yet another format war to the video/audio codec debacle, which I think would be a really big mistake right now. Given the immature state of timed text formats (it seems there are new formats announced every month), it's probably premature to pick one -- we should let the market pick one first.

I think excluding option 3 from our list of ways of supporting time-aligned text is a big mistake.

We're not excluding it, we're just delaying its standardisation.

The majority of subtitles currently available on the Web come from separate files, in particular in srt or sub format. They are simple formats, easily authored in a text editor, and can be related to any container format. It is easy to implement support for them in authoring applications and in player applications. Encapsulating them into a video file and extracting them from a video file again for decoding seems an unnecessary nuisance. This is why I think dealing with separate caption files will continue to be the main way we deal with captions into the future, and why we should consider supporting this natively in Web browsers rather than leaving it to every web developer to sort this out himself.

I agree that if we can't get people to embed subtitles straight into their video streams, then providing a standard way to associate a video file with a subtitle stream is the way to go in the long term.

The only real issue that we have with separate files is that the captions may get lost when people download the video, store it locally, and share it with friends.

This is a pretty big problem, IMHO.

Maybe we should consider solving this differently. Either we could encapsulate into the video container upon download. Or we could create a zip-file or tarball upon download. I'd just find it a big mistake to ignore the majority use case in the standard, which is why I proposed the text elements inside the video tag.

If browser vendors are willing to merge subtitles and video files when saving them, that would be great. Is this easy to do?

Here is my example again:
Re: [whatwg] Thoughts on video accessibility
Hi Ian,

Great to see the new efforts to move the subtitle/caption/karaoke issues forward! I actually have a contract with Mozilla starting this month to help solve this, so I am more than grateful that you have proposed some ideas in this space.

On Thu, Jul 16, 2009 at 9:38 AM, Ian Hickson i...@hixie.ch wrote: On Sat, 27 Dec 2008, Silvia Pfeiffer wrote:

1. Timed text in the resource itself (or linked from the resource itself), rendered as part of the video automatically by the user agent.

For case 1, the practical implications are that browser vendors will have to develop support for a large variety of text codecs, each one providing different functionalities.

I would hope that as with a video codec, we can standardise on a single subtitle format, ideally some simple media-independent combination of SRT and LRC [1]. It's difficult to solve this problem without a standard codec, though.

I have myself thought about creating a new format to address the needs for time-aligned text in audio/video. However, the problem with creating a new format is that you start from scratch and already widespread formats are not supported. I can see that your proposed format is trying to be backwards compatible with SRT, so at least it would work for the large number of existing srt file collections. I am still skeptical, in particular because there are no authoring systems for this format around. But I would be curious what others think about your proposed SRT-LRC mix.

In fact, the easiest solution would be if that particular format was really only HTML.

IMHO that would be absurd. HTML means scripting, embedded videos, an unbelievably complex rendering system, complex parsing, etc.; plus, what's more, it doesn't even support timing yet, so we'd have to add all the timing and karaoke features on top of it. Requiring that video players embed a timed HTML renderer just to render subtitles is like saying that we should ship Microsoft Word with every DVD player, to handle the user input when the user wants to type in a new chapter number to jump to.

I agree, it cannot be a format that contains all the complexity of HTML. It would only support a subpart of HTML that is relevant, plus the addition of timing - and in this case it is indeed a new format. I have therefore changed my mind since I sent that email in Dec 08 and am hoping we can do it with existing formats.

In particular, I have taken an in-depth look at the latest specification from the Timed Text working group, which has put years of experimentation and decades of experience into developing DFXP. You can see my review of DFXP here: http://blog.gingertech.net/2009/06/28/a-review-of-the-w3c-timed-text-authoring-format/ . I think it is both too flexible in a lot of ways, but also too restrictive in others. However, it is a well formulated format that is also getting market traction. In addition, it is possible to formulate profiles to add missing functionality.

If we want a quick and dirty hack, srt itself is probably the best solution. If we want a well thought-out solution, DFXP is probably a better idea. I am currently experimenting with these and will be able to share something soon for further discussion.

3. Timed text stored in a separate file, which is then parsed by the user agent and rendered as part of the video automatically by the browser.

Maybe we should consider solving this differently. Either we could encapsulate into the video container upon download. Or we could create a zip-file or tarball upon download. I'd just find it a big mistake to ignore the majority use case in the standard, which is why I proposed the text elements inside the video tag.

If browser vendors are willing to merge subtitles and video files when saving them, that would be great. Is this easy to do?

My suggestion was really about doing this server-side, which we have already implemented years ago in the Annodex project for Ogg Theora/Vorbis. However, it is also possible to do this in the browser: in the case of Ogg, the browser just needs to have a multiplexing library installed as well as a means to encode the subtitle file (which I like to call a text codec). Since it's text, it's nowhere near as complex as encoding audio or video and just consists of light-weight packaging code. So, yes, it is totally possible to have the browsers create a binary video file that has the subtitles encapsulated that were previously only accessible as referenced text files behind a separate URL. The only issue I see is the baseline codec issue: every browser that wants to support multiple media formats has to implement this multiplexing and text encoding for every media encapsulation format differently, which is annoying and increases complexity. It's however generally a small amount of complexity compared to the complexity created by having to support multiple codecs.

Here is my example again: <video src="http://example.com/video.ogv" controls> <text
Re: [whatwg] Thoughts on video accessibility
I have carefully read all the feedback in this thread concerning associating text with video, for various purposes such as captions, annotations, etc.

Taking a step back, as far as I can tell there are two axes: where the timed text comes from, and how it is rendered.

Where it comes from, it seems, boils down to three options:
- embedded in or referenced from the media resource itself
- as a separate file parsed by the user agent
- as a separate file parsed by the web page

Where the timed text is rendered boils down to two options:
- rendered automatically by the user agent
- rendered by the web page overlaying content on the video

For the purposes of this discussion I am ignoring burned-in captions, since they're basically equivalent to a different video, much like videos with overlaid sign language interpreters (or VH1 pop-up annotations!).

These five options (three on the first axis, two on the second) give us six cases:

1. Timed text in the resource itself (or linked from the resource itself), rendered as part of the video automatically by the user agent.

This is the optimal situation from an accessibility and usability point of view, because it works when the video is shown full-screen, it works when the video is saved separate from the Web page, it works easily when other pages link to the same video file, it requires minimal work from the page author, and so forth. This is what I think we should be encouraging. It would probably make sense to expose the timed text track selection to the Web page through the API, maybe even expose the text itself somehow, but these are features that can and should probably wait until video has been more reliably implemented.

2. Timed text in the resource itself (or linked from the resource itself), exposed to the Web page with no native rendering.

This allows pages to implement experimental subtitling mechanisms while still allowing the timed text tracks to survive re-use of the video file, but it seems to introduce a high cost (all pages have to implement subtitling themselves) with very little gain, and with several disadvantages -- different sites will have inconsistent subtitling, bugs will be prevalent in the subtitling and accessibility will thus suffer, and in all likelihood even videos that have subtitles will end up not having them shown as small sites don't bother to implement anything but the most basic controls.

3. Timed text stored in a separate file, which is then parsed by the user agent and rendered as part of the video automatically by the browser.

This would make authoring subtitles somewhat easier, but would typically lose the benefits of subtitles surviving when the video file is extracted. It would also involve a distinct increase in implementation and language complexity. We would also have to pick a timed text format, or add yet another format war to the video/audio codec debacle, which I think would be a really big mistake right now. Given the immature state of timed text formats (it seems there are new formats announced every month), it's probably premature to pick one -- we should let the market pick one first.

4. Timed text stored in a separate file, which is then parsed by the user agent and exposed to the Web page with no native rendering.

This combines the disadvantages of the previous two options, without really introducing any groundbreaking advantages.

5. Timed text stored in a separate file, which is then fetched and parsed by the Web page, which then passes it to the browser for rendering.

This is just an excessive level of complexity for a feature that could just be supported exclusively by the user agent. In particular, it doesn't actually provide for much space for experimentation -- whatever API we provide to expose the subtitles would limit what the rendering would be like regardless of what the pages want to try. This option side-steps the issue of picking a format, though.

6. Timed text stored in a separate file, which is then fetched and parsed by the Web page, and which is then rendered by the Web page.

We can't stop this from being available, and there's not much we can do to help with this case beyond what we do now. The disadvantages are that it doesn't work when the video is shown full-screen, when the video is saved separate from the Web page, or when other pages link to the same video file without using their own implementation of the feature, and it requires substantial implementation work from the page. The _advantages_, and they are significant, are that pages can easily create subtitles separate from the video, they can easily provide features such as automated translations, and they can easily implement features that would otherwise seem overly ambitious, e.g. hyperlinked annotations with ad tracking.

Based on this analysis it seems to me that cases 1 and 6 are important to support, but that cases 2 to 5 aren't as compelling -- they either have disadvantages
Re: [whatwg] Thoughts on video accessibility
Hi Ian,

Thanks for taking the time to go through all the options, analyse and understand them - especially on your birthday! :-) Much appreciated!

I agree with your analysis and the 6 cases you have identified. However, I disagree slightly with the conclusions you have come to - mostly from a strategic viewpoint rather than from where we currently stand.

Your proposal is to support cases 1 and 6 and not to worry about the others at this stage. This is a fair enough statement for the current state of play. Support for case 1 comes from the fact that there are indeed a number of video container formats that have text codecs (e.g. QTtext for QuickTime, TimedText for MPEG, CMML and Kate for Ogg). Support for case 6 comes from the fact that it is already possible, it is flexible, and it is therefore an easy way out of the need of providing video accessibility support in Web pages. This is in fact how this example http://v2v.cc/~j/jquery.srt/ is implemented.

As I said - for the current state of play, you have come to the right conclusions. Theoretically. But we should look at the practical implications.

For case 6, while it works for deaf people, we actually create an accessibility nightmare for blind people and their web developers. There is no standard means for a screen reader to identify that a particular part in the DOM is actually text related to the video and supposed to be displayed with the video (through a screenreader or a braille reader). Such functionality would need to be implemented through javascript by every single site that wanted to provide audio annotations. It's also a nightmare for search engines, since there is no clear way of identifying a specific text as video-related and using it as such to extend knowledge about the video. As much as case 6 is the easy way out, I would like us to discourage such solutions right before they start by providing a viable alternative: a standard way of relating time-aligned text with video (or audio). And that unfortunately means attacking case 3 (let me address case 3 and your objections below).

For case 1, the practical implications are that browser vendors will have to develop support for a large variety of text codecs, each one providing different functionalities. It would indeed be nice if we had one standard format that everybody used, but alas that is not the case. What will browser vendors do in this situation? Probably just simply nothing - maybe use the underlying media frameworks that are being used to decode the video formats to also decode the text formats and render them on top of the video - thus taking them completely out of reach of the Web page. This again means that screenreaders cannot get to them, search engines will need to find a different way of extracting them from the video rather than the web page, and generally a worse accessibility experience.

Now, is it realistic to expect a standard format to emerge? I think this is actually a chicken and egg problem. We currently have poor solutions (e.g. srt as extra files, or the above mentioned text codecs inside specific containers). Lacking an alternative, people will continue to use these to author captions - and use their own hacked-up formats to provide other features such as video annotations in speech bubbles at certain time points and coordinates etc. If there was however a compelling case to use a different standard format, people would go for it, IMHO. If e.g. all browser vendors had agreed to support one particular format.

In fact, the easiest solution would be if that particular format was really only HTML. Then, browser vendors would find it trivial to implement, which in turn would encourage Web developers to choose this format. Which in turn would encourage video container formats to adopt it also inside themselves. And then we would have created a uniform means of dealing with time-aligned text coming from any of the three locations listed by you and going to the Web page. As we haven't got any experience with this proposal yet, we can obviously not support it. But strategically can we keep our options open towards using such a format in HTML5?

And now to option 3:

3. Timed text stored in a separate file, which is then parsed by the user agent and rendered as part of the video automatically by the browser.

This would make authoring subtitles somewhat easier, but would typically lose the benefits of subtitles surviving when the video file is extracted. It would also involve a distinct increase in implementation and language complexity. We would also have to pick a timed text format, or add yet another format war to the video/audio codec debacle, which I think would be a really big mistake right now. Given the immature state of timed text formats (it seems there are new formats announced every month), it's probably premature to pick one -- we should let the market pick one first.

I think excluding option 3 from our list of ways of supporting time-aligned text is a big mistake.
Re: [whatwg] Thoughts on video accessibility
Silvia Pfeiffer ha scritto:

Hi Ian, Thanks for taking the time to go through all the options, analyse and understand them - especially on your birthday! :-) Much appreciated!

Then, happy birthday to Ian!

[...]

The only real issue that we have with separate files is that the captions may get lost when people download the video, store it locally, and share it with friends.

Maybe we should consider solving this differently. Either we could encapsulate into the video container upon download. Or we could create a zip-file or tarball upon download. I'd just find it a big mistake to ignore the majority use case in the standard, which is why I proposed the text elements inside the video tag.

[...]

A flying thought: why not think also of a further option for embedding everything in a sort of all-in-one html page generated on the fly when downloading, making of it a global container for video and text to be consumed by UAs (while maintaining the opportunity to download a video as a separate file, of course)?

For instance, the video itself might become the base64-encoded (or otherwise acceptably encoded) value of a data-* attribute (or a more specific attribute) to be decoded by a script (itself generated on the fly) and served to the video engine as a javascript: url in place of the video src (or, perhaps better, the UA might do that itself by supporting the data: protocol as a valid source for the video, or a fragid pointing to an element following the closing </video> tag, perhaps a plaintext element or something else, containing the encoded video); while text elements might wrap the corresponding timed text file, to be embedded into the page as bare text, similarly to script code -- if a certain format contained <text> tags, those might be changed into &lt;text&gt; or similar (or perhaps the file content might be encoded as well) to avoid conflicts with html tags.

Of course, it's a first-glance idea, and needs further considerations on its reliability (e.g. such an html page perhaps shouldn't be the source set for a video in another page, and an option should be provided to extract the embedded content; seeking might require a sequential decoding to reach a desired point, and so on).

Regards, Alex
Re: [whatwg] Thoughts on video accessibility
Another implementation comes from the W3C TimedText working group: they have a test suite for DFXP files at http://www.w3.org/2008/12/dfxp-testsuite/web-framework/START.html . Philippe just announced that he added HTML5 video tag support using the javascript file that Jan had written for srt support, adapting it to work like this:

  <video src="example.ogv" id="video" controls>
    <text lang="en" type="application/ttaf+xml" src="testsuite/Content/Br001.xml"></text>
  </video>

You'll need to use Firefox 3.1 to test it. If you select the HTML5 DFXP player prototype, you can click on the tests on the left and it will load the DFXP content. The adapted javascript is at http://www.w3.org/2008/12/dfxp-testsuite/web-framework/HTML5_player.js . It works by mapping DFXP to HTML and DFXP styling attributes to CSS.

This is exactly what was also discussed yesterday on irc - if we can find a simple way to map time-aligned text formats to HTML, it will be easy to deal with in HTML5.

Regards, Silvia.

On Thu, Dec 11, 2008 at 9:57 AM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote:

And now we have a first demo of the proposed syntax in action. Michael Dale implemented SRT support like this:

  <video src="sample_fish.ogg" poster="sample_fish.jpg" duration="26">
    <text category="SUB" lang="en" type="text/x-srt" default="true"
          title="english SRT subtitles" src="sample_fish_text_en.srt"></text>
    <text category="SUB" lang="es" type="text/x-srt"
          title="spanish SRT subtitles" src="sample_fish_text_es.srt"></text>
  </video>

Michael writes: the demo (tested on IE, Firefox, Safari ... with varying degrees of success ;) http://metavid.org/w/extensions/MetavidWiki/skins/mv_embed/example_usage/sample_timed_text.php (bottom example). If Firefox exposes timed text tracks in ogg media, the script could query them and display them alongside any available markup text tracks (but of course other browsers like IE won't easily expose those muxed text tracks, so it's likely the least common denominator of text-based markup / pointers will dominate for some time).

You will need to click on the CC button on the player and click on "select transcripts" to see the different subtitles in English and Spanish.

Regards, Silvia.

On Wed, Dec 10, 2008 at 3:49 AM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote:

I heard some complaints about there not being any implementation of the suggestions I made. So here goes:

1. out-of-band

There is an example of using srt with ogg in an out-of-band approach here: http://v2v.cc/~j/jquery.srt/ You will need Firefox 3.1 to play it. The syntax of what Jan implemented is different to what I proposed, but I wanted to take it forward and make it more generic.

2. in-band

There is also a draft implementation of srt inside Ogg through the OggText specification, but it's not released yet. It is also not as relevant to this group as the out-of-band example.

Cheers, Silvia.

On Tue, Dec 9, 2008 at 7:33 PM, Robert O'Callahan rob...@ocallahan.org wrote: On Tue, Dec 9, 2008 at 6:20 PM, Martin Atkins m...@degeneration.co.uk wrote: Silvia Pfeiffer wrote:

I'm interested to hear people's opinions on these ideas.

I agree with Ralph and think having a simple, explicit mechanism at the html level is worthwhile - and very open and explicit to a web author. Having a redirection through a ROE-type file on the server is more opaque, but maybe more consistent with existing similar approaches as taken by RealNetworks in rm files and WindowsMedia files in asx files.

This (having a separate document that references other streams) is what I was thinking of. I guess which is more natural depends on who is doing the assembling. If it is the HTML author that takes the individual pieces and links them together then doing it in the HTML is probably easiest.

For what it's worth, loading an intermediate document of some new type which references other streams to be loaded adds a lot of complexity to the browser implementation. It creates new states that the decoder can be in, and introduces new failure modes. It creates new timing issues and possibly new security issues.

Rob

-- He was pierced for our transgressions, he was crushed for our iniquities; the punishment that brought us peace was upon him, and by his wounds we are healed. We all, like sheep, have gone astray, each of us has turned to his own way; and the LORD has laid on him the iniquity of us all. [Isaiah 53:5-6]
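For readers who have not seen DFXP, a minimal document of the kind the test suite above serves looks roughly like this - a sketch only, reconstructed from the general shape of the TT-AF drafts, so treat details such as the namespace date and timing syntax as approximate:

  <tt xml:lang="en" xmlns="http://www.w3.org/2006/10/ttaf1">
    <body>
      <div>
        <p begin="00:00:01" end="00:00:04">Hello world</p>
        <p begin="00:00:05" end="00:00:08">A second caption</p>
      </div>
    </body>
  </tt>

A player like HTML5_player.js can then map each p element to an HTML element and any tts: styling attributes to the corresponding CSS properties, which is the mapping idea discussed above.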
Re: [whatwg] Thoughts on video accessibility
And now we have a first demo of the proposed syntax in action. Michael Dale implemented SRT support like this:

  <video src="sample_fish.ogg" poster="sample_fish.jpg" duration="26">
    <text category="SUB" lang="en" type="text/x-srt" default="true"
          title="english SRT subtitles" src="sample_fish_text_en.srt"></text>
    <text category="SUB" lang="es" type="text/x-srt"
          title="spanish SRT subtitles" src="sample_fish_text_es.srt"></text>
  </video>

Michael writes: the demo (tested on IE, Firefox, Safari ... with varying degrees of success ;) http://metavid.org/w/extensions/MetavidWiki/skins/mv_embed/example_usage/sample_timed_text.php (bottom example). If Firefox exposes timed text tracks in ogg media, the script could query them and display them alongside any available markup text tracks (but of course other browsers like IE won't easily expose those muxed text tracks, so it's likely the least common denominator of text-based markup / pointers will dominate for some time).

You will need to click on the CC button on the player and click on "select transcripts" to see the different subtitles in English and Spanish.

Regards, Silvia.

On Wed, Dec 10, 2008 at 3:49 AM, Silvia Pfeiffer [EMAIL PROTECTED] wrote:

I heard some complaints about there not being any implementation of the suggestions I made. So here goes:

1. out-of-band

There is an example of using srt with ogg in an out-of-band approach here: http://v2v.cc/~j/jquery.srt/ You will need Firefox 3.1 to play it. The syntax of what Jan implemented is different to what I proposed, but I wanted to take it forward and make it more generic.

2. in-band

There is also a draft implementation of srt inside Ogg through the OggText specification, but it's not released yet. It is also not as relevant to this group as the out-of-band example.

Cheers, Silvia.

On Tue, Dec 9, 2008 at 7:33 PM, Robert O'Callahan [EMAIL PROTECTED] wrote: On Tue, Dec 9, 2008 at 6:20 PM, Martin Atkins [EMAIL PROTECTED] wrote: Silvia Pfeiffer wrote:

I'm interested to hear people's opinions on these ideas.

I agree with Ralph and think having a simple, explicit mechanism at the html level is worthwhile - and very open and explicit to a web author. Having a redirection through a ROE-type file on the server is more opaque, but maybe more consistent with existing similar approaches as taken by RealNetworks in rm files and WindowsMedia files in asx files.

This (having a separate document that references other streams) is what I was thinking of. I guess which is more natural depends on who is doing the assembling. If it is the HTML author that takes the individual pieces and links them together then doing it in the HTML is probably easiest.

For what it's worth, loading an intermediate document of some new type which references other streams to be loaded adds a lot of complexity to the browser implementation. It creates new states that the decoder can be in, and introduces new failure modes. It creates new timing issues and possibly new security issues.

Rob
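For context, the .srt files referenced above are plain text: numbered cues, each with a start time, an end time, and one or more lines of text. A representative sketch (not the actual demo content):

  1
  00:00:01,000 --> 00:00:04,000
  Hello, this is the first subtitle.

  2
  00:00:05,000 --> 00:00:08,500
  And here is a second one.

This simplicity is why srt is easy to author in a text editor and easy to parse in a few lines of script.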
Re: [whatwg] Thoughts on video accessibility
Yea, as Silvia outlines in the intro to this thread, we will likely continue to see external timed text files winning out over muxed timed text. It's just more flexible ... Javascript embedding libraries, which are widely used today for flash video, will be even more widely used with the emerging browser support of the video tag. So it's not too big a deal... the complexity of transcript formats will be handled by dedicated javascript libraries. If the browser does expose some timed text tracks muxed with the media, then the library will integrate that into the interface.

That being said, having some semantically meaningful source data representation is worthwhile. By supporting this proposed syntax with the embedding libraries we are promoting the idea that the syntax will eventually be adopted and natively handled by the browser. But in practice, even if we do get native timed text support, a javascript library will likely rewrite it to conform to the site's layout and skinning anyway.

The take-away point here is that if people do mux text tracks, those tracks should be exposed via javascript. Otherwise they will be of very limited value in the context of web media.

peace, --michael

Silvia Pfeiffer wrote:

And now we have a first demo of the proposed syntax in action. Michael Dale implemented SRT support like this:

  <video src="sample_fish.ogg" poster="sample_fish.jpg" duration="26">
    <text category="SUB" lang="en" type="text/x-srt" default="true"
          title="english SRT subtitles" src="sample_fish_text_en.srt"></text>
    <text category="SUB" lang="es" type="text/x-srt"
          title="spanish SRT subtitles" src="sample_fish_text_es.srt"></text>
  </video>

Michael writes: the demo (tested on IE, Firefox, Safari ... with varying degrees of success ;) http://metavid.org/w/extensions/MetavidWiki/skins/mv_embed/example_usage/sample_timed_text.php (bottom example). If Firefox exposes timed text tracks in ogg media, the script could query them and display them alongside any available markup text tracks (but of course other browsers like IE won't easily expose those muxed text tracks, so it's likely the least common denominator of text-based markup / pointers will dominate for some time).

You will need to click on the CC button on the player and click on "select transcripts" to see the different subtitles in English and Spanish.

Regards, Silvia.

On Wed, Dec 10, 2008 at 3:49 AM, Silvia Pfeiffer [EMAIL PROTECTED] wrote:

I heard some complaints about there not being any implementation of the suggestions I made. So here goes:

1. out-of-band

There is an example of using srt with ogg in an out-of-band approach here: http://v2v.cc/~j/jquery.srt/ You will need Firefox 3.1 to play it. The syntax of what Jan implemented is different to what I proposed, but I wanted to take it forward and make it more generic.

2. in-band

There is also a draft implementation of srt inside Ogg through the OggText specification, but it's not released yet. It is also not as relevant to this group as the out-of-band example.

Cheers, Silvia.

On Tue, Dec 9, 2008 at 7:33 PM, Robert O'Callahan [EMAIL PROTECTED] wrote: On Tue, Dec 9, 2008 at 6:20 PM, Martin Atkins [EMAIL PROTECTED] wrote: Silvia Pfeiffer wrote:

I'm interested to hear people's opinions on these ideas.

I agree with Ralph and think having a simple, explicit mechanism at the html level is worthwhile - and very open and explicit to a web author. Having a redirection through a ROE-type file on the server is more opaque, but maybe more consistent with existing similar approaches as taken by RealNetworks in rm files and WindowsMedia files in asx files.

This (having a separate document that references other streams) is what I was thinking of. I guess which is more natural depends on who is doing the assembling. If it is the HTML author that takes the individual pieces and links them together then doing it in the HTML is probably easiest.

For what it's worth, loading an intermediate document of some new type which references other streams to be loaded adds a lot of complexity to the browser implementation. It creates new states that the decoder can be in, and introduces new failure modes. It creates new timing issues and possibly new security issues.

Rob
Re: [whatwg] Thoughts on video accessibility
On Wed, Dec 10, 2008 at 5:56 PM, Dave Singer [EMAIL PROTECTED] wrote: At 21:33 +1300 9/12/08, Robert O'Callahan wrote:

For what it's worth, loading an intermediate document of some new type which references other streams to be loaded adds a lot of complexity to the browser implementation. It creates new states that the decoder can be in, and introduces new failure modes. It creates new timing issues and possibly new security issues.

I'm not sure I agree; but if you believe that, we should address it no matter which way this discussion goes. It should absolutely be possible to reference a SMIL file, or an MP4 or MOV file with external data (to give only two examples) from a video or audio element, and have the DOM, events, states, and APIs work correctly.

I agree it should be done eventually, it's just significantly more complicated than what we have to deal with currently.

Rob
Re: [whatwg] Thoughts on video accessibility
At 14:40 +1300 11/12/08, Robert O'Callahan wrote: On Wed, Dec 10, 2008 at 5:56 PM, Dave Singer [EMAIL PROTECTED] wrote: At 21:33 +1300 9/12/08, Robert O'Callahan wrote:

For what it's worth, loading an intermediate document of some new type which references other streams to be loaded adds a lot of complexity to the browser implementation. It creates new states that the decoder can be in, and introduces new failure modes. It creates new timing issues and possibly new security issues.

I'm not sure I agree; but if you believe that, we should address it no matter which way this discussion goes. It should absolutely be possible to reference a SMIL file, or an MP4 or MOV file with external data (to give only two examples) from a video or audio element, and have the DOM, events, states, and APIs work correctly.

I agree it should be done eventually, it's just significantly more complicated than what we have to deal with currently.

But if the state machine or other aspects are actually wrong for this case, then we should fix it now. We have, for example, tried to keep out these kinds of assumptions:
a) all media is downloaded (no, it might be streamed or even arriving over non-IP, e.g. a TV broadcast)
b) all delivery methods are self-contained (no, they might reference resources as well as contain them)
c) all delivery is sequential in play order (no, some file formats decouple data timing and data ordering)

-- David Singer Multimedia Standards, Apple Inc.
Re: [whatwg] Thoughts on video accessibility
On Mon, Dec 8, 2008 at 9:20 PM, Martin Atkins [EMAIL PROTECTED] wrote: My concern is that if the only thing linking the various streams together is the HTML document then the streams are less useful outside of a web browser context. Absolutely. This proposal places an additional burden on the user to download and integrate multiple resources. This trade-off supports applications where having the text available separately is valuable. -r
Re: [whatwg] Thoughts on video accessibility
I heard some complaints about there not being any implementation of the suggestions I made. So here goes: 1. out-of-band There is an example of using srt with ogg in an out-of-band approach here: http://v2v.cc/~j/jquery.srt/ You will need Firefox 3.1 to play it. The syntax of what Jan implemented is different to what I proposed, but I wanted to take it forward and make it more generic. 2. in-band There is also a draft implementation of srt inside Ogg through the OggText specification, but it's not released yet. It is also not as relevant to this group as the out-of-band example. Cheers, Silvia. On Tue, Dec 9, 2008 at 7:33 PM, Robert O'Callahan [EMAIL PROTECTED] wrote: On Tue, Dec 9, 2008 at 6:20 PM, Martin Atkins [EMAIL PROTECTED] wrote: Silvia Pfeiffer wrote: I'm interested to hear people's opinions on these ideas. I agree with Ralph and think having a simple, explicit mechanism at the html level is worthwhile - and very open and explicit to a web author. Having a redirection through a ROE-type file on the server is more opaque, but maybe more consistent with existing similar approaches as taken by RealNetworks in rm files and WindowsMedia in asx files. This (having a separate document that references other streams) is what I was thinking of. I guess which is more natural depends on who is doing the assembling. If it is the HTML author that takes the individual pieces and links them together then doing it in the HTML is probably easiest. For what it's worth, loading an intermediate document of some new type which references other streams to be loaded adds a lot of complexity to the browser implementation. It creates new states that the decoder can be in, and introduces new failure modes. It creates new timing issues and possibly new security issues. Rob
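[To make the out-of-band mechanism concrete, here is a minimal sketch of the kind of script jquery.srt represents; the real script differs, and the element ids and all names here are invented. It fetches an SRT file, parses it into cues, and updates a caption overlay as the video plays.]

// Sketch only: a minimal out-of-band SRT overlay for a <video> element.
// Assumes <video id="video"> and an empty <div id="caption-overlay">.
function parseSrt(text) {
  var cues = [];
  var blocks = text.replace(/\r/g, '').split('\n\n');
  for (var i = 0; i < blocks.length; i++) {
    var lines = blocks[i].split('\n');
    if (lines.length < 2) continue;
    // SRT timing line: "00:00:01,000 --> 00:00:04,000"
    var m = lines[1].match(/(\d+):(\d+):(\d+),(\d+) --> (\d+):(\d+):(\d+),(\d+)/);
    if (!m) continue;
    cues.push({
      start: (+m[1]) * 3600 + (+m[2]) * 60 + (+m[3]) + (+m[4]) / 1000,
      end:   (+m[5]) * 3600 + (+m[6]) * 60 + (+m[7]) + (+m[8]) / 1000,
      text:  lines.slice(2).join('\n')
    });
  }
  return cues;
}

var video = document.getElementById('video');
var overlay = document.getElementById('caption-overlay');
var xhr = new XMLHttpRequest();
xhr.open('GET', 'caption.srt');
xhr.onload = function () {
  var cues = parseSrt(xhr.responseText);
  video.addEventListener('timeupdate', function () {
    // Show the first cue spanning the current playback position.
    var active = '';
    for (var i = 0; i < cues.length; i++) {
      if (video.currentTime >= cues[i].start && video.currentTime < cues[i].end) {
        active = cues[i].text;
        break;
      }
    }
    overlay.textContent = active;
  }, false);
};
xhr.send();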
Re: [whatwg] Thoughts on video accessibility
Silvia Pfeiffer wrote: I heard some complaints about there not being any implementation of the suggestions I made. So here goes: 1. out-of-band There is an example of using srt with ogg in an out-of-band approach here: http://v2v.cc/~j/jquery.srt/ You will need Firefox 3.1 to play it. The syntax of what Jan implemented is different to what I proposed, but I wanted to take it forward and make it more generic. 2. in-band There is also a draft implementation of srt inside Ogg through the OggText specification, but it's not released yet. It is also not as relevant to this group as the out-of-band example. Cheers, Silvia. As far as I understand from a first read of your proposal (I'm not much inside this matter), current players/codecs implement different kinds of bindings with text (either in-band or out-of-band) and support different formats, so perhaps there is room for both mechanisms you're proposing:
- the html version, for compatibility with existing media and their external bindings, for servers that don't support the dynamic creation of content defined by your ROE format, and for people who don't want to or can't modify the way their media are served (e.g. they can't access the server where the media is stored to add or modify an xml metadata file, but want to try to bind the media with some text they can store separately);
- the xml file, mainly to drive dynamic content creation, and as a gradual replacement of other binding formats.
Any problem arising from the management of separate connections (possibly to different domains) to get both the audio/video and the textual resources might be mitigated by indicating (or establishing as a default) a time to wait for the external text before starting playback, in case the text resource fails to load (e.g. the server is temporarily offline) while there is enough buffered content to start playing before the browser gets an answer for the other resource; a script sketch of this timeout idea follows this message. When and if the text arrives late, its use might be skipped altogether, or it might start by synchronising with the current point in the media; in the same way, if a problem loading the text arose after playback had started, the missing parts might just be skipped (this would be unlikely to happen if both the media and the text files were located on the same server). Perhaps it might also be useful to provide a way to indicate an alternative medium to stream, e.g. an .asx or .rm resource that internally binds just one of the supported languages, for cases where the browser fails to bind the text with the 'primary' medium, or where the ROE format is not supported (e.g. if it were introduced in a v2 of the spec), or where the 'primary' medium is not supported by the browser but the same content is available in several formats (e.g. a lossless compressed version alongside a lossy one - the UA might even choose one based on network capabilities). I know this is possible with source elements, but perhaps some consideration is needed of how source elements relate to text bindings, i.e. telling the UA, by means of an attribute, whether or not to check that a source supports any of the declared text resources, preferably one matching the locale (that is, specifying whether a source is a 'last resort' for the case where the UA is unable to bind any other source with the text -- other sources might be chosen anyway, if no 'last resort' source is supported).
Anyway, the use of subtitles in conjunction with screen readers might be problematic: a deeper synchronization with the media might be needed in order to have the text read just during voice pauses, to describe a mute scene, or to entirely substitute for the sound if the text provides a translation of the speech (I guess this would be nontrivial to do without putting one's hands inside the media). Everything, of course, IMHO. Regards, Alex
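[The script sketch promised in the message above. The helper name and the five-second budget are invented, and a real implementation would need more care with buffering and readiness states; this only illustrates the wait-then-degrade idea.]

// Sketch: give the external text resource a bounded head start before
// playback (5 s is an arbitrary budget), then play regardless. If the
// captions arrive late, they simply join in at the current position;
// if they fail entirely, playback proceeds without them.
var CAPTION_WAIT_MS = 5000;

function playWithCaptionBudget(video, captionUrl, onCaptionText) {
  var started = false;
  function startPlayback() {
    if (!started) { started = true; video.play(); }
  }
  var xhr = new XMLHttpRequest();
  xhr.open('GET', captionUrl);
  xhr.onload = function () {
    onCaptionText(xhr.responseText); // cues sync against video.currentTime
    startPlayback();
  };
  xhr.onerror = startPlayback;       // text failed: play without captions
  xhr.send();
  setTimeout(startPlayback, CAPTION_WAIT_MS); // deadline passed: play anyway
}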
Re: [whatwg] Thoughts on video accessibility
On Wed, Dec 10, 2008 at 6:59 AM, Calogero Alex Baldacchino [EMAIL PROTECTED] wrote: Anyway, the use of subtitles in conjunction with screen readers might be problematic: a deeper synchronization with the media might be needed in order to have the text read just during voice pauses, to describe a mute scene, or to entirely substitute for the sound if the text provides a translation of the speech (I guess this would be nontrivial to do without putting one's hands inside the media). I cannot see a problem with conflicts between screen reading a web page and a video on the web page. A blind user would have turned off the use of captions by default in his/her browser, since they can hear very well what is going on, just not see it. As long as the video is not playing, it is only represented as a video (and maybe an alt text is read out). When the blind user clicks on the video, audio annotations will be read out by the screen reader in addition to the native sound. These would be placed into silence segments. In the case of a video with a non-native-language sound track, it's a bit more complicated. The native sound would need to be turned off and the screen reader would need to read out the subtitles in the user's native language as well as the audio annotations in the breaks. This may not be easy to set up through preferences in the Web browser, but it should be possible for the user to manually select the right tracks and turn off the video sound. Regards, Silvia.
Re: [whatwg] Thoughts on video accessibility
Also, for those interested, metavid and mv_embed are examples of use of ROE: http://metavid.org/w/index.php/Mv_embed Metavid uses <video roe="my_roe_file.xml"> for clean remote embedding of multiple text/video/audio tracks in a single xml encapsulation. An example of such embeds is here: http://metavid-mike.blogspot.com/ Regards, Silvia. On Wed, Dec 10, 2008 at 3:49 AM, Silvia Pfeiffer [EMAIL PROTECTED] wrote: I heard some complaints about there not being any implementation of the suggestions I made. So here goes: 1. out-of-band There is an example of using srt with ogg in an out-of-band approach here: http://v2v.cc/~j/jquery.srt/ You will need Firefox 3.1 to play it. The syntax of what Jan implemented is different to what I proposed, but I wanted to take it forward and make it more generic. 2. in-band There is also a draft implementation of srt inside Ogg through the OggText specification, but it's not released yet. It is also not as relevant to this group as the out-of-band example. Cheers, Silvia. On Tue, Dec 9, 2008 at 7:33 PM, Robert O'Callahan [EMAIL PROTECTED] wrote: On Tue, Dec 9, 2008 at 6:20 PM, Martin Atkins [EMAIL PROTECTED] wrote: Silvia Pfeiffer wrote: I'm interested to hear people's opinions on these ideas. I agree with Ralph and think having a simple, explicit mechanism at the html level is worthwhile - and very open and explicit to a web author. Having a redirection through a ROE-type file on the server is more opaque, but maybe more consistent with existing similar approaches as taken by RealNetworks in rm files and WindowsMedia in asx files. This (having a separate document that references other streams) is what I was thinking of. I guess which is more natural depends on who is doing the assembling. If it is the HTML author that takes the individual pieces and links them together then doing it in the HTML is probably easiest. For what it's worth, loading an intermediate document of some new type which references other streams to be loaded adds a lot of complexity to the browser implementation. It creates new states that the decoder can be in, and introduces new failure modes. It creates new timing issues and possibly new security issues. Rob
Re: [whatwg] Thoughts on video accessibility
Silvia Pfeiffer wrote: On Wed, Dec 10, 2008 at 6:59 AM, Calogero Alex Baldacchino [EMAIL PROTECTED] wrote: Anyway, the use of subtitles in conjunction with screen readers might be problematic: a deeper synchronization with the media might be needed in order to have the text read just during voice pauses, to describe a mute scene, or to entirely substitute for the sound if the text provides a translation of the speech (I guess this would be nontrivial to do without putting one's hands inside the media). I cannot see a problem with conflicts between screen reading a web page and a video on the web page. A blind user would have turned off the use of captions by default in his/her browser, since they can hear very well what is going on, just not see it. As long as the video is not playing, it is only represented as a video (and maybe an alt text is read out). When the blind user clicks on the video, audio annotations will be read out by the screen reader in addition to the native sound. These would be placed into silence segments. I was thinking of a possible lack of synchrony, with annotations enabled, between the screen reader reading them and the actual duration of the corresponding silence segments, perhaps because the sentences are not brief enough (e.g. as a consequence of a poorly groomed translation into a certain language) and/or the reading is slow (depending on the language's peculiarities, or the user's settings, or both, and in any case outside any UA's control), resulting in the end of a read-out annotation overlapping the beginning of the next non-silence segment, perhaps repeatedly during playback. Maybe this is a borderline case. In the case of a video with a non-native-language sound track, it's a bit more complicated. The native sound would need to be turned off and the screen reader would need to read out the subtitles in the user's native language as well as the audio annotations in the breaks. This may not be easy to set up through preferences in the Web browser, but it should be possible for the user to manually select the right tracks and turn off the video sound. Regards, Silvia. If the base language of the video, or the provided languages, were indicated somewhere, in the metadata or in the enclosing xml file, perhaps such a switch might be automated (the corresponding preference might be something like "read subtitles when the media does not support your language", maybe coupled with the option "don't read subtitles when the media's supported language(s) can't be identified"). I was also thinking about 'implied' subtitles, such as those shown in a film when some characters speak in a different language from the base language of the rest of the content; in such a case, if 'implied' subtitles could be distinguished somehow, it might be nice to turn the volume down (or off, as needed) and let a voice engine speak them aloud. I guess a UA with an embedded voice technology (such as Opera Voice, or FireVox) could do a good job and keep audio and video synchronized in most cases, but involving an external piece of software (such as a screen reader) the scenario might change (usually a screen reader can't be sped up or slowed down, and stopping it - when it is reading annotations - after having fed it some text, if at all possible, might be nontrivial -- again, I'm not enough inside this stuff, so I can just suppose some borderline scenarios).
Anyway, your proposal is nice, and, once it is widespread, screen reader developers might choose to provide some kind of support for synchronization (if needed to improve the accessibility of audio/video content). Regards, Alex
Re: [whatwg] Thoughts on video accessibility
At 21:33 +1300 9/12/08, Robert O'Callahan wrote: For what it's worth, loading an intermediate document of some new type which references other streams to be loaded adds a lot of complexity to the browser implementation. It creates new states that the decoder can be in, and introduces new failure modes. It creates new timing issues and possibly new security issues. I'm not sure I agree; but if you believe that, we should address it no matter which way this discussion goes. It should absolutely be possible to reference a SMIL file, or an MP4 or MOV file with external data (to give only two examples) from a video or audio element, and have the DOM, events, states, and APIs work correctly. Also, I should say that we quite deliberately left off associating and synchronizing media from our initial proposal for the media elements, for two reasons: a) we believe that SMIL files should be embeddable; b) it's an easy line to draw; if you want media integration, use a media integration language such as SMIL. If you start adding some integration, it's very hard to know where to stop. As for user or user-agent supplied subtitles etc., that can (of course) be a UA feature or option. If a unique content ID would help find such subtitle files, then I am hoping that the media annotations group would come up with a workable scheme (something the music industry is still, ahem, struggling with). -- David Singer Multimedia Standards, Apple Inc.
[whatwg] Thoughts on video accessibility
Hi everybody, For the last 2 months, I have been investigating means of satisfying video accessibility needs through Ogg in Mozilla/Firefox for HTML5. You will find a lot of information about our work at https://wiki.mozilla.org/Accessibility/Video_Accessibility and in the archives of the Ogg accessibility mailing list at http://lists.xiph.org/mailman/listinfo/accessibility . I wanted to give some feedback here on our findings, since some of them will have an impact on the HTML5 specification.

What are we talking about
---
When I say video accessibility, I'm actually only talking about time-aligned text formats and not e.g. captions as bitmaps or audio annotations as wave files. Since we analysed how to attach time-aligned text formats to video in a Web browser, we also did not want to restrict ourselves to only closed captions and subtitles. It made sense to extend this to any type of time-aligned text one can think of, including textual audio annotations (to be consumed by the blind through a screen reader or braille output), karaoke, speech bubbles, hyperlinked text annotations, and others. There is a list at http://wiki.xiph.org/index.php/OggText#Categories_of_Text_Codecs which gives you a more complete picture.

How is it currently done
---
When looking at the existing situation around time-aligned text for video, I found a very diverse set of formats and means of doing it. First of all, most media players allow you to load a video file and a caption/subtitle file for it in two separate steps. The reason is that most subtitles are produced by people other than the original content producers, and this approach allows the player to synchronise them together. This is particularly the case with the vast majority of SRT and SUB subtitle files, but is also the case for SMIL- and DFXP-based subtitle files. From a media file format POV, some formats have a means of multiplexing time-aligned text into the format, e.g. QuickTime has QTText and Flash has cue points. Others prefer to use external references, e.g. WindowsMedia uses SAMI or SMIL files, and RealMedia uses SMIL files. For mobile applications, a subset of DFXP has been defined in 3GPP TimedText, which is actually being encapsulated into QuickTime QTText using some extensions, and can be encapsulated into MP4 using the MPEG-4 TTXT specification. As can be seen, the current situation is such that time-aligned text is handled both in-stream and out-of-band, and there are indeed requirements for both situations.

Requirements
---
Not to go into much detail here, but I have seen extensive arguments made on both sides, for and against in-stream text tracks. One particular argument for in-stream text is that of downloading the video from some place and keeping all its information together in one file, such that when it is distributed again, it retains that information. One particular argument for out-of-band text is the ability to add text tracks at a later stage, from another site, and even from a web service (e.g. a translation web service that uses an existing caption file and translates it into another language). In view of these requirements, I strongly believe we need to enable people to do both: provide time-aligned text through external/out-of-band resources and in-stream, where the container format allows this.

Proposal for out-of-band approach
---
I'd like to stimulate a discussion here about how we can support out-of-band time-aligned text for video in HTML5.
I have seen previous proposals, such as the track element at http://esw.w3.org/topic/HTML/MultimediaAccessibilty#head-a83ba3666e7a437bf966c6bb210cec392dc6ca53 and would like to propose the following specification. Take this as an example:

<video src="http://example.com/video.ogv" controls>
  <text category="CC" lang="en" type="text/x-srt" src="caption.srt"></text>
  <text category="SUB" lang="de" type="application/ttaf+xml" src="german.dfxp"></text>
  <text category="SUB" lang="jp" type="application/smil" src="japanese.smil"></text>
  <text category="SUB" lang="fr" type="text/x-srt" src="translation_webservice/fr/caption.srt"></text>
</video>

* text elements are subelements of the video element and are therefore clearly related to one video (even if it comes in different formats). [BTW: I'm happy to rename this to textarea or whatever else people prefer to call it].
* the category tag (could also be renamed role if we prefer) allows us to specify what text category we are dealing with and allows the web browser to determine how to display it (there would be default displays for the different categories, and css would allow these to be overridden).
* the lang tag would allow the specification of alternative resources based on language, which allows the browser to select one by default based on browser preferences, and also to turn on by default those tracks that a particular user requires (e.g. because they are blind and
Re: [whatwg] Thoughts on video accessibility
Silvia Pfeiffer wrote: Take this as an example:

<video src="http://example.com/video.ogv" controls>
  <text category="CC" lang="en" type="text/x-srt" src="caption.srt"></text>
  <text category="SUB" lang="de" type="application/ttaf+xml" src="german.dfxp"></text>
  <text category="SUB" lang="jp" type="application/smil" src="japanese.smil"></text>
  <text category="SUB" lang="fr" type="text/x-srt" src="translation_webservice/fr/caption.srt"></text>
</video>

Could this combining of resources be achieved instead with SMIL or some other existing format? If there is already a format for doing this then I think HTML should avoid re-inventing it unless HTML's version is better in some way. On the other hand, if what is invented for HTML is indeed better in some way, it's likely to also be valuable outside of HTML, for example in situations where SMIL is used today. (For example, loading a video and its subtitles directly into a standalone player without needing to manually load both streams.) What are the advantages of doing this directly in HTML rather than having the src attribute point at some sort of compound media document?
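[For comparison, a rough sketch of how such a combination might look in SMIL, using its par container, textstream media element, and the systemLanguage test attribute inside a switch. This is illustrative only, not a worked-out equivalent of the proposal; whether a given SMIL player accepts SRT or DFXP as a textstream source is a separate question.]

<!-- Illustrative sketch of a SMIL compound document binding video and text -->
<smil>
  <body>
    <par>
      <video src="http://example.com/video.ogv"/>
      <switch>
        <textstream src="caption.srt" systemLanguage="en"/>
        <textstream src="german.dfxp" systemLanguage="de"/>
      </switch>
    </par>
  </body>
</smil>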
Re: [whatwg] Thoughts on video accessibility
On Mon, Dec 8, 2008 at 6:08 PM, Martin Atkins [EMAIL PROTECTED] wrote: What are the advantages of doing this directly in HTML rather than having the src attribute point at some sort of compound media document? The general point here is that subtitle data is in current practice often created and stored in external files. This is, in part, because of poor support for embedded tracks in web video applications, but also arises naturally in production workflow. Moreover, because it is text, subtitle data is much more likely to be stored in a database with other text-based content while audio and video are treated as binary blobs. This scheme is intended to support such hybrid systems. There is generally a tension between authors wanting to easily manipulate and add tracks, users wanting a self-contained file, and search engines wanting stand-alone access to just the text. Because splitting and merging media files requires special tools, our thinking in the Ogg accessibility group has been that we need to support both embedded and external references for text tracks in html. Users (and their tools) can then choose what methods they want to use in particular circumstances. We're also interested in a more sophisticated mechanism for communicating track assortments between a server and a client, but in the particular case of text tracks for accessibility, I think having a simple, explicit mechanism at the html level is worthwhile. -r
Re: [whatwg] Thoughts on video accessibility
On Tue, Dec 9, 2008 at 1:08 PM, Martin Atkins [EMAIL PROTECTED] wrote: Silvia Pfeiffer wrote: Take this as an example:

<video src="http://example.com/video.ogv" controls>
  <text category="CC" lang="en" type="text/x-srt" src="caption.srt"></text>
  <text category="SUB" lang="de" type="application/ttaf+xml" src="german.dfxp"></text>
  <text category="SUB" lang="jp" type="application/smil" src="japanese.smil"></text>
  <text category="SUB" lang="fr" type="text/x-srt" src="translation_webservice/fr/caption.srt"></text>
</video>

Could this combining of resources be achieved instead with SMIL or some other existing format? So, are you suggesting to use something like this:

<video srcdesc="http://example.com/video.smil" controls></video>

where the Web client would retrieve the smil file and find all the references to actual resources inside the SMIL file, then do another retrieval action to actually get the data it wants? This is indeed an alternative, which would require a smil file specification that describes the composition of tracks of a single linear video. It is indeed what we have experimented with in the Ogg community, and we have come up with ROE (http://wiki.xiph.org/index.php/ROE):

<video roe="http://example.com/video.xml" controls></video>

When we defined ROE, we were trying to use a tightly defined subpart of SMIL for it. This however did not work, because some of the required attributes do not exist in SMIL (e.g. profile, category, distinction, inline), SMIL was too expressive (e.g. it needed audio and video to be explicitly separated, when mediaSource will do fine), and SMIL required the use of other elements that were really unnecessary. So, instead of butchering up a sub-version of SMIL that would work (and look really ugly), we defined a new xml specification that satisfies the exact requirements we had. If there is already a format for doing this then I think HTML should avoid re-inventing it unless HTML's version is better in some way. I think both have their uses. We are using the ROE file to describe the (possibly only virtually existing) media resource on the server. It gives the Web client an opportunity to request a media resource with only a particular set of tracks (which allows for content adaptation). This results in a single media file, dynamically created on the Web server, delivered over one connection, and decoded by the Web browser into its constituent tracks, each of which is displayed in a different, but temporally synchronised, manner. In contrast, the proposed html5 solution requires the Web browser to set up multiple connections, one each to the resources that it requires. The decoding and display are then dependent on multiple connections having delivered enough data to provide for synchronised playback. It also allows downloading the full text files first and displaying some text ahead of time (as is usual e.g. in a transcript), while in a multiplexed file the text data is often only retrieved consecutively, in sync with the decoding of the a+v tracks. What are the advantages of doing this directly in HTML rather than having the src attribute point at some sort of compound media document? I guess an argument can be made that a user agent could use ROE to get to the individual streams and download the resources over multiple connections itself, which would have the exact same effect as the proposed HTML5 syntax. ROE currently goes beyond just text tracks and allows description of multiple media and text tracks.
You wouldn't, however, want a Web browser to have to create multiple connections to different audio and video resources and synchronise them locally. Text is different in this respect, because a text file is almost certainly small enough to be fully received before even the beginning of a video file has loaded. So, if we used ROE for such a content selection task, I would encourage using it only for text tracks. I'm interested to hear people's opinions on these ideas. I agree with Ralph and think having a simple, explicit mechanism at the html level is worthwhile - and very open and explicit to a web author. Having a redirection through a ROE-type file on the server is more opaque, but maybe more consistent with existing similar approaches as taken by RealNetworks in rm files and WindowsMedia in asx files. Cheers, Silvia.
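[For readers who haven't followed the link, a purely hypothetical sketch of the kind of description ROE aims at; the element and attribute names here are guessed from the discussion above (mediaSource, category, lang), not taken from the real schema at http://wiki.xiph.org/index.php/ROE.]

<!-- Hypothetical sketch only: names inferred from the discussion,
     not from the actual ROE specification -->
<ROE>
  <track id="v1" provides="video">
    <mediaSource src="video.ogv" content-type="video/ogg"/>
  </track>
  <track id="cc_en" provides="text" category="CC" lang="en">
    <mediaSource src="caption.srt" content-type="text/x-srt"/>
  </track>
  <track id="sub_de" provides="text" category="SUB" lang="de">
    <mediaSource src="german.dfxp" content-type="application/ttaf+xml"/>
  </track>
</ROE>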
Re: [whatwg] Thoughts on video accessibility
Silvia Pfeiffer wrote: I'm interested to hear people's opinions on these ideas. I agree with Ralph and think having a simple, explicit mechanism at the html level is worthwhile - and very open and explicit to a web author. Having a redirection through a ROE-type file on the server is more opaque, but maybe more consistent with existing similar approaches as taken by RealNetworks in rm files and WindowsMedia files in asx files. This (having a separate document that references other streams) is what I was thinking of. I guess which is more natural depends on who is doing the assembling. If it is the HTML author that takes the individual pieces and links them together then doing it in the HTML is probably easiest. My concern is that if the only thing linking the various streams together is the HTML document then the streams are less useful outside of a web browser context. If there is a separate resource containing the description of how to assemble the result from multiple resources then this resource will be useful to non-browser video playback clients. If an existing format is used then it can be linked to as fallback for users of downlevel browsers and will hopefully open in a standalone video player. If the only linking information is in the HTML document then the best you can do as fallback is link to the video stream, requiring the user to go find the text streams and load them manually.
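[One way to read this fallback argument in markup terms - a sketch, with the .smil URL purely illustrative - is to put the link to the compound document inside the video element's fallback content, so downlevel browsers expose something a standalone player can open.]

<video src="http://example.com/video.ogv" controls>
  <!-- Fallback for downlevel browsers: the compound document keeps the
       streams bound together outside the browser too -->
  <a href="http://example.com/video.smil">Watch the video with subtitles</a>
</video>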
Re: [whatwg] Thoughts on video accessibility
On Monday, 2008-12-08 at 21:20 -0800, Martin Atkins wrote: My concern is that if the only thing linking the various streams together is the HTML document then the streams are less useful outside of a web browser context. If there is a separate resource containing the description of how to assemble the result from multiple resources then this resource will be useful to non-browser video playback clients. If an existing format is used then it can be linked to as fallback for users of downlevel browsers and will hopefully open in a standalone video player. If the only linking information is in the HTML document then the best you can do as fallback is link to the video stream, requiring the user to go find the text streams and load them manually. Funny, I just recently talked with someone about that. He suggested something along the lines of a DNS for subtitles, e.g. having a hash value / UUID embedded inside the stream and looking that up. So for example urn:caption:dd23d31a1158052b4e68899e1a991df102d82e52/de could hold German annotations and subtitles for the media file with that hash. -- Nils Dagsson Moskopp http://dieweltistgarnichtso.net
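[To make the lookup idea concrete: the example identifier above is 40 hex digits, the size of a SHA-1 digest, so one could imagine deriving the URN from a hash of the media bytes and asking a registry to resolve it. A sketch in Node-style JavaScript; captionUrn and the registry are invented for illustration.]

// Sketch only: derive a urn:caption identifier from a SHA-1 hash of
// the media file. The resolver/registry service is hypothetical.
var crypto = require('crypto');
var fs = require('fs');

function captionUrn(mediaPath, lang) {
  var hash = crypto.createHash('sha1')
                   .update(fs.readFileSync(mediaPath))
                   .digest('hex');
  return 'urn:caption:' + hash + '/' + lang;
}

// A registry could then map the URN to a concrete subtitle resource:
//   captionUrn('video.ogv', 'de')
//     -> 'urn:caption:<40-hex-digit-hash>/de'
//     -> (registry lookup) -> an SRT/DFXP file for that exact media file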