Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
On Wed, 14 Nov 2007, John Foliot wrote:
> [...] Full text transcripts external to their media extend the shelf
> life of videos beyond what simple metadata alone can provide. [...]
> While support for both external and embedded captioning might be of
> value, the external method should be encouraged.

I've noted this as a feature for v3 of the video spec. I am reluctant to add this as a feature immediately since we haven't even worked out what codec we should be advocating yet.

-- 
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
Silvia Pfeiffer wrote:
> Sorry to be getting back to this thread this late, but I am trying to
> catch up on email. I'd like to contribute some thoughts on Ogg, CMML
> and captions, and will cite selectively from emails in this thread.
>
> [snip]
>
>> This would be problematic when downloading the video for offline use
>> or further distribution. This is also different from how this
>> currently works for DVDs, iPods, and the like, as far as I can tell.
>> It also makes authoring more complicated in the cases where someone
>> hands a video to you, as you'd have to separate the closed caption
>> stream from it first and point to it as a separate resource.
>
> Think it through: when you currently download a video from BitTorrent,
> you download the subtitle file with it -- mostly inside a zip file,
> even, for simplicity. Downloading a separate caption file is similar
> to how you currently have to download the images separately for a Web
> page. It's no big deal really, as long as there is a connection that
> can be automatically identified (e.g. through a link to the other
> inside the one, through a zip file, or through a description file).
>
> Actually, for the authoring, I completely disagree. Authoring a
> captioning file inside a text editor is much simpler than needing a
> special application to author the captions directly inside a video
> file. In any case, I don't think it's a matter of one or the other. I
> believe firmly that it should be both, no matter what caption format
> and video format is being used.

Actually, having the media transcript separate from the media itself is far superior to embedded captioning from the perspective of indexing and SEO. Full text transcripts external to their media extend the shelf life of videos beyond what simple metadata alone can provide.

A number of proof-of-concept examples have emerged that go so far as to use the caption/transcription file's time-stamping to 'surgically' arrive at a specific point in a video (in the example I saw, a lecture), allowing for precise search-and-retrieve capacity. While support for both external and embedded captioning might be of value, the external method should be encouraged.

JF
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
On Oct 8, 2007, at 22:12, Dave Singer wrote:
> At 12:22 +0300 8/10/07, Henri Sivonen wrote:
>> Could someone who knows more about the production of audio
>> descriptions please comment on whether audio description can in
>> practice be implemented as a supplementary sound track that plays
>> concurrently with the main sound track (in that case Speex would be
>> appropriate), or whether the main sound must be manually mixed
>> differently when description is present?
>
> Sometimes; but sometimes, for example:
> * background music needs to be reduced
> * other audio material needs to be 'moved' to make room for audio
>   description

In that case, an entire alternative soundtrack encoded using a general-purpose codec would be called for. Is it reasonable to expect content providers to take the bandwidth hit? Or should we expect content providers to provide an entire alternative video file?

>> When the problem is framed this way, the language of the text track
>> doesn't need to be specified at all. In case #1 it is the same as the
>> audio. In case #2 it is the same as the context site. This makes the
>> text track selection mechanism super-simple.
>
> Yes, it can often fall through to the "what content did you select
> based on language" question, and then the question of either selecting
> or styling content for accessibility can follow the language.

I don't understand that comment. My point was that the two most obvious cases don't require a language preference-based selection mechanism at all.

>> Personally, I'd be fine with a format with these features:
>> * Metadata flag that tells if the text track is captioning for the
>>   deaf or translation subtitles.
>
> I don't think we can or should 'climb inside' the content formats,
> merely have a standard way to ask them to do things (e.g. turn on
> captions).

I agree.
However, in order for the HTML 5 spec to be able to reasonably and pragmatically tell browsers to ask the video subsystem to perform tasks like "turn on captions", we need to check that the obviously foreseeable format families (Ogg in the case of Mozilla and, apparently, Opera; MPEG-4 in the case of Apple) are able to cater for such tasks. Moreover, browsers and content providers need to have a shared understanding of how to do this concretely.

> This should all be out of scope, IMHO; this is about the design of a
> captioning system, which I don't think we should try to do.

I think the captioning format should be specified by the video format family. However, in this case it has become apparent that there currently isn't One True Way of doing captioning in the Ogg family. In principle, this is a problem that the specifiers of the Ogg family should solve. In practice, though, this thread arises directly from an issue hit by the Mozilla implementation effort. Since the WHATWG is about interoperable implementations, it becomes a WHATWG problem to make sure that browsers that implement Ogg for video and content providers have the same understanding of what the One True Way of doing captioning in Ogg is, if the HTML 5 spec tosses the captioning problem to the video format (which, I agree, is the right place to toss it to). Hopefully, the HTML 5 spec text can be a one-sentence informative reference to a spec by another group. But which spec?

-- 
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
On Oct 8, 2007, at 22:52, Benjamin Hawkes-Lewis wrote:
> I'm a bit confused about why W3C's Timed Text Candidate Recommendation
> hasn't been mentioned in this thread, especially given that Flash
> objects are the VIDEO element's biggest competitor and Flash CS3's
> closed captioning component supports Timed Text. I haven't used it
> myself: is there some hideous disadvantage of Timed Text that makes it
> fundamentally flawed? It appears to be designed for use both with
> subtitles and captions. Here's the link for the CR:
> http://www.w3.org/TR/2006/CR-ttaf1-dfxp-20061116/

My understanding is that the purpose of this thread isn't to find a captioning spec for HTML 5 but to find the right way to do closed captions in Ogg. Support in liboggplay and shippability in a timely manner are important considerations. Hence the CMML slant so far. Have the Annodex/Xiph developers evaluated the suitability of the W3C Timed Text format for Ogg captioning for the deaf or translation subtitling?

(I'm not at all an expert in this. My own experience is just what I can observe about the kind of technology that has served Finns well enough for decades for the purpose of *translation* subtitles. The solutions that have worked well enough are *very*, *very* feature-poor. The W3C spec seems a lot more complex than the simplest thing that could possibly work, if you consider that the SubRip format is simple and works for some definition of "works", and European TV subtitles work for some definition of "works". It has been suggested to me off-list, though, that the W3C spec embodies the right expertise and reinventing it should be avoided. I wonder if the W3C spec could be implemented incrementally so that most of the complexity wouldn't burden an initial implementation.)

-- 
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/
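[For reference, the SubRip (.srt) format Henri mentions really is about as simple as timed text gets: numbered cues, a start and end timestamp, and the cue text, separated by blank lines. A representative fragment (the cue content is invented, but the syntax is the actual format):

```
1
00:00:01,000 --> 00:00:04,200
[door slams]
- Where have you been?

2
00:00:05,500 --> 00:00:08,000
I missed the last bus.
```

The millisecond separator is a comma, and basic inline markup such as <i>...</i> is honored by many players, although SubRip has no formal specification.]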
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
Henri Sivonen wrote:
> In that case, an entire alternative soundtrack encoded using a
> general-purpose codec would be called for. Is it reasonable to expect
> content providers to take the bandwidth hit? Or should we expect
> content providers to provide an entire alternative video file?

Just for comparative purposes, the BBC iPlayer apparently uses three downloads:

1. Standard.
2. BSL (British Sign Language).
3. Audio described (almost twice the size of Standard).

All three have closed captioning. Source:
http://www.bbc.co.uk/blogs/access20/2007/05/audio_description_on_the_iplay.shtml

-- 
Benjamin Hawkes-Lewis
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
On Oct 8, 2007, at 22:05, Dave Singer wrote:
> We suggested two ways to achieve captioning (a) by selection of
> element, at the HTML level ('if you need captions, use this resource')

Makes sense to me in the case of open captions burned onto the video track.

> and (b) styling of elements at the HTML level ('this video can be
> asked to display captions').

I don't quite understand how this would work. Closed captioning availability seems more like an intrinsic feature of the video file, and the preference to have captions rendered seems like a boolean pref -- not style.

> Should we (Apple) edit this into the Wiki,

Please do. The wiki is open for editing.

-- 
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
Benjamin Hawkes-Lewis wrote:
> I'm a bit confused about why W3C's Timed Text Candidate Recommendation
> hasn't been mentioned in this thread, especially given that Flash
> objects are the VIDEO element's biggest competitor and Flash CS3's
> closed captioning component supports Timed Text. I haven't used it
> myself: is there some hideous disadvantage of Timed Text that makes it
> fundamentally flawed? It appears to be designed for use both with
> subtitles and captions. Here's the link for the CR:
> http://www.w3.org/TR/2006/CR-ttaf1-dfxp-20061116/

Actually I wonder if it wouldn't make sense to have an attribute for media elements specifying a URI for a file containing Timed Text. These externally stored (not embedded in a media file) captions would be codec-agnostic and could be used to reuse the very same set of captions for, e.g., differently encoded media (Ogg, MPEG, Generic-Codec-Of-The-Season, ...).

As a side note, I like the idea of captions which are more than just the usual stream of text. Imagine a newsreel with timed "Would you like to know more?" links. Given that HTML5 is usually viewed in browsers that implement at least a non-empty subset of HTML, I imagine it should be possible for the browser to layer something div-equivalent over the media elements supporting captioning and pipe the HTML captions into it (with caution -- imagine a caption itself recursively embedding a video).

Maik Merten
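[For concreteness, markup along the lines Maik describes might look like this; the `captions` attribute is purely hypothetical and appears in no draft -- it only illustrates the "URI to an external Timed Text file" idea:

```html
<!-- Hypothetical markup: the "captions" attribute is an illustration
     of the idea above, not part of any HTML5 draft. -->
<video src="newsreel.ogg" controls
       captions="newsreel.dfxp.xml">
  Fallback content for user agents without video support.
</video>
```

The same captions file could then be reused unchanged for, say, an MPEG-4 encoding of the same video.]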
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
At 10:03 +0300 9/10/07, Henri Sivonen wrote:
> My understanding is that the purpose of this thread isn't to find a
> captioning spec for HTML 5 but to find the right way to do closed
> captions in Ogg.

Oh. I was under the impression that this thread was about the right way to request and get captions in HTML/the Web. How the Ogg community designs intrinsic caption support is up to them, isn't it?

-- 
David Singer
Apple/QuickTime
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
On Tue, 09 Oct 2007 18:03:41 +0200, Maik Merten [EMAIL PROTECTED] wrote:
> http://www.w3.org/TR/2006/CR-ttaf1-dfxp-20061116/
>
> Actually I wonder if it wouldn't make sense to have an attribute for
> media elements specifying a URI for a file containing Timed Text.
> These externally stored (not embedded in a media file) captions would
> be codec-agnostic and could be used to reuse the very same set of
> captions for, e.g., differently encoded media (Ogg, MPEG,
> Generic-Codec-Of-The-Season, ...).

This would be problematic when downloading the video for offline use or further distribution. It is also different from how this currently works for DVDs, iPods, and the like, as far as I can tell. It also makes authoring more complicated in the cases where someone hands a video to you, as you'd have to separate the closed caption stream from it first and point to it as a separate resource.

> As a side note, I like the idea of captions which are more than just
> the usual stream of text. Imagine a newsreel with timed "Would you
> like to know more?" links. Given that HTML5 is usually viewed in
> browsers that implement at least a non-empty subset of HTML, I imagine
> it should be possible for the browser to layer something
> div-equivalent over the media elements supporting captioning and pipe
> the HTML captions into it (with caution -- imagine a caption itself
> recursively embedding a video).

I think the cue points feature is designed to do that.

-- 
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
On Oct 9, 2007, at 19:24, Dave Singer wrote:
> At 10:03 +0300 9/10/07, Henri Sivonen wrote:
>> My understanding is that the purpose of this thread isn't to find a
>> captioning spec for HTML 5 but to find the right way to do closed
>> captions in Ogg.
>
> Oh. I was under the impression that this thread was about the right
> way to request and get captions in HTML/the Web.

Yes, that also, but specifying the requesting part doesn't really help if there isn't advice to implementors on how to respond to the request.

> How the Ogg community designs intrinsic caption support is up to them,
> isn't it?

In theory, ideally, yes. However, when HTML 5 says "User agents should support Ogg Theora video and Ogg Vorbis audio, as well as the Ogg container format." and "User agents should provide controls to enable or disable the display of closed captions associated with the video stream, though such features should, again, not interfere with the page's normal rendering.", it becomes a WHATWG issue to elicit a way to satisfy both "should" requirements at the same time, if implementors don't otherwise have sufficient guidance on how to implement closed captioning support for Ogg interoperably.

-- 
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
At 9:22 +0300 9/10/07, Henri Sivonen wrote:
> In that case, an entire alternative soundtrack encoded using a
> general-purpose codec would be called for. Is it reasonable to expect
> content providers to take the bandwidth hit? Or should we expect
> content providers to provide an entire alternative video file?

If the delivery is streaming, or done in some other way where the selection of tracks can happen prior to transport, then there isn't a bandwidth hit at all, of course. Then "ask this resource to present itself in the captioned fashion" is a reasonable way to do this. Alternatively, as you say, one might prefer a whole separate file -- "select this file if captions are desired". Our proposal covers both cases, as both have valid uses.

> My point was that the two most obvious cases don't require a language
> preference-based selection mechanism at all.

I am trying, clumsily, to agree with you. Content selection based on language, and then choice of any assistive needs (e.g. captions), can be orthogonal.

>> I don't think we can or should 'climb inside' the content formats,
>> merely have a standard way to ask them to do things (e.g. turn on
>> captions).
>
> I agree. However, in order for the HTML 5 spec to be able to
> reasonably and pragmatically tell browsers to ask the video subsystem
> to perform tasks like "turn on captions", we need to check that the
> obviously foreseeable format families (Ogg in the case of Mozilla and,
> apparently, Opera; MPEG-4 in the case of Apple) are able to cater for
> such tasks. Moreover, browsers and content providers need to have a
> shared understanding of how to do this concretely.

Sure, agreed. As this matures, we (Apple) will be looking at what it takes for the movie file format, and I'll raise the same questions about MP4 and 3GP.

-- 
David Singer
Apple/QuickTime
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
On 10/9/07, Dave Singer [EMAIL PROTECTED] wrote:
> If the delivery is streaming, or done in some other way where the
> selection of tracks can happen prior to transport, then there isn't a
> bandwidth hit at all, of course. Then "ask this resource to present
> itself in the captioned fashion" is a reasonable way to do this.
> Alternatively, as you say, one might prefer a whole separate file --
> "select this file if captions are desired".

The way I see it, the browser is working like a video player. Modern video players allow users to configure whether they would like to see the first subtitles track by default or not. And if the user wishes to turn subtitles on or off, or switch to another subtitles track (e.g. another language), s/he right-clicks the video screen and modifies the subtitles options. Not elegant, but it works.

-Ivo
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
At 0:25 +0100 10/10/07, Ivo Emanuel Gonçalves wrote:
> The way I see it, the browser is working like a video player. Modern
> video players allow users to configure whether they would like to see
> the first subtitles track by default or not. And if the user wishes to
> turn subtitles on or off, or switch to another subtitles track (e.g.
> another language), s/he right-clicks the video screen and modifies the
> subtitles options. Not elegant, but it works.

Yes, I wish it were this simple, but unfortunately this doesn't cut it, in two respects. (a) Users needing accessibility go crazy if they have to turn it on, resource by resource, by hand. (b) Users needing some kinds of accessibility (e.g. visual assistance) have trouble with things like "right-click and choose a menu". I don't think it's unreasonable to expect to use persistent preferences, if the spec stays out of the field of trying to guess what all the axes (possibilities) are. We've previously talked about:

* captions
* high-contrast video
* audio description of video
* high-contrast (clarity) audio

and then the iPlayer comes along and has 'sign language' as another axis, which confirms that we can't think of all the axes up front.

-- 
David Singer
Apple/QuickTime
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
On Mon, 08 Oct 2007 02:14:05 +0200, Silvia Pfeiffer [EMAIL PROTECTED] wrote:
> Hi Chris, this is a very good discussion to have and I would be
> curious about the opinions of people.

An alternative is to use SVG as a container format. You can include captions in various forms, provide controls to swap between them, and even provide metadata (using some common accessibility vocabulary) to describe the different available tracks, and you can convert common timed text formats relatively simply. For implementors who already have SVG this is possibly a good option. Loading HTML itself with everything seems like overkill to me. The case where you have fallback content means you can deal with some semi-capable format that doesn't allow a full range of accessibility options in a single resource...

[snip]

> I think we need to understand exactly what we expect from the caption
> tracks before being able to suggest an optimal solution.

Agree. I'm more likely to be involved if the discussion takes place on the W3C mailing list.

On 10/8/07, Chris Double [EMAIL PROTECTED] wrote:
> The video element description states that Theora, Vorbis and the Ogg
> container should be supported. How should closed captions and audio
> description tracks for accessibility be supported using video and
> these formats?

cheers
Chaals

-- 
Charles McCathieNevile
Opera Software, Standards Group
je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals
Try the Kestrel - Opera 9.5 alpha
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
(Heavy quote snipping. Picking on particular points.)

On Oct 8, 2007, at 03:14, Silvia Pfeiffer wrote:
> This is both more generic than captions, and less generic in that
> captions have formatting and are displayed in a particular way.

I think we should avoid overdoing captioning or subtitling by engineering excessive formatting. If we consider how subtitling works with legacy channels (TV and movie theaters), the text is always in the same sans-serif font with white fill and black outline, located at the bottom of the video frame (optionally located at the top when there's relevant native text at the bottom, and optionally italicized). To get good-enough feature parity with the legacy, the only formatting options you need are putting the text at the top of the video frame as opposed to the bottom, and optionally italicizing text runs. (It follows that I think the idea of using SVG for captioning or subtitles is excessive.) I wouldn't mind an upgrade path that allowed CSS font properties for captioning and subtitles, but I think we shouldn't let formatting hold back the first iteration.

> (colours, alignment etc. - the things that the EBU subtitling standard
> http://www.limeboy.com/support.php?kbID=12 is providing).

The EBU format seems severely legacy from the Unicode point of view. :-(

> Another option would be to disregard CMML completely and invent a new
> timed text logical bitstream for Ogg which would just have the
> subtitles. This could use any existing timed text format and would
> just require a bitstream mapping for Ogg, which should not be hard to
> do at all.

Is 3GPP Timed Text, a.k.a. MPEG-4 part 17, unencumbered? (IANAL, and this isn't an endorsement of the format -- just a question.)

> an alternate audio track (e.g. speex as suggested by you for
> accessibility to blind people),

My understanding is that, at least conceptually, an audio description track is *supplementary* to the normal sound track. Could someone who knows more about the production of audio descriptions please comment on whether audio description can in practice be implemented as a supplementary sound track that plays concurrently with the main sound track (in that case Speex would be appropriate), or whether the main sound must be manually mixed differently when description is present?

> and several caption tracks (for different languages),

I think it needs emphasizing that captioning (for the deaf) and translation subtitling (for people who can hear but who can't follow the language) are distinctly different in terms of the metadata flagging needs and the playback defaults. Moreover, although translations for multiple languages are nice to have, they complicate UI and metadata considerably, and packaging multiple translations in one file is outside the scope of HTML5 as far as the current Design Principles draft (from the W3C side) goes.

I think we should first focus on two kinds of qualitatively different timed text (differing in metadata and playback defaults):

1) Captions for the deaf:
 * Written in the same language as the speech content of the video is spoken.
 * May have speaker identification text.
 * May indicate other relevant sounds textually.
 * Don't indicate text that can be seen in the video frame.
 * Not rendered by default.
 * Enabled by a browser-wide "I am deaf" or "my device doesn't do sound out" pref.

2) Subtitles for people who can't follow foreign-language speech:
 * Written in the language of the site that embeds the video, when there's speech in another language.
 * Don't identify the speaker.
 * Don't identify sounds.
 * Translate relevant text visible in the video frame.
 * Rendered by default.
 * As a bonus, suppressible via the context menu or something, on a case-by-case basis.

When the problem is framed this way, the language of the text track doesn't need to be specified at all. In case #1 it is the same as the audio. In case #2 it is the same as the context site. This makes the text track selection mechanism super-simple. Note that #2 isn't an accessibility feature, but addressing #2 right away avoids abuse of the #1 feature, which is for accessibility.

> I think we need to understand exactly what we expect from the caption
> tracks before being able to suggest an optimal solution. If e.g. we
> want caption tracks with hyperlinks on a temporal basis and some more
> metadata around that which is machine readable, then an extension of
> CMML would make the most sense.

I would prefer Unicode data over bitmaps in order to allow captioning to be mined by search engines without OCR. In terms of defining the problem space and metadata modeling, I think we should aim for the two cases I outlined above instead of trying to cover more ground up front. Personally, I'd be fine with a format with these features:

* Metadata flag that tells if the text track is captioning for the deaf or translation subtitles.
* Sequence of
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
At 9:45 +1200 8/10/07, Chris Double wrote:
> The video element description states that Theora, Vorbis and the Ogg
> container should be supported. How should closed captions and audio
> description tracks for accessibility be supported using video and
> these formats? I was pointed to a page outlining some previous
> discussion on the issue:
> http://wiki.whatwg.org/wiki/Video_accessibility
> Is there a way of identifying which track is the closed caption track,
> which is the alternate audio track, etc? How are other implementors of
> the video element handling this issue? Is CMML for the closed captions
> viable? Or a Speex track for the alternate audio? Or using Ogg
> Skeleton in some way to get information about the other tracks?

There was also a thread I started in June, which I can't find in the archives; my initial email is below. We suggested two ways to achieve captioning: (a) by selection of element, at the HTML level ('if you need captions, use this resource'), and (b) styling of elements at the HTML level ('this video can be asked to display captions'). Choice (a) means that it is possible, for example, to prepare alternative versions with 'burned-in' accessibility (e.g. captions), and then explicit support for them is not needed in the format. Choice (b) is more economical in media resources, and recognizes that 'true captioning' is sometimes better (e.g. it might be delivered out on analog video as line 21 data).

The previous thread faded away, but with the W3C meeting approaching, I'd like to get a sense of how we make progress in this area. Should we (Apple) edit this into the Wiki? Should we (Apple or the WHATWG) carry the proposal to the W3C, and if so, to which group? And so on. Thanks for re-raising this!
* * * * *

Date: Fri, 08 Jun 2007 16:22:00 -0700
From: Dave Singer [EMAIL PROTECTED]
Subject: [whatwg] accessibility management for timed media elements, proposal
To: WHATWG [EMAIL PROTECTED]

Hi, we promised to get back to the whatwg with a proposal for a way to handle accessibility for timed media, and here it is. Sorry it took a while...

* * * * *

To allow the UA to select among alternative sources for media elements based on users' accessibility preferences, we propose to:

1) Expose accessibility preferences to users
2) Allow the UA to evaluate the suitability of content for specific accessibility needs via CSS media queries

Details:

1) Expose accessibility preferences to users

Proposal: user settings that correspond to accessibility needs. For each need, the user can choose among the following three dispositions:

* favor (want): I prefer media that is adapted for this kind of accessibility.
* disfavor (don't want): I prefer media that is not adapted for this kind of accessibility.
* disinterest (don't care): I have no preference regarding this kind of accessibility.
The initial set of user preferences for consideration in the selection of alternative media resources corresponds to the following accessibility options:

* captions (corresponds to SMIL systemCaptions)
* descriptive audio (corresponds to SMIL systemAudioDesc)
* high-contrast video
* high-contrast audio (audio with minimal background noise, music, etc., so speech is maximally intelligible)

This list is not intended to be exhaustive; additional accessibility options and corresponding preferences may be considered for inclusion in the future. Herein we describe only those user preferences that are useful in the process of evaluating multiple alternative media resources for suitability. Note that these proposed preferences are not intended to exclude or supplant user preferences that may be offered by the UA to provide accessibility options according to the W3C accessibility guidelines, such as a global volume control (http://www.w3.org/TR/WAI-USERAGENT/uaag10-chktable.html).

2) Allow the UA to evaluate the suitability of content for specific accessibility needs via CSS media queries

Note that the current specification of video and audio includes a mechanism for selection among multiple alternate resources (http://www.whatwg.org/specs/web-apps/current-work/#location). The scope of our proposal here is to extend that mechanism to cover accessibility options.

Proposal: the media attribute of the source element, as described in the current working draft of Web Applications 1.0, takes a CSS media query as its value (http://www.w3.org/TR/css3-mediaqueries/), which the UA will evaluate
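[To make the proposal concrete, source selection along these lines is presumably what is intended; the `captions` media feature name is illustrative only, as the proposal does not define concrete feature names:

```html
<!-- Illustrative sketch: "(captions)" is a hypothetical media feature
     corresponding to the captions preference described above. -->
<video controls>
  <source src="lecture-captioned.ogg" media="(captions)">
  <source src="lecture.ogg">
</video>
```

A UA whose user has set the captions preference to "favor" would evaluate each source's media query in turn and pick the first matching source.]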
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
At 8:58 +0200 8/10/07, Charles McCathieNevile wrote:

On Mon, 08 Oct 2007 02:14:05 +0200, Silvia Pfeiffer [EMAIL PROTECTED] wrote: Hi Chris, this is a very good discussion to have and I would be curious about the opinions of people.

An alternative is to use SVG as a container format. You can include captions in various forms, provide controls to swap between them, and even provide metadata (using some common accessibility vocabulary) to describe the different available tracks, and you can convert common timed text formats relatively simply. For implementors who already have SVG this is possibly a good option.

Loading HTML itself with everything seems like overkill to me. The case where you have fallback content means you can deal with some semi-capable format that doesn't allow a full range of accessibility options in a single resource... [snip]

I think we need to understand exactly what we expect from the caption tracks before being able to suggest an optimal solution.

Agree.

I'm more likely to be involved if the discussion takes place on the W3C mailing list.

Which one would you like? HTML, WCAG, Timed Text, or ...?

On 10/8/07, Chris Double [EMAIL PROTECTED] wrote: The video element description states that Theora, Vorbis and the Ogg container should be supported. How should closed captions and audio description tracks for accessibility be supported using video and these formats?

cheers Chaals
--
Charles McCathieNevile, Opera Software, Standards Group
je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals
Try the Kestrel - Opera 9.5 alpha
--
David Singer
Apple/QuickTime
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
At 12:22 +0300 8/10/07, Henri Sivonen wrote:

Is 3GPP Timed Text, a.k.a. MPEG-4 Part 17, unencumbered? (IANAL, this isn't an endorsement of the format--just a question.)

I am not authoritative, but I have not seen any disclosures myself.

an alternate audio track (e.g. Speex as suggested by you for accessibility to blind people),

My understanding is that at least conceptually an audio description track is *supplementary* to the normal sound track. Could someone who knows more about the production of audio descriptions please comment on whether audio description can in practice be implemented as a supplementary sound track that plays concurrently with the main sound track (in that case Speex would be appropriate), or whether the main sound must be manually mixed differently when description is present?

Sometimes; but sometimes, for example:
* background music needs to be reduced
* other audio material needs to be 'moved' to make room for audio description

and several caption tracks (for different languages),

I think it needs emphasizing that captioning (for the deaf) and translation subtitling (for people who can hear but who can't follow the language) are distinctly different in terms of metadata flagging needs and playback defaults. Moreover, although translations for multiple languages are nice to have, they complicate UI and metadata considerably, and packaging multiple translations in one file is outside the scope of HTML5 as far as the current Design Principles draft (from the W3C side) goes. I think we should first focus on two kinds of qualitatively different timed text (differing in metadata and playback defaults):

1) Captions for the deaf:
* Written in the same language as the speech content of the video is spoken.
* May have speaker identification text.
* May indicate other relevant sounds textually.
* Don't indicate text that can be seen in the video frame.
* Not rendered by default.
* Enabled by a browser-wide "I am deaf" or "my device doesn't do sound out" pref.

2) Subtitles for people who can't follow foreign-language speech:
* Written in the language of the site that embeds the video, when there's speech in another language.
* Don't identify the speaker.
* Don't identify sounds.
* Translate relevant text visible in the video frame.
* Rendered by default.
* As a bonus, suppressible via the context menu or something on a case-by-case basis.

When the problem is framed this way, the language of the text track doesn't need to be specified at all. In case #1 it is the same as that of the audio; in case #2 it is the same as that of the embedding site. This makes the text track selection mechanism super-simple.

Yes, it can often fall through to the question of what content was selected based on language, and then the question of either selecting or styling content for accessibility can follow the language.

Personally, I'd be fine with a format with these features:

* Metadata flag that tells if the text track is captioning for the deaf or translation subtitles.

I don't think we can or should 'climb inside' the content formats, merely have a standard way to ask them to do things (e.g. turn on captions).

* Sequence of plain-text Unicode strings (incl. forced line breaks and bidi marks) with the following data:
  - Time code when the string appears.
  - Time code when the string disappears.
  - Flag for positioning the string at the top of the frame instead of the bottom.
* A way to do italics (or other emphasis for scripts to which italics are not applicable), but I think this feature isn't essential.
* A guideline for estimating the amount of text appropriate to be shown at one time, and a matching rendering guideline for UAs. (This guideline should result in an amount of text that agrees with current TV best practices.)

This should all be out of scope, IMHO; this is about the design of a captioning system, which I don't think we should try to do.
It would be up to the UA to render the text at the bottom of the video frame in white sans-serif with black outline. Or wherever it's supposed to go. I think it would be inappropriate to put hyperlinks in captioning for the deaf because it would venture outside the space of accessibility and effectively hide some links for the non-deaf audience. Yes, generally true! -- David Singer Apple/QuickTime
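As an aside (not from the thread itself): the minimal data model Henri describes — a timed sequence of plain-text strings with appear/disappear time codes and optional italics — is close to what the existing SubRip (SRT) format already carries. A hypothetical fragment, with invented timings and text, showing the captions-for-the-deaf case (speaker identification plus a sound cue):

```
1
00:00:01,000 --> 00:00:03,500
[door slams]

2
00:00:04,000 --> 00:00:06,000
MARY: <i>Who's there?</i>
```

Note that SRT has no standardized metadata flag distinguishing captions from translation subtitles, and support for inline markup such as `<i>` varies by player, so it covers only part of the wish list above.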
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
Dave Singer wrote:

an alternate audio track (e.g. Speex as suggested by you for accessibility to blind people), My understanding is that at least conceptually an audio description track is *supplementary* to the normal sound track. Could someone who knows more about the production of audio descriptions please comment on whether audio description can in practice be implemented as a supplementary sound track that plays concurrently with the main sound track (in that case Speex would be appropriate), or whether the main sound must be manually mixed differently when description is present? Sometimes; but sometimes, for example: * background music needs to be reduced * other audio material needs to be 'moved' to make room for audio description

The relationship between audio description and the main sound appears to be a non-simple one. See: http://joeclark.org/access/description/ad-principles.html

I think we should first focus on two kinds of qualitatively different timed text (differing in metadata and playback defaults): 1) Captions for the deaf: * Written in the same language as the speech content of the video is spoken. * May have speaker identification text. * May indicate other relevant sounds textually. * Don't indicate text that can be seen in the video frame. * Not rendered by default. * Enabled by a browser-wide "I am deaf" or "my device doesn't do sound out" pref.

It should also, I think, be available on a case-by-case basis. The information is potentially useful for everyone, e.g. if a background sound or a particular speaker is indistinct to your ears. I don't think closed captioning functionality is best buried in an obscure browser configuration setting.

2) Subtitles for people who can't follow foreign-language speech: * Written in the language of the site that embeds the video, when there's speech in another language. * Don't identify the speaker. * Don't identify sounds. * Translate relevant text visible in the video frame. * Rendered by default.
* As a bonus, suppressible via the context menu or something on a case-by-case basis.

Just to add another complication to the mix, we shouldn't forget the need to provide for sign language interpretation. The BBC's iPlayer features sign interpretation, FWIW: http://www.bbc.co.uk/blogs/access20/2007/08/bsl_comes_to_the_iplayer_1.shtml

This should all be out of scope, IMHO; this is about the design of a captioning system, which I don't think we should try to do.

I'm a bit confused about why W3C's Timed Text Candidate Recommendation hasn't been mentioned in this thread, especially given that Flash objects are the VIDEO element's biggest competitor and Flash CS3's closed captioning component supports Timed Text. I haven't used it myself: is there some hideous disadvantage of Timed Text that makes it fundamentally flawed? It appears to be designed for use both with subtitles and captions. Here's the link for the CR: http://www.w3.org/TR/2006/CR-ttaf1-dfxp-20061116/

--
Benjamin Hawkes-Lewis
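For readers unfamiliar with the format being referred to: a minimal DFXP document might look roughly like the following sketch. The timings and text are invented for illustration, and the namespace is the one used by the 2006 Candidate Recommendation linked above.

```xml
<tt xml:lang="en" xmlns="http://www.w3.org/2006/10/ttaf1">
  <body>
    <div>
      <!-- begin/end give the interval during which each cue is shown -->
      <p begin="00:00:01.000" end="00:00:03.500">[door slams]</p>
      <p begin="00:00:04.000" end="00:00:06.000">MARY: Who's there?</p>
    </div>
  </body>
</tt>
```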
[whatwg] Video, Closed Captions, and Audio Description Tracks
The video element description states that Theora, Vorbis and the Ogg container should be supported. How should closed captions and audio description tracks for accessibility be supported using video and these formats? I was pointed to a page outlining some previous discussion on the issue: http://wiki.whatwg.org/wiki/Video_accessibility Is there a way of identifying which track is the closed caption track, which is the alternate audio track, etc? How are other implementors of the video element handling this issue? Is CMML viable for the closed captions? Or a Speex track for the alternate audio? Or using Ogg Skeleton in some way to get information about the other tracks? Chris -- http://www.bluishcoder.co.nz
Re: [whatwg] Video, Closed Captions, and Audio Description Tracks
Hi Chris, this is a very good discussion to have and I would be curious about the opinions of people.

CMML has been developed with the aim of providing HTML-style timed text annotations for audio/video - in particular, hyperlinks and annotations attached to temporal sections of videos. This is both more generic than captions, and less generic in that captions have formatting and are displayed in a particular way.

One option is to extend CMML to provide the caption functionality inside CMML. This would not be difficult; in fact, the current desc tag is already being used for such functionality in xine. It is however suboptimal, since it mixes aims. A better way would be to invent a caption tag for CMML which would have some formatting functionality (colours, alignment, etc. - the things that the EBU subtitling standard http://www.limeboy.com/support.php?kbID=12 provides).

Another option would be to disregard CMML completely and invent a new timed text logical bitstream for Ogg which would carry just the subtitles. This could use any existing timed text format and would just require a bitstream mapping for Ogg, which should not be hard to do at all.

Now for Ogg Skeleton: Ogg Skeleton will indeed have a part to play in this, however not directly in the specification of the timed text annotations. Ogg Skeleton is a track that describes what is inside the Ogg file. So, assuming we had a multitrack video file with a video track, an audio track, an alternate audio track (e.g. Speex as suggested by you for accessibility to blind people), a CMML track (for hyperlinking into and out of the video), and several caption tracks (for different languages), then Ogg Skeleton would state exactly which of these exist, without the need for a program to decode the Ogg file fully.

I think we need to understand exactly what we expect from the caption tracks before being able to suggest an optimal solution. If e.g.
we want caption tracks with hyperlinks on a temporal basis and some more machine-readable metadata around that, then an extension of CMML would make the most sense.

Regards, Silvia.

On 10/8/07, Chris Double [EMAIL PROTECTED] wrote: The video element description states that Theora, Vorbis and the Ogg container should be supported. How should closed captions and audio description tracks for accessibility be supported using video and these formats? I was pointed to a page outlining some previous discussion on the issue: http://wiki.whatwg.org/wiki/Video_accessibility Is there a way of identifying which track is the closed caption track, which is the alternate audio track, etc? How are other implementors of the video element handling this issue? Is CMML viable for the closed captions? Or a Speex track for the alternate audio? Or using Ogg Skeleton in some way to get information about the other tracks? Chris -- http://www.bluishcoder.co.nz
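To make the two CMML options above concrete, here is a rough sketch of a CMML clip carrying its existing annotation elements alongside the kind of caption element Silvia proposes. The caption element and its attributes are hypothetical (they do not exist in CMML), and the surrounding structure is simplified from real CMML documents.

```xml
<cmml>
  <head>
    <title>Example annotated video</title>
  </head>
  <!-- A clip annotates a temporal section of the media -->
  <clip id="intro" start="npt:0" end="npt:12.5">
    <a href="http://example.com/related">Related material</a>
    <desc>The presenter introduces the topic.</desc>
    <!-- Hypothetical extension: a dedicated caption element with
         basic formatting, as discussed in the message above -->
    <caption align="bottom">MARY: Welcome, everyone.</caption>
  </clip>
</cmml>
```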