Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2008-05-14 Thread Ian Hickson
On Wed, 14 Nov 2007, John Foliot wrote:
 
 [...] Full text transcripts external to their media extend the shelf 
 life of videos beyond what simple meta-data alone can provide. [...] 
 While support for both external and embedded captioning might be of 
 value, the external method should be encouraged.

I've noted this as a feature for v3 of the video spec. I am reluctant to 
add this as a feature immediately since we haven't even worked out what 
codec we should be advocating yet.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-11-14 Thread John Foliot
Silvia Pfeiffer wrote:
 Sorry to be getting back to this thread this late, but I am trying to
 catch up on email. 
 
 I'd like to contribute some thoughts on Ogg, CMML and Captions and
 will cite selectively from emails in this thread. 
 
snip

 
 This would be problematic when downloading the video for offline use
 or further distribution. This is also different from how this
 currently works for DVDs, iPod, and the like as far as I can tell. It
 also makes authoring more complicated in the cases where someone
 hands a video to you as you'd have to separate the closed caption
 stream from it first and point to it as a separate resource.
 
 Think it through: when you currently download a video from
 bittorrent, you download the subtitle file with it - mostly inside a
 zip file for simplicity even. Downloading a separate caption file  is
 similar to how you currently have to download the images separately
 for a Web page. It's no big deal really as long as there is a
 connection that can be automatically identified (e.g. through a link
 to the other inside the one, or through a zip-file, or through a
 description file).   
 
 Actually for the authoring, I completely disagree. Authoring a
 captioning file inside a text editor is much simpler than needing a
 special application to author the captions directly inside a video
 file.   
 
 In any case: I don't think it's a matter of one or the other. I
 believe firmly that it should be both, no matter what caption format
 and video format is being used.  

Actually, having the media transcript separate from the media itself is far
superior to embedded captioning from the perspective of indexing and
SEO.  Full text transcripts external to their media extend the shelf life
of videos beyond what simple meta-data alone can provide.  A number of
proof-of-concept examples have emerged that even go so far as to use the
caption/transcription file's time-stamping to 'surgically' arrive at a
specific point in a video (in the example I saw, a lecture), allowing for
precise search and retrieve capacity.  While support for both external and
embedded captioning might be of value, the external method should be
encouraged.

JF



Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-09 Thread Henri Sivonen

On Oct 8, 2007, at 22:12, Dave Singer wrote:


At 12:22  +0300 8/10/07, Henri Sivonen wrote:


Could someone who knows more about the production of audio  
descriptions, please, comment if audio description can in practice  
be implemented as a supplementary sound track that plays  
concurrently with the main sound track (in that case Speex would  
be appropriate) or whether the main sound must be manually mixed  
differently when description is present?


Sometimes;  but sometimes, for example:
* background music needs to be reduced
* other audio material needs to be 'moved' to make room for audio  
description


In that case, an entire alternative soundtrack encoded using a  
general-purpose codec would be called for. Is it reasonable to expect  
content providers to take the bandwidth hit? Or should we expect  
content providers to provide an entire alternative video file?


When the problem is framed this way, the language of the text track  
doesn't need to be specified at all. In case #1 it is "same as  
audio". In case #2 it is "same as context site". This makes the  
text track selection mechanism super-simple.


Yes, it can often fall through to the "what content did you select  
based on language" and then the question of either selecting or  
styling content for accessibility can follow the language.


I don't understand that comment. My point was that the two most  
obvious cases don't require a language preference-based selection  
mechanism at all.



Personally, I'd be fine with a format with these features:
 * Metadata flag that tells if the text track is captioning for  
the deaf or translation subtitles.


I don't think we can or should 'climb inside' the content formats,  
merely have a standard way to ask them to do things (e.g. turn on  
captions).


I agree. However, in order for the HTML 5 spec to be able to  
reasonably and pragmatically tell browsers to ask the video subsystem  
to perform tasks like "turn on captions", we need to check that the  
obviously foreseeable format families (Ogg in the case of Mozilla  
and, apparently, Opera and MPEG-4 in the case of Apple) are able to  
cater for such tasks. Moreover, browsers and content providers need  
to have a shared understanding of how to do this concretely.


This should all be out of scope, IMHO;  this is about the design of  
a captioning system, which I don't think we should try to do.


I think the captioning format should be specified by the video format  
family. However, in this case it has become apparent that there  
currently isn't One True Way of doing captioning in the Ogg family.  
In principle, this is a problem that the specifiers of the Ogg family  
should solve. In practice, though, this thread arises directly from  
an issue hit by the Mozilla implementation effort. Since the WHATWG  
is about interoperable implementations, it becomes a WHATWG problem  
to make sure that browsers that implement Ogg for video and content  
providers have the same understanding of what the One True Way of  
doing captioning in Ogg is, if the HTML 5 spec tosses the captioning  
problem to the video format (which, I agree, is the right place to  
toss it to). Hopefully, the HTML 5 spec text can be a one-sentence  
informative reference to a spec by another group. But which spec?


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-09 Thread Henri Sivonen

On Oct 8, 2007, at 22:52, Benjamin Hawkes-Lewis wrote:

I'm a bit confused about why W3C's Timed Text Candidate  
Recommendation hasn't been mentioned in this thread, especially  
given that Flash objects are the VIDEO element's biggest  
competitor and Flash CS3's closed captioning component supports  
Timed Text. I haven't used it myself: is there some hideous  
disadvantage of Timed Text that makes it fundamentally flawed? It  
appears to be designed for use both with subtitles and captions.


Here's the link for the CR:

http://www.w3.org/TR/2006/CR-ttaf1-dfxp-20061116/


My understanding is that the purpose of this thread isn't to find a  
captioning spec for HTML 5 but to find the right way to do closed  
captions in Ogg. Support in liboggplay and shippability in a timely  
manner are important considerations. Hence, the CMML slant so far.


Have the Annodex/Xiph developers evaluated the suitability of the W3C  
timed text format for Ogg captioning for the deaf or translation  
subtitling?


(I'm not at all an expert in this. My own experience is just what I  
can observe about the kind of technology that has served Finns well  
enough for decades for the purpose of *translation* subtitles. The  
solutions that have worked well enough are *very*, *very*  
feature-poor. The W3C spec seems a lot more complex than the simplest  
thing that could possibly work if you consider that the SubRip format  
is simple and works for some definition of "works" and European TV  
subtitles work for some definition of "works". It has been suggested  
to me off-list, though, that the W3C spec embodies the right  
expertise and reinventing it should be avoided. I wonder if the W3C  
spec could be implemented incrementally so that most of the  
complexity wouldn't burden an initial implementation.)
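(For reference, a representative SubRip fragment; the entire per-cue syntax
is a running index, a time range, and the text lines:)

  1
  00:01:02,000 --> 00:01:05,500
  So this is the new video element?

  2
  00:01:06,000 --> 00:01:08,250
  - Yes.
  - And it does subtitles?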


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-09 Thread Benjamin Hawkes-Lewis

Henri Sivonen wrote:
In that case, an entire alternative soundtrack encoded using a 
general-purpose codec would be called for. Is it reasonable to expect 
content providers to take the bandwidth hit? Or should we expect content 
providers to provide an entire alternative video file?


Just for comparative purposes, the BBC iPlayer apparently uses three
downloads:

1. Standard.

2. BSL.

3. Audio described (almost twice the size of Standard).

All three have closed captioning.

Source:

http://www.bbc.co.uk/blogs/access20/2007/05/audio_description_on_the_iplay.shtml

--
Benjamin Hawkes-Lewis



Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-09 Thread Henri Sivonen

On Oct 8, 2007, at 22:05, Dave Singer wrote:

We suggested two ways to achieve captioning (a) by selection of  
element, at the HTML level ('if you need captions, use this resource')


Makes sense to me in case of open captions burned onto the video track.

and (b) styling of elements at the HTML level ('this video can be  
asked to display captions').


I don't quite understand how this would work. Closed captioning  
availability seems more like an intrinsic feature of the video file  
and the preference to have captions rendered seems like a boolean  
pref--not style.



Should we (Apple) edit this into the Wiki,


Please do. The wiki is open for editing.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-09 Thread Maik Merten
Benjamin Hawkes-Lewis schrieb:
 I'm a bit confused about why W3C's Timed Text Candidate Recommendation
 hasn't been mentioned in this thread, especially given that Flash
 objects are the VIDEO element's biggest competitor and Flash CS3's
 closed captioning component supports Timed Text. I haven't used it
 myself: is there some hideous disadvantage of Timed Text that makes it
 fundamentally flawed? It appears to be designed for use both with
 subtitles and captions.
 
 Here's the link for the CR:
 
 http://www.w3.org/TR/2006/CR-ttaf1-dfxp-20061116/

Actually I wonder if it wouldn't make sense to have an attribute for
media elements specifying a URI for a file containing Timed Text. These
externally stored (not embedded in a media file) captions would be
codec-agnostic and could be used to reuse the very same set of captions
for e.g. differently encoded media (Ogg, MPEG,
Generic-Codec-Of-The-Season, ...).
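(A minimal sketch of such markup, with a hypothetical captions attribute;
nothing like it is in the current draft:)

  <!-- one external Timed Text file, reused across differently
       encoded copies of the same video -->
  <video controls captions="newsreel.dfxp.xml">
    <source src="newsreel.ogv" type="application/ogg">
    <source src="newsreel.mp4" type="video/mp4">
  </video>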


As a side note I like the idea of captions which are more than just the
usual stream text. Imagine a newsreel with timed "Would you like to know
more?" links. Given that HTML5 is usually viewed in browsers that
implement at least a non-empty subset of HTML I imagine it should be
possible for the browser to layer something div-equivalent over the
media elements supporting captioning and pipe the HTML captions into it
(with caution, imagine a caption itself recursively embedding a video).
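(A rough page-level sketch of that overlay idea; the cue list is hand-rolled
for illustration, and a real implementation would live in the UA rather than
in page script:)

  <div style="position: relative; width: 320px">
    <video id="v" src="newsreel.ogv" width="320" height="240"></video>
    <div id="cc" style="position: absolute; bottom: 0; width: 100%;
         text-align: center; color: white"></div>
  </div>
  <script>
    var cues = [
      { start: 2.0, end: 5.0, html: 'In other news...' },
      { start: 5.5, end: 9.0,
        html: '<a href="more.html">Would you like to know more?</a>' }
    ];
    // Poll the playback position and show whichever cue is active;
    // note the caution above about piping untrusted HTML in here.
    setInterval(function () {
      var t = document.getElementById('v').currentTime, html = '';
      for (var i = 0; i < cues.length; i++)
        if (t >= cues[i].start && t < cues[i].end) html = cues[i].html;
      document.getElementById('cc').innerHTML = html;
    }, 200);
  </script>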


Maik Merten


Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-09 Thread Dave Singer

At 10:03  +0300 9/10/07, Henri Sivonen wrote:

On Oct 8, 2007, at 22:52, Benjamin Hawkes-Lewis wrote:

I'm a bit confused about why W3C's Timed Text Candidate 
Recommendation hasn't been mentioned in this thread, especially 
given that Flash objects are the VIDEO element's biggest 
competitor and Flash CS3's closed captioning component supports 
Timed Text. I haven't used it myself: is there some hideous 
disadvantage of Timed Text that makes it fundamentally flawed? It 
appears to be designed for use both with subtitles and captions.


Here's the link for the CR:

http://www.w3.org/TR/2006/CR-ttaf1-dfxp-20061116/


My understanding is that the purpose of this thread isn't to find a 
captioning spec for HTML 5 but to find the right way to do closed 
captions in Ogg.


Oh.  I was under the impression that this thread was about the right 
way to request and get captions in HTML/Web.  How the Ogg community 
designs intrinsic caption support is up to them, isn't it?



--
David Singer
Apple/QuickTime


Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-09 Thread Anne van Kesteren

On Tue, 09 Oct 2007 18:03:41 +0200, Maik Merten [EMAIL PROTECTED] wrote:

http://www.w3.org/TR/2006/CR-ttaf1-dfxp-20061116/


Actually I wonder if it wouldn't make sense to have an attribute for
media elements specifying a URI for a file containing Timed Text. These
externally stored (not embedded in a media file) captions would be
codec-agnostic and could be used to reuse the very same set of captions
for e.g. differently encoded media (Ogg, MPEG,
Generic-Codec-Of-The-Season, ...).


This would be problematic when downloading the video for offline use or  
further distribution. This is also different from how this currently works  
for DVDs, iPod, and the like as far as I can tell. It also makes authoring  
more complicated in the cases where someone hands a video to you as you'd  
have to separate the closed caption stream from it first and point to it  
as a separate resource.




As a side note I like the idea of captions which are more than just the
usual stream text. Imagine a newsreel with timed "Would you like to know
more?" links. Given that HTML5 is usually viewed in browsers that
implement at least a non-empty subset of HTML I imagine it should be
possible for the browser to layer something div-equivalent over the
media elements supporting captioning and pipe the HTML captions into it
(with caution, imagine a caption itself recursively embedding a video).


I think the cue points feature is designed to do that.


--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/


Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-09 Thread Henri Sivonen

On Oct 9, 2007, at 19:24, Dave Singer wrote:

At 10:03  +0300 9/10/07, Henri Sivonen wrote:
My understanding is that the purpose of this thread isn't to find  
a captioning spec for HTML 5 but to find the right way to do  
closed captions in Ogg.


Oh.  I was under the impression that this thread was about the  
right way to request and get captions in HTML/Web.


Yes, that also, but specifying the requesting part doesn't really  
help if there isn't advice to implementors on how to respond to the  
request.


How the Ogg community designs intrinsic caption support is up to  
them, isn't it?


In theory ideally yes.

However, when HTML 5 says "User agents should support Ogg Theora  
video and Ogg Vorbis audio, as well as the Ogg container format." and  
"User agents should provide controls to enable or disable the display  
of closed captions associated with the video stream, though such  
features should, again, not interfere with the page's normal  
rendering." it becomes a WHATWG issue to elicit a way to satisfy both  
"should" requirements at the same time if implementors don't  
otherwise have sufficient guidance on how to implement closed  
captioning support for Ogg interoperably.


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-09 Thread Dave Singer

At 9:22  +0300 9/10/07, Henri Sivonen wrote:

On Oct 8, 2007, at 22:12, Dave Singer wrote:


At 12:22  +0300 8/10/07, Henri Sivonen wrote:


Could someone who knows more about the production of audio 
descriptions, please, comment if audio description can in practice 
be implemented as a supplementary sound track that plays 
concurrently with the main sound track (in that case Speex would 
be appropriate) or whether the main sound must be manually mixed 
differently when description is present?


Sometimes;  but sometimes, for example:
* background music needs to be reduced
* other audio material needs to be 'moved' to make room for audio description


In that case, an entire alternative soundtrack encoded using a 
general-purpose codec would be called for. Is it reasonable to 
expect content providers to take the bandwidth hit? Or should we 
expect content providers to provide an entire alternative video file?


If the delivery is streaming, or in some other way where the 
selection of tracks can be done prior to transport, then there isn't 
a bandwidth hit at all, of course.  Then "ask this resource to 
present itself in the captioned fashion" is a reasonable way to do 
this.


Alternatively, as you say, one might prefer a whole separate file: 
"select this file if captions are desired."


Our proposal covers both cases, as both have valid uses.



When the problem is framed this way, the language of the text track 
doesn't need to be specified at all. In case #1 it is "same as 
audio". In case #2 it is "same as context site". This makes the 
text track selection mechanism super-simple.


Yes, it can often fall through to the "what content did you select 
based on language" and then the question of either selecting or 
styling content for accessibility can follow the language.


I don't understand that comment. My point was that the two most 
obvious cases don't require a language preference-based selection 
mechanism at all.


I am trying clumsily to agree with you. Content selection based on 
language, and then choice of any assistive needs (e.g. captions) can 
be orthogonal.





Personally, I'd be fine with a format with these features:
 * Metadata flag that tells if the text track is captioning for 
the deaf or translation subtitles.


I don't think we can or should 'climb inside' the content formats, 
merely have a standard way to ask them to do things (e.g. turn on 
captions).


I agree. However, in order for the HTML 5 spec to be able to 
reasonably and pragmatically tell browsers to ask the video 
subsystem to perform tasks like "turn on captions", we need to check 
that the obviously foreseeable format families (Ogg in the case of 
Mozilla and, apparently, Opera and MPEG-4 in the case of Apple) are 
able to cater for such tasks. Moreover, browsers and content 
providers need to have a shared understanding of how to do this 
concretely.


Sure, agreed.  As this matures, we (Apple) will be looking at what it 
takes for the movie file format, and I'll raise the same questions 
about MP4 and 3GP.



--
David Singer
Apple/QuickTime


Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-09 Thread Ivo Emanuel Gonçalves
On 10/9/07, Dave Singer [EMAIL PROTECTED] wrote:
 If the delivery is streaming, or in some other way where the
 selection of tracks can be done prior to transport, then there isn't
 a bandwidth hit at all, of course.  Then "ask this resource to
 present itself in the captioned fashion" is a reasonable way to do
 this.

 Alternatively, as you say, one might prefer a whole separate file:
 "select this file if captions are desired."

The way I see it, the browser is working like a video player.  Modern
video players allow users to configure whether they would like to see
the first subtitles track by default or not.  And if the user wishes to
turn subtitles on, off, or switch to another subtitles track (e.g.
another language) s/he right-clicks the video screen and modifies the
subtitles options.  Not elegant, but it works.

-Ivo


Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-09 Thread Dave Singer

At 0:25  +0100 10/10/07, Ivo Emanuel Gonçalves wrote:

On 10/9/07, Dave Singer [EMAIL PROTECTED] wrote:

 If the delivery is streaming, or in some other way where the
 selection of tracks can be done prior to transport, then there isn't
 a bandwidth hit at all, of course.  Then "ask this resource to
 present itself in the captioned fashion" is a reasonable way to do
 this.

 Alternatively, as you say, one might prefer a whole separate file:
 "select this file if captions are desired."


The way I see it, the browser is working like a video player.  Modern
video players allow users to configure whether they would like to see
the first subtitles track by default or not.  And if the user wishes to
turn subtitles on, off, or switch to another subtitles track (e.g.
another language) s/he right-clicks the video screen and modifies the
subtitles options.  Not elegant, but it works.


Yes, I wish it were this simple, but unfortunately, this doesn't cut
it, in two respects.  (a) Users needing accessibility go crazy if
they have to turn it on, resource by resource, by hand.  (b) Users
needing some kinds of accessibility (e.g. visual assistance) have
trouble with things like "right-click and choose a menu".


I don't think it's unreasonable to expect to use persistent
preferences, if the spec. stays out of the field of trying to guess
what all the axes (possibilities) are.  We've previously talked
about

captions
high-contrast video
audio description of video
high-contrast (clarity) audio

and then the iPlayer comes along and has 'sign language' as another
axis, which confirms that we can't think of all the axes up front.

--
David Singer
Apple/QuickTime


Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-08 Thread Charles McCathieNevile
On Mon, 08 Oct 2007 02:14:05 +0200, Silvia Pfeiffer  
[EMAIL PROTECTED] wrote:



Hi Chris,

this is a very good discussion to have and I would be curious about
the opinions of people.


An alternative is to use SVG as a container format. You can include  
captions in various forms, provide controls to swap between them, and even  
provide metadata (using some common accessibility vocabulary) to describe  
the different available tracks, and you can convert common timed text  
formats relatively simply. For implementors who already have SVG this is  
possibly a good option.
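(A minimal sketch of the SVG approach, assuming a profile where SVG can
reference video, as SVG Tiny 1.2 proposed; the caption uses SVG's built-in
SMIL timing to show itself from 2s to 5s:)

  <svg xmlns="http://www.w3.org/2000/svg"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       width="320" height="260">
    <video xlink:href="clip.ogv" x="0" y="0" width="320" height="240"/>
    <text x="160" y="255" text-anchor="middle" display="none">
      Hello, world.
      <!-- make the caption visible for its cue's duration -->
      <set attributeName="display" to="inline" begin="2s" dur="3s"/>
    </text>
  </svg>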


Loading HTML itself with everything seems like overkill to me. The case  
where you have fallback content means you can deal with some semi-capable  
format that doesn't allow a full range of accessibility options in a  
single resource...


[snip]

I think we need to understand exactly what we expect from the caption
tracks before being able to suggest an optimal solution.


Agree. I'm more likely to be involved if the discussion takes place on the  
W3C mailing list.



On 10/8/07, Chris Double [EMAIL PROTECTED] wrote:

The video element description states that Theora, Vorbis and the Ogg
container should be supported. How should closed captions and audio
description tracks for accessibility be supported using video and
these formats?


cheers

Chaals

--
Charles McCathieNevile  Opera Software, Standards Group
je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals  Try the Kestrel - Opera 9.5 alpha


Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-08 Thread Henri Sivonen

(Heavy quote snipping. Picking on particular points.)

On Oct 8, 2007, at 03:14, Silvia Pfeiffer wrote:

This is both more generic than captions, and less generic in that  
captions have formatting and are displayed in a particular way.


I think we should avoid overdoing captioning or subtitling by  
engineering excessive formatting. If we consider how subtitling works  
with legacy channels (TV and movie theaters), the text is always in  
the same sans-serif font with white fill and black outline located at  
the bottom of the video frame (optionally located at the top when  
there's relevant native text at the bottom and optionally italicized).


To get feature parity with the legacy that is good enough, the only  
formatting option you need is putting the text at the top of the  
video frame as opposed to the bottom and optionally italicizing text  
runs.


(It follows that I think the idea of using SVG for captioning or  
subtitles is excessive.)


I wouldn't mind an upgrade path that allowed CSS font properties for  
captioning and subtitles, but I think we shouldn't let formatting  
hold back the first iteration.



(colours, alignment etc. - the things that the EBU
subtitling standard http://www.limeboy.com/support.php?kbID=12 is
providing).


The EBU format seems severely legacy from the Unicode point of view. :-(


Another option would be to disregard CMML completely and invent a new
timed text logical bitstream for Ogg which would just have the
subtitles. This could use any existing timed text format and would just
require a bitstream mapping for Ogg, which should not be hard to do at
all.


Is 3GPP Timed Text aka. MPEG-4 part 17 unencumbered? (IANAL, this  
isn't an endorsement of the format--just a question.)


an alternate audio track (e.g. speex as suggested by you for  
accessibility to blind people),


My understanding is that at least conceptually an audio description  
track is *supplementary* to the normal sound track. Could someone who  
knows more about the production of audio descriptions, please,  
comment if audio description can in practice be implemented as a  
supplementary sound track that plays concurrently with the main sound  
track (in that case Speex would be appropriate) or whether the main  
sound must be manually mixed differently when description is present?



and several caption tracks (for different languages),


I think it needs emphasizing that captioning (for the deaf) and  
translation subtitling (for people who can hear but who can't follow  
the language) are distinctly differently in terms of the metadata  
flagging needs and the playback defaults. Moreover, although  
translations for multiple languages are nice to have, they complicate  
UI and metadata considerably and packaging multiple translations in  
one file is outside the scope of HTML5 as far as the current Design  
Principles draft (from the W3C side) goes.


I think we should first focus on two kinds of qualitatively different  
timed text (differing in metadata and playback defaults):

 1) Captions for the deaf:
  * Written in the same language as the speech content of the video.

  * May have speaker identification text.
  * May indicate other relevant sounds textually.
  * Don't indicate text that can be seen in the video frame.
  * Not rendered by default.
  * Enabled by a browser-wide "I am deaf" or "my device doesn't do  
sound out" pref.

 2) Subtitles for the people who can't follow foreign-language speech:
  * Written in the language of the site that embeds video when  
there's speech in another language.

  * Don't identify the speaker.
  * Don't identify sounds.
  * Translate relevant text visible in the video frame.
  * Rendered by default.
  * As a bonus suppressible via the context menu or something on a  
case-by-case basis.


When the problem is framed this way, the language of the text track  
doesn't need to be specified at all. In case #1 it is "same as  
audio". In case #2 it is "same as context site". This makes the text  
track selection mechanism super-simple.


Note that #2 isn't an accessibility feature, but addressing #2 right  
away avoids the abuse of the #1 feature, which is for accessibility.
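(A minimal script sketch of the selection rule this framing buys; the kind
values and the preference flag are assumptions for illustration:)

  // Captions for the deaf: off unless a browser-wide accessibility
  // pref is set.  Translation subtitles: on by default.
  function trackEnabledByDefault(kind, userPrefersCaptions) {
    if (kind == 'captions')  return userPrefersCaptions;
    if (kind == 'subtitles') return true;
    return false;
  }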



I think we need to understand exactly what we expect from the caption
tracks before being able to suggest an optimal solution. If e.g. we
want caption tracks with hyperlinks on a temporal basis and some more
metadata around that which is machine readable, then an extension of
CMML would make the most sense.


I would prefer Unicode data over bitmaps in order to allow captioning  
to be mined by search engines without OCR. In terms of defining the  
problem space and metadata modeling, I think we should aim for the  
two cases I outlined above instead of trying to cover more ground up  
front.


Personally, I'd be fine with a format with these features:
 * Metadata flag that tells if the text track is captioning for the  
deaf or translation subtitles.
 * Sequence of plain-text Unicode strings (incl. forced line breaks  
and bidi marks) with the following data:
   - Time code when the string appears.
   - Time code when the string disappears.
   - Flag for positioning the string at the top of the frame instead  
of bottom.
 * A way to do italics (or other emphasis for scripts for which  
italics is not applicable), but I think this feature isn't essential.
 * A guideline for estimating the amount of text appropriate to be  
shown at one time and a matching rendering guideline for UAs. (This  
guideline should result in an amount of text that agrees with  
current TV best practices.)
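(No such format exists today; purely as an illustration of the feature list
above, a caption file might look like this hypothetical XML:)

  <timedtext kind="captions"> <!-- the metadata flag: captions vs. subtitles -->
    <cue start="00:00:02.0" end="00:00:05.0">[door slams]</cue>
    <cue start="00:00:05.5" end="00:00:09.0" position="top">
      MARY: I told you <em>never</em> to come back.
    </cue>
  </timedtext>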

Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-08 Thread Dave Singer

At 9:45  +1200 8/10/07, Chris Double wrote:

The video element description states that Theora, Vorbis and the Ogg
container should be supported. How should closed captions and audio
description tracks for accessibility be supported using video and
these formats?

I was pointed to a page outlining some previous discussion on the issue:

http://wiki.whatwg.org/wiki/Video_accessibility

Is there a way of identifying which track is the closed caption track,
which is the alternate audio track, etc? How are other implementors of
the video element handling this issue?

Is CMML for the closed captions viable? Or a speex track for the
alternate audio? Or using Ogg Skeleton in some way to get information
about the other tracks?



There was also a thread I started in June, which I can't find on the 
archives;  my initial email is below.


We suggested two ways to achieve captioning (a) by selection of 
element, at the HTML level ('if you need captions, use this 
resource') and (b) styling of elements at the HTML level ('this video 
can be asked to display captions').


Choice (a) means that it is possible, for example, to prepare 
alternative versions with 'burned in' accessibility (e.g. captions), 
and then explicit support for them is not needed in the format.


Choice (b) is more economical in media resources, and recognizes that 
'true captioning' is sometimes better (e.g. it might be delivered out 
on analog video as line 21 data).
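(Purely as illustration, the two choices might look like this in markup and
script terms; the media query value and the enableCaptions() call are
hypothetical names, not from any draft:)

  <!-- (a) selection of element: point to an alternative resource
       with burned-in captions -->
  <video>
    <source src="film-captioned.ogv" media="(captions: want)">
    <source src="film.ogv">
  </video>

  <!-- (b) asking one resource to present its own captions -->
  <video id="film" src="film.ogv"></video>
  <script>
    document.getElementById('film').enableCaptions(true); // hypothetical API
  </script>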


The previous thread faded away, but with the W3C meeting approaching, 
I'd like to get a sense of how we make progress in this area.  Should 
we (Apple) edit this into the Wiki, should we (Apple or WhatWG) carry 
the proposal to the W3C, and if so, which group?  And so on.


Thanks for re-raising this!

* * * * *

Date: Fri, 08 Jun 2007 16:22:00 -0700
From: Dave Singer [EMAIL PROTECTED]
Subject: [whatwg] accessibility management for timed media elements, proposal
Sender: [EMAIL PROTECTED]
To: WHATWG [EMAIL PROTECTED]
X-Original-To: whatwg@lists.whatwg.org
List-Post: mailto:whatwg@lists.whatwg.org
List-Subscribe: http://lists.whatwg.org/listinfo.cgi/whatwg-whatwg.org,
mailto:[EMAIL PROTECTED]
List-Unsubscribe: http://lists.whatwg.org/listinfo.cgi/whatwg-whatwg.org,
mailto:[EMAIL PROTECTED]
List-Archive: http://lists.whatwg.org/pipermail/whatwg-whatwg.org
List-Help: mailto:[EMAIL PROTECTED]
List-Id: Public mailing list for the WHAT working group whatwg-whatwg.org

Hi

we promised to get back to the whatwg with a proposal for a way to 
handle accessibility for timed media, and here it is.  sorry it took 
a while...


* * * * *


To allow the UA to select among alternative sources for media 
elements based on users' accessibility preferences, we propose to:


1) Expose accessibility preferences to users
2) Allow the UA to evaluate the suitability of content for specific 
accessibility needs via CSS media queries



Details:

1) Expose accessibility preferences to users

Proposal: user settings that correspond to accessibility needs. For 
each need, the user can choose among the following three dispositions:


  * favor (want): I prefer media that is adapted for this kind of 
accessibility.
  * disfavor (don't want): I prefer media that is not adapted for 
this kind of accessibility.
  * disinterest (don't care): I have no preference regarding this 
kind of accessibility.


The initial set of user preferences for consideration in the 
selection of alternative media resources correspond to the following 
accessibility options:


  captions (corresponds to SMIL systemCaptions)
  descriptive audio (corresponds to SMIL systemAudioDesc)
  high contrast video
  high contrast audio (audio with minimal background noise, music 
etc., so speech is maximally intelligible)


This list is not intended to be exhaustive; additional accessibility 
options and corresponding preferences may be considered for inclusion 
in the future.


Herein we describe only those user preferences that are useful in the 
process of evaluating multiple alternative media resources for 
suitability. Note that these proposed preferences are not intended to 
exclude or supplant user preferences that may be offered by the UA to 
provide accessibility options according to the W3C accessibility 
guidelines, such as a global volume control 
http://www.w3.org/TR/WAI-USERAGENT/uaag10-chktable.html.



2) Allow the UA to evaluate the suitability of content for specific 
accessibility needs via CSS media queries


Note that the current specification of video and audio includes a 
mechanism for selection among multiple alternate resources 
http://www.whatwg.org/specs/web-apps/current-work/#location. The 
scope of our proposal here is to extend that mechanism to cover 
accessibility options.


Proposal: the media attribute of the source element as described in 
the current working draft of Web Applications 1.0 takes a CSS media 
query as its value http://www.w3.org/TR/css3-mediaqueries/, which 
the UA will evaluate 
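(A sketch of resource selection under this proposal; the two accessibility
media feature names below are invented for illustration, not taken from the
proposal text:)

  <video controls>
    <!-- chosen when the user's captions preference is "favor" -->
    <source src="movie-captioned.ogv" media="all and (captions)">
    <!-- chosen when the user's descriptive-audio preference is "favor" -->
    <source src="movie-described.ogv" media="all and (descriptive-audio)">
    <!-- default -->
    <source src="movie.ogv">
  </video>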

Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-08 Thread Dave Singer

At 8:58  +0200 8/10/07, Charles McCathieNevile wrote:
On Mon, 08 Oct 2007 02:14:05 +0200, Silvia 
Pfeiffer [EMAIL PROTECTED] wrote:



Hi Chris,

this is a very good discussion to have and I would be curious about
the opinions of people.


An alternative is to use SVG as a container format. You can include
captions in various forms, provide controls to swap between them, and
even provide metadata (using some common accessibility vocabulary) to
describe the different available tracks, and you can convert common
timed text formats relatively simply. For implementors who already
have SVG this is possibly a good option.


Loading HTML itself with everything seems like overkill to me. The
case where you have fallback content means you can deal with some
semi-capable format that doesn't allow a full range of accessibility
options in a single resource...


[snip]

I think we need to understand exactly what we expect from the caption
tracks before being able to suggest an optimal solution.


Agree. I'm more likely to be involved if the discussion takes place
on the W3C mailing list.


which one would you like?  html, wcag, timed text, or ?




On 10/8/07, Chris Double [EMAIL PROTECTED] wrote:

The video element description states that Theora, Vorbis and the Ogg
container should be supported. How should closed captions and audio
description tracks for accessibility be supported using video and
these formats?


cheers

Chaals

--
Charles McCathieNevile  Opera Software, Standards Group
je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals  Try the Kestrel - Opera 9.5 alpha



--
David Singer
Apple/QuickTime


Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-08 Thread Dave Singer

At 12:22  +0300 8/10/07, Henri Sivonen wrote:


Is 3GPP Timed Text aka. MPEG-4 part 17 unencumbered? (IANAL, this 
isn't an endorsement of the format--just a question.)


I am not authoritative, but I have not seen any disclosures myself.

an alternate audio track (e.g. speex as suggested by you for 
accessibility to blind people),


My understanding is that at least conceptually an audio description 
track is *supplementary* to the normal sound track. Could someone 
who knows more about the production of audio descriptions, please, 
comment if audio description can in practice be implemented as a 
supplementary sound track that plays concurrently with the main 
sound track (in that case Speex would be appropriate) or whether the 
main sound must be manually mixed differently when description is 
present?


Sometimes;  but sometimes, for example:
* background music needs to be reduced
* other audio material needs to be 'moved' to make room for audio description





and several caption tracks (for different languages),


I think it needs emphasizing that captioning (for the deaf) and 
translation subtitling (for people who can hear but who can't follow 
the language) are distinctly different in terms of the metadata 
flagging needs and the playback defaults. Moreover, although 
translations for multiple languages are nice to have, they 
complicate UI and metadata considerably and packaging multiple 
translations in one file is outside the scope of HTML5 as far as the 
current Design Principles draft (from the W3C side) goes.


I think we should first focus on two kinds of qualitatively 
different timed text (differing in metadata and playback defaults):

 1) Captions for the deaf:
  * Written in the same language as the speech content of the video.
  * May have speaker identification text.
  * May indicate other relevant sounds textually.
  * Don't indicate text that can be seen in the video frame.
  * Not rendered by default.
  * Enabled by a browser-wide "I am deaf" or "my device doesn't do 
sound out" pref.

 2) Subtitles for the people who can't follow foreign-language speech:
  * Written in the language of the site that embeds video when 
there's speech in another language.

  * Don't identify the speaker.
  * Don't identify sounds.
  * Translate relevant text visible in the video frame.
  * Rendered by default.
  * As a bonus suppressible via the context menu or something on a 
case-by-case basis.


When the problem is framed this way, the language of the text track 
doesn't need to be specified at all. In case #1 it is "same as 
audio". In case #2 it is "same as context site". This makes the text 
track selection mechanism super-simple.


Yes, it can often fall through to the "what content did you select 
based on language" and then the question of either selecting or 
styling content for accessibility can follow the language.




Personally, I'd be fine with a format with these features:
 * Metadata flag that tells if the text track is captioning for the 
deaf or translation subtitles.


I don't think we can or should 'climb inside' the content formats, 
merely have a standard way to ask them to do things (e.g. turn on 
captions).


 * Sequence of plain-text Unicode strings (incl. forced line breaks 
and bidi marks) with the following data:

   - Time code when the string appears.
   - Time code when the string disappears.
   - Flag for positioning the string at the top of the frame instead 
of bottom.
 * A way to do italics (or other emphasis for scripts for which 
italics is not applicable), but I think this feature isn't essential.
 * A guideline for estimating the amount of text appropriate to be 
shown at one time and a matching rendering guideline for UAs. (This 
guideline should result in an amount of text that agrees with 
current TV best practices.)


This should all be out of scope, IMHO;  this is about the design of a 
captioning system, which I don't think we should try to do.




It would be up to the UA to render the text at the bottom of the 
video frame in white sans-serif with black outline.


Or wherever it's supposed to go.



I think it would be inappropriate to put hyperlinks in captioning 
for the deaf because it would venture outside the space of 
accessibility and effectively hide some links for the non-deaf 
audience.


Yes, generally true!


--
David Singer
Apple/QuickTime


Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-08 Thread Benjamin Hawkes-Lewis

Dave Singer wrote:

an alternate audio track (e.g. speex as suggested by you for 
accessibility to blind people),


My understanding is that at least conceptually an audio description 
track is *supplementary* to the normal sound track. Could someone who 
knows more about the production of audio descriptions, please, comment 
if audio description can in practice be implemented as a supplementary 
sound track that plays concurrently with the main sound track (in that 
case Speex would be appropriate) or whether the main sound must be 
manually mixed differently when description is present?


Sometimes;  but sometimes, for example:
* background music needs to be reduced
* other audio material needs to be 'moved' to make room for audio 
description


The relationship between audio description and the main sound appears to 
be a non-simple one. See:


http://joeclark.org/access/description/ad-principles.html

I think we should first focus on two kinds of qualitatively different 
timed text (differing in metadata and playback defaults):

 1) Captions for the deaf:
  * Written in the same language as the speech content of the video.

  * May have speaker identification text.
  * May indicate other relevant sounds textually.
  * Don't indicate text that can be seen in the video frame.
  * Not rendered by default.
  * Enabled by a browser-wide "I am deaf" or "my device doesn't do 
sound out" pref.


It should also, I think, be available on a case-by-case basis. The 
information is potentially useful for everyone, e.g. if a background 
sound or a particular speaker is indistinct to your ears. I don't think 
closed captioning functionality is best buried in an obscure browser 
configuration setting.



 2) Subtitles for the people who can't follow foreign-language speech:
  * Written in the language of the site that embeds video when there's 
speech in another language.

  * Don't identify the speaker.
  * Don't identify sounds.
  * Translate relevant text visible in the video frame.
  * Rendered by default.
  * As a bonus suppressible via the context menu or something on a 
case-by-case basis.


Just to add another complication to the mix, we shouldn't forget the 
need to provide for sign language interpretation. The BBC's iPlayer 
features sign interpretation, FWIW:


http://www.bbc.co.uk/blogs/access20/2007/08/bsl_comes_to_the_iplayer_1.shtml

This should all be out of scope, IMHO;  this is about the design of a 
captioning system, which I don't think we should try to do.


I'm a bit confused about why W3C's Timed Text Candidate Recommendation 
hasn't been mentioned in this thread, especially given that Flash 
objects are the VIDEO element's biggest competitor and Flash CS3's 
closed captioning component supports Timed Text. I haven't used it 
myself: is there some hideous disadvantage of Timed Text that makes it 
fundamentally flawed? It appears to be designed for use both with 
subtitles and captions.


Here's the link for the CR:

http://www.w3.org/TR/2006/CR-ttaf1-dfxp-20061116/

--
Benjamin Hawkes-Lewis


[whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-07 Thread Chris Double
The video element description states that Theora, Vorbis and the Ogg
container should be supported. How should closed captions and audio
description tracks for accessibility be supported using video and
these formats?

I was pointed to a page outlining some previous discussion on the issue:

http://wiki.whatwg.org/wiki/Video_accessibility

Is there a way of identifying which track is the closed caption track,
which is the alternate audio track, etc? How are other implementors of
the video element handling this issue?

Is CMML for the closed captions viable? Or a speex track for the
alternate audio? Or using Ogg Skeleton in some way to get information
about the other tracks?

Chris
-- 
http://www.bluishcoder.co.nz


Re: [whatwg] Video, Closed Captions, and Audio Description Tracks

2007-10-07 Thread Silvia Pfeiffer
Hi Chris,

this is a very good discussion to have and I would be curious about
the opinions of people.

CMML has been developed with an aim to provide html-type timed text
annotations for audio/video - in particular hyperlinks and annotations
to temporal sections of videos. This is both more generic than
captions, and less generic in that captions have formatting and are
displayed in a particular way.

One option is to extend CMML to provide the caption functionality
inside CMML. This would not be difficult and in fact, the current
desc tag is already being used for such functionality in xine. It is
however suboptimal since it mixes aims. A better way would be to
invent a caption tag for CMML which would have some formatting
functionality (colours, alignment etc. - the things that the EBU
subtitling standard http://www.limeboy.com/support.php?kbID=12 is
providing).

Another option would be to disregard CMML completely and invent a new
timed text logical bitstream for Ogg which would just have the
subtitles. This could use any existing timed text format and would just
require a bitstream mapping for Ogg, which should not be hard to do at
all.

Now for Ogg Skeleton: Ogg Skeleton will indeed have a part to play in
this, however not directly for specification of the timed text
annotations. Ogg Skeleton is a track that describes what is inside the
Ogg file. So, assuming we would have a multitrack video file with a
video track, an audio track, an alternate audio track (e.g. speex as
suggested by you for accessibility to blind people), a CMML track (for
hyperlinking into and out of the video), and several caption tracks
(for different languages), then Ogg Skeleton would explain exactly
that these exist without the need for a program to decode the Ogg file
fully.

I think we need to understand exactly what we expect from the caption
tracks before being able to suggest an optimal solution. If e.g. we
want caption tracks with hyperlinks on a temporal basis and some more
metadata around that which is machine readable, then an extension of
CMML would make the most sense.

Regards,
Silvia.


On 10/8/07, Chris Double [EMAIL PROTECTED] wrote:
 The video element description states that Theora, Vorbis and the Ogg
 container should be supported. How should closed captions and audio
 description tracks for accessibility be supported using video and
 these formats?

 I was pointed to a page outlining some previous discussion on the issue:

 http://wiki.whatwg.org/wiki/Video_accessibility

 Is there a way of identifying which track is the closed caption track,
 which is the alternate audio track, etc? How are other implementors of
 the video element handling this issue?

 Is CMML for the closed captions viable? Or a speex track for the
 alternate audio? Or using Ogg Skeleton in some way to get information
 about the other tracks?

 Chris
 --
 http://www.bluishcoder.co.nz