Re: [whatwg] Thoughts on video accessibility

2009-07-16 Thread Philip Jägenstedt
On Thu, 16 Jul 2009 07:58:30 +0200, Silvia Pfeiffer  
silviapfeiff...@gmail.com wrote:



Hi Ian,

Great to see the new efforts to move the subtitle/caption/karaoke
issues forward!

I actually have a contract with Mozilla starting this month to help
solve this, so I am more than grateful that you have proposed some
ideas in this space.

On Thu, Jul 16, 2009 at 9:38 AM, Ian Hickson i...@hixie.ch wrote:

On Sat, 27 Dec 2008, Silvia Pfeiffer wrote:

 1. Timed text in the resource itself (or linked from the resource
 itself), rendered as part of the video automatically by the user
 agent.

For case 1, the practical implications are that browser vendors will
have to develop support for a large variety of text codecs, each one
providing different functionalities.


I would hope that as with a video codec, we can standardise on a single
subtitle format, ideally some simple media-independent combination of SRT
and LRC [1]. It's difficult to solve this problem without a standard
codec, though.


I have myself thought about creating a new format to address the needs
for time-aligned text in audio/video.

However, the problem with creating a new format is that you start from
scratch and already widespread formats are not supported.

I can see that your proposed format is trying to be backwards
compatible with SRT, so at least it would work for the large number of
existing srt file collections. I am still skeptical, in particular
because there are no authoring systems for this format around.
But I would be curious what others think about your proposed SRT-LRC-mix.


There are already more formats than you could possibly want on the scale
between SRT (dumb text) and complex XML formats like DFXP or USF (used in
Matroska). In my layman's opinion both extremes make sense, but I'm rather
skeptical of anything in between.
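
(For reference, a single SRT cue and a couple of LRC lines look roughly
like this -- illustrative sketches only, not taken from any real file:

1
00:00:12,500 --> 00:00:15,000
This is the first subtitle cue.

[00:12.50]This is a lyrics line in LRC
[00:15.00]and this is the following one

SRT numbers each cue, gives a start and end time, then the text; LRC
simply timestamps each line, which is what makes it attractive for
karaoke.)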



In fact, the easiest solution would be if that particular format was
really only HTML.


IMHO that would be absurd. HTML means scripting, embedded videos, an
unbelievably complex rendering system, complex parsing, etc; plus, what's
more, it doesn't even support timing yet, so we'd have to add all the
timing and karaoke features on top of it. Requiring that video players
embed a timed HTML renderer just to render subtitles is like saying that
we should ship Microsoft Word with every DVD player, to handle the user
input when the user wants to type in a new chapter number to jump to.


I agree, it cannot be a format that contains all the complexity of
HTML. It would only support the subset of HTML that is relevant, plus
the addition of timing - and in that case it is indeed a new format. I
have therefore changed my mind since I sent that email in Dec 08 and
am hoping we can do it with existing formats.


I think that eventually we will want timing/synchronization in HTML for
synchronizing multiple video or audio tracks. As far as I can tell no
browser wants to implement the addCueRange API (removing this should be
the topic of a separate mail), so we really need to re-think this part and
I think that timed text plays an important role here.



In particular, I have taken an in-depth look at the latest
specification from the Timed Text working group, which has put years of
experimentation and decades of experience into developing DFXP. You can
see my review of DFXP here:
http://blog.gingertech.net/2009/06/28/a-review-of-the-w3c-timed-text-authoring-format/
. I think it is too flexible in a lot of ways, but also too
restrictive in others. However, it is a well formulated format that is
also getting market traction. In addition, it is possible to formulate
profiles to add missing functionality.

If we want a quick and dirty hack, srt itself is probably the best
solution. If we want a well thought-out solution, DFXP is probably a
better idea.
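
(For those who haven't looked at DFXP, a minimal document is shaped
roughly like this -- a sketch from memory, so the namespace and
attribute details should be checked against the spec:

<tt xmlns="http://www.w3.org/2006/10/ttaf1"
    xmlns:tts="http://www.w3.org/2006/10/ttaf1#styling" xml:lang="en">
  <body>
    <div>
      <p begin="00:00:12.50" end="00:00:15.00" tts:textAlign="center">
        This is the first caption.
      </p>
    </div>
  </body>
</tt>

The timing and text live on the p elements, while styling and layout
come from separate tts: attributes, styles and regions - which is where
most of the flexibility, and most of the complexity, sits.)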

I am currently experimenting with these and will be able to share
something soon for further discussion.



 3. Timed text stored in a separate file, which is then parsed by the
 user agent and rendered as part of the video automatically by the
 browser.

Maybe we should consider solving this differently. Either we could
encapsulate into the video container upon download. Or we could create a
zip-file or tarball upon download. I'd just find it a big mistake to
ignore the majority use case in the standard, which is why I proposed
the text elements inside the video tag.


If browser vendors are willing to merge subtitles and video files when
saving them, that would be great. Is this easy to do?


My suggestion was really about doing this server-side, which we have
already implemented years ago in the Annodex project for Ogg
Theora/Vorbis.

However, it is also possible to do this in the browser: in the case of
Ogg, the browser just needs to have a multiplexing library installed
as well as a means to encode the subtitle file (which I like to call a
text codec). Since it's text, it's nowhere near as complex as
encoding audio or video and just consists of 

Re: [whatwg] Thoughts on video accessibility

2009-07-16 Thread David Singer

Thanks for the analysis, but two pieces of feedback:

1) Though sub-titles and captions are the most common accessibility 
issue for audio/video content, they are not the only one.  There are 
people:

 -- who cannot see, and need audio description of video
 -- who cannot hear, and prefer sign language
 -- who have vision issues and prefer high or low contrast video
 -- who have audio issues and prefer audio that lacks background 
music, noise, etc.
This is only a partial list.  Note that some content is only 
available with open captions (aka burned-in).  Clearly sub-optimal, 
but better than nothing.


2) I think the environment can and should help select and configure 
type-1 resources, where it can.  It shouldn't always need to be a 
manual step by the user interacting with the media player.  That is, 
I don't see why we cannot have the markup express "this source is 
better for people who have accessibility need X" (probably as a media 
query).  However, media queries are CSS, not HTML...

--
David Singer
Multimedia Standards, Apple Inc.


Re: [whatwg] Thoughts on video accessibility

2009-07-16 Thread Silvia Pfeiffer
On Thu, Jul 16, 2009 at 10:31 PM, David Singer sin...@apple.com wrote:
 Thanks for the analysis, but two pieces of feedback:

 1) Though sub-titles and captions are the most common accessibility issue
 for audio/video content, they are not the only one.  There are people:
  -- who cannot see, and need audio description of video
  -- who cannot hear, and prefer sign language
  -- who have vision issues and prefer high or low contrast video
  -- who have audio issues and prefer audio that lacks background music,
 noise, etc.
 This is only a partial list.  Note that some content is only available with
 open captions (aka burned-in).  Clearly sub-optimal, but better than
 nothing.

Agreed. Plus there is time-aligned textual markup that is not just
subtitles, captions, lyrics and karaoke: there is much talk about
timed metadata these days, and about clickable regions, as well as
spatial and temporal notes.

The lowest-hanging fruit for such time-aligned text is, however,
indeed subtitles and captions.


 2) I think the environment can and should help select and configure type-1
 resources, where it can.  It shouldn't need to be always a manual step by
 the user interacting with the media player.  That is, I don't see why we
 cannot have the markup express this source is better for people who have
 accessibility need X (probably as a media query).  However, media queries
 are CSS, not HTML...

Would you mind providing an example that demonstrates the use of media
queries? I cannot currently imagine what that could look like and how
it could work. Feel free to use CSS in addition to any required HTML
(and javascript?). Since I cannot imagine what that would look like
and how it could work, I cannot begin to assess it as an
alternative.

Thanks,
Silvia.


Re: [whatwg] Thoughts on video accessibility

2009-07-16 Thread Silvia Pfeiffer
On Thu, Jul 16, 2009 at 6:28 PM, Philip Jägenstedt phil...@opera.com wrote:
 On Thu, 16 Jul 2009 07:58:30 +0200, Silvia Pfeiffer
 silviapfeiff...@gmail.com wrote:

  3. Timed text stored in a separate file, which is then parsed by the
  user agent and rendered as part of the video automatically by the
  browser.
 
 Maybe we should consider solving this differently. Either we could
 encapsulate into the video container upon download. Or we could create a
 zip-file or tarball upon download. I'd just find it a big mistake to
 ignore the majority use case in the standard, which is why I proposed
 the text elements inside the video tag.

 If browser vendors are willing to merge subtitles and video files when
 saving them, that would be great. Is this easy to do?

 My suggestion was really about doing this server-side, which we have
 already implemented years ago in the Annodex project for Ogg
 Theora/Vorbis.

 However, it is also possible to do this in the browser: in the case of
 Ogg, the browser just needs to have a multiplexing library installed
 as well as a means to encode the subtitle file (which I like to call a
 text codec). Since it's text, it's nowhere near as complex as
 encoding audio or video and just consists of light-weight packaging
 code. So, yes, it is totally possible to have the browsers create a
 binary video file that has the subtitles encapsulated that were
 previously only accessible as referenced text files behind a separate
 URL.

 The only issue I see is the baseline codec issue: every browser that
 wants to support multiple media formats has to implement this
 multiplexing and text encoding for every media encapsulation format
 differently, which is annoying and increases complexity. It's however
 generally a small amount of complexity compared to the complexity
 created by having to support multiple codecs.

 I disagree, remuxing files would be much more of an implementation burden
 than supporting multiple codecs, at least if a format-agnostic media
 framework is used (be that internal or external to the browser). Remuxing
 would require you to support/create parts of the media framework that you
 otherwise aren't using, i.e. parsers, muxers, file writers and plugging of
 these together (which unlike decoding isn't automatic in any framework I've
 seen).

The point that I was trying to make is that if one had to only
implement it for one encapsulation format, it would be simple and a
small piece of dedicated code. However, if one has to be
format-agnostic, it indeed requires supporting parts of a media
framework that are not needed for demuxing and decoding. So, yes, I
agree with you: in the general case it might create extraneous
complexity in a browser.


 Anything is doable of course, but I think this is really something that is
 best done server-side using specialized tools.

I agree with this. This can be a special service that some servers
would offer who want to allow their users to share single video files
that contain their timed text within.

 It would be interesting to hear back from the browser vendors about how
 easily the subtitles could be kept with the video in a way that survives
 reuse in other contexts.

 I think that in the case of external subtitles the browser could simply save
 them alongside the video. It is my experience that media players have
 much more robust support for external subtitles (like SRT) than for internal
 subtitles, so this is my preferred option (plus it's easier).

Agreed: this would be the fallback for content downloaded from servers
that do not offer the special muxing capability.

In fact, such a separate handling of composed content through multiple
files is nothing new to HTML: all Web pages that I download from the
Internet require me to download each component of the Web page
separately: the images, the text, the css, the javascript. (Worse even
if the text is created e.g. through a database query.) I agree with
Philip that the separate handling of subtitle files and media files is
not as much of an issue as it may seem.

Regards,
Silvia.


Re: [whatwg] Thoughts on video accessibility

2009-07-16 Thread David Singer

At 23:28  +1000 16/07/09, Silvia Pfeiffer wrote:

  2) I think the environment can and should help select and configure type-1

 resources, where it can.  It shouldn't need to be always a manual step by
 the user interacting with the media player.  That is, I don't see why we
 cannot have the markup express this source is better for people who have
 accessibility need X (probably as a media query).  However, media queries
 are CSS, not HTML...


Would you mind providing an example that demonstrates the use of media
queries? I cannot currently imagine what that could look like and how
it could work. Feels free to use CSS in addition to any require HTML
(and javascript?). Since I cannot imagine what that would look like
and how it could work, I cannot start to understand it as an
alternative.


Sure, using a deliberately vague way of writing the media queries:

<video blah blah ... >
   <source src="xx-O.ers" media="want-captions" />
   <source src="xx-N.ers" media="not want-captions" />
</video>

xx-O has open (burned-in) captions, uses the same codecs etc.  It 
gets selected if the user says they want captions, otherwise xx-N (no 
captions) is selected.


<video blah blah ... >
   <source src="xx-S.ers" media="want-sign-language" />
   <source src="xx.ers" />
</video>

xx-S has a sign-language overlay capability.  It gets selected for 
those users expressing a positive preference for sign language; 
otherwise we don't waste the bandwidth loading that, and we load the 
plain xx file.  It may be that the media part of the UA also detects 
this user preference and automatically enables sign language in xx-S.



Basically, I think we should have a framework which attempts to 
select and configure the appropriate source, so we get it right most 
of the time by default.


This (accessibility) is a subject that covers multiple groups, of course...
--
David Singer
Multimedia Standards, Apple Inc.


Re: [whatwg] Thoughts on video accessibility

2009-07-16 Thread Silvia Pfeiffer
On Thu, Jul 16, 2009 at 11:56 PM, David Singer sin...@apple.com wrote:
 At 23:28  +1000 16/07/09, Silvia Pfeiffer wrote:

   2) I think the environment can and should help select and configure
 type-1

  resources, where it can.  It shouldn't need to be always a manual step
 by
  the user interacting with the media player.  That is, I don't see why we
  cannot have the markup express this source is better for people who
 have
  accessibility need X (probably as a media query).  However, media
 queries
  are CSS, not HTML...

 Would you mind providing an example that demonstrates the use of media
 queries? I cannot currently imagine what that could look like and how
 it could work. Feels free to use CSS in addition to any require HTML
 (and javascript?). Since I cannot imagine what that would look like
 and how it could work, I cannot start to understand it as an
 alternative.

 sure. using deliberately vague way of writing the media queries

 <video blah blah ... >
   <source src="xx-O.ers" media="want-captions" />
   <source src="xx-N.ers" media="not want-captions" />
 </video>

 xx-O has open (burned in captions), uses the same codecs etc.  It gets
 selected if the user says they want captions, otherwise XX-N (no captions)
 is selected.

 <video blah blah ... >
   <source src="xx-S.ers" media="want-sign-language" />
   <source src="xx.ers" />
 </video>

 xx-S has a sign-language overlay capability.  It gets selected for those
 users expressing a positive preference for sign language; otherwise we don't
 waste the bandwidth loading that, and we load the plain xx file.  It may be
 that the media part of the UA also detects this user preference and
 automatically enables sign language in xx-S.

I just noticed that the media attribute is already part of the
source element definition in HTML5. I wonder which browsers have
implemented this attribute.

After having looked at http://www.w3.org/TR/css3-mediaqueries/, my
understanding is that media queries specify the different presentation
media that an html page's stylesheets were built for, and thus allow
choosing between these stylesheets through the link element and its
media attribute, which is where the query goes. Also, IIUC, the list
of presentation media is currently restricted to ‘aural’, ‘braille’,
‘handheld’, ‘print’, ‘projection’, ‘screen’, ‘tty’, ‘tv’, and 'all',
and the queries cover only the features width, height, device-width,
device-height, orientation, aspect-ratio, device-aspect-ratio, color,
color-index, monochrome, resolution, scan, and grid.
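
(To make that concrete, the existing mechanism is typically used like
this - just an illustration with made-up file names:

<link rel="stylesheet" media="screen and (max-device-width: 480px)"
      href="small-screen.css">
<link rel="stylesheet" media="print" href="print.css">

The query decides which stylesheet applies on the current output
medium.)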

This is different for the source elements though: instead of
specifying different presentation media and choosing between
stylesheets, the media attribute specifies different user
requirements and chooses between video source files. This makes it
independent from CSS, IIUC.

Is the intention to extend the specification of media queries to
include generic means of selecting between alternative files to load
into an HTML page? Is there a W3C activity that actually extends the
media queries to audio and video files?

If this is the case, it could also be used for the associated text
elements that Ian and I discussed earlier in this thread. The
alternatives there would be based on a combination of languages and
the different categories of time-aligned text. The language would
choose between different text files to load, and the text category
would choose between different default styles to apply.
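
(Purely as a thought experiment - none of these query features exist
today, and the attribute names on text are the ones proposed earlier
in this thread - the markup might end up looking something like:

<video controls>
  <source src="video-signed.ogv" media="all and (want-sign-language)" />
  <source src="video.ogv" />
  <text category="CC" lang="en" src="captions_en.srt"
        media="all and (want-captions)" />
  <text category="SUB" lang="fr" src="subtitles_fr.srt"
        media="all and (want-captions)" />
</video>

where hypothetical features like want-captions would first have to be
added to the media queries specification.)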

I can imagine that that would work, but has anyone started extending
existing media query specifications for this yet?

Regards,
Silvia.


Re: [whatwg] Thoughts on video accessibility

2009-07-15 Thread Ian Hickson
On Sat, 27 Dec 2008, Calogero Alex Baldacchino wrote:
 
 A flying thought: why not thinking also to a further option for 
 embedding everything in a sort of all-in-one html page generated on 
 the fly when downloading, making of it a global container for video and 
 text to be consumed by UAs (while maintaining the opportunity to 
 download a video as a separate file, of course)? For instance, the video 
 itself might become the base64-encoded (or otherwise acceptably encoded) 
 value of a data-* attribute (or a more specific attribute) to be decoded 
 by a script (as well generated on the fly) and served to the video 
 engine as a javascript: url in place of the video src (or, perhaps 
 better, the UA might do that itself by supporting the data: protocol 
 as a valid source for the video, or a fragid pointing to an element 
 following the /video tag, perhaps a paintext or something else, and 
 containing the encoded video); while text elements might wrap the 
 corresponding timed text file, to be embedded into the page as bare 
 text, similarly to a script code -- if a certain format contained text 
 tag, those might be changed into lt;textgt; or similarly (or perhaps 
 the file content might be encoded as well) to avoid conflicts with html 
 tags.
 
 Of course, it's a first-glance idea, and needs further considerations 
 on its reliability (e.g. such an html page perhaps shouldn't be the 
 source set for a video in another page, and an option should be provided 
 to extract embedded content; seeking might require sequential decoding 
 to reach a desired point, and so on).

This idea seems out of scope for HTML5; it can already be done using 
features like multipart/related or data: URLs.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Thoughts on video accessibility

2009-07-15 Thread Ian Hickson
On Sat, 27 Dec 2008, Silvia Pfeiffer wrote:
  
  6. Timed text stored in a separate file, which is then fetched and 
  parsed by the Web page, and which is then rendered by the Web page.

 For case 6, while it works for deaf people, we actually create an 
 accessibility nightmare for blind people and their web developers. There 
 is no standard means for a screen reader to identify that a particular 
 part in the DOM is actually text related to the video and supposed to be 
 displayed with the video (through a screenreader or a braille reader). 

As far as I can tell, that's exactly what ARIA is for.


 Such functionality would need to be implemented through javascript by 
 every single site that wanted to provide audio annotations.

Right.


 It's also a nightmare for search engines, since there is no clear way of 
 identifying a specific text as video-related and use it as such to 
 extend knowledge about the video.

Embedding subtitles inside the video file is certainly the best option 
overall, for both accessibility and for automated analysis, yes.


  1. Timed text in the resource itself (or linked from the resource 
  itself), rendered as part of the video automatically by the user 
  agent.

 For case 1, the practical implications are that browser vendors will 
 have to develop support for a large variety of text codecs, each one 
 providing different functionalities.

I would hope that as with a video codec, we can standardise on a single 
subtitle format, ideally some simple media-independent combination of SRT 
and LRC [1]. It's difficult to solve this problem without a standard 
codec, though.


 In fact, the easiest solution would be if that particular format was 
 really only HTML.

IMHO that would be absurd. HTML means scripting, embedded videos, an 
unbelievably complex rendering system, complex parsing, etc; plus, what's 
more, it doesn't even support timing yet, so we'd have to add all the 
timing and karaoke features on top of it. Requiring that video players 
embed a timed HTML renderer just to render subtitles is like saying that 
we should ship Microsoft Word with every DVD player, to handle the user 
input when the user wants to type in a new chapter number to jump to.


 But strategically can we keep our options open towards using such a 
 format in HTML5?

As far as I can tell, HTML5 doesn't preclude any particular direction for 
subtitling.


 And now to option 3:
 
  3. Timed text stored in a separate file, which is then parsed by the 
  user agent and rendered as part of the video automatically by the 
  browser.
 
  This would make authoring subtitles somewhat easier, but would 
  typically lose the benefits of subtitles surviving when the video file 
  is extracted. It would also involve a distinct increase in 
  implementation and language complexity. We would also have to pick a 
  timed text format, or add yet another format war to the 
  video/audio codec debacle, which I think would be a really big 
  mistake right now. Given the immature state of timed text formats (it 
  seems there are new formats announced every month), it's probably 
  premature to pick one -- we should let the market pick one first.
 
 I think excluding option 3 from our list of ways of supporting
 time-aligned text is a big mistake.

We're not excluding it, we're just delaying its standardisation.


 The majority of subtitles currently available on the Web come from 
 separate files, in particular in srt or sub format. They are simple 
 formats, easily authored in a text editor, and can be related to any 
 container format. It is easy to implement support for them in authoring 
 applications and in player applications. Encapsulating them into a video 
 file and extracting them from a video file again for decoding seems an 
 unnecessary nuisance. This is why I think dealing with separate caption 
 files will continue to be the main way we deal with captions into the 
 future and why we should consider supporting this natively in Web 
 browsers rather than leaving it to every web developer to sort this out 
 himself.

I agree that if we can't get people to embed subtitles straight into their 
video streams, providing a standard way to associate a video file 
with a subtitle stream is the way to go in the long term.


 The only real issue that we have with separate files is that the 
 captions may get lost when people download the video, store it locally, 
 and share it with friends.

This is a pretty big problem, IMHO.


 Maybe we should consider solving this differently. Either we could 
 encapsulate into the video container upon download. Or we could create a 
 zip-file or tarball upon download. I'd just find it a big mistake to 
 ignore the majority use case in the standard, which is why I proposed 
 the text elements inside the video tag.

If browser vendors are willing to merge subtitles and video files when 
saving them, that would be great. Is this easy to do?


 Here is my example again:

Re: [whatwg] Thoughts on video accessibility

2009-07-15 Thread Silvia Pfeiffer
Hi Ian,

Great to see the new efforts to move the subtitle/caption/karaoke
issues forward!

I actually have a contract with Mozilla starting this month to help
solve this, so I am more than grateful that you have proposed some
ideas in this space.

On Thu, Jul 16, 2009 at 9:38 AM, Ian Hickson i...@hixie.ch wrote:
 On Sat, 27 Dec 2008, Silvia Pfeiffer wrote:
  1. Timed text in the resource itself (or linked from the resource
  itself), rendered as part of the video automatically by the user
  agent.

 For case 1, the practical implications are that browser vendors will
 have to develop support for a large variety of text codecs, each one
 providing different functionalities.

 I would hope that as with a video codec, we can standardise on a single
 subtitle format, ideally some simple media-independent combination of SRT
 and LRC [1]. It's difficult to solve this problem without a standard
 codec, though.

I have myself thought about creating a new format to address the needs
for time-aligned text in audio/video.

However, the problem with creating a new format is that you start from
scratch and already widespread formats are not supported.

I can see that your proposed format is trying to be backwards
compatible with SRT, so at least it would work for the large number of
existing srt file collections. I am still skeptical, in particular
because there are no authoring systems for this format around.
But I would be curious what others think about your proposed SRT-LRC-mix.


 In fact, the easiest solution would be if that particular format was
 really only HTML.

 IMHO that would be absurd. HTML means scripting, embedded videos, an
 unbelievably complex rendering system, complex parsing, etc; plus, what's
 more, it doesn't even support timing yet, so we'd have to add all the
 timing and karaoke features on top of it. Requiring that video players
 embed a timed HTML renderer just to render subtitles is like saying that
 we should ship Microsoft Word with every DVD player, to handle the user
 input when the user wants to type in a new chapter number to jump to.

I agree, it cannot be a format that contains all the complexity of
HTML. It would only support the subset of HTML that is relevant, plus
the addition of timing - and in that case it is indeed a new format. I
have therefore changed my mind since I sent that email in Dec 08 and
am hoping we can do it with existing formats.

In particular, I have taken an in-depth look at the latest
specification from the Timed Text working group, which has put years of
experimentation and decades of experience into developing DFXP. You can
see my review of DFXP here:
http://blog.gingertech.net/2009/06/28/a-review-of-the-w3c-timed-text-authoring-format/
. I think it is too flexible in a lot of ways, but also too
restrictive in others. However, it is a well formulated format that is
also getting market traction. In addition, it is possible to formulate
profiles to add missing functionality.

If we want a quick and dirty hack, srt itself is probably the best
solution. If we want a well thought-out solution, DFXP is probably a
better idea.

I am currently experimenting with these and will be able to share
something soon for further discussion.


  3. Timed text stored in a separate file, which is then parsed by the
  user agent and rendered as part of the video automatically by the
  browser.
 
 Maybe we should consider solving this differently. Either we could
 encapsulate into the video container upon download. Or we could create a
 zip-file or tarball upon download. I'd just find it a big mistake to
 ignore the majority use case in the standard, which is why I proposed
 the text elements inside the video tag.

 If browser vendors are willing to merge subtitles and video files when
 saving them, that would be great. Is this easy to do?

My suggestion was really about doing this server-side, which we have
already implemented years ago in the Annodex project for Ogg
Theora/Vorbis.

However, it is also possible to do this in the browser: in the case of
Ogg, the browser just needs to have a multiplexing library installed
as well as a means to encode the subtitle file (which I like to call a
text codec). Since it's text, it's nowhere near as complex as
encoding audio or video and just consists of light-weight packaging
code. So, yes, it is totally possible to have the browsers create a
binary video file that has the subtitles encapsulated that were
previously only accessible as referenced text files behind a separate
URL.

The only issue I see is the baseline codec issue: every browser that
wants to support multiple media formats has to implement this
multiplexing and text encoding for every media encapsulation format
differently, which is annoying and increases complexity. It's however
generally a small amount of complexity compared to the complexity
created by having to support multiple codecs.


 Here is my example again:
 <video src="http://example.com/video.ogv" controls>
  <text 

Re: [whatwg] Thoughts on video accessibility

2008-12-27 Thread Ian Hickson

I have carefully read all the feedback in this thread concerning 
associating text with video, for various purposes such as captions, 
annotations, etc.

Taking a step back as far as I can tell there are two axes: where the 
timed text comes from, and how it is rendered.

Where it comes from, it seems, boils down to three options:
 - embedded in or referenced from the media resource itself
 - as a separate file parsed by the user agent
 - as a separate file parsed by the web page

Where the timed text is rendered boils down to two options:
 - rendered automatically by the user agent
 - rendered by the web page overlaying content on the video

For the purposes of this discussion I am ignoring burned-in captions, 
since they're basically equivalent to a different video, much like videos 
with overlaid sign language interpreters (or VH1 pop-up's annotations!).


These 5 options give us 6 cases:

1. Timed text in the resource itself (or linked from the resource itself), 
   rendered as part of the video automatically by the user agent.

This is the optimal situation from an accessibility and usability point of 
view, because it works when the video is shown full-screen, it works when 
the video is saved separate from the Web page, it works easily when other 
pages link to the same video file, it requires minimal work from the page 
author, and so forth.

This is what I think we should be encouraging.

It would probably make sense to expose the timed text track selection to 
the Web page through the API, maybe even expose the text itself somehow, 
but these are features that can and should probably wait until video has 
been more reliably implemented.


2. Timed text in the resource itself (or linked from the resource itself), 
   exposed to the Web page with no native rendering.

This allows pages to implement experimental subtitling mechanisms while 
still allowing the timed text tracks to survive re-use of the video file, 
but it seems to introduce a high cost (all pages have to implement 
subtitling themselves) with very little gain, and with several 
disadvantages -- different sites will have inconsistent subtitling, bugs 
will be prevalent in the subtitling and accessibility will thus suffer, 
and in all likelihood even videos that have subtitles will end up not 
having them shown, as small sites don't bother to implement anything 
but the most basic controls.


3. Timed text stored in a separate file, which is then parsed by the user 
   agent and rendered as part of the video automatically by the browser.

This would make authoring subtitles somewhat easier, but would typically 
lose the benefits of subtitles surviving when the video file is extracted. 
It would also involve a distinct increase in implementation and language 
complexity. We would also have to pick a timed text format, or add yet 
another format war to the video/audio codec debacle, which I think 
would be a really big mistake right now. Given the immature state of timed 
text formats (it seems there are new formats announced every month), it's 
probably premature to pick one -- we should let the market pick one first.


4. Timed text stored in a separate file, which is then parsed by the user
   agent and exposed to the Web page with no native rendering.

This combines the disadvantages of the previous two options, without 
really introducing any groundbreaking advantages.


5. Timed text stored in a separate file, which is then fetched and parsed 
   by the Web page, which then passes it to the browser for rendering.

This is just an excessive level of complexity for a feature that could 
just be supported exclusively by the user agent. In particular, it doesn't 
actually provide for much space for experimentation -- whatever API we 
provide to expose the subtitles would limit what the rendering would be 
like regardless of what the pages want to try.

This option side-steps the issue of picking a format, though.


6. Timed text stored in a separate file, which is then fetched and parsed 
   by the Web page, and which is then rendered by the Web page.

We can't stop this from being available, and there's not much we can do to 
help with this case beyond what we do now. The disadvantages are that it 
doesn't work when the video is shown full-screen, when the video is saved 
separate from the Web page, when other pages link to the same video file 
without using their own implementation of the feature, and it requires 
substantial implementation work from the page. The _advantages_, and they 
are significant, are that pages can easily create subtitles separate from 
the video, they can easily provide features such as automated 
translations, and they can easily implement features that would otherwise 
seem overly ambitious, e.g. hyperlinked annotations with ad tracking.


Based on this analysis it seems to me that cases 1 and 6 are important to 
support, but that cases 2 to 5 aren't as compelling -- they either have 
disadvantages 

Re: [whatwg] Thoughts on video accessibility

2008-12-27 Thread Silvia Pfeiffer
Hi Ian,

Thanks for taking the time to go through all the options, analyse and
understand them - especially on your birthday! :-) Much appreciated!

I agree with your analysis and the 6 options you have identified.

However, I disagree slightly with the conclusions you have come to -
mostly from a strategic viewpoint rather than from where we currently
stand.

Your proposal is to support cases 1 and 6 and not to worry about the
others at this stage. This is a fair enough statement for the current
state of play.

Support for case 1 comes from the fact that there are indeed a number
of video container formats that have text codecs (e.g. QTtext for
Quicktime, TimedText for MPEG, CMML and Kate for Ogg).

Support for case 6 comes from the fact that it is already possible, it
is flexible, and it is therefore an easy way out of the need to provide
video accessibility support in Web pages. This is in fact
how this example http://v2v.cc/~j/jquery.srt/ is implemented.

As I said - for the current state of play, you have come to the right
conclusions. Theoretically.

But we should look at the practical implications.

For case 6, while it works for deaf people, we actually create an
accessibility nightmare for blind people and their web developers.
There is no standard means for a screen reader to identify that a
particular part in the DOM is actually text related to the video and
supposed to be displayed with the video (through a screenreader or a
braille reader). Such functionality would need to be implemented
through javascript by every single site that wanted to provide audio
annotations.

It's also a nightmare for search engines, since there is no clear way
of identifying a specific text as video-related and use it as such to
extend knowledge about the video.

As much as case 6 is the easy way out, I would like us to discourage
such solutions right before they start by providing a viable
alternative: a standard way of relating time-aligned text with video
(or audio). And that unfortunately means attacking case 3 (let me
address case 3 and your objections below).


For case 1, the practical implications are that browser vendors will
have to develop support for a large variety of text codecs, each one
providing different functionalities. It would indeed be nice if we had
one standard format that everybody used, but alas that is not the
case. What will browser vendors do in this situation? Probably
simply nothing - maybe use the underlying media frameworks that are
being used to decode the video formats to also decode the text formats
and render them on top of the video - thus taking them completely out
of reach of the Web page. This again means that screenreaders cannot
get to them, search engines will need to find a different way of
extracting them from the video rather than from the web page, and generally
a worse accessibility experience.

Now, is it realistic to expect a standard format to emerge? I think
this is actually a chicken and egg problem. We currently have poor
solutions (e.g. srt as extra files, or the above mentioned text codecs
inside specific containers). Lacking an alternative, people will
continue to use these to author captions - and use their own hacked-up
formats to provide other features such as video annotations in speech
bubbles at certain time points and coordinates etc. If there was
however a compelling case to use a different standard format, people
would go for it, IMHO.  If e.g. all browser vendors had agreed to
support one particular format. In fact, the easiest solution would be
if that particular format was really only HTML. Then, browser vendors
would find it trivial to implement, which in turn would encourage Web
developers to choose this format. Which in turn would encourage video
container formats to adopt it also inside itself. And then we have
created a uniform means of dealing with time-aligned text coming from
any of the three locations listed by you and going to the Web page.

As we haven't got any experience with this proposal yet, we can
obviously not support it. But strategically can we keep our options
open towards using such a format in HTML5?


And now to option 3:

 3. Timed text stored in a separate file, which is then parsed by the user
   agent and rendered as part of the video automatically by the browser.

 This would make authoring subtitles somewhat easier, but would typically
 lose the benefits of subtitles surviving when the video file is extracted.
 It would also involve a distinct increase in implementation and language
 complexity. We would also have to pick a timed text format, or add yet
 another format war to the video/audio codec debacle, which I think
 would be a really big mistake right now. Given the immature state of timed
 text formats (it seems there are new formats announced every month), it's
 probably premature to pick one -- we should let the market pick one first.

I think excluding option 3 from our list of ways of supporting

Re: [whatwg] Thoughts on video accessibility

2008-12-27 Thread Calogero Alex Baldacchino

Silvia Pfeiffer ha scritto:

Hi Ian,

Thanks for taking the time to go through all the options, analyse and
understand them - especially on your birthday! :-) Much appreciated!
  


Then, happy birthday to Ian!


[...]
The only real issue that we have with separate files is that the
captions may get lost when people download the video, store it
locally, and share it with friends. Maybe we should consider solving
this differently. Either we could encapsulate into the video container
upon download. Or we could create a zip-file or tarball upon download.
I'd just find it a big mistake to ignore the majority use case in the
standard, which is why I proposed the text elements inside the
video tag.

[...]


A flying thought: why not thinking also to a further option for 
embedding everything in a sort of all-in-one html page generated on 
the fly when downloading, making of it a global container for video and 
text to be consumed by UAs (while maintaining the opportunity to 
download a video as a separate file, of course)? For instance, the video 
itself might become the base64-encoded (or otherwise acceptably encoded) 
value of a data-* attribute (or a more specific attribute) to be decoded 
by a script (as well generated on the fly) and served to the video 
engine as a javascript: url in place of the video src (or, perhaps 
better, the UA might do that itself by supporting the data: protocol 
as a valid source for the video, or a fragid pointing to an element 
following the </video> tag, perhaps a plaintext element or something else, and 
containing the encoded video); while text elements might wrap the 
corresponding timed text file, to be embedded into the page as bare 
text, similarly to script code -- if a certain format contained a text 
tag, those might be changed into &lt;text&gt; or similar (or perhaps 
the file content might be encoded as well) to avoid conflicts with html 
tags.


Of course, it's a first-glance idea, and needs further considerations 
on its reliability (e.g. such an html page perhaps shouldn't be the 
source set for a video in another page, and an option should be provided 
to extract embedded content; seeking might require sequential decoding 
to reach a desired point, and so on).


Regards, Alex




Re: [whatwg] Thoughts on video accessibility

2008-12-11 Thread Silvia Pfeiffer
Another implementation comes from the W3C TimedText working group:

They have a test suite for DFXP files at
http://www.w3.org/2008/12/dfxp-testsuite/web-framework/START.html .

Philippe just announced that he added HTML5 video tag support using
the javascript file that Jan had written for srt support, adapting
it to work like this:

 <video src="example.ogv" id="video" controls>
   <text lang='en' type="application/ttaf+xml"
         src="testsuite/Content/Br001.xml"></text>
 </video>

You'll need to use Firefox 3.1 to test it.
If you select the HTML5 DFXP player prototype, you can click on the
tests on the left and it will load the DFXP content.

The adapted javascript is at
http://www.w3.org/2008/12/dfxp-testsuite/web-framework/HTML5_player.js .

It works by mapping DFXP to HTML and DFXP styling attributes to CSS.
This is exactly what was also discussed yesterday on irc - if we can
find a simple way to map time-aligned text formats to HTML, it will be
easy to deal with in HTML5.
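
(As an illustration of the kind of mapping involved - not necessarily
the exact output of Philippe's script, which I haven't inspected - a
DFXP paragraph such as

<p begin="00:00:01.00" end="00:00:03.00"
   tts:color="white" tts:textAlign="center">Hello world</p>

could be turned into something like

<div style="color: white; text-align: center">Hello world</div>

which the script then shows and hides as the video's playback time
passes the begin and end values.)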

Regards,
Silvia.


On Thu, Dec 11, 2008 at 9:57 AM, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:
 And now we have a first demo of the proposed syntax in action. Michael
 Dale implemented SRT support like this:

 <video src="sample_fish.ogg" poster="sample_fish.jpg" duration="26">
   <text category="SUB" lang="en" type="text/x-srt" default="true"
         title="english SRT subtitles" src="sample_fish_text_en.srt">
   </text>
   <text category="SUB" lang="es" type="text/x-srt"
         title="spanish SRT subtitles" src="sample_fish_text_es.srt">
   </text>
 </video>

 Michael writes:
 the demo: (tested on IE, Firefox, Safari ... with varying degrees of success 
 ;)
 http://metavid.org/w/extensions/MetavidWiki/skins/mv_embed/example_usage/sample_timed_text.php
 (bottom example)

 If Firefox exposes timed text tracks in ogg media the script could query
 them and display them alongside any available markup text tracks (but of
 course other browsers like IE won't easily expose those muxed text tracks
 so it's likely the least common denominator of text-based markup /
 pointers will dominate for some time)

 You will need to click on the CC button on the player and click on
 select transcripts to see the different subtitles in English and
 Spanish.

 Regards,
 Silvia.


 On Wed, Dec 10, 2008 at 3:49 AM, Silvia Pfeiffer
 silviapfeiff...@gmail.com wrote:
 I heard some complaints about there not being any implementation of
 the suggestions I made.

 So here goes:

 1. out-of-band
 There is an example of using srt with ogg in an out-of-band approach here:
 http://v2v.cc/~j/jquery.srt/
 You will need Firefox3.1 to play it.
 The syntax of what Jan implemented is different to what I proposed,
 but I wanted to take it forward and make it more generic.

 2. in-band
 There is also a draft implementation of srt inside Ogg through the
 OggText specification, but it's not released yet. It is also not as
 relevant to this group as the out-of-band example.

 Cheers,
 Silvia.



 On Tue, Dec 9, 2008 at 7:33 PM, Robert O'Callahan rob...@ocallahan.org 
 wrote:
 On Tue, Dec 9, 2008 at 6:20 PM, Martin Atkins m...@degeneration.co.uk
 wrote:

 Silvia Pfeiffer wrote:

 I'm interested to hear people's opinions on these ideas. I agree with
 Ralph and think having a simple, explicit mechanism at the html level
 is worthwhile - and very open and explicit to a web author. Having a
 redirection through a ROE-type file on the server is more opaque, but
 maybe more consistent with existing similar approaches as taken by
 RealNetworks in rm files and WindowsMedia files in asx files.


 This (having a separate document that references other streams) is what I
 was thinking of. I guess which is more natural depends on who is doing the
 assembling. If it is the HTML author that takes the individual pieces and
 links them together then doing it in the HTML is probably easiest.


 For what it's worth, loading an intermediate document of some new type which
 references other streams to be loaded adds a lot of complexity to the
 browser implementation. It creates new states that the decoder can be in,
 and introduces new failure modes. It creates new timing issues and possibly
 new security issues.

 Rob
 --
 He was pierced for our transgressions, he was crushed for our iniquities;
 the punishment that brought us peace was upon him, and by his wounds we are
 healed. We all, like sheep, have gone astray, each of us has turned to his
 own way; and the LORD has laid on him the iniquity of us all. [Isaiah
 53:5-6]





Re: [whatwg] Thoughts on video accessibility

2008-12-10 Thread Silvia Pfeiffer
And now we have a first demo of the proposed syntax in action. Michael
Dale implemented SRT support like this:

<video src="sample_fish.ogg" poster="sample_fish.jpg" duration="26">
  <text category="SUB" lang="en" type="text/x-srt" default="true"
        title="english SRT subtitles" src="sample_fish_text_en.srt">
  </text>
  <text category="SUB" lang="es" type="text/x-srt"
        title="spanish SRT subtitles" src="sample_fish_text_es.srt">
  </text>
</video>

Michael writes:
 the demo: (tested on IE, Firefox, Safari ... with varying degrees of success 
 ;)
 http://metavid.org/w/extensions/MetavidWiki/skins/mv_embed/example_usage/sample_timed_text.php
(bottom example)

 If Firefox exposes timed text tracks in ogg media the script could query
 them and display them alongside any available markup text tracks (but of
 course other browsers like IE won't easily expose those muxed text tracks
 so it's likely the least common denominator of text-based markup /
 pointers will dominate for some time)

You will need to click on the CC button on the player and click on
select transcripts to see the different subtitles in English and
Spanish.

Regards,
Silvia.


On Wed, Dec 10, 2008 at 3:49 AM, Silvia Pfeiffer
[EMAIL PROTECTED] wrote:
 I heard some complaints about there not being any implementation of
 the suggestions I made.

 So here goes:

 1. out-of-band
 There is an example of using srt with ogg in an out-of-band approach here:
 http://v2v.cc/~j/jquery.srt/
 You will need Firefox3.1 to play it.
 The syntax of what Jan implemented is different to what I proposed,
 but I wanted to take it forward and make it more generic.

 2. in-band
 There is also a draft implementation of srt inside Ogg through the
 OggText specification, but it's not released yet. It is also not as
 relevant to this group as the out-of-band example.

 Cheers,
 Silvia.



 On Tue, Dec 9, 2008 at 7:33 PM, Robert O'Callahan [EMAIL PROTECTED] wrote:
 On Tue, Dec 9, 2008 at 6:20 PM, Martin Atkins [EMAIL PROTECTED]
 wrote:

 Silvia Pfeiffer wrote:

 I'm interested to hear people's opinions on these ideas. I agree with
 Ralph and think having a simple, explicit mechanism at the html level
 is worthwhile - and very open and explicit to a web author. Having a
 redirection through a ROE-type file on the server is more opaque, but
 maybe more consistent with existing similar approaches as taken by
 RealNetworks in rm files and WindowsMedia files in asx files.


 This (having a separate document that references other streams) is what I
 was thinking of. I guess which is more natural depends on who is doing the
 assembling. If it is the HTML author that takes the individual pieces and
 links them together then doing it in the HTML is probably easiest.


 For what it's worth, loading an intermediate document of some new type which
 references other streams to be loaded adds a lot of complexity to the
 browser implementation. It creates new states that the decoder can be in,
 and introduces new failure modes. It creates new timing issues and possibly
 new security issues.

 Rob
 --
 He was pierced for our transgressions, he was crushed for our iniquities;
 the punishment that brought us peace was upon him, and by his wounds we are
 healed. We all, like sheep, have gone astray, each of us has turned to his
 own way; and the LORD has laid on him the iniquity of us all. [Isaiah
 53:5-6]




Re: [whatwg] Thoughts on video accessibility

2008-12-10 Thread Michael Dale
Yea, as Silvia outlines in the intro to this thread, we will likely 
continue to see external timed text files winning out over muxed timed 
text.


It's just more flexible ... Javascript embedding libraries, which are 
widely used today for flash video, will be even more widely used with 
the emerging browser support of the video tag.


So it's not too big a deal... the complexity of transcript formats will 
be handled by dedicated javascript libraries. If the browser does expose 
some timed text tracks muxed with the media, then the library will 
integrate that into the interface.


That being said, having some semantically meaningful source data 
representation is worthwhile. By supporting this proposed syntax in 
the embedding libraries we are promoting the idea that the syntax will 
eventually be adopted and natively handled by the browser. But in 
practice, even if we do get native timed text support, a javascript 
library will likely rewrite it to conform to the site's layout and 
skinning anyway.


The take-away point here is that if people do mux text tracks, those 
tracks should be exposed via javascript. Otherwise they will be of very 
limited value in the context of web media.


peace,
--michael

Silvia Pfeiffer wrote:

And now we have a first demo of the proposed syntax in action. Michael
Dale implemented SRT support like this:

<video src="sample_fish.ogg" poster="sample_fish.jpg" duration="26">
  <text category="SUB" lang="en" type="text/x-srt" default="true"
        title="english SRT subtitles" src="sample_fish_text_en.srt">
  </text>
  <text category="SUB" lang="es" type="text/x-srt"
        title="spanish SRT subtitles" src="sample_fish_text_es.srt">
  </text>
</video>

Michael writes:
  

the demo: (tested on IE, Firefox, Safari ... with varying degrees of success ;)
http://metavid.org/w/extensions/MetavidWiki/skins/mv_embed/example_usage/sample_timed_text.php


(bottom example)
  

If Firefox exposes timed text tracks in ogg media the script could query
them and display them alongside any available markup text tracks (but of
course other browsers like IE won't easily expose those muxed text tracks
so it's likely the least common denominator of text-based markup /
pointers will dominate for some time)



You will need to click on the CC button on the player and click on
select transcripts to see the different subtitles in English and
Spanish.

Regards,
Silvia.


On Wed, Dec 10, 2008 at 3:49 AM, Silvia Pfeiffer
[EMAIL PROTECTED] wrote:
  

I heard some complaints about there not being any implementation of
the suggestions I made.

So here goes:

1. out-of-band
There is an example of using srt with ogg in an out-of-band approach here:
http://v2v.cc/~j/jquery.srt/
You will need Firefox3.1 to play it.
The syntax of what Jan implemented is different to what I proposed,
but I wanted to take it forward and make it more generic.

2. in-band
There is also a draft implementation of srt inside Ogg through the
OggText specification, but it's not released yet. It is also not as
relevant to this group as the out-of-band example.

Cheers,
Silvia.



On Tue, Dec 9, 2008 at 7:33 PM, Robert O'Callahan [EMAIL PROTECTED] wrote:


On Tue, Dec 9, 2008 at 6:20 PM, Martin Atkins [EMAIL PROTECTED]
wrote:
  

Silvia Pfeiffer wrote:


I'm interested to hear people's opinions on these ideas. I agree with
Ralph and think having a simple, explicit mechanism at the html level
is worthwhile - and very open and explicit to a web author. Having a
redirection through a ROE-type file on the server is more opaque, but
maybe more consistent with existing similar approaches as taken by
RealNetworks in rm files and WindowsMedia files in asx files.

  

This (having a separate document that references other streams) is what I
was thinking of. I guess which is more natural depends on who is doing the
assembling. If it is the HTML author that takes the individual pieces and
links them together then doing it in the HTML is probably easiest.


For what it's worth, loading an intermediate document of some new type which
references other streams to be loaded adds a lot of complexity to the
browser implementation. It creates new states that the decoder can be in,
and introduces new failure modes. It creates new timing issues and possibly
new security issues.

Rob
--
He was pierced for our transgressions, he was crushed for our iniquities;
the punishment that brought us peace was upon him, and by his wounds we are
healed. We all, like sheep, have gone astray, each of us has turned to his
own way; and the LORD has laid on him the iniquity of us all. [Isaiah
53:5-6]

  




Re: [whatwg] Thoughts on video accessibility

2008-12-10 Thread Robert O'Callahan
On Wed, Dec 10, 2008 at 5:56 PM, Dave Singer [EMAIL PROTECTED] wrote:

 At 21:33  +1300 9/12/08, Robert O'Callahan wrote:

 For what it's worth, loading an intermediate document of some new type
 which references other streams to be loaded adds a lot of complexity to the
 browser implementation. It creates new states that the decoder can be in,
 and introduces new failure modes. It creates new timing issues and possibly
 new security issues.


 I'm not sure I agree;  but if you believe that, we should address it no
 matter which way this discussion goes.  It should absolutely be possible to
 reference a SMIL file, or an MP4 or MOV file with external data (to give
 only two examples) from a video or audio element, and have the DOM,
 events, states, and APis work correctly.


I agree it should be done eventually, it's just significantly more
complicated than what we have to deal with currently.

Rob
-- 
He was pierced for our transgressions, he was crushed for our iniquities;
the punishment that brought us peace was upon him, and by his wounds we are
healed. We all, like sheep, have gone astray, each of us has turned to his
own way; and the LORD has laid on him the iniquity of us all. [Isaiah
53:5-6]


Re: [whatwg] Thoughts on video accessibility

2008-12-10 Thread Dave Singer

At 14:40  +1300 11/12/08, Robert O'Callahan wrote:
On Wed, Dec 10, 2008 at 5:56 PM, Dave Singer [EMAIL PROTECTED] wrote:


At 21:33  +1300 9/12/08, Robert O'Callahan wrote:

For what it's worth, loading an intermediate document of some new 
type which references other streams to be loaded adds a lot of 
complexity to the browser implementation. It creates new states that 
the decoder can be in, and introduces new failure modes. It creates 
new timing issues and possibly new security issues.



I'm not sure I agree;  but if you believe that, we should address it 
no matter which way this discussion goes.  It should absolutely be 
possible to reference a SMIL file, or an MP4 or MOV file with 
external data (to give only two examples) from a video or audio 
element, and have the DOM, events, states, and APIs work correctly.



I agree it should be done eventually, it's just significantly more 
complicated than what we have to deal with currently.


But if the state machine or other aspects are actually wrong for this 
case, then we should fix it now.  We have, for example, tried to keep 
out these kinds of assumptions:
a) all media is downloaded (no, it might be streamed or even arriving 
over non-IP, e.g. a TV broadcast)
b) all delivery methods are self-contained (no, they might reference 
resources as well as contain them)
c) all delivery is sequential in play order (no, some file formats 
decouple data timing and data ordering)



Rob
--
He was pierced for our transgressions, he was crushed for our 
iniquities; the punishment that brought us peace was upon him, and 
by his wounds we are healed. We all, like sheep, have gone astray, 
each of us has turned to his own way; and the LORD has laid on him 
the iniquity of us all. [Isaiah 53:5-6]



--
David Singer
Multimedia Standards, Apple Inc.

Re: [whatwg] Thoughts on video accessibility

2008-12-09 Thread Ralph Giles
On Mon, Dec 8, 2008 at 9:20 PM, Martin Atkins [EMAIL PROTECTED] wrote:

 My concern is that if the only thing linking the various streams together is
 the HTML document then the streams are less useful outside of a web browser
 context.

Absolutely. This proposal places an additional burden on the user to
download and integrate multiple resources. This trade-off supports
applications where having the text available separately is valuable.

 -r


Re: [whatwg] Thoughts on video accessibility

2008-12-09 Thread Silvia Pfeiffer
I heard some complaints about there not being any implementation of
the suggestions I made.

So here goes:

1. out-of-band
There is an example of using srt with ogg in an out-of-band approach here:
http://v2v.cc/~j/jquery.srt/
You will need Firefox 3.1 to play it.
The syntax of what Jan implemented is different to what I proposed,
but I wanted to take it forward and make it more generic.

2. in-band
There is also a draft implementation of srt inside Ogg through the
OggText specification, but it's not released yet. It is also not as
relevant to this group as the out-of-band example.

Cheers,
Silvia.



On Tue, Dec 9, 2008 at 7:33 PM, Robert O'Callahan [EMAIL PROTECTED] wrote:
 On Tue, Dec 9, 2008 at 6:20 PM, Martin Atkins [EMAIL PROTECTED]
 wrote:

 Silvia Pfeiffer wrote:

 I'm interested to hear people's opinions on these ideas. I agree with
 Ralph and think having a simple, explicit mechanism at the html level
 is worthwhile - and very open and explicit to a web author. Having a
 redirection through a ROE-type file on the server is more opaque, but
 maybe more consistent with existing similar approaches as taken by
 RealNetworks in rm files and WindowsMedia files in asx files.


 This (having a separate document that references other streams) is what I
 was thinking of. I guess which is more natural depends on who is doing the
 assembling. If it is the HTML author that takes the individual pieces and
 links them together then doing it in the HTML is probably easiest.


 For what it's worth, loading an intermediate document of some new type which
 references other streams to be loaded adds a lot of complexity to the
 browser implementation. It creates new states that the decoder can be in,
 and introduces new failure modes. It creates new timing issues and possibly
 new security issues.

 Rob
 --
 He was pierced for our transgressions, he was crushed for our iniquities;
 the punishment that brought us peace was upon him, and by his wounds we are
 healed. We all, like sheep, have gone astray, each of us has turned to his
 own way; and the LORD has laid on him the iniquity of us all. [Isaiah
 53:5-6]



Re: [whatwg] Thoughts on video accessibility

2008-12-09 Thread Calogero Alex Baldacchino

Silvia Pfeiffer wrote:

I heard some complaints about there not being any implementation of
the suggestions I made.

So here goes:

1. out-of-band
There is an example of using srt with ogg in an out-of-band approach here:
http://v2v.cc/~j/jquery.srt/
You will need Firefox 3.1 to play it.
The syntax of what Jan implemented is different to what I proposed,
but I wanted to take it forward and make it more generic.

2. in-band
There is also a draft implementation of srt inside Ogg through the
OggText specification, but it's not released yet. It is also not as
relevant to this group as the out-of-band example.

Cheers,
Silvia.

  
As far as I've understood from a first read of your proposal (I'm not 
deeply familiar with this matter), current players/codecs implement different 
kinds of bindings with text (either in-band or out-of-band) and support 
different formats, so perhaps there is a place for both mechanisms you're 
proposing:


- the html version, for compatibility with existing media and their 
respective external bindings, for servers not supporting the dynamic creation of 
content defined by your ROE format, and for people who don't want to or can't 
afford to modify the way their media is served (e.g. they can't access 
the server where the media is stored and add or modify an xml 
metadata file, but want to try and bind the media with some text they 
can store separately);


- the xml file mainly to drive dynamic content creation, and as a 
gradual replacement of other binding formats.


Any problem arising from the management of separate connections 
(possibly to different domains) to get both the audio/video and the 
textual resources might perhaps be mitigated by indicating (or 
establishing as a default) a time to wait for the external text before 
starting playback (in case the text resource fails to load -- e.g. 
the server is temporarily offline -- and there is enough buffered 
content to start playing before the browser gets any answer for the 
other resource) -- when and if the text arrives, its use might be 
skipped altogether, or start by synchronizing with the current point in the 
media; in the same way, if any problem loading the text arose after 
starting playback, the missing parts might just be skipped (this 
would be unlikely to happen if both the media and the text files were 
located on the same server).
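
As a very rough sketch of that mitigation (the grace period, the function
and the parameter names are only illustrative, not a proposal for spec
text), a script or UA could do something like:

  // Start playback once the external text has loaded, or after a grace
  // period expires, whichever comes first; a late or failed text load
  // never blocks playback indefinitely.
  function playWhenTextReadyOrTimeout(video, textUrl, graceMs, onText) {
    var started = false;
    function start() {
      if (!started) { started = true; video.play(); }
    }
    var timer = setTimeout(start, graceMs);  // don't wait forever for the text
    var xhr = new XMLHttpRequest();
    xhr.open('GET', textUrl);
    xhr.onload = function () {
      clearTimeout(timer);
      onText(xhr.responseText);  // hand the text to whatever renders the cues
      start();
    };
    xhr.onerror = function () {
      clearTimeout(timer);
      start();                   // text failed to load: play without it
    };
    xhr.send();
  }

If the text only arrives after playback has begun, the renderer can simply
start from the cue matching the current playback position, as described
above.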


Perhaps it might be useful to provide a way to indicate an alternative 
media stream, e.g. an .asx or .rm media which is internally bound 
to only one of the supported languages, for the case where the browser fails to bind 
the text with the 'primary' media, or where the ROE format is not 
supported (e.g. because it is only introduced in a v2 of the spec), or where the 'primary' 
media is not supported by the browser but the same content is available 
in several formats (i.e. a lossless compressed version alongside a lossy 
compressed one - the UA might even choose one based on the network 
capabilities) -- I know such a thing is possible with source elements, but 
perhaps some consideration is needed of the opportunity to relate 
source elements and text bindings, i.e. to tell the UA, by means of an 
attribute, whether or not to verify that the source supports any of the declared 
text resources, preferably one matching the locale (that is, 
specifying whether a source is a 'last resort' in case the UA is unable to 
bind any other source with the text -- other sources might be chosen 
anyway, if no 'last resort' source is supported).


Anyway, the use of subtitles in conjunction with screen readers might be 
problematic: a deeper synchronization with the media might be needed in 
order to have the text read just during voice pauses, to describe a mute 
scene, or to entirely substitute the sound, if the text provides a 
translation for the speech (I guess such would be nontrivial to do 
without putting one's hands inside the media).


Everything, of course, IMHO.
Regards,
Alex




Re: [whatwg] Thoughts on video accessibility

2008-12-09 Thread Silvia Pfeiffer
On Wed, Dec 10, 2008 at 6:59 AM, Calogero Alex Baldacchino
[EMAIL PROTECTED] wrote:

 Anyway, the use of subtitles in conjunction with screen readers might be
 problematic: a deeper synchronization with the media might be needed in
 order to have the text read just during voice pauses, to describe a mute
 scene, or to entirely substitute the sound, if the text provides a
 translation for the speech (I guess such would be nontrivial to do without
 putting one's hands inside the media).

I cannot see a problem with conflicts between screen reading a web
page and a video on the web page. A blind user would have turned off
the use of captions by default in his/her browser, since they can hear
very well what is going on, just not see it. As long as the video is
not playing, it is only represented as a video (and maybe an alt text
is read out). When the blind user clicks on the video, audio
annotations will be read out by the screen reader in addition to the
native sound. These would be placed into silence segments.

In the case of a video with a non-native language sound track, it's a
bit more complicated. The native sound would need to be turned off and
the screenreader would need to read out the subtitles in the user's
native language as well as the audio annotations in the breaks. This
may not be easy to set up through preferences in the Web browser, but
it should be possible for the user to manually select the right tracks
and turn off the video sound.

Regards,
Silvia.


Re: [whatwg] Thoughts on video accessibility

2008-12-09 Thread Silvia Pfeiffer
Also, for those interested, metavid and mv_embed are examples of use of ROE:
http://metavid.org/w/index.php/Mv_embed

Metavid uses
  <video roe="my_roe_file.xml">
for clean remote embedding of multiple text/video/audio tracks in a
single xml encapsulation.

An example of such embeds is here: http://metavid-mike.blogspot.com/

Regards,
Silvia.

On Wed, Dec 10, 2008 at 3:49 AM, Silvia Pfeiffer
[EMAIL PROTECTED] wrote:
 I heard some complaints about there not being any implementation of
 the suggestions I made.

 So here goes:

 1. out-of-band
 There is an example of using srt with ogg in an out-of-band approach here:
 http://v2v.cc/~j/jquery.srt/
 You will need Firefox 3.1 to play it.
 The syntax of what Jan implemented is different to what I proposed,
 but I wanted to take it forward and make it more generic.

 2. in-band
 There is also a draft implementation of srt inside Ogg through the
 OggText specification, but it's not released yet. It is also not as
 relevant to this group as the out-of-band example.

 Cheers,
 Silvia.



 On Tue, Dec 9, 2008 at 7:33 PM, Robert O'Callahan [EMAIL PROTECTED] wrote:
 On Tue, Dec 9, 2008 at 6:20 PM, Martin Atkins [EMAIL PROTECTED]
 wrote:

 Silvia Pfeiffer wrote:

 I'm interested to hear people's opinions on these ideas. I agree with
 Ralph and think having a simple, explicit mechanism at the html level
 is worthwhile - and very open and explicit to a web author. Having a
 redirection through a ROE-type file on the server is more opaque, but
 maybe more consistent with existing similar approaches as taken by
 RealNetworks in rm files and WindowsMedia files in asx files.


 This (having a separate document that references other streams) is what I
 was thinking of. I guess which is more natural depends on who is doing the
 assembling. If it is the HTML author that takes the individual pieces and
 links them together then doing it in the HTML is probably easiest.


 For what it's worth, loading an intermediate document of some new type which
 references other streams to be loaded adds a lot of complexity to the
 browser implementation. It creates new states that the decoder can be in,
 and introduces new failure modes. It creates new timing issues and possibly
 new security issues.

 Rob
 --
 He was pierced for our transgressions, he was crushed for our iniquities;
 the punishment that brought us peace was upon him, and by his wounds we are
 healed. We all, like sheep, have gone astray, each of us has turned to his
 own way; and the LORD has laid on him the iniquity of us all. [Isaiah
 53:5-6]




Re: [whatwg] Thoughts on video accessibility

2008-12-09 Thread Calogero Alex Baldacchino

Silvia Pfeiffer wrote:

On Wed, Dec 10, 2008 at 6:59 AM, Calogero Alex Baldacchino
[EMAIL PROTECTED] wrote:
  

Anyway, the use of subtitles in conjunction with screen readers might be
problematic: a deeper synchronization with the media might be needed in
order to have the text read just during voice pauses, to describe a mute
scene, or to entirely substitute the sound, if the text provides a
translation for the speech (I guess such would be nontrivial to do without
putting one's hands inside the media).



I cannot see a problem with conflicts between screen reading a web
page and a video on the web page. A blind user would have turned off
the use of captions by default in his/her browser, since they can hear
very well what is going on, just not see it. As long as the video is
not playing, it is only represented as a video (and maybe an alt text 
is read out). When the blind user clicks on the video, audio
annotations will be read out by the screen reader in addition to the
native sound. These would be placed into silence segments.

  


I was thinking of a possible lack of synchronization, with annotations 
enabled, between the screen reader reading them and the actual 
duration of the corresponding silence segments, maybe because of sentences 
that are not brief enough (e.g. as a consequence of a poorly groomed 
translation into a certain language) and/or slow reading (depending on the 
language's peculiarities, the user's settings, or both, and in any case out 
of the control of any UA), resulting in overlap between the end of a read-out 
annotation and the beginning of the next non-silent segment, 
perhaps repeatedly during playback. Maybe this is a borderline case.



In the case of a video with a non-native language sound track, it's a
bit more complicated. The native sound would need to be turned off and
the screenreader would need to read out the subtitles in the user's
native language as well as the audio annotations in the breaks. This
may not be easy to set up through preferences in the Web browser, but
it should be possible for the user to manually select the right tracks
and turn off the video sound.

Regards,
Silvia.
  
If the base language of the video, or the provided languages, were 
indicated somewhere, in the metadata or in the enclosing xml file, 
perhaps such a switch might be automated (the corresponding 
preference might be something like 'read subtitles when the media does 
not support your language', maybe coupled with the option 'don't read 
subtitles when the media's supported language(s) can't be identified'). I 
was also thinking about 'implied' subtitles, such as those shown in a 
film when some characters speak a different language from the base 
language of the rest of the content; in such a case, if distinguishing 
'implied' subtitles were possible somehow, it might be nice to turn down 
(or off, as needed) the volume and let a voice engine speak them 
aloud. I guess a UA with an embedded voice technology (such as Opera 
Voice, or FireVox) could do a good job and keep audio and video 
synchronized in most cases, but involving external software (such as 
a screen reader) the scenario might change (usually a screen reader can't 
be sped up or slowed down, and stopping it - when reading annotations - 
after having fed it some text, if at all possible, might be nontrivial -- 
again, I'm not deeply familiar with this stuff, so I can only suppose some 
borderline scenarios). Anyway, your proposal is nice, and, once 
widespread, screen reader developers might choose to provide some kind 
of support for synchronization (if needed to improve accessibility of 
audio/video content).


Regards, Alex




Re: [whatwg] Thoughts on video accessibility

2008-12-09 Thread Dave Singer

At 21:33  +1300 9/12/08, Robert O'Callahan wrote:
For what it's worth, loading an intermediate document of some new 
type which references other streams to be loaded adds a lot of 
complexity to the browser implementation. It creates new states that 
the decoder can be in, and introduces new failure modes. It creates 
new timing issues and possibly new security issues.


I'm not sure I agree;  but if you believe that, we should address it 
no matter which way this discussion goes.  It should absolutely be 
possible to reference a SMIL file, or an MP4 or MOV file with 
external data (to give only two examples) from a video or audio 
element, and have the DOM, events, states, and APIs work correctly.



Also, I should say that we quite deliberately left off associating 
and synchronizing media from our initial proposal for the media 
elements, for two reasons:

a) we believe that SMIL files should be embeddable;
b) it's an easy line to draw;  if you want media integration, use a 
media integration language such as SMIL.  If you start adding some 
integration, it's very hard to know where to stop.


As for user or user-agent supplied subtitles etc., that can (of 
course) be a UA feature or option.  If a unique content ID would help 
find such subtitle files, then I am hoping that the media annotations 
group would come up with a workable scheme (something the music 
industry is still, ahem, struggling with).

--
David Singer
Multimedia Standards, Apple Inc.


[whatwg] Thoughts on video accessibility

2008-12-08 Thread Silvia Pfeiffer
Hi everybody,

For the last 2 months, I have been investigating means of satisfying
video accessibility needs through Ogg in Mozilla/Firefox for HTML5.

You will find a lot of information about our work at
https://wiki.mozilla.org/Accessibility/Video_Accessibility and in the
archives of the Ogg accessibility mailing list at
http://lists.xiph.org/mailman/listinfo/accessibility .

I wanted to give some feedback here on our findings, since some of
them will have an impact on the HTML5 specification.


What are we talking about
---
When I say video accessibility, I'm actually only talking about
time-aligned text formats and not e.g. captions as bitmaps or audio
annotations as wave files.
Since we analysed how to attach time-aligned text formats with video
in a Web Browser, we also did not want to restrict ourselves to only
closed captions and subtitles.
It made sense to extend this to any type of time-aligned text one can
think of, including textual audio annotations (to be consumed by
the blind through a screenreader or braille output), karaoke, speech
bubbles, hyperlinked text annotations, and others. There is a list at
http://wiki.xiph.org/index.php/OggText#Categories_of_Text_Codecs which
gives you a more complete picture.


How is it currently done
---
When looking at the existing situation around time-aligned text for
video, I found a very diverse set of formats and means of doing it.

First of all, most media players allow you to load a video file and a
caption/subtitle file for it in two separate steps. The reason is that
most subtitles are produced by people other than the original content
producer, and this allows the player to synchronise them together. This is
particularly the case with the vast majority of SRT and SUB subtitle
files, but is also the case for SMIL- and DFXP-based subtitle files.

From a media file format POV, some formats have a means of
multiplexing time-aligned text into the format, e.g. QuickTime has
QTText and Flash has cuepoints. Others prefer to use external
references, e.g. WindowsMedia and SAMI or SMIL files, RealMedia and
SMIL files.

For mobile applications, a subset of DFXP has been defined in 3GPP
TimedText, which is actually being encapsulated into QuickTime QTText
using some extensions, and can be encapsulated into MP4 using the
MPEG-4 TTXT specification.

As can be seen, the current situation is such that time-aligned text
is being handled both in-stream and out-of-band and there are indeed
requirements for both situations.


Requirements
---
Not to go into much detail here, but I have seen extensive arguments
made on both sides of the equation for and against in-stream text
tracks.
One particular argument for in-stream text is that of downloading the
video from some place and keeping all its information together in one
file such that when it is distributed again, it retains that
information.
One particular argument for out-of-band text is the ability to add
text tracks at a later stage, from another site, and even from a web
service (e.g. a translation web service that uses an existing caption
file and translates it into another language).
In view of these requirements, I strongly believe we need to enable
people to do both: provide time-aligned text through
external/out-of-band resources and through in-stream, where the
container format allows this.


Proposal for out-of-band approach
--
I'd like to stimulate a discussion here about how we can support
out-of-band time-aligned text for video in HTML5.
I have seen previous proposals, such as the track element at
http://esw.w3.org/topic/HTML/MultimediaAccessibilty#head-a83ba3666e7a437bf966c6bb210cec392dc6ca53
and would like to propose the following specification.

Take this as an example:

<video src="http://example.com/video.ogv" controls>
  <text category="CC" lang="en" type="text/x-srt" src="caption.srt"></text>
  <text category="SUB" lang="de" type="application/ttaf+xml"
        src="german.dfxp"></text>
  <text category="SUB" lang="jp" type="application/smil"
        src="japanese.smil"></text>
  <text category="SUB" lang="fr" type="text/x-srt"
        src="translation_webservice/fr/caption.srt"></text>
</video>
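
Just to illustrate how a user agent (or, today, a script shim) might choose
among such children, here is a sketch against the markup above; the
selection function is invented for this example and nothing about it is
specified behaviour:

  // Pick a <text> child of the given category, preferring an exact
  // language match and falling back to the first track of that category.
  function selectTextTrack(video, preferredLang, preferredCategory) {
    var candidates = video.getElementsByTagName('text');
    var fallback = null;
    for (var i = 0; i < candidates.length; i++) {
      var t = candidates[i];
      if (t.getAttribute('category') !== preferredCategory) continue;
      if (t.getAttribute('lang') === preferredLang) return t;
      if (!fallback) fallback = t;
    }
    return fallback;
  }

  // e.g. selectTextTrack(videoElement, 'de', 'SUB') would pick the
  // german.dfxp track from the example above.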

* text elements are subelements of the video element and therefore
clearly related to one video (even if it comes in different formats).
[BTW: I'm happy to rename this to textarea or whatever else people
prefer to call it].

* the category tag (could also be renamed role if we prefer)
allows us to specify what text category we are dealing with and allows
the web browser to determine how to display it (there would be default
display for the different categories and css would allow to override
these).

* the lang tag would allow the specification of alternative
resources based on language, which allows the browser to select one by
default based on browser preferences, and also to turn those tracks on
by default that a particular user requires (e.g. because they are
blind and 

Re: [whatwg] Thoughts on video accessibility

2008-12-08 Thread Martin Atkins

Silvia Pfeiffer wrote:


Take this as an example:

<video src="http://example.com/video.ogv" controls>
  <text category="CC" lang="en" type="text/x-srt" src="caption.srt"></text>
  <text category="SUB" lang="de" type="application/ttaf+xml"
        src="german.dfxp"></text>
  <text category="SUB" lang="jp" type="application/smil"
        src="japanese.smil"></text>
  <text category="SUB" lang="fr" type="text/x-srt"
        src="translation_webservice/fr/caption.srt"></text>
</video>



Could this combining of resources be achieved instead with SMIL or some 
other existing format?


If there is already a format for doing this then I think HTML should 
avoid re-inventing it unless HTML's version is better in some way.


On the other hand, if what is invented for HTML is indeed better in some 
way, it's likely to also be valuable outside of HTML, for example in 
situations where SMIL is used today. (For example, loading a video and 
its subtitles directly into a standalone player without needing to 
manually load both streams.)


What are the advantages of doing this directly in HTML rather than 
having the src attribute point at some sort of compound media document?





Re: [whatwg] Thoughts on video accessibility

2008-12-08 Thread Ralph Giles
On Mon, Dec 8, 2008 at 6:08 PM, Martin Atkins [EMAIL PROTECTED] wrote:

 What are the advantages of doing this directly in HTML rather than having
 the src attribute point at some sort of compound media document?

The general point here is that subtitle data is in current practice
often created and stored in external files. This is, in part, because
of poor support for embedded tracks in web video applications, but
also arises naturally in production workflow. Moreover, because it is
text, subtitle data is much more likely to be stored in a database
with other text-based content while audio and video is treated as
binary blobs. This scheme is intended to support such hybrid systems.

There is generally a tension between authors wanting to easily
manipulate and add tracks, users wanting a self-contained file, and
search engines wanting stand-alone access to just the text. Because
splitting and merging media files requires special tools, our thinking
in the Ogg accessibility group has been that we need to support both
embedded and external references for text tracks in html. Users (and
their tools) can then choose what methods they want to use in
particular circumstances.

We're also interested in a more sophisticated mechanism for
communicating track assortments between a server and a client, but in
the particular case of text tracks for accessibility, I think having a
simple, explicit mechanism at the html level is worthwhile.

 -r


Re: [whatwg] Thoughts on video accessibility

2008-12-08 Thread Silvia Pfeiffer
On Tue, Dec 9, 2008 at 1:08 PM, Martin Atkins [EMAIL PROTECTED] wrote:
 Silvia Pfeiffer wrote:

 Take this as an example:

 <video src="http://example.com/video.ogv" controls>
   <text category="CC" lang="en" type="text/x-srt" src="caption.srt"></text>
   <text category="SUB" lang="de" type="application/ttaf+xml"
         src="german.dfxp"></text>
   <text category="SUB" lang="jp" type="application/smil"
         src="japanese.smil"></text>
   <text category="SUB" lang="fr" type="text/x-srt"
         src="translation_webservice/fr/caption.srt"></text>
 </video>


 Could this combining of resources be achieved instead with SMIL or some
 other existing format?


So, are you suggesting using something like this:

<video srcdesc="http://example.com/video.smil" controls>
</video>

where the Web client would retrieve the smil file and find all the
references to actual resources inside the SMIL file, then do another
retrieval action to actually retrieve the data it wants?

This is indeed an alternative, which would require a smil file
specification that describes the composition of tracks of a single
linear video. It is indeed what we have experimented with in the Ogg
community and have come up with ROE
(http://wiki.xiph.org/index.php/ROE).

<video roe="http://example.com/video.xml" controls>
</video>

When we defined ROE, we were trying to use a tightly defined subpart
of SMIL for it. This however did not work, because some of the
required attributes do not exist in SMIL (e.g. profile, category,
distinction, inline), SMIL was too expressive (e.g. needed to
explicitly separate audio, video, when mediaSource will do fine) and
SMIL required the use of other elements that were really unnecessary.
So, instead of butchering up a sub-version of SMIL that would work
(and look really ugly), we defined a new xml specification that would
satisfy the exact requirements we had.
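
To make the comparison that follows concrete, the redirection step could be
sketched roughly like this; the mediaSource element and its src, category
and lang attributes here are a simplification of what ROE actually defines,
so treat the names as assumptions rather than the real schema:

  // Fetch the ROE-style XML description, extract the track references,
  // and let the caller decide which tracks to fetch and how to render them.
  function loadRoe(roeUrl, onTracks) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', roeUrl);
    xhr.onload = function () {
      var doc = new DOMParser().parseFromString(
        xhr.responseText, 'application/xml');
      var sources = doc.getElementsByTagName('mediaSource');
      var tracks = [];
      for (var i = 0; i < sources.length; i++) {
        tracks.push({
          src: sources[i].getAttribute('src'),
          category: sources[i].getAttribute('category'),
          lang: sources[i].getAttribute('lang')
        });
      }
      onTracks(tracks);
    };
    xhr.send();
  }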


 If there is already a format for doing this then I think HTML should avoid
 re-inventing it unless HTML's version is better in some way.

I think both have their uses.

We are using the ROE file to describe the (possibly only virtually
existing) media resource on the server. It gives the Web client an
opportunity to request a media resource with only a particular set of
tracks (allows for content adaptation). This results in a single media
file, dynamically created on the Web server, delivered in one
connection, and decoded by the Web browser into its constituent
tracks, which is each displayed in a different, but temporally
synchronised means.

In contrast, the proposed html5 solution requires the Web browser to
set up multiple connections, one each to the resources that it
requires. The decoding and display is then dependent on multiple
connections having delivered enough data to provide for a synchronised
playback. It also allows downloading the full text files first and
display some text ahead of time (as is usual e.g. in a transcript),
while in a multiplexed file the text data is often only retrieved
consecutively in sync with the decoding of the a+v tracks.


What are the advantages of doing this directly in HTML rather than having the 
src attribute point at some sort of compound media
 document?

I guess, an argument can be made that a user agent could use ROE to
get to the individual streams and download the resources in multiple
connections itself, which would have the exact same effect as the
proposed HTML5 syntax. ROE currently goes beyond just text tracks and
allows description of multiple media and text tracks. You however
wouldn't want a Web browser to have to create multiple connections to
different audio and video resources and have to synchronise them
locally. Text is different in this respect, because it's with almost
certainty a small enough file to be fully received before even the
beginning of a video file has loaded. So, if we used ROE for such a
content selection task, I would encourage using it only for text
tracks.


I'm interested to hear people's opinions on these ideas. I agree with
Ralph and think having a simple, explicit mechanism at the html level
is worthwhile - and very open and explicit to a web author. Having a
redirection through a ROE-type file on the server is more opaque, but
maybe more consistent with existing similar approaches as taken by
RealNetworks in rm files and WindowsMedia files in asx files.

Cheers,
Silvia.


Re: [whatwg] Thoughts on video accessibility

2008-12-08 Thread Martin Atkins

Silvia Pfeiffer wrote:


I'm interested to hear people's opinions on these ideas. I agree with
Ralph and think having a simple, explicit mechanism at the html level
is worthwhile - and very open and explicit to a web author. Having a
redirection through a ROE-type file on the server is more opaque, but
maybe more consistent with existing similar approaches as taken by
RealNetworks in rm files and WindowsMedia files in asx files.



This (having a separate document that references other streams) is what 
I was thinking of. I guess which is more natural depends on who is doing 
the assembling. If it is the HTML author that takes the individual 
pieces and links them together then doing it in the HTML is probably 
easiest.


My concern is that if the only thing linking the various streams 
together is the HTML document then the streams are less useful outside 
of a web browser context. If there is a separate resource containing the 
description of how to assemble the result from multiple resources then 
this resource will be useful to non-browser video playback clients. If 
an existing format is used then it can be linked to as fallback for 
users of downlevel browsers and will hopefully open in a standalone 
video player. If the only linking information is in the HTML document 
then the best you can do as fallback is link to the video stream, 
requiring the user to go find the text streams and load them manually.






Re: [whatwg] Thoughts on video accessibility

2008-12-08 Thread Nils Dagsson Moskopp
On Monday, 2008-12-08 at 21:20 -0800, Martin Atkins wrote:
 My concern is that if the only thing linking the various streams 
 together is the HTML document then the streams are less useful outside 
 of a web browser context. If there is a separate resource containing the 
 description of how to assemble the result from multiple resources then 
 this resource will be useful to non-browser video playback clients. If 
 an existing format is used then it can be linked to as fallback for 
 users of downlevel browsers and will hopefully open in a standalone 
 video player. If the only linking information is in the HTML document 
 then the best you can do as fallback is link to the video stream, 
 requiring the user to go find the text streams and load them manually.
Funny, I just recently talked with someone about that. He suggested
something along the lines of a DNS for subtitles, e.g. having a hash value / UUID
embedded inside the stream and looking that up. So for example
urn:caption:dd23d31a1158052b4e68899e1a991df102d82e52/de could hold
German annotations and subtitles for the media file with that hash.
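
As a hedged sketch of what such a lookup could look like from a page script
(the resolver URL is entirely made up, the hash is assumed to be already
known, e.g. embedded in the stream, and cross-origin restrictions are
ignored):

  // "DNS for subtitles": resolve a content hash plus language to an SRT file.
  function lookupSubtitles(contentHash, lang, onSrt) {
    var url = 'http://resolver.example.org/caption/' + contentHash + '/' + lang;
    var xhr = new XMLHttpRequest();
    xhr.open('GET', url);
    xhr.onload = function () {
      if (xhr.status === 200) onSrt(xhr.responseText);  // found subtitles
    };
    xhr.send();
  }

  // e.g. lookupSubtitles('dd23d31a1158052b4e68899e1a991df102d82e52', 'de', attach)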

-- 
Nils Dagsson Moskopp
http://dieweltistgarnichtso.net