Re: [whatwg] On implementing videos with multiple tracks in HTML5

2012-04-30 Thread Ian Hickson
On Fri, 20 Aug 2010, Silvia Pfeiffer wrote:

 Three issues I have taken out of this discussion that I think are still 
 open to discuss and potentially define in the spec:
 
 * How to expose in-band extra audio and video tracks from a multi-track 
 media resource to the Web browser? I am particularly thinking here about 
 the use cases Lachlan mentioned: offering stereo and surround sound 
 alternatives, audio descriptions, audio commentaries or multiple 
 languages, and would like to add sign language tracks to this list. This 
 is important to solve now, since it will allow the use of audio 
 descriptions and sign language, two important accessibility 
 requirements.

I think this is now resolved. Let me know if there's still something open 
here.


 * How to associate and expose such extra audio and video tracks that are 
 provided out-of-band to the Web browser? This is probably a next-version 
 issue since it's rather difficult to implement in the browser. It 
 improves on meeting accessibility needs, but it doesn't stand in the way 
 of providing audio descriptions and sign language - just makes it easier 
 to use them.

I'm not sure what you mean here.


 * Whether to include a multiplexed download functionality in browsers 
 for media resources, where the browser would do the multiplexing of the 
 active media resource with all the active text, audio and video tracks? 
 This could be a context menu functionality, so is probably not so much a 
 need to include in the HTML5 spec, but it's something that browsers can 
 consider to provide. And since muxing isn't quite as difficult a 
 functionality as e.g. decoding video, it could actually be fairly cheap 
 to implement.

I agree that this seems out of scope for the spec.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] On implementing videos with multiple tracks in HTML5

2012-04-30 Thread Silvia Pfeiffer
On Tue, May 1, 2012 at 2:27 PM, Ian Hickson i...@hixie.ch wrote:
 On Fri, 20 Aug 2010, Silvia Pfeiffer wrote:

 Three issues I have taken out of this discussion that I think are still
 open to discuss and potentially define in the spec:

 * How to expose in-band extra audio and video tracks from a multi-track
 media resource to the Web browser? I am particularly thinking here about
 the use cases Lachlan mentioned: offering stereo and surround sound
 alternatives, audio descriptions, audio commentaries or multiple
 languages, and would like to add sign language tracks to this list. This
 is important to solve now, since it will allow the use of audio
 descriptions and sign language, two important accessibility
 requirements.

 I think this is now resolved. Let me know if there's still something open
 here.

Ha, yes! 21 months later and it's indeed solved, through the same
mechanism that solves synchronisation of multiple audio/video
tracks.


 * How to associate and expose such extra audio and video tracks that are
 provided out-of-band to the Web browser? This is probably a next-version
 issue since it's rather difficult to implement in the browser. It
 improves on meeting accessibility needs, but it doesn't stand in the way
 of providing audio descriptions and sign language - just makes it easier
 to use them.

 I'm not sure what you mean here.

It was the difference between in-band tracks and separate files. Also
solved by now.


 * Whether to include a multiplexed download functionality in browsers
 for media resources, where the browser would do the multiplexing of the
 active media resource with all the active text, audio and video tracks?
 This could be a context menu functionality, so is probably not so much a
 need to include in the HTML5 spec, but it's something that browsers can
 consider to provide. And since muxing isn't quite as difficult a
 functionality as e.g. decoding video, it could actually be fairly cheap
 to implement.

 I agree that this seems out of scope for the spec.

Thread closed. :-)

Cheers,
Silvia.


Re: [whatwg] On implementing videos with multiple tracks in HTML5

2010-08-23 Thread Eric Carlson

On Aug 20, 2010, at 5:53 PM, Silvia Pfeiffer wrote:

 On Sat, Aug 21, 2010 at 10:03 AM, Eric Carlson eric.carl...@apple.com wrote:
 
 On Aug 19, 2010, at 5:23 PM, Silvia Pfeiffer wrote:
 
 
  * Whether to include a multiplexed download functionality in browsers for 
  media resources, where the browser would do the multiplexing of the active 
  media resource with all the active text, audio and video tracks? This could 
  be a context menu functionality, so is probably not so much a need to 
  include in the HTML5 spec, but it's something that browsers can consider to 
  provide. And since muxing isn't quite as difficult a functionality as e.g. 
  decoding video, it could actually be fairly cheap to implement.
 
 
  I don't understand what you mean here, can you explain?
 
  
 
 Sure. What I mean is: you get a video resource through the video element 
 and a list of text resources through the track element. If I as a user want 
 to take away (i.e. download and share with friends) the video file with the 
 text tracks that I have activated and am currently watching, then I'd want a 
 download feature that allows me to download a single multiplexed video file 
 with all the text tracks inside. Something like an MPEG-4 file with the 
 track resources encoded into, say, 3GPP-TT. Or a WebM with WebSRT encoded 
 (if there will be such a mapping). Or an Ogg file with WebSRT - maybe encoded 
 in Kate or natively.
 
 The simplest implementation of such a functionality is of course where the 
 external text track totally matches the format used in the media resource for 
 encoding text. Assuming WebM will have such a thing as a WebSRT track, the 
 download functionality would then consist of multiplexing a new WebM 
 resource by re-using the original WebM resource and including the WebSRT 
 tracks into that. It wouldn't require new video and audio encoding, since 
 it's just a matter of a different multiplexed container. If transcoding to 
 the text format in the native container is required, then it's a bit more 
 complex, but no less so than what we need to do for extracting such data into 
 a Web page for the JavaScript API (it's in fact the inverse of that 
 operation).
 
 So, I wouldn't think it's a very complex functionality, but it certainly 
 seems to be outside the HTML spec and a browser feature, possibly at first 
 even a browser plugin. Sorry if this is now off topic. :-)
 
  Even in the hypothetical case where the external text track is already in a 
format supported by the media container file, saving will require the UA to 
regenerate the movie's table of contents (e.g. the 'moov' atom in MPEG-4 or 
QuickTime files, Meta Seek Information in a WebM file) as well as muxing the 
text track with the other media data. 

  As you note transcoding is a bit more complex, especially in the case where 
a feature in the text track format is not supported by the text format of the 
native container.

  Further, what should a UA do in the case where the native container format 
doesn't support any form of text track - e.g. mp3, WAVE, etc.?

  I disagree that it is not a complex feature, but I do agree that it is 
outside of the scope of the HTML spec.

eric



Re: [whatwg] On implementing videos with multiple tracks in HTML5

2010-08-23 Thread Silvia Pfeiffer
On Tue, Aug 24, 2010 at 1:07 AM, Eric Carlson eric.carl...@apple.comwrote:


 On Aug 20, 2010, at 5:53 PM, Silvia Pfeiffer wrote:

 On Sat, Aug 21, 2010 at 10:03 AM, Eric Carlson eric.carl...@apple.comwrote:


 On Aug 19, 2010, at 5:23 PM, Silvia Pfeiffer wrote:

 
  * Whether to include a multiplexed download functionality in browsers
 for media resources, where the browser would do the multiplexing of the
 active media resource with all the active text, audio and video tracks? This
 could be a context menu functionality, so is probably not so much a need to
 include in the HTML5 spec, but it's something that browsers can consider to
 provide. And since muxing isn't quite as difficult a functionality as e.g.
 decoding video, it could actually be fairly cheap to implement.
 

   I don't understand what you mean here, can you explain?




 Sure. What I mean is: you get a video resource through the video element
 and a list of text resources through the track element. If I as a user
 want to take away (i.e. download and share with friends) the video file with
 the text tracks that I have activated and am currently watching, then I'd
 want a download feature that allows me to download a single multiplexed
 video file with all the text tracks inside. Something like an MPEG-4 file
 with the track resources encoded into, say, 3GPP-TT. Or a WebM with WebSRT
 encoded (if there will be such a mapping). Or an Ogg file with WebSRT - maybe
 encoded in Kate or natively.

 The simplest implementation of such a functionality is of course where the
 external text track totally matches the format used in the media resource
 for encoding text. Assuming WebM will have such a thing as a WebSRT track,
 the download functionality would then consist of multiplexing a new WebM
 resource by re-using the original WebM resource and including the WebSRT
 tracks into that. It wouldn't require new video and audio encoding, since
 it's just a matter of a different multiplexed container. If transcoding to
 the text format in the native container is required, then it's a bit more
 complex, but no less so than what we need to do for extracting such data
 into a Web page for the JavaScript API (it's in fact the inverse of that
 operation).

 So, I wouldn't think it's a very complex functionality, but it certainly
 seems to be outside the HTML spec and a browser feature, possibly at first
 even a browser plugin. Sorry if this is now off topic. :-)

   Even in the hypothetical case where the external text track is already in
 a format supported by the media container file, saving will require the UA
 to regenerate the movie's table of contents (e.g. the 'moov' atom in MPEG-4
 or QuickTime files, Meta Seek Information in a WebM file) as well as muxing
 the text track with the other media data.

   As you note transcoding is a bit more complex, especially in the case
 where a feature in the text track format is not supported by the text format
 of the native container.

   Further, what should a UA do in the case where the native container
 format doesn't support any form of text track - e.g. mp3, WAVE, etc.?



Well, for those you obviously cannot expect a single download file. Maybe it
makes sense to just have a download functionality that allows downloading
the text tracks as well. I was just following on from some of the user
requirements raised in this thread.



   I disagree that it is not a complex feature, but I do agree that it is
 outside of the scope of the HTML spec.



I guess the complexity really depends on the format in use. For Ogg there is
plenty of software available to demux and remux bitstreams, which is the
main functionality here. The second part is the encoding of the text track,
which again for Ogg has separate tools and libraries. After downloading the
text tracks and the Ogg file separately, I would find it very easy to create
a new multiplexed file using all these tools. That's where my judgement of
"simple" came from. But you are probably right, and it's a lot more
complicated for other formats.

Enough off-topic brainstorming. ;-) I think we have more important things to
solve right now.

Cheers,
Silvia.


Re: [whatwg] On implementing videos with multiple tracks in HTML5

2010-08-20 Thread Eric Carlson

On Aug 19, 2010, at 5:23 PM, Silvia Pfeiffer wrote:

 
 * Whether to include a multiplexed download functionality in browsers for 
 media resources, where the browser would do the multiplexing of the active 
 media resource with all the active text, audio and video tracks? This could 
 be a context menu functionality, so is probably not so much a need to include 
 in the HTML5 spec, but it's something that browsers can consider to provide. 
 And since muxing isn't quite as difficult a functionality as e.g. decoding 
 video, it could actually be fairly cheap to implement.
 

  I don't understand what you mean here, can you explain?

  Thanks,

eric




Re: [whatwg] On implementing videos with multiple tracks in HTML5

2010-08-20 Thread Silvia Pfeiffer
On Sat, Aug 21, 2010 at 10:03 AM, Eric Carlson eric.carl...@apple.comwrote:


 On Aug 19, 2010, at 5:23 PM, Silvia Pfeiffer wrote:

 
  * Whether to include a multiplexed download functionality in browsers for
 media resources, where the browser would do the multiplexing of the active
 media resource with all the active text, audio and video tracks? This could
 be a context menu functionality, so is probably not so much a need to
 include in the HTML5 spec, but it's something that browsers can consider to
 provide. And since muxing isn't quite as difficult a functionality as e.g.
 decoding video, it could actually be fairly cheap to implement.
 

   I don't understand what you mean here, can you explain?




Sure. What I mean is: you get a video resource through the video element
and a list of text resources through the track element. If I as a user
want to take away (i.e. download and share with friends) the video file with
the text tracks that I have activated and am currently watching, then I'd
want a download feature that allows me to download a single multiplexed
video file with all the text tracks inside. Something like an MPEG-4 file
with the track resources encoded into, say, 3GPP-TT. Or a WebM with WebSRT
encoded (if there will be such a mapping). Or an Ogg file with WebSRT - maybe
encoded in Kate or natively.

The simplest implementation of such a functionality is of course where the
external text track totally matches the format used in the media resource
for encoding text. Assuming WebM will have such a thing as a WebSRT track,
the download functionality would then consist of multiplexing a new WebM
resource by re-using the original WebM resource and including the WebSRT
tracks into that. It wouldn't require new video and audio encoding, since
it's just a matter of a different multiplexed container. If transcoding to
the text format in the native container is required, then it's a bit more
complex, but no less so than what we need to do for extracting such data
into a Web page for the JavaScript API (it's in fact the inverse of that
operation).
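
The no-re-encoding point above can be sketched with a toy container model: muxing in a text track only rewrites the container's track list, while the compressed audio/video payload is carried over byte-for-byte. This is a made-up illustration, not a real WebM muxer; the dict layout and field names are invented:

```python
def mux_in_text_track(container, cues, language):
    """Return a new container with a text track added; A/V payload unchanged."""
    new_track = {"kind": "text", "language": language, "cues": list(cues)}
    return {
        "tracks": container["tracks"] + [new_track],
        # the compressed frames are reused as-is: no decode/encode step
        "av_payload": container["av_payload"],
    }

# Toy stand-in for a demuxed WebM file: two existing tracks plus opaque bytes.
original = {"tracks": [{"kind": "video"}, {"kind": "audio"}],
            "av_payload": b"...compressed frames..."}
remuxed = mux_in_text_track(original, [(0.0, 2.5, "Hello")], "en")
```

The key property is that `av_payload` is the very same object before and after, which is what keeps this operation cheap compared to transcoding.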

So, I wouldn't think it's a very complex functionality, but it certainly
seems to be outside the HTML spec and a browser feature, possibly at first
even a browser plugin. Sorry if this is now off topic. :-)

Cheers,
Silvia.


Re: [whatwg] On implementing videos with multiple tracks in HTML5

2010-08-19 Thread Ian Hickson
On Sat, 22 May 2010, Carlos Andrés Solís wrote:
 
 Imagine a hypothetical website that delivers videos in multiple 
 languages. Like on a DVD, where you can choose your audio and subtitles 
 language. And also imagine there is the possibility of downloading a 
 file with the video, along with either the chosen audio/sub tracks, or 
 all of them at once. Right now, though, there's no way to deliver 
 multiple audio and subtitle streams on HTML5 and WebM. Since the latter 
 supports only one audio and one video track, with no embedded subtitles, 
 creating a file with multiple tracks is impossible, unless using full 
 Matroska instead of WebM - save for the fact that the standard proposed 
 is WebM and not Matroska.

 A solution could be to stream the full Matroska with all tracks 
 embedded. This, though, would be inefficient, since the user often will 
 select only one language to view the video, and there's no way yet to 
 stream only the selected tracks to the user. I have thought of two 
 solutions for this:

 * Solution 1: Server-side demuxing. The video with all tracks is stored 
 as a Matroska file. The server demuxes the file, generates a new one 
 with the chosen tracks, and streams only the tracks chosen by the user. 
 When the user chooses to download the full video, the full Matroska file 
 is downloaded with no overhead. The downside is the server-side demuxing 
 and remuxing; fortunately most users only need to choose once. Also, 
 there's the problem of having to download the full file instead of a 
 file with only the tracks wanted; this could be solved by even more 
 muxing.

On Sun, 23 May 2010, Silvia Pfeiffer wrote:
 
 For the last 10 years, we have tried to solve many of the media 
 challenges on servers, making servers increasingly intelligent, and by 
 that slow, and not real HTTP servers any more. Much of that happened in 
 proprietary software, but others tried it with open software, too. For 
 example I worked on a project called Annodex which was trying to make 
 open media resources available on normal HTTP servers with only a cgi 
 script installed that would allow remuxing files for serving time 
 segments of the media resources. Or look at any of the open source RTSP 
 streaming servers that were created.
 
 We have learnt in the last 10 years that the Web is better served with a 
 plain HTTP server than with custom media servers and we have started 
 putting the intelligence into user agents instead. User agents now know 
 how to do byte range requests to retrieve temporal segments of a media 
 resource. I believe for certain formats it's even possible to retrieve 
 tracks through byte range requests only.
 
 In short, the biggest problem with your idea of dynamic muxing on a 
 server is that it's very CPU intensive and doesn't lead easily to a 
 scalable server. Also, it leads to specialised media servers in contrast 
 to just using a simple HTTP server. It's possible, of course, but it's 
 complex and not general-purpose.

On Mon, 31 May 2010, Lachlan Hunt wrote:
 
 WebM, just like Matroska, certainly does support multiple video and 
 audio tracks.  The current limitation is that browser implementations 
 don't yet provide an interface or API for track selection.
 
 Whether or not authors would actually do this depends on their use case 
 and what trade offs they're willing to make.  The use cases I'm aware of 
 for multiple tracks include offering stereo and surround sound 
 alternatives, audio descriptions, audio commentaries or multiple 
 languages.
 
 The trade off here is in bandwidth usage vs. storage space (or 
 processing time if you're doing dynamic server side muxing). Duplicating 
 the video track in each file, containing only a single audio track saves 
 bandwidth for users while increasing storage space. Storing all audio 
 tracks in one multi-track webm file avoids duplication, while increasing 
 the bandwidth for users downloading tracks they may not need.
 
 The latter theoretically allows for the user to dynamically switch audio 
 tracks to, e.g. change language or listen to commentary, without having 
 to download a whole new copy of the video.  The former requires the user 
 to choose which tracks they want prior to downloading the appropriate 
 file.
 
 If there's only a choice between 2 or maybe 3 tracks, then the extra 
 bandwidth may be insignificant.  If, however, you're offering several 
 alternate languages in both stereo and surround sound, with audio 
 descriptions and director's commentary — the kind of stuff you'll find 
 on many commercial DVDs — then the extra bandwidth wasted by users 
 downloading so many tracks they don't need may not be worth it.

On Sat, 22 May 2010, Carlos Andrés Solís wrote:

 * Solution 2: User-side muxing. Each track (video, audio, subtitles) is 
 stored in standalone files. The server streams the tracks chosen by the 
 user, and the web browser muxes them back. When the user chooses to 
 download the video, the 

Re: [whatwg] On implementing videos with multiple tracks in HTML5

2010-08-19 Thread Silvia Pfeiffer
On Fri, Aug 20, 2010 at 9:58 AM, Ian Hickson i...@hixie.ch wrote:

 On Sat, 22 May 2010, Carlos Andrés Solís wrote:
 
  Imagine a hypothetical website that delivers videos in multiple
  languages. Like on a DVD, where you can choose your audio and subtitles
  language. And also imagine there is the possibility of downloading a
  file with the video, along with either the chosen audio/sub tracks, or
  all of them at once. Right now, though, there's no way to deliver
  multiple audio and subtitle streams on HTML5 and WebM. Since the latter
  supports only one audio and one video track, with no embedded subtitles,
  creating a file with multiple tracks is impossible, unless using full
  Matroska instead of WebM - save for the fact that the standard proposed
  is WebM and not Matroska.
 
  A solution could be to stream the full Matroska with all tracks
  embedded. This, though, would be inefficient, since the user often will
  select only one language to view the video, and there's no way yet to
  stream only the selected tracks to the user. I have thought of two
  solutions for this:
 
  * Solution 1: Server-side demuxing. The video with all tracks is stored
  as a Matroska file. The server demuxes the file, generates a new one
  with the chosen tracks, and streams only the tracks chosen by the user.
  When the user chooses to download the full video, the full Matroska file
  is downloaded with no overhead. The downside is the server-side demuxing
  and remuxing; fortunately most users only need to choose once. Also,
  there's the problem of having to download the full file instead of a
  file with only the tracks wanted; this could be solved by even more
  muxing.

 On Sun, 23 May 2010, Silvia Pfeiffer wrote:
 
  For the last 10 years, we have tried to solve many of the media
  challenges on servers, making servers increasingly intelligent, and by
  that slow, and not real HTTP servers any more. Much of that happened in
  proprietary software, but others tried it with open software, too. For
  example I worked on a project called Annodex which was trying to make
  open media resources available on normal HTTP servers with only a cgi
  script installed that would allow remuxing files for serving time
  segments of the media resources. Or look at any of the open source RTSP
  streaming servers that were created.
 
  We have learnt in the last 10 years that the Web is better served with a
  plain HTTP server than with custom media servers and we have started
  putting the intelligence into user agents instead. User agents now know
  how to do byte range requests to retrieve temporal segments of a media
  resource. I believe for certain formats it's even possible to retrieve
  tracks through byte range requests only.
 
  In short, the biggest problem with your idea of dynamic muxing on a
  server is that it's very CPU intensive and doesn't lead easily to a
  scalable server. Also, it leads to specialised media servers in contrast
  to just using a simple HTTP server. It's possible, of course, but it's
  complex and not general-purpose.

 On Mon, 31 May 2010, Lachlan Hunt wrote:
 
  WebM, just like Matroska, certainly does support multiple video and
  audio tracks.  The current limitation is that browser implementations
  don't yet provide an interface or API for track selection.
 
  Whether or not authors would actually do this depends on their use case
  and what trade offs they're willing to make.  The use cases I'm aware of
  for multiple tracks include offering stereo and surround sound
  alternatives, audio descriptions, audio commentaries or multiple
  languages.
 
  The trade off here is in bandwidth usage vs. storage space (or
  processing time if you're doing dynamic server side muxing). Duplicating
  the video track in each file, containing only a single audio track saves
  bandwidth for users while increasing storage space. Storing all audio
  tracks in one multi-track webm file avoids duplication, while increasing
  the bandwidth for users downloading tracks they may not need.
 
  The latter theoretically allows for the user to dynamically switch audio
  tracks to, e.g. change language or listen to commentary, without having
  to download a whole new copy of the video.  The former requires the user
  to choose which tracks they want prior to downloading the appropriate
  file.
 
  If there's only a choice between 2 or maybe 3 tracks, then the extra
  bandwidth may be insignificant.  If, however, you're offering several
  alternate languages in both stereo and surround sound, with audio
  descriptions and director's commentary — the kind of stuff you'll find
  on many commercial DVDs — then the extra bandwidth wasted by users
  downloading so many tracks they don't need may not be worth it.

 On Sat, 22 May 2010, Carlos Andrés Solís wrote:
 
  * Solution 2: User-side muxing. Each track (video, audio, subtitles) is
  stored in standalone files. The server streams the tracks chosen by the
  user, and 

Re: [whatwg] On implementing videos with multiple tracks in HTML5

2010-05-31 Thread Lachlan Hunt

On 2010-05-23 05:40, Carlos Andrés Solís wrote:

Imagine a hypothetical website that delivers videos in multiple languages.
Like on a DVD, where you can choose your audio and subtitles language. And
also imagine there is the possibility of downloading a file with the video,
along with either the chosen audio/sub tracks, or all of them at once. Right
now, though, there's no way to deliver multiple audio and subtitle streams
on HTML5 and WebM. Since the latter supports only one audio and one video
track,


WebM, just like Matroska, certainly does support multiple video and 
audio tracks.  The current limitation is that browser implementations 
don't yet provide an interface or API for track selection.


Whether or not authors would actually do this depends on their use case 
and what trade offs they're willing to make.  The use cases I'm aware of 
for multiple tracks include offering stereo and surround sound 
alternatives, audio descriptions, audio commentaries or multiple languages.


The trade off here is in bandwidth usage vs. storage space (or 
processing time if you're doing dynamic server side muxing). 
Duplicating the video track in each file, containing only a single audio 
track saves bandwidth for users while increasing storage space. Storing 
all audio tracks in one multi-track webm file avoids duplication, while 
increasing the bandwidth for users downloading tracks they may not need.


The latter theoretically allows for the user to dynamically switch audio 
tracks to, e.g. change language or listen to commentary, without having 
to download a whole new copy of the video.  The former requires the user 
to choose which tracks they want prior to downloading the appropriate file.


If there's only a choice between 2 or maybe 3 tracks, then the extra 
bandwidth may be insignificant.  If, however, you're offering several 
alternate languages in both stereo and surround sound, with audio 
descriptions and director's commentary — the kind of stuff you'll find on 
many commercial DVDs — then the extra bandwidth wasted by users 
downloading so many tracks they don't need may not be worth it.
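
This trade-off can be made concrete with some back-of-the-envelope arithmetic. A quick sketch (all sizes are hypothetical example numbers, not measurements of real files):

```python
def per_track_files(video_bytes, audio_track_bytes):
    """Separate file per audio track: the video payload is duplicated in each."""
    storage = sum(video_bytes + a for a in audio_track_bytes)
    # a user picks one track up front and downloads just that one file
    downloads = [video_bytes + a for a in audio_track_bytes]
    return storage, downloads

def multitrack_file(video_bytes, audio_track_bytes):
    """One multi-track file: no duplication, but every user gets every track."""
    size = video_bytes + sum(audio_track_bytes)
    return size, size  # storage == per-user download

# Example: 100 MB of video, four 10 MB audio tracks.
MB = 1_000_000
storage_a, downloads_a = per_track_files(100 * MB, [10 * MB] * 4)
storage_b, download_b = multitrack_file(100 * MB, [10 * MB] * 4)
# per-track files: 440 MB stored, 110 MB per download
# multi-track file: 140 MB stored, 140 MB per download (30 MB of it unwanted)
```

With only a couple of audio tracks the per-user overhead of the multi-track file is small; with many alternatives it grows linearly, which is exactly the point about DVD-style track counts.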



with no embedded subtitles,


Timed text tracks within WebM (most likely WebSRT) will eventually be 
supported.


--
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/


Re: [whatwg] On implementing videos with multiple tracks in HTML5

2010-05-31 Thread Monty Montgomery
2010/5/31 Lachlan Hunt lachlan.h...@lachy.id.au:

 WebM, just like Matroska, certainly does support multiple video and audio
 tracks.  The current limitation is that browser implementations don't yet
 provide an interface or API for track selection.

It could, but the spec currently explicitly disallows it.  Has that
changed while I was not looking?

Also, Silvia-- one reason Ogg was designed the way it was is that
remuxing is trivial: it's as simple as deciding which pages to send
out. The remuxing is just shuffling a deck of cards; the cards
themselves remain unchanged.

Monty
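
The card-shuffling analogy above can be sketched in a few lines: if each Ogg page carries the serial number of its logical stream, remuxing a subset of tracks is just filtering pages, and no page is ever modified. A toy illustration (not a real Ogg parser; the page dicts are stand-ins for real pages):

```python
def remux(pages, keep_serials):
    """Select the pages belonging to the wanted streams; bytes are untouched."""
    return [p for p in pages if p["serial"] in keep_serials]

# A toy interleaved stream: video (serial 1), English audio (2), Spanish audio (3).
pages = [
    {"serial": 1, "granule": 0}, {"serial": 2, "granule": 0},
    {"serial": 3, "granule": 0}, {"serial": 1, "granule": 1},
    {"serial": 2, "granule": 1}, {"serial": 3, "granule": 1},
]
english_only = remux(pages, {1, 2})  # video + English audio, original order kept
```

Because selection preserves the original interleaving order, the output is itself a valid stream of unmodified pages, which is why this kind of remux is cheap compared to transcoding.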


Re: [whatwg] On implementing videos with multiple tracks in HTML5

2010-05-31 Thread Monty Montgomery
2010/5/31 Monty Montgomery xiphm...@gmail.com:
 2010/5/31 Lachlan Hunt lachlan.h...@lachy.id.au:

 WebM, just like Matroska, certainly does support multiple video and audio
 tracks.  The current limitation is that browser implementations don't yet
 provide an interface or API for track selection.

 It could, but the spec currently explicitly disallows it.  Has that
 changed while I was not looking?

I just looked through the docs I have, and I'm clearly wrong-- none of them
specify such a restriction.

Monty


Re: [whatwg] On implementing videos with multiple tracks in HTML5

2010-05-23 Thread Silvia Pfeiffer
Hi Carlos,

2010/5/23 Carlos Andrés Solís csol...@gmail.com:
 Hello, I've been writing lately on the WHATWG and WebM mailing lists and would
 like to hear your opinion on the following idea.

 Imagine a hypothetical website that delivers videos in multiple languages.
 Like on a DVD, where you can choose your audio and subtitles language. And
 also imagine there is the possibility of downloading a file with the video,
 along with either the chosen audio/sub tracks, or all of them at once. Right
 now, though, there's no way to deliver multiple audio and subtitle streams
 on HTML5 and WebM. Since the latter supports only one audio and one video
 track, with no embedded subtitles, creating a file with multiple tracks is
 impossible, unless using full Matroska instead of WebM - save for the fact
 that the standard proposed is WebM and not Matroska.
 A solution could be to stream the full Matroska with all tracks embedded.
 This, though, would be inefficient, since the user often will select only
 one language to view the video, and there's no way yet to stream only the
 selected tracks to the user. I have thought of two solutions for this:

 * Solution 1: Server-side demuxing. The video with all tracks is stored as a
 Matroska file. The server demuxes the file, generates a new one with the
 chosen tracks, and streams only the tracks chosen by the user. When the user
 chooses to download the full video, the full Matroska file is downloaded
 with no overhead. The downside is the server-side demuxing and remuxing;
 fortunately most users only need to choose once. Also, there's the problem
 of having to download the full file instead of a file with only the tracks
 wanted; this could be solved by even more muxing.

For the last 10 years, we have tried to solve many of the media
challenges on servers, making servers increasingly intelligent, and by
that slow, and not real HTTP servers any more. Much of that happened
in proprietary software, but others tried it with open software, too.
For example I worked on a project called Annodex which was trying to
make open media resources available on normal HTTP servers with only a
cgi script installed that would allow remuxing files for serving time
segments of the media resources. Or look at any of the open source
RTSP streaming servers that were created.

We have learnt in the last 10 years that the Web is better served with
a plain HTTP server than with custom media servers and we have started
putting the intelligence into user agents instead. User agents now
know how to do byte range requests to retrieve temporal segments of a
media resource. I believe for certain formats it's even possible to
retrieve tracks through byte range requests only.

In short, the biggest problem with your idea of dynamic muxing on a
server is that it's very CPU intensive and doesn't lead easily to a
scalable server. Also, it leads to specialised media servers in
contrast to just using a simple HTTP server. It's possible, of course,
but it's complex and not general-purpose.
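
The byte-range approach mentioned above is plain HTTP: given an index that maps a time segment (or a track's clusters) to byte offsets, the user agent asks an ordinary HTTP server for just those bytes via a `Range` header. A minimal sketch of building such requests; the seek-index numbers are invented for illustration:

```python
def range_header(first_byte, last_byte=None):
    """Build an HTTP Range header for an inclusive byte span (RFC-style)."""
    end = "" if last_byte is None else str(last_byte)
    return {"Range": f"bytes={first_byte}-{end}"}

# Hypothetical seek index: media time (seconds) -> byte offset of that cluster.
seek_index = {0: 0, 10: 1_500_000, 20: 3_100_000, 30: 4_800_000}

def header_for_segment(start_s, end_s):
    """Range covering [start_s, end_s): from one cluster offset to the next."""
    return range_header(seek_index[start_s], seek_index[end_s] - 1)

h = header_for_segment(10, 20)  # {'Range': 'bytes=1500000-3099999'}
```

The server stays dumb: it only needs standard byte-range support, and all the media intelligence (knowing which offsets hold which segment or track) lives in the user agent.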


 * Solution 2: User-side muxing. Each track (video, audio, subtitles) is
 stored in standalone files. The server streams the tracks chosen by the
 user, and the web browser muxes them back. When the user chooses to download
 the video, the generation of the file can be done either server-side or
 client-side. This can be very dynamic but will force content providers to
 use extra coding inside of the pages.

Again, we've actually tried this over the last 10 years with SMIL.
However, synchronising audio and video that come from multiple
servers, and therefore have different network delays, different
buffering rates, different congestion times, etc., makes it really
difficult to keep multiple media resources in sync.
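To make the synchronisation problem concrete, here is a trivial sketch (with invented delay figures) of the offset a user agent must introduce when two streams arrive from servers with different delays:

```python
# Sketch: two streams fetched from different servers arrive with different
# delays; the earlier one must be held back to stay in sync.
# All figures are invented for illustration.

def sync_offset(video_delay_ms, audio_delay_ms):
    """Return how long (ms) the earlier-arriving stream must be held back."""
    return abs(video_delay_ms - audio_delay_ms)

# Video buffered behind a 250 ms network delay, audio behind 90 ms:
print(sync_offset(250, 90))  # 160 -> hold the audio back 160 ms
```

The hard part in practice is that these delays are not constant: congestion and rebuffering change them mid-playback, so the offset must be continuously re-estimated and corrected, which is what makes multi-server synchronisation so fragile.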

You don't actually have to rip audio and video apart to achieve what
you're trying to do. Different Websites are created for different
languages, too. So, I would expect that if your Website is in Spanish,
you will get your video with a Spanish audio track, or when it's in
German, your audio will be German. Each one of these is a media
resource with a single audio and a single video track. Yes, your video
track is replicated on the server between these different resources.
But that's probably easier to handle from a production point of view
anyway.

The matter of subtitle / caption tracks is a separate one. You
could embed all of the subtitle tracks in all the media resources to
make sure that when a file is downloaded, it comes with its
alternative subtitle tracks. That's not actually that huge an
overhead, seeing as text tracks take up the least space compared to
the audio and video data.
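A rough back-of-the-envelope calculation supports this; all bitrates and sizes below are invented round numbers, purely for illustration:

```python
# Rough size comparison for a 10-minute video with several embedded
# subtitle tracks. All figures are invented round numbers.

duration_s = 10 * 60            # 10-minute video
av_bitrate = 1_000_000          # 1 Mbit/s combined audio+video
subtitle_bytes = 50_000         # ~50 KB per subtitle track (typical order)
tracks = 5                      # five embedded subtitle languages

av_bytes = av_bitrate // 8 * duration_s          # 75,000,000 bytes
overhead = tracks * subtitle_bytes / av_bytes    # subtitle share of A/V size
print("%.2f%%" % (overhead * 100))               # 0.33%
```

Even five embedded text tracks add a fraction of a percent to the download, which is why bundling all subtitle languages into every media resource is a reasonable trade-off.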

Or alternatively you could have the subtitle tracks as extra files.
This is probably the preferred mode of operation and the most
conformant with traditional Web principles, seeing as they are text
resources and the best source of information for indexing the content
of a media resource in, e.g., a search engine. Also, such 

[whatwg] On implementing videos with multiple tracks in HTML5

2010-05-22 Thread Carlos Andrés Solís
Hello, I've been writing lately on the WHATWG and WebM mailing lists and would
like to hear your opinion on the following idea.

Imagine a hypothetical website that delivers videos in multiple languages.
Like on a DVD, where you can choose your audio and subtitle language. And
also imagine there is the possibility of downloading a file with the video,
along with either the chosen audio/sub tracks, or all of them at once. Right
now, though, there's no way to deliver multiple audio and subtitle streams
on HTML5 and WebM. Since the latter supports only one audio and one video
track, with no embedded subtitles, creating a file with multiple tracks is
impossible unless one uses full Matroska instead of WebM - except that the
proposed standard is WebM, not Matroska.
A solution could be to stream the full Matroska file with all tracks embedded.
This, though, would be inefficient, since the user will often select only
one language to view the video, and there's no way yet to stream only the
selected tracks to the user. I have thought of two solutions for this:
* Solution 1: Server-side demuxing. The video with all tracks is stored as a
Matroska file. The server demuxes the file, generates a new one with the
chosen tracks, and streams only the tracks chosen by the user. When the user
chooses to download the full video, the full Matroska file is downloaded
with no overhead. The downside is the server-side demuxing and remuxing;
fortunately most users only need to choose once. Also, there's the problem
of having to download the full file instead of a file with only the tracks
wanted; this could be solved by even more muxing.
* Solution 2: User-side muxing. Each track (video, audio, subtitles) is
stored in standalone files. The server streams the tracks chosen by the
user, and the web browser muxes them back. When the user chooses to download
the video, the generation of the file can be done either server-side or
client-side. This can be very dynamic but will force content providers to
use extra coding inside of the pages.

Any ideas or suggestions?
- Carlos Solís