Re: [whatwg] On implementing videos with multiple tracks in HTML5
On Fri, 20 Aug 2010, Silvia Pfeiffer wrote:

> Three issues I have taken out of this discussion that I think are still open to discuss and potentially define in the spec:
>
> * How to expose in-band extra audio and video tracks from a multi-track media resource to the Web browser? I am particularly thinking here about the use cases Lachlan mentioned: offering stereo and surround sound alternatives, audio descriptions, audio commentaries or multiple languages, and would like to add sign language tracks to this list. This is important to solve now, since it will allow the use of audio descriptions and sign language, two important accessibility requirements.

I think this is now resolved. Let me know if there's still something open here.

> * How to associate and expose such extra audio and video tracks that are provided out-of-band to the Web browser? This is probably a next-version issue since it's rather difficult to implement in the browser. It improves on meeting accessibility needs, but it doesn't stand in the way of providing audio descriptions and sign language - just makes it easier to use them.

I'm not sure what you mean here.

> * Whether to include a multiplexed download functionality in browsers for media resources, where the browser would do the multiplexing of the active media resource with all the active text, audio and video tracks? This could be a context menu functionality, so is probably not so much a need to include in the HTML5 spec, but it's something that browsers can consider to provide. And since muxing isn't quite as difficult a functionality as e.g. decoding video, it could actually be fairly cheap to implement.

I agree that this seems out of scope for the spec.

--
Ian Hickson
http://ln.hixie.ch/
"Things that are impossible just take longer."
Re: [whatwg] On implementing videos with multiple tracks in HTML5
On Tue, May 1, 2012 at 2:27 PM, Ian Hickson <i...@hixie.ch> wrote:

> On Fri, 20 Aug 2010, Silvia Pfeiffer wrote:
>> Three issues I have taken out of this discussion that I think are still open to discuss and potentially define in the spec:
>>
>> * How to expose in-band extra audio and video tracks from a multi-track media resource to the Web browser? I am particularly thinking here about the use cases Lachlan mentioned: offering stereo and surround sound alternatives, audio descriptions, audio commentaries or multiple languages, and would like to add sign language tracks to this list. This is important to solve now, since it will allow the use of audio descriptions and sign language, two important accessibility requirements.
>
> I think this is now resolved. Let me know if there's still something open here.

Ha, yes! 21 months later and it's indeed solved, through the same mechanism by which synchronisation of multiple audio/video tracks is solved.

>> * How to associate and expose such extra audio and video tracks that are provided out-of-band to the Web browser? This is probably a next-version issue since it's rather difficult to implement in the browser. It improves on meeting accessibility needs, but it doesn't stand in the way of providing audio descriptions and sign language - just makes it easier to use them.
>
> I'm not sure what you mean here.

It was the difference between in-band tracks and separate files. Also solved by now.

>> * Whether to include a multiplexed download functionality in browsers for media resources, where the browser would do the multiplexing of the active media resource with all the active text, audio and video tracks? This could be a context menu functionality, so is probably not so much a need to include in the HTML5 spec, but it's something that browsers can consider to provide. And since muxing isn't quite as difficult a functionality as e.g. decoding video, it could actually be fairly cheap to implement.
>
> I agree that this seems out of scope for the spec.

Thread closed. :-)

Cheers, Silvia.
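The resolution referred to above is the multi-track API on media elements (the audioTracks/videoTracks lists). As a rough sketch of the kind of selection logic such an API enables: the track objects below mirror the shape of the AudioTrack interface (id, kind, language, enabled), but the preference policy itself is an illustrative assumption, not anything the spec mandates.

```javascript
// Pick which audio track to enable from a list of track objects shaped
// like the HTML AudioTrack interface. The selection policy (exact
// language+kind match, then language match, then first track) is a
// hypothetical example, not spec behaviour.
function selectAudioTrack(tracks, preferredLanguage, preferredKind) {
  const byKind = tracks.find(
    (t) => t.language === preferredLanguage && t.kind === preferredKind
  );
  const byLang = tracks.find((t) => t.language === preferredLanguage);
  const chosen = byKind || byLang || tracks[0];
  // Enable exactly one audio track; disable the rest.
  for (const t of tracks) t.enabled = t === chosen;
  return chosen;
}
```

A page could run this over a media element's track list after `loadedmetadata`, e.g. to honour a stored language preference or switch to an audio-description track.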
Re: [whatwg] On implementing videos with multiple tracks in HTML5
On Aug 20, 2010, at 5:53 PM, Silvia Pfeiffer wrote:

> On Sat, Aug 21, 2010 at 10:03 AM, Eric Carlson <eric.carl...@apple.com> wrote:
>> On Aug 19, 2010, at 5:23 PM, Silvia Pfeiffer wrote:
>>> * Whether to include a multiplexed download functionality in browsers for media resources, where the browser would do the multiplexing of the active media resource with all the active text, audio and video tracks? This could be a context menu functionality, so is probably not so much a need to include in the HTML5 spec, but it's something that browsers can consider to provide. And since muxing isn't quite as difficult a functionality as e.g. decoding video, it could actually be fairly cheap to implement.
>>
>> I don't understand what you mean here, can you explain?
>
> Sure. What I mean is: you get a video resource through the video element and a list of text resources through the track element. If I as a user want to take away (i.e. download and share with friends) the video file with the text tracks that I have activated and am currently watching, then I'd want a download feature that allows me to download a single multiplexed video file with all the text tracks inside. Something like an MPEG-4 file with the track resources encoded into, say, 3GPP-TT. Or a WebM file with WebSRT encoded (if there will be such a mapping). Or an Ogg file with WebSRT - maybe encoded in Kate or natively.
>
> The simplest implementation of such a functionality is of course where the external text track totally matches the format used in the media resource for encoding text. Assuming WebM will have such a thing as a WebSRT track, the download functionality would then consist of multiplexing a new WebM resource by re-using the original WebM resource and including the WebSRT tracks into that. It wouldn't require new video and audio encoding, since it's just a matter of a different multiplexed container.
>
> If transcoding to the text format in the native container is required, then it's a bit more complex, but no less so than what we need to do for extracting such data into a Web page for the JavaScript API (it's in fact the inverse of that operation). So, I wouldn't think it's a very complex functionality, but it certainly seems to be outside the HTML spec and a browser feature, possibly at first even a browser plugin. Sorry if this is now off topic. :-)

Even in the hypothetical case where the external text track is already in a format supported by the media container file, saving will require the UA to regenerate the movie's table of contents (e.g. the 'moov' atom in MPEG-4 or QuickTime files, the Meta Seek information in a WebM file) as well as muxing the text track with the other media data. As you note, transcoding is a bit more complex, especially in the case where a feature in the text track format is not supported by the text format of the native container. Further, what should a UA do in the case where the native container format doesn't support any form of text track - e.g. mp3, WAVE, etc.?

I disagree that it is not a complex feature, but I do agree that it is outside of the scope of the HTML spec.

eric
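The muxing step under discussion is, at its core, timestamp-ordered interleaving of already-encoded packets; no decoding or re-encoding is involved, which is why it is cheap relative to transcoding. A toy sketch, with hypothetical packet objects rather than any real container structure:

```javascript
// Merge already-encoded packets from several tracks into one stream,
// ordered by timestamp `t`. This is the heart of what a container muxer
// does; real muxers additionally write container-specific framing and
// index structures around the packets.
function mux(...tracks) {
  const out = [];
  const pos = tracks.map(() => 0); // next unconsumed packet per track
  for (;;) {
    // Find the track whose next packet has the earliest timestamp.
    let best = -1;
    for (let i = 0; i < tracks.length; i++) {
      if (pos[i] < tracks[i].length &&
          (best === -1 || tracks[i][pos[i]].t < tracks[best][pos[best]].t)) {
        best = i;
      }
    }
    if (best === -1) return out; // all tracks exhausted
    out.push(tracks[best][pos[best]++]);
  }
}
```

As Eric points out, the part this sketch omits (regenerating the container's table of contents and seek index) is where much of the real complexity lives.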
Re: [whatwg] On implementing videos with multiple tracks in HTML5
On Tue, Aug 24, 2010 at 1:07 AM, Eric Carlson <eric.carl...@apple.com> wrote:

> On Aug 20, 2010, at 5:53 PM, Silvia Pfeiffer wrote:
>> On Sat, Aug 21, 2010 at 10:03 AM, Eric Carlson <eric.carl...@apple.com> wrote:
>>> On Aug 19, 2010, at 5:23 PM, Silvia Pfeiffer wrote:
>>>> * Whether to include a multiplexed download functionality in browsers for media resources, where the browser would do the multiplexing of the active media resource with all the active text, audio and video tracks? This could be a context menu functionality, so is probably not so much a need to include in the HTML5 spec, but it's something that browsers can consider to provide. And since muxing isn't quite as difficult a functionality as e.g. decoding video, it could actually be fairly cheap to implement.
>>>
>>> I don't understand what you mean here, can you explain?
>>
>> Sure. What I mean is: you get a video resource through the video element and a list of text resources through the track element. If I as a user want to take away (i.e. download and share with friends) the video file with the text tracks that I have activated and am currently watching, then I'd want a download feature that allows me to download a single multiplexed video file with all the text tracks inside. Something like an MPEG-4 file with the track resources encoded into, say, 3GPP-TT. Or a WebM file with WebSRT encoded (if there will be such a mapping). Or an Ogg file with WebSRT - maybe encoded in Kate or natively.
>>
>> The simplest implementation of such a functionality is of course where the external text track totally matches the format used in the media resource for encoding text. Assuming WebM will have such a thing as a WebSRT track, the download functionality would then consist of multiplexing a new WebM resource by re-using the original WebM resource and including the WebSRT tracks into that. It wouldn't require new video and audio encoding, since it's just a matter of a different multiplexed container.
>>
>> If transcoding to the text format in the native container is required, then it's a bit more complex, but no less so than what we need to do for extracting such data into a Web page for the JavaScript API (it's in fact the inverse of that operation). So, I wouldn't think it's a very complex functionality, but it certainly seems to be outside the HTML spec and a browser feature, possibly at first even a browser plugin. Sorry if this is now off topic. :-)
>
> Even in the hypothetical case where the external text track is already in a format supported by the media container file, saving will require the UA to regenerate the movie's table of contents (e.g. the 'moov' atom in MPEG-4 or QuickTime files, the Meta Seek information in a WebM file) as well as muxing the text track with the other media data. As you note, transcoding is a bit more complex, especially in the case where a feature in the text track format is not supported by the text format of the native container. Further, what should a UA do in the case where the native container format doesn't support any form of text track - e.g. mp3, WAVE, etc.?

Well, for those you obviously cannot expect a single download file. Maybe it makes sense to just have a download functionality that allows downloading the text tracks as well. I was just following on from some of the user requirements raised in this thread.

> I disagree that it is not a complex feature, but I do agree that it is outside of the scope of the HTML spec.

I guess the complexity really depends on the format in use. For Ogg there is plenty of software available to demux and remux bitstreams, which is the main functionality here. The second part is the encoding of the text track, which again for Ogg has separate tools and libraries. After downloading the text tracks and the Ogg file separately, I would find it very easy to create a new multiplexed file using all these tools. That's where my judgement of "simple" came from.

But you are probably right and it's a lot more complicated for other formats. Enough off-topic brainstorming. ;-) I think we have more important things to solve right now.

Cheers, Silvia.
Re: [whatwg] On implementing videos with multiple tracks in HTML5
On Aug 19, 2010, at 5:23 PM, Silvia Pfeiffer wrote:

> * Whether to include a multiplexed download functionality in browsers for media resources, where the browser would do the multiplexing of the active media resource with all the active text, audio and video tracks? This could be a context menu functionality, so is probably not so much a need to include in the HTML5 spec, but it's something that browsers can consider to provide. And since muxing isn't quite as difficult a functionality as e.g. decoding video, it could actually be fairly cheap to implement.

I don't understand what you mean here, can you explain?

Thanks,
eric
Re: [whatwg] On implementing videos with multiple tracks in HTML5
On Sat, Aug 21, 2010 at 10:03 AM, Eric Carlson <eric.carl...@apple.com> wrote:

> On Aug 19, 2010, at 5:23 PM, Silvia Pfeiffer wrote:
>> * Whether to include a multiplexed download functionality in browsers for media resources, where the browser would do the multiplexing of the active media resource with all the active text, audio and video tracks? This could be a context menu functionality, so is probably not so much a need to include in the HTML5 spec, but it's something that browsers can consider to provide. And since muxing isn't quite as difficult a functionality as e.g. decoding video, it could actually be fairly cheap to implement.
>
> I don't understand what you mean here, can you explain?

Sure. What I mean is: you get a video resource through the video element and a list of text resources through the track element. If I as a user want to take away (i.e. download and share with friends) the video file with the text tracks that I have activated and am currently watching, then I'd want a download feature that allows me to download a single multiplexed video file with all the text tracks inside. Something like an MPEG-4 file with the track resources encoded into, say, 3GPP-TT. Or a WebM file with WebSRT encoded (if there will be such a mapping). Or an Ogg file with WebSRT - maybe encoded in Kate or natively.

The simplest implementation of such a functionality is of course where the external text track totally matches the format used in the media resource for encoding text. Assuming WebM will have such a thing as a WebSRT track, the download functionality would then consist of multiplexing a new WebM resource by re-using the original WebM resource and including the WebSRT tracks into that. It wouldn't require new video and audio encoding, since it's just a matter of a different multiplexed container.

If transcoding to the text format in the native container is required, then it's a bit more complex, but no less so than what we need to do for extracting such data into a Web page for the JavaScript API (it's in fact the inverse of that operation). So, I wouldn't think it's a very complex functionality, but it certainly seems to be outside the HTML spec and a browser feature, possibly at first even a browser plugin. Sorry if this is now off topic. :-)

Cheers, Silvia.
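Silvia's "inverse of the JavaScript extraction" point can be made concrete: serialising cue objects back into a text track format is the mirror image of the parsing a browser does to expose cues to script. A sketch assuming simple `{ start, end, text }` cue objects (times in seconds) and SRT output; both the object shape and the choice of SRT are illustrative assumptions, not tied to any browser's internals:

```javascript
// Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm.
function formatTimestamp(seconds) {
  const ms = Math.round(seconds * 1000);
  const pad = (n, w) => String(n).padStart(w, "0");
  return `${pad(Math.floor(ms / 3600000), 2)}:${pad(Math.floor(ms / 60000) % 60, 2)}:` +
         `${pad(Math.floor(ms / 1000) % 60, 2)},${pad(ms % 1000, 3)}`;
}

// Serialise an array of cue objects into SRT text: numbered blocks of
// "index, start --> end, text", separated by blank lines.
function cuesToSrt(cues) {
  return cues
    .map((cue, i) =>
      `${i + 1}\n${formatTimestamp(cue.start)} --> ${formatTimestamp(cue.end)}\n${cue.text}\n`)
    .join("\n");
}
```

A transcoding download feature would pair something like this with a container-specific packetiser for the target format (3GPP-TT, Kate, etc.).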
Re: [whatwg] On implementing videos with multiple tracks in HTML5
On Sat, 22 May 2010, Carlos Andrés Solís wrote:

> Imagine a hypothetical website that delivers videos in multiple languages. Like on a DVD, where you can choose your audio and subtitles language. And also imagine there is the possibility of downloading a file with the video, along with either the chosen audio/sub tracks, or all of them at once. Right now, though, there's no way to deliver multiple audio and subtitle streams on HTML5 and WebM. Since the latter supports only one audio and one video track, with no embedded subtitles, creating a file with multiple tracks is impossible, unless using full Matroska instead of WebM - save for the fact that the standard proposed is WebM and not Matroska. A solution could be to stream the full Matroska with all tracks embedded. This, though, would be inefficient, since the user often will select only one language to view the video, and there's no way yet to stream only the selected tracks to the user. I have thought of two solutions for this:
>
> * Solution 1: Server-side demuxing. The video with all tracks is stored as a Matroska file. The server demuxes the file, generates a new one with the chosen tracks, and streams only the tracks chosen by the user. When the user chooses to download the full video, the full Matroska file is downloaded with no overhead. The downside is the server-side demuxing and remuxing; fortunately most users only need to choose once. Also, there's the problem of having to download the full file instead of a file with only the tracks wanted; this could be solved by even more muxing.

On Sun, 23 May 2010, Silvia Pfeiffer wrote:

> For the last 10 years, we have tried to solve many of the media challenges on servers, making servers increasingly intelligent, and by that slow, and not real HTTP servers any more. Much of that happened in proprietary software, but others tried it with open software, too.
>
> For example, I worked on a project called Annodex which was trying to make open media resources available on normal HTTP servers with only a cgi script installed that would allow remuxing files for serving time segments of the media resources. Or look at any of the open source RTSP streaming servers that were created. We have learnt in the last 10 years that the Web is better served with a plain HTTP server than with custom media servers, and we have started putting the intelligence into user agents instead. User agents now know how to do byte range requests to retrieve temporal segments of a media resource. I believe for certain formats it's even possible to retrieve tracks through byte range requests only.
>
> In short, the biggest problem with your idea of dynamic muxing on a server is that it's very CPU intensive and doesn't lead easily to a scalable server. Also, it leads to specialised media servers in contrast to just using a simple HTTP server. It's possible, of course, but it's complex and not general-purpose.

On Mon, 31 May 2010, Lachlan Hunt wrote:

> WebM, just like Matroska, certainly does support multiple video and audio tracks. The current limitation is that browser implementations don't yet provide an interface or API for track selection. Whether or not authors would actually do this depends on their use case and what trade-offs they're willing to make. The use cases I'm aware of for multiple tracks include offering stereo and surround sound alternatives, audio descriptions, audio commentaries or multiple languages.
>
> The trade-off here is in bandwidth usage vs. storage space (or processing time if you're doing dynamic server-side muxing). Duplicating the video track in each file, containing only a single audio track, saves bandwidth for users while increasing storage space. Storing all audio tracks in one multi-track WebM file avoids duplication, while increasing the bandwidth for users downloading tracks they may not need.
>
> The latter theoretically allows for the user to dynamically switch audio tracks to, e.g., change language or listen to commentary, without having to download a whole new copy of the video. The former requires the user to choose which tracks they want prior to downloading the appropriate file. If there's only a choice between 2 or maybe 3 tracks, then the extra bandwidth may be insignificant. If, however, you're offering several alternate languages in both stereo and surround sound, with audio descriptions and directors commentary — the kind of stuff you'll find on many commercial DVDs — then the extra bandwidth wasted by users downloading so many tracks they don't need may not be worth it.

On Sat, 22 May 2010, Carlos Andrés Solís wrote:

> * Solution 2: User-side muxing. Each track (video, audio, subtitles) is stored in standalone files. The server streams the tracks chosen by the user, and the web browser muxes them back. When the user chooses to download the video, the
Re: [whatwg] On implementing videos with multiple tracks in HTML5
On Fri, Aug 20, 2010 at 9:58 AM, Ian Hickson <i...@hixie.ch> wrote:

> On Sat, 22 May 2010, Carlos Andrés Solís wrote:
>> Imagine a hypothetical website that delivers videos in multiple languages. Like on a DVD, where you can choose your audio and subtitles language. And also imagine there is the possibility of downloading a file with the video, along with either the chosen audio/sub tracks, or all of them at once. Right now, though, there's no way to deliver multiple audio and subtitle streams on HTML5 and WebM. Since the latter supports only one audio and one video track, with no embedded subtitles, creating a file with multiple tracks is impossible, unless using full Matroska instead of WebM - save for the fact that the standard proposed is WebM and not Matroska. A solution could be to stream the full Matroska with all tracks embedded. This, though, would be inefficient, since the user often will select only one language to view the video, and there's no way yet to stream only the selected tracks to the user. I have thought of two solutions for this:
>>
>> * Solution 1: Server-side demuxing. The video with all tracks is stored as a Matroska file. The server demuxes the file, generates a new one with the chosen tracks, and streams only the tracks chosen by the user. When the user chooses to download the full video, the full Matroska file is downloaded with no overhead. The downside is the server-side demuxing and remuxing; fortunately most users only need to choose once. Also, there's the problem of having to download the full file instead of a file with only the tracks wanted; this could be solved by even more muxing.
>
> On Sun, 23 May 2010, Silvia Pfeiffer wrote:
>> For the last 10 years, we have tried to solve many of the media challenges on servers, making servers increasingly intelligent, and by that slow, and not real HTTP servers any more. Much of that happened in proprietary software, but others tried it with open software, too.
>>
>> For example, I worked on a project called Annodex which was trying to make open media resources available on normal HTTP servers with only a cgi script installed that would allow remuxing files for serving time segments of the media resources. Or look at any of the open source RTSP streaming servers that were created. We have learnt in the last 10 years that the Web is better served with a plain HTTP server than with custom media servers, and we have started putting the intelligence into user agents instead. User agents now know how to do byte range requests to retrieve temporal segments of a media resource. I believe for certain formats it's even possible to retrieve tracks through byte range requests only.
>>
>> In short, the biggest problem with your idea of dynamic muxing on a server is that it's very CPU intensive and doesn't lead easily to a scalable server. Also, it leads to specialised media servers in contrast to just using a simple HTTP server. It's possible, of course, but it's complex and not general-purpose.
>
> On Mon, 31 May 2010, Lachlan Hunt wrote:
>> WebM, just like Matroska, certainly does support multiple video and audio tracks. The current limitation is that browser implementations don't yet provide an interface or API for track selection. Whether or not authors would actually do this depends on their use case and what trade-offs they're willing to make. The use cases I'm aware of for multiple tracks include offering stereo and surround sound alternatives, audio descriptions, audio commentaries or multiple languages.
>>
>> The trade-off here is in bandwidth usage vs. storage space (or processing time if you're doing dynamic server-side muxing). Duplicating the video track in each file, containing only a single audio track, saves bandwidth for users while increasing storage space. Storing all audio tracks in one multi-track WebM file avoids duplication, while increasing the bandwidth for users downloading tracks they may not need.
>>
>> The latter theoretically allows for the user to dynamically switch audio tracks to, e.g., change language or listen to commentary, without having to download a whole new copy of the video. The former requires the user to choose which tracks they want prior to downloading the appropriate file. If there's only a choice between 2 or maybe 3 tracks, then the extra bandwidth may be insignificant. If, however, you're offering several alternate languages in both stereo and surround sound, with audio descriptions and directors commentary — the kind of stuff you'll find on many commercial DVDs — then the extra bandwidth wasted by users downloading so many tracks they don't need may not be worth it.
>
> On Sat, 22 May 2010, Carlos Andrés Solís wrote:
>> * Solution 2: User-side muxing. Each track (video, audio, subtitles) is stored in standalone files. The server streams the tracks chosen by the user, and
Re: [whatwg] On implementing videos with multiple tracks in HTML5
On 2010-05-23 05:40, Carlos Andrés Solís wrote:

> Imagine a hypothetical website that delivers videos in multiple languages. Like on a DVD, where you can choose your audio and subtitles language. And also imagine there is the possibility of downloading a file with the video, along with either the chosen audio/sub tracks, or all of them at once. Right now, though, there's no way to deliver multiple audio and subtitle streams on HTML5 and WebM. Since the latter supports only one audio and one video track,

WebM, just like Matroska, certainly does support multiple video and audio tracks. The current limitation is that browser implementations don't yet provide an interface or API for track selection.

Whether or not authors would actually do this depends on their use case and what trade-offs they're willing to make. The use cases I'm aware of for multiple tracks include offering stereo and surround sound alternatives, audio descriptions, audio commentaries or multiple languages.

The trade-off here is in bandwidth usage vs. storage space (or processing time if you're doing dynamic server-side muxing). Duplicating the video track in each file, containing only a single audio track, saves bandwidth for users while increasing storage space. Storing all audio tracks in one multi-track WebM file avoids duplication, while increasing the bandwidth for users downloading tracks they may not need.

The latter theoretically allows for the user to dynamically switch audio tracks to, e.g., change language or listen to commentary, without having to download a whole new copy of the video. The former requires the user to choose which tracks they want prior to downloading the appropriate file. If there's only a choice between 2 or maybe 3 tracks, then the extra bandwidth may be insignificant. If, however, you're offering several alternate languages in both stereo and surround sound, with audio descriptions and directors commentary — the kind of stuff you'll find on many commercial DVDs — then the extra bandwidth wasted by users downloading so many tracks they don't need may not be worth it.

> with no embedded subtitles,

Timed text tracks within WebM (most likely WebSRT) will eventually be supported.

--
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/
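Lachlan's trade-off is easy to quantify. A sketch with made-up sizes, comparing "one file per audio alternative" (video duplicated in each file; viewers download exactly one) against "one multi-track file" (video stored once; every viewer downloads every audio track). The byte counts in the test are illustrative, not measurements:

```javascript
// Cost model for separate single-audio files: storage duplicates the
// video once per audio alternative; a viewer downloads only their file.
function separateFilesCost(videoBytes, audioTrackBytes) {
  return {
    storage: audioTrackBytes.reduce((sum, a) => sum + videoBytes + a, 0),
    perViewer: (trackIndex) => videoBytes + audioTrackBytes[trackIndex],
  };
}

// Cost model for one multi-track file: stored once, but every viewer
// downloads the video plus all audio tracks.
function multiTrackFileCost(videoBytes, audioTrackBytes) {
  const total = videoBytes + audioTrackBytes.reduce((s, a) => s + a, 0);
  return { storage: total, perViewer: () => total };
}
```

With few, small audio tracks the two are close; with many surround-sound language tracks the multi-track file's per-viewer cost grows while the separate-files layout only grows in storage.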
Re: [whatwg] On implementing videos with multiple tracks in HTML5
2010/5/31 Lachlan Hunt <lachlan.h...@lachy.id.au>:

> WebM, just like Matroska, certainly does support multiple video and audio tracks. The current limitation is that browser implementations don't yet provide an interface or API for track selection.

It could, but the spec currently explicitly disallows it. Has that changed while I was not looking?

Also, Silvia -- one reason Ogg was designed the way it was is so that remuxing would be trivial: it is as simple as deciding which pages to send out. Remuxing is like shuffling a deck of cards; the cards themselves remain unchanged.

Monty
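Monty's card-shuffling analogy can be put directly into code: an Ogg physical stream is a sequence of pages, each stamped with the serial number of the logical stream it belongs to, so extracting a subset of tracks is pure filtering of untouched pages. The page objects below are a simplified stand-in for real Ogg page structures:

```javascript
// Keep only the pages belonging to the wanted logical streams,
// identified by their serial numbers. The pages themselves are passed
// through unmodified - no re-encoding, no repacking.
function selectStreams(pages, wantedSerials) {
  const wanted = new Set(wantedSerials);
  return pages.filter((page) => wanted.has(page.serial));
}
```

A real implementation would also recompute each surviving page's CRC if any header fields changed, but the track-selection step itself is exactly this filter.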
Re: [whatwg] On implementing videos with multiple tracks in HTML5
2010/5/31 Monty Montgomery <xiphm...@gmail.com>:

> 2010/5/31 Lachlan Hunt <lachlan.h...@lachy.id.au>:
>> WebM, just like Matroska, certainly does support multiple video and audio tracks. The current limitation is that browser implementations don't yet provide an interface or API for track selection.
>
> It could, but the spec currently explicitly disallows it. Has that changed while I was not looking?

I just looked through the docs I have, and I'm clearly wrong -- none specify such a restriction.

Monty
Re: [whatwg] On implementing videos with multiple tracks in HTML5
Hi Carlos,

2010/5/23 Carlos Andrés Solís <csol...@gmail.com>:

> Hello, I've been writing lately on the WHATWG and WebM mailing lists and would like to hear your opinion on the following idea. Imagine a hypothetical website that delivers videos in multiple languages. Like on a DVD, where you can choose your audio and subtitles language. And also imagine there is the possibility of downloading a file with the video, along with either the chosen audio/sub tracks, or all of them at once. Right now, though, there's no way to deliver multiple audio and subtitle streams on HTML5 and WebM. Since the latter supports only one audio and one video track, with no embedded subtitles, creating a file with multiple tracks is impossible, unless using full Matroska instead of WebM - save for the fact that the standard proposed is WebM and not Matroska. A solution could be to stream the full Matroska with all tracks embedded. This, though, would be inefficient, since the user often will select only one language to view the video, and there's no way yet to stream only the selected tracks to the user. I have thought of two solutions for this:
>
> * Solution 1: Server-side demuxing. The video with all tracks is stored as a Matroska file. The server demuxes the file, generates a new one with the chosen tracks, and streams only the tracks chosen by the user. When the user chooses to download the full video, the full Matroska file is downloaded with no overhead. The downside is the server-side demuxing and remuxing; fortunately most users only need to choose once. Also, there's the problem of having to download the full file instead of a file with only the tracks wanted; this could be solved by even more muxing.

For the last 10 years, we have tried to solve many of the media challenges on servers, making servers increasingly intelligent, and by that slow, and not real HTTP servers any more. Much of that happened in proprietary software, but others tried it with open software, too.

For example, I worked on a project called Annodex which was trying to make open media resources available on normal HTTP servers with only a cgi script installed that would allow remuxing files for serving time segments of the media resources. Or look at any of the open source RTSP streaming servers that were created. We have learnt in the last 10 years that the Web is better served with a plain HTTP server than with custom media servers, and we have started putting the intelligence into user agents instead. User agents now know how to do byte range requests to retrieve temporal segments of a media resource. I believe for certain formats it's even possible to retrieve tracks through byte range requests only.

In short, the biggest problem with your idea of dynamic muxing on a server is that it's very CPU intensive and doesn't lead easily to a scalable server. Also, it leads to specialised media servers in contrast to just using a simple HTTP server. It's possible, of course, but it's complex and not general-purpose.

> * Solution 2: User-side muxing. Each track (video, audio, subtitles) is stored in standalone files. The server streams the tracks chosen by the user, and the web browser muxes them back. When the user chooses to download the video, the generation of the file can be done either server-side or client-side. This can be very dynamic but will force content providers to use extra coding inside of the pages.

Again, we've actually tried this over the last 10 years with SMIL. However, synchronising audio and video that comes from multiple servers and therefore has different network delays, different buffering rates, different congestion times, etc. makes it really difficult to keep multiple media resources in sync.

You don't actually have to rip audio and video apart to achieve what you're trying to do. Different websites are created for different languages, too. So, I would expect that if your website is in Spanish, you will get your video with a Spanish audio track, or when it's in German, your audio will be German. Each one of these is a media resource with a single audio and a single video track. Yes, your video track is replicated on the server between these different resources. But that's probably easier to handle from a production point of view anyway.

The matter with subtitle / caption tracks is then a separate one. You could embed all of the subtitle tracks in all the media resources to make sure that when a file is downloaded, it comes with its alternative subtitle tracks. That's not actually that huge an overhead, seeing as text tracks take up the least space compared to the audio and video data. Or alternatively you could have the subtitle tracks as extra files. This is probably the preferred mode of operation and most conformant with traditional Web principles, seeing as they are text resources and the best source of information for indexing the content of a media resource in, e.g., a search engine. Also, such
[whatwg] On implementing videos with multiple tracks in HTML5
Hello, I've been writing lately on the WHATWG and WebM mailing lists and would like to hear your opinion on the following idea.

Imagine a hypothetical website that delivers videos in multiple languages. Like on a DVD, where you can choose your audio and subtitles language. And also imagine there is the possibility of downloading a file with the video, along with either the chosen audio/sub tracks, or all of them at once.

Right now, though, there's no way to deliver multiple audio and subtitle streams on HTML5 and WebM. Since the latter supports only one audio and one video track, with no embedded subtitles, creating a file with multiple tracks is impossible, unless using full Matroska instead of WebM - save for the fact that the standard proposed is WebM and not Matroska. A solution could be to stream the full Matroska with all tracks embedded. This, though, would be inefficient, since the user often will select only one language to view the video, and there's no way yet to stream only the selected tracks to the user.

I have thought of two solutions for this:

* Solution 1: Server-side demuxing. The video with all tracks is stored as a Matroska file. The server demuxes the file, generates a new one with the chosen tracks, and streams only the tracks chosen by the user. When the user chooses to download the full video, the full Matroska file is downloaded with no overhead. The downside is the server-side demuxing and remuxing; fortunately most users only need to choose once. Also, there's the problem of having to download the full file instead of a file with only the tracks wanted; this could be solved by even more muxing.

* Solution 2: User-side muxing. Each track (video, audio, subtitles) is stored in standalone files. The server streams the tracks chosen by the user, and the web browser muxes them back. When the user chooses to download the video, the generation of the file can be done either server-side or client-side. This can be very dynamic but will force content providers to use extra coding inside of the pages.

Any ideas or suggestions?

- Carlos Solís