Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-11-03 Thread Brendan Long

On 10/27/2014 08:43 PM, Silvia Pfeiffer wrote:
 On Tue, Oct 28, 2014 at 2:41 AM, Philip Jägenstedt phil...@opera.com wrote:
 On Sun, Oct 26, 2014 at 8:28 AM, Silvia Pfeiffer
 silviapfeiff...@gmail.com wrote:
 On Thu, Oct 23, 2014 at 2:01 AM, Philip Jägenstedt phil...@opera.com 
 wrote:
 On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
 silviapfeiff...@gmail.com wrote:
 Using the VideoTrack interface it would list them as a kind=captions
 and would thus also be able to be activated by JavaScript. The
 downside would be that if you have N video tracks and m caption tracks in
 the media file, you'd have to expose NxM videoTracks in the interface.
 VideoTrackList can have at most one video track selected at a time, so
 representing this as a VideoTrack would require some additional
 tweaking to the model.
 The captions video track is one that has video and captions rendered
 together, so you only need the one video track active. If you want to
 turn off captions, you merely activate a different video track which
 is one without captions.

 There is no change to the model necessary - in fact, it fits perfectly
 to what the spec is currently describing without any change.
 Ah, right! Unless I'm misunderstanding again, your suggestion is to
 expose extra video tracks with kind captions or subtitles, requiring
 no spec change at all. That sounds good to me.
 Yes, that was my suggestion for dealing with UA rendered tracks.
Doesn't this still leave us with the issue: if you have N video tracks
and m caption tracks in
the media file, you'd have to expose NxM videoTracks in the interface?
We would also need to consider:

  * How do you label this combined video and text track?
  * What is the track's id?
  * How do you present this to users in a way that isn't confusing?
  * What if the video track's kind isn't main? For example, what if we
have a sign language track and we also want to display captions?
What is the generated track's kind?
  * The language attribute could also have conflicts.
  * I think it might also be possible to create files where the video
track and text track are different lengths, so we'd need to figure
out what to do when one of them ends.



Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-11-03 Thread Silvia Pfeiffer
On Tue, Nov 4, 2014 at 3:56 AM, Brendan Long s...@brendanlong.com wrote:

 On 10/27/2014 08:43 PM, Silvia Pfeiffer wrote:
 On Tue, Oct 28, 2014 at 2:41 AM, Philip Jägenstedt phil...@opera.com wrote:
 On Sun, Oct 26, 2014 at 8:28 AM, Silvia Pfeiffer
 silviapfeiff...@gmail.com wrote:
 On Thu, Oct 23, 2014 at 2:01 AM, Philip Jägenstedt phil...@opera.com 
 wrote:
 On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
 silviapfeiff...@gmail.com wrote:
 Using the VideoTrack interface it would list them as a kind=captions
 and would thus also be able to be activated by JavaScript. The
 downside would be that if you have N video tracks and m caption tracks in
 the media file, you'd have to expose NxM videoTracks in the interface.
 VideoTrackList can have at most one video track selected at a time, so
 representing this as a VideoTrack would require some additional
 tweaking to the model.
 The captions video track is one that has video and captions rendered
 together, so you only need the one video track active. If you want to
 turn off captions, you merely activate a different video track which
 is one without captions.

 There is no change to the model necessary - in fact, it fits perfectly
 to what the spec is currently describing without any change.
 Ah, right! Unless I'm misunderstanding again, your suggestion is to
 expose extra video tracks with kind captions or subtitles, requiring
 no spec change at all. That sounds good to me.
 Yes, that was my suggestion for dealing with UA rendered tracks.

 Doesn't this still leave us with the issue: if you have N video tracks
 and m caption tracks in
 the media file, you'd have to expose NxM videoTracks in the interface?

Right, that was the original concern. But how realistic is the
situation of n video tracks and m caption tracks with n being larger
than 2 or 3 without a change of the audio track anyway?

 We would also need to consider:

   * How do you label this combined video and text track?

That's not specific to the approach that we pick and will always need
to be decided. Note that label isn't something that needs to be unique
to a track, so you could just use the same label for all burnt-in
video tracks and distinguish them only by their language.

   * What is the track's id?

This would need to be unique, but I think it will be easy to come up
with a scheme that works. Something like video_[n]_[captiontrackid]
could work.
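
A rough sketch of such a scheme (the id format is purely hypothetical and
not specified anywhere), just to show that producing and reading the ids
would be mechanical:

  // Hypothetical: combine the underlying video track's index and the
  // caption track's own id into the synthesized VideoTrack id.
  function makeCombinedId(videoIndex, captionTrackId) {
    return 'video_' + videoIndex + '_' + captionTrackId;
  }

  function parseCombinedId(id) {
    var match = /^video_(\d+)_(.+)$/.exec(id);
    return match ? { videoIndex: Number(match[1]),
                     captionTrackId: match[2] }
                 : null;
  }

  makeCombinedId(0, 'cc1');         // "video_0_cc1"
  parseCombinedId('video_0_cc1');   // { videoIndex: 0, captionTrackId: "cc1" }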

   * How do you present this to users in a way that isn't confusing?

No different to presenting caption tracks.

   * What if the video track's kind isn't main? For example, what if we
 have a sign language track and we also want to display captions?
 What is the generated track's kind?

How would that work? Are you saying we're not displaying the main
video, but only displaying the sign language track? Is that realistic
and something anybody would actually do?

   * The language attribute could also have conflicts.

How so?

   * I think it might also be possible to create files where the video
 track and text track are different lengths, so we'd need to figure
 out what to do when one of them ends.

The timeline of a video is well defined in the spec - I don't think we
need to do more than what is already defined.

Silvia.


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-11-03 Thread Brendan Long

On 11/03/2014 04:20 PM, Silvia Pfeiffer wrote:
 On Tue, Nov 4, 2014 at 3:56 AM, Brendan Long s...@brendanlong.com wrote:
 Right, that was the original concern. But how realistic is the
 situation of n video tracks and m caption tracks with n being larger
 than 2 or 3 without a change of the audio track anyway?
I think the situation gets confusing at N=2. See below.

 We would also need to consider:

   * How do you label this combined video and text track?
 That's not specific to the approach that we pick and will always need
 to be decided. Note that label isn't something that needs to be unique
 to a track, so you could just use the same label for all burnt-in
 video tracks and distinguish them only by their language.
But the video and the text track might both have their own label in the
underlying media file. Presumably we'd want to preserve both.

   * What is the track's id?
 This would need to be unique, but I think it will be easy to come up
 with a scheme that works. Something like video_[n]_[captiontrackid]
 could work.
This sounds much more complicated and likely to cause problems for
JavaScript developers than just indicating that a text track has cues
that can't be represented in JavaScript.
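
For comparison, a sketch of what script would have to do under each
approach ('uarendered' is only the proposed mode value, and the combined
track id below is hypothetical):

  var video = document.querySelector('video');

  // Cue-less text track approach: the caption track stays a TextTrack,
  // so toggling it is the same one-liner as for any other text track.
  video.textTracks[0].mode = 'uarendered';   // proposed value; captions on
  video.textTracks[0].mode = 'disabled';     // captions off

  // Combined video track approach: script has to find the right
  // synthesized VideoTrack, e.g. by parsing its id, and select it.
  for (var i = 0; i < video.videoTracks.length; i++) {
    if (video.videoTracks[i].id === 'video_0_cc1') {   // hypothetical id
      video.videoTracks[i].selected = true;
    }
  }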

   * How do you present this to users in a way that isn't confusing?
 No different to presenting caption tracks.
I think VideoTracks with kind=captions are confusing too, and we should
avoid creating more situations where we need to do that.

Even when we only have one video, it's confusing that captions could
exist in multiple places.

   * What if the video track's kind isn't main? For example, what if we
 have a sign language track and we also want to display captions?
 What is the generated track's kind?
 How would that work? Are you saying we're not displaying the main
 video, but only displaying the sign language track? Is that realistic
 and something anybody would actually do?
It's possible, so the spec should handle it. Maybe it doesn't matter though?

   * The language attribute could also have conflicts.
 How so?
The underlying streams could have their own metadata, and it could
conflict. I'm not sure if it would ever be reasonable to author a file
like that, but it would be trivial to create. At the very least, we'd
need spec language to say which takes precedence if the two streams have
conflicting metadata.

   * I think it might also be possible to create files where the video
 track and text track are different lengths, so we'd need to figure
 out what to do when one of them ends.
 The timeline of a video is well defined in the spec - I don't think we
 need to do more than what is already defined.
What I mean is that this could be confusing for users. Say I'm watching
a video with two video streams (main camera angle, secondary camera
angle) and two caption tracks (for sports, for example). If I'm watching
the secondary camera angle and looking at one of the caption tracks,
but then the secondary camera angle goes away, my player is now forced
to randomly select one of the caption tracks combined with the primary
video, because it's not obvious which one corresponds with the captions
I was reading before.

In fact, if I was making a video player for my website where multiple
people give commentary on baseball games with multiple camera angles, I
would probably create my own controls that parse the video track ids and
separate them back into video and text tracks so that I could
offer separate video and text controls, since combining them just makes
the UI more complicated.
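
A sketch of what such controls would have to do, assuming the hypothetical
video_[n]_[captiontrackid] id scheme mentioned earlier in the thread:

  // Rebuild separate camera-angle and caption menus from the flattened
  // N x M list of synthesized video tracks.
  function splitCombinedTracks(videoTracks) {
    var angles = {}, captions = {};
    for (var i = 0; i < videoTracks.length; i++) {
      var match = /^video_(\d+)_(.+)$/.exec(videoTracks[i].id);
      if (match) {
        angles[match[1]] = true;        // underlying video track index
        captions[match[2]] = true;      // underlying caption track id
      } else {
        angles[videoTracks[i].id] = true;   // a plain, caption-less video track
      }
    }
    return { angles: Object.keys(angles), captions: Object.keys(captions) };
  }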


So, what's the advantage of combining video and captions, rather than
just indicating that a text track can't be represented as TextTrackCues?


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-11-03 Thread Silvia Pfeiffer
On Tue, Nov 4, 2014 at 10:24 AM, Brendan Long s...@brendanlong.com wrote:

 On 11/03/2014 04:20 PM, Silvia Pfeiffer wrote:
 On Tue, Nov 4, 2014 at 3:56 AM, Brendan Long s...@brendanlong.com wrote:
 Right, that was the original concern. But how realistic is the
 situation of n video tracks and m caption tracks with n being larger
 than 2 or 3 without a change of the audio track anyway?
 I think the situation gets confusing at N=2. See below.

 We would also need to consider:

   * How do you label this combined video and text track?
 That's not specific to the approach that we pick and will always need
 to be decided. Note that label isn't something that needs to be unique
 to a track, so you could just use the same label for all burnt-in
  video tracks and distinguish them only by their language.
 But the video and the text track might both have their own label in the
 underlying media file. Presumably we'd want to preserve both.

   * What is the track's id?
 This would need to be unique, but I think it will be easy to come up
 with a scheme that works. Something like video_[n]_[captiontrackid]
 could work.
 This sounds much more complicated and likely to cause problems for
 JavaScript developers than just indicating that a text track has cues
 that can't be represented in JavaScript.

   * How do you present this to users in a way that isn't confusing?
 No different to presenting caption tracks.
 I think VideoTracks with kind=captions are confusing too, and we should
 avoid creating more situations where we need to do that.

 Even when we only have one video, it's confusing that captions could
 exist in multiple places.

   * What if the video track's kind isn't main? For example, what if we
 have a sign language track and we also want to display captions?
 What is the generated track's kind?
 How would that work? Are you saying we're not displaying the main
 video, but only displaying the sign language track? Is that realistic
 and something anybody would actually do?
 It's possible, so the spec should handle it. Maybe it doesn't matter though?

   * The language attribute could also have conflicts.
 How so?
 The underlying streams could have their own metadata, and it could
 conflict. I'm not sure if it would ever be reasonable to author a file
 like that, but it would be trivial to create. At the very least, we'd
 need spec language to say which takes precedence if the two streams have
 conflicting metadata.

   * I think it might also be possible to create files where the video
 track and text track are different lengths, so we'd need to figure
 out what to do when one of them ends.
 The timeline of a video is well defined in the spec - I don't think we
 need to do more than what is already defined.
 What I mean is that this could be confusing for users. Say I'm watching
 a video with two video streams (main camera angle, secondary camera
 angle) and two caption tracks (for sports, for example). If I'm watching
 the secondary camera angle and looking at one of the caption tracks,
 but then the secondary camera angle goes away, my player is now forced
 to randomly select one of the caption tracks combined with the primary
 video, because it's not obvious which one corresponds with the captions
 I was reading before.

 In fact, if I was making a video player for my website where multiple
 people give commentary on baseball games with multiple camera angles, I
 would probably create my own controls that parse the video track ids and
 separate them back into video and text tracks so that I could
 offer separate video and text controls, since combining them just makes
 the UI more complicated.

That's what I meant about multiple video tracks: if you have several
that require different captions, then you're in a world of hurt in any
case and this has nothing to do with whether you're representing the
non-cue-exposed caption tracks as UARendered or as a video track.


 So, what's the advantage of combining video and captions, rather than
 just indicating that a text track can't be represented as TextTrackCues?

One important advantage: there's no need to change the spec.

If we change the spec, we still have to work through all the issues
that you listed above and find a solution.

Silvia.


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-11-03 Thread Brendan Long

On 11/03/2014 05:41 PM, Silvia Pfeiffer wrote:
 On Tue, Nov 4, 2014 at 10:24 AM, Brendan Long s...@brendanlong.com wrote:
 On 11/03/2014 04:20 PM, Silvia Pfeiffer wrote:
 On Tue, Nov 4, 2014 at 3:56 AM, Brendan Long s...@brendanlong.com wrote:
 Right, that was the original concern. But how realistic is the
 situation of n video tracks and m caption tracks with n being larger
 than 2 or 3 without a change of the audio track anyway?
 I think the situation gets confusing at N=2. See below.

 We would also need to consider:

   * How do you label this combined video and text track?
 That's not specific to the approach that we pick and will always need
 to be decided. Note that label isn't something that needs to be unique
 to a track, so you could just use the same label for all burnt-in
  video tracks and distinguish them only by their language.
 But the video and the text track might both have their own label in the
 underlying media file. Presumably we'd want to preserve both.

   * What is the track's id?
 This would need to be unique, but I think it will be easy to come up
 with a scheme that works. Something like video_[n]_[captiontrackid]
 could work.
 This sounds much more complicated and likely to cause problems for
 JavaScript developers than just indicating that a text track has cues
 that can't be represented in JavaScript.

   * How do you present this to users in a way that isn't confusing?
 No different to presenting caption tracks.
  I think VideoTracks with kind=captions are confusing too, and we should
 avoid creating more situations where we need to do that.

 Even when we only have one video, it's confusing that captions could
 exist in multiple places.

   * What if the video track's kind isn't main? For example, what if we
 have a sign language track and we also want to display captions?
 What is the generated track's kind?
 How would that work? Are you saying we're not displaying the main
 video, but only displaying the sign language track? Is that realistic
 and something anybody would actually do?
 It's possible, so the spec should handle it. Maybe it doesn't matter though?

   * The language attribute could also have conflicts.
 How so?
 The underlying streams could have their own metadata, and it could
 conflict. I'm not sure if it would ever be reasonable to author a file
 like that, but it would be trivial to create. At the very least, we'd
  need spec language to say which takes precedence if the two streams have
 conflicting metadata.

   * I think it might also be possible to create files where the video
 track and text track are different lengths, so we'd need to figure
 out what to do when one of them ends.
 The timeline of a video is well defined in the spec - I don't think we
 need to do more than what is already defined.
 What I mean is that this could be confusing for users. Say I'm watching
 a video with two video streams (main camera angle, secondary camera
 angle) and two caption tracks (for sports, for example). If I'm watching
 the secondary camera angle and looking at one of the caption tracks,
 but then the secondary camera angle goes away, my player is now forced
 to randomly select one of the caption tracks combined with the primary
 video, because it's not obvious which one corresponds with the captions
 I was reading before.

 In fact, if I was making a video player for my website where multiple
 people give commentary on baseball games with multiple camera angles, I
 would probably create my own controls that parse the video track ids and
 separate them back into video and text tracks so that I could
 offer separate video and text controls, since combining them just makes
 the UI more complicated.
 That's what I meant about multiple video tracks: if you have several
 that require different captions, then you're in a world of hurt in any
 case and this has nothing to do with whether you're representing the
 non-cue-exposed caption tracks as UARendered or as a video track.
I mean multiple video tracks that are valid for multiple caption tracks.
The example I had in my head was sports commentary, with multiple people
commenting on the same game, which is available from multiple camera angles.

We probably do need a way to indicate which tracks go together when they
don't all go together, though. I think it's come up before. Maybe the
obvious answer is: don't put tracks that don't go together in the same
file.

 So, what's the advantage of combining video and captions, rather than
 just indicating that a text track can't be represented as TextTrackCues?
 One important advantage: there's no need to change the spec.

 If we change the spec, we still have to work through all the issues
 that you listed above and find a solution.

 Silvia.
I suppose not changing the spec is nice, but I think the changes are
simpler if we have no-cue text tracks, since the answer to all of my
questions becomes: we don't do that, we just keep the two tracks ...

Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-11-03 Thread Bob Lund


On 11/3/14, 3:41 PM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote:

On Tue, Nov 4, 2014 at 10:24 AM, Brendan Long s...@brendanlong.com
wrote:

 On 11/03/2014 04:20 PM, Silvia Pfeiffer wrote:
 On Tue, Nov 4, 2014 at 3:56 AM, Brendan Long s...@brendanlong.com
wrote:
 Right, that was the original concern. But how realistic is the
 situation of n video tracks and m caption tracks with n being larger
 than 2 or 3 without a change of the audio track anyway?
 I think the situation gets confusing at N=2. See below.

 We would also need to consider:

   * How do you label this combined video and text track?
 That's not specific to the approach that we pick and will always need
 to be decided. Note that label isn't something that needs to be unique
 to a track, so you could just use the same label for all burnt-in
  video tracks and distinguish them only by their language.
 But the video and the text track might both have their own label in the
 underlying media file. Presumably we'd want to preserve both.

   * What is the track's id?
 This would need to be unique, but I think it will be easy to come up
 with a scheme that works. Something like video_[n]_[captiontrackid]
 could work.
 This sounds much more complicated and likely to cause problems for
 JavaScript developers than just indicating that a text track has cues
 that can't be represented in JavaScript.

   * How do you present this to users in a way that isn't confusing?
 No different to presenting caption tracks.
  I think VideoTracks with kind=captions are confusing too, and we should
 avoid creating more situations where we need to do that.

 Even when we only have one video, it's confusing that captions could
 exist in multiple places.

   * What if the video track's kind isn't main? For example, what if
we
 have a sign language track and we also want to display captions?
 What is the generated track's kind?
 How would that work? Are you saying we're not displaying the main
 video, but only displaying the sign language track? Is that realistic
 and something anybody would actually do?
 It's possible, so the spec should handle it. Maybe it doesn't matter
though?

   * The language attribute could also have conflicts.
 How so?
 The underlying streams could have their own metadata, and it could
 conflict. I'm not sure if it would ever be reasonable to author a file
 like that, but it would be trivial to create. At the very least, we'd
  need spec language to say which takes precedence if the two streams have
 conflicting metadata.

   * I think it might also be possible to create files where the video
 track and text track are different lengths, so we'd need to figure
 out what to do when one of them ends.
 The timeline of a video is well defined in the spec - I don't think we
 need to do more than what is already defined.
 What I mean is that this could be confusing for users. Say I'm watching
 a video with two video streams (main camera angle, secondary camera
 angle) and two caption tracks (for sports, for example). If I'm watching
 the secondary camera angle and looking at one of the caption tracks,
 but then the secondary camera angle goes away, my player is now forced
 to randomly select one of the caption tracks combined with the primary
 video, because it's not obvious which one corresponds with the captions
 I was reading before.

 In fact, if I was making a video player for my website where multiple
 people give commentary on baseball games with multiple camera angles, I
 would probably create my own controls that parse the video track ids and
 separate them back into video and text tracks so that I could
 offer separate video and text controls, since combining them just makes
 the UI more complicated.

That's what I meant about multiple video tracks: if you have several
that require different captions, then you're in a world of hurt in any
case and this has nothing to do with whether you're representing the
non-cue-exposed caption tracks as UARendered or as a video track.


 So, what's the advantage of combining video and captions, rather than
 just indicating that a text track can't be represented as TextTrackCues?

One important advantage: there's no need to change the spec.

If we change the spec, we still have to work through all the issues
that you listed above and find a solution.

Will we? I agree the case of multiple video tracks, each with different
audio/captions (possibly in multiple languages), is complicated. But treating
captions as burned-in video means the UA has to sort things out; leaving
them as cueless text tracks means the app figures it out. Having the app
sort it out doesn't make it easier, but it is more flexible.

Also, when the text tracks do have cues, the multiple video/audio/text
track case will have to be handled by the app anyway. Why should
the whole model change just because cues are not exposed to JavaScript?


Silvia.



Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-27 Thread Philip Jägenstedt
On Sun, Oct 26, 2014 at 8:28 AM, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:

 On Thu, Oct 23, 2014 at 2:01 AM, Philip Jägenstedt phil...@opera.com wrote:
  On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
  silviapfeiff...@gmail.com wrote:
 
  Hi all,
 
  In the Inband Text Tracks Community Group we've recently had a
  discussion about a proposal by HbbTV. I'd like to bring it up here to
  get some opinions on how to resolve the issue.
 
  (The discussion thread is at
  http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
  , but let me summarize it here, because it's a bit spread out.)
 
  The proposed use case is as follows:
  * there are MPEG-2 files that have an audio, a video and several caption 
  tracks
  * the caption tracks are not in WebVTT format but in formats that
  existing Digital TV receivers are already capable of decoding and
  displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
  * there is no intention to standardize a TextTrackCue format for those
  other formats (statements are: there are too many formats to deal
  with, a set-top-box won't need access to cues)
 
  The request was to expose such caption tracks as textTracks:
  interface HTMLMediaElement : HTMLElement {
  ...
readonly attribute TextTrackList textTracks;
  ...
  }
 
  Then, the TextTrack interface would list them as a kind=captions,
  but without any cues, since they're not exposed. This then allows
  turning the caption tracks on/off via JavaScript. However, for
  JavaScript it is indistinguishable from a text track that has no
  captions. So the suggestion was to introduce a new kind=UARendered.
 
 
  My suggestion was to instead treat such tracks as burnt-in video
  tracks (by combination with the main video track):
  interface HTMLMediaElement : HTMLElement {
  ...
 
  readonly attribute VideoTrackList videoTracks;
  ...
  }
 
  Using the VideoTrack interface it would list them as a kind=captions
  and would thus also be able to be activated by JavaScript. The
  downside would be that if you have N video tracks and m caption tracks in
  the media file, you'd have to expose NxM videoTracks in the interface.
 
 
  So, given this, should we introduce a kind=UARendered or expose such
  tracks as videoTracks, or is there another solution that we're
  overlooking?
 
  VideoTrackList can have at most one video track selected at a time, so
  representing this as a VideoTrack would require some additional
  tweaking to the model.

 The captions video track is one that has video and captions rendered
 together, so you only need the one video track active. If you want to
 turn off captions, you merely activate a different video track which
 is one without captions.

 There is no change to the model necessary - in fact, it fits perfectly
 to what the spec is currently describing without any change.

Ah, right! Unless I'm misunderstanding again, your suggestion is to
expose extra video tracks with kind captions or subtitles, requiring
no spec change at all. That sounds good to me.

Philip


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-27 Thread Silvia Pfeiffer
On Tue, Oct 28, 2014 at 2:41 AM, Philip Jägenstedt phil...@opera.com wrote:
 On Sun, Oct 26, 2014 at 8:28 AM, Silvia Pfeiffer
 silviapfeiff...@gmail.com wrote:

 On Thu, Oct 23, 2014 at 2:01 AM, Philip Jägenstedt phil...@opera.com wrote:
  On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
  silviapfeiff...@gmail.com wrote:
 
  Hi all,
 
  In the Inband Text Tracks Community Group we've recently had a
  discussion about a proposal by HbbTV. I'd like to bring it up here to
  get some opinions on how to resolve the issue.
 
  (The discussion thread is at
  http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
  , but let me summarize it here, because it's a bit spread out.)
 
  The proposed use case is as follows:
  * there are MPEG-2 files that have an audio, a video and several caption 
  tracks
  * the caption tracks are not in WebVTT format but in formats that
  existing Digital TV receivers are already capable of decoding and
  displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
  * there is no intention to standardize a TextTrackCue format for those
  other formats (statements are: there are too many formats to deal
  with, a set-top-box won't need access to cues)
 
  The request was to expose such caption tracks as textTracks:
  interface HTMLMediaElement : HTMLElement {
  ...
readonly attribute TextTrackList textTracks;
  ...
  }
 
  Then, the TextTrack interface would list them as a kind=captions,
  but without any cues, since they're not exposed. This then allows
  turning the caption tracks on/off via JavaScript. However, for
  JavaScript it is indistinguishable from a text track that has no
  captions. So the suggestion was to introduce a new kind=UARendered.
 
 
  My suggestion was to instead treat such tracks as burnt-in video
  tracks (by combination with the main video track):
  interface HTMLMediaElement : HTMLElement {
  ...
 
  readonly attribute VideoTrackList videoTracks;
  ...
  }
 
  Using the VideoTrack interface it would list them as a kind=captions
  and would thus also be able to be activated by JavaScript. The
  downside would be that if you have N video tracks and m caption tracks in
  the media file, you'd have to expose NxM videoTracks in the interface.
 
 
  So, given this, should we introduce a kind=UARendered or expose such
  tracks as videoTracks, or is there another solution that we're
  overlooking?
 
  VideoTrackList can have at most one video track selected at a time, so
  representing this as a VideoTrack would require some additional
  tweaking to the model.

 The captions video track is one that has video and captions rendered
 together, so you only need the one video track active. If you want to
 turn off captions, you merely activate a different video track which
 is one without captions.

 There is no change to the model necessary - in fact, it fits perfectly
 to what the spec is currently describing without any change.

 Ah, right! Unless I'm misunderstanding again, your suggestion is to
 expose extra video tracks with kind captions or subtitles, requiring
 no spec change at all. That sounds good to me.


Yes, that was my suggestion for dealing with UA rendered tracks.

Cheers,
Silvia.


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-26 Thread Silvia Pfeiffer
On Thu, Oct 23, 2014 at 2:01 AM, Philip Jägenstedt phil...@opera.com wrote:
 On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
 silviapfeiff...@gmail.com wrote:

 Hi all,

 In the Inband Text Tracks Community Group we've recently had a
 discussion about a proposal by HbbTV. I'd like to bring it up here to
 get some opinions on how to resolve the issue.

 (The discussion thread is at
 http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
 , but let me summarize it here, because it's a bit spread out.)

 The proposed use case is as follows:
 * there are MPEG-2 files that have an audio, a video and several caption 
 tracks
 * the caption tracks are not in WebVTT format but in formats that
 existing Digital TV receivers are already capable of decoding and
 displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
 * there is no intention to standardize a TextTrackCue format for those
 other formats (statements are: there are too many formats to deal
 with, a set-top-box won't need access to cues)

 The request was to expose such caption tracks as textTracks:
 interface HTMLMediaElement : HTMLElement {
 ...
   readonly attribute TextTrackList textTracks;
 ...
 }

 Then, the TextTrack interface would list them as a kind=captions,
 but without any cues, since they're not exposed. This then allows
 turning the caption tracks on/off via JavaScript. However, for
 JavaScript it is indistinguishable from a text track that has no
 captions. So the suggestion was to introduce a new kind=UARendered.


 My suggestion was to instead treat such tracks as burnt-in video
 tracks (by combination with the main video track):
 interface HTMLMediaElement : HTMLElement {
 ...

 readonly attribute VideoTrackList videoTracks;
 ...
 }

 Using the VideoTrack interface it would list them as a kind=captions
 and would thus also be able to be activated by JavaScript. The
  downside would be that if you have N video tracks and m caption tracks in
 the media file, you'd have to expose NxM videoTracks in the interface.


 So, given this, should we introduce a kind=UARendered or expose such
  tracks as videoTracks, or is there another solution that we're
 overlooking?

 VideoTrackList can have at most one video track selected at a time, so
 representing this as a VideoTrack would require some additional
 tweaking to the model.

The captions video track is one that has video and captions rendered
together, so you only need the one video track active. If you want to
turn off captions, you merely activate a different video track which
is one without captions.

There is no change to the model necessary - in fact, it fits perfectly
to what the spec is currently describing without any change.
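
To make that concrete, here is a rough JavaScript sketch of what a page
would do under this approach, assuming the UA exposes the burnt-in
combinations as extra video tracks as described (the kind and language
values are only examples):

  var video = document.querySelector('video');

  // Turn captions on by selecting the burnt-in track for a language.
  // Selecting one video track automatically deselects the others, so the
  // "at most one selected" model is untouched.
  function showBurntInCaptions(language) {
    for (var i = 0; i < video.videoTracks.length; i++) {
      var track = video.videoTracks[i];
      if (track.kind === 'captions' && track.language === language) {
        track.selected = true;
        return true;
      }
    }
    return false;
  }

  // Turn captions off by going back to the plain main video track.
  function hideBurntInCaptions() {
    for (var i = 0; i < video.videoTracks.length; i++) {
      if (video.videoTracks[i].kind === 'main') {
        video.videoTracks[i].selected = true;
        return;
      }
    }
  }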


 A separate text track kind seems better, but wouldn't it still be
 useful to distinguish between captions and subtitles even if the
 underlying data is unavailable?

As stated, the proposal was to introduce kind=UARendered and that
would introduce a change to the spec.

Regards,
Silvia.


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-26 Thread Silvia Pfeiffer
On Thu, Oct 23, 2014 at 2:33 AM, Bob Lund b.l...@cablelabs.com wrote:


 On 10/22/14, 9:01 AM, Philip Jägenstedt phil...@opera.com wrote:

On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:

 Hi all,

 In the Inband Text Tracks Community Group we've recently had a
 discussion about a proposal by HbbTV. I'd like to bring it up here to
 get some opinions on how to resolve the issue.

 (The discussion thread is at

http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
 , but let me summarize it here, because it's a bit spread out.)

 The proposed use case is as follows:
 * there are MPEG-2 files that have an audio, a video and several
caption tracks
 * the caption tracks are not in WebVTT format but in formats that
 existing Digital TV receivers are already capable of decoding and
 displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
 * there is no intention to standardize a TextTrackCue format for those
 other formats (statements are: there are too many formats to deal
 with, a set-top-box won't need access to cues)

 The request was to expose such caption tracks as textTracks:
 interface HTMLMediaElement : HTMLElement {
 ...
   readonly attribute TextTrackList textTracks;
 ...
 }

 Then, the TextTrack interface would list them as a kind=captions,
 but without any cues, since they're not exposed. This then allows
 turning the caption tracks on/off via JavaScript. However, for
 JavaScript it is indistinguishable from a text track that has no
 captions. So the suggestion was to introduce a new kind=UARendered.


 My suggestion was to instead treat such tracks as burnt-in video
 tracks (by combination with the main video track):
 interface HTMLMediaElement : HTMLElement {
 ...

 readonly attribute VideoTrackList videoTracks;
 ...
 }

 Using the VideoTrack interface it would list them as a kind=captions
 and would thus also be able to be activated by JavaScript. The
  downside would be that if you have N video tracks and m caption tracks in
 the media file, you'd have to expose NxM videoTracks in the interface.


 So, given this, should we introduce a kind=UARendered or expose such
  tracks as videoTracks, or is there another solution that we're
 overlooking?

VideoTrackList can have at most one video track selected at a time, so
representing this as a VideoTrack would require some additional
tweaking to the model.

A separate text track kind seems better, but wouldn't it still be
useful to distinguish between captions and subtitles even if the
underlying data is unavailable?

 This issue was clarified here [1]. TextTrack.mode would be set
 "uarendered". TextTrack.kind would still reflect "captions" or "subtitles".

OK, right, that's another approach and probably better than introducing
a different kind.

 [1]
 http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Oct/0154.html


Philip



Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-23 Thread Philip Jägenstedt
On Wed, Oct 22, 2014 at 5:33 PM, Bob Lund b.l...@cablelabs.com wrote:


 On 10/22/14, 9:01 AM, Philip Jägenstedt phil...@opera.com wrote:

On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:

 Hi all,

 In the Inband Text Tracks Community Group we've recently had a
 discussion about a proposal by HbbTV. I'd like to bring it up here to
 get some opinions on how to resolve the issue.

 (The discussion thread is at

http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
 , but let me summarize it here, because it's a bit spread out.)

 The proposed use case is as follows:
 * there are MPEG-2 files that have an audio, a video and several
caption tracks
 * the caption tracks are not in WebVTT format but in formats that
 existing Digital TV receivers are already capable of decoding and
 displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
 * there is no intention to standardize a TextTrackCue format for those
 other formats (statements are: there are too many formats to deal
 with, a set-top-box won't need access to cues)

 The request was to expose such caption tracks as textTracks:
 interface HTMLMediaElement : HTMLElement {
 ...
   readonly attribute TextTrackList textTracks;
 ...
 }

 Then, the TextTrack interface would list them as a kind=captions,
 but without any cues, since they're not exposed. This then allows
 turning the caption tracks on/off via JavaScript. However, for
 JavaScript it is indistinguishable from a text track that has no
 captions. So the suggestion was to introduce a new kind=UARendered.


 My suggestion was to instead treat such tracks as burnt-in video
 tracks (by combination with the main video track):
 interface HTMLMediaElement : HTMLElement {
 ...

 readonly attribute VideoTrackList videoTracks;
 ...
 }

 Using the VideoTrack interface it would list them as a kind=captions
 and would thus also be able to be activated by JavaScript. The
  downside would be that if you have N video tracks and m caption tracks in
 the media file, you'd have to expose NxM videoTracks in the interface.


 So, given this, should we introduce a kind=UARendered or expose such
  tracks as videoTracks, or is there another solution that we're
 overlooking?

VideoTrackList can have at most one video track selected at a time, so
representing this as a VideoTrack would require some additional
tweaking to the model.

A separate text track kind seems better, but wouldn't it still be
useful to distinguish between captions and subtitles even if the
underlying data is unavailable?

 This issue was clarified here [1]. TextTrack.mode would be set
 "uarendered". TextTrack.kind would still reflect "captions" or "subtitles".

 [1]
 http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Oct/0154.html

Oops, I missed that.

I was under the impression that the ability for scripts to detect this
situation was the motivation for a spec change. If there are multiple
tracks, most likely all but one will be disabled initially, which
would be indistinguishable from a disabled track with no cues. Since
TextTrack.mode is mutable, even when it is initially "uarendered",
scripts would have to remember that before disabling the track, which
seems a bit inconvenient.
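
A sketch of that bookkeeping, assuming the proposed 'uarendered' mode value
(not an existing API):

  var video = document.querySelector('video');
  var captionTrack = video.textTracks[0];   // the cue-less caption track

  // If the track starts out 'disabled', script cannot tell it apart from
  // an ordinary empty track; if it starts out 'uarendered', script must
  // remember that before disabling it, since the value is lost afterwards.
  var wasUARendered = (captionTrack.mode === 'uarendered');

  function setCaptionsEnabled(enabled) {
    if (!wasUARendered) return;   // nothing we can safely re-enable
    captionTrack.mode = enabled ? 'uarendered' : 'disabled';
  }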

P.S. Your mails have an encoding problem resulting in superscript
numbers instead of quotes.

Philip


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-22 Thread Philip Jägenstedt
On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:

 Hi all,

 In the Inband Text Tracks Community Group we've recently had a
 discussion about a proposal by HbbTV. I'd like to bring it up here to
 get some opinions on how to resolve the issue.

 (The discussion thread is at
 http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
 , but let me summarize it here, because it's a bit spread out.)

 The proposed use case is as follows:
 * there are MPEG-2 files that have an audio, a video and several caption 
 tracks
 * the caption tracks are not in WebVTT format but in formats that
 existing Digital TV receivers are already capable of decoding and
 displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
 * there is no intention to standardize a TextTrackCue format for those
 other formats (statements are: there are too many formats to deal
 with, a set-top-box won't need access to cues)

 The request was to expose such caption tracks as textTracks:
 interface HTMLMediaElement : HTMLElement {
 ...
   readonly attribute TextTrackList textTracks;
 ...
 }

 Then, the TextTrack interface would list them as a kind=captions,
 but without any cues, since they're not exposed. This then allows
 turning the caption tracks on/off via JavaScript. However, for
 JavaScript it is indistinguishable from a text track that has no
 captions. So the suggestion was to introduce a new kind=UARendered.


 My suggestion was to instead treat such tracks as burnt-in video
 tracks (by combination with the main video track):
 interface HTMLMediaElement : HTMLElement {
 ...

 readonly attribute VideoTrackList videoTracks;
 ...
 }

 Using the VideoTrack interface it would list them as a kind=captions
 and would thus also be able to be activated by JavaScript. The
  downside would be that if you have N video tracks and m caption tracks in
 the media file, you'd have to expose NxM videoTracks in the interface.


 So, given this, should we introduce a kind=UARendered or expose such
  tracks as videoTracks, or is there another solution that we're
 overlooking?

VideoTrackList can have at most one video track selected at a time, so
representing this as a VideoTrack would require some additional
tweaking to the model.

A separate text track kind seems better, but wouldn't it still be
useful to distinguish between captions and subtitles even if the
underlying data is unavailable?

Philip


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-22 Thread Bob Lund


On 10/22/14, 9:01 AM, Philip Jägenstedt phil...@opera.com wrote:

On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:

 Hi all,

 In the Inband Text Tracks Community Group we've recently had a
 discussion about a proposal by HbbTV. I'd like to bring it up here to
 get some opinions on how to resolve the issue.

 (The discussion thread is at
 
http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
 , but let me summarize it here, because it's a bit spread out.)

 The proposed use case is as follows:
 * there are MPEG-2 files that have an audio, a video and several
caption tracks
 * the caption tracks are not in WebVTT format but in formats that
 existing Digital TV receivers are already capable of decoding and
 displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
 * there is no intention to standardize a TextTrackCue format for those
 other formats (statements are: there are too many formats to deal
 with, a set-top-box won't need access to cues)

 The request was to expose such caption tracks as textTracks:
 interface HTMLMediaElement : HTMLElement {
 ...
   readonly attribute TextTrackList textTracks;
 ...
 }

 Then, the TextTrack interface would list them as a kind=captions,
 but without any cues, since they're not exposed. This then allows
 turning the caption tracks on/off via JavaScript. However, for
 JavaScript it is indistinguishable from a text track that has no
 captions. So the suggestion was to introduce a new kind=UARendered.


 My suggestion was to instead treat such tracks as burnt-in video
 tracks (by combination with the main video track):
 interface HTMLMediaElement : HTMLElement {
 ...

 readonly attribute VideoTrackList videoTracks;
 ...
 }

 Using the VideoTrack interface it would list them as a kind=captions
 and would thus also be able to be activated by JavaScript. The
  downside would be that if you have N video tracks and m caption tracks in
 the media file, you'd have to expose NxM videoTracks in the interface.


 So, given this, should we introduce a kind=UARendered or expose such
  tracks as videoTracks, or is there another solution that we're
 overlooking?

VideoTrackList can have at most one video track selected at a time, so
representing this as a VideoTrack would require some additional
tweaking to the model.

A separate text track kind seems better, but wouldn't it still be
useful to distinguish between captions and subtitles even if the
underlying data is unavailable?

This issue was clarified here [1]. TextTrack.mode would be set
"uarendered". TextTrack.kind would still reflect "captions" or "subtitles".

[1] 
http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Oct/0154.html


Philip



Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-13 Thread Bob Lund


On 10/12/14, 3:45 AM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote:

Hi all,

In the Inband Text Tracks Community Group we've recently had a
discussion about a proposal by HbbTV. I'd like to bring it up here to
get some opinions on how to resolve the issue.

(The discussion thread is at
http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
, but let me summarize it here, because it's a bit spread out.)

The proposed use case is as follows:
* there are MPEG-2 files that have an audio, a video and several caption
tracks
* the caption tracks are not in WebVTT format but in formats that
existing Digital TV receivers are already capable of decoding and
displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
* there is no intention to standardize a TextTrackCue format for those
other formats (statements are: there are too many formats to deal
with, a set-top-box won't need access to cues)

The request was to expose such caption tracks as textTracks:
interface HTMLMediaElement : HTMLElement {
...
  readonly attribute TextTrackList textTracks;
...
}

Then, the TextTrack interface would list them as a kind=captions,
but without any cues, since they're not exposed. This then allows
turning the caption tracks on/off via JavaScript. However, for
JavaScript it is indistinguishable from a text track that has no
captions. So the suggestion was to introduce a new kind=UARendered.

A clarification - the suggestion was for a new TextTrack.mode value of
"UARendered" and for this type of TextTrack the only valid modes would be
"UARendered" and "disabled". The "hidden" and "showing" modes would not be
allowed since no Cues are generated. @kind would continue to denote the
type of TextTrack.
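
For illustration, a page driving such a track would only ever move it
between the two permitted values. This is a sketch of the proposal as
described above, not an existing API, and the exact spelling of the mode
string would be up to the spec change:

  var video = document.querySelector('video');
  var track = video.textTracks[0];   // a cue-less, UA-renderable caption track

  track.mode = 'UARendered';         // captions on, rendered by the UA
  track.mode = 'disabled';           // captions off
  // track.mode = 'showing';         // would not be a valid mode for this track
  // track.mode = 'hidden';          // likewise not allowed, since no cues exist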

Bob


My suggestion was to instead treat such tracks as burnt-in video
tracks (by combination with the main video track):
interface HTMLMediaElement : HTMLElement {
...

readonly attribute VideoTrackList videoTracks;
...
}

Using the VideoTrack interface it would list them as a kind=captions
and would thus also be able to be activated by JavaScript. The
downside would be that if you have N video tracks and m caption tracks in
the media file, you'd have to expose NxM videoTracks in the interface.


So, given this, should we introduce a kind=UARendered or expose such
tracks as videoTracks, or is there another solution that we're
overlooking?

Silvia.



Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-13 Thread Elliott Sprehn
What does UA rendered mean? How does the UA render it? Can the UA just
convert the format into WebVTT instead?

On Mon, Oct 13, 2014 at 11:15 AM, Bob Lund b.l...@cablelabs.com wrote:



 On 10/12/14, 3:45 AM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote:

 Hi all,
 
 In the Inband Text Tracks Community Group we've recently had a
 discussion about a proposal by HbbTV. I'd like to bring it up here to
 get some opinions on how to resolve the issue.
 
 (The discussion thread is at
 http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
 , but let me summarize it here, because it's a bit spread out.)
 
 The proposed use case is as follows:
 * there are MPEG-2 files that have an audio, a video and several caption
 tracks
 * the caption tracks are not in WebVTT format but in formats that
 existing Digital TV receivers are already capable of decoding and
 displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
 * there is no intention to standardize a TextTrackCue format for those
 other formats (statements are: there are too many formats to deal
 with, a set-top-box won't need access to cues)
 
 The request was to expose such caption tracks as textTracks:
 interface HTMLMediaElement : HTMLElement {
 ...
   readonly attribute TextTrackList textTracks;
 ...
 }
 
 Then, the TextTrack interface would list them as a kind=captions,
 but without any cues, since they're not exposed. This then allows
 turning the caption tracks on/off via JavaScript. However, for
 JavaScript it is indistinguishable from a text track that has no
 captions. So the suggestion was to introduce a new kind=UARendered.

 A clarification - the suggestion was for a new TextTrack.mode value of
 "UARendered" and for this type of TextTrack the only valid modes would be
 "UARendered" and "disabled". The "hidden" and "showing" modes would not be
 allowed since no Cues are generated. @kind would continue to denote the
 type of TextTrack.

 Bob
 
 
 My suggestion was to instead treat such tracks as burnt-in video
 tracks (by combination with the main video track):
 interface HTMLMediaElement : HTMLElement {
 ...
 
 readonly attribute VideoTrackList videoTracks;
 ...
 }
 
 Using the VideoTrack interface it would list them as a kind=captions
 and would thus also be able to be activated by JavaScript. The
  downside would be that if you have N video tracks and m caption tracks in
 the media file, you'd have to expose NxM videoTracks in the interface.
 
 
 So, given this, should we introduce a kind=UARendered or expose such
  tracks as videoTracks, or is there another solution that we're
 overlooking?
 
 Silvia.