Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-11-03 Thread Bob Lund


On 11/3/14, 3:41 PM, "Silvia Pfeiffer"  wrote:

>On Tue, Nov 4, 2014 at 10:24 AM, Brendan Long 
>wrote:
>>
>> On 11/03/2014 04:20 PM, Silvia Pfeiffer wrote:
>>> On Tue, Nov 4, 2014 at 3:56 AM, Brendan Long 
>>>wrote:
>>> Right, that was the original concern. But how realistic is the
>>> situation of n video tracks and m caption tracks with n being larger
>>> than 2 or 3 without a change of the audio track anyway?
>> I think the situation gets confusing at N=2. See below.
>>
>>>> We would also need to consider:
>>>>
>>>>   * How do you label this combined video and text track?
>>> That's not specific to the approach that we pick and will always need
>>> to be decided. Note that label isn't something that needs to be unique
>>> to a track, so you could just use the same label for all burnt-in
>>> video tracks and identify them to be different only in the language.
>> But the video and the text track might both have their own label in the
>> underlying media file. Presumably we'd want to preserve both.
>>
>>>>   * What is the track's "id"?
>>> This would need to be unique, but I think it will be easy to come up
>>> with a scheme that works. Something like "video_[n]_[captiontrackid]"
>>> could work.
>> This sounds much more complicated and likely to cause problems for
>> JavaScript developers than just indicating that a text track has cues
>> that can't be represented in JavaScript.
>>
>>>>   * How do you present this to users in a way that isn't confusing?
>>> No different to presenting caption tracks.
>> I think VideoTracks with kind=captions are confusing too, and we should
>> avoid creating more situations where we need to do that.
>>
>> Even when we only have one video, it's confusing that captions could
>> exist in multiple places.
>>
>>>>   * What if the video track's kind isn't "main"? For example, what if we
>>>> have a sign language track and we also want to display captions?
>>>> What is the generated track's kind?
>>> How would that work? Are you saying we're not displaying the main
>>> video, but only displaying the sign language track? Is that realistic
>>> and something anybody would actually do?
>> It's possible, so the spec should handle it. Maybe it doesn't matter
>>though?
>>
   * The "language" attribute could also have conflicts.
>>> How so?
>> The underlying streams could have their own metadata, and it could
>> conflict. I'm not sure if it would ever be reasonable to author a file
>> like that, but it would be trivial to create. At the very least, we'd
>> need language to say which takes precedence if the two streams have
>> conflicting metadata.
>>
>>>>   * I think it might also be possible to create files where the video
>>>> track and text track are different lengths, so we'd need to figure
>>>> out what to do when one of them ends.
>>> The timeline of a video is well defined in the spec - I don't think we
>>> need to do more than what is already defined.
>> What I mean is that this could be confusing for users. Say I'm watching
>> a video with two video streams (main camera angle, secondary camera
>> angle) and two caption tracks (for sports, for example). If I'm watching
>> the secondary camera angle and looking at one of the caption tracks,
>> but then the secondary camera angle goes away, my player is now forced
>> to randomly select one of the caption tracks combined with the primary
>> video, because it's not obvious which one corresponds with the captions
>> I was reading before.
>>
>> In fact, if I was making a video player for my website where multiple
>> people give commentary on baseball games with multiple camera angles, I
>> would probably create my own controls that parse the video track ids and
>> separate them back into video and text tracks so that I could offer
>> separate video and text controls, since combining them just makes
>> the UI more complicated.
>
>That's what I meant by multiple video tracks: if you have several
>that require different captions, then you're in a world of hurt in any
>case and this has nothing to do with whether you're representing the
>non-cue-exposed caption tracks as UARendered or as a video track.
>
>
>> So, what's the advantage of combining video and captions, rather than
>> just indicating that a text track can't be represented as TextTrackCues?
>
>One important advantage: there's no need to change the spec.
>
>If we change the spec, we still have to work through all the issues
>that you listed above and find a solution.

Will we? I agree that the case of multiple video tracks, each with
different audio/captions (possibly in multiple languages), is complicated.
But treating captions as burned-in video means the UA has to sort things
out; leaving them as cueless text tracks means the app figures it out.
Having the app sort it out doesn't make the problem easier, but it is more
flexible.

Also, when the text tracks do have cues, the multiple video/audio/text
track case still has to be handled by the app.
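
[Editor's note: a minimal sketch of the app-side selection Bob describes,
assuming a single <video> element whose inband caption tracks are exposed
as cueless TextTracks and a UA that renders an enabled track itself (the
behaviour requested in this thread); the match-by-language rule is only an
illustration.]

  var video = document.querySelector('video');

  // Collect caption/subtitle text tracks, whether or not they expose cues.
  function captionTracks() {
    return Array.prototype.filter.call(video.textTracks, function (t) {
      return t.kind === 'captions' || t.kind === 'subtitles';
    });
  }

  // Enable the caption track for the wanted language and disable the rest.
  // The app can still flip modes on a cueless, UA-rendered track; it just
  // cannot inspect or restyle individual cues. ('showing' is the current
  // spec value; the thread also discusses a dedicated 'uarendered' mode.)
  function selectCaptions(language) {
    captionTracks().forEach(function (track) {
      track.mode = (track.language === language) ? 'showing' : 'disabled';
    });
  }

  selectCaptions('en');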

Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-11-03 Thread Brendan Long

On 11/03/2014 05:41 PM, Silvia Pfeiffer wrote:
> On Tue, Nov 4, 2014 at 10:24 AM, Brendan Long  wrote:
>> On 11/03/2014 04:20 PM, Silvia Pfeiffer wrote:
>>> On Tue, Nov 4, 2014 at 3:56 AM, Brendan Long  wrote:
>>> Right, that was the original concern. But how realistic is the
>>> situation of n video tracks and m caption tracks with n being larger
>>> than 2 or 3 without a change of the audio track anyway?
>> I think the situation gets confusing at N=2. See below.
>>
>>>> We would also need to consider:
>>>>
>>>>   * How do you label this combined video and text track?
>>> That's not specific to the approach that we pick and will always need
>>> to be decided. Note that label isn't something that needs to be unique
>>> to a track, so you could just use the same label for all burnt-in
>>> video tracks and identify them to be different only in the language.
>> But the video and the text track might both have their own label in the
>> underlying media file. Presumably we'd want to preserve both.
>>
>>>>   * What is the track's "id"?
>>> This would need to be unique, but I think it will be easy to come up
>>> with a scheme that works. Something like "video_[n]_[captiontrackid]"
>>> could work.
>> This sounds much more complicated and likely to cause problems for
>> JavaScript developers than just indicating that a text track has cues
>> that can't be represented in JavaScript.
>>
>>>>   * How do you present this to users in a way that isn't confusing?
>>> No different to presenting caption tracks.
>> I think VideoTracks with kind=captions are confusing too, and we should
>> avoid creating more situations where we need to do that.
>>
>> Even when we only have one video, it's confusing that captions could
>> exist in multiple places.
>>
>>>>   * What if the video track's kind isn't "main"? For example, what if we
>>>> have a sign language track and we also want to display captions?
>>>> What is the generated track's kind?
>>> How would that work? Are you saying we're not displaying the main
>>> video, but only displaying the sign language track? Is that realistic
>>> and something anybody would actually do?
>> It's possible, so the spec should handle it. Maybe it doesn't matter though?
>>
   * The "language" attribute could also have conflicts.
>>> How so?
>> The underlying streams could have their own metadata, and it could
>> conflict. I'm not sure if it would ever be reasonable to author a file
>> like that, but it would be trivial to create. At the very least, we'd
>> need language to say which takes precedence if the two streams have
>> conflicting metadata.
>>
>>>>   * I think it might also be possible to create files where the video
>>>> track and text track are different lengths, so we'd need to figure
>>>> out what to do when one of them ends.
>>> The timeline of a video is well defined in the spec - I don't think we
>>> need to do more than what is already defined.
>> What I mean is that this could be confusing for users. Say I'm watching
>> a video with two video streams (main camera angle, secondary camera
>> angle) and two caption tracks (for sports, for example). If I'm watching
>> the secondary camera angle and looking at one of the caption tracks,
>> but then the secondary camera angle goes away, my player is now forced
>> to randomly select one of the caption tracks combined with the primary
>> video, because it's not obvious which one corresponds with the captions
>> I was reading before.
>>
>> In fact, if I was making a video player for my website where multiple
>> people give commentary on baseball games with multiple camera angles, I
>> would probably create my own controls that parse the video track ids and
>> separate them back into video and text tracks so that I could offer
>> separate video and text controls, since combining them just makes
>> the UI more complicated.
> That's what I meant by multiple video tracks: if you have several
> that require different captions, then you're in a world of hurt in any
> case and this has nothing to do with whether you're representing the
> non-cue-exposed caption tracks as UARendered or as a video track.
I mean multiple video tracks that are valid for multiple caption tracks.
The example I had in my head was sports commentary, with multiple people
commenting on the same game, which is available from multiple camera angles.

We probably do need a way to indicate which tracks go together when they
don't all go together, though. I think it's come up before. Maybe the
obvious answer is, "don't have tracks that don't go together in the same
file".

>> So, what's the advantage of combining video and captions, rather than
>> just indicating that a text track can't be represented as TextTrackCues?
> One important advantage: there's no need to change the spec.
>
> If we change the spec, we still have to work through all the issues
> that you listed above and find a solution.
>
> Silvia.
I suppose not changing the spec is nice, but

Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-11-03 Thread Silvia Pfeiffer
On Tue, Nov 4, 2014 at 10:24 AM, Brendan Long  wrote:
>
> On 11/03/2014 04:20 PM, Silvia Pfeiffer wrote:
>> On Tue, Nov 4, 2014 at 3:56 AM, Brendan Long  wrote:
>> Right, that was the original concern. But how realistic is the
>> situation of n video tracks and m caption tracks with n being larger
>> than 2 or 3 without a change of the audio track anyway?
> I think the situation gets confusing at N=2. See below.
>
>>> We would also need to consider:
>>>
>>>   * How do you label this combined video and text track?
>> That's not specific to the approach that we pick and will always need
>> to be decided. Note that label isn't something that needs to be unique
>> to a track, so you could just use the same label for all burnt-in
>> video tracks and identify them to be different only in the language.
> But the video and the text track might both have their own label in the
> underlying media file. Presumably we'd want to preserve both.
>
>>>   * What is the track's "id"?
>> This would need to be unique, but I think it will be easy to come up
>> with a scheme that works. Something like "video_[n]_[captiontrackid]"
>> could work.
> This sounds much more complicated and likely to cause problems for
> JavaScript developers than just indicating that a text track has cues
> that can't be represented in JavaScript.
>
>>>   * How do you present this to users in a way that isn't confusing?
>> No different to presenting caption tracks.
> I think VideoTracks with kind=captions are confusing too, and we should
> avoid creating more situations where we need to do that.
>
> Even when we only have one video, it's confusing that captions could
> exist in multiple places.
>
>>>   * What if the video track's kind isn't "main"? For example, what if we
>>> have a sign language track and we also want to display captions?
>>> What is the generated track's kind?
>> How would that work? Are you saying we're not displaying the main
>> video, but only displaying the sign language track? Is that realistic
>> and something anybody would actually do?
> It's possible, so the spec should handle it. Maybe it doesn't matter though?
>
>>>   * The "language" attribute could also have conflicts.
>> How so?
> The underlying streams could have their own metadata, and it could
> conflict. I'm not sure if it would ever be reasonable to author a file
> like that, but it would be trivial to create. At the very least, we'd
> need language to say which takes precedence if the two streams have
> conflicting metadata.
>
>>>   * I think it might also be possible to create files where the video
>>> track and text track are different lengths, so we'd need to figure
>>> out what to do when one of them ends.
>> The timeline of a video is well defined in the spec - I don't think we
>> need to do more than what is already defined.
> What I mean is that this could be confusing for users. Say I'm watching
> a video with two video streams (main camera angle, secondary camera
> angle) and two caption tracks (for sports, for example). If I'm watching
> the secondary camera angle and looking at one of the caption tracks,
> but then the secondary camera angle goes away, my player is now forced
> to randomly select one of the caption tracks combined with the primary
> video, because it's not obvious which one corresponds with the captions
> I was reading before.
>
> In fact, if I was making a video player for my website where multiple
> people give commentary on baseball games with multiple camera angles, I
> would probably create my own controls that parse the video track ids and
> separate them back into video and text tracks so that I could offer
> separate video and text controls, since combining them just makes
> the UI more complicated.

That's what I meant by multiple video tracks: if you have several
that require different captions, then you're in a world of hurt in any
case and this has nothing to do with whether you're representing the
non-cue-exposed caption tracks as UARendered or as a video track.


> So, what's the advantage of combining video and captions, rather than
> just indicating that a text track can't be represented as TextTrackCues?

One important advantage: there's no need to change the spec.

If we change the spec, we still have to work through all the issues
that you listed above and find a solution.

Silvia.


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-11-03 Thread Brendan Long

On 11/03/2014 04:20 PM, Silvia Pfeiffer wrote:
> On Tue, Nov 4, 2014 at 3:56 AM, Brendan Long  wrote:
> Right, that was the original concern. But how realistic is the
> situation of n video tracks and m caption tracks with n being larger
> than 2 or 3 without a change of the audio track anyway?
I think the situation gets confusing at N=2. See below.

>> We would also need to consider:
>>
>>   * How do you label this combined video and text track?
> That's not specific to the approach that we pick and will always need
> to be decided. Note that label isn't something that needs to be unique
> to a track, so you could just use the same label for all burnt-in
> video tracks and identify them to be different only in the language.
But the video and the text track might both have their own label in the
underlying media file. Presumably we'd want to preserve both.

>>   * What is the track's "id"?
> This would need to be unique, but I think it will be easy to come up
> with a scheme that works. Something like "video_[n]_[captiontrackid]"
> could work.
This sounds much more complicated and likely to cause problems for
JavaScript developers than just indicating that a text track has cues
that can't be represented in JavaScript.

>>   * How do you present this to users in a way that isn't confusing?
> No different to presenting caption tracks.
I think VideoTracks with kind=captions are confusing too, and we should
avoid creating more situations where we need to do that.

Even when we only have one video, it's confusing that captions could
exist in multiple places.

>>   * What if the video track's kind isn't "main"? For example, what if we
>> have a sign language track and we also want to display captions?
>> What is the generated track's kind?
> How would that work? Are you saying we're not displaying the main
> video, but only displaying the sign language track? Is that realistic
> and something anybody would actually do?
It's possible, so the spec should handle it. Maybe it doesn't matter though?

>>   * The "language" attribute could also have conflicts.
> How so?
The underlying streams could have their own metadata, and it could
conflict. I'm not sure if it would ever be reasonable to author a file
like that, but it would be trivial to create. At the very least, we'd
need language to say which takes precedence if the two streams have
conflicting metadata.

>>   * I think it might also be possible to create files where the video
>> track and text track are different lengths, so we'd need to figure
>> out what to do when one of them ends.
> The timeline of a video is well defined in the spec - I don't think we
> need to do more than what is already defined.
What I mean is that this could be confusing for users. Say I'm watching
a video with two video streams (main camera angle, secondary camera
angle) and two caption tracks (for sports, for example). If I'm watching
the secondary camera angle and looking at one of the caption tracks,
but then the secondary camera angle goes away, my player is now forced
to randomly select one of the caption tracks combined with the primary
video, because it's not obvious which one corresponds with the captions
I was reading before.

In fact, if I was making a video player for my website where multiple
people give commentary on baseball games with multiple camera angles, I
would probably create my own controls that parse the video track ids and
separate them back into video and text tracks so that I could offer
separate video and text controls, since combining them just makes
the UI more complicated.


So, what's the advantage of combining video and captions, rather than
just indicating that a text track can't be represented as TextTrackCues?


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-11-03 Thread Silvia Pfeiffer
On Tue, Nov 4, 2014 at 3:56 AM, Brendan Long  wrote:
>
> On 10/27/2014 08:43 PM, Silvia Pfeiffer wrote:
>> On Tue, Oct 28, 2014 at 2:41 AM, Philip Jägenstedt  wrote:
>>> On Sun, Oct 26, 2014 at 8:28 AM, Silvia Pfeiffer
>>>  wrote:
>>>> On Thu, Oct 23, 2014 at 2:01 AM, Philip Jägenstedt
>>>> wrote:
>>>>> On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
>>>>>  wrote:
>>>>>> Using the VideoTrack interface it would list them as a kind="captions"
>>>>>> and would thus also be able to be activated by JavaScript. The
>>>>>> downside would be that if you have N video tracks and m caption tracks in
>>>>>> the media file, you'd have to expose NxM videoTracks in the interface.
>>>>> VideoTrackList can have at most one video track selected at a time, so
>>>>> representing this as a VideoTrack would require some additional
>>>>> tweaking to the model.
>>>> The "captions" video track is one that has video and captions rendered
>>>> together, so you only need the one video track active. If you want to
>>>> turn off captions, you merely activate a different video track which
>>>> is one without captions.
>>>>
>>>> There is no change to the model necessary - in fact, it fits perfectly
>>>> to what the spec is currently describing without any change.
>>> Ah, right! Unless I'm misunderstanding again, your suggestion is to
>>> expose extra video tracks with kind captions or subtitles, requiring
>>> no spec change at all. That sounds good to me.
>> Yes, that was my suggestion for dealing with UA rendered tracks.
>
> Doesn't this still leave us with the issue: "if you have N video tracks
> and m caption tracks in
> the media file, you'd have to expose NxM videoTracks in the interface"?

Right, that was the original concern. But how realistic is the
situation of n video tracks and m caption tracks with n being larger
than 2 or 3 without a change of the audio track anyway?

> We would also need to consider:
>
>   * How do you label this combined video and text track?

That's not specific to the approach that we pick and will always need
to be decided. Note that label isn't something that needs to be unique
to a track, so you could just use the same label for all burnt-in
video tracks and identify them to be different only in the language.

>   * What is the track's "id"?

This would need to be unique, but I think it will be easy to come up
with a scheme that works. Something like "video_[n]_[captiontrackid]"
could work.

>   * How do you present this to users in a way that isn't confusing?

No different to presenting caption tracks.

>   * What if the video track's kind isn't "main"? For example, what if we
> have a sign language track and we also want to display captions?
> What is the generated track's kind?

How would that work? Are you saying we're not displaying the main
video, but only displaying the sign language track? Is that realistic
and something anybody would actually do?

>   * The "language" attribute could also have conflicts.

How so?

>   * I think it might also be possible to create files where the video
> track and text track are different lengths, so we'd need to figure
> out what to do when one of them ends.

The timeline of a video is well defined in the spec - I don't think we
need to do more than what is already defined.

Silvia.
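
[Editor's note: a small sketch of the "video_[n]_[captiontrackid]" id
scheme Silvia suggests above, together with the reverse parsing that a
script-built player with separate video and caption menus would need;
both helper functions are hypothetical.]

  // Compose an id for the burnt-in combination of video track n and a
  // caption track, following the example format from this message.
  function combinedId(videoIndex, captionTrackId) {
    return 'video_' + videoIndex + '_' + captionTrackId;
  }

  // Split a combined id back apart, e.g. to rebuild separate menus.
  function parseCombinedId(id) {
    var match = /^video_(\d+)_(.+)$/.exec(id);
    return match ? { videoIndex: Number(match[1]),
                     captionTrackId: match[2] } : null;
  }

  parseCombinedId(combinedId(0, 'cc1'));
  // -> { videoIndex: 0, captionTrackId: 'cc1' }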


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-11-03 Thread Brendan Long

On 10/27/2014 08:43 PM, Silvia Pfeiffer wrote:
> On Tue, Oct 28, 2014 at 2:41 AM, Philip Jägenstedt  wrote:
>> On Sun, Oct 26, 2014 at 8:28 AM, Silvia Pfeiffer
>>  wrote:
>>> On Thu, Oct 23, 2014 at 2:01 AM, Philip Jägenstedt  
>>> wrote:
>>>> On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
>>>>  wrote:
>>>>> Using the VideoTrack interface it would list them as a kind="captions"
>>>>> and would thus also be able to be activated by JavaScript. The
>>>>> downside would be that if you have N video tracks and m caption tracks in
>>>>> the media file, you'd have to expose NxM videoTracks in the interface.
>>>> VideoTrackList can have at most one video track selected at a time, so
>>>> representing this as a VideoTrack would require some additional
>>>> tweaking to the model.
>>> The "captions" video track is one that has video and captions rendered
>>> together, so you only need the one video track active. If you want to
>>> turn off captions, you merely activate a different video track which
>>> is one without captions.
>>>
>>> There is no change to the model necessary - in fact, it fits perfectly
>>> to what the spec is currently describing without any change.
>> Ah, right! Unless I'm misunderstanding again, your suggestion is to
>> expose extra video tracks with kind captions or subtitles, requiring
>> no spec change at all. That sounds good to me.
> Yes, that was my suggestion for dealing with UA rendered tracks.
Doesn't this still leave us with the issue: "if you have N video tracks
and m caption tracks in
the media file, you'd have to expose NxM videoTracks in the interface"?
We would also need to consider:

  * How do you label this combined video and text track?
  * What is the track's "id"?
  * How do you present this to users in a way that isn't confusing?
  * What if the video track's kind isn't "main"? For example, what if we
have a sign language track and we also want to display captions?
What is the generated track's kind?
  * The "language" attribute could also have conflicts.
  * I think it might also be possible to create files where the video
track and text track are different lengths, so we'd need to figure
out what to do when one of them ends.



Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-27 Thread Silvia Pfeiffer
On Tue, Oct 28, 2014 at 2:41 AM, Philip Jägenstedt  wrote:
> On Sun, Oct 26, 2014 at 8:28 AM, Silvia Pfeiffer
>  wrote:
>>
>> On Thu, Oct 23, 2014 at 2:01 AM, Philip Jägenstedt  wrote:
>> > On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
>> >  wrote:
>> >>
>> >> Hi all,
>> >>
>> >> In the Inband Text Tracks Community Group we've recently had a
>> >> discussion about a proposal by HbbTV. I'd like to bring it up here to
>> >> get some opinions on how to resolve the issue.
>> >>
>> >> (The discussion thread is at
>> >> http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
>> >> , but let me summarize it here, because it's a bit spread out.)
>> >>
>> >> The proposed use case is as follows:
>> >> * there are MPEG-2 files that have an audio, a video and several caption 
>> >> tracks
>> >> * the caption tracks are not in WebVTT format but in formats that
>> >> existing Digital TV receivers are already capable of decoding and
>> >> displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
>> >> * there is no intention to standardize a TextTrackCue format for those
>> >> other formats (statements are: there are too many formats to deal
>> >> with, a set-top-box won't need access to cues)
>> >>
>> >> The request was to expose such caption tracks as textTracks:
>> >> interface HTMLMediaElement : HTMLElement {
>> >> ...
>> >>   readonly attribute TextTrackList textTracks;
>> >> ...
>> >> }
>> >>
>> >> Then, the TextTrack interface would list them as a kind="captions",
>> >> but without any cues, since they're not exposed. This then allows
>> >> turning the caption tracks on/off via JavaScript. However, for
>> >> JavaScript it is indistinguishable from a text track that has no
>> >> captions. So the suggestion was to introduce a new kind="UARendered".
>> >>
>> >>
>> >> My suggestion was to instead treat such tracks as burnt-in video
>> >> tracks (by combination with the main video track):
>> >> interface HTMLMediaElement : HTMLElement {
>> >> ...
>> >>
>> >> readonly attribute VideoTrackList videoTracks;
>> >> ...
>> >> }
>> >>
>> >> Using the VideoTrack interface it would list them as a kind="captions"
>> >> and would thus also be able to be activated by JavaScript. The
>> >> downside would be that if you have N video tracks and m caption tracks in
>> >> the media file, you'd have to expose NxM videoTracks in the interface.
>> >>
>> >>
>> >> So, given this, should we introduce a kind="UARendered" or expose such
>> >> tracks as videoTracks or is there another solution that we're
>> >> overlooking?
>> >
>> > VideoTrackList can have at most one video track selected at a time, so
>> > representing this as a VideoTrack would require some additional
>> > tweaking to the model.
>>
>> The "captions" video track is one that has video and captions rendered
>> together, so you only need the one video track active. If you want to
>> turn off captions, you merely activate a different video track which
>> is one without captions.
>>
>> There is no change to the model necessary - in fact, it fits perfectly
>> to what the spec is currently describing without any change.
>
> Ah, right! Unless I'm misunderstanding again, your suggestion is to
> expose extra video tracks with kind captions or subtitles, requiring
> no spec change at all. That sounds good to me.


Yes, that was my suggestion for dealing with UA rendered tracks.

Cheers,
Silvia.


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-27 Thread Philip Jägenstedt
On Sun, Oct 26, 2014 at 8:28 AM, Silvia Pfeiffer
 wrote:
>
> On Thu, Oct 23, 2014 at 2:01 AM, Philip Jägenstedt  wrote:
> > On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
> >  wrote:
> >>
> >> Hi all,
> >>
> >> In the Inband Text Tracks Community Group we've recently had a
> >> discussion about a proposal by HbbTV. I'd like to bring it up here to
> >> get some opinions on how to resolve the issue.
> >>
> >> (The discussion thread is at
> >> http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
> >> , but let me summarize it here, because it's a bit spread out.)
> >>
> >> The proposed use case is as follows:
> >> * there are MPEG-2 files that have an audio, a video and several caption 
> >> tracks
> >> * the caption tracks are not in WebVTT format but in formats that
> >> existing Digital TV receivers are already capable of decoding and
> >> displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
> >> * there is no intention to standardize a TextTrackCue format for those
> >> other formats (statements are: there are too many formats to deal
> >> with, a set-top-box won't need access to cues)
> >>
> >> The request was to expose such caption tracks as textTracks:
> >> interface HTMLMediaElement : HTMLElement {
> >> ...
> >>   readonly attribute TextTrackList textTracks;
> >> ...
> >> }
> >>
> >> Then, the TextTrack interface would list them as a kind="captions",
> >> but without any cues, since they're not exposed. This then allows
> >> turning the caption tracks on/off via JavaScript. However, for
> >> JavaScript it is indistinguishable from a text track that has no
> >> captions. So the suggestion was to introduce a new kind="UARendered".
> >>
> >>
> >> My suggestion was to instead treat such tracks as burnt-in video
> >> tracks (by combination with the main video track):
> >> interface HTMLMediaElement : HTMLElement {
> >> ...
> >>
> >> readonly attribute VideoTrackList videoTracks;
> >> ...
> >> }
> >>
> >> Using the VideoTrack interface it would list them as a kind="captions"
> >> and would thus also be able to be activated by JavaScript. The
> >> downside would be that if you have N video tracks and m caption tracks in
> >> the media file, you'd have to expose NxM videoTracks in the interface.
> >>
> >>
> >> So, given this, should we introduce a kind="UARendered" or expose such
> >> tracks as videoTracks or is there another solution that we're
> >> overlooking?
> >
> > VideoTrackList can have at most one video track selected at a time, so
> > representing this as a VideoTrack would require some additional
> > tweaking to the model.
>
> The "captions" video track is one that has video and captions rendered
> together, so you only need the one video track active. If you want to
> turn off captions, you merely activate a different video track which
> is one without captions.
>
> There is no change to the model necessary - in fact, it fits perfectly
> to what the spec is currently describing without any change.

Ah, right! Unless I'm misunderstanding again, your suggestion is to
expose extra video tracks with kind captions or subtitles, requiring
no spec change at all. That sounds good to me.

Philip


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-26 Thread Silvia Pfeiffer
On Thu, Oct 23, 2014 at 2:33 AM, Bob Lund  wrote:
>
>
> On 10/22/14, 9:01 AM, "Philip Jägenstedt"  wrote:
>
>>On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
>> wrote:
>>>
>>> Hi all,
>>>
>>> In the Inband Text Tracks Community Group we've recently had a
>>> discussion about a proposal by HbbTV. I'd like to bring it up here to
>>> get some opinions on how to resolve the issue.
>>>
>>> (The discussion thread is at
>>>
>>>http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
>>> , but let me summarize it here, because it's a bit spread out.)
>>>
>>> The proposed use case is as follows:
>>> * there are MPEG-2 files that have an audio, a video and several
>>>caption tracks
>>> * the caption tracks are not in WebVTT format but in formats that
>>> existing Digital TV receivers are already capable of decoding and
>>> displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
>>> * there is no intention to standardize a TextTrackCue format for those
>>> other formats (statements are: there are too many formats to deal
>>> with, a set-top-box won't need access to cues)
>>>
>>> The request was to expose such caption tracks as textTracks:
>>> interface HTMLMediaElement : HTMLElement {
>>> ...
>>>   readonly attribute TextTrackList textTracks;
>>> ...
>>> }
>>>
>>> Then, the TextTrack interface would list them as a kind="captions",
>>> but without any cues, since they're not exposed. This then allows
>>> turning the caption tracks on/off via JavaScript. However, for
>>> JavaScript it is indistinguishable from a text track that has no
>>> captions. So the suggestion was to introduce a new kind="UARendered".
>>>
>>>
>>> My suggestion was to instead treat such tracks as burnt-in video
>>> tracks (by combination with the main video track):
>>> interface HTMLMediaElement : HTMLElement {
>>> ...
>>>
>>> readonly attribute VideoTrackList videoTracks;
>>> ...
>>> }
>>>
>>> Using the VideoTrack interface it would list them as a kind="captions"
>>> and would thus also be able to be activated by JavaScript. The
>>> downside would be that if you have N video tracks and m caption tracks in
>>> the media file, you'd have to expose NxM videoTracks in the interface.
>>>
>>>
>>> So, given this, should we introduce a kind="UARendered" or expose such
>>> tracks as videoTracks or is there another solution that we're
>>> overlooking?
>>
>>VideoTrackList can have at most one video track selected at a time, so
>>representing this as a VideoTrack would require some additional
>>tweaking to the model.
>>
>>A separate text track kind seems better, but wouldn't it still be
>>useful to distinguish between captions and subtitles even if the
>>underlying data is unavailable?
>
> This issue was clarified here [1]. TextTrack.mode would be set
> "uarendered". TextTrack.kind would still reflect "captions" or "subtitles".

OK, right that's another approach and probably better than introducing
a different kind.

> [1]
> http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Oct/0154.html
>
>>
>>Philip
>


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-26 Thread Silvia Pfeiffer
On Thu, Oct 23, 2014 at 2:01 AM, Philip Jägenstedt  wrote:
> On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
>  wrote:
>>
>> Hi all,
>>
>> In the Inband Text Tracks Community Group we've recently had a
>> discussion about a proposal by HbbTV. I'd like to bring it up here to
>> get some opinions on how to resolve the issue.
>>
>> (The discussion thread is at
>> http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
>> , but let me summarize it here, because it's a bit spread out.)
>>
>> The proposed use case is as follows:
>> * there are MPEG-2 files that have an audio, a video and several caption 
>> tracks
>> * the caption tracks are not in WebVTT format but in formats that
>> existing Digital TV receivers are already capable of decoding and
>> displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
>> * there is no intention to standardize a TextTrackCue format for those
>> other formats (statements are: there are too many formats to deal
>> with, a set-top-box won't need access to cues)
>>
>> The request was to expose such caption tracks as textTracks:
>> interface HTMLMediaElement : HTMLElement {
>> ...
>>   readonly attribute TextTrackList textTracks;
>> ...
>> }
>>
>> Then, the TextTrack interface would list them as a kind="captions",
>> but without any cues, since they're not exposed. This then allows
>> turning the caption tracks on/off via JavaScript. However, for
>> JavaScript it is indistinguishable from a text track that has no
>> captions. So the suggestion was to introduce a new kind="UARendered".
>>
>>
>> My suggestion was to instead treat such tracks as burnt-in video
>> tracks (by combination with the main video track):
>> interface HTMLMediaElement : HTMLElement {
>> ...
>>
>> readonly attribute VideoTrackList videoTracks;
>> ...
>> }
>>
>> Using the VideoTrack interface it would list them as a kind="captions"
>> and would thus also be able to be activated by JavaScript. The
>> downside would be that if you have N video tracks and m caption tracks in
>> the media file, you'd have to expose NxM videoTracks in the interface.
>>
>>
>> So, given this, should we introduce a kind="UARendered" or expose such
>> tracks as videoTracks or is there another solution that we're
>> overlooking?
>
> VideoTrackList can have at most one video track selected at a time, so
> representing this as a VideoTrack would require some additional
> tweaking to the model.

The "captions" video track is one that has video and captions rendered
together, so you only need the one video track active. If you want to
turn off captions, you merely activate a different video track which
is one without captions.

There is no change to the model necessary - in fact, it fits perfectly
to what the spec is currently describing without any change.


> A separate text track kind seems better, but wouldn't it still be
> useful to distinguish between captions and subtitles even if the
> underlying data is unavailable?

As stated, the proposal was to introduce kind="UARendered" and that
would introduce a change to the spec.

Regards,
Silvia.
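
[Editor's note: a minimal sketch of what this video-track approach would
look like from script, assuming the UA exposes each burnt-in video+caption
combination as an extra VideoTrack with kind "captions" or "subtitles" as
Silvia suggests; the helper names are hypothetical.]

  var video = document.querySelector('video');
  var tracks = video.videoTracks; // VideoTrackList; at most one selected

  // Turn captions on by selecting a burnt-in track in the wanted language.
  // Setting selected = true automatically deselects the current track.
  function showBurntInCaptions(language) {
    for (var i = 0; i < tracks.length; i++) {
      var t = tracks[i];
      if ((t.kind === 'captions' || t.kind === 'subtitles') &&
          t.language === language) {
        t.selected = true;
        return true;
      }
    }
    return false; // the UA exposed no such combination
  }

  // Turn captions off again by going back to the plain "main" video track.
  function hideBurntInCaptions() {
    for (var i = 0; i < tracks.length; i++) {
      if (tracks[i].kind === 'main') {
        tracks[i].selected = true;
        return;
      }
    }
  }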


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-23 Thread Philip Jägenstedt
On Wed, Oct 22, 2014 at 5:33 PM, Bob Lund  wrote:
>
>
> On 10/22/14, 9:01 AM, "Philip Jägenstedt"  wrote:
>
>>On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
>> wrote:
>>>
>>> Hi all,
>>>
>>> In the Inband Text Tracks Community Group we've recently had a
>>> discussion about a proposal by HbbTV. I'd like to bring it up here to
>>> get some opinions on how to resolve the issue.
>>>
>>> (The discussion thread is at
>>>
>>>http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
>>> , but let me summarize it here, because it's a bit spread out.)
>>>
>>> The proposed use case is as follows:
>>> * there are MPEG-2 files that have an audio, a video and several
>>>caption tracks
>>> * the caption tracks are not in WebVTT format but in formats that
>>> existing Digital TV receivers are already capable of decoding and
>>> displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
>>> * there is no intention to standardize a TextTrackCue format for those
>>> other formats (statements are: there are too many formats to deal
>>> with, a set-top-box won't need access to cues)
>>>
>>> The request was to expose such caption tracks as textTracks:
>>> interface HTMLMediaElement : HTMLElement {
>>> ...
>>>   readonly attribute TextTrackList textTracks;
>>> ...
>>> }
>>>
>>> Then, the TextTrack interface would list them as a kind="captions",
>>> but without any cues, since they're not exposed. This then allows
>>> turning the caption tracks on/off via JavaScript. However, for
>>> JavaScript it is indistinguishable from a text track that has no
>>> captions. So the suggestion was to introduce a new kind="UARendered".
>>>
>>>
>>> My suggestion was to instead treat such tracks as burnt-in video
>>> tracks (by combination with the main video track):
>>> interface HTMLMediaElement : HTMLElement {
>>> ...
>>>
>>> readonly attribute VideoTrackList videoTracks;
>>> ...
>>> }
>>>
>>> Using the VideoTrack interface it would list them as a kind="captions"
>>> and would thus also be able to be activated by JavaScript. The
>>> downside would be that if you have N video tracks and m caption tracks in
>>> the media file, you'd have to expose NxM videoTracks in the interface.
>>>
>>>
>>> So, given this, should we introduce a kind="UARendered" or expose such
>>> tracks as videoTracks or is there another solution that we're
>>> overlooking?
>>
>>VideoTrackList can have at most one video track selected at a time, so
>>representing this as a VideoTrack would require some additional
>>tweaking to the model.
>>
>>A separate text track kind seems better, but wouldn't it still be
>>useful to distinguish between captions and subtitles even if the
>>underlying data is unavailable?
>
> This issue was clarified here [1]. TextTrack.mode would be set
> "uarendered". TextTrack.kind would still reflect "captions" or "subtitles".
>
> [1]
> http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Oct/0154.html

Oops, I missed that.

I was under the impression that the ability for scripts to detect this
situation was the motivation for a spec change. If there are multiple
tracks, most likely all but one will be "disabled" initially, which
would be indistinguishable from a disabled track with no cues. Since
TextTrack.mode is mutable, even when it is initially "uarendered",
scripts would have to remember that before disabling the track, which
seems a bit inconvenient.
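
[Editor's note: a short sketch of the bookkeeping Philip describes above,
assuming the proposed "uarendered" mode value existed; it is not part of
the current spec.]

  // Once a track's mode is "disabled", nothing distinguishes a UA-rendered
  // track from an ordinary cueless disabled track, so a script that toggles
  // captions has to remember what each track started as.
  var savedModes = [];

  function disableTrack(track) {
    savedModes.push({ track: track, mode: track.mode });
    track.mode = 'disabled';
  }

  function restoreTrack(track) {
    for (var i = 0; i < savedModes.length; i++) {
      if (savedModes[i].track === track) {
        track.mode = savedModes[i].mode; // e.g. back to 'uarendered'
        return;
      }
    }
  }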

P.S. Your mails have an encoding problem resulting in superscript
numbers instead of quotes.

Philip


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-22 Thread Bob Lund


On 10/22/14, 9:01 AM, "Philip Jägenstedt"  wrote:

>On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
> wrote:
>>
>> Hi all,
>>
>> In the Inband Text Tracks Community Group we've recently had a
>> discussion about a proposal by HbbTV. I'd like to bring it up here to
>> get some opinions on how to resolve the issue.
>>
>> (The discussion thread is at
>> 
>>http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
>> , but let me summarize it here, because it's a bit spread out.)
>>
>> The proposed use case is as follows:
>> * there are MPEG-2 files that have an audio, a video and several
>>caption tracks
>> * the caption tracks are not in WebVTT format but in formats that
>> existing Digital TV receivers are already capable of decoding and
>> displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
>> * there is no intention to standardize a TextTrackCue format for those
>> other formats (statements are: there are too many formats to deal
>> with, a set-top-box won't need access to cues)
>>
>> The request was to expose such caption tracks as textTracks:
>> interface HTMLMediaElement : HTMLElement {
>> ...
>>   readonly attribute TextTrackList textTracks;
>> ...
>> }
>>
>> Then, the TextTrack interface would list them as a kind="captions",
>> but without any cues, since they're not exposed. This then allows
>> turning the caption tracks on/off via JavaScript. However, for
>> JavaScript it is indistinguishable from a text track that has no
>> captions. So the suggestion was to introduce a new kind="UARendered".
>>
>>
>> My suggestion was to instead treat such tracks as burnt-in video
>> tracks (by combination with the main video track):
>> interface HTMLMediaElement : HTMLElement {
>> ...
>>
>> readonly attribute VideoTrackList videoTracks;
>> ...
>> }
>>
>> Using the VideoTrack interface it would list them as a kind="captions"
>> and would thus also be able to be activated by JavaScript. The
>> downside would be that if you have N video tracks and m caption tracks in
>> the media file, you'd have to expose NxM videoTracks in the interface.
>>
>>
>> So, given this, should we introduce a kind="UARendered" or expose such
>> tracks as videoTracks or is there another solution that we're
>> overlooking?
>
>VideoTrackList can have at most one video track selected at a time, so
>representing this as a VideoTrack would require some additional
>tweaking to the model.
>
>A separate text track kind seems better, but wouldn't it still be
>useful to distinguish between captions and subtitles even if the
>underlying data is unavailable?

This issue was clarified here [1]. TextTrack.mode would be set
"uarendered". TextTrack.kind would still reflect "captions" or "subtitles".

[1] 
http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Oct/0154.html

>
>Philip



Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-22 Thread Philip Jägenstedt
On Sun, Oct 12, 2014 at 11:45 AM, Silvia Pfeiffer
 wrote:
>
> Hi all,
>
> In the Inband Text Tracks Community Group we've recently had a
> discussion about a proposal by HbbTV. I'd like to bring it up here to
> get some opinions on how to resolve the issue.
>
> (The discussion thread is at
> http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
> , but let me summarize it here, because it's a bit spread out.)
>
> The proposed use case is as follows:
> * there are MPEG-2 files that have an audio, a video and several caption 
> tracks
> * the caption tracks are not in WebVTT format but in formats that
> existing Digital TV receivers are already capable of decoding and
> displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
> * there is no intention to standardize a TextTrackCue format for those
> other formats (statements are: there are too many formats to deal
> with, a set-top-box won't need access to cues)
>
> The request was to expose such caption tracks as textTracks:
> interface HTMLMediaElement : HTMLElement {
> ...
>   readonly attribute TextTrackList textTracks;
> ...
> }
>
> Then, the TextTrack interface would list them as a kind="captions",
> but without any cues, since they're not exposed. This then allows
> turning the caption tracks on/off via JavaScript. However, for
> JavaScript it is indistinguishable from a text track that has no
> captions. So the suggestion was to introduce a new kind="UARendered".
>
>
> My suggestion was to instead treat such tracks as burnt-in video
> tracks (by combination with the main video track):
> interface HTMLMediaElement : HTMLElement {
> ...
>
> readonly attribute VideoTrackList videoTracks;
> ...
> }
>
> Using the VideoTrack interface it would list them as a kind="captions"
> and would thus also be able to be activated by JavaScript. The
> downside would be that if you have N video tracks and m caption tracks in
> the media file, you'd have to expose NxM videoTracks in the interface.
>
>
> So, given this, should we introduce a kind="UARendered" or expose such
> tracks as videoTracks or is there another solution that we're
> overlooking?

VideoTrackList can have at most one video track selected at a time, so
representing this as a VideoTrack would require some additional
tweaking to the model.

A separate text track kind seems better, but wouldn't it still be
useful to distinguish between captions and subtitles even if the
underlying data is unavailable?

Philip


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-13 Thread Elliott Sprehn
What does "UA rendered" mean? How does the UA render it? Can the UA just
convert the format into WebVTT instead?
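
[Editor's note: a sketch of the conversion Elliott asks about, assuming a
decoder for the inband format is available to script; parseCea708() is a
hypothetical stand-in for such a decoder and is assumed to return
{ start, end, text } records, while addTextTrack() and VTTCue are standard.]

  function exposeAsVtt(video, rawCaptionData) {
    var track = video.addTextTrack('captions', 'Converted captions', 'en');
    parseCea708(rawCaptionData).forEach(function (cue) {
      track.addCue(new VTTCue(cue.start, cue.end, cue.text));
    });
    track.mode = 'showing';
    return track;
  }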

On Mon, Oct 13, 2014 at 11:15 AM, Bob Lund  wrote:

>
>
> On 10/12/14, 3:45 AM, "Silvia Pfeiffer"  wrote:
>
> >Hi all,
> >
> >In the Inband Text Tracks Community Group we've recently had a
> >discussion about a proposal by HbbTV. I'd like to bring it up here to
> >get some opinions on how to resolve the issue.
> >
> >(The discussion thread is at
> >http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
> >, but let me summarize it here, because it's a bit spread out.)
> >
> >The proposed use case is as follows:
> >* there are MPEG-2 files that have an audio, a video and several caption
> >tracks
> >* the caption tracks are not in WebVTT format but in formats that
> >existing Digital TV receivers are already capable of decoding and
> >displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
> >* there is no intention to standardize a TextTrackCue format for those
> >other formats (statements are: there are too many formats to deal
> >with, a set-top-box won't need access to cues)
> >
> >The request was to expose such caption tracks as textTracks:
> >interface HTMLMediaElement : HTMLElement {
> >...
> >  readonly attribute TextTrackList textTracks;
> >...
> >}
> >
> >Then, the TextTrack interface would list them as a kind="captions",
> >but without any cues, since they're not exposed. This then allows
> >turning the caption tracks on/off via JavaScript. However, for
> >JavaScript it is indistinguishable from a text track that has no
> >captions. So the suggestion was to introduce a new kind="UARendered".
>
> A clarification - the suggestion was for a new TextTrack.mode value of
> "UARendered" and for this type of TextTrack the only valid modes would be
> "UARendered" and "disabled". The "hidden" and "showing" modes would not be
> allowed since no Cues are generated. @kind would continue to denote the
> type of TextTrack.
>
> Bob
> >
> >
> >My suggestion was to instead treat such tracks as burnt-in video
> >tracks (by combination with the main video track):
> >interface HTMLMediaElement : HTMLElement {
> >...
> >
> >readonly attribute VideoTrackList videoTracks;
> >...
> >}
> >
> >Using the VideoTrack interface it would list them as a kind="captions"
> >and would thus also be able to be activated by JavaScript. The
> >downside would be that if you have N video tracks and m caption tracks in
> >the media file, you'd have to expose NxM videoTracks in the interface.
> >
> >
> >So, given this, should we introduce a kind="UARendered" or expose such
> >tracks as videoTracks or is there another solution that we're
> >overlooking?
> >
> >Silvia.
>
>


Re: [whatwg] How to expose caption tracks without TextTrackCues

2014-10-13 Thread Bob Lund


On 10/12/14, 3:45 AM, "Silvia Pfeiffer"  wrote:

>Hi all,
>
>In the Inband Text Tracks Community Group we've recently had a
>discussion about a proposal by HbbTV. I'd like to bring it up here to
>get some opinions on how to resolve the issue.
>
>(The discussion thread is at
>http://lists.w3.org/Archives/Public/public-inbandtracks/2014Sep/0008.html
>, but let me summarize it here, because it's a bit spread out.)
>
>The proposed use case is as follows:
>* there are MPEG-2 files that have an audio, a video and several caption
>tracks
>* the caption tracks are not in WebVTT format but in formats that
>existing Digital TV receivers are already capable of decoding and
>displaying (e.g. CEA708, DVB-T, DVB-S, TTML)
>* there is no intention to standardize a TextTrackCue format for those
>other formats (statements are: there are too many formats to deal
>with, a set-top-box won't need access to cues)
>
>The request was to expose such caption tracks as textTracks:
>interface HTMLMediaElement : HTMLElement {
>...
>  readonly attribute TextTrackList textTracks;
>...
>}
>
>Then, the TextTrack interface would list them as a kind="captions",
>but without any cues, since they're not exposed. This then allows
>turning the caption tracks on/off via JavaScript. However, for
>JavaScript it is indistinguishable from a text track that has no
>captions. So the suggestion was to introduce a new kind="UARendered".

A clarification - the suggestion was for a new TextTrack.mode value of
"UARendered" and for this type of TextTrack the only valid modes would be
"UARendered" and "disabled". The "hidden" and "showing" modes would not be
allowed since no Cues are generated. @kind would continue to denote the
type of TextTrack.

Bob
>
>
>My suggestion was to instead treat such tracks as burnt-in video
>tracks (by combination with the main video track):
>interface HTMLMediaElement : HTMLElement {
>...
>
>readonly attribute VideoTrackList videoTracks;
>...
>}
>
>Using the VideoTrack interface it would list them as a kind="captions"
>and would thus also be able to be activated by JavaScript. The
>downside would be that if you have N video tracks and m caption tracks in
>the media file, you'd have to expose NxM videoTracks in the interface.
>
>
>So, given this, should we introduce a kind="UARendered" or expose such
>tracks as videoTracks or is there another solution that we're
>overlooking?
>
>Silvia.
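
[Editor's note: under Bob's clarification above, a UA-rendered track would
only ever move between "uarendered" and "disabled"; "hidden" and "showing"
would not apply because no cues are created. A toggle for such a track
would reduce to the sketch below, where "uarendered" is the proposed value,
not part of the current spec.]

  function toggleUaRendered(track) {
    if (track.mode === 'uarendered') {
      track.mode = 'disabled';
    } else if (track.mode === 'disabled') {
      track.mode = 'uarendered'; // proposed mode value, not in the spec
    }
  }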