On Wed, 08 Sep 2010 01:19:17 +0200, Ian Hickson <i...@hixie.ch> wrote:

On Fri, 23 Jul 2010, Philip Jägenstedt wrote:

http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#attr-track-kind

The distinction between subtitles and captions isn't terribly clear.

It says that subtitles are translations, but plain transcriptions
without cues for the hard of hearing would also be subtitles.

How does one categorize translations that are for the HoH?

I've tried to clarify this.

Thanks, the new definitions look good to me.

Alternatively, might it not be better to simply use the voice "sound"
for this and let the default stylesheet hide those cues? When writing
subtitles I don't want the maintenance overhead of 2 different versions
that differ only by the inclusion of [doorbell rings] and similar.
Honestly, it's more likely that I just wouldn't bother with
accessibility for the HoH at all. If I could add it with <sound>doorbell
rings, it's far more likely I would do that, as long as it isn't
rendered by default. This is my preferred solution, combined with keeping only
one of kind=subtitles and kind=captions. Enabling the HoH cues could
then be a global preference in the browser, or done from the context
menu of individual videos.
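
For concreteness, such a cue might look something like this (just a sketch; a <sound> voice isn't currently defined in the spec):

00:01.000 --> 00:03.000
Hello, anyone home? <sound>doorbell rings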

I don't disagree with this, but I fear it might be too radical a step for
the caption-authoring community to take at this point.

Well, I guess the infrastructure that's now in place is enough to do this just by changing stylesheets.
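
As a rough sketch of what that could look like (assuming the video::cue() selector from the wiki's CSS extensions and a <sound> voice, neither of which is in the spec as such), a default UA or author stylesheet could simply contain:

/* hide [doorbell rings]-style cues unless the user opts in */
video::cue(sound) { display: none; }

A browser preference or context menu entry would then just drop or override that rule.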

If we must have both kind=subtitles and kind=captions, then I'd suggest
making the default subtitles, as that is without a doubt the most common
kind of timed text. Making captions the default only means that most
timed text will be mislabeled as being appropriate for the HoH when it
is not.

Ok, I've changed the default. However, I'm not fighting this battle if it
comes up again, and will just change it back if people don't defend having
this as the default. (And then change it back again if the browsers pick
"subtitles" in their implementations after all, of course.)

Note that captions aren't just for users that are hard-of-hearing. Most of
the time when I use timed tracks, I want captions, because the reason I
have them enabled is that I have the sound muted.

OK, thanks!

On Fri, 23 Jul 2010, Sam Dutton wrote:

Is trackgroup out of the spec?

What is trackgroup?

In the discussion on public-html-a11y, <trackgroup> was suggested as a way to group mutually exclusive tracks, so that enabling one automatically disables the others in the same trackgroup.

I guess it's up to the UA how to enable and disable <track>s now, but the only options are making them all mutually exclusive (as existing players do) or a weird kind of context menu where it's possible to enable and disable tracks completely independently. Neither option is great, but as a user I would almost certainly prefer all tracks being mutually exclusive, with scripts required to enable several at once.

On Fri, 6 Aug 2010, Philip Jägenstedt wrote:

I'm not particularly fond of the current voice markup, mainly for 2
reasons:

First, a cue can only have 1 voice, which makes it impossible to style
cues spoken/sung simultaneously by 2 or more voices. There's a karaoke
example of this in
<http://wiki.whatwg.org/wiki/Use_cases_for_timed_tracks_rendered_over_video_by_the_UA#Multiple_voices>

That's just two cues.

I'm not sure what you're saying. The male singer's cues are in blue, the female singer's are in red and the part sung together is in green. Are you saying that the last cue should be made into two cues, or something else?

I would prefer if voices could be mixed, as such:

00:01.000 --> 00:02.000
<1> Speaker 1

00:03.000 --> 00:04.000
<2> Speaker 2

00:05.000 --> 00:06.000
<1><2> Speaker 1+2

What's the use case?

To use a different style for the cues that are sung together, so that you know when it's your turn to sing. I hope we can throw away the numerical voices, continued below...
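
The styling I have in mind would be something along these lines (assuming the video::cue() selector from the wiki's CSS extensions; exactly how a cue carrying both voices would be matched is the open question):

video::cue(1) { color: blue; }  /* male singer */
video::cue(2) { color: red; }   /* female singer */
/* plus some way to match a cue with both voices, to colour it green */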

Second, it makes it impossible to target a smaller part of the cue for
styling. We have <i> and <b>, but there are also cases where part of the
cue should be in a different color, see
<http://wiki.whatwg.org/wiki/Use_cases_for_timed_tracks_rendered_over_video_by_the_UA#Multiple_colors>

Well you can always restyle <i> or <b>.

That would be quite an abuse of <i> and <b> and would give bogus italics/bold text in standalone players.

If one allows multiple voices, it's not hard to predict that people will
start using magic numbers just to work around this, which would both be
wrong semantically and ugly to look at:

00:01.000 --> 00:02.000
<1> I like <1234>blue</1234> words.

They'd then target 1234 with CSS to color it blue.
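
Presumably with something like this (again assuming the video::cue() syntax from the wiki's CSS extensions):

video::cue(1234) { color: blue; }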

I'm not sure of the best solution. I'd quite like the ability to use
arbitrary voices, e.g. to use the names/initials of the speaker rather
than a number, or to use e.g. <shouting> in combination with CSS :before
{ content: 'Shouting: ' } or similar to adapt the display for different
audiences (accessibility, basically).

Yeah, there are some difficult-to-satisfy constraints here. On the one
hand having a predefined set of voices leads to better semantics,
usability for authors, and accessibility; on the other hand we need
something open-ended because we can't think of everything. We also have to
make sure we don't enable voices to conflict with future tag names, so
whatever we do that's open-ended would have to use a specific syntax (like
being all numbers, which is what I currently have). I'm not sure how to
improve on what we have now, but it's certainly not perfect.


On Wed, 11 Aug 2010, Philip Jägenstedt wrote:

What should numerical voices be replaced with? Personally I'd much
rather write <philip> and <silvia> to mark up a conversation between us
two, as I think it'd be quite hard to keep track of the numbers if
editing subtitles with many different speakers.

We could say that a custom voice has to start with some punctuation or
other, say <:philip>?

Yes, that would be better than numerical voices IMO. Unless there's a very good reason for making voices always apply to the whole cue, could we not use the same parsing for voices and other tags (i, b, ruby, rt)?

Ideally, the CSS extensions (http://wiki.whatwg.org/wiki/Timed_tracks#CSS_extensions) should also work the same for voices and tags; the normal child selectors would then just work. Something like video::cue(narrator > i) could style the following cue:

00:01.000 --> 00:02.000
<narrator><i>The story begins

I'm not sure what constraints CSS syntax puts on the prefix for custom voices; is : safe? Other options might be <@philip> (Twitter style) or <-philip> (vendor prefix style).
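
For reference, a short dialogue marked up with punctuation-prefixed voices might look like this (purely illustrative; none of these prefixes are in the spec):

00:01.000 --> 00:02.000
<:philip> Shall we use names instead of numbers?

00:03.000 --> 00:04.000
<:silvia> Yes, numbers get hard to keep track of.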

On Tue, 24 Aug 2010, Philip Jägenstedt wrote:

Here's the SRT research I promised:
http://blog.foolip.org/2010/08/20/srt-research/

Awesome! Thanks for this.

Addressing points in the same order:

 - charset: resolved by introducing a charset override.

Oh well, that's better than sniffing the encoding or trusting Content-Type I guess.

 - blank lines not separating cues: I couldn't find a client that
   supported missing the blank line, so I didn't support that. It's a
   small number of files, and a small number of cues within those files,
   I presume, so I'm not too worried.

Indeed, I couldn't find one either; the players I tested instead rendered the timing line and the following cue text together with the previous cue, just like a WebSRT implementation would. What we could do to slightly improve the situation is to make --> invalid in the cue text, so that validators could warn about this. That would require adding a &gt; escape for >, though, so I'm not sure it's worth it. Perhaps validators could warn about it regardless of what the spec says.
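
A constructed example of what such a file looks like (the blank line between the cues is missing, so the second timing line ends up as cue text of the first cue):

00:01.000 --> 00:02.000
First cue
00:03.000 --> 00:04.000
Second cue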

 - overlapping cues: supporting these is pretty important, so files with
   overlapping cues will just have some weird artefacts on playback.

OK, tools to fix SRT timings already exist, so I guess this is manageable.

The remaining data is interesting but seems to be consistent with our
expectations before WebSRT was specced.

Right.

On Wed, 25 Aug 2010, Philip Jägenstedt wrote:

"The tasks queued by the fetching algorithm on the networking task
source to process the data as it is being fetched must examine the
resource's Content Type metadata, once it is available, if it ever is.
If no Content Type metadata is ever available, or if the type is not
recognised as a timed track format, then the resource's format must be
assumed to be unsupported (this causes the load to fail, as described
below)."

In other words, browsers should have a whitelist of supported text track
formats, just like they should for audio and video formats. (Note though
that Safari and Chrome ignore the MIME type for audio/video and will
likely continue to do so.)

It seems to me that a side-effect of this is that it will be impossible to
test <track> on a local file system, as there's no MIME type and
browsers aren't allowed to sniff. Surely this can't be the intention,
Hixie?

Local file systems generally use extensions to declare file types (at
least, on Windows and Mac OS X).

On Wed, 25 Aug 2010, Philip Jägenstedt wrote:

The main reason to care about the MIME type is some kind of "doing the
right thing" by not letting people get away with misconfigured servers.
Sometimes I feel it's just a waste of everyone's time, though; it would
generally be less work for both browsers and authors not to bother.

Agreed. Not sure what to do for WebSRT though, since there's no good way
to recognise a WebSRT file as opposed to some other format.

In a <track> context, ignoring Content-Type is certainly the simplest option and removes the need to require any specific file extension for local use. Sniffing isn't really an issue, since in a top-level context you can't do much of anything interesting with SRT except display it as text (which text/plain would achieve).

--
Philip Jägenstedt
Core Developer
Opera Software
