Re: [whatwg] Timed tracks: feedback compendium

Philip Jägenstedt Tue, 14 Sep 2010 01:11:28 -0700

On Mon, 13 Sep 2010 15:50:09 +0200, Silvia Pfeiffer <[email protected]> wrote:

On Mon, Sep 13, 2010 at 5:55 PM, Philip Jägenstedt <[email protected]> wrote:
On Sat, 11 Sep 2010 01:27:48 +0200, Silvia Pfeiffer <[email protected]> wrote:

On Fri, Sep 10, 2010 at 11:00 PM, Philip Jägenstedt <[email protected]> wrote:

On Thu, 09 Sep 2010 15:08:43 +0200, Silvia Pfeiffer <[email protected]> wrote:

 On Wed, Sep 8, 2010 at 9:19 AM, Ian Hickson <[email protected]> wrote:
 On Fri, 23 Jul 2010, Philip Jägenstedt wrote:
> If we must have both kind=subtitles and kind=captions, then I'd suggest
> making the default subtitles, as that is without a doubt the most common
> kind of timed text. Making captions the default only means that most
> timed text will be mislabeled as being appropriate for the HoH when it
> is not.
Ok, I've changed the default. However, I'm not fighting this battle if it
comes up again, and will just change it back if people don't defend having
this as the default. (And then change it back again if the browsers pick
"subtitles" in their implementations after all, of course.)
Note that captions aren't just for users that are hard-of-hearing. Most of
the time when I use timed tracks, I want captions, because the reason I
have them enabled is that I have the sound muted.


Hmm, you both have good points. Maybe we should choose something as the
default that is not visible on screen, such as "descriptions"? That would
avoid the issue and make it explicit for people who provide captions or
subtitles that they have to make a choice.
If we want people to make an explicit choice, we should make kind a
required attribute and make browsers ignore <track>s without it. (I think
subtitles is a good default though.)
I think you misunderstood - my explanation probably wasn't very good. I'm
looking at it from the authoring POV.
What I meant was: if I author a text track that is supposed to be visible on
screen as the video plays back and if we choose either @kind=subtitle or
@kind=caption as the default, then I don't have to really think through
what I authored as it will be displayed on screen. This invites people
to not distinguish between whether they authored subtitles or captions,
which is a bad thing, because a deaf user may then get tracks with the wrong
label and expectations. If, however, we choose as a default something that
is not visible on screen, e.g. @kind=description or @kind=metadata, then the
author who wants their text track to be visible on screen has to give it a
label, i.e. make an explicit choice between @kind=subtitle and
@kind=caption. I believe this will lead to more correctly labeled content. I
am therefore strongly against default labeling with either subtitle or
caption. We could make @kind a required attribute instead as you are
saying.
OK, I think we mostly agree. Any default will sometimes be wrong, so to not
have to choose between subtitles and captions, I'd still really prefer if
specific HoH-tags like <sound> can be shown or hidden depending on user
preference. I think that would lead to more content actually being written
for HoH users, as it doesn't require maintaining 2 different files.
Ah, you are talking about some kind of CSS marker for the audio events that
are marked up in a caption file and that could just simply be "display:
none" if they are viewed as a subtitle. Interesting idea... not sure that
matches with the current spec though.

The spec already has <sound>; what's missing is making the default styling of it depend on user preference and making this the recommended way of delivering HoH content.
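
A minimal sketch of that idea in Python, assuming cue text marks sound descriptions with <sound>...</sound> spans that do not cross line breaks. The function name and the show_sounds flag are hypothetical, and a browser would presumably get the same effect through default styling rather than string editing:

```python
import re

# Assumption: sound descriptions are wrapped in <sound>...</sound> and
# never span more than one line of a cue.
SOUND_SPAN = re.compile(r"<sound>(.*?)</sound>")


def render_cue_text(cue_text: str, show_sounds: bool) -> str:
    """Return cue text with sound descriptions shown or hidden."""
    if show_sounds:
        # Caption-style rendering: keep the description, drop the tags.
        return SOUND_SPAN.sub(r"\1", cue_text)
    # Subtitle-style rendering: drop the description entirely, then
    # discard any line that is left empty.
    lines = (SOUND_SPAN.sub("", line).strip() for line in cue_text.splitlines())
    return "\n".join(line for line in lines if line)
```

With this, one file could serve both audiences: the same cue is rendered with show_sounds=True for users who want captions and show_sounds=False for users who only want subtitles.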

  many new files will not play in the software created for the old spec.
As long as we don't add a header, the files will play in most existing
software. Apart from parsers that assume that SRT is plain text (and thus
would be unsuitable for much existing SRT content), what kind of breakage
have you found with WebSRT-specific syntax in existing software?
I think we need to add a header - and possibly other things in the future.
Will we forever have the SRT restrictions hold back the introduction of new
features into WebSRT?
Yes, if we extend SRT we can't break compatibility. However, it seems that
all the extensibility needed already exists, as arbitrary tag names are
handled by the parser.
Your analysis of what format for headers we can introduce without breaking
old SRT files speaks against that. Whatever extensions we introduce beyond
what we currently have will break compatibility with some and increasingly
more old SRT parsing software. Not to speak of format compatibility, which
is already a non-given.


You're right, adding a header breaks SRT compat.

Allowing anything as part of the syntax is a bit
dangerous though, as most unrecognized stuff between cues is likely
broken cues. Validators should warn about it, not treat it as a comment.
I wasn't aware of the effect of the standardised parsing algorithm for
WebSRT allowing "broken cues" to be dealt with. This will effectively mean
that a parser will be required to parse all files that it is given from
beginning to end and discard all non-conformant lines - even if that file
may be a 100GB large movie file. In this case, I would really recommend that
we put a magic identifier at the beginning of WebSRT files so we can be
sure that the intention of the file was to be a WebSRT file. Let's have the
string "WebSRT" at the beginning of the files.
That's a good point. I don't suppose it's a huge problem in practice that
errors can't be detected until EOF, but it's certainly not a desirable
feature. To maintain some sanity, we probably ought to either require the
correct MIME type or require the correct magic bytes. From the <video> MIME
type debacle, I think I slightly prefer magic bytes to be checked by the
parser.
I've also argued for the inclusion of metadata, so I'm beginning to warm up
to the idea of adding a header beginning with "WebSRT" or some such. If we
do this, no existing SRT content can be reused, but we can still try to make
it possible for WebSRT files to be reusable in desktop applications, by
keeping the syntax highly compatible so that the same parser can be used for
both without a mode switch.
Sounds good to me. I'm sure browsers would find a way to have old SRT files slip through the cracks, but that's not what we should be specifying for. SRT could IMHO be a second format to support in <track> elements, but WebSRT should be the baseline.

The point of a header is that browsers can identify WebSRT files and not keep parsing through a 100GB movie file, so if we do add a header then no existing SRT files will work. I certainly don't want to support SRT and WebSRT as *different* formats.
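
To illustrate the early-exit behaviour a magic string would allow, here is a rough Python sketch. It assumes the magic string is exactly "WebSRT" on a line of its own, as proposed in this thread; the function name is hypothetical and nothing here is taken from a spec:

```python
# Assumed magic string, per the proposal in this thread.
MAGIC = b"WebSRT"


def looks_like_websrt(first_bytes: bytes) -> bool:
    """Decide from the first few bytes whether to keep parsing."""
    if not first_bytes.startswith(MAGIC):
        return False
    # The magic string must be the whole first line, so the next byte
    # (if there is one) has to be a line terminator.
    rest = first_bytes[len(MAGIC):]
    return rest == b"" or rest[:1] in (b"\r", b"\n")
```

Only the first handful of bytes is inspected, so a mislabelled 100GB movie file is rejected immediately instead of being scanned to EOF. A plain SRT file, which starts with a cue number, is rejected by the same check.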

So, thinking about that header: from your analysis of the existing files:
did you have many starting with @.. ?

22/10000 files have lines starting with @, but since this is only in the header, I don't think it matters.

I'd be happy for the name-value pairs spec that Ian mentioned, which could
then lead to something like the following as header:

WebSRT
@language --> en-US
@kind --> subtitle
@cueformat --> plain/minimal/metadata
@author --> Frank, Charlie, Anna
@date --> 20th September 2010
@copyright --> WGBH, 2010
@license --> CC-BY-SA, http://creativecommons.org/licenses/by-sa/3.0/

I'd say that the simplest approach is probably requiring the first line to be "WebSRT", and then all lines up to the first blank line are defined as the header. I'm not sure what the point of using @ is, and using --> here seems weird as it's used for a range in the timing line, something quite different. I think the following would be simpler:


WebSRT
language: en-US
author: Frank
date: 2010-09-20

(allowing free form dates makes it non-machine-readable, so why bother?)
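
A sketch of how such a header could be read, in Python. It assumes the rules suggested above (first line "WebSRT", then "name: value" lines up to the first blank line); the function name is hypothetical, and skipping lines that have no colon is an assumption, not something settled in this thread:

```python
def parse_header(text: str) -> tuple[dict[str, str], str]:
    """Split a file into (header fields, remaining cue text)."""
    lines = text.splitlines()
    if not lines or lines[0] != "WebSRT":
        raise ValueError("not a WebSRT file: first line must be 'WebSRT'")

    header: dict[str, str] = {}
    i = 1
    # Every line up to the first blank line belongs to the header.
    while i < len(lines) and lines[i].strip() != "":
        # Split on the first colon only, so values may contain colons
        # (for example a URL).
        name, sep, value = lines[i].partition(":")
        if sep:  # assumption: lines without a colon are ignored
            header[name.strip()] = value.strip()
        i += 1

    # Skip the blank separator line; the rest is the cue data.
    body = "\n".join(lines[i + 1:])
    return header, body
```

For the example header above this yields {"language": "en-US", "author": "Frank", "date": "2010-09-20"}, and a file that starts with a cue number instead of "WebSRT" raises ValueError on the first line.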

Further, with your analysis, it seemed like the following could be
acceptable for comments:

// Lines starting with // are comments

Yes, but do we need comments in the cues at all? Since SRT has no comments, this would make the cue format incompatible too, in which case we can just stop pretending that there's any relationship to SRT.


--
Philip Jägenstedt
Core Developer
Opera Software
