On Sat, 07 Aug 2010 09:57:39 +0200, Silvia Pfeiffer <[email protected]> wrote:

Hi Philip,

On Sat, Aug 7, 2010 at 1:50 AM, Philip Jägenstedt <[email protected]> wrote:

* there is a possibility to provide script that just affects the
time-synchronized text resource


I agree that some metadata would be useful, more on that below. I'm not
sure why we would want to run scripts inside the text document, though, when
that can be accomplished by using the TimedTrack API from the containing
page.



Scripts inside a timed text document would only be useful for applications
that use the track outside the context of a Web page.

Do you mean that media players could include a JavaScript engine just for supporting scripts in WebSRT? Not to say that it can't happen, but it seems a bit unlikely.

2. There is a natural mapping of WebSRT into in-band text tracks.
Each cue naturally maps into an encoding page (just like a WMML cue does,
too). But in WebSRT, because the setup information is not carried in a
too). But in WebSRT, because the setup information is not brought in a
hierarchical element surrounding all cues, it is easier to just chuck
anything that comes before the first cue into an encoding header page. For
WMML, this problem can be solved, but it is less natural.


I really like the idea of letting everything before the first timestamp in
WebSRT be interpreted as the header. I'd want to use it like this:

# author: Fan Subber
# voices: <1> Boy
#         <2> Girl

01:23:45.678 --> 01:23:46.789
<1> Hello

01:23:48.910 --> 01:23:49.101
<2> Hello

It's not critical that the format of the header be machine-readable, but we
could of course make up a key-value syntax, use JSON, or something else.
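For what it's worth, the timing lines in the example above are already
trivially machine-readable; a sketch of parsing them (function names and
error handling are mine, this is not the spec's parsing algorithm) could
look like:

```javascript
// Parse "01:23:45.678" into milliseconds, and a timing line such as
// "01:23:45.678 --> 01:23:46.789" into { start, end }.
// Sketch only; not the spec's cue timing parser.
function parseTimestamp(s) {
  var m = /^(\d{2}):(\d{2}):(\d{2})\.(\d{3})$/.exec(s);
  if (!m) return null;
  return ((+m[1] * 60 + +m[2]) * 60 + +m[3]) * 1000 + +m[4];
}

function parseTimingLine(line) {
  var parts = line.split(' --> ');
  if (parts.length !== 2) return null;
  var start = parseTimestamp(parts[0]);
  var end = parseTimestamp(parts[1]);
  return (start === null || end === null) ? null : { start: start, end: end };
}
```

For the first cue above this would give a start time of 5025678 ms.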



I disagree. I think it's absolutely necessary that the format of the header
be machine-readable, just as EXIF in images and ID3 in MP3 are
machine-readable. It would be counter-productive not to have it
machine-readable, and in particular useless to archiving and media
management solutions.

OK, so maybe key-values?

Author: Fan Subber
Voice: <1> Boy
Voice: <2> Girl

01:23:45.678 --> 01:23:46.789
<1> Hello

This looks a bit like HTTP headers. (I'm not sure I'd actually want to allow multiple occurrences of the same key; in practice that seems to result in inconsistencies in how people mark up multiple authors.)
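A non-browser tool could then read such a header with something like the
following sketch (the rule that the header ends at the first timing line is
my assumption here, not spec text):

```javascript
// Collect "Key: value" lines appearing before the first timing line.
// Sketch only; the WebSRT parser is not currently defined to do this.
function parseHeader(text) {
  var header = {};
  var lines = text.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i];
    if (line.indexOf('-->') !== -1) break; // first timing line: header ends
    var m = /^([^:]+):\s*(.*)$/.exec(line);
    if (m) {
      // Last occurrence wins, which is exactly where repeated keys
      // like multiple "Voice:" lines would lose information.
      header[m[1].trim()] = m[2];
    }
  }
  return header;
}
```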

I'm not sure of the best solution. I'd quite like the ability to use
arbitrary voices, e.g. to use the names/initials of the speaker rather than
a number, or to use e.g. <shouting> in combination with CSS :before {
content 'Shouting: ' } or similar to adapt the display for different
audiences (accessibility, basically).



I agree. I think we can go back to using <span> and @class and @id and that
would solve it all.

I guess this is in support of Henri's proposal of parsing the cue using the HTML fragment parser (same as innerHTML)? That would be easy to implement, but how do we then mark up speakers? Using <span class="narrator"></span> around each cue is very verbose. HTML isn't very good for marking up dialog, which is quite a limitation when dealing with subtitles...
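To make the comparison concrete: with a dedicated voice syntax, pulling the
speaker out of a cue line is a one-regexp job, whereas span-based markup
requires a full fragment parse. A hypothetical sketch:

```javascript
// Split a leading voice tag such as "<1>" or "<narrator>" from a cue line.
// Hypothetical sketch; not the spec's actual cue text parser.
function splitVoice(cueLine) {
  var m = /^<([^>]+)>\s*(.*)$/.exec(cueLine);
  if (m) return { voice: m[1], text: m[2] };
  return { voice: null, text: cueLine };
}
```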

* there is no language specification for a WebSRT resource; while this
will
not be a problem when used in conjunction with a <track> element, it still is a problem when the resource is used just by itself, in particular as a
hint for font selection and speech synthesis.


The language inside the WebSRT file wouldn't end up being used for anything by a browser, as the browser needs to know the language before fetching the file in order to decide whether to download it at all. Still, I'd like a header section in
WebSRT. I think the parser is already defined so that it would ignore
garbage before the first cue, so this is more a matter of making it legal
syntax.


Not quite. Some metadata in the header can make sense to also expose to the
Web page.

I agree that we need a structured header section in WebSRT.

Fair enough, we should revisit this when deciding on how to expose metadata in media resources in general.

* there is no means to identify which parser is required in the cues (is
it
"plain text", "minimal markup", or "anything"?) and therefore it is not
possible for an application to know how it should parse the cues.


All the types that are actually for visual rendering are parsed in the same
way, aren't they? Of course there's no way for non-browsers to know that
metadata tracks aren't interesting to look at as subtitles, but I think
showing the user the garbage is a quicker way to communicate that the file
isn't for direct viewing than hiding the text or similar.



The spec says that files of kind "descriptions" and "metadata" are not
displayed. It seems though that the parsing section will try two interfaces: HTML and plain. I think there is a disconnect there. If we already know that
it's not parsable in HTML, why even try?

I was confused. The parsing algorithm does the same thing regardless of what kind of text track it is dealing with. I guess what you're saying is that non-browser applications also need to know that something is e.g. chapter markers, so that they can display it appropriately?

I don't have a strong opinion, but repeating the same information both in the containing document and in the subtitle file means that one of them will be ignored by browsers. People will copy-paste the ignored one and it will end up being wrong a lot of the time.

* there is no version number on the format, thus it will be difficult to
introduce future changes.


I think we shouldn't have a version number, for the same reason that CSS
and HTML don't really have versions. If we evolve the WebSRT spec, it should
be in a backwards-compatible way.


CSS and HTML are structured formats where you ignore things that you cannot interpret. But the parsing is fixed and extensions play within this parsing
framework. I have my doubts that this is possible with WebSRT. Already one
extension that we are discussing here will break parsing: the introduction
of structured headers. Because there is no structured way of extending
WebSRT, I believe the best way to communicate whether it is backwards
compatible is through a version number. We can change the minor version if
compatibility is not broken - it still communicates what features are being
used - and we can change the major version if compatibility is broken.

Similarly, I think that the WebSRT parser should be designed to ignore things that it doesn't recognize, in particular unknown voices (if we keep those). Requiring parsers to fail when the version number is increased makes it harder to introduce changes to the format, because you'll have to either break all existing implementations or provide one subtitle file for each version. (Having a version number but letting parsers ignore it is just weird, quite like in HTML.)

I filed a bug suggesting that voice is allowed to be an arbitrary string: <http://www.w3.org/Bugs/Public/show_bug.cgi?id=10320> (From the point of view of the parser, it still wouldn't be valid syntax.)

 2. Break the SRT link.


* the mime type of WebSRT resources should be a different mime type to SRT
files, since they are so fundamentally different; e.g. text/websrt

* the file extension of WebSRT resources should be different from SRT
files,
e.g. wsrt


I'm not sure if either of these would make a difference.


Really? How do you propose that a media player identifies that it cannot
parse a WebSRT file that has random metadata in it when it is called .srt
and provided under the same mime type as SRT files? Or consider a
transcoding pipeline that relies on .srt files just being plain old simple SRT. It breaks
expectations with users, with developers and with software.

I think it's unlikely that people will offer download links to SRT files that aren't useful outside of the page, so random metadata isn't likely to reach end users or applications by accident. Also, most media frameworks rely mainly on sniffing, so even a file that uses lots of WebSRT-only features is quite likely going to be detected as SRT anyway. At least in GStreamer, the file extension is given quite little weight in guessing the type and MIME isn't used at all (because the sniffing code doesn't know anything about HTTP). Finally, seeing random metadata displayed on screen is about as good an indication that the file is "broken" as the application failing to recognize the file completely.
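To illustrate the kind of sniffing I mean, here is a rough sketch (my own,
not GStreamer's actual typefind code) that accepts both classic SRT comma
decimals and WebSRT period decimals:

```javascript
// Guess whether a text blob looks like SRT/WebSRT by scanning the first
// few lines for a timing line. Sketch only, not real typefind logic.
function looksLikeSrt(text) {
  var lines = text.split(/\r?\n/).slice(0, 10);
  // "," for classic SRT, "." for WebSRT.
  var timing = /\d{2}:\d{2}:\d{2}[.,]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[.,]\d{3}/;
  for (var i = 0; i < lines.length; i++) {
    if (timing.test(lines[i])) return true;
  }
  return false;
}
```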

On the other hand, keeping the same extension and (unregistered) MIME type as SRT has plenty of benefits, such as immediately being able to use existing SRT files in browsers without changing their file extension or MIME type.

 4. Make full use of CSS

In the current form, WebSRT only makes limited use of existing CSS. I see
particularly the following limitations:

* no use of the positioning functionality is made and instead a new means
of
positioning is introduced; it would be nicer to just have this reuse CSS
functionality. It would also avoid having to repeat the positioning
information on every single cue.


I agree, the positioning syntax isn't something I'm happy about with
WebSRT. I think treating everything that follows the timestamp to be CSS
that applies to the whole cue would be better.


Or taking the positioning stuff out of WebSRT and moving it to an external
CSS file as is done with formatting would make it much simpler.

Ah, that would be great. It's quite likely that there will only be 1 or 2 different positions in the whole file, which you don't want to repeat on each and every cue.

 * there is no definition of the "canvas" dimensions that the cues are
prepared for (width/height) and expected to work with, other than saying it
is the video dimensions - but these can change, and the proportions should
change with them


I'm not sure what you're saying here. Should the subtitle file be
hard-coded to a particular size? In the quite peculiar case where the same subtitles really don't work at two different resolutions, couldn't we just
have two files? In what cases would this be needed?


Most subtitles will be created with a specific width and height in mind. For example, the width in characters relies on the video canvas having at least that size and the number of lines used usually refers to a lower third of a
video - where that is too small, it might cover the whole video. So, my
proposal is not to hard-code the subtitles to a particular size, but to put
the minimum width and height that are being used for the creation of the
subtitles into the file. Then, the file can be scaled below or above this
size to adjust to the actual available space.

In practice, does this mean scaling font-size by width_actual/width_intended or similar? Personally, I prefer subtitles to be something like 20 screen pixels regardless of video size, as that is readable. Making them bigger hides more of the video, while making them smaller makes them hard to read. But I guess we could let the CSS media query min-width and similar be evaluated against the size of the containing video element, to make it possible anyway.
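Concretely, the scaling being proposed would amount to something like this
(a hypothetical calculation, not anything in the spec):

```javascript
// Scale an authored font size by how much larger or smaller the actual
// rendering area is than the minimum width the subtitles were authored
// for. Hypothetical sketch of the proposal discussed above.
function scaledFontSize(authoredPx, intendedWidth, actualWidth) {
  return authoredPx * (actualWidth / intendedWidth);
}
```

At double the intended width, a 20px subtitle would become 40px, which is
exactly the behaviour I would want to avoid.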

--
Philip Jägenstedt
Core Developer
Opera Software
