On Tue, 19 Oct 2010 22:35:50 +0200, Silvia Pfeiffer <[email protected]> wrote:

On Tue, Sep 14, 2010 at 7:49 PM, Philip Jägenstedt <[email protected]> wrote:
On Tue, 14 Sep 2010 10:30:03 +0200, Simon Pieters <[email protected]> wrote:

On Tue, 14 Sep 2010 10:11:16 +0200, Philip Jägenstedt <[email protected]>
wrote:

The point of a header is that browsers can identify WebSRT files and not
keep parsing through a 100GB movie file,

I don't think we should break SRT compat for this. I don't think this is a problem at all. We already have this situation elsewhere, e.g. what if you
do <link rel=stylesheet href=movie.webm>?

If it really turns out to be a problem you could just apply the hardware limitations clause and abort parsing if you haven't found any cues after
parsing X bytes or whatever.

In any case, the spec currently requires text/srt (or other supported
subtitle format MIME type) for <track>, so a movie file would be rejected
based on the MIME type per spec (see step 4 in
#sourcing-out-of-band-timed-tracks).


Well, I was hoping to sidestep the issue of MIME types and file extensions
by always ignoring them. Last I checked Apache doesn't have a default
mapping for .srt, so everyone using <track> would have to add it themselves.

About metadata, I noticed that there's a voice called <credit>...

I think that's only for the credits at the start or end of a movie.



Anyway: I'm trying to summarize the changes to WebSRT that were discussed
so far. I think we have the following:

* add a header to identify the kind of websrt file & the language
* add a means to add metadata as name-value pairs

e.g.
WebSRT
language: en-US
author: Frank
date: 2010-09-20
kind: subtitle
copyright: WGBH, 2010
license: CC-BY-SA, http://creativecommons.org/licenses/by-sa/3.0/
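
For illustration only, here is a minimal Python sketch of how a header like the one above might be read. Nothing here is specified anywhere: the function name parse_header is hypothetical, and the sketch assumes that the first line is exactly "WebSRT", that names are case-insensitive, and that a blank line ends the header. It splits on the first colon only, so values that contain colons (such as the URL in the license line) survive intact.

```python
def parse_header(text):
    """Return the header metadata as a dict, or None if the magic line is missing.

    Assumptions (not from any spec): the first line is exactly "WebSRT",
    names are case-insensitive, and a blank line terminates the header.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "WebSRT":
        return None
    metadata = {}
    for line in lines[1:]:
        if not line.strip():
            break  # assumed: a blank line ends the header block
        # Split on the first colon only, so URLs in the value are preserved.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a name-value pair; ignored in this sketch
        metadata[name.strip().lower()] = value.strip()
    return metadata
```

With the example header above, parse_header(...)["license"] would keep the full "CC-BY-SA, http://creativecommons.org/licenses/by-sa/3.0/" string rather than cutting it at the colon in "http:".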

What should happen when the language in <track srclang> doesn't match the language in the file itself? Also, why is kind needed in the file?

* add a means to add comments

e.g.
// Lines starting with // are comments

So far the web has two comment syntaxes: <!-- SGML style --> and /* CSS style */, so if we need comments I think we should pick one of these.

And some changes on <track>:
* make @kind a required attribute

Why was this?

* add @type for mime type identification as we allow more than just
WebSRT as external formats, e.g. TTML

Having more than one format seems to complicate rendering. The WebSRT rendering rules try to avoid overlap between cues from different tracks, but I don't see how that could work between different formats, unless all formats have basically the same model. It certainly wouldn't work with a fixed-layout format like TTML. In other words, can't this wait until some implementor has shown concrete interest in implementing more than one format?

Anyway, I agree that at least a magic header like "WebSRT" is needed because of the horrors of legacy SRT parsing. Breaking SRT compat means that we can go back to requiring UTF-8 as the encoding. However, UTF-8 does complicate the magic header a bit due to the possibility of a BOM [1]. While it would be nice to forbid the use of a BOM, I expect we'd then see lots of frustration from authors whose editors automatically insert it...
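
To make the BOM complication concrete, here is a rough Python sketch of a magic-header check that tolerates an optional UTF-8 BOM. The function name has_websrt_magic is made up for this example, and requiring the magic string to be followed by a line break (or end of input) is my assumption, not something anyone has specified.

```python
import codecs

MAGIC = b"WebSRT"  # the magic string suggested in this thread


def has_websrt_magic(data: bytes) -> bool:
    """Check the first bytes of a resource for the magic header.

    An optional UTF-8 BOM (EF BB BF) is skipped first. Assumption: the
    magic must be followed by CR, LF, or the end of the input.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    if not data.startswith(MAGIC):
        return False
    following = data[len(MAGIC):len(MAGIC) + 1]
    return following in (b"", b"\r", b"\n")
```

Only the first ten bytes or so need to be inspected (3 for the BOM, 6 for the magic, 1 for the terminator), so a parser written along these lines could reject a movie file almost immediately.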

[1] http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8

--
Philip Jägenstedt
Core Developer
Opera Software
