On Sat, 07 Aug 2010 09:57:39 +0200, Silvia Pfeiffer
<[email protected]> wrote:
Hi Philip,
On Sat, Aug 7, 2010 at 1:50 AM, Philip Jägenstedt <[email protected]>
wrote:
* there is a possibility to provide script that just affects the
time-synchronized text resource
I agree that some metadata would be useful, more on that below. I'm not
sure why we would want to run scripts inside the text document, though,
when that can be accomplished by using the TimedTrack API from the
containing page.
Scripts inside a timed text document would only be useful for
applications that use the track not in conjunction with a Web page.
Do you mean that media players could include a JavaScript engine just for
supporting scripts in WebSRT? Not to say that it can't happen, but it
seems a bit unlikely.
2. There is a natural mapping of WebSRT into in-band text tracks.
Each cue naturally maps into an encoding page (just like a WMML cue does,
too). But in WebSRT, because the setup information is not brought in a
hierarchical element surrounding all cues, it is easier to just chuck
anything that comes before the first cue into an encoding header page.
For WMML, this problem can be solved, but it is less natural.
I really like the idea of letting everything before the first timestamp
in WebSRT be interpreted as the header. I'd want to use it like this:
# author: Fan Subber
# voices: <1> Boy
# <2> Girl
01:23:45.678 --> 01:23:46.789
<1> Hello
01:23:48.910 --> 01:23:49.101
<2> Hello
It's not critical that the format of the header be machine-readable, but
we could of course make up a key-value syntax, use JSON, or something
else.
I disagree. I think it's absolutely necessary that the format of the
header be machine-readable. Just like EXIF in images is machine-readable
or ID3 in MP3 is machine-readable. It would be counter-productive not to
have it machine-readable, in particular useless to archiving and media
management solutions.
OK, so maybe key-values?
Author: Fan Subber
Voice: <1> Boy
Voice: <2> Girl
01:23:45.678 --> 01:23:46.789
<1> Hello
This looks a bit like HTTP headers. (I'm not sure I'd actually want to
allow multiple occurrences of the same key, in practice that seems to
result in inconsistencies in how people mark up multiple authors.)
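A minimal sketch of the key-value idea, assuming that the header is
everything before the first timing line and that each header line has the
form "Key: value". The names split_header and TIMING are made up for this
example and are not part of any spec. Repeated keys are kept as a list of
pairs, so whether to allow them remains a policy decision for the caller.

```python
import re

# A timing line such as "01:23:45.678 --> 01:23:46.789".
# Accepting "," as well as "." before the milliseconds is an assumption.
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[.,]\d{3}\s+-->\s+\d{2}:\d{2}:\d{2}[.,]\d{3}"
)


def split_header(text):
    """Return (header_pairs, cue_lines); the header ends at the first timing line."""
    lines = text.splitlines()
    pairs = []
    for i, line in enumerate(lines):
        if TIMING.match(line.strip()):
            return pairs, lines[i:]
        key, sep, value = line.partition(":")
        if sep and key.strip():
            pairs.append((key.strip(), value.strip()))
        # Any other line before the first cue is ignored.
    return pairs, []


sample = """Author: Fan Subber
Voice: <1> Boy
Voice: <2> Girl

01:23:45.678 --> 01:23:46.789
<1> Hello
"""

header, cues = split_header(sample)
```

A line without a colon (for example a plain SRT cue number) is simply
skipped, which matches the "ignore garbage before the first cue" behaviour
discussed in this thread.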
I'm not sure of the best solution. I'd quite like the ability to use
arbitrary voices, e.g. to use the names/initials of the speaker rather
than a number, or to use e.g. <shouting> in combination with CSS
:before { content: 'Shouting: ' } or similar to adapt the display for
different audiences (accessibility, basically).
I agree. I think we can go back to using <span> and @class and @id and
that would solve it all.
I guess this is in support of Henri's proposal of parsing the cue using
the HTML fragment parser (same as innerHTML)? That would be easy to
implement, but how do we then mark up speakers? Using <span
class="narrator"></span> around each cue is very verbose. HTML isn't very
good for marking up dialog, which is quite a limitation when dealing with
subtitles...
* there is no language specification for a WebSRT resource; while this
will not be a problem when used in conjunction with a <track> element, it
still is a problem when the resource is used just by itself, in
particular as a hint for font selection and speech synthesis.
The language inside the WebSRT file wouldn't end up being used for
anything by a browser, as it needs to know the language before
downloading it to know whether or not to download it at all. Still, I'd
like a header section in WebSRT. I think the parser is already defined so
that it would ignore garbage before the first cue, so this is more a
matter of making it legal syntax.
Not quite. Some metadata in the header can make sense to also expose to
the Web page.
I agree that we need a structured header section in WebSRT.
Fair enough, we should revisit this when deciding on how to expose
metadata in media resources in general.
* there is no means to identify which parser is required in the cues (is
it "plain text", "minimal markup", or "anything"?) and therefore it is
not possible for an application to know how it should parse the cues.
All the types that are actually for visual rendering are parsed in the
same way, aren't they? Of course there's no way for non-browsers to know
that metadata tracks aren't interesting to look at as subtitles, but I
think showing the user the garbage is a quicker way to communicate that
the file isn't for direct viewing than hiding the text or similar.
The spec says that files of kind "descriptions" and "metadata" are not
displayed. It seems though that the parsing section will try two
interfaces: HTML and plain. I think there is a disconnect there. If we
already know that it's not parsable in HTML, why even try?
I was confused. The parsing algorithm does the same thing regardless of
what kind of text track it is dealing with. I guess what you're saying is
that non-browser applications also need to know that something is e.g.
chapter markers, so that it can display it appropriately?
I don't have a strong opinion, but repeating the same information both in
the containing document and in the subtitle file means that one of them
will be ignored by browsers. People will copy-paste the ignored one and it
will end up being wrong a lot of the time.
* there is no version number on the format, thus it will be difficult to
introduce future changes.
I think we shouldn't have a version number, for the same reason that CSS
and HTML don't really have versions. If we evolve the WebSRT spec, it
should be in a backwards-compatible way.
CSS and HTML are structured formats where you ignore things that you
cannot interpret. But the parsing is fixed and extensions play within
this parsing framework. I have my doubts that this is possible with
WebSRT. Already one extension that we are discussing here will break
parsing: the introduction of structured headers. Because there is no
structured way of extending WebSRT, I believe the best way to communicate
whether it is backwards compatible is through a version number. We can
change the minor version if the compatibility is not broken - it
communicates though what features are being used - and we can change the
major version if compatibility is broken.
Similarly, I think that the WebSRT parser should be designed to ignore
things that it doesn't recognize, in particular unknown voices (if we keep
those). Requiring parsers to fail when the version number is increased
makes it harder to introduce changes to the format, because you'll have to
either break all existing implementations or provide one subtitle file for
each version. (Having a version number but letting parsers ignore it is
just weird, quite like in HTML.)
I filed a bug suggesting that voice is allowed to be an arbitrary string:
<http://www.w3.org/Bugs/Public/show_bug.cgi?id=10320> (From the point of
view of the parser, it still wouldn't be valid syntax.)
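To illustrate what "voice as an arbitrary string" could mean for a parser,
here is a rough sketch that treats a leading <...> token in the cue text as
an opaque voice label instead of matching it against a fixed list. The
function name split_voice and the exact pattern are assumptions made for
this example, not what the spec or the bug report defines.

```python
import re

# A leading "<label>" followed by the cue text. The label is any run of
# characters other than angle brackets (an assumption for this sketch).
VOICE = re.compile(r"^<([^<>]+)>\s*(.*)$", re.DOTALL)


def split_voice(cue_text):
    """Return (voice, text); voice is None when the cue has no leading label."""
    match = VOICE.match(cue_text)
    if match is None:
        return None, cue_text
    return match.group(1), match.group(2)
```

Note that this sketch cannot tell a voice such as <narrator> from inline
markup such as <i>, so a real syntax would need some rule to keep the two
apart.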
2. Break the SRT link.
* the mime type of WebSRT resources should be a different mime type to
SRT files, since they are so fundamentally different; e.g. text/websrt
* the file extension of WebSRT resources should be different from SRT
files, e.g. wsrt
I'm not sure if either of these would make a difference.
Really? How do you propose that a media player identifies that it cannot
parse a WebSRT file that has random metadata in it when it is called .srt
and provided under the same mime type as SRT files? Or a transcoding
pipeline that relies on srt files just being plain old simple SRT. It
breaks expectations with users, with developers and with software.
I think it's unlikely that people will offer download links to SRT files
that aren't useful outside of the page, so random metadata isn't likely to
reach end users or applications by accident. Also, most media frameworks
rely mainly on sniffing, so even a file that uses lots of WebSRT-only
features is quite likely going to be detected as SRT anyway. At least in
GStreamer, the file extension is given quite little weight in guessing the
type and MIME isn't used at all (because the sniffing code doesn't know
anything about HTTP). Finally, seeing random metadata displayed on screen
is about as good an indication that the file is "broken" as the
application failing to recognize the file completely.
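As a rough illustration of content sniffing (this is not GStreamer's
actual typefinding code, just a sketch under assumed rules), a detector
that looks for a timing line near the start of the data accepts a file
with a metadata header just as readily as a plain SRT file. The name
looks_like_srt and the 4096-byte probe size are made up for this example.

```python
import re

# A timing line anywhere in the probed data. Accepting "." as well as ","
# before the milliseconds is an assumption of this sketch.
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[.,]\d{3} --> \d{2}:\d{2}:\d{2}[.,]\d{3}",
    re.MULTILINE,
)


def looks_like_srt(data: bytes, probe_size: int = 4096) -> bool:
    """Guess from the first probe_size bytes whether data is SRT-like."""
    head = data[:probe_size].decode("utf-8", errors="replace")
    return TIMING.search(head) is not None


plain_srt = b"1\n00:00:01,000 --> 00:00:02,000\nHello\n"
with_header = b"Author: Fan Subber\n\n01:23:45.678 --> 01:23:46.789\n<1> Hello\n"
not_subtitles = b"<!DOCTYPE html><title>Not subtitles</title>"
```

With this kind of check, neither the file extension nor the MIME type
plays any role in the decision.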
On the other hand, keeping the same extension and (unregistered) MIME type
as SRT has plenty of benefits, such as immediately being able to use
existing SRT files in browsers without changing their file extension or
MIME type.
4. Make full use of CSS
In the current form, WebSRT only makes limited use of existing CSS. I see
particularly the following limitations:
* no use of the positioning functionality is made and instead a new means
of positioning is introduced; it would be nicer to just have this reuse
CSS functionality. It would also avoid having to repeat the positioning
information on every single cue.
I agree, the positioning syntax isn't something I'm happy about with
WebSRT. I think treating everything that follows the timestamp to be CSS
that applies to the whole cue would be better.
Or taking the positioning stuff out of WebSRT and moving it to an
external CSS file as is done with formatting would make it much simpler.
Ah, that would be great. It's quite likely that there will only be 1 or 2
different positions in the whole file, which you don't want to repeat on
each and every cue.
* there is no definition of the "canvas" dimensions that the cues are
prepared for (width/height) and expected to work with other than saying
it is the video dimensions - but these can change and the proportions
should be changed with that
I'm not sure what you're saying here. Should the subtitle file be
hard-coded to a particular size? In the quite peculiar case where the
same subtitles really don't work at two different resolutions, couldn't
we just have two files? In what cases would this be needed?
Most subtitles will be created with a specific width and height in mind.
For example, the width in characters relies on the video canvas having at
least that size and the number of lines used usually refers to a lower
third of a video - where that is too small, it might cover the whole
video. So, my proposal is not to hard-code the subtitles to a particular
size, but to put the minimum width and height that are being used for the
creation of the subtitles into the file. Then, the file can be scaled
below or above this size to adjust to the actual available space.
In practice, does this mean scaling font-size by
width_actual/width_intended or similar? Personally, I prefer subtitles to
be something like 20 screen pixels regardless of video size, as that is
readable. Making them bigger hides more of the video, while making them
smaller makes them hard to read. But I guess we could let the CSS media
query min-width and similar be evaluated against the size of the
containing video element, to make it possible anyway.
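The scaling question above can be written out as a small calculation.
This is only a sketch of the width_actual/width_intended idea; the
function name and the optional min_px floor (a stand-in for the
"readable regardless of video size" preference) are assumptions made for
the example.

```python
def scaled_font_size(intended_px, width_intended, width_actual, min_px=0.0):
    """Scale a font size linearly with the video width, with an optional floor."""
    if width_intended <= 0:
        raise ValueError("width_intended must be positive")
    scaled = intended_px * width_actual / width_intended
    # Never go below the floor, so small videos keep readable text.
    return max(min_px, scaled)


# Subtitles authored for a 640px-wide video with 20px text:
doubled = scaled_font_size(20, 640, 1280)            # video twice as wide
halved = scaled_font_size(20, 640, 320)              # video half as wide
floored = scaled_font_size(20, 640, 320, min_px=16)  # same, with a 16px floor
```

Pure linear scaling makes text on a small video hard to read, which is
why the sketch includes a floor; with a floor equal to the intended size,
the text never shrinks below what the author chose.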
--
Philip Jägenstedt
Core Developer
Opera Software