On Thu, 16 Jul 2009 07:58:30 +0200, Silvia Pfeiffer
<[email protected]> wrote:
Hi Ian,
Great to see the new efforts to move the subtitle/caption/karaoke
issues forward!
I actually have a contract with Mozilla starting this month to help
solve this, so I am more than grateful that you have proposed some
ideas in this space.
On Thu, Jul 16, 2009 at 9:38 AM, Ian Hickson<[email protected]> wrote:
On Sat, 27 Dec 2008, Silvia Pfeiffer wrote:
> 1. Timed text in the resource itself (or linked from the resource
> itself), rendered as part of the video automatically by the user
> agent.
For case 1, the practical implications are that browser vendors will
have to develop support for a large variety of text codecs, each one
providing different functionalities.
I would hope that as with a video codec, we can standardise on a single
subtitle format, ideally some simple media-independent combination of SRT
and LRC [1]. It's difficult to solve this problem without a standard
codec, though.
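To give a sense of how little machinery plain SRT needs, here is a minimal Python sketch of a cue parser. It assumes the common SRT layout (a counter line, a "HH:MM:SS,mmm --> HH:MM:SS,mmm" timing line, then text lines, with blank lines between cues). The names Cue and parse_srt are hypothetical and not taken from any proposal in this thread, and the sketch ignores styling tags and malformed input.

```python
import re
from dataclasses import dataclass

# Timing line as commonly found in SRT files: 00:00:01,000 --> 00:00:04,500
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


def _seconds(h, m, s, ms):
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(source):
    """Return a list of Cue objects; blocks without a timing line are skipped."""
    cues = []
    normalized = source.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                text = "\n".join(lines[i + 1:])
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues
```

Karaoke-style timing within a line, as in LRC, would need more than this, which is where a combined format would have to extend plain SRT.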
I have myself thought about creating a new format to address the needs
for time-aligned text in audio/video.
However, the problem with creating a new format is that you start from
scratch, and formats that are already widespread are not supported.
I can see that your proposed format is trying to be backwards
compatible with SRT, so at least it would work for the large number of
existing srt file collections. I am still skeptical, in particular
because there are no authoring systems for this format around.
But I would be curious what others think about your proposed SRT-LRC-mix.
There are already more formats than you could possibly want on the scale
between SRT (dumb text) and complex XML formats like DFXP or USF (used in
Matroska). In my layman's opinion both extremes make sense, but I'm rather
skeptical of anything in between.
In fact, the easiest solution would be if that particular format was
really only HTML.
IMHO that would be absurd. HTML means scripting, embedded videos, an
unbelievably complex rendering system, complex parsing, etc; plus, what's
more, it doesn't even support timing yet, so we'd have to add all the
timing and karaoke features on top of it. Requiring that video players
embed a timed HTML renderer just to render subtitles is like saying that
we should ship Microsoft Word with every DVD player, to handle the user
input when the user wants to type in a new chapter number to jump to.
I agree, it cannot be a format that contains all the complexity of
HTML. It would only support a subpart of HTML that is relevant, plus
the addition of timing - and in this case is indeed a new format. I
have therefore changed my mind since I sent that email in Dec 08 and
am hoping we can do it with existing formats.
I think that eventually we will want timing/synchronization in HTML for
synchronizing multiple video or audio tracks. As far as I can tell no
browser wants to implement the addCueRange API (removing this should be
the topic of a separate mail), so we really need to re-think this part and
I think that timed text plays an important part here.
In particular, I have taken an in-depth look at the latest
specification from the Timed Text working group, which has put years of
experiments and decades of experience into developing DFXP. You can
see my review of DFXP here:
http://blog.gingertech.net/2009/06/28/a-review-of-the-w3c-timed-text-authoring-format/
. I think it is too flexible in a lot of ways, but also too
restrictive in others. However, it is a well formulated format that is
also getting market traction. In addition, it is possible to formulate
profiles to add missing functionality.
If we want a quick and dirty hack, srt itself is probably the best
solution. If we want a well thought-out solution, DFXP is probably a
better idea.
I am currently experimenting with these and will be able to share
something soon for further discussion.
> 3. Timed text stored in a separate file, which is then parsed by the
> user agent and rendered as part of the video automatically by the
> browser.
>
Maybe we should consider solving this differently. Either we could
encapsulate into the video container upon download. Or we could create a
zip-file or tarball upon download. I'd just find it a big mistake to
ignore the majority use case in the standard, which is why I proposed
the <text> elements inside the <video> tag.
If browser vendors are willing to merge subtitles and video files when
saving them, that would be great. Is this easy to do?
My suggestion was really about doing this server-side, which we
already implemented years ago in the Annodex project for Ogg
Theora/Vorbis.
However, it is also possible to do this in the browser: in the case of
Ogg, the browser just needs to have a multiplexing library installed
as well as a means to encode the subtitle file (which I like to call a
"text codec"). Since it's text, it's nowhere near as complex as
encoding audio or video and just consists of light-weight packaging
code. So, yes, it is totally possible to have the browsers create a
binary video file that encapsulates the subtitles, which were
previously only accessible as referenced text files behind a separate
URL.
The only issue I see is the baseline codec issue: every browser that
wants to support multiple media formats has to implement this
multiplexing and text encoding for every media encapsulation format
differently, which is annoying and increases complexity. It's however
generally a small amount of complexity compared to the complexity
created by having to support multiple codecs.
I disagree, remuxing files would be much more of an implementation burden
than supporting multiple codecs, at least if a format-agnostic media
framework is used (be that internal or external to the browser). Remuxing
would require you to support/create parts of the media framework that you
otherwise aren't using, i.e. parsers, muxers, file writers and plugging of
these together (which unlike decoding isn't automatic in any framework
I've seen).
Anything is doable of course, but I think this is really something that is
best done server-side using specialized tools.
Here is my example again:
<video src="http://example.com/video.ogv" controls>
<text category="CC" lang="en" type="text/x-srt"
src="caption.srt"></text>
<text category="SUB" lang="de" type="application/ttaf+xml"
src="german.dfxp"></text>
<text category="SUB" lang="ja" type="application/smil"
src="japanese.smil"></text>
<text category="SUB" lang="fr" type="text/x-srt"
src="translation_webservice/fr/caption.srt"></text>
</video>
Here's a counterproposal:
<video src="http://example.com/video.ogv"
subtitles="http://example.com/caption.srt" controls>
</video>
Subtitle files are created to enable users to choose the text in the
language that they speak to be displayed. With a simple addition like
what you are proposing, I don't think such a choice is possible. Or do
you have a proposal on how to choose the adequate language file?
Also, the attributes on the proposed text element of course serve a
purpose:
* the "category" attribute is meant to provide a default for styling
the text track,
* the "lang" attribute is meant to provide a means to build a menu
to choose the adequate subtitle file from,
* the "type" attribute is meant to both identify the mime type of the
format and the character set used in the file.
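As a rough illustration of how a user agent might use these attributes to pick a track, here is a Python sketch. The track list mirrors part of the example above. The function name choose_track, the dictionary layout, and the matching on the primary language subtag are assumptions made for illustration, not part of the proposal.

```python
# Tracks as they might be collected from the <text> children of a <video>.
TRACKS = [
    {"category": "CC", "lang": "en", "type": "text/x-srt",
     "src": "caption.srt"},
    {"category": "SUB", "lang": "de", "type": "application/ttaf+xml",
     "src": "german.dfxp"},
    {"category": "SUB", "lang": "fr", "type": "text/x-srt",
     "src": "translation_webservice/fr/caption.srt"},
]


def choose_track(tracks, preferred_langs, category="SUB"):
    """Return the first track of the given category whose language matches
    the user's preferences, in preference order, or None if nothing fits."""
    candidates = [t for t in tracks if t["category"] == category]
    for wanted in preferred_langs:
        # Compare only the primary subtag, so "fr-CA" matches "fr".
        primary = wanted.lower().split("-")[0]
        for track in candidates:
            if track["lang"].lower().split("-")[0] == primary:
                return track
    return None
```

A single subtitles attribute on the video element gives the user agent nothing comparable to select from, which is the point of the question above.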
The character set question is actually a really difficult problem to
get right, because srt files are created in an appropriate character
set for the language, but there is no means to store in a srt file
what character set was used in its creation. That's a really bad
situation for the Web server, which can then only make an
educated guess. Giving the HTML author the ability to specify
the charset of the srt file with the link solves this.
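A small Python sketch of what that could look like on the consuming side. It assumes the charset is carried as a MIME-style parameter inside the type attribute (for example "text/x-srt; charset=ISO-8859-1"). That syntax, the function names, and the UTF-8 fallback are assumptions for illustration only.

```python
def split_type(type_attr):
    """Split a value such as 'text/x-srt; charset=ISO-8859-1' into
    (mime_type, charset). charset is None when no parameter is given."""
    mime, _, params = type_attr.partition(";")
    charset = None
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"')
    return mime.strip().lower(), charset


def decode_subtitles(raw, type_attr, fallback="utf-8"):
    """Decode subtitle bytes using the author-declared charset, falling back
    to an assumed default (UTF-8 here) when none is declared."""
    _, charset = split_type(type_attr)
    return raw.decode(charset or fallback, errors="replace")
```

With an explicit declaration the bytes decode as the author intended. Without one, the fallback is exactly the kind of educated guess described above.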
BTW: my latest experiments with subtitles have even a few more
attributes. I am not ready to publish that yet, but should be within a
week or so and will be glad to have a further discussion then.
I think this would be fine, in the long term. I don't think the existing
implementations of <video> are at a point yet where it makes sense to
define this, though.
I think we have to start discussing it and doing experiments. I think
<video> is getting stable enough to move forward. I'm expecting a
period of discussion and experimentation with time-aligned text both
in-band and out-of-band, so it's good to get started on this sooner
rather than later.
It would be interesting to hear back from the browser vendors about how
easily the subtitles could be kept with the video in a way that survives
reuse in other contexts.
I think that in the case of external subtitles the browser could simply
save it alongside the video. It is my experience that media
players have much more robust support for external subtitles (like SRT)
than for internal subtitles, so this is my preferred option (plus it's
easier).
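A minimal Python sketch of the naming side of this, assuming the common player convention of looking for a subtitle file that shares the video's base name. The function name sidecar_name and the optional language infix are hypothetical, and nothing here reflects an actual browser implementation.

```python
from pathlib import PurePosixPath


def sidecar_name(video_name, lang=None, ext=".srt"):
    """Derive a subtitle file name to save next to the video file.

    'video.ogv' becomes 'video.srt', or 'video.de.srt' when a language
    is given. Only the file name is returned, not a directory.
    """
    stem = PurePosixPath(video_name).stem
    if lang:
        return f"{stem}.{lang}{ext}"
    return f"{stem}{ext}"
```

For the example above, sidecar_name("video.ogv") yields "video.srt", so the saved pair stays together when moved as long as both files travel in the same directory.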
Incidentally, I'd be interested in such information about H.264. I
wonder how easy it will be for example with QuickTime or mp4 to
encapsulate srt on-the-fly inside a browser.
Regards,
Silvia.
--
Philip Jägenstedt
Core Developer
Opera Software