Re: [whatwg] WebSRT feedback

Philip Jägenstedt Wed, 13 Oct 2010 09:35:20 -0700

On Fri, 08 Oct 2010 04:39:43 -0700, Silvia Pfeiffer
<[email protected]> wrote:

On 08/10/2010, at 1:28 PM, "Philip Jägenstedt" <[email protected]> wrote:
On Thu, 07 Oct 2010 13:18:37 -0700, Silvia Pfeiffer <[email protected]> wrote:
On Thu, Oct 7, 2010 at 4:06 PM, Philip Jägenstedt <[email protected]> wrote:
On Thu, 07 Oct 2010 01:57:17 -0700, James Graham <[email protected]>
wrote:

On 10/06/2010 04:04 AM, Philip Jägenstedt wrote:
As an aside, the idea of using an HTML parser for the cue text wasn't
very popular.
Why? Were any technical reasons given?
The question was directed at the media player/framework developers present. One of them didn't care and one was strongly opposed on the basis of bloat. This was an aside, if anyone is serious about using the HTML fragment parser for WebSRT, we really should approach the developer mailing lists of media players/frameworks. I doubt we will find much love, but would be happy to be
shown wrong.
The one I talked to said that HTML markup should totally be used in cues (he even mentioned more generally why we didn't pick up USF). The reason being that it clearly defines extensibility and would in fact already provide any use case that anyone can come up with, thus stopping people from inventing
their own screwed up extensions, such as the use of ass commands in {}
inside srt subtitles.
The thing is: while the full set of features of HTML fragments seems bloat, not every subtitle will consist of all the possible markup. Just like Web pages are often created with very simple markup which uses less than 1% of what HTML is capable of, we will see the same happening with subtitle cues. But the availability and clear definition of how such features should be
used prevents the introduction of crappy extensions.
Even if very few subtitles use inline SVG, SVG in <object>, <img>, <iframe>, <video>, self-referencing <track>, etc in the cue text, all implementations would have to support it in the same way for it to be interoperable. That's quite an undertaking and I don't think it's really worth it.
They all need to be interoperable on all of these features already. It should be easier to keep them interoperable on something known and already implemented than on a set of new features, in particular when the new feature set is restricted and features beyond the limited given set are not available such that custom "markup" will be produced by plugins etc.
As for extensibility, I suggest that we generalize the WebSRT parser somewhat to produce a normal DOM with elements in a non-HTML namespace and then use CSS to style them as usual. Unknown element names shouldn't be valid, of course, but they'd still appear in the DOM. If "XML5" (http://annevankesteren.nl/2007/10/xml5) was ready, I'd suggest we use that, with the constraint that it should only be able to output elements in that non-HTML namespace. (Just thinking out loud here.)
I think that's ok, even though I think it makes more sense to have HTML fragments than arbitrary markup that is related but somewhat different. I think we are then just re-inventing HTML.
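
The idea of a generalized cue parser that emits a normal DOM in a non-HTML namespace can be sketched roughly as follows. This is only an illustration, not the WebSRT parsing algorithm: the namespace URI, the `parse_cue` name and the tag-matching regular expression are all assumptions made for this example, and attributes are simply discarded.

```python
import re
from xml.dom.minidom import getDOMImplementation

# Hypothetical namespace, invented for this sketch only.
CUE_NS = "urn:example:websrt-cue"

# Matches <name ...> and </name>; anything after the name is ignored.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def parse_cue(text):
    """Build a DOM for one cue, with every element in CUE_NS.

    Unknown element names are not rejected: they end up in the tree
    like any other element, so CSS could still select them.
    """
    doc = getDOMImplementation().createDocument(CUE_NS, "cue", None)
    stack = [doc.documentElement]
    pos = 0
    for match in TOKEN.finditer(text):
        if match.start() > pos:
            stack[-1].appendChild(doc.createTextNode(text[pos:match.start()]))
        closing, name = match.group(1), match.group(2).lower()
        if not closing:
            element = doc.createElementNS(CUE_NS, name)
            stack[-1].appendChild(element)
            stack.append(element)
        else:
            # Close the nearest matching open element; stray end tags are ignored.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].tagName == name:
                    del stack[i:]
                    break
        pos = match.end()
    if pos < len(text):
        stack[-1].appendChild(doc.createTextNode(text[pos:]))
    return doc
```

Calling `parse_cue("Hello <b>bold <foo>x</foo></b> end")` yields a `cue` root whose children are a text node, a `b` element and another text node, with the unknown `foo` element nested inside `b` in the same namespace.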


On Fri, 08 Oct 2010 05:20:28 -0700, Robert O'Callahan
<[email protected]> wrote:

User agents only need to be interoperable over the common subset of HTML
features they support. HTML is mostly designed to degrade gracefully when a
user agent encounters elements it doesn't support. The simplest possible
video player would use an HTML parser (hopefully off-the-shelf) to build
some kind of DOM structure. Then it can group text into paragraphs for
rendering, and ignore the rest of the content.
In practice, we'll have to deal with user agents that support different sets of WebSRT features --- when version 2 of WebSRT is developed, if not before.
Why not use existing, proven machinery --- HTML --- to cope with that
situation?
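
A minimal sketch of the "simplest possible video player" approach described above, assuming Python's standard `html.parser` as the off-the-shelf parser. It is not an HTML5-conformant fragment parser, and the class and function names here are made up for illustration. Text is grouped into paragraphs at `<p>` and `<br>`, and every other element is ignored.

```python
from html.parser import HTMLParser


class CueTextExtractor(HTMLParser):
    """Collects cue text, starting a new paragraph at <p> and <br>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = [[]]

    def handle_starttag(self, tag, attrs):
        # Unsupported elements (<img>, <video>, unknown tags) fall through.
        if tag in ("p", "br"):
            self._break()

    def handle_endtag(self, tag):
        if tag == "p":
            self._break()

    def handle_data(self, data):
        self.paragraphs[-1].append(data)

    def _break(self):
        # Only open a new paragraph if the current one has content.
        if self.paragraphs[-1]:
            self.paragraphs.append([])

    def result(self):
        joined = (" ".join("".join(p).split()) for p in self.paragraphs)
        return [p for p in joined if p]


def cue_paragraphs(cue_html):
    """Return the cue's text as a list of whitespace-normalized paragraphs."""
    parser = CueTextExtractor()
    parser.feed(cue_html)
    parser.close()
    return parser.result()
```

For example, `cue_paragraphs('<p>Hello <b>world</b></p><img src="x.png"><p>Second <unknown>line</unknown></p>')` returns `['Hello world', 'Second line']`: the image and the unknown element are skipped while their surrounding text survives.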

I do think that a syntax that looks similar to HTML and XML should have similar parsing, which WebSRT currently doesn't. However, using HTML seems to create plenty of complications, such as:

* What are relative URLs in <a> and <img> relative to? Is it the containing document or the WebSRT document? When following links, which window is navigated?


* When are external resources like <img>, <object> and <video> loaded?

* If a WebSRT cue includes <video autoplay>, when should that nested video play?

* If a WebSRT cue starting at time 0 includes a self-referring <video><track> that will be enabled by default, what should happen?

* When should the track be considered ready? This delays the loadedmetadata on <video>, see <http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#the-timed-tracks-are-ready>
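
To make the first question in the list above concrete, here is a small sketch of how the same relative URL resolves differently depending on which document is taken as the base. The URLs and the `resolve_candidates` name are hypothetical, chosen only for illustration; the resolution itself uses the standard `urllib.parse.urljoin`.

```python
from urllib.parse import urljoin


def resolve_candidates(href, page_url, track_url):
    """Resolve one relative URL against both possible base documents."""
    return {
        "containing document": urljoin(page_url, href),
        "WebSRT document": urljoin(track_url, href),
    }


# Hypothetical locations: the page and the caption file live on different hosts.
candidates = resolve_candidates(
    "images/logo.png",
    page_url="http://example.com/videos/watch.html",
    track_url="http://cdn.example.net/captions/en/movie.srt",
)
for base, resolved in candidates.items():
    print(base, "->", resolved)
```

With these inputs the containing document gives `http://example.com/videos/images/logo.png`, while the WebSRT document gives `http://cdn.example.net/captions/en/images/logo.png`, so the choice of base changes which resource a cue would fetch.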

I'd like to understand in more detail what exactly is being suggested be done with the HTML fragments returned by the parser, in order to answer these questions. Neither of the two obvious implementation approaches "temporary document in iframe" and "temporary part of containing document" seem like very good solutions, but I'll hold off more concrete criticism until there is a concrete suggestion.

(Finally, I should mention that I'm assuming that the cue text format of WebSRT will also be used in WebM when we add support for in-band captions, unless EBML can somehow be leveraged.)


On Fri, 08 Oct 2010 06:00:25 -0700, Jeroen Wijering
<[email protected]> wrote:

The requests we receive on the captioning functionality of the JW Player always revolve around styling. Font size, color, style, weight, outline and family. Block x, y, width, height, text-align, vertical-align, padding, margin, background and alpha. Both for an entire SRT file, for distinct captioning entries and for specific parts of a captioning entry. Not to say that a full parsing engine wouldn't be nice or useful, but at present there's simply no requests for it (not even for <a> ;). Plus, more advanced timed track applications can easily be built with javascript (timed bouncing 3D balls using WebGL).
W3C's timed text does a decent job in facilitating the styling needs for captioning authors. Overall regions, single paragraphs and inline chunks (through <span>) can be styled. There are a few small misses, such as text outline, and vertical alignment (which can be done with separate regions though). IMO the biggest con of TT is that it uses its own, in-document styling namespace, instead of relying upon page CSS.

Another thing that TTML does pretty well is the "agent metadata", i.e. marking up the speaker, linking that with the name of the character and actor. It adds one level of indirection, but allows using a short-hand id in the markup instead of the full character name. Other cons are:

* It's an interchange format, so it has the complexity of many other formats combined.

* Almost completely presentational markup.

* Hard-coded to a particular resolution of video, at least the examples are.

Now and then someone suggests using a subset of TTML, but I haven't seen any concrete proposal of what exactly would remain.


--
Philip Jägenstedt
Core Developer
Opera Software
