[whatwg] WebSRT feedback

Philip Jägenstedt Tue, 05 Oct 2010 19:06:18 -0700

Over the past week I've attended 3 video-related events in New York and have discussed <track> and WebSRT at all of them. Here's a lengthy report of feedback, mine and others'.

At the Open Subtitles Design Summit [1], there was some discussion about captioning for the HoH. I've already put this input into a related bug [2], but to summarize: The default rendering for the voices syntax should probably be to prefix the text cue with the name of the speaker, not to do anything funny with colors or positioning. What's less clear is if it's annoying to always prefix with the speaker, or if it should be done only to disambiguate.

For my Open Video Conference [3] presentation [4] I did a JavaScript implementation of the most interesting parts of <track> and WebSRT to be able to demo what the future might hold [5][6][7]. I have some issues with the parser that are at the end of this mail.

At FOMS [8] we had a session on WebSRT [9] which was extremely helpful. It turns out that SRT has more syntax variations than we had thought, kindly pointed out by VLC developer j-b. Even though there is no SRT spec, there is a test suite of sorts [10] that I had never seen before. I'll call SRT that follows the syntax implied by these tests ale5000-SRT. Apart from the HTML-like markup we knew about, ale5000-SRT also has various markup of the form {...} which was borrowed from SSA, as well as \h and \N for "hard space" and line break respectively. Also in the crazy department is that tags which aren't matched with an opening and closing tag should be rendered as plain text. Stray < should also just be displayed as text. VLC actually implements most of this, as does VSFilter, which we should have tested but didn't [11]. It would probably be possible to write a spec for ale5000-SRT, but extensibility would be limited to matched opening and closing tags, which doesn't work for the suggested voices syntax. With this mess, I'd rather not extend ale5000-SRT. I can only agree with Silvia that we should make WebSRT identifiable, so that different parsers can be used. So:

* Add magic bytes to identify WebSRT, maybe "WebSRT". (This will break some existing SRT parsers.)
* Make WebSRT always be UTF-8, since you can't reuse existing SRT files anyway.
* Note that certain ale5000-SRT syntax is not part of WebSRT, so that one doesn't have to debug the parsing algorithm to learn that.
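
As a rough illustration of the first two points, here is a minimal Python sketch of how a player might pick a parser. The magic string "WebSRT" is only the tentative suggestion from the list above, and the function names `sniff_subtitle_format` and `decode_websrt` are hypothetical; tolerating a leading UTF-8 byte order mark is also an assumption, not something proposed in this mail.

```python
import codecs

# Assumption: the mail only says the magic could "maybe" be "WebSRT".
WEBSRT_MAGIC = b"WebSRT"


def sniff_subtitle_format(data: bytes) -> str:
    """Return "websrt" if the payload starts with the assumed magic, else "srt"."""
    # Assumption: an optional UTF-8 byte order mark may precede the magic.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return "websrt" if data.startswith(WEBSRT_MAGIC) else "srt"


def decode_websrt(data: bytes) -> str:
    """Decode a WebSRT payload as UTF-8 and reject anything else."""
    if sniff_subtitle_format(data) != "websrt":
        raise ValueError("not a WebSRT file (missing magic)")
    # "utf-8-sig" strips a leading BOM; strict error handling raises
    # UnicodeDecodeError (a ValueError subclass) for legacy encodings.
    return data.decode("utf-8-sig")
```

Anything that `sniff_subtitle_format` reports as "srt" would be handed to a legacy parser with its own encoding guessing, which is what keeps the two formats from contaminating each other.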

Styling hooks were requested. If we only have the predefined tags (i, b, ...) and voices, these will most certainly be abused, e.g. resulting in <i> being used where italics isn't wanted or <v Foo> being used just for styling, breaking the accessibility value it has.

As an aside, the idea of using an HTML parser for the cue text wasn't very popular.

There was also some discussion about metadata. Language is sometimes necessary for the font engine to pick the right glyph. With legacy SRT the encoding could be used as a hint, but if we use UTF-8 that's not possible. License is also an often requested piece of metadata. I have no strong opinion about how to solve this, but key-value pairs like HTTP headers come to mind.
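
To make the key-value idea concrete, here is a minimal Python sketch. Everything about the layout is an assumption made for illustration: that the header block sits at the top of the file and ends at the first blank line, that a magic line without a colon may precede it, and that fields are called "Language" and "License". The function name `parse_header_metadata` is hypothetical.

```python
def parse_header_metadata(text):
    """Split a file into (metadata, rest); the header ends at the first blank line."""
    head, _, rest = text.partition("\n\n")
    metadata = {}
    for line in head.split("\n"):
        key, sep, value = line.partition(":")
        # Lines without a colon (such as a magic line) are not metadata.
        if sep and key.strip():
            # Field names are compared case-insensitively, as in HTTP.
            metadata[key.strip().lower()] = value.strip()
    return metadata, rest


sample = (
    "WebSRT\n"
    "Language: sv\n"
    "License: CC BY-SA\n"
    "\n"
    "00:00:00.000 --> 00:00:01.000\n"
    "Bla\n"
)
metadata, cues = parse_header_metadata(sample)
```

This sketch only works if a header block is always present; a file that starts directly with a cue would have its timing line misread as a field, so a real format would need an unambiguous way to tell the two apart.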


Finally, some things I think are broken in the current WebSRT parser:

* Parsing of timestamps is more liberal than it needs to be. In particular, treating the part after the decimal separator as an integer and dividing by 1000 leads to 00:00:00.1 being interpreted as 0.001 seconds, which is weird. This is what e.g. VLC does, but if we need to add a header we could just as well change this to make it more sane. Alternatively, if we want to really align with C implementations using scanf, we should also handle negative numbers (00:01:-5,000 means 55 seconds), octal and hexadecimal.

* The current syntax looks like XML or HTML but has very different parsing. Voices like <narrator> don't create nodes at all and for tags like <i> the parser has a whitelist and also special rules for inserting <rt>. Unless there are strong reasons for this, then for simplicity and forward compatibility, I'd much rather have the parser create an actual DOM (not a tree of "WebSRT Node Object") that reflects the input. If we also support attributes then people who actually want to use their (silly) <font color=red> tags can do so with CSS. This could also work as styling hooks. Obviously, a WebSRT parser should create elements in another namespace; we don't want e.g. <img> to work inside cues.

* The "bad cue" handling is stricter than it should be. After collecting an id, the next line must be a timestamp line. Otherwise, we skip everything until a blank line, so in the following the parser would jump to "bad cue" on line "2" and skip the whole cue.


1
2
00:00:00.000 --> 00:00:01.000
Bla

This doesn't match what most existing SRT parsers do, as they simply look for timing lines and ignore everything else. If we really need to collect the id instead of ignoring it like everyone else, this should be more robust, so that a valid timing line always begins a new cue. Personally, I'd prefer if it is simply ignored and that we use some form of in-cue markup for styling hooks.

* At the beginning of "cue text loop" (step 28) a newline should be collected.
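
The timestamp and "bad cue" points above can be sketched together in Python. This is not the spec's algorithm, only an illustration under assumptions: the timestamp grammar in `TIMESTAMP` is a guess, reading the fraction as a decimal is just one possible "more sane" interpretation, and the names `to_seconds` and `parse_cues_leniently` are hypothetical. The `legacy` flag reproduces the integer-divided-by-1000 behaviour for comparison.

```python
import re

# Assumed grammar: hours:minutes:seconds, then "." or "," and one or more digits.
TIMESTAMP = r"(\d{2,}):(\d{2}):(\d{2})[.,](\d+)"
TIMING_LINE = re.compile(rf"^{TIMESTAMP}\s+-->\s+{TIMESTAMP}")


def to_seconds(h, m, s, frac, legacy=False):
    """Convert matched timestamp fields to seconds.

    legacy=True treats the fraction digits as an integer number of
    milliseconds; legacy=False treats them as a decimal fraction.
    """
    divisor = 1000 if legacy else 10 ** len(frac)
    return int(h) * 3600 + int(m) * 60 + int(s) + int(frac) / divisor


def parse_cues_leniently(text, legacy=False):
    """Return (start, end, text) tuples; every valid timing line starts a cue."""
    cues = []
    current = None  # [start, end, text lines] of the cue being collected
    for line in text.split("\n"):
        match = TIMING_LINE.match(line)
        if match:
            fields = match.groups()
            start = to_seconds(*fields[:4], legacy=legacy)
            end = to_seconds(*fields[4:], legacy=legacy)
            current = [start, end, []]
            cues.append(current)
        elif not line.strip():
            current = None  # a blank line ends the current cue
        elif current is not None:
            current[2].append(line)
        # Anything else (ids, stray lines before a timing line) is ignored.
    return [(start, end, "\n".join(lines)) for start, end, lines in cues]
```

With `legacy=True`, the fields of 00:00:00.1 come out as 0.001 seconds; with the default they come out as 0.1 seconds. Fed the four-line example above ("1", "2", the timing line, "Bla"), `parse_cues_leniently` keeps the cue instead of discarding it, because the two id-like lines are simply ignored.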


[1] http://universalsubtitles.org/opensubtitles2010
[2] http://www.w3.org/Bugs/Public/show_bug.cgi?id=10320
[3] http://www.openvideoconference.org/
[4] http://people.opera.com/philipj/2010/10/02/ovc/
[5] http://people.opera.com/philipj/2010/10/02/ovc/demos/captions.html
[6] http://people.opera.com/philipj/2010/10/02/ovc/demos/transcript.html
[7] http://people.opera.com/philipj/2010/10/02/ovc/demos/metadata.html
[8] http://www.foms-workshop.org/foms2010OVC/
[9] http://www.foms-workshop.org/foms2010OVC/pmwiki.php/Main/WebSRT
[10] http://ale5000.altervista.org/subtitles.htm
[11] http://wiki.whatwg.org/wiki/SRT_research

--
Philip Jägenstedt
Core Developer
Opera Software
