Re: [whatwg] SRT research: separating cues

Simon Pieters Tue, 25 Oct 2011 00:17:20 -0700

On Mon, 24 Oct 2011 22:50:43 +0200, Silvia Pfeiffer <[email protected]> wrote:

So, in your opinion, should there be a change to the WebVTT spec that
separates cues differently?
Is there a recommendation you have from your analysis?


My recommendation is http://www.w3.org/Bugs/Public/show_bug.cgi?id=14550

Cheers,
Silvia.

On Mon, Oct 24, 2011 at 6:26 PM, Simon Pieters <[email protected]> wrote:
I wanted to research how common it is to fail to separate cues in SRT, and
for what reason.

SRT parsers usually interpret a timings line as a new cue, while WebVTT
wants a blank line (two consecutive newlines) before a new cue.

I took the 65k SRT files we've got, replaced comma with dot and prepended
"WEBVTT\n\n", then ran them in Opera's <track> impl, looking for '-->' in
cue data.
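
The conversion step described above can be sketched roughly as follows. This
is only an illustration, not the script that was actually used: the function
name srt_to_webvtt is hypothetical, and restricting the comma replacement to
timestamps is an assumption (the text only says commas were replaced with
dots).

```python
import re

# SRT timestamps use a comma before the milliseconds, e.g. 00:01:02,345.
SRT_TIMESTAMP = re.compile(r"(\d\d:\d\d:\d\d),(\d\d\d)")


def srt_to_webvtt(srt_text: str) -> str:
    """Naively turn SRT text into something a WebVTT parser will accept."""
    # Assumption: only commas inside timestamps are rewritten, so commas in
    # the subtitle text itself are left alone.
    converted = SRT_TIMESTAMP.sub(r"\1.\2", srt_text)
    # Prepend the WebVTT signature followed by a blank line.
    return "WEBVTT\n\n" + converted


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:02,500\nHello, world\n"
    print(srt_to_webvtt(sample))
```

Nothing else in the file is touched, so any cue-separation problems in the
SRT source carry over unchanged into the WebVTT output.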

There were 840 files with --> in cue data. This is 1.3% of the files.
Looking at the cue data, there were 11,118 lines that contained -->. There
were 8830 lines of only whitespace.

In the cue data, if I look at valid-looking timing lines
(/^\d\d:\d\d:\d\d\.\d\d\d\s*-->\s*\d\d:\d\d:\d\d\.\d\d\d(\s|$)/) and check
the line before that, or the line before *that* if it looks like an SRT id
(/^\d+\s*$/), then I see 7030 lines of only whitespace and 3761 lines of
something else.

Failing to separate cues results in an unpleasant experience for the user,
since basically the screen is filled with several "cues" including their IDs
and timing lines.
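
The "line before the timing line" check described above could look something
like this. The two regular expressions are the ones quoted in the text; the
function name classify_preceding_lines, the input shape (one cue's data as a
list of lines) and the decision to skip timing lines with nothing before them
are assumptions made for the sketch.

```python
import re

# The two patterns quoted in the text.
TIMING_LINE = re.compile(
    r"^\d\d:\d\d:\d\d\.\d\d\d\s*-->\s*\d\d:\d\d:\d\d\.\d\d\d(\s|$)"
)
SRT_ID = re.compile(r"^\d+\s*$")


def classify_preceding_lines(cue_lines):
    """Count what precedes each timing-like line inside one cue's data."""
    counts = {"whitespace": 0, "other": 0}
    for i, line in enumerate(cue_lines):
        if not TIMING_LINE.match(line):
            continue
        j = i - 1
        # If the previous line looks like an SRT id, look one line further up.
        if j >= 0 and SRT_ID.match(cue_lines[j]):
            j -= 1
        if j < 0:
            continue  # nothing to inspect; not counted in this sketch
        if cue_lines[j].strip() == "":
            counts["whitespace"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    cue = [
        "Hello",
        " ",                                # looks blank, but is not empty
        "2",
        "00:00:03.000 --> 00:00:04.000",
        "World",
        "3",
        "00:00:05.000 --> 00:00:06.000",
        "Again",
    ]
    print(classify_preceding_lines(cue))
```

Summing these counts over every cue of every file would give the two totals
reported above.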

Some files had most or all of their cues parsed as a single cue with the
WebVTT parser, e.g. because all lines ended with one or more spaces. Looking
at such a file in a text editor, it's not immediately obvious that there's
an error, because the spaces are not visible. Moreover, the file is not
non-conforming, so a validator wouldn't help either.

So what about the cases that aren't whitespace? It seems to be mostly just
missing the newline completely. Some omitted the ID also. One file had a "|"
between all cues.
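
To make the trailing-space case concrete, here is a small sketch comparing two
ways of splitting a file into cue blocks. Neither function is the WebVTT
parsing algorithm or Opera's implementation; split_strict and split_tolerant
are hypothetical helpers that only illustrate why a line of spaces does not
act as a separator when separators must be truly empty lines.

```python
import re


def split_strict(text):
    """Split into blocks on truly empty lines only."""
    blocks = re.split(r"\n{2,}", text.strip("\n"))
    return [b for b in blocks if b]


def split_tolerant(text):
    """Split on lines that are empty or contain only spaces or tabs."""
    blocks = re.split(r"\n(?:[ \t]*\n)+", text.strip("\n"))
    return [b for b in blocks if b.strip()]


if __name__ == "__main__":
    # Every line ends with a space, including the "blank" separator line.
    sample = (
        "1 \n"
        "00:00:01.000 --> 00:00:02.000 \n"
        "Hello \n"
        " \n"
        "2 \n"
        "00:00:03.000 --> 00:00:04.000 \n"
        "World \n"
    )
    print(len(split_strict(sample)), len(split_tolerant(sample)))
```

With the strict split the sample stays one block, so the second id and timing
line end up inside the first cue; the tolerant split sees two blocks.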

--
Simon Pieters
Opera Software



