On 21/09/11 02:15 AM, Philip Jägenstedt wrote:
Implementors of <track> / WebVTT from several browser vendors (Opera,
Mozilla, Google, Apple) met at the Open Video Conference recently. There
was a session on video accessibility,[1] a bunch of new bugs were filed
[2] and there was much rejoicing.

There were a few issues that weren't concrete enough to file bugs on,
but which I think are still worthwhile discussing further:

== Comments ==

If you look at the source of the spec, you'll find comments as a v2
feature request:

COMMENT -->
this is a comment, bla bla

I don't like the format either. I do think it's very important that we have some mechanism for multi-line, file-level metadata, embedded CSS, etc., so the files can stand on their own.

The syntax section also suggests all metadata has to be on the signature line, while the parser will actually skip everything between the signature and the first double line terminator.
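To illustrate the behaviour I mean, here is a rough sketch, not the spec's actual algorithm: everything from the signature up to the first blank line is treated as a header block and skipped. The function name `split_header` and the `Kind:` / `Language:` metadata lines are made up for the example.

```python
def split_header(text):
    """Split a WebVTT-like file into (header, body).

    Sketch only: the header is everything from the signature up to the
    first double line terminator (i.e. the first blank line).
    """
    # Normalise line terminators so a blank line is always "\n\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if not text.startswith("WEBVTT"):
        raise ValueError("missing WEBVTT signature")
    header, _, body = text.partition("\n\n")
    return header, body


# Hypothetical file with multi-line metadata after the signature line.
sample = (
    "WEBVTT\n"
    "Kind: captions\n"
    "Language: en\n"
    "\n"
    "00:01.000 --> 00:02.000\n"
    "Hello\n"
)
header, body = split_header(sample)
```

With a parser shaped like this, the extra metadata lines never reach the cue parser, which is why multi-line file-level metadata already "works" in practice even though the syntax section doesn't allow it.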

For in-caption comments, <! comment> is a good idea. It's a bit odd semantically not to mention it in the spec, since everything else has an end tag, but the parser will ignore it, which is what we want.


The parser is fairly strict in some regards:

* must use exactly 2 digits for minutes and seconds
* minutes and seconds must be <60

I'm not normally one for restrictions, but the parser also says the (optional) hours field must have "two or more" digits, with no maximum value specified.
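For concreteness, the rules above could be checked roughly like this. It's a sketch under assumptions: the three-digit milliseconds field is inferred from examples like 02:00.000 rather than from anything stated here, and `parse_timestamp` is just a name I picked.

```python
import re

# Optional hours (two or more digits), then exactly two digits each for
# minutes and seconds. The 3-digit fraction is an assumption.
TIMESTAMP = re.compile(r"(?:([0-9]{2,}):)?([0-9]{2}):([0-9]{2})\.([0-9]{3})")


def parse_timestamp(s):
    """Return the timestamp in milliseconds, or None if it is rejected."""
    m = TIMESTAMP.fullmatch(s)
    if m is None:
        return None
    hours = int(m.group(1) or 0)
    minutes = int(m.group(2))
    seconds = int(m.group(3))
    millis = int(m.group(4))
    # Minutes and seconds must be < 60; hours has no upper bound here.
    if minutes >= 60 or seconds >= 60:
        return None
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
```

Note that Python integers are unbounded, so this sketch happily accepts an hours field of any length; a fixed-width implementation would need an explicit limit.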

If we all agree on an implementation limit, it could be helpful to specify one. Storing milliseconds in an unsigned 32-bit type gives a little over 1000 hours (about 1193) of timestamps. Single-precision float runs out of useful precision after about 50 hours. I'd suggest a two- or three-digit limit on hours to avoid requiring a 64-bit type. If we don't care about that, then 10 digits is a reasonable limit to avoid running out of precision with doubles.
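The back-of-the-envelope numbers can be checked along these lines. This is only a sketch: the float part assumes timestamps are stored as seconds in an IEEE 754 single, which is my reading of "useful precision" and not something any spec says.

```python
import math

MS_PER_HOUR = 60 * 60 * 1000

# Range of a millisecond counter in 32 bits.
unsigned_32_hours = 2**32 / MS_PER_HOUR  # roughly 1193 hours
signed_32_hours = 2**31 / MS_PER_HOUR    # roughly 596.5 hours

# Largest timestamps with a two- or three-digit hours field, in ms.
max_2_digit_ms = 100 * MS_PER_HOUR - 1   # 99:59:59.999
max_3_digit_ms = 1000 * MS_PER_HOUR - 1  # 999:59:59.999


def float32_spacing(seconds):
    """Gap between adjacent float32 values near `seconds` (24-bit significand)."""
    _, exponent = math.frexp(seconds)
    return 2.0 ** (exponent - 24)


# Assumption: time kept as seconds in a single-precision float.
spacing_at_50_hours = float32_spacing(50 * 3600)  # 2**-6 s, about 16 ms
```

So a three-digit hours field still fits an unsigned 32-bit millisecond counter, while a signed one only covers two digits comfortably.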

A small percentage of cues (or cue text) will be dropped because of
these constraints and this is not very likely to be noticed unless the
entire video+captions are watched.

This is a very good point.

02:00.000 --> next
Last Chapter

Cues would be created with endTime = Infinity, and be modified to the
startTime of the following cue (in source order) if there is a following
cue. This would IMO be quite neat, but is the use case strong enough?

This would also nicely solve the latency issue with generating live captions. With both use cases together, I'd be in favour of this, but we have other issues to address before live VTT streams work in the <track> element. See https://www.w3.org/Bugs/Public/show_bug.cgi?id=14104
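As a sketch of how the "next" end time might behave in an implementation (the `Cue` class and `add_cue` helper are hypothetical, not an existing API): a cue is created open-ended with endTime = Infinity, and is closed when the following cue in source order arrives.

```python
import math
from dataclasses import dataclass


@dataclass
class Cue:
    start: float
    end: float
    text: str
    open_ended: bool = False  # True while the end time is still "next"


def add_cue(cues, start, end, text):
    """Append a cue; `end` is a number of seconds or the string "next"."""
    # Close the previous cue if it was waiting for its successor.
    if cues and cues[-1].open_ended:
        cues[-1].end = start
        cues[-1].open_ended = False
    if end == "next":
        cues.append(Cue(start, math.inf, text, open_ended=True))
    else:
        cues.append(Cue(start, end, text))


cues = []
add_cue(cues, 0.0, "next", "Intro")
add_cue(cues, 60.0, "next", "Middle")
add_cue(cues, 120.0, "next", "Last Chapter")
```

For live captions the appeal is that each cue can be shown as soon as it is received, without the author having to know its end time in advance.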

 -r
