Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements

Philip Jägenstedt Thu, 12 Aug 2010 01:15:07 -0700

On Thu, 12 Aug 2010 02:11:55 +0200, Silvia Pfeiffer <[email protected]> wrote:

On Thu, Aug 12, 2010 at 1:26 AM, Philip Jägenstedt <[email protected]> wrote:

On Wed, 11 Aug 2010 15:38:32 +0200, Silvia Pfeiffer <[email protected]> wrote:

On Wed, Aug 11, 2010 at 10:30 PM, Philip Jägenstedt <[email protected]> wrote:

On Wed, 11 Aug 2010 01:43:01 +0200, Silvia Pfeiffer <[email protected]> wrote:

Going with HTML in the cues, we either have to drop voices and inner
timestamps or invent new markup, as HTML can't express either. I don't
think either of those is a really good solution, so right now I'm not
convinced that reusing the innerHTML parser is a good way forward.

I don't see a need for the voices - they already have markup in HTML,
see above. But I do wonder about the timestamps. I'd much rather keep
the innerHTML parser if we can, but I don't know enough about how the
timestamps could be introduced in a non-breaking manner. Maybe with a
data- attribute? Maybe <span data-t="00:00:02.100">...</span>?

data- attributes are reserved for use by scripts on the same page, but
we *could* of course introduce new elements or attributes for this
purpose. However, adding features to HTML only for use in WebSRT seems
a bit odd.



I'd rather avoid adding features to HTML only for WebSRT. Ian turned the
<timestamps> into ProcessingInstructions:
http://www.whatwg.org/specs/web-apps/current-work/websrt.html#websrt-cue-text-dom-construction-rules
Could we introduce something like <?t at="00:00:02.100"?> without
breaking the innerHTML parser?

It appears that the innerHTML parser in at least Opera and Firefox
handles PIs in some manner, see test at
<http://software.hixie.ch/utilities/js/live-dom-viewer/saved/587>

However, it isn't valid HTML, validator.nu says "Saw <?. Probable cause:
Attempt to use an XML processing instruction in HTML. (XML processing
instructions are not supported in HTML.)"

That would make text/srt and text/websrt synonymous, which is kind of
pointless.

No, it's only pointless if you are a browser vendor. For everyone else
it is a huge advantage to be able to choose between a guaranteed simple
format and a complex format with all the bells and whistles.



The advantage of taking text/srt is that all existing software to
create SRT can be used to create WebSRT

That's not strictly true. If they load a WebSRT file that was created by
some other software for further editing and that WebSRT file uses
advanced WebSRT functionality, the authoring software will break.

Right, especially settings appended after the timestamps are quite
likely to be stripped when saving the file.

Or may even break the software if it's badly implemented, or may end up
inside the cue text - just like the other control instructions which
will end up as plain text inside the cue. You won't believe how many
people have pointed out to me that my SRT test parser exposed an <i> tag
markup in the cue text rather than interpreting it, when I was
experimenting with applying SRT cues in a HTML div without touching the
cue text content. Extraneous markup is really annoying.

Indeed, but given the option of seeing no subtitles at all and seeing
some markup from time to time, which do you prefer? For a long time I
was using a media player that didn't handle "HTML" in SRT and wasn't
very amused at seeing <i> and similar, but it was sure better than no
subtitles at all. I doubt it will take long for popular software to
start ignoring things trailing the timestamp and things in square
brackets, which is all you need for basic "compatibility". Some of the
tested software already does so.

Hmm... not sure if I'd prefer to see the crap or rather be forced to run
it through a stripping tool first. I think what would happen is that I'd
start watching the movie, then notice the crap, get annoyed, stop it,
run a stripping tool, restart the movie. I'd probably prefer noticing
that before I start the movie, which would happen if the file was a
different format. But it does take a bit of "expert knowledge" to know
that websrt can be easily converted to srt and to have such a stripping
tool installed, I give you that.

Indeed, it never struck me to take the time to strip away the extra
markup, even though I would have known how. Instead I waited until my
media player could do the job for me.
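For illustration, the kind of "stripping tool" discussed above could be
sketched roughly as follows. This is only a sketch under assumptions: it
treats anything trailing the end timestamp as cue settings and anything
in angle brackets as markup, and the name strip_to_plain_srt and the
"A:start" setting in the example are hypothetical, not taken from any
existing tool or from the spec.

```python
import re

# Timing line: start --> end, optionally followed by anything else
# (assumed here to be WebSRT cue settings).
TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}[,.]\d{3})(.*)$"
)
# Anything that looks like a tag, e.g. <i>, </i>, <ruby>, <00:00:02.100>.
TAG = re.compile(r"<[^<>]*>")


def strip_to_plain_srt(text: str) -> str:
    """Drop trailing settings on timing lines and tag-like markup in cues."""
    out = []
    for line in text.splitlines():
        m = TIMING.match(line)
        if m:
            # Keep only the two timestamps; discard whatever trails them.
            out.append(f"{m.group(1)} --> {m.group(2)}")
        else:
            # Crude: removes every <...> run, including inner timestamps.
            out.append(TAG.sub("", line))
    return "\n".join(out)


sample = (
    "1\n"
    "00:00:01,000 --> 00:00:03,000 A:start\n"
    "<i>Hello</i> <00:00:02.100>world\n"
)
print(strip_to_plain_srt(sample))
```

Note that this is deliberately crude: cue text that legitimately contains
a "<...>" run would lose it, which is part of why such a tool takes some
"expert knowledge" to use safely.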

OTOH, if you say that it will take a short time for popular software to
start ignoring the extra WebSRT stuff, well, in this case they have
implemented WebSRT support in its most basic form and then there is no
problem any more anyway. They will then accept the new files and their
extensions and mime types and there is explicit support rather than the
dodgy question of whether these SRT files will provide crap or not.
During a transition period, we will make all software that currently
supports SRT become unstable and unreliable. I don't think that's the
right way to deal with an existing ecosystem. Coming in as the big
brother, claiming their underspecified format, throwing in incompatible
features, and saying: just deal with it. It's just not the cavalier
thing to do.

I agree that it seems (and is) quite selfish, but am not sure the
alternatives are any better, see below. About "unstable and unreliable",
I think there are really only two kinds of errors we will see:


1. Some cues being ignored due to trailing settings after the timestamp.

2. Markup being interpreted as plain text.

Both already can and do happen with existing use of SRT, which is
annoying but better than no subtitles at all.
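To make the first kind of error concrete, here is a small sketch of how
a strict timing-line parser drops a cue with trailing settings while a
lenient one keeps it. Both patterns are hypothetical and not taken from
any real player, and the "A:start" setting is only an assumed example.

```python
import re

TS = r"\d{2}:\d{2}:\d{2},\d{3}"

# Hypothetical strict parser: nothing may follow the end timestamp.
STRICT = re.compile(rf"^({TS}) --> ({TS})\s*$")
# Hypothetical lenient parser: anything after the end timestamp is ignored.
LENIENT = re.compile(rf"^({TS}) --> ({TS})(?:\s.*)?$")


def parse_timing(line, pattern):
    """Return (start, end) if the line is accepted, otherwise None."""
    m = pattern.match(line)
    return (m.group(1), m.group(2)) if m else None


plain = "00:00:01,000 --> 00:00:03,000"
with_settings = plain + " A:start"

print(parse_timing(plain, STRICT))           # accepted
print(parse_timing(with_settings, STRICT))   # rejected: cue is ignored
print(parse_timing(with_settings, LENIENT))  # accepted, settings dropped
```

The second kind of error needs no code to reproduce: a parser that does
not know the markup simply passes the cue text through unchanged.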

and servers that already send text/srt don't need to be updated. In
either case I think we should support only one mime type.

What's the harm in supporting two mime types but using the same parser
to parse them?

Most content will most likely be plain old SRT without voices, <ruby> or
similar. People will create them using existing software with the .srt
extension and serve them using the text/srt MIME type. When they later
decide to add some <ruby> or similar, it will just work without changing
the extension or MIME type. The net result is that text/srt and
text/websrt mean exactly the same thing, making it a wasted effort.

From a Web browser perspective, yes. But not from a caption authoring
perspective. At first, I would author a SRT file. Later, I want to add
some fancy stuff. So, I load it into the application again. Then I add
the fancy stuff. It tells me that I cannot save it as SRT, but have to
save it as WebSRT, so I don't lose the information. Good! Now, the
pipeline that I have set up for SRT files transcoding and burning onto
video and which cannot yet deal with WebSRT will not accept the WebSRT
file. Good again! Makes me extend my pipeline or go to the provider and
upgrade my software, so I get the full feature support and the correct
rendering. Excellent.

I think that as long as WebSRT is mostly compatible with SRT then people
will keep using SRT tools, with the occasional mishap and disaster. I
won't deny that it breaks expectations of what SRT is, but the
alternative is to make WebSRT fundamentally incompatible so that not
even media frameworks that rely on sniffing would treat it as SRT.
However, unless <track> is a complete failure other applications will
eventually want to support the format that browsers support, so
inventing something completely new has a high cost too.

We have reduced this cost by making it build on an existing format. Let's
not pretend here: if all browser vendors support WebSRT, there will be a
high motivation to implement support for it. Supporting the Web is a big
argument. So, we are only talking about the transition period here as a
problem period.

During the transition period, if WebSRT is incompatible, it will motivate
people further to implement proper support for it. If it is almost
compatible, it will motivate people to make quick patches that will just
stop it from breaking their systems. The first one is positive
motivation, introduction of a new feature, great announcements to make.
The second one is negative motivation, swearing at the Web standards
developers for breaking existing systems, apologies to the users for not
supporting their new files properly etc etc. I honestly think we won't be
making friends by stealing an existing format. But we can make friends by
building a new format on an existing format such that code can be re-used
by developers, and such that users can learn that they can make use of
the new files by using simple tools.

The core "problem" is that WebSRT is far too compatible with existing SRT
usage. Regardless of the file extension and MIME type used, it's quite
improbable that anyone will have different parsers for the same format.
Once media players have been forced to handle the extra markup in WebSRT
(e.g. by ignoring it, as many already do) the two formats will be the
same, and using WebSRT markup in .srt files will just work, so that's
what people will do. We may avoid being seen as arrogant
format-hijackers, but the end result is two extensions and two different
MIME types that mean exactly the same thing.

Since browser vendors get all the benefits and none of the problems it
would be a mistake to only listen to us, of course. It might be
worthwhile contacting developers of applications like VLC, Totem or
MPlayer and asking precisely how annoyed they would be if suddenly one
day they had to tweak their SRT parser because of WebSRT.

Some of them have already spoken:
http://forum.doom9.org/showthread.php?p=1396576 "Extending SRT is a very
bad idea" etc etc. Also, I've had feedback from other subtitle
professionals who are also against extending SRT, the main reason being
that it would break existing working software environments.

The only way to really avoid messing with the ecosystem is to invent a
completely new format. The choice is between something that won't work at
all in non-browsers and something that will mostly work.

But I will ask that question at
http://universalsubtitles.org/opensubtitles2010 and at
http://www.foms-workshop.org/foms2010OVC/ where gstreamer, vlc and other
developers will be present.


Great, I'll be there too!

--
Philip Jägenstedt
Core Developer
Opera Software
