Re: [whatwg] metadata attribute for media

2013-02-18 Thread Ralph Giles
On 17/02/13 05:48 AM, Nils Dagsson Moskopp wrote:

 If one cares to that extent, and is
 already handling format differences, dealing with vendor
 variation on top isn't that much more effort.
 
 I disagree, strongly.

Ok, thanks for the feedback. Do you like my subsequent proposal better?

http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2013-January/038705.html

 -r



Re: [whatwg] metadata attribute for media

2013-01-16 Thread Ralph Giles
On 12-12-11 5:23 PM, Ralph Giles wrote:

 That said, I'm not convinced this is an issue given the primary
 use-case, which is pretty much that web content wants to do more
 sophisticated things with the metadata than the user-agent's
 standardized parsing allows. If one cares to that extent, and is already
 handling format differences, dealing with vendor variation on top isn't
 that much more effort.

Robert O'Callahan argued (off-list) that there is a significant
difference. Exposing what the media framework supplies for metadata
multiplies the established format fragmentation by per-platform
differences. That's not something we want to do if we can avoid it, and
we can avoid it by doing our own tag normalization as part of the
exposed API.

We feel that's more important than the extended and custom tag use case,
which is still addressable by direct parsing in JavaScript, etc.

I don't intend to continue with implementation of the raw metadata API,
and instead will focus on mapping everything to a standard set of
attributes. To that proposal I've added a 'tracknumber' attribute, since
this is important for sorting resources in media player applications.

interface Metadata {
  readonly attribute DOMString title;
  readonly attribute DOMString artist;
  readonly attribute DOMString album;
  readonly attribute long tracknumber;
  readonly attribute Blob? artwork;
};

partial interface HTMLMediaElement {
  readonly attribute Metadata metadata;
};
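
For illustration, here's roughly how a page might consume this (a
sketch only; 'metadata' here is the proposed attribute above, not a
shipped API):

var audio = document.querySelector('audio');
audio.addEventListener('loadedmetadata', function () {
  var meta = audio.metadata;  // proposed attribute, per the IDL above
  if (meta.title)
    document.title = meta.artist + ' - ' + meta.title;
});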

 -r


Re: [whatwg] metadata attribute for media

2012-12-11 Thread Ralph Giles
On 12-12-11 4:58 PM, Ian Hickson wrote:

 This seems reasonable.

Thanks for the feedback. Anyone else? :-)

 I don't want to be the one to maintain the mapping from media formats to 
 metadata schema, because this isn't my area of expertise, and it isn't 
 trivial work.

Good point. This would need to be standardized for the fixed-schema
proposal, at least for the formats commonly supported by the HTML media
element. The Web Ontology working group has done some work here, as
Silvia mentioned.

 I don't think we should have an open-ended API without fixed names, 
 because that is a recipe for an interoperability disaster.

I agree it would have interoperability issues. My own implementation
experience is that the easy thing to do is to mirror whatever
representation your playback framework offers, which can result in
per-platform differences as well as per-media-format (and per tagging
application, etc.).

That said, I'm not convinced this is an issue given the primary
use-case, which is pretty much that web content wants to do more
sophisticated things with the metadata than the user-agent's
standardized parsing allows. If one cares to that extent, and is already
handling format differences, dealing with vendor variation on top isn't
that much more effort.

We could say that user agents should represent the metadata dictionaries
as directly as possible, and match the tag names from the schema spec
when that's not possible.

 -r


Re: [whatwg] metadata attribute for media

2012-11-27 Thread Ralph Giles
On 12-11-26 4:18 PM, Ralph Giles wrote:

 interface HTMLMediaElement {
   ...
   object getMetadata();
 };
 
 After the loadedmetadata event fires, this method would return a new
 object containing a copy of the metadata read from the resource, in
 whatever format the decoder implementation supplies. It would be up to
 the caller to do any semantic interpretation. The same method should be
 present on AudioTrack, VideoTrack, (TextTrack?) and Image elements.

The prefixed version of this in Firefox is documented in the 'Methods'
section of https://developer.mozilla.org/en-US/docs/DOM/HTMLMediaElement

 -r



Re: [whatwg] [mimesniff] Handling container formats like Ogg

2012-11-27 Thread Ralph Giles
On 12-11-27 9:19 AM, Gordon P. Hemsley wrote:

 Is it sufficient to sniff just for application/ogg and then let the
 UA's Ogg library determine whether or not the contents of the file can
 be handled? (I'm sensing the consensus is yes.)

I think so.

Defining a codec enumeration algorithm and MIME type decision tree is
going to be several pages of spec for container formats like Ogg and
Matroska.

As an example, the 'file' tool reads the first codec header present in
an Ogg file, which can be found at a set of fixed offsets. This lets it
get some useful information for a lot of files, especially since it's
RECOMMENDED that primary codecs are listed first, i.e. Theora should be
first in a 'video/ogg' file, and Opus first in an 'audio/ogg;
codecs=opus' file.

But files can be muxed without following the priority rule and will
still play fine, and there can be a 'skeleton' header in front of the
other codec data, obscuring the primary codec's role. Just looking at
fixed offsets isn't sufficient to distinguish audio and video reliably.
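
For concreteness, the kind of fixed-offset check described above looks
roughly like this (a sketch, not a spec algorithm; the magic strings
come from the codecs' Ogg mappings, and buf is assumed to hold the
start of the file):

function sniffOgg(buf) {
  var bytes = new Uint8Array(buf);
  if (String.fromCharCode.apply(null, bytes.subarray(0, 4)) !== 'OggS')
    return null;                              // not an Ogg file
  var nsegs = bytes[26];                      // page_segments count
  var packet = bytes.subarray(27 + nsegs);    // first packet of first page
  var head = String.fromCharCode.apply(null, packet.subarray(0, 8));
  if (head.indexOf('\x80theora') === 0) return 'video/ogg';
  if (head.indexOf('\x01vorbis') === 0) return 'audio/ogg';
  if (head.indexOf('OpusHead') === 0) return 'audio/ogg; codecs=opus';
  if (head.indexOf('fishead') === 0) return 'application/ogg'; // skeleton first
  return 'application/ogg';
}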

IMHO,
 -r


Re: [whatwg] metadata attribute for media

2012-11-26 Thread Ralph Giles
On 12-09-27 1:44 AM, Philip Jägenstedt wrote:

 I'm skeptical that all that we want from ID3v2 or common VorbisComment
 tags can be mapped to Dublin Core, it seems better to define mappings
 directly from the underlying format to the WebIDL interface.

You're right.

 Given the open-endedness of metadata contained in actual media
 resources, I'm personally a bit skeptical that there's something we
 could add to the Web platform that would be better than just letting
 authors pass that metadata out-of-band using any representation they
 like, but what use cases are you trying to cover here?

Two use cases I'm trying to address:

- A web application presents some view of a media library. If the library
resides on a server, then yes, the server-side component of the app can
parse, cache, and deliver the metadata out-of-band. But the library
could also be local, in which case the webapp must do its own parsing,
e.g. from a list of Blob URLs returned by the File API.

- An author wants to display the embedded track title and artist name
with simple scripting on a webpage.

One of the goals of html media was to make video and audio as simple to
include as images. User agents are generally parsing this metadata
anyway; exposing it is straightforward, and greatly simplifies the two
tasks above.

In any case, the media.mozGetMetadata() method I described earlier is
available in Firefox 17, released last week, with support for Vorbis
tags in .ogg files. Firefox 18, now in beta, adds support for .opus
files as well. Here's an example:

  https://people.xiph.org/~giles/2012/metadata.html

You should see 'Title', 'Album', etc. below the controls if your browser
supports mozGetMetadata().
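
Reading the tags with the prefixed method looks something like this
(the tag names depend entirely on what the file contains):

var audio = document.querySelector('audio');
audio.addEventListener('loadedmetadata', function () {
  if (!audio.mozGetMetadata) return;   // prefixed, Firefox-only
  var tags = audio.mozGetMetadata();   // plain object copy of the raw tags
  for (var key in tags)
    console.log(key + ': ' + tags[key]);
});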

We're continuing to implement this interface for other formats we support.

I still think it's useful to define both this 'raw' metadata interface,
returning just the data out of the file, and a more structured metadata
object with standardized tag names. I've left off implementing the
second one for lack of feedback on what the basic tags should be, and
how useful they are. There certainly wasn't consensus here.

So, what do you think of these two proposals:

1. Define the 'raw' getMetadata method in an unprefixed form:

interface HTMLMediaElement {
  ...
  object getMetadata();
};

After the loadedmetadata event fires, this method would return a new
object containing a copy of the metadata read from the resource, in
whatever format the decoder implementation supplies. It would be up to
the caller to do any semantic interpretation. The same method should be
present on AudioTrack, VideoTrack, (TextTrack?) and Image elements.

2. Define a parsed metadata attribute with some standard tags:

interface Metadata {
  readonly attribute DOMString title;
  readonly attribute DOMString artist;
  readonly attribute DOMString album;
  readonly attribute Blob? artwork;
};

interface HTMLMediaElement {
  ...
  readonly attribute Metadata metadata;
};

So you could say something as simple as

  img.src = audio.metadata.artwork;

to display the cover art embedded in a downloaded single. DOMString
attributes would have the value of an empty string if the underlying
data isn't available. These four attributes are the absolute minimum, I
think, for the use cases above. More could be added later as usage of
the API evolves. For example: date, publisher, tracknumber, tracktotal,
description, genre, sort-artist, sort-title, copyright, license, url.
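
One wrinkle: since 'artwork' is a Blob in the IDL above, the
assignment would presumably go through an object URL in practice,
something like:

if (audio.metadata.artwork)
  img.src = URL.createObjectURL(audio.metadata.artwork);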

If having both media.getMetadata() (raw) and media.metadata is
confusing, the first proposal could be named media.getRawMetadata() instead.

Does it make sense to include more technical metadata here? For example:
samplerate, channels, duration, width, height, framerate, aspect-ratio.
Firefox currently has prefixed properties for the number of channels and
the audio sample rate, and including these in the metadata interface
would let us deprecate the prefixed versions. On the other hand,
properties like duration, width, and height are available directly on
media elements today, so maybe it makes more sense to do the same for
the others.

In that case, do we want the indirection of the Metadata interface?
Saying 'video.title' or 'img.src = audio.artwork' instead is shorter.

Feedback welcome,
 -r


[whatwg] metadata attribute for media

2012-06-11 Thread Ralph Giles
Recently, we've been considering adding a 'tags' or 'metadata' attribute
to HTML media elements in Firefox, to allow web content access to
metadata from the playing media resource. In particular we're interested
in tag data like creator, title, date, and so on.

My recollection is that this has been discussed a number of times in the
past, but there was never sufficient motivation to support the interface.
Our particular motivation here is webapps that present a media file
library. While it's certainly possible to parse the tag data out
directly with JavaScript, it's more convenient if the HTML media element
does so, and the underlying platform decoder libraries usually provide
this data already.

As such, I wanted to raise the issue here to get design feedback and
gauge the level of interest from other user agents.

Here's a first idea:

partial interface HTMLMediaElement {
  readonly attribute object tags;
};

Accessing media.tags provides an object with key: value data, for example:

{
  'title': 'My Movie',
  'creator': 'This User',
  'date': '2012-06-18',
  'license': 'http://creativecommons.org/licenses/by-nc-sa/'
}

The keys may need to be filtered, since the files can contain things
like base64-encoded cover art, which makes the object prohibitively
large. The keys may need to be mapped to some standard scheme (e.g.
Dublin Core) since vocabularies vary from format to format.

This is nice because it's easy to access, can be simply enumerated, and
is extensible, which will be helpful if it gets added to img for EXIF data.

 -r


Re: [whatwg] WebVTT feedback (and some other video feedback that snuck in)

2011-12-02 Thread Ralph Giles
On 02/12/11 05:38 AM, David Singer wrote:

 Very.  Correct rendering of some text requires that it be correctly
 labelled with a BCP-47 tag, as I understand.
 
 For me, file-level default/overall setting, with the possibility of
 span-based over-rides, seems ideal.  I think.

Yes. A 'srclang' attribute in the referencing track element would
override 'lang' in the file header. I think span-based overrides in
the file have to override both file-level settings; that means you can't
fix them with markup, but nothing else makes sense.

 -r


Re: [whatwg] I have feature requests for video and audio tags.

2011-11-17 Thread Ralph Giles
On 11-11-17 12:29 PM, Jonas Sicking wrote:

 Authors do however know how loud the volume of the media is though. If
 a video is encoded with a very loud volume, or a very quiet volume, it
 can be quite useful to be able to adjust that up or down when linking
 to it.

I was going to say, that sounds like user-agents should support
replay-gain tags. Note that expected line level is actually a required
header for the Opus codec's Ogg encapsulation.

However, relying on a replaygain tag in the media file does violate the
'simple to author' goal of html.

 -r


Re: [whatwg] HTML5 video seeking

2011-11-16 Thread Ralph Giles
On 15/11/11 10:32 AM, Aaron Colwell wrote:

 Yeah it looks to me like starting at the requested position is the
 only option. I saw the text about media engine triggered seeks, but
 it seems like users would be very surprised to see the seeking and
 seeked events for the seek they requested immediately followed by a
 pair of events to a location they didn't request. I can envision
 the angry bug reports now. ;)

Yeah, it's definitely bending the rules. If you only intended to
support seeking to the nearest keyframe, setting media.seekable would
be an honest way to advertise that, but also violates least surprise.

 Thanks for the response. Looks like we are interpreting the text
 the same way.

Yes, my recollection of the earlier discussion aligns with your
summary and Chris Double's. It's expensive, but what one naively
expects the API to do.

Video splicing/mixing was a use case we wanted to support, and such
applications aren't really possible without frame-accurate seeking.
Thus, it's better to require it up front and possibly allow
applications to relax it later, as with Frank and Philip's 'strict'
attribute, than to disallow such applications by leaving this to
implementation variance.

 -r



Re: [whatwg] HTML5 video seeking

2011-11-14 Thread Ralph Giles
On 14/11/11 03:49 PM, Aaron Colwell wrote:

 Does this mean the user agent must resume playback at the exact
 location specified?

Maybe you can muck with the 'media.seekable' TimeRanges object to only
show keyframes?

Otherwise, it kind of sounds like you're supposed to start playback at
the requested position. The final paragraph of that section suggests
another out: you can reposition the playback head inside the playback
engine as long as you clip to media.seekable and fire the timeupdate
and seeked events.
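
In script terms, 'clip to media.seekable' amounts to something like the
following sketch (the real clipping would happen inside the playback
engine, of course):

// Clamp a target time to the nearest point in media.seekable.
function clampToSeekable(media, t) {
  var ranges = media.seekable;
  var best = 0;  // fall back to 0 if nothing is seekable
  for (var i = 0; i < ranges.length; i++) {
    if (t >= ranges.start(i) && t <= ranges.end(i)) return t;
    // otherwise snap to whichever range edge is closest
    var edge = (t < ranges.start(i)) ? ranges.start(i) : ranges.end(i);
    if (i === 0 || Math.abs(edge - t) < Math.abs(best - t)) best = edge;
  }
  return best;
}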

 -r



Re: [whatwg] SRT research: timestamps

2011-10-11 Thread Ralph Giles
On 10/10/11 12:19 AM, Simon Pieters wrote:

 0 negative intervals
 0 cues skipped because field counts were different

That will teach me to proofread after posting. The real counts should be:

2227 negative intervals
6822 cues skipped because field counts were different

From which I conclude negative intervals aren't a significant problem.

Thanks for running my script!

 -r


Re: [whatwg] SRT research: timestamps

2011-10-07 Thread Ralph Giles
On 06/10/11 01:58 AM, Simon Pieters wrote:

 I don't know how many have negative interval, I'd need to run a new
 script over the 52,000,000 lines to figure out. (If you want me to check
 this, please contact me with details about what you want to count as
 negative interval.)

I had in mind something like:

cat *.vtt | awk 'BEGIN { negs = 0; misses = 0; }
/-->/ {
  ns = split($1, start, "[:.,]");
  ne = split($3, end, "[:.,]");
  if (ns != ne) { misses++; next; }
  if (end[1] - start[1] < 0) negs++;
}
END { print negs, "negative intervals";
  print misses, "cues skipped because field counts were different";
}'

Which will probably still miscount some garbage lines, but gives a rough
idea.

 leading id e.g.
 10300:11:53,891 --> 00:11:56,155
 
 33

OTOH, it sounds like the leading id issue is vanishingly uncommon, so I'm
just curious if there are any other cues which would be rejected that way.

 -r



Re: [whatwg] SRT research: timestamps

2011-10-06 Thread Ralph Giles
This is all I meant as well. Of course we should all implement the parser as 
spec'd. My comments were with respect to amending the spec to be more forgiving 
of common errors.

 -r

Philip Jägenstedt phil...@opera.com wrote:

On Thu, 06 Oct 2011 07:36:00 +0200, Silvia Pfeiffer  
silviapfeiff...@gmail.com wrote:

 On Thu, Oct 6, 2011 at 10:51 AM, Ralph Giles gi...@mozilla.com wrote:
 On 05/10/11 04:36 PM, Glenn Maynard wrote:

 If the files don't work in VTT in any major implementation, then  
 probably
 not many.  It's the fault of overly-lenient parsers that these things  
 happen
 in the first place.

 A point Philip Jägenstedt has made is that it's sufficiently tedious to
 verify correct subtitle playback that authors are unlikely to do so with
 any vigilance. Therefore the better trade-off is to make the parser
 forgiving, rather than inflict the occasional missing cue on viewers.

 That's a slippery slope to go down on. If they cannot see the
 consequence, they assume it's legal. It's not like we are totally
 screwing up the display - there's only one mis-authored cue missing.
 If we accept one type of mis-authoring, where do you stop with
 accepting weirdness? How can you make compatible implementations if
 everyone decides for themselves what weirdness that is not in the spec
 they accept?

 I'd rather we have strict parsing and recover from brokenness. It's
 the job of validators to identify broken cues. We should teach authors
 to use validators before they decide that their files are ok.

 As for some of the more dominant mis-authorings: we can accept them as
 correct authoring, but then they have to be made part of the
 specification and legalized.

To clarify, I have certainly never suggested that implementations do
anything other than follow the spec to the letter. I *have* suggested that
the parsing spec be more tolerant of certain errors, but looking at the  
extremely low error rates in our sample I have to conclude that either (1)  
the data is biased or (2) most of these errors are not common enough that  
they need to be handled.

-- 
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] (no subject)

2011-10-05 Thread Ralph Giles
On 05/10/11 11:37 AM, Ashley Sheridan wrote:

 I would assume the part that the Skype plugin is being used for, as the
 only other part of the chat that isn't HTML/Javascript code is the
 Jabber connectivity, which isn't strictly a plugin per se, more an
 additional interface to the raw data that is enabled through server
 modules.

The Audio/Video chat part, which supports similar uses to the Skype
plugin, is part of the WebRTC effort. Jabber connectivity is something
you can currently do by tunnelling the stanzas (messages) over XHR or
WebSockets.

Hope that helps orient you,
 -r


Re: [whatwg] SRT research: timestamps

2011-10-05 Thread Ralph Giles
On 05/10/11 10:22 AM, Simon Pieters wrote:

 I did some research on authoring errors in SRT timestamps to inform
 whether WebVTT parsing of timestamps should be changed.

This is completely awesome, thanks for doing it.

 hours too many '(^|\s|)\d{3,}[:\.,]\d+[:\.,]\d+'
 834

As Silvia mentioned, the WebVTT spec currently leaves the number of
digits in the hour field as implementation defined, so long as it's at
least two.

I asked previously[1] if we could agree on and specify a limit. Would
you mind checking what the histogram of digit numbers is in the hours
field? Especially if you can separate cases like

 34500:24:01,000 --> 00:24:03,000

either because the index is missing, or because the interval is
negative (for which the WebVTT spec would reject the entire cue).
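
Something along these lines could produce the histogram (a rough
JavaScript sketch; 'lines' is a hypothetical array of cue timing lines):

function hourDigitHistogram(lines) {
  var hist = {};
  lines.forEach(function (line) {
    // note: a cue id fused onto the timestamp ('10300:11:53,891')
    // shows up here as extra hour digits, which is the case in question
    var m = /^(\d+):\d{2}:\d{2}[.,]\d{3}/.exec(line);
    if (!m) return;
    var n = m[1].length;          // digits in the hours field
    hist[n] = (hist[n] || 0) + 1;
  });
  return hist;
}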

Cheers,
 -r

[1]
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2011-September/033271.html


Re: [whatwg] SRT research: timestamps

2011-10-05 Thread Ralph Giles
On 05/10/11 04:36 PM, Glenn Maynard wrote:

 If the files don't work in VTT in any major implementation, then probably
 not many.  It's the fault of overly-lenient parsers that these things happen
 in the first place.

A point Philip Jägenstedt has made is that it's sufficiently tedious to
verify correct subtitle playback that authors are unlikely to do so with
any vigilance. Therefore the better trade-off is to make the parser
forgiving, rather than inflict the occasional missing cue on viewers.

 -r




Re: [whatwg] track / WebVTT issues

2011-09-21 Thread Ralph Giles

On 21/09/11 04:04 AM, Anne van Kesteren wrote:


 I have an additional point. Can we maybe consider naming it just VTT?
 At least as far as file signatures, media types, and other
 developer-facing identifiers are concerned.


Three ASCII characters is a little sparse for a file signature. 
Otherwise, I don't object.


 -r


Re: [whatwg] What is the purpose of timeupdate?

2009-11-05 Thread Ralph Giles
On Thu, Nov 5, 2009 at 6:10 AM, Brian Campbell
brian.p.campb...@dartmouth.edu wrote:

 As implemented by Safari and Chrome (which is the minimum rate allowed by
 the spec), it's not really useful for that purpose, as 4 updates per second
 makes any sort of synchronization feel jerky and laggy.

It really depends what you're doing. It's fine for presentation
slides, and not too bad for subtitles. I agree it's useless for
anything moving faster than that.

 -r


Re: [whatwg] Serving up Theora video in the real world

2009-07-09 Thread Ralph Giles
On Thu, Jul 9, 2009 at 3:34 PM, David Gerard dger...@gmail.com wrote:

 Anyone got ideas on the iPhone problem?

I think this is off topic, and I am not an iPhone developer, but:

Assuming the app store terms allow video players, it should be
possible to distribute some sort of dedicated player application, free
or otherwise. I believe the fee for a cert to sign applications is
currently $100/year.

However, the iPhone doesn't have a shared filesystem, or
helper-applications in the normal sense. At least not as far as I can
tell. The work-around I'm aware of is for site authors to check if
you're running on the iPhone in JavaScript, and rewrite the video
elements to normal anchors with a custom scheme, e.g.

  <a href="oggplayer://example.com/file.ogv">Click here to watch in
Ogg Player</a>.

Then, if the user has installed the Ogg Player app, which registers
itself as handling the 'oggplayer' scheme, Safari will pass the
custom URI to it, and it can download/stream/what have you.

 -r


Re: [whatwg] Serving up Theora video in the real world

2009-07-09 Thread Ralph Giles
On Thu, Jul 9, 2009 at 9:22 PM, Maciej Stachowiak m...@apple.com wrote:

 I think at one point I suggested that canPlayType should return one of
 boolean false, true or maybe, so that naive boolean tests would work. Or
 in any case, make the 'no' option something that tests as boolean false.

We seem to have wound up with a tristate where one of the states is
never used, which is unfortunate. Is it too late now?

To recap (off the top of my head): it's hard to say if you can play
something because that requires either a validator, or actually
playing it. So in addition to 'yes' and 'no', a 'maybe' was added, to
say "I've heard of the media type and codecs and profiles, so it's
worth trying; no promises."

But the whole idea of canPlayType is to be a lightweight test to
decide what to try playing in advance, so browsers are unlikely to
implement a check that involves actually loading and testing the
resource. So in practice 'yes' would only be returned in special
applications, not for general web resources. 'No' means no, 'maybe'
means yes, and 'yes' never gets used.

It will be confusing either way, but if canPlayType were reverted to a
boolean, with a note that true means 'maybe', not 'yes', there would
at least be fewer programming errors.

But Ian's just replied with a clever compromise.
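
(What eventually shipped is in fact that compromise: canPlayType()
returns the strings 'probably', 'maybe', or the empty string, so a
naive boolean test does the right thing:)

var v = document.createElement('video');
// the empty string (the 'no' case) is falsy, so this just works:
if (v.canPlayType('video/ogg; codecs="theora, vorbis"')) {
  // worth trying to play
}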

 -r


Re: [whatwg] Start position of media resources

2009-04-07 Thread Ralph Giles
On Tue, Apr 7, 2009 at 1:26 AM, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:

 For example, take a video that is a subpart of a larger video and has
 been delivered through a media fragment URI
 (http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-reqs/).
 When a user watches both, the fragment and the full resource, and both
 start at 0, he/she will assume they are different resources, when in
 fact one is just a fragment of the other.

There is a UI problem here, in that the 'seek bar' control typically
displayed by web video players has finite resolution. It works great
for short-form clips a la YouTube, but a 30 second segment of a two
hour movie amounts to a few pixels. Displaying such a fragment in the
context of the complete work makes a linear progress bar useless for
seeking within the fragment itself, everything having been traded for
showing that it's part of a much larger resource. Never mind that a
temporal url can equally well reference a five minute section of a
2 hour webcam archive.

Showing a fragment in context is helpful, not just for the time
offset, but cue points, related resources, and so on. The default
controls the browser provides can't encompass all possible context
interfaces, so perhaps the focus here should be on what is necessary
to enable scripts (or browser vendors) to build more complicated
interfaces when they're appropriate.

 -r


Re: [whatwg] Video playback quality metric

2009-02-10 Thread Ralph Giles
On Tue, Feb 10, 2009 at 1:54 AM, Michael A. Puls II
shadow2...@gmail.com wrote:

 Flash has low, medium and high quality that the user can change (although a
 lot of sites/players seem to rudely disable that option in the menu for some
 reason). This helps out a lot and can allow a video to play better. I could
 imagine an Auto option too that automatically switched quality as
 necessary to get decent playback.

Isn't that rendering quality? That can of course be adjusted by the
browser, dynamically and/or according to a user setting, with or
without a javascript interface.

 -r


Re: [whatwg] Thoughts on video accessibility

2008-12-09 Thread Ralph Giles
On Mon, Dec 8, 2008 at 9:20 PM, Martin Atkins [EMAIL PROTECTED] wrote:

 My concern is that if the only thing linking the various streams together is
 the HTML document then the streams are less useful outside of a web browser
 context.

Absolutely. This proposal places an additional burden on the user to
download and integrate multiple resources. This trade-off supports
applications where having the text available separately is valuable.

 -r


Re: [whatwg] Thoughts on video accessibility

2008-12-08 Thread Ralph Giles
On Mon, Dec 8, 2008 at 6:08 PM, Martin Atkins [EMAIL PROTECTED] wrote:

 What are the advantages of doing this directly in HTML rather than having
 the src attribute point at some sort of compound media document?

The general point here is that subtitle data is in current practice
often created and stored in external files. This is, in part, because
of poor support for embedded tracks in web video applications, but
also arises naturally in production workflows. Moreover, because it is
text, subtitle data is much more likely to be stored in a database
with other text-based content while audio and video is treated as
binary blobs. This scheme is intended to support such hybrid systems.

There is generally a tension between authors wanting to easily
manipulate and add tracks, users wanting a self-contained file, and
search engines wanting stand-alone access to just the text. Because
splitting and merging media files requires special tools, our thinking
in the Ogg accessibility group has been that we need to support both
embedded and external references for text tracks in html. Users (and
their tools) can then choose what methods they want to use in
particular circumstances.

We're also interested in a more sophisticated mechanism for
communicating track assortments between a server and a client, but in
the particular case of text tracks for accessibility, I think having a
simple, explicit mechanism at the html level is worthwhile.

 -r


Re: [whatwg] Same-origin checking for media elements

2008-11-10 Thread Ralph Giles

On 10-Nov-08, at 7:49 PM, Maciej Stachowiak wrote:


1) Allow unrestricted cross-origin video/audio
2) Allow cross-origin video/audio but carefully restrict the  
API to limit the information a page can get about media loaded  
from a different origin
3) Disallow cross-origin video/audio unless the media server  
explicitly allows it via the Access Control spec (e.g. by sending  
the Access-Control-Allow-Origin: * header).


I'd prefer 1 or 2 (assuming the restrictions assumed by 2 are  
reasonable).


One point that came out of the theora-level thread is that (2) would  
be less surprising if there's some kind of error mechanism flagging  
the restriction. For example, taint-tracking infrastructure could  
throw an exception when the javascript vm attempts to move cross-site  
data outside the layout and render engines.


This would offer some help to authors when a locally tested design  
mysteriously stops working when deployed.


FWIW,
 -r



Re: [whatwg] Video element and duration attribute

2008-11-06 Thread Ralph Giles
On Thu, Nov 6, 2008 at 9:46 AM, Eric Carlson [EMAIL PROTECTED] wrote:

  Instead of seeking to the end of the file to calculate an 
 exact
 duration as you describe, it is much cheaper to estimate the duration by
 processing a fixed portion of the file and extrapolating to the duration
 based on the file size. QuickTime does this and it works quite well.

I expect this works much better for mp3 than it does for variable
bitrate streams. Nevertheless it's much better than nothing.
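
The extrapolation described amounts to something like this (a sketch,
assuming a roughly constant average bitrate across the file):

// Estimate total duration from a parsed prefix of the file.
function estimateDuration(prefixSeconds, prefixBytes, fileBytes) {
  return prefixSeconds * (fileBytes / prefixBytes);
}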

Your argument that the durationchange event makes flexible
determination reasonable is a good one. I'm fine with not including
the duration argument, for whatever that's worth.

 -r


Re: [whatwg] video tag: pixel aspect ratio

2008-10-15 Thread Ralph Giles
On Wed, Oct 15, 2008 at 2:40 AM, Ian Hickson [EMAIL PROTECTED] wrote:

 Is that not enough?

It is enough. Sander and Eduard have provided excellent arguments why
the pixel aspect ratio, and especially the frame rate, should be
represented as rationals in video formats. But as an override for
already-broken video streams, compliance with best practice does not
justify another data type in HTML5.

To put Anne's comment another way, one needs a gigapixel display
device before the difference between 1.0925 (rounded to only 5
figures) and 59/54 affects the behaviour of the scaling algorithm at
all. There aren't so many aspect ratios in common use--you're welcome
to choose the one nearest to the floating point value given if you
think it's important.

 -r

--
Ralph Giles
Xiph.org Foundation


Re: [whatwg] Video : Slow motion, fast forward effects

2008-08-07 Thread Ralph Giles
On Thu, Aug 7, 2008 at 1:57 AM, Philip Jägenstedt [EMAIL PROTECTED] wrote:

 I suggest that the spec allows raising the NOT_SUPPORTED_ERR exception
 in response to any playback rate which it cannot provide for the current
 configuration.

That sounds reasonable. It is a special effect.

 With a netcast you couldn't support any playback rate
 except 1.0 without first buffering all the data you want to play at a
 faster rate, so changing the playback rate doesn't make sense.

Well, it would be better to implement the requested playback rate (and
direction) as long as you can and then rebuffer when necessary. Doing
that well requires extra sophistication in the buffering code, but the
client might just be trying to scrub a bit, which would fit within a
normal seek buffer. Even with live streams, if you try to pull faster
than realtime you'll just buffer-wait, and if you're going slower
you'll fill whatever space you have, then have to drop and restart.

If all you have is a jitter buffer, then sure.

 -r


Re: [whatwg] Change request: rules for pixel ratio

2008-06-10 Thread Ralph Giles

On 10-Jun-08, at 9:31 AM, Philip Jägenstedt wrote:

The default value, if the attribute is omitted or cannot be parsed,  
is the media resource's self-described pixel ratio, or 1.0 for  
media resources that do not self-describe their pixel ratio.


This is actually how I read the original, but your wording resolves  
the ambiguity. Sounds like an improvement to me.


Does this mean the implementation must retrieve the self-described  
pixel aspect ratio from the stream and supply it as part of the DOM?


 -r

Re: [whatwg] Video codec requirements changed

2008-01-07 Thread Ralph Giles
On Mon, Jan 07, 2008 at 01:50:09PM -0800, Dave Singer wrote:

 I get the impression that this is not an openly-specified codec, 
 which I rather think is a problem.  That is, there is neither a 
 publicly available spec. nor publicly-available source, which means 
 that it is controlled by one company.

That matches my understanding.

Bink is widely distributed in commercial (game) software, under licence 
from another commercial entity, if that helps with the submarine patent 
risk.

 -r


Re: [whatwg] codecs and containers

2007-12-10 Thread Ralph Giles
On Mon, Dec 10, 2007 at 09:14:39AM -0800, James Justin Harrell wrote:

 The language could be improved. Ogg Theora refers to Theora-encoded video 
 enclosed in an Ogg
 container, not the Theora codec. Similar for Vorbis. Theora and Vorbis 
 should be used without
 Ogg to refer to the actual codecs.

It is important to specify a container as well as a codec set for
the interoperability baseline.

 I also feel the choice here of the Ogg container format is a bad one, and 
 that Matroska would be a
 much better choice.

My understanding is that Matroska doesn't stream well, which is a 
primary concern for loading large resources into web pages with 
progressive rendering. 

 -r


Re: [whatwg] Cue points in media elements

2007-04-30 Thread Ralph Giles
Thanks for adding to the discussion. We're very interested in 
implementing support for presentations as well, so it's good
to hear from someone with experience. 

Since we work on streaming media formats, I always assumed things would 
have to be broken up by the server and the various components streamed 
separately to a browser, and I hadn't noticed the cue point support 
until you pointed it out.

Some comments and questions below...

On Sun, Apr 29, 2007 at 03:14:27AM -0400, Brian Campbell wrote:

 in our language, you might see something like this:
 
   (movie "Foo.mov" :name 'movie)
   (wait @movie (tc 2 3))
   (show @bullet-1)
   (wait @movie)
   (show @bullet-2)
 
 If the user skips to the end of the media clip, that simply causes  
 all WAITs on that  media clip to return instantly. If they skip  
 forward in the media clip, without ending it, all WAITs before that  
 point will return instantly.

How does this work if, for example, the user seeks forward, and then
back to an earlier position? Would some of the 'show's be undone, or do 
they not seek backward with the media playback? Is the essential 
component of your system that all the shows be called in sequence 
to build up a display state, or that the last state trigger before the 
current playback point have been triggered? Isn't this slow if a bunch 
of intermediate animations are triggered by a seek?

Does your system support live streaming as well? That complicates the
design somewhat, when the presentation's media updates appear dynamically.

Anyway I think you could implement your system with the currently 
proposed interface by checking the current playback position and 
clearing a separate list of waits inside your timeupdate callback.
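
A sketch of that bookkeeping (hypothetical helper names; only
currentTime and the timeupdate event are taken from the proposal):

var waits = [];  // { time: seconds, resume: callback }, kept sorted

media.addEventListener('timeupdate', function () {
  // fire every wait at or before the new playback position
  while (waits.length && waits[0].time <= media.currentTime)
    waits.shift().resume();
});

function waitFor(time, resume) {
  waits.push({ time: time, resume: resume });
  waits.sort(function (a, b) { return a.time - b.time; });
}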

 This is a nice system, but I can't see how even as simple a system as  
 this could be implemented given the current specification of cue  
 points. The problem is that the callbacks execute when the current  
 playback position of a media element reaches the cue point. It seems  
 unclear to me what reaching a particular time means.

I agree this should be clarified. The appropriate interpretation should 
be when the current playback position reaches the frame corresponding to 
the cue point, but digital media has quantized frames, while the cue
points are floating point numbers. Triggering all cue point callbacks 
between the last current playback position and the current one 
(including during seeks) would be one option, and do what you want as 
long as you aren't seeking backward. I'd be more in favor of triggering
any cue point callbacks that lie between the current playback position 
and the current playback position of the next frame (audio frame for 
audio/ and video frame for video/ I guess). That means more 
bookkeeping to implement your system, but is less surprising in other 
cases.

   If video  
 playback freezes for a second, and so misses a cue point, is that  
 considered to have been reached?

As I read it, cue points are relative to the current playback position, 
which does not advance if the stream buffer underruns, but it would
if playback restarts after a gap, as might happen if the connection
drops, or in an RTP stream. My proposal above would need to be amended
to handle that case, and the decoder dropping frames...finding the right 
language here is hard.

 In the current spec, all that is  
 provided for is controls to turn closed captions on or off. What  
 would be much better is a way to enable the video element to send  
 caption events, which include the text of the current caption, and  
 can be used to display those captions in a way that fits the design  
 of the content better.

I really like this idea. It would also be nice if, for example, the 
closed caption text were available through the DOM so it could be
presented elsewhere, searched locally, and so on. But what about things 
like album art, which might be embedded in an audio stream? Should that 
be accessible? Should a video element expose a set of known cue points 
embedded in the file? 

A more abstract interface is necessary than just 'caption events'. Here 
are some use cases worth considering:

* A media file has embedded textual metadata like title, author, 
copyright, license, that the designer would like to access for associated
display elsewhere in the page, or to alter the displayed user interface
based on the metadata. This is pretty essential for parity with 
flash-based internet radio players.

* A media file has embedded non-textual metadata like an album cover 
image, that the designer would like to access for display elsewhere in
the page.

* The designer wants to access closed captioned or subtitle text 
through the DOM as it becomes available for display elsewhere in the 
page.

* There are points in the media file where the embedded metadata
changes. These points cannot be retrieved without scanning the file,
which is expensive over the network.

Re: [whatwg] Give guidance about RFC 4281 codecs parameter

2007-04-12 Thread Ralph Giles
On Wed, Apr 11, 2007 at 05:45:34PM -0700, Dave Singer wrote:

 But [video/*] does at least indicate that we have a time-based multimedia 
 container on our hands, and that it might contain visual 
 presentation.  application/ suffers that it does not say even that, 
 and it raises the concern that this might be arbitrary, possibly 
 executable, data.  We discussed whether application/ was appropriate 
 for MP4 and decided that it masked important characteristics of the 
 format -- that it really is a time-based multimedia presentation -- 
 and raised unwarranted concerns.

I guess we made the opposite decision. Because Ogg was a container and 
could contain anything, including executable content, we went with the
most generic option, based on analogy with application/octet-stream,
application/pdf, etc. That we were working only on audio at the time
may have coloured our judgement; the video-contains-audio argument 
didn't fit.

I've noticed application/rss as a newer example, but I think that's
more to encourage handoff from browsers without native support than
an attempt at classification.

Maciej's suggestion (registering all three) would work for Ogg, but I
was under the impression that multiple registrations for the same format 
were discouraged. 

The disposition hinting proposal also works for general media types, 
without requiring registration of a suite of media types for every 
container. I also think it's a better solution for playlists, which are 
and aren't time-based media. Would you also go with video/x-m3u, video/rss 
for those text-based formats? Overloading the base types works, but
so does a separate indication. Both are backward-compatible extensions 
to the media-type field, and both require software changes to implement. 
One, however, requires registering new types, including audio/quicktime. :)

Thanks for explaining your rationale, it's interesting to hear.

 -r


Re: [whatwg] Give guidance about RFC 4281 codecs parameter

2007-04-10 Thread Ralph Giles
On Tue, Apr 10, 2007 at 11:21:10AM -0700, Dave Singer wrote:

 # application/ogg; disposition=moving-image; codecs="theora, vorbis"
 # application/ogg; disposition=sound; codecs=speex
 
 what is the 'disposition' parameter?

The idea of a 'disposition-type' is to mark content with presentational 
information. See the Content-Disposition Header for MIME described in 
RFC 1806 for an early example.

The specific proposal Silvia mentioned is to add the content-
disposition to the media-type to inform parsers of the general
nature of the content, even if they don't recognize the specific
codecs. The allowed values for the 'disposition' label come from
the Dublin Core set. This is not part of RFC 4281, and as far as
I know hasn't been formally documented with the IETF, but we do
think it's a good idea.

This arose out of the need to discover or record audio vs 
audiovisual status for media files in the context of routing
to the proper playback application, which has been particularly 
contentious with the Ogg container since we have insisted that
such distinctions be made via metadata or file inspection instead
of defining distinguishing filename extensions, as has been done
with other containers. (MooV is perhaps another example.)

In terms of user presentation, audio vs video vs text vs 
still image is the important distinction, while the 'codecs' 
parameter answers the more technical question of what playback 
capabilities are necessary. A video/ or audio/ markup element 
already describes this adequately, but it is a larger issue for
media handling on the web.

Charles wrote a more detailed proposal in the context of RSS
media syndication, which is where I first heard of the idea.

  http://changelog.ca/log/2005/08/21/rss-disposition-hinting-proposal

We're essentially suggesting his proposal be extended to (media)
containers in general.

 -r


Re: [whatwg] on codecs in a 'video' tag.

2007-04-02 Thread Ralph Giles
On Mon, Apr 02, 2007 at 11:12:07AM -0700, Maciej Stachowiak wrote:

 I don't think Theora (or Dirac) are inherently more interoperable  
 than other codecs. There's only one implementation of each so far, so  
 there's actually less proof of this than for other codecs.

Just to clarify, there are two different Dirac implementations, and two 
different Theora decoder implementations. But otherwise your points
stand. There are many implementations of the MPEG codecs.

I'm not sure how many separate implementations there are of the Windows
Media codecs. FFMPEG has a VC-1 implementation and some decoders for
older formats based on reverse-engineering.

 -r


Re: [whatwg] video element feedback

2007-03-24 Thread Ralph Giles
On Fri, Mar 23, 2007 at 04:33:39PM -0700, Eric Carlson wrote:

 Yes, the UA needs the offset/chunking table in order to calculate  
 a file offset for a time, but this is efficient in the case of  
 container formats in which the table is stored together with other  
 information that's needed to play the file. This is not the case for  
 all container formats, of course.

Just to be clear, this isn't strictly true; one can still perform 
bisection seek over HTTP with the byte Range header. As has been 
mentioned, VLC implements this. Alsaplayer is another example. It
does work. It's of course less efficient than when one has a seek 
table, but not excessively so.

Tangentially, I at some point looked at implementing 'seconds' as 
a Range header unit in Apache. (The HTTP Range header allows arbitrary
units; bytes is just the only one defined by the spec.) The
idea was to have the server do the seeking and return a valid file
starting at the requested time offset, or list of intervals. Then
a client could do very naive seeking and just play what it got.
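
For concreteness, the request side of the idea would have looked
something like this (a hypothetical unit; it was never standardized):

var xhr = new XMLHttpRequest();
xhr.open('GET', 'movie.ogv');
xhr.setRequestHeader('Range', 'seconds=300-600');  // a five-minute window
xhr.send();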

In the end I abandoned it over worries about cache interaction. If you
request a sequence of intervals you don't in general get the same byte
stream as if you request the whole file, because the server is 
re-packaging the data for each request. With Ogg this sort of works,
because concatenated streams are still in spec, so the decoded result
is the same, but it doesn't work for all containers. The annodex query 
path seemed a better choice.

 -r


Re: [whatwg] video element feedback

2007-03-24 Thread Ralph Giles
On Sat, Mar 24, 2007 at 01:57:45AM -0700, Kevin Marks wrote:

 How does one seek a Vorbis file with video in and recover framing?
 
 It looks like you skip to an arbitrary point and scan for 'OggS' then
 do a 64kB CRC to make sure this isn't a fluke. Then you have some
 packets that correspond to some part of a frame of video or audio.
 You recover a timestamp, and thus you can pick another random point
 and do a binary chop until you hit the timestamp before the one you
 wanted. Then you need to read pages until the timestamp changes and
 you have resynced that stream. Any other interleaved streams are
 presumably being resync'd in parallel so you can then get back to the
 read and skip framing. Try doing that from a CD-ROM.
 
 Do let me know if that has since been fixed.

Nope. That's still the algorithm. Also add that for a keyframe-based 
codec you need to (conceptually) seek again, after you've found the 
desired start point, to feed the decoder from the nearest previous 
restart point.

In practice, not everyone tries for sample-accurate seeking. I gather
the situation is similar with DVD playback.

Streamability (in the unix pipe sense of an unseekable file stream) was 
a design goal for Ogg. This seek algorithm is a consequence. Decoders 
must handle seeking without an index table, so we have regarded the
use of one, whether cached in the file or not, as an implementation 
detail.

FWIW,
 -r