Re: [whatwg] metadata attribute for media
On 17/02/13 05:48 AM, Nils Dagsson Moskopp wrote:
>> If one cares to that extent, and is already handling format differences, dealing with vendor variation on top isn't that much more effort.
>
> I disagree, strongly.

Ok, thanks for the feedback. Do you like my subsequent proposal better?

http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2013-January/038705.html

-r
Re: [whatwg] metadata attribute for media
On 12-12-11 5:23 PM, Ralph Giles wrote:
> That said, I'm not convinced this is an issue given the primary use-case, which is pretty much that web content wants to do more sophisticated things with the metadata than the user-agent's standardized parsing allows. If one cares to that extent, and is already handling format differences, dealing with vendor variation on top isn't that much more effort.

Robert O'Callahan argued (off-list) that there is a significant difference. Exposing whatever metadata the media framework supplies multiplies the established format fragmentation by per-platform differences. That's not something we want to do if we can avoid it, and we can avoid it by doing our own tag normalization as part of the exposed API. We feel that's more important than the extended and custom tag use case, which is still addressable by direct parsing in javascript, etc.

I don't intend to continue with implementation of the raw metadata api, and instead will focus on mapping everything to a standard set of attributes. To that proposal I've added a 'tracknumber' attribute, since this is important for sorting resources in media player applications.

interface Metadata {
  readonly attribute DOMString title;
  readonly attribute DOMString artist;
  readonly attribute DOMString album;
  readonly attribute long tracknumber;
  readonly attribute Blob? artwork;
};

partial interface MediaElement {
  Metadata metadata;
};

-r
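A sketch of what such tag normalization might look like in javascript. The mapping table and raw key names here are illustrative assumptions (a few common VorbisComment and ID3v2 field names), not part of any proposal:

```javascript
// Hypothetical normalization table mapping common VorbisComment and
// ID3v2 tag names onto the proposed fixed Metadata attributes.
const TAG_MAP = {
  'title': 'title', 'tit2': 'title',
  'artist': 'artist', 'tpe1': 'artist',
  'album': 'album', 'talb': 'album',
  'tracknumber': 'tracknumber', 'trck': 'tracknumber',
};

// Normalize a raw { key: value } tag dictionary into the fixed schema.
// Unknown keys are dropped; tracknumber is coerced to a number, keeping
// only the track part of "3/12"-style ID3 track fields.
function normalizeTags(raw) {
  const out = { title: '', artist: '', album: '', tracknumber: 0 };
  for (const [key, value] of Object.entries(raw)) {
    const field = TAG_MAP[key.toLowerCase()];
    if (!field) continue;
    if (field === 'tracknumber') {
      out.tracknumber = parseInt(String(value).split('/')[0], 10) || 0;
    } else {
      out[field] = String(value);
    }
  }
  return out;
}
```

A real implementation would of course need a per-format mapping standardized in the spec, as discussed elsewhere in this thread; this only shows the shape of the normalization step.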
Re: [whatwg] metadata attribute for media
On 12-12-11 4:58 PM, Ian Hickson wrote:
> This seems reasonable.

Thanks for the feedback. Anyone else? :-)

> I don't want to be the one to maintain the mapping from media formats to metadata schema, because this isn't my area of expertise, and it isn't trivial work.

Good point. This would need to be standardized for the fixed-schema proposal, at least for the formats commonly supported by the HTML media element. The Web Ontology working group has done some work here, as Silvia mentioned.

> I don't think we should have an open-ended API without fixed names, because that is a recipe for an interoperability disaster.

I agree it would have interoperability issues. My own implementation experience is that the easy thing to do is to mirror whatever representation your playback framework offers, which can result in per-platform differences as well as per-media-format (and per-tagging-application, etc.) differences.

That said, I'm not convinced this is an issue given the primary use-case, which is pretty much that web content wants to do more sophisticated things with the metadata than the user-agent's standardized parsing allows. If one cares to that extent, and is already handling format differences, dealing with vendor variation on top isn't that much more effort. We could say that user-agents should represent the metadata dictionaries as directly as possible, and match the tag names from the schema spec when that's not possible.

-r
Re: [whatwg] metadata attribute for media
On 12-11-26 4:18 PM, Ralph Giles wrote:
> interface HTMLMediaElement {
>   ...
>   object getMetadata();
> };
>
> After the metadataloaded event fires, this method would return a new object containing a copy of the metadata read from the resource, in whatever format the decoder implementation supplies. It would be up to the caller to do any semantic interpretation. The same method should be present on AudioTrack, VideoTrack, (TextTrack?) and Image elements.

The prefixed version of this in Firefox is documented in the 'Methods' section of https://developer.mozilla.org/en-US/docs/DOM/HTMLMediaElement

-r
Re: [whatwg] [mimesniff] Handling container formats like Ogg
On 12-11-27 9:19 AM, Gordon P. Hemsley wrote:
> Is it sufficient to sniff just for application/ogg and then let the UA's Ogg library determine whether or not the contents of the file can be handled? (I'm sensing the consensus is yes.)

I think so. Defining a codec-enumerating algorithm and MIME type decision tree is going to be several pages of spec for container formats like Ogg and Matroska.

As an example, the 'file' tool reads the first codec header present in an Ogg file, which can be found at a set of fixed offsets. This lets it get some useful information for a lot of files, especially since it's RECOMMENDED that primary codecs are listed first. I.e. theora should be first in a 'video/ogg', opus should be first in an 'audio/ogg; codecs=opus'. But files can be muxed without following the priority rule and will still play fine, and there can be a 'skeleton' header in front of the other codec data, obscuring the primary role. Just looking at fixed offsets isn't sufficient to distinguish audio and video reliably.

IMHO, -r
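For illustration, the fixed-offset sniffing described above might look like this in javascript. The Ogg page header is 27 bytes plus a segment table, so the first codec header of a typical one-segment first page begins right after the table; the magic values are the standard Theora, Vorbis, and Opus header signatures. This is a rough sketch of the 'file'-style heuristic, not a complete sniffer (it doesn't, for example, skip past a skeleton 'fishead' header):

```javascript
// Codec header signatures found at the start of the first page's payload.
const CODEC_MAGIC = [
  ['video/ogg', [0x80, 0x74, 0x68, 0x65, 0x6f, 0x72, 0x61]], // "\x80theora"
  ['audio/ogg', [0x01, 0x76, 0x6f, 0x72, 0x62, 0x69, 0x73]], // "\x01vorbis"
  ['audio/ogg', [0x4f, 0x70, 0x75, 0x73, 0x48, 0x65, 0x61, 0x64]], // "OpusHead"
];

// Sniff a buffer of bytes (Uint8Array) for an Ogg stream and guess a
// MIME type from the first codec header. Returns null for non-Ogg data.
function sniffOgg(bytes) {
  const ascii = (s, off) =>
    s.split('').every((c, i) => bytes[off + i] === c.charCodeAt(0));
  if (!ascii('OggS', 0)) return null; // not an Ogg capture pattern
  const numSegments = bytes[26];      // segment count of the first page
  const payload = 27 + numSegments;   // payload starts after the segment table
  for (const [type, magic] of CODEC_MAGIC) {
    if (magic.every((b, i) => bytes[payload + i] === b)) return type;
  }
  // Unrecognized first codec: fall back to the generic container type.
  return 'application/ogg';
}
```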
Re: [whatwg] metadata attribute for media
On 12-09-27 1:44 AM, Philip Jägenstedt wrote:
> I'm skeptical that all that we want from ID3v2 or common VorbisComment tags can be mapped to Dublin Core, it seems better to define mappings directly from the underlying format to the WebIDL interface.

You're right.

> Given the open-endedness of metadata contained in actual media resources, I'm personally a bit skeptical that there's something we could add to the Web platform that would be better than just letting authors pass that metadata out-of-band using any representation they like, but what use cases are you trying to cover here?

Two use cases I'm trying to address:

- A web application presents some view of a media library. If the library resides on a server, then yes, the server-side component of the app can parse, cache, and deliver the metadata out-of-band. But the library could also be local, in which case the webapp must do its own parsing, e.g. from a list of blob urls returned by the file api.

- An author wants to display the embedded track title and artist name with simple scripting on a webpage. One of the goals of html media was to make video and audio as simple to include as images.

User agents are generally parsing this metadata anyway; exposing it is straightforward, and greatly simplifies the two tasks above.

In any case, the media.mozGetMetadata() method I described earlier is available in Firefox 17, released last week, with support for Vorbis tags in .ogg files. Firefox 18, now in beta, adds support for .opus files as well. Here's an example:

https://people.xiph.org/~giles/2012/metadata.html

You should see 'Title', 'Album', etc. below the controls if your browser supports mozGetMetadata(). We're continuing to implement this interface for other formats we support.

I still think it's useful to define both this 'raw' metadata interface, returning just the data out of the file, and a more structured metadata object with standardized tag names.
I've left off implementing the second one for lack of feedback on what the basic tags should be, and how useful they are. There certainly wasn't consensus here. So, what do you think of these two proposals?

1. Define the 'raw' getMetadata method in an unprefixed form:

interface HTMLMediaElement {
  ...
  object getMetadata();
};

After the metadataloaded event fires, this method would return a new object containing a copy of the metadata read from the resource, in whatever format the decoder implementation supplies. It would be up to the caller to do any semantic interpretation. The same method should be present on AudioTrack, VideoTrack, (TextTrack?) and Image elements.

2. Define a parsed metadata attribute with some standard tags:

interface Metadata {
  readonly attribute DOMString title;
  readonly attribute DOMString artist;
  readonly attribute DOMString album;
  readonly attribute Blob? artwork;
};

interface MediaElement {
  ...
  Metadata metadata;
};

So you could say something as simple as

  img.src = audio.metadata.artwork;

to display the cover art embedded in a downloaded single. DOMString attributes would have the value of an empty string if the underlying data isn't available. These four attributes are the absolute minimum, I think, for the use cases above. More could be added later as usage of the api evolves. For example: date, publisher, tracknumber, tracktotal, description, genre, sort-artist, sort-title, copyright, license, url.

If having both media.getMetadata() (raw) and media.metadata is confusing, the first proposal could be named media.getRawMetadata() as well.

Does it make sense to include more technical metadata here? For example: samplerate, channels, duration, width, height, framerate, aspect-ratio. Firefox currently has prefixed properties for the number of channels and the audio sample rate, and including these in the metadata interface would let us deprecate the prefixed versions.
On the other hand, properties like duration, width, and height are available directly on media elements today, so maybe it makes more sense to do the same for the others. In that case, do we want the indirection of the Metadata interface? Saying 'video.title' or 'img.src = audio.artwork' instead is shorter.

Feedback welcome, -r
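A small sketch of how a page might use proposal 1 while the shipped implementation is still prefixed. Here getMetadata() is the unprefixed proposal and mozGetMetadata() the Firefox-prefixed version mentioned above; the helper name is invented:

```javascript
// Return the raw tag dictionary from a media element, preferring the
// proposed unprefixed getMetadata() and falling back to Firefox's
// prefixed mozGetMetadata(). Returns null if neither is available.
function rawMetadata(media) {
  if (typeof media.getMetadata === 'function') return media.getMetadata();
  if (typeof media.mozGetMetadata === 'function') return media.mozGetMetadata();
  return null;
}
```

In a page this would be called from a listener for the metadata-loaded event, since the dictionary isn't available before the resource's headers have been parsed.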
[whatwg] metadata attribute for media
Recently, we've been considering adding a 'tags' or 'metadata' attribute to HTML media elements in Firefox, to allow web content access to metadata from the playing media resource. In particular we're interested in tag data like creator, title, date, and so on. My recollection is that this has been discussed a number of times in the past, but there was never sufficient motivation to support the interface.

Our particular motivation here is webapps that present a media file library. While it's certainly possible to parse the tag data out directly with javascript, it's more convenient if the HTML media element does so, and the underlying platform decoder libraries usually provide this data already. As such I wanted to raise the issue here and get design feedback and levels of interest from other user agents.

Here's a first idea:

partial interface HTMLMediaElement {
  readonly attribute object tags;
};

Accessing media.tags provides an object with key: value data, for example:

{
  'title': 'My Movie',
  'creator': 'This User',
  'date': '2012-06-18',
  'license': 'http://creativecommons.org/licenses/by-nc-sa/'
}

The keys may need to be filtered, since the files can contain things like base64-encoded cover art, which makes the object prohibitively large. The keys may need to be mapped to some standard scheme (i.e. Dublin Core) since vocabularies vary from format to format.

This is nice because it's easy to access, can be simply enumerated, and is extensible. Which is helpful when it gets added to the img element for EXIF data.

-r
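The filtering mentioned above might be sketched like this. The 1 KiB threshold and the function name are arbitrary assumptions for illustration, not part of the proposal:

```javascript
// Illustrative filter for a raw tag dictionary: drop entries whose
// values are too large to be worth exposing eagerly (e.g. base64-encoded
// cover art in a METADATA_BLOCK_PICTURE or APIC field), and drop
// non-string values entirely.
function filterTags(tags, maxLength = 1024) {
  const out = {};
  for (const [key, value] of Object.entries(tags)) {
    if (typeof value === 'string' && value.length <= maxLength) {
      out[key] = value;
    }
  }
  return out;
}
```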
Re: [whatwg] WebVTT feedback (and some other video feedback that snuck in)
On 02/12/11 05:38 AM, David Singer wrote:
> Very. Correct rendering of some text requires that it be correctly labelled with a BCP-47 tag, as I understand. For me, file-level default/overall setting, with the possibility of span-based over-rides, seems ideal. I think.

Yes. A 'srclang' attribute in the referencing track element would override 'lang' in the file header. I think span-based overrides in the file have to override both file-level settings; that means you can't fix them with markup, but nothing else makes sense.

-r
Re: [whatwg] I have feature requests for video and audio tags.
On 11-11-17 12:29 PM, Jonas Sicking wrote:
> Authors do however know how loud the volume of the media is though. If a video is encoded with a very loud volume, or a very quiet volume, it can be quite useful to be able to adjust that up or down when linking to it.

I was going to say, that sounds like user-agents should support replaygain tags. Note that expected line level is actually a required header for the Opus codec's Ogg encapsulation. However, relying on a replaygain tag in the media file does violate the 'simple to author' goal of html.

-r
Re: [whatwg] HTML5 video seeking
On 15/11/11 10:32 AM, Aaron Colwell wrote:
> Yeah it looks to me like starting at the requested position is the only option. I saw the text about media engine triggered seeks, but it seems like users would be very surprised to see the seeking seeked events for the seek they requested immediately followed by a pair of events to a location they didn't request. I can envision the angry bug reports now. ;)

Yeah, it's definitely bending the rules. If you only intended to support seeking to the nearest keyframe, setting media.seekable would be an honest way to advertise that, but it also violates least surprise.

> Thanks for the response. Looks like we are interpreting the text the same way.

Yes, my recollection of the earlier discussion aligns with your summary and Chris Double's. It's expensive, but it's what one naively expects the API to do. Video splicing/mixing was a use case we wanted to support, and such applications aren't really possible without frame-accurate seeking. Thus, it's better to require it up front and possibly allow applications to relax it later, as with Frank and Philip's 'strict' attribute, than to disallow such applications by leaving this to implementation variance.

-r
Re: [whatwg] HTML5 video seeking
On 14/11/11 03:49 PM, Aaron Colwell wrote:
> Does this mean the user agent must resume playback at the exact location specified?

Maybe you can muck with the 'media.seekable' TimeRanges object to only show keyframes? Otherwise, it kind of sounds like you're supposed to start playback at the requested position. The final paragraph of that section suggests another out: you can reposition the playback head inside the playback engine as long as you clip to media.seekable and fire the timeupdate and seeked events.

-r
Re: [whatwg] SRT research: timestamps
On 10/10/11 12:19 AM, Simon Pieters wrote:
> 0 negative intervals
> 0 cues skipped because field counts were different

That will teach me to proofread after posting. The real counts should be:

2227 negative intervals
6822 cues skipped because field counts were different

From which I conclude negative intervals aren't a significant problem. Thanks for running my script!

-r
Re: [whatwg] SRT research: timestamps
On 06/10/11 01:58 AM, Simon Pieters wrote:
> I don't know how many have negative interval, I'd need to run a new script over the 52,000,000 lines to figure out. (If you want me to check this, please contact me with details about what you want to count as negative interval.)

I had in mind something like:

cat *.vtt | awk '
BEGIN { negs = 0; misses = 0; }
/-->/ {
  ns = split($1, start, "[:.,]");
  ne = split($3, end, "[:.,]");
  if (ns != ne) { misses++; next; }
  if (end[1] - start[1] < 0) negs++;
}
END {
  print negs, "negative intervals";
  print misses, "cues skipped because field counts were different";
}'

Which will probably still miscount some garbage lines, but gives a rough idea.

> leading id e.g. 10300:11:53,891 --> 00:11:56,155 33

OTOH, it sounds like the leading id issue is vanishingly uncommon, so I'm just curious if there are any other cues which would be rejected that way.

-r
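For comparison, a javascript version that compares the full timestamp values rather than only the hours field, as the awk sketch does (a rough illustration; function names are invented):

```javascript
// Parse an SRT/WebVTT timestamp like "00:11:56,155" (or with '.' as the
// millisecond separator) into seconds. Returns null for malformed fields.
function parseTimestamp(ts) {
  const parts = ts.split(/[:.,]/).map(Number);
  if (parts.length !== 4 || parts.some(Number.isNaN)) return null;
  const [h, m, s, ms] = parts;
  return h * 3600 + m * 60 + s + ms / 1000;
}

// True if a cue timing line like "00:00:01,000 --> 00:00:02,000" has an
// end time earlier than its start time (a "negative interval").
function isNegativeInterval(line) {
  const fields = line.trim().split(/\s+/);
  const start = parseTimestamp(fields[0]);
  const end = parseTimestamp(fields[2]);
  return start !== null && end !== null && end - start < 0;
}
```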
Re: [whatwg] SRT research: timestamps
This is all I meant as well. Of course we should all implement the parser as spec'd. My comments were with respect to amending the spec to be more forgiving of common errors.

-r

Philip Jägenstedt <phil...@opera.com> wrote:
> On Thu, 06 Oct 2011 07:36:00 +0200, Silvia Pfeiffer <silviapfeiff...@gmail.com> wrote:
>> On Thu, Oct 6, 2011 at 10:51 AM, Ralph Giles <gi...@mozilla.com> wrote:
>>> On 05/10/11 04:36 PM, Glenn Maynard wrote:
>>>> If the files don't work in VTT in any major implementation, then probably not many. It's the fault of overly-lenient parsers that these things happen in the first place.
>>>
>>> A point Philip Jägenstedt has made is that it's sufficiently tedious to verify correct subtitle playback that authors are unlikely to do so with any vigilance. Therefore the better trade-off is to make the parser forgiving, rather than inflict the occasional missing cue on viewers.
>>
>> That's a slippery slope to go down on. If they cannot see the consequence, they assume it's legal. It's not like we are totally screwing up the display - there's only one mis-authored cue missing. If we accept one type of mis-authoring, where do you stop with accepting weirdness? How can you make compatible implementations if everyone decides for themselves what weirdness that is not in the spec they accept? I'd rather we have strict parsing and recover from brokenness. It's the job of validators to identify broken cues. We should teach authors to use validators before they decide that their files are ok. As for some of the more dominant mis-authorings: we can accept them as correct authoring, but then they have to be made part of the specification and legalized.
>
> To clarify, I have certainly never suggested that implementations do anything other than follow the spec to the letter. I *have* suggested that the parsing spec be more tolerant of certain errors, but looking at the extremely low error rates in our sample I have to conclude that either (1) the data is biased or (2) most of these errors are not common enough that they need to be handled.
>
> -- Philip Jägenstedt, Core Developer, Opera Software
Re: [whatwg] (no subject)
On 05/10/11 11:37 AM, Ashley Sheridan wrote:
> I would assume the part that the Skype plugin is being used for, as the only other part of the chat that isn't HTML/Javascript code is the Jabber connectivity, which isn't strictly a plugin per se, more an additional interface to the raw data that is enabled through server modules.

The audio/video chat part, which supports similar uses to the Skype plugin, is part of the WebRTC effort. Jabber connectivity is something you can currently do by tunnelling the stanzas (messages) over XHR or WebSockets.

Hope that helps orient you, -r
Re: [whatwg] SRT research: timestamps
On 05/10/11 10:22 AM, Simon Pieters wrote:
> I did some research on authoring errors in SRT timestamps to inform whether WebVTT parsing of timestamps should be changed.

This is completely awesome, thanks for doing it.

> hours too many '(^|\s|)\d{3,}[:\.,]\d+[:\.,]\d+' 834

As Silvia mentioned, the WebVTT spec currently leaves the number of digits in the hour field as implementation defined, so long as it's at least two. I asked previously[1] if we could agree on and specify a limit. Would you mind checking what the histogram of digit counts is in the hours field? Especially if you can separate cases like

34500:24:01,000 --> 00:24:03,000

either because the index is missing, or because the interval is negative (for which the WebVTT spec would reject the entire cue).

Cheers, -r

[1] http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2011-September/033271.html
Re: [whatwg] SRT research: timestamps
On 05/10/11 04:36 PM, Glenn Maynard wrote:
> If the files don't work in VTT in any major implementation, then probably not many. It's the fault of overly-lenient parsers that these things happen in the first place.

A point Philip Jägenstedt has made is that it's sufficiently tedious to verify correct subtitle playback that authors are unlikely to do so with any vigilance. Therefore the better trade-off is to make the parser forgiving, rather than inflict the occasional missing cue on viewers.

-r
Re: [whatwg] track / WebVTT issues
On 21/09/11 04:04 AM, Anne van Kesteren wrote:
> I have an additional point. Can we maybe consider naming it just VTT? At least as far as file signatures, media types, and other developer-facing identifiers are concerned.

Three ASCII characters is a little sparse for a file signature. Otherwise, I don't object.

-r
Re: [whatwg] What is the purpose of timeupdate?
On Thu, Nov 5, 2009 at 6:10 AM, Brian Campbell <brian.p.campb...@dartmouth.edu> wrote:
> As implemented by Safari and Chrome (which is the minimum rate allowed by the spec), it's not really useful for that purpose, as 4 updates per second makes any sort of synchronization feel jerky and laggy.

It really depends what you're doing. It's fine for presentation slides, and not too bad for subtitles. I agree it's useless for anything moving faster than that.

-r
Re: [whatwg] Serving up Theora video in the real world
On Thu, Jul 9, 2009 at 3:34 PM, David Gerard <dger...@gmail.com> wrote:
> Anyone got ideas on the iPhone problem?

I think this is off topic, and I am not an iPhone developer, but: assuming the app store terms allow video players, it should be possible to distribute some sort of dedicated player application, free or otherwise. I believe the fee for a cert to sign applications is currently $100/year. However, the iPhone doesn't have a shared filesystem, or helper applications in the normal sense, at least not as far as I can tell.

The work-around I'm aware of is for site authors to check if you're running on the iPhone in javascript, and rewrite the video elements to normal anchors with a custom scheme, e.g.

<a href="oggplayer://example.com/file.ogv">Click here to watch in Ogg Player</a>

Then, if the user has installed the Ogg Player app, which registers itself as handling the 'oggplayer' scheme, Safari will pass the custom uri to it, and it can download/stream/whathaveyou.

-r
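The URL-rewriting step of that work-around might be sketched like so. The 'oggplayer' scheme is the hypothetical example from this thread, not a real registered app:

```javascript
// Rewrite an http(s) media URL to the custom scheme that a hypothetical
// player app registers with Safari, as described above.
function toCustomScheme(videoUrl, scheme = 'oggplayer') {
  return videoUrl.replace(/^https?:/, scheme + ':');
}
```

In a page, a script detecting the iPhone would loop over the video elements and replace each with an anchor whose href is `toCustomScheme(video.currentSrc)`.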
Re: [whatwg] Serving up Theora video in the real world
On Thu, Jul 9, 2009 at 9:22 PM, Maciej Stachowiak <m...@apple.com> wrote:
> I think at one point I suggested that canPlayType should return one of boolean false, true or maybe, so that naive boolean tests would work. Or in any case, make the no option something that tests as boolean false. We seem to have wound up with a tristate where one of the states is never used, which is unfortunate. Is it too late now?

To recap (off the top of my head): it's hard to say if you can play something, because that requires either a validator or actually playing it. So in addition to 'yes' and 'no', a 'maybe' was added, to say "I've heard of the media type and codecs and profiles, so it's worth trying; no promises." But the whole idea of canPlayType is to be a lightweight test to decide what to try playing in advance, so browsers are unlikely to implement a check that involves actually loading and testing the resource. So in practice 'yes' would only be returned in special applications, not for general web resources. 'No' means no, 'maybe' means yes, and 'yes' never gets used.

It will be confusing either way, but if canPlayType were reverted to a boolean, with a note that true means 'maybe', not 'yes', there would at least be fewer programming errors. But Ian's just replied with a clever compromise.

-r
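For reference, canPlayType() as eventually specified returns the strings '' (for "no"), 'maybe', and 'probably' (for "yes"), so a naive boolean test works only because the "no" value happens to be the empty string. A small wrapper (name invented for illustration) makes the intent explicit:

```javascript
// Interpret a canPlayType() result, treating both non-empty answers
// ('maybe' and 'probably') as "worth trying", per the discussion above.
function worthTrying(canPlayTypeResult) {
  return canPlayTypeResult === 'maybe' || canPlayTypeResult === 'probably';
}
```

In a page: `if (worthTrying(video.canPlayType('video/ogg; codecs="theora,vorbis"'))) { /* set src */ }`.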
Re: [whatwg] Start position of media resources
On Tue, Apr 7, 2009 at 1:26 AM, Silvia Pfeiffer <silviapfeiff...@gmail.com> wrote:
> For example, take a video that is a subpart of a larger video and has been delivered through a media fragment URI (http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-reqs/). When a user watches both the fragment and the full resource, and both start at 0, he/she will assume they are different resources, when in fact one is just a fragment of the other.

There is a UI problem here, in that the 'seek bar' control typically displayed by web video players has finite resolution. It works great for short-form clips a la YouTube, but a 30 second segment of a two hour movie amounts to a few pixels. Displaying such a fragment in the context of the complete work makes a linear progress bar useless for seeking within the fragment itself, everything having been traded for showing that it's part of a much larger resource. Never mind that a temporal url can equally well reference a five minute section of a 2 hour webcam archive.

Showing a fragment in context is helpful, not just for the time offset, but for cue points, related resources, and so on. The default controls the browser provides can't encompass all possible context interfaces, so perhaps the focus here should be on what is necessary to enable scripts (or browser vendors) to build more complicated interfaces when they're appropriate.

-r
Re: [whatwg] Video playback quality metric
On Tue, Feb 10, 2009 at 1:54 AM, Michael A. Puls II <shadow2...@gmail.com> wrote:
> Flash has low, medium and high quality that the user can change (although a lot of sites/players seem to rudely disable that option in the menu for some reason). This helps out a lot and can allow a video to play better. I could imagine an 'Auto' option too that automatically switched quality as necessary to get decent playback.

Isn't that rendering quality? That can of course be adjusted by the browser, dynamically and/or according to a user setting, with or without a javascript interface.

-r
Re: [whatwg] Thoughts on video accessibility
On Mon, Dec 8, 2008 at 9:20 PM, Martin Atkins [EMAIL PROTECTED] wrote:
> My concern is that if the only thing linking the various streams together is the HTML document then the streams are less useful outside of a web browser context.

Absolutely. This proposal places an additional burden on the user to download and integrate multiple resources. This trade-off supports applications where having the text available separately is valuable.

-r
Re: [whatwg] Thoughts on video accessibility
On Mon, Dec 8, 2008 at 6:08 PM, Martin Atkins [EMAIL PROTECTED] wrote:
> What are the advantages of doing this directly in HTML rather than having the src attribute point at some sort of compound media document?

The general point here is that subtitle data is, in current practice, often created and stored in external files. This is, in part, because of poor support for embedded tracks in web video applications, but it also arises naturally in production workflow. Moreover, because they are text, subtitle data is much more likely to be stored in a database with other text-based content, while audio and video are treated as binary blobs. This scheme is intended to support such hybrid systems.

There is generally a tension between authors wanting to easily manipulate and add tracks, users wanting a self-contained file, and search engines wanting stand-alone access to just the text. Because splitting and merging media files requires special tools, our thinking in the Ogg accessibility group has been that we need to support both embedded and external references for text tracks in html. Users (and their tools) can then choose what methods they want to use in particular circumstances.

We're also interested in a more sophisticated mechanism for communicating track assortments between a server and a client, but in the particular case of text tracks for accessibility, I think having a simple, explicit mechanism at the html level is worthwhile.

-r
Re: [whatwg] Same-origin checking for media elements
On 10-Nov-08, at 7:49 PM, Maciej Stachowiak wrote:
> 1) Allow unrestricted cross-origin video/audio
> 2) Allow cross-origin video/audio but carefully restrict the API to limit the information a page can get about media loaded from a different origin
> 3) Disallow cross-origin video/audio unless the media server explicitly allows it via the Access Control spec (e.g. by sending the Access-Control-Allow-Origin: * header).
>
> I'd prefer 1 or 2 (assuming the restrictions assumed by 2 are reasonable).

One point that came out of the theora-level thread is that (2) would be less surprising if there's some kind of error mechanism flagging the restriction. For example, taint-tracking infrastructure could throw an exception when the javascript vm attempts to move cross-site data outside the layout and render engines. This would offer some help to authors when a locally tested design mysteriously stops working when deployed.

FWIW, -r
Re: [whatwg] Video element and duration attribute
On Thu, Nov 6, 2008 at 9:46 AM, Eric Carlson [EMAIL PROTECTED] wrote:
> Instead of seeking to the end of the file to calculate an exact duration as you describe, it is much cheaper to estimate the duration by processing a fixed portion of the file and extrapolating to the duration based on the file size. QuickTime does this and it works quite well.

I expect this works much better for mp3 than it does for variable bitrate streams. Nevertheless it's much better than nothing. Your argument that the durationchange event makes flexible determination reasonable is a good one. I'm fine with not including the duration argument, for whatever that's worth.

-r
Re: [whatwg] video tag: pixel aspect ratio
On Wed, Oct 15, 2008 at 2:40 AM, Ian Hickson [EMAIL PROTECTED] wrote:
> Is that not enough?

It is enough. Sander and Eduard have provided excellent arguments why the pixel aspect ratio, and especially the frame rate, should be represented as rationals in video formats. But as an override for already-broken video streams, compliance with best practice does not justify another data type in html5.

To put Anne's comment another way, one needs a gigapixel display device before the difference between 1.0925 (rounded to only 5 figures) and 59/54 affects the behaviour of the scaling algorithm at all. There aren't so many aspect ratios in common use--you're welcome to choose the one nearest to the floating point value given if you think it's important.

-r
--
Ralph Giles
Xiph.org Foundation
Re: [whatwg] Video : Slow motion, fast forward effects
On Thu, Aug 7, 2008 at 1:57 AM, Philip Jägenstedt [EMAIL PROTECTED] wrote:
> I suggest that the spec allows raising the NOT_SUPPORTED_ERR exception in response to any playback rate which it cannot provide for the current configuration.

That sounds reasonable. It is a special effect.

> With a netcast you couldn't support any playback rate except 1.0 without first buffering all the data you want to play at a faster rate, so changing the playback rate doesn't make sense.

Well, it would be better to implement the requested playback rate (and direction) as long as you can, and then rebuffer when necessary. Doing that well requires extra sophistication in the buffering code, but the client might just be trying to scrub a bit, which would fit within a normal seek buffer. Even with live streams, if you try to pull faster than realtime you'll just buffer-wait, and if you're going slower you'll fill whatever space you have, then have to drop and restart. If all you have is a jitter buffer, then sure.

-r
Re: [whatwg] Change request: rules for pixel ratio
On 10-Jun-08, at 9:31 AM, Philip Jägenstedt wrote: The default value, if the attribute is omitted or cannot be parsed, is the media resource's self-described pixel ratio, or 1.0 for media resources that do not self-describe their pixel ratio. This is actually how I read the original, but your wording resolves the ambiguity. Sounds like an improvement to me. Does this mean the implementation must retrieve the self-described pixel aspect ratio from the stream and supply it as part of the DOM? -r
Re: [whatwg] Video codec requirements changed
On Mon, Jan 07, 2008 at 01:50:09PM -0800, Dave Singer wrote: I get the impression that this is not an openly-specified codec, which I rather think is a problem. That is, there is neither a publicly available spec. nor publicly-available source, which means that it is controlled by one company. That matches my understanding. Bink is widely distributed in commercial (game) software, under licence from another commercial entity, if that helps with the submarine patent risk. -r
Re: [whatwg] codecs and containers
On Mon, Dec 10, 2007 at 09:14:39AM -0800, James Justin Harrell wrote: The language could be improved. Ogg Theora refers to Theora-encoded video enclosed in an Ogg container, not the Theora codec. Similar for Vorbis. Theora and Vorbis should be used without Ogg to refer to the actual codecs. It is important to specify a container as well as a codec set for the interoperability baseline. I also feel the choice here of the Ogg container format is a bad one, and that Matroska would be a much better choice. My understanding is that Matroska doesn't stream well, which is a primary concern for loading large resources into web pages with progressive rendering. -r
Re: [whatwg] Cue points in media elements
Thanks for adding to the discussion. We're very interested in implementing support for presentations as well, so it's good to hear from someone with experience. Since we work on streaming media formats, I always assumed things would have to be broken up by the server and the various components streamed separately to a browser, and I hadn't noticed the cue point support until you pointed it out. Some comments and questions below... On Sun, Apr 29, 2007 at 03:14:27AM -0400, Brian Campbell wrote: in our language, you might see something like this: (movie Foo.mov :name 'movie) (wait @movie (tc 2 3)) (show @bullet-1) (wait @movie) (show @bullet-2) If the user skips to the end of the media clip, that simply causes all WAITs on that media clip to return instantly. If they skip forward in the media clip, without ending it, all WAITs before that point will return instantly. How does this work if, for example, the user seeks forward, and then back to an earlier position? Would some of the 'show's be undone, or do they not seek backward with the media playback? Is the essential component of your system that all the shows be called in sequence to build up a display state, or that the last state trigger before the current playback point has been triggered? Isn't this slow if a bunch of intermediate animations are triggered by a seek? Does your system support live streaming as well? That complicates the design some when the presentation media updates appear dynamically. Anyway, I think you could implement your system with the currently proposed interface by checking the current playback position and clearing a separate list of waits inside your timeupdate callback. This is a nice system, but I can't see how even as simple a system as this could be implemented given the current specification of cue points. The problem is that the callbacks execute when the current playback position of a media element reaches the cue point. 
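The "separate list of waits cleared from a timeupdate callback" suggestion above could be sketched like this. The CueList class and its method names are invented for illustration; only addEventListener, timeupdate, and currentTime come from the proposed media element interface:

```javascript
// Maintain pending waits ourselves and release any whose time has
// been reached or passed whenever the media element reports progress.
// A seek past a wait releases it, matching the WAIT semantics above.
class CueList {
  constructor() { this.pending = []; }
  // Register a callback to run once playback reaches `time` seconds.
  waitFor(time, callback) { this.pending.push({ time, callback }); }
  // Call from the media element's 'timeupdate' handler.
  update(currentTime) {
    const due = this.pending.filter(w => w.time <= currentTime);
    this.pending = this.pending.filter(w => w.time > currentTime);
    for (const w of due) w.callback();
  }
}

// Hypothetical wiring against a media element:
// const cues = new CueList();
// video.addEventListener('timeupdate', () => cues.update(video.currentTime));
// cues.waitFor(123.0, () => showBullet1());
```

Note this fires each wait at most once and never "undoes" a show on a backward seek, which matches one of the two interpretations asked about above.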
It seems unclear to me what reaching a particular time means. I agree this should be clarified. The appropriate interpretation should be when the current playback position reaches the frame corresponding to the cue point, but digital media has quantized frames, while the cue points are floating point numbers. Triggering all cue point callbacks between the last current playback position and the current one (including during seeks) would be one option, and do what you want as long as you aren't seeking backward. I'd be more in favor of triggering any cue point callbacks that lie between the current playback position and the current playback position of the next frame (audio frame for audio/ and video frame for video/ I guess). That means more bookkeeping to implement your system, but is less surprising in other cases. If video playback freezes for a second, and so misses a cue point, is that considered to have been reached? As I read it, cue points are relative to the current playback position, which does not advance if the stream buffer underruns, but it would if playback restarts after a gap, as might happen if the connection drops, or in an RTP stream. My proposal above would need to be amended to handle that case, and the decoder dropping frames... finding the right language here is hard. In the current spec, all that is provided for is controls to turn closed captions on or off. What would be much better is a way to enable the video element to send caption events, which include the text of the current caption, and can be used to display those captions in a way that fits the design of the content better. I really like this idea. It would also be nice if, for example, the closed caption text were available through the DOM so it could be presented elsewhere, searched locally, and so on. But what about things like album art, which might be embedded in an audio stream? Should that be accessible? 
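The frame-quantized triggering rule favoured above can be stated precisely as a half-open interval test. This is a sketch of that rule only; the function name is an illustrative assumption:

```javascript
// A cue point fires exactly when it falls in the half-open interval
// [frameTime, nextFrameTime): at or after the current frame's time,
// strictly before the next frame's time. Each cue point therefore
// belongs to exactly one frame, with no double-firing at boundaries.
function cuesForFrame(cuePoints, frameTime, nextFrameTime) {
  return cuePoints.filter(t => frameTime <= t && t < nextFrameTime);
}
```

A UA would evaluate this once per decoded frame; implementing the WAIT-style semantics on top of it then requires the extra bookkeeping mentioned above (tracking which cues a seek skipped over).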
Should a video element expose a set of known cue points embedded in the file? A more abstract interface is necessary than just 'caption events'. Here are some use cases worth considering:

* A media file has embedded textual metadata like title, author, or copyright license, that the designer would like to access for associated display elsewhere in the page, or to alter the displayed user interface based on the metadata. This is pretty essential for parity with flash-based internet radio players.
* A media file has embedded non-textual metadata, like an album cover image, that the designer would like to access for display elsewhere in the page.
* The designer wants to access closed caption or subtitle text through the DOM as it becomes available, for display elsewhere in the page.
* There are points in the media file where the embedded metadata changes. These points cannot be retrieved without scanning the file, which is expensive over the network.
Re: [whatwg] Give guidance about RFC 4281 codecs parameter
On Wed, Apr 11, 2007 at 05:45:34PM -0700, Dave Singer wrote: But [video/*] does at least indicate that we have a time-based multimedia container on our hands, and that it might contain visual presentation. application/ suffers that it does not say even that, and it raises the concern that this might be arbitrary, possibly executable, data. We discussed whether application/ was appropriate for MP4 and decided that it masked important characteristics of the format -- that it really is a time-based multimedia presentation -- and raised unwarranted concerns. I guess we made the opposite decision. Because Ogg was a container and could contain anything, including executable content, we went with the most generic option, based on analogy with application/octet-stream, application/pdf, etc. That we were working only on audio at the time may have coloured our judgement; the video-contains-audio argument didn't fit. I've noticed application/rss as a newer example, but I think that's more to encourage handoff from browsers without native support than an attempt at classification. Maciej's suggestion (registering all three) would work for Ogg, but I was under the impression that multiple registrations for the same format were discouraged. The disposition hinting proposal also works for general media types, without requiring registration of a suite of media types for every container. I also think it's a better solution for playlists, which are and aren't time-based media. Would you also go with video/x-m3u, video/rss for those text-based formats? Overloading the base types works, but so does a separate indication. Both are backward-compatible extensions to the media-type field, and both require software changes to implement. One however, requires registering new types, including audio/quicktime. :) Thanks for explaining your rationale, it's interesting to hear. -r
Re: [whatwg] Give guidance about RFC 4281 codecs parameter
On Tue, Apr 10, 2007 at 11:21:10AM -0700, Dave Singer wrote: # application/ogg; disposition=moving-image; codecs=theora, vorbis # application/ogg; disposition=sound; codecs=speex what is the 'disposition' parameter? The idea of a 'disposition-type' is to mark content with presentational information. See the Content-Disposition Header for MIME described in RFC 1806 for an early example. The specific proposal Silvia mentioned is to add the content-disposition to the media-type to inform parsers of the general nature of the content, even if they don't recognize the specific codecs. The allowed values for the 'disposition' label come from the Dublin Core set. This is not part of RFC 4281, and as far as I know hasn't been formally documented with the IETF, but we do think it's a good idea. This arose out of the need to discover or record audio vs audiovisual status for media files in the context of routing to the proper playback application, which has been particularly contentious with the Ogg container since we have insisted that such distinctions be made via metadata or file inspection instead of defining distinguishing filename extensions as has been done with other containers. (MooV is perhaps another example.) In terms of user presentation, audio vs video vs text vs still image is the important distinction, while the 'codecs' parameter answers the more technical question of what playback capabilities are necessary. A video/ or audio/ markup element already describes this adequately, but it is a larger issue for media handling on the web. Charles wrote a more detailed proposal in the context of RSS media syndication, which is where I first heard of the idea. http://changelog.ca/log/2005/08/21/rss-disposition-hinting-proposal We're essentially suggesting his proposal be extended to (media) containers in general. -r
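A consumer of the parameterized media types discussed above might split them apart something like this. This is a toy sketch with an invented function name; a real parser would follow the full MIME parameter grammar (quoting, escaping) rather than naive splitting:

```javascript
// Split a media type like 'application/ogg; disposition=sound;
// codecs="speex"' into its base type and parameters. Parameter
// values may be double-quoted (as RFC 4281 requires when a codecs
// list contains commas); quotes are stripped here.
function parseMediaType(header) {
  const parts = header.split(";").map(p => p.trim());
  const type = parts.shift();
  const params = {};
  for (const p of parts) {
    const eq = p.indexOf("=");
    if (eq < 0) continue; // ignore malformed parameters
    let value = p.slice(eq + 1).trim();
    if (value.startsWith('"') && value.endsWith('"')) value = value.slice(1, -1);
    params[p.slice(0, eq).trim()] = value;
  }
  return { type, params };
}
```

The point of the proposal is visible here: a UA could route on `params.disposition` (sound vs moving-image) even when it does not recognize anything in the codecs list.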
Re: [whatwg] on codecs in a 'video' tag.
On Mon, Apr 02, 2007 at 11:12:07AM -0700, Maciej Stachowiak wrote: I don't think Theora (or Dirac) are inherently more interoperable than other codecs. There's only one implementation of each so far, so there's actually less proof of this than for other codecs. Just to clarify, there are two different Dirac implementations, and two different Theora decoder implementations. But otherwise your points stand. There are many implementations of the MPEG codecs. I'm not sure how many separate implementations there are of the Windows Media codecs. FFMPEG has a VC-1 implementation and some decoders for older formats based on reverse-engineering. -r
Re: [whatwg] video element feedback
On Fri, Mar 23, 2007 at 04:33:39PM -0700, Eric Carlson wrote: Yes, the UA needs the offset/chunking table in order to calculate a file offset for a time, but this is efficient in the case of container formats in which the table is stored together with other information that's needed to play the file. This is not the case for all container formats, of course. Just to be clear, this isn't strictly true; one can still perform bisection seek over HTTP with the byte Range header. As has been mentioned, VLC implements this. Alsaplayer is another example. It does work. It's of course less efficient than when one has a seek table, but not excessively so. Tangentially, I at some point looked at implementing 'seconds' as a Range header unit in Apache. (The HTTP Range header allows arbitrary units; bytes is just the only one defined by the spec.) The idea was to have the server do the seeking and return a valid file starting at the requested time offset, or list of intervals. Then a client could do very naive seeking and just play what it got. In the end I abandoned it over worries about cache interaction. If you request a sequence of intervals you don't in general get the same byte stream as if you request the whole file, because the server is re-packaging the data for each request. With Ogg this sort of works, because concatenated streams are still in spec, so the decoded result is the same, but it doesn't work for all containers. The annodex query path seemed a better choice. -r
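The bisection seek mentioned above is an ordinary binary search over byte offsets. In this sketch the network fetch is modelled as a synchronous callback `timestampAt(offset)` returning the timestamp of the first timestamped frame/page at or after that offset; a real client would issue a request with a `Range: bytes=offset-` header and scan forward for framing. All names are illustrative assumptions:

```javascript
// Binary-search a file's byte range for a target time using only
// "what timestamp occurs at byte offset X?" probes, as VLC does over
// HTTP. Stop once the remaining window fits in one ordinary read.
function bisectSeek(timestampAt, fileSize, targetTime) {
  let lo = 0;        // invariant: timestamp at lo is <= targetTime
  let hi = fileSize; // invariant: timestamp at hi is > targetTime
  while (hi - lo > 64 * 1024) {
    const mid = Math.floor((lo + hi) / 2);
    if (timestampAt(mid) <= targetTime) lo = mid;
    else hi = mid;
  }
  return lo; // byte offset to begin reading (and resyncing) from
}
```

For a file whose timestamps grow roughly linearly with byte position, this needs about log2(fileSize / 64KiB) probes, which is why it is workable, if less efficient than a seek table.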
Re: [whatwg] video element feedback
On Sat, Mar 24, 2007 at 01:57:45AM -0700, Kevin Marks wrote: How does one seek a Vorbis file with video in and recover framing? It looks like you skip to an arbitrary point and scan for 'OggS' then do a 64kB CRC to make sure this isn't a fluke. Then you have some packets that correspond to some part of a frame of video or audio. You recover a timestamp, and thus you can pick another random point and do a binary chop until you hit the timestamp before the one you wanted. Then you need to read pages until the timestamp changes and you have resynced that stream. Any other interleaved streams are presumably being resync'd in parallel so you can then get back to the read and skip framing. Try doing that from a CD-ROM. Do let me know if that has since been fixed. Nope. That's still the algorithm. Also add that for a keyframe-based codec you need to (conceptually) seek again, after you've found the desired start point, to feed the decoder from the nearest previous restart point. In practice, not everyone tries for sample-accurate seeking. I gather the situation is similar with DVD playback. Streamability (in the unix pipe sense of an unseekable file stream) was a design goal for Ogg. This seek algorithm is a consequence. Decoders must handle seeking without an index table, so we have regarded the use of one, whether cached in the file or not, as an implementation detail. FWIW, -r
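The "scan for 'OggS'" resync step in the algorithm above is a simple byte search for the page capture pattern. This is an illustrative sketch only; a real implementation would then verify the candidate page's CRC checksum, as described above, before trusting the match:

```javascript
// Scan a byte buffer for the Ogg page capture pattern 'OggS' and
// return the index of the first candidate page boundary at or after
// `from`, or -1 if none is found. CRC verification of the candidate
// page (to rule out a fluke match in payload data) is elided here.
function findCapturePattern(bytes, from = 0) {
  const pattern = [0x4f, 0x67, 0x67, 0x53]; // 'O', 'g', 'g', 'S'
  for (let i = from; i + 4 <= bytes.length; i++) {
    if (pattern.every((b, j) => bytes[i + j] === b)) return i;
  }
  return -1;
}
```

Once a verified page boundary is found, its granule position supplies the timestamp used as the comparison key for the binary chop described above.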