Re: [whatwg] Displaying poster for audio in video
On Nov 30, 2011, at 8:31 AM, Simon Pieters wrote:

On Tue, 29 Nov 2011 20:18:40 +0100, Tab Atkins Jr. jackalm...@gmail.com wrote:

On Tue, Nov 29, 2011 at 5:54 AM, Jeroen Wijering m...@jeroenwijering.com wrote:

Hello all, Playing audio in video succeeds consistently across browsers, as described in the spec: "Both audio and video elements can be used for both audio and video. The main difference between the two is simply that the audio element has no playback area for visual content (such as video or captions), whereas the video element does." However, poster display behavior varies between browsers. Some browsers (Firefox, IE, Opera) will keep the poster up after an audio file has started; other browsers (the WebKit family, Windows Phone) clear the poster, which results in a blank area: http://goo.gl/0g77d It would be good if this behavior could be rationalized and/or addressed in the spec. My preference would be to continue showing the poster image if the media file has no (active) video track - a poster image looks much better than a blank area. Ideally, this would also be the case for fullscreen playback on mobile devices.

I agree that keeping up the poster frame is the more useful behavior, and that this should be specified in the spec.

It is already: "When no video data is available (... the media resource does not have a video channel), the video element represents the poster frame." http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#attr-video-poster

That covers it indeed, thanks. I'll file a bug with WebKit then ;) - Jeroen
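Until the engines converge, a player can work around the blanking behavior itself. A minimal sketch of the decision, assuming an object with HTMLVideoElement-style videoWidth/videoHeight/poster properties (the overlay itself, absolutely positioning an img over the video box, is left out):

```javascript
// Sketch of a player-side workaround for engines that clear the poster on
// audio-only files. Per the spec language quoted above, the video element
// represents the poster frame when the resource has no video channel, so a
// player can re-overlay the poster image itself. This is an illustration,
// not a definitive implementation.
function shouldOverlayPoster(media) {
  // No decoded video dimensions after metadata has loaded implies an
  // audio-only resource.
  var audioOnly = media.videoWidth === 0 && media.videoHeight === 0;
  return audioOnly && !!media.poster;
}
```

In a page this would run on the 'loadedmetadata' event, positioning an img with src equal to media.poster over the video box whenever it returns true.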
Re: [whatwg] Media elements statistics
Hey Steve, This looks great; it would be a really useful set of data for video players / publishers. Since none of the metrics have a time component, developers can sample the data over the window / at the frequency they prefer.

Would jitter be calculated over decodedFrames or decodedFrames+droppedFrames? In the first case, jitter is a more granular metric for measuring exact presentation performance. In the latter case, jitter can serve as a single metric for tracking processing power (simple!). In either case, it's fairly straightforward to calculate towards the other metric.

Resetting the values when @src changes is a good idea. Changing @src is much used for advertising and playlist support (it works around iOS not supporting the play() call). Kind regards, Jeroen Wijering

On May 3, 2011, at 12:15 AM, Steve Lacey wrote: All, I've updated the wiki with a proposal... http://wiki.whatwg.org/wiki/Video_Metrics#Proposal Cheers! Steve

On Sat, Apr 9, 2011 at 7:08 AM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: Ah, thanks for the link. I've included Silverlight stats, too, for completeness. If somebody knows about QuickTime stats, that would be another good one to add, I guess. Cheers, Silvia.

On Fri, Apr 8, 2011 at 5:21 PM, Jeroen Wijering jer...@longtailvideo.com wrote: On Apr 7, 2011, at 8:11 AM, Silvia Pfeiffer wrote: I've also just added a section with the stats that the Adobe Flash player exposes. Great. Perhaps Silverlight stats might be of use too - though they're fairly similar: http://learn.iis.net/page.aspx/582/advanced-logging-for-iis-70---client-logging/ Apart from the statistics that are not currently available from the HTML5 player, there are stats that are already available, such as currentSrc, currentTime, and all the events which can be turned into hooks for measurement. Yes, the network and ready states are very useful to determine if clients are stalling for buffering etc.
I think the page now has a lot of analysis of currently used stats - probably a sufficient amount. All the video publishing sites likely just use a subset of the ones that Adobe Flash exposes in their analytics. Especially all the separate A/V byte counts are overkill IMO. One useful metric I didn't list for JW Player but that is very nice is Flash's isLive property. Kind regards, Jeroen

On Thu, Apr 7, 2011 at 4:52 AM, Mark Watson wats...@netflix.com wrote: All, I added some material to the wiki page based on our experience here at Netflix and based on the metrics defined in MPEG DASH for adaptive streaming. I'd love to hear what people think. Statistics about presentation/rendering seem to be covered, but what should also be considered are network performance statistics, which become increasingly difficult to collect from the server when sessions are making use of multiple servers, possibly across multiple CDNs. Another aspect important for performance management is error reporting. Some thoughts on that on the page. ...Mark

On Mar 31, 2011, at 7:07 PM, Robert O'Callahan wrote: On Fri, Apr 1, 2011 at 1:33 PM, Chris Pearce ch...@pearce.org.nz wrote: On 1/04/2011 12:22 p.m., Steve Lacey wrote: Chris - in the mozilla stats, I agree on the need for a count of frames that actually make it to the screen, but am interested in why we need both presented and painted? Wouldn't just a simple 'presented' (i.e. presented to the user) suffice?

We distinguish between painted and presented so we have a measure of the latency in our rendering pipeline. It's more for our benefit as browser developers than for web developers.

Yeah, just to be clear, we don't necessarily think that everything in our stats API should be standardized. We should wait and see what authors actually use. Rob -- Now the Bereans were of more noble character than the Thessalonians, for they received the message with great eagerness and examined the Scriptures every day to see if what Paul said was true.
[Acts 17:11]
Re: [whatwg] Media elements statistics
On May 5, 2011, at 7:04 PM, Steve Lacey wrote: Would jitter be calculated over decodedFrames or decodedFrames+droppedFrames? In the first case, jitter is a more granular metric for measuring exact presentation performance. In the latter case, jitter can serve as a single metric for tracking processing power (simple!). In either case, it's fairly straightforward to calculate towards the other metric. Actually, I would expect presentedFrames + droppedFrames would be used. This implies all decoded frames that were supposed to be presented. Ah yes, sorry. That makes sense. presentedFrames of course need to be included. - Jeroen
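The sampling idea discussed above (metrics have no time component, so pages sample them) can be made concrete. A minimal sketch, assuming the proposal's presentedFrames/droppedFrames counter names; the two-sample scheme and the ratio itself are our own illustration, not part of the proposal:

```javascript
// Derive a dropped-frame ratio between two samples of the proposed
// counters. presentedFrames + droppedFrames approximates all frames that
// were due for presentation in the sampling window.
function droppedRatio(prev, curr) {
  var due = (curr.presentedFrames - prev.presentedFrames) +
            (curr.droppedFrames - prev.droppedFrames);
  if (due === 0) return 0; // nothing was due; avoid division by zero
  return (curr.droppedFrames - prev.droppedFrames) / due;
}
```

A player would poll the media element at its preferred frequency, keep the previous sample, and e.g. switch to a lower-quality stream when the ratio stays high.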
Re: [whatwg] How to handle multitrack media resources in HTML
Hey Ian, all, Sorry for the slow response.

There's a big difference between text tracks, audio tracks, and video tracks. While it makes sense, for instance, to have text tracks enabled but not showing, it makes no sense to do that with audio tracks.

Audio and video tracks require more data, hence it's less preferable to allow them to be enabled but not showing. If data weren't an issue, it would be great if this were possible; it'd allow instant switching between multiple audio dubs, or camera angles.

I think we mean different things by active here. The hidden state for a text track is one where the UA isn't rendering the track but the UA is still firing all the events and so forth. I don't understand what the parallel would be for a video or audio track.

The parallel would be fetching / decoding the tracks but not sending them to the display (video) or speakers (audio). I agree that, implementation-wise, this is much less useful than having an active-but-hidden state for text tracks. However, some people might want to manipulate hidden tracks with the audio data API, much like hidden text tracks can be manipulated with JavaScript.

Text tracks are discontinuous units of potentially overlapping textual data with position information and other metadata that can be styled with CSS and can be mutated from script. Audio and video tracks are continuous streams of immutable media data.

Video and audio tracks do not necessarily produce continuous output - it is perfectly legal to have gaps in either, e.g. segments that do not render. Both audio and video tracks can have metadata that affects their rendering: an audio track has volume metadata that attenuates its contribution to the overall mix-down, and a video track has a matrix that controls its rendering. The only thing preventing us from styling a video track with CSS is the lack of definition.

Yes, and the same (lack of definition) goes for JavaScript manipulation.
It'd be great if we had the tools for manipulating video and audio tracks (extract/insert frames, move audio snippets around). It would make A/V editing - or more creative uses - really easy in HTML5. Kind regards, Jeroen
Re: [whatwg] How to handle multitrack media resources in HTML
On Apr 8, 2011, at 8:54 AM, Ian Hickson wrote:

*) Discoverability is indeed an issue, but this can be fixed by defining a common track API for signalling and enabling/disabling tracks:

{{{
interface Track {
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;

  const unsigned short OFF = 0;
  const unsigned short HIDDEN = 1;
  const unsigned short SHOWING = 2;
  attribute unsigned short mode;
};

interface HTMLMediaElement : HTMLElement {
  [...]
  readonly attribute Track[] tracks;
};
}}}

There's a big difference between text tracks, audio tracks, and video tracks. While it makes sense, for instance, to have text tracks enabled but not showing, it makes no sense to do that with audio tracks.

Audio and video tracks require more data, hence it's less preferable to allow them to be enabled but not showing. If data weren't an issue, it would be great if this were possible; it'd allow instant switching between multiple audio dubs, or camera angles.

In terms of the data model, I don't believe there are major differences between audio, text or video tracks. They all exist at the same level - one down from the main presentation layer. Toggling versus layering can be an option for all three kinds of tracks. For example, multiple video tracks can be mixed together in one media element's display. Think of PiP, perspective side-by-side (Stevenote style) or a 3D grid (group chat, like Skype). Perhaps this should be supported instead of relying upon multiple video elements, manual positioning and APIs to knit things together. One would lose in terms of flexibility, but gain in terms of API complexity (it's still one video) and ease of implementation for HTML developers. - Jeroen
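To make the proposed IDL concrete, here is a plain-JS sketch of it. The constants and property names mirror the proposal; the constructor and the mutual-exclusion helper (only one showing track per kind) are our own illustrative additions, not part of the proposal:

```javascript
// Plain-JS model of the proposed Track interface.
function Track(kind, label, language) {
  this.kind = kind;
  this.label = label;
  this.language = language;
  this.mode = Track.OFF;
}
Track.OFF = 0;
Track.HIDDEN = 1;
Track.SHOWING = 2;

// Illustrative helper: show one track, turning off others of the same kind
// (e.g. switching between audio dubs, where layering makes no sense).
function showTrack(tracks, index) {
  for (var i = 0; i < tracks.length; i++) {
    if (tracks[i].kind === tracks[index].kind) tracks[i].mode = Track.OFF;
  }
  tracks[index].mode = Track.SHOWING;
  return tracks[index];
}
```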
Re: [whatwg] How to handle multitrack media resources in HTML
On Apr 8, 2011, at 8:54 AM, Ian Hickson wrote: but should be linked to the main media resource through markup. What is a main media resource? e.g. consider youtubedoubler.com; what is the main resource? Or similarly, when watching the director's commentary track on a movie, is the commentary the main track, or the movie? In systems like MPEG TS and DASH, there's the notion of the system clock. This is the overarching resource to which all audio, meta, text and video tracks are synced. The clock has no video frames or audio samples by itself, it just acts as the wardrobe for all tracks. Perhaps it's worth investigating if this would be useful for media elements? - Jeroen
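The system-clock idea above can be sketched in script today: slave media elements are nudged toward a shared master clock rather than hard-seeking on every tick. The 0.25-second threshold and the playbackRate nudge below are illustrative values of our own choosing, not from MPEG TS, DASH, or any spec:

```javascript
// Nudge one media element toward a master clock time. `el` is any object
// with currentTime and playbackRate (an HTMLMediaElement in a real page;
// a plain object works for illustration). Returns the observed drift.
function syncToClock(clockTime, el) {
  var drift = el.currentTime - clockTime;
  if (Math.abs(drift) > 0.25) {
    el.currentTime = clockTime; // hard resync on large drift
    el.playbackRate = 1;
  } else {
    el.playbackRate = 1 - drift; // gentle correction on small drift
  }
  return drift;
}
```

A controller would call this on a timer for every slaved audio/video element, which is essentially what a built-in multitrack clock would do for free.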
Re: [whatwg] Media elements statistics
On Apr 7, 2011, at 8:11 AM, Silvia Pfeiffer wrote: I've also just added a section with the stats that the Adobe Flash player exposes.

Great. Perhaps Silverlight stats might be of use too - though they're fairly similar: http://learn.iis.net/page.aspx/582/advanced-logging-for-iis-70---client-logging/

Apart from the statistics that are not currently available from the HTML5 player, there are stats that are already available, such as currentSrc, currentTime, and all the events which can be turned into hooks for measurement.

Yes, the network and ready states are very useful to determine if clients are stalling for buffering etc. I think the page now has a lot of analysis of currently used stats - probably a sufficient amount. All the video publishing sites likely just use a subset of the ones that Adobe Flash exposes in their analytics. Especially all the separate A/V byte counts are overkill IMO. One useful metric I didn't list for JW Player but that is very nice is Flash's isLive property. Kind regards, Jeroen

On Thu, Apr 7, 2011 at 4:52 AM, Mark Watson wats...@netflix.com wrote: All, I added some material to the wiki page based on our experience here at Netflix and based on the metrics defined in MPEG DASH for adaptive streaming. I'd love to hear what people think. Statistics about presentation/rendering seem to be covered, but what should also be considered are network performance statistics, which become increasingly difficult to collect from the server when sessions are making use of multiple servers, possibly across multiple CDNs. Another aspect important for performance management is error reporting. Some thoughts on that on the page.
...Mark

On Mar 31, 2011, at 7:07 PM, Robert O'Callahan wrote: On Fri, Apr 1, 2011 at 1:33 PM, Chris Pearce ch...@pearce.org.nz wrote: On 1/04/2011 12:22 p.m., Steve Lacey wrote: Chris - in the mozilla stats, I agree on the need for a count of frames that actually make it to the screen, but am interested in why we need both presented and painted? Wouldn't just a simple 'presented' (i.e. presented to the user) suffice?

We distinguish between painted and presented so we have a measure of the latency in our rendering pipeline. It's more for our benefit as browser developers than for web developers.

Yeah, just to be clear, we don't necessarily think that everything in our stats API should be standardized. We should wait and see what authors actually use. Rob -- Now the Bereans were of more noble character than the Thessalonians, for they received the message with great eagerness and examined the Scriptures every day to see if what Paul said was true. [Acts 17:11]
Re: [whatwg] How to handle multitrack media resources in HTML
Hello Silvia, all, First, thanks for the Multitrack wiki page. Very helpful for those who are not subscribed to the various lists. I have also phrased the comments below as feedback to this page: http://www.w3.org/WAI/PF/HTML/wiki/Media_Multitrack_Media_API

USE CASE

The use case is spot on; this is an issue that blocks HTML5 video from being chosen over a solution like Flash. An elaborate list of tracks is important, to correctly scope the conditions / resolutions:

1. Tracks targeting device capabilities:
* Different containers / codecs / profiles
* Multiview (3D) or surround sound
* Playback rights and/or decryption possibilities

2. Tracks targeting content customization:
* Alternate viewing angles or alternate music scores
* Director's comments or storyboard video

3. Tracks targeting accessibility:
* Dubbed audio or text subtitles
* Audio descriptions or closed captions
* Tracks cleared of cursing / nudity / violence

4. Tracks targeting the interface:
* Chapter lists, bookmarks, timed annotations, midroll hints..
* .. and any other type of scripting cues

Note I included the HTML5 text tracks. I believe there are four kinds of tracks, all an inherent part of a media presentation. These types designate the output of the track, not its encoded representation:

* audio (producing sound)
* metadata (producing scripting cues)
* text (producing rendered text)
* video (producing images)

In this taxonomy, the HTML5 subtitles and captions track kinds are text, the descriptions kind is audio and the chapters and metadata kinds are metadata.

REQUIREMENTS

The requirements are elaborate, but do note they span beyond HTML5. Everything that plays back audio/video needs multitrack support:

* Broad- and narrowcasting playback devices of any kind
* Native desktop, mobile and settop applications/apps
* Devices that play media standalone (media players, picture frames, AirPlay)

Also, on e.g.
the iPhone and Android devices, playback of video is triggered by HTML5, but subsequently detached from it. Think of the custom fullscreen controls, the obscuring of all HTML, and events/cues that are deliberately ignored or not sent (such as play() in iOS). I wonder whether this is a temporary state or something that will remain and should be provisioned for. With this in mind, I think an additional requirement is that there should be a full solution outside the scope of HTML5. HTML5 has unique capabilities like customization of the layout (CSS) and interaction (JavaScript), but it must not be required.

SIDE CONDITIONS

In the side conditions, I'm not sure about the relative volume of audio or positioning of video. Automation by default might work better and requires no parameters. For audio, blending can be done through a ducking mechanism (like the JW Player does). For video, blending can be done through an alpha channel. At a later stage, an API/heuristics for PiP support and gain control can be added.

SOLUTIONS

In terms of solutions, I lean much towards the manifest approach. The other approaches are options that each add more elements to HTML5, which:

* Won't work for situations outside of HTML5.
* Postpone, and perhaps clash with, the addition of manifests.

Without a manifest, there'll probably be no adaptive streaming, which renders HTML5 video much less useful. At the same time, standardization around manifests (DASH) is largely wrapping up.

EXAMPLE

Here's some code for the manifest approach.
First the HTML5 side:

  <video id="v1" poster="video.png" controls>
    <source src="manifest.xml" type="video/mpeg-dash">
  </video>

Second the manifest side:

  <MPD mediaPresentationDuration="PT645S" type="OnDemand">
    <BaseURL>http://cdn.example.com/myVideo/</BaseURL>
    <Period>
      <Group mimeType="video/webm" lang="en">
        <Representation sourceURL="video-1600.webm" />
      </Group>
      <Group mimeType="video/mp4; codecs=avc1.42E00C,mp4a.40.2" lang="en">
        <Representation sourceURL="video-1600.mp4" />
      </Group>
      <Group mimeType="text/vtt" lang="en">
        <Accessibility type="CC" />
        <Representation sourceURL="captions.vtt" />
      </Group>
    </Period>
  </MPD>

(I should look more into accessibility parameters, but there is support for signalling captions, audio descriptions, sign language etc.)

Note that this approach moves the text track outside of HTML5, making it accessible for other clients as well. Both codecs are also in the manifest - this is just one of the device capability selectors of DASH clients.

DISADVANTAGES

The two listed disadvantages for the manifest approach in the wiki page are lack of CSS and discoverability:

*) The CSS styling issue can be fixed by making a conceptual change to CSS and text tracks. Instead of styling text tracks, a single text rendering area for each video element can be exposed and styled. Any text tracks that are enabled push data into it, which is automatically styled according to the video.textStyle/etc rules.

*)
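The device-capability selection the manifest enables can be sketched client-side. A minimal illustration, assuming the Group entries above have already been parsed into objects with a mimeType property; the canPlay callback stands in for HTMLMediaElement.canPlayType():

```javascript
// Pick the first playable video rendition group from a parsed manifest.
// Text/metadata groups are skipped; a real client would also weigh
// bitrates, resolution, and accessibility needs.
function pickGroup(groups, canPlay) {
  for (var i = 0; i < groups.length; i++) {
    if (groups[i].mimeType.indexOf("video/") === 0 && canPlay(groups[i].mimeType)) {
      return groups[i];
    }
  }
  return null; // no playable rendition for this device
}
```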
Re: [whatwg] Limiting the amount of downloaded but not watched video
On Jan 20, 2011, at 9:14 AM, Philip Jägenstedt wrote: (Since there is some overhead with each HTTP request, one must make sure that they are not unreasonably small.) When HTTP byte ranges are used to achieve bandwidth management, it's hard to talk about a single downloadBufferTarget that is the number of seconds buffered ahead. Rather, there might be an upper and lower limit within which the browser tries to stay, so that each request can be of a reasonable size. Neither an author-provided minimum nor maximum value can be followed particularly closely, but they could possibly be taken as a hint of some sort.

Does it actually make sense to specify the read-ahead size, or should it simply be a flag (e.g. unlimited, small buffer and don't care)? Is there really a case for setting the actual read-ahead value directly? In a sense, that seems akin to allowing web pages to control the TCP buffer sizes used by the client's browser--it's lower level than people usually care about. In particular, I'm thinking that most of the time all people care about is "read ahead a little" vs. "read ahead a lot", and publishers shouldn't need to figure out the right buffer size to use for the former (very likely getting it wrong).

I'm inclined to agree, and we already have a way to say a little (preload=none/metadata) and a lot (preload=auto). However, it'd be great if all implementors could agree on the same interpretation of states. Specifically, this isn't required by the spec but it would still be helpful to have consistency in:

* effective state can only increase to higher states, never go from e.g. metadata to none (it makes no sense)
* there is a state - invoked - between metadata and auto for when the video is playing
* there could be a state between invoked and auto for autoplay, but if not, autoplay implies preload=auto
* in the invoked state, a conservative buffering strategy is used by default
* when paused in the invoked state, we need to agree on what should happen

If we could agree, then of course it should be documented somewhere, even if it seems somewhat restrictive of the spec to mandate an exact behavior.

Perhaps the conservative buffering strategy should be client-side throttling after all. The pause-to-buffer argument several people put forward is a strong one - it's a big use case (perhaps more people pause because of this than because of all other reasons combined). Something like a downloadBufferTarget would be confusing and break this. Client-side throttling won't. - Jeroen
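The "effective state can only increase" rule proposed above is easy to pin down as a tiny state lattice. A sketch, assuming the state names from the proposal (including the suggested 'invoked' state entered on play()); the function itself is our own illustration:

```javascript
// Ordering of the proposed effective preload states, lowest to highest.
var PRELOAD_ORDER = ["none", "metadata", "invoked", "auto"];

// An effective state may only move up the lattice; a request for a lower
// state than the current one is ignored.
function raisePreload(current, requested) {
  var a = PRELOAD_ORDER.indexOf(current);
  var b = PRELOAD_ORDER.indexOf(requested);
  return PRELOAD_ORDER[Math.max(a, b)];
}
```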
[whatwg] Limiting the amount of downloaded but not watched video
Hello all, We are getting some questions from JW Player users that HTML5 video is quite wasteful of bandwidth for longer videos (think 10min+). This is because browsers download the entire movie once playback starts, regardless of whether a user pauses the player. If throttling is used, it seems very conservative, which means a lot of unwatched video is in the buffer when a user unloads a video.

I did a simple test with a 10-minute video: playing it, pausing after 30 seconds and checking download progress after another 30 seconds. With all browsers (Firefox 4, Safari 5, Chrome 8, Opera 11, iOS 4.2), the video would indeed be fully downloaded after 60 seconds. Some throttling seems to be applied by Safari / iOS, but this could also be bandwidth fluctuations on my side. Either way, all browsers downloaded the 10-minute video while only 30 seconds were being watched.

The HTML5 spec is a bit generic on this topic, allowing mechanisms such as stalling and throttling but not requiring them, or prescribing a scripting interface: http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#concept-media-load-resource

Are there people working on ways to trim down the amount of not-watched data for video playback? Any ideas on this, anything in the pipeline?

---

A suggestion would be to implement / expose a property called downloadBufferTarget. It would be the amount of video, in seconds, the browser tries to keep in the download buffer. When a user starts (or seeks in) a video, the browser would try to download downloadBufferTarget seconds of video. Once currentTime + downloadBufferTarget is downloaded, downloading would be stalled, until a certain lower threshold is reached (e.g. 50%) and the browser would start downloading additional data.

A good default value for downloadBufferTarget would be 60 seconds. Web developers who have short clips / do not care about downloads can set downloadBufferTarget to a higher value (e.g. 300). Web developers who have long videos (15min+) / want to keep their bandwidth bill low can set downloadBufferTarget to a lower value (e.g. 15). Web developers might even change the value of downloadBufferTarget per visitor; visitors with little bandwidth get a sizeable buffer (to prevent stuttering) and visitors with a big pipe get a small download buffer (they don't need it).

The buffered timeranges could be used to compare the actual download buffer to the buffer target, should a user interface want to display this feedback. Note that the download buffer is not the same as the playback buffer. A download buffer underrun should not result in pausing the video. The download buffer also does not apply to live streaming. Kind regards, Jeroen Wijering, Longtail Video

PS: Having the preload=none property available in all browsers will also help in keeping the amount of downloaded-but-not-watched data low. In our tests, only Firefox (4 b9) seems to honor this property at present.
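The stall/resume cycle proposed above can be sketched as a small decision function. This follows the proposal's numbers (stall once `target` seconds are buffered ahead, resume when the buffered-ahead amount falls below the 50% threshold); the function shape and the "keep" action are our own illustration:

```javascript
// Decide what the download loop should do given the playhead, the end of
// the buffered range, the downloadBufferTarget (seconds), and whether the
// download is currently stalled. Returns "stall", "resume", or "keep".
function bufferAction(currentTime, bufferedEnd, target, stalled) {
  var ahead = bufferedEnd - currentTime;
  if (!stalled && ahead >= target) return "stall";        // target reached
  if (stalled && ahead <= target * 0.5) return "resume";  // 50% threshold
  return "keep";                                          // no change
}
```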
Re: [whatwg] HTML5 video: frame accuracy / SMPTE
On Jan 12, 2011, at 11:04 AM, whatwg-requ...@lists.whatwg.org wrote: Date: Wed, 12 Jan 2011 11:54:47 +0200 From: Mikko Rantalainen mikko.rantalai...@peda.net Subject: Re: [whatwg] HTML5 video: frame accuracy / SMPTE

2011-01-12 00:40 EEST, Rob Coenen: Hi David - that is because in an ideal world I'd want to seek to a time expressed as a SMPTE timecode (think web apps that let users step x frames back, seek y frames forward etc.). In order to convert SMPTE to the floating point value for video.seekTime I need to know the frame rate.

It seems to me that such an application really requires a method for querying the timestamps of the previous and next frames when given a timestamp. If such an application requires an FPS value, it can then compute it by itself, if such a value is assumed meaningful. (Simply get the next frame timestamp from the zero timestamp and continue for a couple of frames to compute FPS, then check if the FPS seems to be stable.) Perhaps there should be a method getRelativeFrameTime(timestamp, relation) where timestamp is the current timestamp and relation is one of previousFrame, nextFrame, previousKeyFrame, nextKeyFrame? Use of this method could be allowed only for paused video if needed for simple implementation.

Alternatively, one could look at a step() function instead of a seek(pos, exact) function. The step function can be used for frame-accurate controls, e.g. step(2) or step(-1). The advantage over a seek(pos, exact) function (and the playback rate controls) is that the viewer really knows the video is X frames offset. This is very useful for both artistic/editing applications and for video analysis applications (think sports, medical or experiments). The downside of step() compared to either always-accurate seeking or a seek(pos, exact) is that it requires two steps in situations like bookmarking or chaptering.
It seems like the framerate / SMPTE proposals made here are all a means to end up with frame-accurate seeking. With a step() function in place, there's no need for such things. In fact, one could do a step(10) or so and then use the difference in position to calculate the framerate. - Jeroen
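The framerate-probing idea above can be sketched directly. Since step() is a hypothetical API, the example runs against any object exposing currentTime and step(n); a mock with a fixed frame duration is used for illustration:

```javascript
// Estimate frames per second by stepping N frames and measuring the change
// in currentTime. Assumes a constant frame rate over the probed span; for
// variable-frame-rate content the result is only an average.
function estimateFps(video, frames) {
  var start = video.currentTime;
  video.step(frames); // hypothetical frame-stepping API discussed above
  var elapsed = video.currentTime - start;
  return elapsed > 0 ? frames / elapsed : 0;
}
```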
Re: [whatwg] HTML5 video: frame accuracy / SMPTE
On Jan 12, 2011, at 2:05 PM, Rob Coenen wrote: The need for SMPTE still remains as I want to be able to do things such as video.seekTo(smpte_timecode_converted_to_seconds, seek_exact=true); so that my video goes to exactly the exact frame as indicated by smpte_timecode_converted_to_seconds. Think chapter bookmarking, scene indexing, etc.

With step() in place, this would be a simple convenience function. This pseudo-code is not ideal and makes some assumptions, but the approach should work:

  function seekToTimecode(timecode) {
    var seconds = convert_timecode_to_seconds(timecode);
    videoElement.seek(seconds);
    var delta = seconds - videoElement.currentTime;
    while (delta > 0) {
      videoElement.step(1);
      delta = seconds - videoElement.currentTime;
    }
  }

It's basically stepping to the frame that's closest to the timecode (as elaborated by others, there's no such thing as timecode in MP4/WebM; there are just timestamps). Note you actually do want this conversion taking place in JavaScript, since there are many reasons to adjust/offset the conversion (sync issues, timecode base differences, ...). If it's locked up inside the browser API you have to duplicate work around it if the input files / conversion assumptions don't align. - Jeroen
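The convert_timecode_to_seconds() helper above is left undefined in the thread. A minimal sketch for non-drop-frame SMPTE (HH:MM:SS:FF) follows; the frame rate has to be known out-of-band, which is exactly the point made about keeping this conversion in JavaScript:

```javascript
// Convert a non-drop-frame SMPTE timecode string ("HH:MM:SS:FF") to
// seconds, given the frame rate. Drop-frame timecode (the ";" variant used
// for 29.97fps NTSC) needs extra correction and is deliberately omitted.
function convertTimecodeToSeconds(tc, fps) {
  var p = tc.split(":"); // ["HH", "MM", "SS", "FF"]
  var h = parseInt(p[0], 10), m = parseInt(p[1], 10);
  var s = parseInt(p[2], 10), f = parseInt(p[3], 10);
  return h * 3600 + m * 60 + s + f / fps;
}
```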
Re: [whatwg] WebSRT feedback
On Oct 8, 2010, at 2:24 PM, whatwg-requ...@lists.whatwg.org wrote: Even if very few subtitles use inline SVG, SVG in object, img, iframe, video, self-referencing track, etc. in the cue text, all implementations would have to support it in the same way for it to be interoperable. That's quite an undertaking and I don't think it's really worth it.

User agents only need to be interoperable over the common subset of HTML features they support. HTML is mostly designed to degrade gracefully when a user agent encounters elements it doesn't support. The simplest possible video player would use an HTML parser (hopefully off-the-shelf) to build some kind of DOM structure. Then it can group text into paragraphs for rendering, and ignore the rest of the content. In practice, we'll have to deal with user agents that support different sets of WebSRT features --- when version 2 of WebSRT is developed, if not before. Why not use existing, proven machinery --- HTML --- to cope with that situation? Rob

The requests we receive on the captioning functionality of the JW Player always revolve around styling. Font size, color, style, weight, outline and family. Block x, y, width, height, text-align, vertical-align, padding, margin, background and alpha. Both for an entire SRT file, for distinct captioning entries and for specific parts of a captioning entry. Not to say that a full parsing engine wouldn't be nice or useful, but at present there are simply no requests for it (not even for a ;). Plus, more advanced timed track applications can easily be built with JavaScript (timed bouncing 3D balls using WebGL).

W3C's Timed Text does a decent job of facilitating the styling needs of captioning authors. Overall regions, single paragraphs and inline chunks (through span) can be styled. There are a few small misses, such as text outline and vertical alignment (which can be done with separate regions though).
IMO the biggest con of TT is that it uses its own in-document styling namespace, instead of relying upon page CSS. Kind regards, Jeroen
Re: [whatwg] HTML 5 : The Youtube response
Hello, The Flash player exposes a string of metrics: http://help.adobe.com/en_US/AS3LCR/Flash_10.0/flash/net/NetStreamInfo.html The most useful ones are:

*) droppedFrames: it can be used to determine whether the client can play the video without stuttering.
*) maxBytesPerSecond: it can be used to determine the bandwidth of the connection.

In addition to this, the metadata embedded in the video is interesting. For example:

*) width & height: already available
*) duration: already available
*) bitrates: for comparison against maxBytesPerSecond.
*) seekpoints: for anticipating seek results in the UI
*) content metadata (e.g. ID3 or MP4 images)

Here's an example with the exposed metadata printed on top of the player: http://developer.longtailvideo.com/trac/testing/?plugins=metaviewer&file=%2Fplayer%2Ftesting%2Ffiles%2Fbunny.mp4&height=260&width=500&autostart=true&mute=true Kind regards, Jeroen

One part of (2) [well, debatably part, but related to video streaming] is the lack of visibility into stream behavior. I can't ask the video element questions about dropped frames, bitrate, etc. This is incredibly useful in Flash for getting streaming feedback, and means I really don't know how well the HTML5 player is working for users. The best I can do is waiting/stalled events, which is nowhere near as granular.

I agree that exposing info like that would be useful. What does the Flash API for this look like? What parts of the available data do you find most useful? Regards, Maciej -Kevin

On Thu, Jul 1, 2010 at 9:16 AM, Maciej Stachowiak m...@apple.com wrote: On Jul 1, 2010, at 6:12 AM, Kornel Lesinski wrote: I believe we can allow arbitrary content to go fullscreen, along the lines of what Robert O'Callahan has proposed on this list, if we impose sufficient restrictions to mitigate the above risks. In my opinion, the following measures would likely be sufficient: A) Have a distinctive animated sequence when an element goes into full-screen mode.
This helps the user understand what happened. B) Limit the ability to go fullscreen to user gestures, much as many browsers limit pop-ups. This prevents shenanigans from happening while the user is away from the keyboard, and greatly limits the potential annoyance factor. C) On systems with keyboard/mouse input, limit the keys that may be processed by fullscreen content to a small set, such as the set that Flash limits to in full-screen mode: http://www.adobe.com/devnet/flashplayer/articles/fplayer10_security_changes_03.html#head5. D) On multitouch devices with an onscreen keyboard as the normal means of input, things are trickier, because it's possible for a dedicated attacker to simulate the keyboard. My best idea is to make sure that a visually distinctive status indicator appears at the top of the screen even in full-screen mode, since that is the norm on such platforms. E) Reserve one or more obvious key combinations for exiting fullscreen no matter what (Escape, perhaps Cmd+W/Ctrl+W). F) Even on keyboard/mouse type systems, have some distinctive visual affordance which is either always present or appears on mouse moves, and which allows the user to exit full-screen mode. I think these measures greatly mitigate risks (1) and (2) above, and open up highly valued functionality (full screen video) with a UI that users will enjoy, and customizability that video hosting sites will appreciate.

Another option (for low-res videos on desktop) might be to use a lower screen resolution when in full screen — text and UI elements displayed by an attacker will look noticeably different.

That would probably make the controls look ugly for video with custom controls, and I suspect neither users nor content authors would appreciate that. Interesting idea, though. - Maciej
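Measures (C) and (E) above amount to an allow-list filter on key input while fullscreen. A sketch, where the key set below is purely illustrative (it is not Flash's actual list, which is linked above) and the key names follow modern KeyboardEvent.key values:

```javascript
// Illustrative allow-list of keys a fullscreen page may process.
var FULLSCREEN_KEYS = ["ArrowLeft", "ArrowRight", "ArrowUp", "ArrowDown",
                       " ", "Tab", "Enter", "Escape"];

// Per measure (C), only keys on the list reach fullscreen content. Escape
// stays on the list so the user can always exit fullscreen (measure E).
function allowFullscreenKey(key) {
  return FULLSCREEN_KEYS.indexOf(key) !== -1;
}
```

In a browser this check would sit in the UA's event dispatch, not in page script; the page-side version only illustrates the policy.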
Re: [whatwg] Exposing framerate / statistics of video playback
Hello, Has any thought been given to exposing such metrics as framerate, how many frames are dropped, rebuffering, etc from the video tag? My understanding is that in the Flash player, many of these types of statistics are readily available. This is interesting for things not just like benchmarking, but for a site to determine if it is not working well for clients and should instead e.g. switch down to a lower bitrate video. Hasn't been discussed AFAIK, but I'd like to see a proposal. Here's a list of what is available in AS3 through the NetStream.info object: http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/net/NetStreamInfo.html For determining whether the user-agent is able to play a video, these are the most interesting properties: readonly attribute unsigned long bandwidth:: The current maximum server » client bandwidth, in bits per second. readonly attribute unsigned long droppedframes:: The number of frames dropped by the user agent since playback of this video was initialized. Kind regards, Jeroen Wijering
Re: [whatwg] A standard for adaptive HTTP streaming for media resources
mid-way between these alternative files

I am personally not sure which is the right forum to create the new standard in, but I know that we have a need for it in HTML5.

Agreed. By its current spec, HTML5 video is mostly suited to the display of short clips. High-quality, long-form and live content need an additional level of functionality, which HTTP streaming seems to provide.

Would it be possible / the right way to start something like this as part of the Web Applications work at the WHATWG? (Incidentally, I've brought this up in the W3C before and not got any replies, so I'm not sure the W3C would be a better place for this work. Maybe the IETF? But then, why not here...) What do people think? Cheers, Silvia.

Kind regards, Jeroen Wijering