Re: [whatwg] Displaying poster for audio in video

2011-11-30 Thread Jeroen Wijering

On Nov 30, 2011, at 8:31 AM, Simon Pieters wrote:

 On Tue, 29 Nov 2011 20:18:40 +0100, Tab Atkins Jr. jackalm...@gmail.com 
 wrote:
 
 On Tue, Nov 29, 2011 at 5:54 AM, Jeroen Wijering
 m...@jeroenwijering.com wrote:
 Hello all,
 
 Playing audio in video succeeds consistently across browsers, as 
 described in the spec:
 
 Both audio and video elements can be used for both audio and video. The 
 main difference between the two is simply that the audio element has no 
 playback area for visual content (such as video or captions), whereas the 
 video element does.
 
However, poster display behavior varies between browsers. Some browsers
(FF, IE, Opera) will keep the poster up after an audio file has started, while
other browsers (WebKit-based, WinPho) clear the poster, which results in a blank
area:
 
 http://goo.gl/0g77d
 
 It would be good if this behavior could be rationalized and/or addressed in 
 the spec. My preference would be to continue showing the poster image if 
 the media file has no (active) video track - a poster image looks much 
 better than a blank area.
 
 Ideally, this would also be the case for fullscreen playback on mobile 
 devices.
 
 I agree that keeping up the poster frame is the more useful behavior,
 and that this should be specified in the spec.
 
 It is already.
 
 When no video data is available (... the media resource does not have a 
 video channel), the video element represents the poster frame.
 http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#attr-video-poster

That covers it indeed, thanks. I'll file a bug with webkit then ;)

- Jeroen



Re: [whatwg] Media elements statistics

2011-05-05 Thread Jeroen Wijering
Hey Steve,

This looks great; would be a really useful set of data for video players / 
publishers. Since none of the metrics have a time component, developers can 
sample the data over the window / at the frequency they prefer. 

Would jitter be calculated over decodedFrames or decodedFrames+droppedFrames?
In the first case, jitter is a more granular metric for measuring exact
presentation performance. In the latter case, jitter can serve as a single
metric for tracking processing power (simple!). In either case, it's fairly
straightforward to derive the other metric.
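
For illustration, a minimal sketch of how a page could sample such counters,
assuming the proposed decodedFrames / droppedFrames properties existed on the
media element (they are not a shipped API; the interval is arbitrary):

var video = document.querySelector('video');
var last = {decoded: 0, dropped: 0};

setInterval(function() {
    // Hypothetical counters from the proposal, sampled over a 5s window.
    var decoded = video.decodedFrames;
    var dropped = video.droppedFrames;
    var decodedDelta = decoded - last.decoded;
    var droppedDelta = dropped - last.dropped;
    last = {decoded: decoded, dropped: dropped};

    // Share of frames lost in this window; a rough processing-power signal.
    var dropRatio = droppedDelta / ((decodedDelta + droppedDelta) || 1);
    console.log('dropped ' + droppedDelta + ' of ' +
                (decodedDelta + droppedDelta) + ' frames (' +
                (dropRatio * 100).toFixed(1) + '%)');
}, 5000);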

Resetting the values when @src changes is a good idea. Changing @src is widely
used for advertising and playlist support (it works around iOS not supporting
the play() call).
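
As an aside, the @src-swap pattern looks roughly like this (file names are made
up; the point is that one element is reused, so per-resource metrics would reset
on each swap):

var video = document.querySelector('video');
var playlist = ['preroll-ad.mp4', 'clip1.mp4', 'clip2.mp4'];
var index = 0;

video.addEventListener('ended', function() {
    if (++index < playlist.length) {
        video.src = playlist[index];  // per-resource stats would reset here
        video.load();
        video.play();                 // the same, already-activated element keeps playing
    }
});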

Kind regards,

Jeroen Wijering



On May 3, 2011, at 12:15 AM, Steve Lacey wrote:

 All,
 
 I've updated the wiki with a proposal...
 
 http://wiki.whatwg.org/wiki/Video_Metrics#Proposal
 
 Cheers!
 Steve
 
 On Sat, Apr 9, 2011 at 7:08 AM, Silvia Pfeiffer silviapfeiff...@gmail.com 
 wrote:
 Ah, thanks for the link. I've included Silverlight stats, too, for
 completeness. If somebody knows about QuickTime stats, that would be
 another good one to add, I guess.
 
 Cheers,
 Silvia.
 
 On Fri, Apr 8, 2011 at 5:21 PM, Jeroen Wijering
 jer...@longtailvideo.com wrote:
 
  On Apr 7, 2011, at 8:11 AM, Silvia Pfeiffer wrote:
 
  I've also just added a section with the stats that the Adobe Flash
  player exposes.
 
  Great. Perhaps Silverlight stats might be of use too - though they're 
  fairly similar:
 
  http://learn.iis.net/page.aspx/582/advanced-logging-for-iis-70---client-logging/
 
  Apart from the statistics that are not currently available from the
  HTML5 player, there are stats that are already available, such as
  currentSrc, currentTime, and all the events which can be turned into
  hooks for measurement.
 
  Yes, the network and ready states are very useful to determine if clients 
  are stalling for buffering etc.
 
  I think the page now has a lot of analysis of currently used stats -
  probably a sufficient amount. All the video publishing sites likely
  just use a subset of the ones that Adobe Flash exposes in their
  analytics.
 
  Especially all the separate A/V bytecounts are overkill IMO.
 
  One useful metric I didn't list for JW Player but is very nice is Flash's 
  isLive property.
 
  Kind regards,
 
  Jeroen
 
 
 
 
  On Thu, Apr 7, 2011 at 4:52 AM, Mark Watson wats...@netflix.com wrote:
  All,
 
  I added some material to the wiki page based on our experience here at 
  Netflix and based on the metrics defined in MPEG DASH for adaptive 
  streaming. I'd love to hear what people think.
 
  Statistics about presentation/rendering seem to be covered, but what 
  should also be considered are network performance statistics, which 
  become increasingly difficult to collect from the server when sessions 
  are making use of multiple servers, possibly across multiple CDNs.
 
  Another aspect important for performance management is error reporting. 
  Some thoughts on that on the page.
 
  ...Mark
 
  On Mar 31, 2011, at 7:07 PM, Robert O'Callahan wrote:
 
  On Fri, Apr 1, 2011 at 1:33 PM, Chris Pearce ch...@pearce.org.nz wrote:
 
  On 1/04/2011 12:22 p.m., Steve Lacey wrote:
 
  Chris - in the mozilla stats, I agree on the need for a frame count of
  frames that actually make it to the screen, but am interested in why
  we
  need both presented and painted? Wouldn't just a simple 'presented' 
  (i.e.
  presented to the user) suffice?
 
 
  We distinguish between painted and presented so we have a measure of
  the latency in our rendering pipeline. It's more for our benefit as 
  browser
  developers than for web developers.
 
 
  Yeah, just to be clear, we don't necessarily think that everything in our
  stats API should be standardized. We should wait and see what authors
  actually use.
 
  Rob
  --
  Now the Bereans were of more noble character than the Thessalonians, for
  they received the message with great eagerness and examined the 
  Scriptures
  every day to see if what Paul said was true. [Acts 17:11]
 
 
 
 
 
 



Re: [whatwg] Media elements statistics

2011-05-05 Thread Jeroen Wijering

On May 5, 2011, at 7:04 PM, Steve Lacey wrote:

 Would jitter be calculated over decodedFrames or decodedFrames+droppedFrames?
 In the first case, jitter is a more granular metric for measuring exact
 presentation performance. In the latter case, jitter can serve as a single
 metric for tracking processing power (simple!). In either case, it's fairly
 straightforward to derive the other metric.
 
 Actually, I would expect presentedFrames + droppedFrames would be used. This 
 implies all decoded frames that were supposed to be presented.

Ah yes, sorry. That makes sense. presentedFrames of course needs to be included.

- Jeroen

Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-18 Thread Jeroen Wijering
Hey Ian, all,

Sorry for the slow response .. 

 There's a big difference between text tracks, audio tracks, and video 
 tracks. While it makes sense, for instance, to have text tracks 
 enabled but not showing, it makes no sense to do that with audio 
 tracks.
 
  Audio and video tracks require more data, hence it's less desirable to
  allow them to be enabled but not showing. If data weren't an issue, it
  would be great if this were possible; it'd allow instant switching
  between multiple audio dubs, or camera angles.
 
 I think we mean different things by active here.
 
 The hidden state for a text track is one where the UA isn't rendering 
 the track but the UA is still firing all the events and so forth. I don't 
 understand what the parallel would be for a video or audio track.

The parallel would be fetching and decoding the tracks but not rendering them to
the display (video) or speakers (audio). I agree that, implementation-wise,
this is much less useful than having an active but hidden state for text
tracks. However, some people might want to manipulate hidden tracks with the
audio data API, much like hidden text tracks can be manipulated with javascript.
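
For comparison, a sketch of the hidden-text-track case using the text track API
shape that was later specified (addTextTrack, VTTCue; the names and cue payload
here are not from this thread, and there is no equivalent for audio or video
tracks):

var video = document.querySelector('video');
var track = video.addTextTrack('metadata', 'hints', 'en');

track.addCue(new VTTCue(10, 15, '{"action": "show-overlay"}'));
track.mode = 'hidden';  // cues keep firing, but nothing is rendered

track.addEventListener('cuechange', function() {
    for (var i = 0; i < track.activeCues.length; i++) {
        console.log('active cue payload: ' + track.activeCues[i].text);
    }
});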

 Text tracks are discontinuous units of potentially overlapping textual 
 data with position information and other metadata that can be styled with 
 CSS and can be mutated from script.
 
 Audio and video tracks are continuous streams of immutable media data.
 
 
  Video and audio tracks do not necessarily produce continuous output - it is
  perfectly legal to have gaps in either, e.g. segments that do not render.
  Both audio and video tracks can have metadata that affects their rendering: an
  audio track has volume metadata that attenuates its contribution to the
  overall mix-down, and a video track has a matrix that controls its rendering.
  The only thing preventing us from styling a video track with CSS is the lack
  of definition.

Yes, and the same (lack of definition) goes for javascript manipulation. It'd 
be great if we had the tools for manipulating video and audio tracks 
(extract/insert frames, move audio snippets around). It would make A/V editing 
- or more creative uses - really easy in HTML5.

Kind regards,

Jeroen



Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-11 Thread Jeroen Wijering

On Apr 8, 2011, at 8:54 AM, Ian Hickson wrote:

 *) Discoverability is indeed an issue, but this can be fixed by defining 
 a common track API for signalling and enabling/disabling tracks:
 
 {{{
 interface Track {
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
 
  const unsigned short OFF = 0;
  const unsigned short HIDDEN = 1;
  const unsigned short SHOWING = 2;
  attribute unsigned short mode;
 };
 
 interface HTMLMediaElement : HTMLElement {
  [...]
  readonly attribute Track[] tracks;
 };
 }}}
 
 There's a big difference between text tracks, audio tracks, and video 
 tracks. While it makes sense, for instance, to have text tracks enabled 
 but not showing, it makes no sense to do that with audio tracks. 

Audio and video tracks require more data, hence it's less desirable to allow
them to be enabled but not showing. If data weren't an issue, it would be great
if this were possible; it'd allow instant switching between multiple audio
dubs, or camera angles.

In terms of the data model, I don't believe there are major differences between
audio, text or video tracks. They all exist at the same level - one down from
the main presentation layer. Toggling versus layering can be an option for all
three kinds of tracks.
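
As a sketch against the proposed Track interface quoted above (the tracks array
and mode constants are a proposal, not a shipped API), toggling an audio dub
could look like:

function enableAudioDub(media, lang) {
    for (var i = 0; i < media.tracks.length; i++) {
        var track = media.tracks[i];
        if (track.kind != 'audio') continue;
        // Toggling rather than layering: exactly one audio track showing.
        track.mode = (track.language == lang) ? track.SHOWING : track.OFF;
    }
}

enableAudioDub(document.querySelector('video'), 'en');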

For example, multiple video tracks can be mixed together in one media element's
display. Think about PiP, perspective side-by-side (Stevenote style) or a 3D
grid (group chat, like Skype). Perhaps this should be supported instead of
relying upon multiple video elements, manual positioning and APIs to knit
things together. One would lose some flexibility, but gain in API simplicity
(it's still one video) and ease of implementation for HTML developers.

- Jeroen






Re: [whatwg] How to handle multitrack media resources in HTML

2011-04-11 Thread Jeroen Wijering
On Apr 8, 2011, at 8:54 AM, Ian Hickson wrote:

 but should be linked to the main media resource through markup.
 
 What is a main media resource?
 
 e.g. consider youtubedoubler.com; what is the main resource?
 
 Or similarly, when watching the director's commentary track on a movie, is 
 the commentary the main track, or the movie?

In systems like MPEG TS and DASH, there's the notion of a system clock. This is
the overarching resource to which all audio, metadata, text and video tracks are
synced. The clock has no video frames or audio samples of its own; it just acts
as the wardrobe for all tracks. Perhaps it's worth investigating whether this
would be useful for media elements?

- Jeroen

Re: [whatwg] Media elements statistics

2011-04-08 Thread Jeroen Wijering

On Apr 7, 2011, at 8:11 AM, Silvia Pfeiffer wrote:

 I've also just added a section with the stats that the Adobe Flash
 player exposes.

Great. Perhaps Silverlight stats might be of use too - though they're fairly 
similar:

http://learn.iis.net/page.aspx/582/advanced-logging-for-iis-70---client-logging/

 Apart from the statistics that are not currently available from the
 HTML5 player, there are stats that are already available, such as
 currentSrc, currentTime, and all the events which can be turned into
 hooks for measurement.

Yes, the network and ready states are very useful to determine if clients are 
stalling for buffering etc.

 I think the page now has a lot of analysis of currently used stats -
 probably a sufficient amount. All the video publishing sites likely
 just use a subset of the ones that Adobe Flash exposes in their
 analytics.

Especially all the separate A/V bytecounts are overkill IMO. 

One useful metric I didn't list for JW Player but is very nice is Flash's 
isLive property.

Kind regards,

Jeroen




 On Thu, Apr 7, 2011 at 4:52 AM, Mark Watson wats...@netflix.com wrote:
 All,
 
 I added some material to the wiki page based on our experience here at 
 Netflix and based on the metrics defined in MPEG DASH for adaptive 
 streaming. I'd love to hear what people think.
 
 Statistics about presentation/rendering seem to be covered, but what should 
 also be considered are network performance statistics, which become 
 increasingly difficult to collect from the server when sessions are making 
 use of multiple servers, possibly across multiple CDNs.
 
 Another aspect important for performance management is error reporting. Some 
 thoughts on that on the page.
 
 ...Mark
 
 On Mar 31, 2011, at 7:07 PM, Robert O'Callahan wrote:
 
 On Fri, Apr 1, 2011 at 1:33 PM, Chris Pearce ch...@pearce.org.nz wrote:
 
 On 1/04/2011 12:22 p.m., Steve Lacey wrote:
 
 Chris - in the mozilla stats, I agree on the need for a frame count of
 frames that actually make it to the screen, but am interested in why we
 need both presented and painted? Wouldn't just a simple 'presented' (i.e.
 presented to the user) suffice?
 
 
 We distinguish between painted and presented so we have a measure of
 the latency in our rendering pipeline. It's more for our benefit as browser
 developers than for web developers.
 
 
 Yeah, just to be clear, we don't necessarily think that everything in our
 stats API should be standardized. We should wait and see what authors
 actually use.
 
 Rob
 --
 Now the Bereans were of more noble character than the Thessalonians, for
 they received the message with great eagerness and examined the Scriptures
 every day to see if what Paul said was true. [Acts 17:11]
 
 
 



Re: [whatwg] How to handle multitrack media resources in HTML

2011-02-14 Thread Jeroen Wijering
Hello Silvia, all,

First, thanks for the Multitrack wiki page. Very helpful for those who are not
subscribed to the various lists. I also phrased the comments below as feedback
to this page:

http://www.w3.org/WAI/PF/HTML/wiki/Media_Multitrack_Media_API

USE CASE 

The use case is spot on; this is an issue that blocks HTML5 video from being 
chosen over a solution like Flash. An elaborate list of tracks is important, to 
correctly scope the conditions / resolutions:

1. Tracks targeting device capabilities:
   * Different containers / codecs / profiles
   * Multiview (3D) or surround sound
   * Playback rights and/or decryption possibilities
2. Tracks targeting content customization:
   * Alternate viewing angles or alternate music scores
   * Director's comments or storyboard video
3. Tracks targeting accessibility:
   * Dubbed audio or text subtitles
   * Audio descriptions or closed captions
   * Tracks cleared from cursing / nudity / violence
4. Tracks targeting the interface:
   * Chapterlists, bookmarks, timed annotations, midroll hints..
   * .. and any other type of scripting cues

Note I included the HTML5 text tracks. I believe there are four kinds of
tracks, all an inherent part of a media presentation. These types designate the
output of the track, not its encoded representation:

* audio (producing sound)
* metadata (producing scripting cues)
* text (producing rendered text)
* video (producing images)

In this taxonomy, the HTML5 "subtitles" and "captions" track kinds are text,
the "descriptions" kind is audio, and the "chapters" and "metadata" kinds are
metadata.

REQUIREMENTS

The requirements are elaborate, but do note they span beyond HTML5. Everything 
that plays back audio/video needs multitrack support:

* Broad- and narrowcasting playback devices of any kind
* Native desktop, mobile and settop applications/apps
* Devices that play media standalone (mediaplayers, pictureframes, airplay)

Also, on e.g. the iPhone and Android devices, playback of video is triggered by
HTML5, but subsequently detached from it. Think about the custom fullscreen
controls, the obscuring of all HTML, and events/cues that are deliberately
ignored or not sent (such as play() in iOS). I wonder whether this is a
temporary state or something that will remain and should be provisioned for.

With this in mind, I think an additional requirement is that there should be a 
full solution outside the scope of HTML5. HTML5 has unique capabilities like 
customization of the layout (CSS) and interaction (JavaScript), but it must not 
be required.

SIDE CONDITIONS

In the side conditions, I'm not sure about the relative volume of audio or
positioning of video. Automation by default might work better and requires no
parameters. For audio, blending can be done through a ducking mechanism (like
the JW Player does). For video, blending can be done through an alpha channel.
At a later stage, an API/heuristics for PiP support and gain control can be
added.
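
The ducking idea, sketched with the Web Audio API (which is outside what this
thread proposes; the element ids and gain values are made up):

var context = new AudioContext();
var main = context.createMediaElementSource(document.getElementById('main-video'));
var voice = context.createMediaElementSource(document.getElementById('description-audio'));

var mainGain = context.createGain();
main.connect(mainGain).connect(context.destination);
voice.connect(context.destination);

function duck(speaking) {
    // Fade the main mix to 30% while the description is audible, back to 100% after.
    mainGain.gain.linearRampToValueAtTime(speaking ? 0.3 : 1.0, context.currentTime + 0.3);
}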

SOLUTIONS

In terms of solutions, I lean much towards the manifest approach. The other 
approaches are options that each add more elements to HTML5, which:

* Won't work for situations outside of HTML5.
* Postpone, and perhaps clash with, the addition of manifests.

Without a manifest, there'll probably be no adaptive streaming, which renders 
HTML5 video much less useful. At the same time, standardization around 
manifests (DASH) is largely wrapping up.

EXAMPLE

Here's some code on the manifest approach. First the HTML5 side:

<video id="v1" poster="video.png" controls>
  <source src="manifest.xml" type="video/mpeg-dash">
</video>

Second the manifest side:

<MPD mediaPresentationDuration="PT645S" type="OnDemand">
  <BaseURL>http://cdn.example.com/myVideo/</BaseURL>
  <Period>

    <Group mimeType="video/webm" lang="en">
      <Representation sourceURL="video-1600.webm" />
    </Group>

    <Group mimeType='video/mp4; codecs="avc1.42E00C,mp4a.40.2"' lang="en">
      <Representation sourceURL="video-1600.mp4" />
    </Group>

    <Group mimeType="text/vtt" lang="en">
      <Accessibility type="CC" />
      <Representation sourceURL="captions.vtt" />
    </Group>

  </Period>
</MPD>


(I should look more into the accessibility parameters, but there is support for
signalling captions, audio descriptions, sign language etc.)

Note that this approach moves the text track outside of HTML5, making it 
accessible for other clients as well. Both codecs are also in the manifest - 
this is just one of the device capability selectors of DASH clients.

DISADVANTAGES

The two listed disadvantages for the manifest approach in the wiki page are 
lack of CSS and discoverability:

*) The CSS styling issue can be fixed by making a conceptual change to CSS and
text tracks. Instead of styling text tracks, a single text rendering area for
each video element can be exposed and styled. Any text tracks that are enabled
push data into it, which is automatically styled according to the
video.textStyle/etc rules.

*) 

Re: [whatwg] Limiting the amount of downloaded but not watched video

2011-01-20 Thread Jeroen Wijering

On Jan 20, 2011, at 9:14 AM, Philip Jägenstedt wrote:

 
 (Since there is some overhead with each HTTP request, one must make sure
 that they are not unreasonably small.)
 
 When HTTP byte ranges are used to achieve bandwidth management, it's hard
 to talk about a single downloadBufferTarget that is the number of seconds
 buffered ahead. Rather, there might be an upper and lower limit within which
 the browser tries to stay, so that each request can be of a reasonable size.
  Neither an author-provided minimum nor maximum value can be followed
 particularly closely, but could possibly be taken as a hint of some sort.
 
 Does it actually make sense to specify the read-ahead size, or should it
 simply be a flag (eg. unlimited, small buffer and don't care)?  Is
 there really a case for setting the actual read-ahead value directly?  In a
 sense, that seems akin to allowing web pages to control the TCP buffer sizes
 used by the client's browser--it's lower level than people usually care
 about.
 
 In particular, I'm thinking that most of the time all people care about is
 read ahead a little vs. read ahead a lot, and publishers shouldn't need
 to figure out the right buffer size to use for the former (and very likely
 getting it wrong).
 
 I'm inclined to agree, and we already have a way to say a little 
 (preload=none/metadata) and a lot (preload=auto).
 
 However, it'd be great if all implementors could agree on the same 
 interpretation of states. Specifically, this isn't required by the spec but 
 would still be helpful to have consistency in:
 
 * effective state can only increase to higher states, never go from e.g. 
 metadata to none (it makes no sense)
 * there is a state - invoked - between metadata and auto for when the video 
 is playing
 * there could be a state between invoked and auto for autoplay, but if not 
 autoplay implies preload=auto
 * in the invoked state, a conservative buffering strategy is used by default
 * when paused in the invoked state, we need to agree on what should happen
 
 If we could agree, then of course it should be documented somewhere, even if 
 it seems somewhat restrictive of the spec to mandate an exact behavior.

Perhaps the conservative buffering strategy should be client-side throttling 
after all. The pause-to-buffer argument several people put forward is a strong 
one - a big use case (perhaps more people pause b/c of this than b/c of all 
other reasons combined). Something like a downloadBufferTarget would be 
confusing and break this. Client-side throttling won't. 

- Jeroen

[whatwg] Limiting the amount of downloaded but not watched video

2011-01-17 Thread Jeroen Wijering
Hello all, 

We are getting some questions from JW Player users about HTML5 video being quite
wasteful of bandwidth for longer videos (think 10min+). This is because browsers
download the entire movie once playback starts, regardless of whether a user
pauses the player. If throttling is used, it seems very conservative, which
means a lot of unwatched video is in the buffer when a user unloads a video.

I did a simple test with a 10 minute video: playing it; pausing after 30 
seconds and checking download progress after another 30 seconds. With all 
browsers (Firefox 4, Safari 5, Chrome 8, Opera 11, iOS 4.2), the video would 
indeed be fully downloaded after 60 seconds. Some throttling seems to be 
applied by Safari / iOS, but this could also be bandwidth fluctuations on my 
side. Either way, all browsers downloaded the 10min video while only 30 seconds 
were being watched. 

The HTML5 spec is a bit generic on this topic, allowing mechanisms such as 
stalling and throttling but not requiring them, or prescribing a scripting 
interface:

http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#concept-media-load-resource

Are there people working on ways to trim down the amount of not-watched data 
for video playback? Any ideas on this, anything in the pipeline?

---

A suggestion would be to implement / expose a property called 
downloadBufferTarget. It would be the amount of video in seconds the browser 
tries to keep in the download buffer.

When a user starts (or seeks in) a video, the browser would try to download
downloadBufferTarget seconds of video. When downloaded > currentTime +
downloadBufferTarget, downloading would get stalled until a certain lower
threshold is reached (e.g. 50%), at which point the browser would start
downloading additional data.

A good default value for downloadBufferTarget would be 60 seconds. 
Webdevelopers who have short clips / do not care about downloads can set 
downloadBufferTarget to a higher value (e.g. 300). Webdevelopers who have 
long videos (15min+) / want to keep their bandwidth bill low can set 
downloadBufferTarget to a lower value (e.g. 15). Webdevelopers might even 
change the value of downloadBufferTarget per visitor; visitors with little 
bandwidth get a sizeable buffer (to prevent stuttering) and visitors with a big 
pipe get a small download buffer (they don't need it). 

The buffered timeranges could be used to compare the actual download buffer 
to the buffer target, should a user-interface want to display this feedback.
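
For reference, the feedback side can already be approximated with the existing
buffered TimeRanges; downloadBufferTarget itself is only the proposal above, so
it appears here as a plain variable:

var video = document.querySelector('video');
var downloadBufferTarget = 60;  // seconds; the proposed default

function secondsBufferedAhead() {
    var t = video.currentTime;
    for (var i = 0; i < video.buffered.length; i++) {
        if (video.buffered.start(i) <= t && t <= video.buffered.end(i)) {
            return video.buffered.end(i) - t;
        }
    }
    return 0;
}

setInterval(function() {
    console.log('buffered ahead: ' + secondsBufferedAhead().toFixed(1) +
                's of the ' + downloadBufferTarget + 's target');
}, 1000);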

Note that the download buffer is not the same as the playback buffer. A 
download buffer underrun should not result in pausing the video. The download 
buffer does also not apply to live streaming.  

Kind regards,

Jeroen Wijering
Longtail Video



PS: Having the preload=none property available in all browsers will also help 
in keeping the amount of downloaded but not watched data low. In our tests, 
only Firefox (4 b9) seems to honor this property at present. 

Re: [whatwg] HTML5 video: frame accuracy / SMPTE

2011-01-12 Thread Jeroen Wijering

On Jan 12, 2011, at 11:04 AM, whatwg-requ...@lists.whatwg.org wrote:

 Date: Wed, 12 Jan 2011 11:54:47 +0200
 From: Mikko Rantalainen mikko.rantalai...@peda.net
 To: whatwg@lists.whatwg.org
 Subject: Re: [whatwg] HTML5 video: frame accuracy / SMPTE
 Message-ID: 4d2d7a67.7090...@peda.net
 Content-Type: text/plain; charset=ISO-8859-1
 
 2011-01-12 00:40 EEST: Rob Coenen:
 Hi David- that is b/c in an ideal world I'd want to seek to a time expressed
 as a SMPTE timecode (think web apps that let users step x frames back, seek
 y frames forward etc.). In order to convert SMPTE to the floating point
 value for video.seekTime I need to know the frame rate.
 
 It seems to me that such an application really requires a method for
 querying the timestamp for previous and next frames when given a
 timestamp. If such an application requires FPS value, it can then
 compute it by itself it such a value is assumed meaningful. (Simply get
 next frame timestamp from zero timestamp and continue for a couple of
 frames to compute FPS and check if the FPS seems to be stable.)
 
 Perhaps there should be a method
 
 getRelativeFrameTime(timestamp, relation)
 
 where timestamp is the current timestamp and relation is one of
 previousFrame, nextFrame, previousKeyFrame, nextKeyFrame?
 
 Use of this method could be allowed only for paused video if needed for
 simple implementation.


Alternatively, one could look at a step() function instead of a seek(pos, exact)
function. The step function can be used for frame-accurate controls, e.g.
step(2) or step(-1). The advantage over a seek(pos, exact) function (and the
playback rate controls) is that the viewer really knows the video is X frames
offset. This is very useful for both artistic/editing applications and for
video analysis applications (think sports, medical or experiments).

The downside of step() compared to either always-accurate seeking or a
seek(pos, exact) is that it requires two steps in situations like bookmarking or
chaptering.

It seems like the framerate / SMPTE proposals made here are all a means to end
up with frame-accurate seeking. With a step() function in place, there's no
need for such things. In fact, one could do a step(10) or so and then use the
difference in position to calculate the framerate.
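
A sketch of that last point, assuming a synchronous step() existed (it does not;
it is the proposal itself):

function estimateFrameRate(video, frames) {
    // Step forward a number of frames and derive fps from the position change.
    var start = video.currentTime;
    video.step(frames);                   // hypothetical frame-stepping call
    var elapsed = video.currentTime - start;
    video.step(-frames);                  // step back to where we started
    return elapsed > 0 ? frames / elapsed : NaN;
}

var fps = estimateFrameRate(document.querySelector('video'), 10);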

- Jeroen

Re: [whatwg] HTML5 video: frame accuracy / SMPTE

2011-01-12 Thread Jeroen Wijering

On Jan 12, 2011, at 2:05 PM, Rob Coenen wrote:

 The need for SMPTE still remains as I want to be able to do things such as 
 video.seekTo(smpte_timecode_converted_to_seconds, seek_exact=true); so that 
 my video goes to exactly the exact frame as indicated by 
 smpte_timecode_converted_to_seconds. Think chapter bookmarking, scene 
 indexing, etc.
 

With step() in place, this would be a simple convenience function. This
pseudo-code is not ideal and makes some assumptions, but the approach should
work:

function seekToTimecode(timecode) {
    var seconds = convert_timecode_to_seconds(timecode);
    videoElement.seek(seconds);
    var delta = seconds - videoElement.currentTime;
    while (delta > 0) {
        videoElement.step(1);
        delta = seconds - videoElement.currentTime;
    }
};

It's basically stepping to the frame that's closest to the timecode (as
elaborated by others, there's no such thing as a timecode in MP4/WebM; there are
just timestamps).

Note you actually do want to have this conversion taking place in javascript,
since there are many reasons to adjust/offset the conversion (sync issues,
timecode base differences, ...). If it's locked up inside the browser API, you
have to duplicate work around it if the input files / conversion assumptions
don't align.
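
A sketch of such a conversion (non-drop-frame timecode only; the frame rate and
any offset are supplied by the page, which is exactly the flexibility argued for
above):

function smpteToSeconds(timecode, frameRate, offset) {
    // 'HH:MM:SS:FF' -> seconds; the caller supplies the frame rate.
    var p = timecode.split(':').map(Number);
    var seconds = p[0] * 3600 + p[1] * 60 + p[2] + p[3] / frameRate;
    return seconds + (offset || 0);
}

smpteToSeconds('00:01:30:12', 25);  // 90.48 at 25 fps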

- Jeroen

Re: [whatwg] WebSRT feedback

2010-10-08 Thread Jeroen Wijering

On Oct 8, 2010, at 2:24 PM, whatwg-requ...@lists.whatwg.org wrote:

 Even if very few subtitles use inline SVG, SVG in object, img,
 iframe, video, self-referencing track, etc in the cue text, all
 implementations would have to support it in the same way for it to be
 interoperable. That's quite an undertaking and I don't think it's really
 worth it.
 
 
 User agents only need to be interoperable over the common subset of HTML
 features they support. HTML is mostly designed to degrade gracefully when a
 user agent encounters elements it doesn't support. The simplest possible
 video player would use an HTML parser (hopefully off-the-shelf) to build
 some kind of DOM structure. Then it can group text into paragraphs for
 rendering, and ignore the rest of the content.
 
 In practice, we'll have to deal with user agents that support different sets
 of WebSRT features --- when version 2 of WebSRT is developed, if not before.
 Why not use existing, proven machinery --- HTML --- to cope with that
 situation?
 
 Rob

The requests we receive on the captioning functionality of the JW Player always 
revolve around styling. Font size, color, style, weight, outline and family. 
Block x, y, width, height, text-align, vertical-align, padding, margin, 
background and alpha. Both for an entire SRT file, for distinct captioning 
entries and for specific parts of a captioning entry. Not to say that a full
parsing engine wouldn't be nice or useful, but at present there are simply no
requests for it (not even for a ;). Plus, more advanced timed track
applications can easily be built with javascript (timed bouncing 3D balls using
WebGL).

W3C's timed text does a decent job in facilitating the styling needs for 
captioning authors. Overall regions, single paragraphs and inline chunks 
(through span) can be styled. There are a few small misses, such as text 
outline, and vertical alignment (which can be done with separate regions 
though). IMO the biggest con of TT is that it uses its own, in-document styling 
namespace, instead of relying upon page CSS. 

Kind regards,

Jeroen

Re: [whatwg] HTML 5 : The Youtube response

2010-07-02 Thread Jeroen Wijering
Hello,

The Flash player exposes a number of metrics:

http://help.adobe.com/en_US/AS3LCR/Flash_10.0/flash/net/NetStreamInfo.html

The most useful ones are:

*) droppedFrames: it can be used to determine whether the client can play the 
video without stuttering.
*) maxBytesPerSecond: it can be used to determine the bandwidth of the
connection.

In addition to this, the metadata embedded in the video is interesting. For 
example:

*) width & height: already available
*) duration: already available
*) bitrates: for comparison against maxBytesPerSecond. 
*) seekpoints: for anticipating seek results in the UI
*) content metadata (e.g. ID3 or MP4 Images)

Here's an example with the exposed metadata printed on top of the player:

http://developer.longtailvideo.com/trac/testing/?plugins=metaviewer&file=%2Fplayer%2Ftesting%2Ffiles%2Fbunny.mp4&height=260&width=500&autostart=true&mute=true

Kind regards,

Jeroen





 One part of (2) [well, debatably part, but related to video streaming] is 
 the lack of visibility into stream behavior. I can't ask the video element 
 questions about dropped frames, bitrate, etc. This is incredibly useful in 
 Flash for getting streaming feedback, and means I really don't know how well 
 the HTML5 player is working for users. The best I can do is waiting/stalled 
 events which is nowhere near as granular.
 
 I agree that exposing info like that would be useful. What does the Flash API 
 for this look like? What parts of the available data do you find most useful?
 
 Regards,
 Maciej
 
 
 -Kevin
 
 On Thu, Jul 1, 2010 at 9:16 AM, Maciej Stachowiak m...@apple.com wrote:
 
 On Jul 1, 2010, at 6:12 AM, Kornel Lesinski wrote:
 
 
  I believe we can allow arbitrary content to go fullscreen, along the 
  lines of what Robert O'Callahan has proposed on this list, if we impose 
  sufficient restrictions to mitigate the above risks. In my opinion, the 
  following measures would likely be sufficient:
 
  A) Have a distinctive animated sequence when an element goes into 
  full-screen mode. This helps the user understand what happened.
  B) Limit the ability to go fullscreen to user gestures, much as many 
  browsers limit pop-ups. This prevents shenanigans from happening while 
  the user is away from the keyboard, and greatly limits the potential 
  annoyance factor.
  C) On systems with keyboard/mouse input, limit the keys that may be 
  processed by fullscreen content to a small set, such as the set that 
  Flash limits to in full-screen mode: 
  http://www.adobe.com/devnet/flashplayer/articles/fplayer10_security_changes_03.html#head5.
  D) On multitouch devices with an onscreen keyboard as the normal means of 
  input, things are trickier, because it's possible for a dedicated 
  attacker to simulate the keyboard. My best idea is make sure that a 
  visually distinctive status indicator appears at the top of the screen 
  even in full-screen mode, since that is the norm on such platforms.
  E) Reserve one or more obvious key combinations to exiting fullscreen no 
  matter what (Escape, perhaps Cmd+W/Ctrl+W).
  F) Even on keyboard/mouse type systems, have some distinctive visual 
  affordance which is either always present or appears on mouse moves, and 
  which allows the user to exit full-screen mode.
 
  I think these measures greatly mitigate risks (1) and (2) above, and open 
  up highly valued functionality (full screen video) with a UI that users 
  will enjoy, and customizability that video hosting sites will appreciate.
 
  Another option (for low-res videos on desktop) might be to use lower 
  screen resolution when in full screen — text and UI elements displayed by 
  attacker will look noticeably different.
 
 That would probably make the controls look ugly for video with custom 
 controls, and I suspect neither users nor content authors would appreciate 
 that. Interesting idea, though.
 
  - Maciej
 
 
 



Re: [whatwg] Exposing framerate / statistics of video playback

2010-05-30 Thread Jeroen Wijering
Hello, 

 Has any thought been given to exposing such metrics as framerate, how many 
 frames are dropped, rebuffering, etc from the video tag? My understanding 
 is that in the Flash player, many of these types of statistics are readily 
 available. This is interesting for things not just like benchmarking, but for 
 a site to determine if it is not working well for clients and should instead 
 e.g. switch down to a lower bitrate video. 
 
 Hasn't been discussed AFAIK, but I'd like to see a proposal.

Here's a list of what is available in AS3 through the NetStream.info object:

http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/net/NetStreamInfo.html

For determining whether the user-agent is able to play a video, these are the 
most interesting properties:

 readonly attribute unsigned long bandwidth::
The current maximum server » client bandwidth, in bits per second.

 readonly attribute unsigned long droppedframes::
The number of frames dropped by the user agent since playback of this video 
was initialized.
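
A sketch of the switch-down idea from the top of the thread, assuming counters
shaped like the listing above existed on the media element (they do not; the
thresholds and the two helper functions are made up):

var video = document.querySelector('video');
var lastDropped = 0;

setInterval(function() {
    var dropped = video.droppedframes;    // hypothetical attribute from the listing
    var droppedPerSecond = (dropped - lastDropped) / 10;
    lastDropped = dropped;

    // Too many dropped frames, or bandwidth below the current stream's bitrate:
    // ask the application to switch to a lower-bitrate source.
    if (droppedPerSecond > 3 || video.bandwidth < currentStreamBitrate()) {
        switchToLowerBitrate();           // hypothetical application functions
    }
}, 10000);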

Kind regards,

Jeroen Wijering

Re: [whatwg] A standard for adaptive HTTP streaming for media resources

2010-05-28 Thread Jeroen Wijering
 mid-way between these alternative files
 
 I am personally not sure which is the right forum to create the new
 standard in, but I know that we have a need for it in HTML5.

Agreed. 

By its current spec, HTML5 video is mostly suited for display of short clips. 

High-quality, long-form and live content needs an additional level of
functionality, which HTTP Streaming seems to provide.

 Would it be possible / the right way to start something like this as
 part of the Web applications work at WHATWG?
 (Incidentally, I've brought this up in W3C before and not got any
 replies, so I'm not sure W3C would be a better place for this work.
 Maybe IETF? But then, why not here...)
 
 What do people think?
 
 Cheers,
 Silvia.

Kind regards,

Jeroen Wijering