Re: [whatwg] video feedback
On Thu, 20 Dec 2012, Jer Noble wrote: On Dec 17, 2012, at 4:01 PM, Ian Hickson i...@hixie.ch wrote: Should we add a preciseSeek() method with two arguments that does a seek using the given rational time? This method would be more useful if there were a way to retrieve the media's time scale. Otherwise, the script would have to pick an arbitrary scale value, or provide the correct media scale through other means (such as querying the server hosting the media). Additionally, authors like Rob are going to want to retrieve this precise representation of the currentTime. If rational time values were encapsulated into their own interface, a preciseCurrentTime (or similar) read-write attribute could be used instead. Ok. I assume this is something you (Apple) are interested in implementing; is this something any other browser vendors want to support? If so, I'll be happy to add something along these lines. -- Ian Hickson, http://ln.hixie.ch/
Re: [whatwg] video feedback
On 2012/12/18 9:01, Ian Hickson wrote: On Tue, 2 Oct 2012, Jer Noble wrote: The nature of floating point math makes precise frame navigation difficult, if not impossible. Rob's test is especially hairy, given that each frame has a timing bound of [startTime, endTime), and his test attempts to navigate directly to the startTime of a given frame, a value which gives approximately zero room for error. ... That makes sense. Should we add a preciseSeek() method with two arguments that does a seek using the given rational time? I draw your attention to Don't Store that in a float http://randomascii.wordpress.com/2012/02/13/dont-store-that-in-a-float/ and its suggestion to use a double starting at 2^32 to avoid the issue around precision changing with magnitude as the time increases. Regards -Mark
Re: [whatwg] video feedback
On Thu, 20 Dec 2012, Mark Callow wrote: On 2012/12/18 9:01, Ian Hickson wrote: On Tue, 2 Oct 2012, Jer Noble wrote: The nature of floating point math makes precise frame navigation difficult, if not impossible. Rob's test is especially hairy, given that each frame has a timing bound of [startTime, endTime), and his test attempts to navigate directly to the startTime of a given frame, a value which gives approximately zero room for error. That makes sense. Should we add a preciseSeek() method with two arguments that does a seek using the given rational time? I draw your attention to Don't Store that in a float http://randomascii.wordpress.com/2012/02/13/dont-store-that-in-a-float/ and its suggestion to use a double starting at 2^32 to avoid the issue around precision changing with magnitude as the time increases. Everything in the Web platform already uses doubles. -- Ian Hickson, http://ln.hixie.ch/
Re: [whatwg] video feedback
On 12/20/12 9:54 AM, Ian Hickson wrote: Everything in the Web platform already uses doubles. Except WebGL. And Audio API wave tables, sample rates, AudioParams, PCM data (though thankfully times in Audio API do use doubles). And graphics libraries used to implement canvas, in many cases... I think the only safe claim about everything in the web platform is that it's all different. ;) -Boris
Re: [whatwg] video feedback
On 2012/12/21 2:54, Ian Hickson wrote: On Thu, 20 Dec 2012, Mark Callow wrote: I draw your attention to Don't Store that in a float http://randomascii.wordpress.com/2012/02/13/dont-store-that-in-a-float/ and its suggestion to use a double starting at 2^32 to avoid the issue around precision changing with magnitude as the time increases. Everything in the Web platform already uses doubles. Yes, except as noted by Boris. The important point is the idea of using 2^32 as zero time which means the precision barely changes across the range of time values of interest to games, videos, etc. Regards -Mark
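To make Mark's point concrete, here is a small JavaScript sketch (illustrative only; ulp is a throwaway helper, not a platform API) of how the spacing between adjacent doubles grows with magnitude, and why starting the clock at 2^32 pins it down:

    // Approximate spacing between adjacent double values near x (one "ulp").
    function ulp(x) {
      var exponent = Math.floor(Math.log(x) / Math.LN2);
      return Math.pow(2, exponent - 52);   // doubles carry 52 fraction bits
    }
    ulp(1);                 // ~2.2e-16 s of precision near t = 1 second
    ulp(86400);             // ~1.5e-11 s near t = one day
    ulp(Math.pow(2, 32));   // ~9.5e-7 s, constant across all of [2^32, 2^33)

Since 2^32 seconds is over a century, a clock biased to start at 2^32 never leaves that constant-precision range in practice.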
Re: [whatwg] video feedback
On Fri, 21 Dec 2012, Mark Callow wrote: On 2012/12/21 2:54, Ian Hickson wrote: On Thu, 20 Dec 2012, Mark Callow wrote: I draw your attention to Don't Store that in a float http://randomascii.wordpress.com/2012/02/13/dont-store-that-in-a-float/ and its suggestion to use a double starting at 2^32 to avoid the issue around precision changing with magnitude as the time increases. Everything in the Web platform already uses doubles. Yes, except as noted by Boris. The important point is the idea of using 2^32 as zero time which means the precision barely changes across the range of time values of interest to games, videos, etc. Ah, well, for video that ship has sailed, really. -- Ian Hickson, http://ln.hixie.ch/
Re: [whatwg] video feedback
On Dec 20, 2012, at 7:27 PM, Mark Callow callow.m...@artspark.co.jp wrote: On 2012/12/21 2:54, Ian Hickson wrote: On Thu, 20 Dec 2012, Mark Callow wrote: I draw your attention to Don't Store that in a float http://randomascii.wordpress.com/2012/02/13/dont-store-that-in-a-float/ and its suggestion to use a double starting at 2^32 to avoid the issue around precision changing with magnitude as the time increases. Everything in the Web platform already uses doubles. Yes, except as noted by Boris. The important point is the idea of using 2^32 as zero time which means the precision barely changes across the range of time values of interest to games, videos, etc. I don't believe the frame accuracy problem in question had to do with precision instability, per se. Many of Rob Coenen's frame accuracy issues were found within the first second of video. Admittedly, this is where the available precision is changing most rapidly, but it is also where available precision is greatest by far. An integral rational number has a benefit over even the 2^32 zero time suggestion: for common time scale values[1], it is intrinsically stable over the range of time t=[0..2^43). It has the added benefit of being exactly the representation used by the underlying media engine. On Dec 17, 2012, at 4:01 PM, Ian Hickson i...@hixie.ch wrote: Should we add a preciseSeek() method with two arguments that does a seek using the given rational time? This method would be more useful if there were a way to retrieve the media's time scale. Otherwise, the script would have to pick an arbitrary scale value, or provide the correct media scale through other means (such as querying the server hosting the media). Additionally, authors like Rob are going to want to retrieve this precise representation of the currentTime. If rational time values were encapsulated into their own interface, a preciseCurrentTime (or similar) read-write attribute could be used instead. -Jer [1] E.g., 1001 is a common time scale for 29.97 and 23.976 FPS video.
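For illustration, the kind of API being discussed might look like the following JavaScript sketch. Every name here (preciseSeek, preciseCurrentTime, the rational timeValue/timeScale pair) is hypothetical, taken from the proposals in this thread rather than from any spec or shipping implementation:

    var video = document.querySelector('video');

    // Hypothetical: seek to the 4th frame of 29.97 fps video, expressed as
    // the exact rational time 3003/30000 (timeValue / timeScale).
    video.preciseSeek(3003, 30000);

    // Hypothetical: read the exact position back in the media's own time
    // scale, avoiding the float rounding that plain currentTime is subject to.
    var t = video.preciseCurrentTime;  // e.g. { timeValue: 3003, timeScale: 30000 }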
Re: [whatwg] video feedback
On Tue, 2 Oct 2012, Jer Noble wrote: On Sep 17, 2012, at 12:43 PM, Ian Hickson i...@hixie.ch wrote: On Mon, 9 Jul 2012, adam k wrote: i'm aware that crooked framerates (i.e. the notorious 29.97) were not supported when frame accuracy was implemented. in my tests, 29.97DF timecodes were incorrect by 1 to 3 frames at any given point. will there ever be support for crooked framerate accuracy? i would be more than happy to contribute whatever i can to help test it and make it possible. can someone comment on this? This is a Quality of Implementation issue, basically. I believe there's nothing inherently in the API that would make accuracy to such timecodes impossible. The nature of floating point math makes precise frame navigation difficult, if not impossible. Rob's test is especially hairy, given that each frame has a timing bound of [startTime, endTime), and his test attempts to navigate directly to the startTime of a given frame, a value which gives approximately zero room for error. I'm most familiar with MPEG containers, but I believe the following is also true of the WebM container: times are represented by a rational number, timeValue / timeScale, where both numerator and denominator are unsigned integers. To seek to a particular media time, we must convert a floating-point time value into this rational time format (e.g. when calculating the 4th frame's start time, from 3 * 1/29.97 to 3 * 1001/30000). If there is a floating-point error in the wrong direction (e.g., as above, a numerator of 3002 vs 3003), the end result will not be the frame's startTime, but one 1/timeScale unit before it. We've fixed some frame accuracy bugs in WebKit (and Chromium) by carefully rounding the incoming floating point time value, taking into account the media's time scale, and rounding to the nearest 1/timeScale value. This fixes Rob's precision test, but at the expense of precision. (I.e. in a 30 fps movie, currentTime = 0.99 / 30 will navigate to the second frame, not the first, due to rounding, which is technically incorrect.) This is a common problem, and Apple media frameworks (for example) therefore provide rational time classes which provide enough accuracy for precise navigation (e.g. QTTime, CMTime). Using a floating point number to represent time with any precision is not generally accepted as good practice when these rational time classes are available. That makes sense. Should we add a preciseSeek() method with two arguments that does a seek using the given rational time? -- Ian Hickson, http://ln.hixie.ch/
Re: [whatwg] video feedback
On Sep 17, 2012, at 12:43 PM, Ian Hickson i...@hixie.ch wrote: On Mon, 9 Jul 2012, adam k wrote: i have a 25fps video, h264, with a burned in timecode. it seems to be off by 1 frame when i compare the burned in timecode to the calculated timecode. i'm using rob coenen's test app at http://www.massive-interactive.nl/html5_video/smpte_test_universal.html to load my own video. what's the process here to report issues? please let me know whatever formal or informal steps are required and i'll gladly follow them. Depends on the browser. Which browser? i'm aware that crooked framerates (i.e. the notorious 29.97) were not supported when frame accuracy was implemented. in my tests, 29.97DF timecodes were incorrect by 1 to 3 frames at any given point. will there ever be support for crooked framerate accuracy? i would be more than happy to contribute whatever i can to help test it and make it possible. can someone comment on this? This is a Quality of Implementation issue, basically. I believe there's nothing inherently in the API that would make accuracy to such timecodes impossible. TL;DR: for precise navigation, you need to use a rational time class, rather than a float value. The nature of floating point math makes precise frame navigation difficult, if not impossible. Rob's test is especially hairy, given that each frame has a timing bound of [startTime, endTime), and his test attempts to navigate directly to the startTime of a given frame, a value which gives approximately zero room for error. I'm most familiar with MPEG containers, but I believe the following is also true of the WebM container: times are represented by a rational number, timeValue / timeScale, where both numerator and denominator are unsigned integers. To seek to a particular media time, we must convert a floating-point time value into this rational time format (e.g. when calculating the 4th frame's start time, from 3 * 1/29.97 to 3 * 1001/30000). If there is a floating-point error in the wrong direction (e.g., as above, a numerator of 3002 vs 3003), the end result will not be the frame's startTime, but one 1/timeScale unit before it. We've fixed some frame accuracy bugs in WebKit (and Chromium) by carefully rounding the incoming floating point time value, taking into account the media's time scale, and rounding to the nearest 1/timeScale value. This fixes Rob's precision test, but at the expense of precision. (I.e. in a 30 fps movie, currentTime = 0.99 / 30 will navigate to the second frame, not the first, due to rounding, which is technically incorrect.) This is a common problem, and Apple media frameworks (for example) therefore provide rational time classes which provide enough accuracy for precise navigation (e.g. QTTime, CMTime). Using a floating point number to represent time with any precision is not generally accepted as good practice when these rational time classes are available. -Jer
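A minimal JavaScript sketch of the rounding strategy Jer describes (the real fix lives inside the media engine; this just shows the arithmetic, assuming the media's timeScale is known):

    // Snap a floating-point time (in seconds) to the nearest 1/timeScale unit.
    function toTimeValue(seconds, timeScale) {
      return Math.round(seconds * timeScale);
    }

    var timeScale = 30000;           // 29.97 fps: each frame lasts 1001 units
    var t = 3 * (1001 / timeScale);  // float start time of the 4th frame
    t * timeScale;                   // 3002.9999... or 3003.0000...01 in binary
    toTimeValue(t, timeScale);       // => 3003: lands exactly on the startTime
    toTimeValue(0.99 / 30, 30);      // => 1: Jer's "second frame" example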
Re: [whatwg] video feedback
On Wed, Oct 3, 2012 at 6:41 AM, Jer Noble jer.no...@apple.com wrote: On Sep 17, 2012, at 12:43 PM, Ian Hickson i...@hixie.ch wrote: On Mon, 9 Jul 2012, adam k wrote: i have a 25fps video, h264, with a burned in timecode. it seems to be off by 1 frame when i compare the burned in timecode to the calculated timecode. i'm using rob coenen's test app at http://www.massive-interactive.nl/html5_video/smpte_test_universal.html to load my own video. what's the process here to report issues? please let me know whatever formal or informal steps are required and i'll gladly follow them. Depends on the browser. Which browser? i'm aware that crooked framerates (i.e. the notorious 29.97) were not supported when frame accuracy was implemented. in my tests, 29.97DF timecodes were incorrect by 1 to 3 frames at any given point. will there ever be support for crooked framerate accuracy? i would be more than happy to contribute whatever i can to help test it and make it possible. can someone comment on this? This is a Quality of Implementation issue, basically. I believe there's nothing inherently in the API that would make accuracy to such timecodes impossible. TL;DR: for precise navigation, you need to use a rational time class, rather than a float value. The nature of floating point math makes precise frame navigation difficult, if not impossible. Rob's test is especially hairy, given that each frame has a timing bound of [startTime, endTime), and his test attempts to navigate directly to the startTime of a given frame, a value which gives approximately zero room for error. I'm most familiar with MPEG containers, but I believe the following is also true of the WebM container: times are represented by a rational number, timeValue / timeScale, where both numerator and denominator are unsigned integers. FYI: the Ogg container also uses rational numbers to represent time. To seek to a particular media time, we must convert a floating-point time value into this rational time format (e.g. when calculating the 4th frame's start time, from 3 * 1/29.97 to 3 * 1001/30000). If there is a floating-point error in the wrong direction (e.g., as above, a numerator of 3002 vs 3003), the end result will not be the frame's startTime, but one 1/timeScale unit before it. We've fixed some frame accuracy bugs in WebKit (and Chromium) by carefully rounding the incoming floating point time value, taking into account the media's time scale, and rounding to the nearest 1/timeScale value. This fixes Rob's precision test, but at the expense of precision. (I.e. in a 30 fps movie, currentTime = 0.99 / 30 will navigate to the second frame, not the first, due to rounding, which is technically incorrect.) This is a common problem, and Apple media frameworks (for example) therefore provide rational time classes which provide enough accuracy for precise navigation (e.g. QTTime, CMTime). Using a floating point number to represent time with any precision is not generally accepted as good practice when these rational time classes are available. -Jer
Re: [whatwg] Video feedback
On Thu, 7 Jul 2011, Eric Winkelman wrote: On Thursday, June 02 Ian Hickson wrote: On Fri, 18 Mar 2011, Eric Winkelman wrote: For in-band metadata tracks, there is neither a standard way to represent the type of metadata in the HTMLTrackElement interface nor is there a standard way to represent multiple different types of metadata tracks. There can be a standard way. The idea is that all the types of metadata tracks that browsers will support should be specified so that all browsers can map them the same way. I'm happy to work with anyone interested in writing such a mapping spec, just let me know. I would be very interested in working on this spec. It would be several specs, probably, each focusing on a particular set of metadata in a particular format (e.g. advertising timings in an MPEG wrapper, or whatever). What's the next step? First, research: what formats and metadata streams are you interested in? Who uses them? How are they implemented in producers and (more importantly) consumers today? What are the use cases? Second, describe the problem: make a clear statement of purpose that scopes the effort to provide guidelines to prevent feature creep. Third, listen to implementors: find those that are interested in implementing this particular mapping of metadata to the DOM API, get their input, see what they want. Fourth, implement: make or have someone else make an experimental implementation of a mapping that addresses the problem described in the earlier steps. Fifth, specify: write a specification that describes the mapping described in step two, based on what you've researched in step one and based on the feedback from steps three and four. Sixth, test: update the experimental implementation to fit the spec, get other implementations to implement the spec. Have real users play with it. Seventh, simplify: remove what you don't need. Finally, iterate: repeat all these steps for as long as there's any interest in this mapping, fixing problems, adding new features if they're needed, removing old features that didn't get used or implemented, etc. -- Ian Hickson, http://ln.hixie.ch/
Re: [whatwg] Video feedback
-----Original Message----- From: whatwg-boun...@lists.whatwg.org [mailto:whatwg-boun...@lists.whatwg.org] On Behalf Of Mark Watson Sent: Monday, June 20, 2011 2:29 AM To: Eric Carlson Cc: Silvia Pfeiffer; whatwg Group; Simon Pieters Subject: Re: [whatwg] Video feedback On Jun 9, 2011, at 4:32 PM, Eric Carlson wrote: On Jun 9, 2011, at 12:02 AM, Silvia Pfeiffer wrote: On Thu, Jun 9, 2011 at 4:34 PM, Simon Pieters sim...@opera.com wrote: On Thu, 09 Jun 2011 03:47:49 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: For commercial video providers, the tracks in a live stream change all the time; this is not limited to audio and video tracks but would include text tracks as well. OK, all this indicates to me that we probably want a metadatachanged event to indicate there has been a change and that JS may need to check some of its assumptions. We already have durationchange. Duration is metadata. If we want to support changes to width/height, and the script is interested in when that happens, maybe there should be a dimensionchange event (but what's the use case for changing width/height mid-stream?). Does the spec support changes to text tracks mid-stream? It's not about what the spec supports, but what real-world streams provide. I don't think it makes sense to put an event on every single type of metadata that can change. Most of the time, when you have a stream change, many variables will change together, so a single event is a lot less events to raise. It's an event that signifies that the media framework has reset the video/audio decoding pipeline and loaded a whole bunch of new stuff. You should imagine it as a concatenation of different media resources. And yes, they can have different track constitution and different audio sampling rate (which the audio API will care about) etc etc. In addition, it is possible for a stream to lose or gain an audio track. In this case the dimensions won't change but a script may want to react to the change in audioTracks. The TrackList object has an onchanged event, which I assumed would fire when any of the information in the TrackList changes (e.g. tracks added or removed). But actually the spec doesn't state when this event fires (as far as I could tell - unless it is implied by some general definition of events called onchanged). Should there be some clarification here ? I agree with Silvia, a more generic metadata changed event makes more sense. Yes, and it should support the case in which text tracks are added/removed too. Has there been a bug submitted to add a metadata changed event when video, audio or text tracks are added or deleted from a media resource? Thanks, Bob Lund Also, as Eric (C) pointed out, one of the things which can change is which of several available versions of the content is being rendered (for adaptive bitrate cases). This doesn't necessarily change any of the metadata currently exposed on the video element, but nevertheless it's information that the application may need. It would be nice to expose some kind of identifier for the currently rendered stream and have an event when this changes. I think that a stream-format-supplied identifier would be sufficient. ...Mark eric
Re: [whatwg] Video feedback
On Thursday, June 02 Ian Hickson wrote: On Fri, 18 Mar 2011, Eric Winkelman wrote: For in-band metadata tracks, there is neither a standard way to represent the type of metadata in the HTMLTrackElement interface nor is there a standard way to represent multiple different types of metadata tracks. There can be a standard way. The idea is that all the types of metadata tracks that browsers will support should be specified so that all browsers can map them the same way. I'm happy to work with anyone interested in writing such a mapping spec, just let me know. I would be very interested in working on this spec. CableLabs works with numerous groups delivering content containing a variety of metadata, so we have a good idea what is currently used. We're also working with the groups defining adaptive bit rate delivery protocols about how metadata might be carried. What's the next step? Eric
Re: [whatwg] Video feedback
On Jun 9, 2011, at 4:32 PM, Eric Carlson wrote: On Jun 9, 2011, at 12:02 AM, Silvia Pfeiffer wrote: On Thu, Jun 9, 2011 at 4:34 PM, Simon Pieters sim...@opera.com wrote: On Thu, 09 Jun 2011 03:47:49 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: For commercial video providers, the tracks in a live stream change all the time; this is not limited to audio and video tracks but would include text tracks as well. OK, all this indicates to me that we probably want a metadatachanged event to indicate there has been a change and that JS may need to check some of its assumptions. We already have durationchange. Duration is metadata. If we want to support changes to width/height, and the script is interested in when that happens, maybe there should be a dimensionchange event (but what's the use case for changing width/height mid-stream?). Does the spec support changes to text tracks mid-stream? It's not about what the spec supports, but what real-world streams provide. I don't think it makes sense to put an event on every single type of metadata that can change. Most of the time, when you have a stream change, many variables will change together, so a single event is a lot less events to raise. It's an event that signifies that the media framework has reset the video/audio decoding pipeline and loaded a whole bunch of new stuff. You should imagine it as a concatenation of different media resources. And yes, they can have different track constitution and different audio sampling rate (which the audio API will care about) etc etc. In addition, it is possible for a stream to lose or gain an audio track. In this case the dimensions won't change but a script may want to react to the change in audioTracks. The TrackList object has an onchanged event, which I assumed would fire when any of the information in the TrackList changes (e.g. tracks added or removed). But actually the spec doesn't state when this event fires (as far as I could tell - unless it is implied by some general definition of events called onchanged). Should there be some clarification here ? I agree with Silvia, a more generic metadata changed event makes more sense. Yes, and it should support the case in which text tracks are added/removed too. Also, as Eric (C) pointed out, one of the things which can change is which of several available versions of the content is being rendered (for adaptive bitrate cases). This doesn't necessarily change any of the metadata currently exposed on the video element, but nevertheless it's information that the application may need. It would be nice to expose some kind of identifier for the currently rendered stream and have an event when this changes. I think that a stream-format-supplied identifier would be sufficient. ...Mark eric
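For illustration, script relying on the event being discussed might look like this JavaScript sketch. The 'metadatachange' event name is hypothetical (it is only being proposed in this thread), and audioTracks is the multi-track API under discussion at the time:

    var video = document.querySelector('video');
    video.addEventListener('metadatachange', function () {
      // The decoding pipeline was reset mid-stream: re-check any cached
      // assumptions about dimensions, tracks, sample rates, etc.
      console.log('dimensions:', video.videoWidth, 'x', video.videoHeight);
      console.log('audio tracks:', video.audioTracks.length);
    });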
Re: [whatwg] Video feedback
On Mon, Jun 20, 2011 at 6:29 PM, Mark Watson wats...@netflix.com wrote: On Jun 9, 2011, at 4:32 PM, Eric Carlson wrote: On Jun 9, 2011, at 12:02 AM, Silvia Pfeiffer wrote: On Thu, Jun 9, 2011 at 4:34 PM, Simon Pieters sim...@opera.com wrote: On Thu, 09 Jun 2011 03:47:49 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: For commercial video providers, the tracks in a live stream change all the time; this is not limited to audio and video tracks but would include text tracks as well. OK, all this indicates to me that we probably want a metadatachanged event to indicate there has been a change and that JS may need to check some of its assumptions. We already have durationchange. Duration is metadata. If we want to support changes to width/height, and the script is interested in when that happens, maybe there should be a dimensionchange event (but what's the use case for changing width/height mid-stream?). Does the spec support changes to text tracks mid-stream? It's not about what the spec supports, but what real-world streams provide. I don't think it makes sense to put an event on every single type of metadata that can change. Most of the time, when you have a stream change, many variables will change together, so a single event is a lot less events to raise. It's an event that signifies that the media framework has reset the video/audio decoding pipeline and loaded a whole bunch of new stuff. You should imagine it as a concatenation of different media resources. And yes, they can have different track constitution and different audio sampling rate (which the audio API will care about) etc etc. In addition, it is possible for a stream to lose or gain an audio track. In this case the dimensions won't change but a script may want to react to the change in audioTracks. The TrackList object has an onchanged event, which I assumed would fire when any of the information in the TrackList changes (e.g. tracks added or removed). But actually the spec doesn't state when this event fires (as far as I could tell - unless it is implied by some general definition of events called onchanged). Should there be some clarification here ? I understood that to relate to a change of cues only, since it is on the tracklist. I.e. it's an aggregate event from the oncuechange event of a cue inside the track. I didn't think it would relate to a change of existence of that track. Note that the event is attached to the TrackList, not the TrackList[], so it cannot be raised when a track is added or removed, only when something inside the TrackList changes. I agree with Silvia, a more generic metadata changed event makes more sense. Yes, and it should support the case in which text tracks are added/removed too. Yes, it needs to be an event on the MediaElement. Also, as Eric (C) pointed out, one of the things which can change is which of several available versions of the content is being rendered (for adaptive bitrate cases). This doesn't necessarily change any of the metadata currently exposed on the video element, but nevertheless it's information that the application may need. It would be nice to expose some kind of identifier for the currently rendered stream and have an event when this changes. I think that a stream-format-supplied identifier would be sufficient. I don't know about the adaptive streaming situation. I think that is more about statistics/metrics rather than about change of resource.
All the alternatives in an adaptive streaming resource should provide the same number of tracks and the same video dimensions, just at different bitrate/quality, no? Different video dimensions should be provided through the source element and @media attribute, but within an adaptive stream, the alternatives should be consistent because the target device won't change. I guess this is a discussion for another thread... :-) Cheers, Silvia.
Re: [whatwg] Video feedback
On Jun 20, 2011, at 10:42 AM, Silvia Pfeiffer wrote: On Mon, Jun 20, 2011 at 6:29 PM, Mark Watson wats...@netflix.com wrote: On Jun 9, 2011, at 4:32 PM, Eric Carlson wrote: On Jun 9, 2011, at 12:02 AM, Silvia Pfeiffer wrote: On Thu, Jun 9, 2011 at 4:34 PM, Simon Pieters sim...@opera.com wrote: On Thu, 09 Jun 2011 03:47:49 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: For commercial video providers, the tracks in a live stream change all the time; this is not limited to audio and video tracks but would include text tracks as well. OK, all this indicates to me that we probably want a metadatachanged event to indicate there has been a change and that JS may need to check some of its assumptions. We already have durationchange. Duration is metadata. If we want to support changes to width/height, and the script is interested in when that happens, maybe there should be a dimensionchange event (but what's the use case for changing width/height mid-stream?). Does the spec support changes to text tracks mid-stream? It's not about what the spec supports, but what real-world streams provide. I don't think it makes sense to put an event on every single type of metadata that can change. Most of the time, when you have a stream change, many variables will change together, so a single event is a lot less events to raise. It's an event that signifies that the media framework has reset the video/audio decoding pipeline and loaded a whole bunch of new stuff. You should imagine it as a concatenation of different media resources. And yes, they can have different track constitution and different audio sampling rate (which the audio API will care about) etc etc. In addition, it is possible for a stream to lose or gain an audio track. In this case the dimensions won't change but a script may want to react to the change in audioTracks. The TrackList object has an onchanged event, which I assumed would fire when any of the information in the TrackList changes (e.g. tracks added or removed). But actually the spec doesn't state when this event fires (as far as I could tell - unless it is implied by some general definition of events called onchanged). Should there be some clarification here ? I understood that to relate to a change of cues only, since it is on the tracklist. I.e. it's an aggregate event from the oncuechange event of a cue inside the track. I didn't think it would relate to a change of existence of that track. Note that the event is attached to the TrackList, not the TrackList[], so it cannot be raised when a track is added or removed, only when something inside the TrackList changes. Are we talking about the same thing ? There is no TrackList array and TrackList is only used for audio/video, not text, so I don't understand the comment about cues. I'm talking about http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#tracklist which is the base class for MultipleTrackList and ExclusiveTrackList used to represent all the audio and video tracks (respectively). One instance of the object represents all the tracks, so I would assume that a change in the number of tracks is a change to this object. I agree with Silvia, a more generic metadata changed event makes more sense. Yes, and it should support the case in which text tracks are added/removed too. Yes, it needs to be an event on the MediaElement.
Also, as Eric (C) pointed out, one of the things which can change is which of several available versions of the content is being rendered (for adaptive bitrate cases). This doesn't necessarily change any of the metadata currently exposed on the video element, but nevertheless it's information that the application may need. It would be nice to expose some kind of identifier for the currently rendered stream and have an event when this changes. I think that a stream-format-supplied identifier would be sufficient. I don't know about the adaptive streaming situation. I think that is more about statistics/metrics rather than about change of resource. All the alternatives in an adaptive streaming resource should provide the same number of tracks and the same video dimensions, just at different bitrate/quality, no? Different video dimensions should be provided through the source element and @media attribute, but within an adaptive stream, the alternatives should be consistent because the target device won't change. I guess this is a discussion for another thread... :-) I think of the different adaptive versions on a per-track basis (i.e. the alternatives are *within* each track), not a bunch of alternatives each of which contains several tracks. Both are possible, of course. It's certainly possible (indeed common) for different bitrate video encodings to have different resolutions - there are video encoding reasons to do this. Of course the aspect ratio should not change and nor should the dimensions on the screen (both would be a little peculiar for the user). Now, the videoWidth and videoHeight attributes of HTMLVideoElement are not the same as the resolution (for a start, they are in CSS pixels, which are square), but I think it quite likely that if the resolution of the video changes then the videoWidth and videoHeight might change. I'd be interested to hear how existing implementations relate resolution to videoWidth and videoHeight.
Re: [whatwg] Video feedback
On Mon, Jun 20, 2011 at 7:31 PM, Mark Watson wats...@netflix.com wrote: The TrackList object has an onchanged event, which I assumed would fire when any of the information in the TrackList changes (e.g. tracks added or removed). But actually the spec doesn't state when this event fires (as far as I could tell - unless it is implied by some general definition of events called onchanged). Should there be some clarification here ? I understood that to relate to a change of cues only, since it is on the tracklist. I.e. it's an aggregate event from the oncuechange event of a cue inside the track. I didn't think it would relate to a change of existence of that track. Note that the event is attached to the TrackList, not the TrackList[], so it cannot be raised when a track is added or removed, only when something inside the TrackList changes. Are we talking about the same thing ? There is no TrackList array and TrackList is only used for audio/video, not text, so I don't understand the comment about cues. I'm talking about http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#tracklist which is the base class for MultipleTrackList and ExclusiveTrackList used to represent all the audio and video tracks (respectively). One instance of the object represents all the tracks, so I would assume that a change in the number of tracks is a change to this object. Ah yes, you're right: I got confused. It says "Whenever the selected track is changed, the user agent must queue a task to fire a simple event named change at the MultipleTrackList object." This means it fires when the selectedIndex is changed, i.e. the user chooses a different track for rendering. I still don't think it relates to changes in the composition of tracks of a resource. That should be something different and should probably be on the MediaElement and not on the track list to also cover changes in text tracks. Also, as Eric (C) pointed out, one of the things which can change is which of several available versions of the content is being rendered (for adaptive bitrate cases). This doesn't necessarily change any of the metadata currently exposed on the video element, but nevertheless it's information that the application may need. It would be nice to expose some kind of identifier for the currently rendered stream and have an event when this changes. I think that a stream-format-supplied identifier would be sufficient. I don't know about the adaptive streaming situation. I think that is more about statistics/metrics rather than about change of resource. All the alternatives in an adaptive streaming resource should provide the same number of tracks and the same video dimensions, just at different bitrate/quality, no? I think of the different adaptive versions on a per-track basis (i.e. the alternatives are *within* each track), not a bunch of alternatives each of which contains several tracks. Both are possible, of course. It's certainly possible (indeed common) for different bitrate video encodings to have different resolutions - there are video encoding reasons to do this. Of course the aspect ratio should not change and nor should the dimensions on the screen (both would be a little peculiar for the user). Now, the videoWidth and videoHeight attributes of HTMLVideoElement are not the same as the resolution (for a start, they are in CSS pixels, which are square), but I think it quite likely that if the resolution of the video changes then the videoWidth and videoHeight might change.
I'd be interested to hear how existing implementations relate resolution to videoWidth and videoHeight. Well, if videoWidth and videoHeight change and no dimensions on the video are provided through CSS, then surely the video will change size and the display will shrink. That would be a terrible user experience. For that reason I would suggest that such a change not be made in alternative adaptive streams. Different video dimensions should be provided through the source element and @media attribute, but within an adaptive stream, the alternatives should be consistent because the target device won't change. I guess this is a discussion for another thread... :-) Possibly ;-) The device knows much better than the page author what capabilities it has and so what resolutions are suitable for the device. So it is better to provide all the alternatives as a single resource and have the device work out which subset it can support. Or at least, the list should be provided all at the same level - there is no rationale for a hierarchy of alternatives. The way in which HTML deals with different devices and their different capabilities is through media queries. As an author you provide your content with different versions of media-dependent style sheets and content, so that when you view the page with a different device, the capabilities of the device select the right style sheet and
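To spell out the distinction Silvia draws above about the track list 'change' event: under the draft multi-track API being discussed, the only specced notification fires on selection, roughly usable like this sketch (names follow the draft as quoted in this thread, not a shipping implementation):

    var video = document.querySelector('video');
    // Per the draft quoted above, 'change' fires on the MultipleTrackList
    // when a different track is selected for rendering -- not when tracks
    // are added to or removed from the resource mid-stream.
    video.audioTracks.onchange = function () {
      console.log('selection changed; selectedIndex is now',
                  video.audioTracks.selectedIndex);
    };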
Re: [whatwg] Video feedback
On Jun 20, 2011, at 11:52 AM, Silvia Pfeiffer wrote: On Mon, Jun 20, 2011 at 7:31 PM, Mark Watson wats...@netflix.com wrote: The TrackList object has an onchanged event, which I assumed would fire when any of the information in the TrackList changes (e.g. tracks added or removed). But actually the spec doesn't state when this event fires (as far as I could tell - unless it is implied by some general definition of events called onchanged). Should there be some clarification here ? I understood that to relate to a change of cues only, since it is on the tracklist. I.e. it's an aggregate event from the oncuechange event of a cue inside the track. I didn't think it would relate to a change of existence of that track. Note that the event is attached to the TrackList, not the TrackList[], so it cannot be raised when a track is added or removed, only when something inside the TrackList changes. Are we talking about the same thing ? There is no TrackList array and TrackList is only used for audio/video, not text, so I don't understand the comment about cues. I'm talking about http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#tracklist which is the base class for MultipleTrackList and ExclusiveTrackList used to represent all the audio and video tracks (respectively). One instance of the object represents all the tracks, so I would assume that a change in the number of tracks is a change to this object. Ah yes, you're right: I got confused. It says "Whenever the selected track is changed, the user agent must queue a task to fire a simple event named change at the MultipleTrackList object." This means it fires when the selectedIndex is changed, i.e. the user chooses a different track for rendering. I still don't think it relates to changes in the composition of tracks of a resource. That should be something different and should probably be on the MediaElement and not on the track list to also cover changes in text tracks. Fair enough. Also, as Eric (C) pointed out, one of the things which can change is which of several available versions of the content is being rendered (for adaptive bitrate cases). This doesn't necessarily change any of the metadata currently exposed on the video element, but nevertheless it's information that the application may need. It would be nice to expose some kind of identifier for the currently rendered stream and have an event when this changes. I think that a stream-format-supplied identifier would be sufficient. I don't know about the adaptive streaming situation. I think that is more about statistics/metrics rather than about change of resource. All the alternatives in an adaptive streaming resource should provide the same number of tracks and the same video dimensions, just at different bitrate/quality, no? I think of the different adaptive versions on a per-track basis (i.e. the alternatives are *within* each track), not a bunch of alternatives each of which contains several tracks. Both are possible, of course. It's certainly possible (indeed common) for different bitrate video encodings to have different resolutions - there are video encoding reasons to do this. Of course the aspect ratio should not change and nor should the dimensions on the screen (both would be a little peculiar for the user).
Now, the videoWidth and videoHeight attributes of HTMLVideoElement are not the same as the resolution (for a start, they are in CSS pixels, which are square), but I think it quite likely that if the resolution of the video changes than the videoWidth and videoHeight might change. I'd be interested to hear how existing implementations relate resolution to videoWidth and videoHeight. Well, if videoWidth and videoHeight change and no dimensions on the video are provided through CSS, then surely the video will change size and the display will shrink. That would be a terrible user experience. For that reason I would suggest that such a change not be made in alternative adaptive streams. That seems backwards to me! I would say For that reason I would suggest that dimensions are provided through CSS or through the width and height attributes. Alternatively, we change the specification of the video element to accommodate this aspect of adaptive streaming (for example, the videoWidth and videoHeight could be defined to be based on the highest resolution bitrate being considered.) There are good video encoding reasons for different bitrates to be encoded at different resolutions which are far more important than any reasons not to do either of the above. Different video dimensions should be provided through the source element and @media attribute, but within an adaptive stream, the alternatives should be consistent because the target device won't change. I guess this is a discussion for another thread... :-) Possibly ;-) The device knows much
Re: [whatwg] Video feedback
On Tue, Jun 21, 2011 at 12:07 AM, Mark Watson wats...@netflix.com wrote: On Jun 20, 2011, at 11:52 AM, Silvia Pfeiffer wrote: On Mon, Jun 20, 2011 at 7:31 PM, Mark Watson wats...@netflix.com wrote: The TrackList object has an onchanged event, which I assumed would fire when any of the information in the TrackList changes (e.g. tracks added or removed). But actually the spec doesn't state when this event fires (as far as I could tell - unless it is implied by some general definition of events called onchanged). Should there be some clarification here ? I understood that to relate to a change of cues only, since it is on the tracklist. I.e. it's an aggregate event from the oncuechange event of a cue inside the track. I didn't think it would relate to a change of existence of that track. Note that the event is attached to the TrackList, not the TrackList[], so it cannot be raised when a track is added or removed, only when something inside the TrackList changes. Are we talking about the same thing ? There is no TrackList array and TrackList is only used for audio/video, not text, so I don't understand the comment about cues. I'm talking about http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#tracklist which is the base class for MultipleTrackList and ExclusiveTrackList used to represent all the audio and video tracks (respectively). One instance of the object represents all the tracks, so I would assume that a change in the number of tracks is a change to this object. Ah yes, you're right: I got confused. It says "Whenever the selected track is changed, the user agent must queue a task to fire a simple event named change at the MultipleTrackList object." This means it fires when the selectedIndex is changed, i.e. the user chooses a different track for rendering. I still don't think it relates to changes in the composition of tracks of a resource. That should be something different and should probably be on the MediaElement and not on the track list to also cover changes in text tracks. Fair enough. Also, as Eric (C) pointed out, one of the things which can change is which of several available versions of the content is being rendered (for adaptive bitrate cases). This doesn't necessarily change any of the metadata currently exposed on the video element, but nevertheless it's information that the application may need. It would be nice to expose some kind of identifier for the currently rendered stream and have an event when this changes. I think that a stream-format-supplied identifier would be sufficient. I don't know about the adaptive streaming situation. I think that is more about statistics/metrics rather than about change of resource. All the alternatives in an adaptive streaming resource should provide the same number of tracks and the same video dimensions, just at different bitrate/quality, no? I think of the different adaptive versions on a per-track basis (i.e. the alternatives are *within* each track), not a bunch of alternatives each of which contains several tracks. Both are possible, of course. It's certainly possible (indeed common) for different bitrate video encodings to have different resolutions - there are video encoding reasons to do this. Of course the aspect ratio should not change and nor should the dimensions on the screen (both would be a little peculiar for the user).
Now, the videoWidth and videoHeight attributes of HTMLVideoElement are not the same as the resolution (for a start, they are in CSS pixels, which are square), but I think it quite likely that if the resolution of the video changes than the videoWidth and videoHeight might change. I'd be interested to hear how existing implementations relate resolution to videoWidth and videoHeight. Well, if videoWidth and videoHeight change and no dimensions on the video are provided through CSS, then surely the video will change size and the display will shrink. That would be a terrible user experience. For that reason I would suggest that such a change not be made in alternative adaptive streams. That seems backwards to me! I would say For that reason I would suggest that dimensions are provided through CSS or through the width and height attributes. Alternatively, we change the specification of the video element to accommodate this aspect of adaptive streaming (for example, the videoWidth and videoHeight could be defined to be based on the highest resolution bitrate being considered.) There are good video encoding reasons for different bitrates to be encoded at different resolutions which are far more important than any reasons not to do either of the above. Different video dimensions should be provided through the source element and @media attribute, but within an adaptive stream, the alternatives should be consistent because the target device won't change. I guess this is a
Re: [whatwg] Video feedback
On Jun 20, 2011, at 5:28 PM, Silvia Pfeiffer wrote: On Tue, Jun 21, 2011 at 12:07 AM, Mark Watson wats...@netflix.com wrote: On Jun 20, 2011, at 11:52 AM, Silvia Pfeiffer wrote: On Mon, Jun 20, 2011 at 7:31 PM, Mark Watson wats...@netflix.com wrote: The TrackList object has an onchanged event, which I assumed would fire when any of the information in the TrackList changes (e.g. tracks added or removed). But actually the spec doesn't state when this event fires (as far as I could tell - unless it is implied by some general definition of events called onchanged). Should there be some clarification here ? I understood that to relate to a change of cues only, since it is on the tracklist. I.e. it's an aggregate event from the oncuechange event of a cue inside the track. I didn't think it would relate to a change of existence of that track. Note that the event is attached to the TrackList, not the TrackList[], so it cannot be raised when a track is added or removed, only when something inside the TrackList changes. Are we talking about the same thing ? There is no TrackList array and TrackList is only used for audio/video, not text, so I don't understand the comment about cues. I'm talking about http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#tracklist which is the base class for MultipleTrackList and ExclusiveTrackList used to represent all the audio and video tracks (respectively). One instance of the object represents all the tracks, so I would assume that a change in the number of tracks is a change to this object. Ah yes, you're right: I got confused. It says "Whenever the selected track is changed, the user agent must queue a task to fire a simple event named change at the MultipleTrackList object." This means it fires when the selectedIndex is changed, i.e. the user chooses a different track for rendering. I still don't think it relates to changes in the composition of tracks of a resource. That should be something different and should probably be on the MediaElement and not on the track list to also cover changes in text tracks. Fair enough. Also, as Eric (C) pointed out, one of the things which can change is which of several available versions of the content is being rendered (for adaptive bitrate cases). This doesn't necessarily change any of the metadata currently exposed on the video element, but nevertheless it's information that the application may need. It would be nice to expose some kind of identifier for the currently rendered stream and have an event when this changes. I think that a stream-format-supplied identifier would be sufficient. I don't know about the adaptive streaming situation. I think that is more about statistics/metrics rather than about change of resource. All the alternatives in an adaptive streaming resource should provide the same number of tracks and the same video dimensions, just at different bitrate/quality, no? I think of the different adaptive versions on a per-track basis (i.e. the alternatives are *within* each track), not a bunch of alternatives each of which contains several tracks. Both are possible, of course. It's certainly possible (indeed common) for different bitrate video encodings to have different resolutions - there are video encoding reasons to do this. Of course the aspect ratio should not change and nor should the dimensions on the screen (both would be a little peculiar for the user).
Now, the videoWidth and videoHeight attributes of HTMLVideoElement are not the same as the resolution (for a start, they are in CSS pixels, which are square), but I think it quite likely that if the resolution of the video changes than the videoWidth and videoHeight might change. I'd be interested to hear how existing implementations relate resolution to videoWidth and videoHeight. Well, if videoWidth and videoHeight change and no dimensions on the video are provided through CSS, then surely the video will change size and the display will shrink. That would be a terrible user experience. For that reason I would suggest that such a change not be made in alternative adaptive streams. That seems backwards to me! I would say For that reason I would suggest that dimensions are provided through CSS or through the width and height attributes. Alternatively, we change the specification of the video element to accommodate this aspect of adaptive streaming (for example, the videoWidth and videoHeight could be defined to be based on the highest resolution bitrate being considered.) There are good video encoding reasons for different bitrates to be encoded at different resolutions which are far more important than any reasons not to do either of the above. Different video dimensions should be provided through the source element and @media attribute, but within an adaptive stream, the alternatives
Re: [whatwg] Video feedback
On Thu, 09 Jun 2011 03:47:49 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: For commercial video providers, the tracks in a live stream change all the time; this is not limited to audio and video tracks but would include text tracks as well. OK, all this indicates to me that we probably want a metadatachanged event to indicate there has been a change and that JS may need to check some of its assumptions. We already have durationchange. Duration is metadata. If we want to support changes to width/height, and the script is interested in when that happens, maybe there should be a dimensionchange event (but what's the use case for changing width/height mid-stream?). Does the spec support changes to text tracks mid-stream? -- Simon Pieters Opera Software
Re: [whatwg] Video feedback
On Thu, Jun 9, 2011 at 4:34 PM, Simon Pieters sim...@opera.com wrote: On Thu, 09 Jun 2011 03:47:49 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: For commercial video providers, the tracks in a live stream change all the time; this is not limited to audio and video tracks but would include text tracks as well. OK, all this indicates to me that we probably want a metadatachanged event to indicate there has been a change and that JS may need to check some of its assumptions. We already have durationchange. Duration is metadata. If we want to support changes to width/height, and the script is interested in when that happens, maybe there should be a dimensionchange event (but what's the use case for changing width/height mid-stream?). Does the spec support changes to text tracks mid-stream? It's not about what the spec supports, but what real-world streams provide. I don't think it makes sense to put an event on every single type of metadata that can change. Most of the time, when you have a stream change, many variables will change together, so a single event is a lot less events to raise. It's an event that signifies that the media framework has reset the video/audio decoding pipeline and loaded a whole bunch of new stuff. You should imagine it as a concatenation of different media resources. And yes, they can have different track constitution and different audio sampling rate (which the audio API will care about) etc etc. The durationchange is a different type of event. It has not much to do with having a change of a media format, but more one with getting new information that more data is available than previously expected. It's one that allows streaming of long video resources, even if they are just of a single encoding setting. In contrast what we are talking about is that the encoding settings change mid-stream. Cheers, Silvia.
Re: [whatwg] Video feedback
On Jun 9, 2011, at 12:02 AM, Silvia Pfeiffer wrote: On Thu, Jun 9, 2011 at 4:34 PM, Simon Pieters sim...@opera.com wrote: On Thu, 09 Jun 2011 03:47:49 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: For commercial video providers, the tracks in a live stream change all the time; this is not limited to audio and video tracks but would include text tracks as well. OK, all this indicates to me that we probably want a metadatachanged event to indicate there has been a change and that JS may need to check some of its assumptions. We already have durationchange. Duration is metadata. If we want to support changes to width/height, and the script is interested in when that happens, maybe there should be a dimensionchange event (but what's the use case for changing width/height mid-stream?). Does the spec support changes to text tracks mid-stream? It's not about what the spec supports, but what real-world streams provide. I don't think it makes sense to put an event on every single type of metadata that can change. Most of the time, when you have a stream change, many variables will change together, so a single event is a lot less events to raise. It's an event that signifies that the media framework has reset the video/audio decoding pipeline and loaded a whole bunch of new stuff. You should imagine it as a concatenation of different media resources. And yes, they can have different track constitution and different audio sampling rate (which the audio API will care about) etc etc. In addition, it is possible for a stream to lose or gain an audio track. In this case the dimensions won't change but a script may want to react to the change in audioTracks. I agree with Silvia, a more generic metadata changed event makes more sense. eric
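The track half of this was eventually addressed by the audioTracks/videoTracks APIs, whose track lists fire addtrack and removetrack events; a sketch of how a script might react, assuming an implementation that exposes them:

  var v = document.querySelector('video');
  if (v.audioTracks) {
    v.audioTracks.addEventListener('addtrack', function (e) {
      console.log('audio track added: ' + e.track.label);
    });
    v.audioTracks.addEventListener('removetrack', function () {
      console.log('an audio track was removed; re-check assumptions');
    });
  }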
Re: [whatwg] Video feedback
On Wed, 08 Jun 2011 02:46:15 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Tue, Jun 7, 2011 at 7:04 PM, Philip Jägenstedt phil...@opera.com wrote: On Sat, 04 Jun 2011 03:39:58 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Fri, Jun 3, 2011 at 9:28 AM, Ian Hickson i...@hixie.ch wrote: On Thu, 16 Dec 2010, Silvia Pfeiffer wrote: I do not know how technically the change of stream composition works in MPEG, but in Ogg we have to end a current stream and start a new one to switch compositions. This has been called sequential multiplexing or chaining. In this case, stream setup information is repeated, which would probably lead to creating a new stream handler and possibly a new firing of loadedmetadata. I am not sure how chaining is implemented in browsers. Per spec, chaining isn't currently supported. The closest thing I can find in the spec to this situation is handling a non-fatal error, which causes the unexpected content to be ignored. On Fri, 17 Dec 2010, Eric Winkelman wrote: The short answer for changing stream composition is that there is a Program Map Table (PMT) that is repeated every 100 milliseconds and describes the content of the stream. Depending on the programming, the stream's composition could change entering/exiting every advertisement. If this is something that browser vendors want to support, I can specify how to handle it. Anyone? Icecast streams have chained files, so streaming Ogg to an audio element would hit this problem. There is a bug in FF for this: https://bugzilla.mozilla.org/show_bug.cgi?id=455165 (and a duplicate bug at https://bugzilla.mozilla.org/show_bug.cgi?id=611519). There's also a webkit bug for icecast streaming, which is probably related https://bugs.webkit.org/show_bug.cgi?id=42750 . I'm not sure how Opera is able to deal with icecast streams, but it seems to deal with it. The thing is: you can implement playback and seeking without any further changes to the spec. But then the browser-internal metadata states will change depending on the chunk you're on. Should that also update the exposed metadata in the API then? Probably yes, because otherwise the JS developer may deal with contradictory information. Maybe we need a metadatachange event for this? An Icecast stream is conceptually just one infinite audio stream, even though at the container level it is several chained Ogg streams. duration will be Infinity and currentTime will be constantly increasing. This doesn't seem to be a case where any spec change is needed. Am I missing something? That is all correct. However, because it is a sequence of Ogg streams, there are new Ogg headers in the middle. These new Ogg headers will lead to new metadata loaded in the media framework - e.g. because the new Ogg stream is encoded with a different audio sampling rate and a different video width/height etc. So, therefore, the metadata in the media framework changes. However, what the browser reports to the JS developer doesn't change. Or if it does change, the JS developer is not informed of it because it is a single infinite audio (or video) stream. Thus the question whether we need a new metadatachange event to expose this to the JS developer. It would then also signify that potentially the number of tracks that are available may have changed and other such information. Nothing exposed via the current API would change, AFAICT. I agree that if we start exposing things like sampling rate or want to support arbitrary chained Ogg, then there is a problem. 
-- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Video feedback
On Wed, Jun 8, 2011 at 6:14 PM, Philip Jägenstedt phil...@opera.com wrote: On Wed, 08 Jun 2011 02:46:15 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Tue, Jun 7, 2011 at 7:04 PM, Philip Jägenstedt phil...@opera.com wrote: On Sat, 04 Jun 2011 03:39:58 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Fri, Jun 3, 2011 at 9:28 AM, Ian Hickson i...@hixie.ch wrote: On Thu, 16 Dec 2010, Silvia Pfeiffer wrote: I do not know how technically the change of stream composition works in MPEG, but in Ogg we have to end a current stream and start a new one to switch compositions. This has been called sequential multiplexing or chaining. In this case, stream setup information is repeated, which would probably lead to creating a new steam handler and possibly a new firing of loadedmetadata. I am not sure how chaining is implemented in browsers. Per spec, chaining isn't currently supported. The closest thing I can find in the spec to this situation is handling a non-fatal error, which causes the unexpected content to be ignored. On Fri, 17 Dec 2010, Eric Winkelman wrote: The short answer for changing stream composition is that there is a Program Map Table (PMT) that is repeated every 100 milliseconds and describes the content of the stream. Depending on the programming, the stream's composition could change entering/exiting every advertisement. If this is something that browser vendors want to support, I can specify how to handle it. Anyone? Icecast streams have chained files, so streaming Ogg to an audio element would hit this problem. There is a bug in FF for this: https://bugzilla.mozilla.org/show_bug.cgi?id=455165 (and a duplicate bug at https://bugzilla.mozilla.org/show_bug.cgi?id=611519). There's also a webkit bug for icecast streaming, which is probably related https://bugs.webkit.org/show_bug.cgi?id=42750 . I'm not sure how Opera is able to deal with icecast streams, but it seems to deal with it. The thing is: you can implement playback and seeking without any further changes to the spec. But then the browser-internal metadata states will change depending on the chunk you're on. Should that also update the exposed metadata in the API then? Probably yes, because otherwise the JS developer may deal with contradictory information. Maybe we need a metadatachange event for this? An Icecast stream is conceptually just one infinite audio stream, even though at the container level it is several chained Ogg streams. duration will be Infinity and currentTime will be constantly increasing. This doesn't seem to be a case where any spec change is needed. Am I missing something? That is all correct. However, because it is a sequence of Ogg streams, there are new Ogg headers in the middle. These new Ogg headers will lead to new metadata loaded in the media framework - e.g. because the new Ogg stream is encoded with a different audio sampling rate and a different video width/height etc. So, therefore, the metadata in the media framework changes. However, what the browser reports to the JS developer doesn't change. Or if it does change, the JS developer is not informed of it because it is a single infinite audio (or video) stream. Thus the question whether we need a new metadatachange event to expose this to the JS developer. It would then also signify that potentially the number of tracks that are available may have changed and other such information. Nothing exposed via the current API would change, AFAICT. 
Thus, after a change mid-stream to, say, a smaller video width and height, would the video.videoWidth and video.videoHeight attributes represent the width and height of the previous stream or the current one? I agree that if we start exposing things like sampling rate or want to support arbitrary chained Ogg, then there is a problem. I think we already have a problem with width and height for chained Ogg, and we cannot stop people from putting chained Ogg into the @src. I actually took this discussion away from the MPEG PMT, which is where Eric's question came from, because I don't understand how it works with MPEG. But I can see that it's not just a problem of MPEG, but also of Ogg (and possibly of WebM, which can have multiple Segments). So I think we need a generic solution for it. Cheers, Silvia.
Re: [whatwg] Video feedback
On Wed, 08 Jun 2011 12:35:24 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Wed, Jun 8, 2011 at 6:14 PM, Philip Jägenstedt phil...@opera.com wrote: On Wed, 08 Jun 2011 02:46:15 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: That is all correct. However, because it is a sequence of Ogg streams, there are new Ogg headers in the middle. These new Ogg headers will lead to new metadata loaded in the media framework - e.g. because the new Ogg stream is encoded with a different audio sampling rate and a different video width/height etc. So, therefore, the metadata in the media framework changes. However, what the browser reports to the JS developer doesn't change. Or if it does change, the JS developer is not informed of it because it is a single infinite audio (or video) stream. Thus the question whether we need a new metadatachange event to expose this to the JS developer. It would then also signify that potentially the number of tracks that are available may have changed and other such information. Nothing exposed via the current API would change, AFAICT. Thus, after a change mid-stream to, say, a smaller video width and height, would the video.videoWidth and video.videoHeight attributes represent the width and height of the previous stream or the current one? I agree that if we start exposing things like sampling rate or want to support arbitrary chained Ogg, then there is a problem. I think we already have a problem with width and height for chained Ogg and we cannot stop people from putting chained Ogg into the @src. I actually took this discussion away from MPEG PTM, which is where Eric's question came from, because I don't understand how it works with MPEG. But I can see that it's not just a problem of MPEG, but also of Ogg (and possibly of WebM which can have multiple Segments). So, I think we need a generic solution for it. OK, I don't think we disagree. I'm just saying that for Icecast audio streams, there is no problem. As for Ogg and WebM, I'm inclined to say that we just shouldn't support that, unless there's some compelling use case for it. There's also the option of tweaking the muxers so that all the streams are known up-front, even if there won't be any data arriving for them until half-way through the file. I also know nothing about MPEG or the use cases involved, so no opinions there. -- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Video feedback
On Wed, Jun 8, 2011 at 9:18 PM, Philip Jägenstedt phil...@opera.com wrote: On Wed, 08 Jun 2011 12:35:24 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Wed, Jun 8, 2011 at 6:14 PM, Philip Jägenstedt phil...@opera.com wrote: On Wed, 08 Jun 2011 02:46:15 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: That is all correct. However, because it is a sequence of Ogg streams, there are new Ogg headers in the middle. These new Ogg headers will lead to new metadata loaded in the media framework - e.g. because the new Ogg stream is encoded with a different audio sampling rate and a different video width/height etc. So, therefore, the metadata in the media framework changes. However, what the browser reports to the JS developer doesn't change. Or if it does change, the JS developer is not informed of it because it is a single infinite audio (or video) stream. Thus the question whether we need a new metadatachange event to expose this to the JS developer. It would then also signify that potentially the number of tracks that are available may have changed and other such information. Nothing exposed via the current API would change, AFAICT. Thus, after a change mid-stream to, say, a smaller video width and height, would the video.videoWidth and video.videoHeight attributes represent the width and height of the previous stream or the current one? I agree that if we start exposing things like sampling rate or want to support arbitrary chained Ogg, then there is a problem. I think we already have a problem with width and height for chained Ogg and we cannot stop people from putting chained Ogg into the @src. I actually took this discussion away from MPEG PTM, which is where Eric's question came from, because I don't understand how it works with MPEG. But I can see that it's not just a problem of MPEG, but also of Ogg (and possibly of WebM which can have multiple Segments). So, I think we need a generic solution for it. OK, I don't think we disagree. I'm just saying that for Icecast audio streams, there is no problem. Hmm.. because there is nothing in the API that actually exposes audio metadata? As for Ogg and WebM, I'm inclined to say that we just shouldn't support that, unless there's some compelling use case for it. You know that you can also transmit video with icecast...? Silvia. There's also the option of tweaking the muxers so that all the streams are known up-front, even if there won't be any data arriving for them until half-way through the file. I also know nothing about MPEG or the use cases involved, so no opinions there. -- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Video feedback
On Wed, 08 Jun 2011 13:38:18 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Wed, Jun 8, 2011 at 9:18 PM, Philip Jägenstedt phil...@opera.com wrote: On Wed, 08 Jun 2011 12:35:24 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Wed, Jun 8, 2011 at 6:14 PM, Philip Jägenstedt phil...@opera.com wrote: On Wed, 08 Jun 2011 02:46:15 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: That is all correct. However, because it is a sequence of Ogg streams, there are new Ogg headers in the middle. These new Ogg headers will lead to new metadata loaded in the media framework - e.g. because the new Ogg stream is encoded with a different audio sampling rate and a different video width/height etc. So, therefore, the metadata in the media framework changes. However, what the browser reports to the JS developer doesn't change. Or if it does change, the JS developer is not informed of it because it is a single infinite audio (or video) stream. Thus the question whether we need a new metadatachange event to expose this to the JS developer. It would then also signify that potentially the number of tracks that are available may have changed and other such information. Nothing exposed via the current API would change, AFAICT. Thus, after a change mid-stream to, say, a smaller video width and height, would the video.videoWidth and video.videoHeight attributes represent the width and height of the previous stream or the current one? I agree that if we start exposing things like sampling rate or want to support arbitrary chained Ogg, then there is a problem. I think we already have a problem with width and height for chained Ogg and we cannot stop people from putting chained Ogg into the @src. I actually took this discussion away from MPEG PTM, which is where Eric's question came from, because I don't understand how it works with MPEG. But I can see that it's not just a problem of MPEG, but also of Ogg (and possibly of WebM which can have multiple Segments). So, I think we need a generic solution for it. OK, I don't think we disagree. I'm just saying that for Icecast audio streams, there is no problem. Hmm.. because there is nothing in the API that actually exposes audio metadata? Yes. As for Ogg and WebM, I'm inclined to say that we just shouldn't support that, unless there's some compelling use case for it. You know that you can also transmit video with icecast...? Nope :) I guess that invalidates everything I've said about Icecast. Practically, though, no one is using Icecast to mix audio tracks with audio+video tracks and getting upset that it doesn't work in browsers, right? -- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Video feedback
On Jun 8, 2011, at 3:35 AM, Silvia Pfeiffer wrote: Nothing exposed via the current API would change, AFAICT. Thus, after a change mid-stream to, say, a smaller video width and height, would the video.videoWidth and video.videoHeight attributes represent the width and height of the previous stream or the current one? I agree that if we start exposing things like sampling rate or want to support arbitrary chained Ogg, then there is a problem. I think we already have a problem with width and height for chained Ogg and we cannot stop people from putting chained Ogg into the @src. I actually took this discussion away from MPEG PTM, which is where Eric's question came from, because I don't understand how it works with MPEG. But I can see that it's not just a problem of MPEG, but also of Ogg (and possibly of WebM which can have multiple Segments). So, I think we need a generic solution for it. The characteristics of an Apple HTTP live stream can change on the fly. For example if the user's bandwidth to the streaming server changes, the video width and height can change as the stream resolution is switched up or down, or the number of tracks can change when a stream switches from video+audio to audio only. In addition, a server can insert segments with different characteristics into a stream on the fly, eg. inserting an ad or emergency announcement. It is not possible to predict these changes before they occur. eric
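The dimension half of this was later covered by firing a resize event at the media element whenever its intrinsic width/height change; a sketch, assuming an implementation that does this:

  var v = document.querySelector('video');
  v.addEventListener('resize', function () {
    // Fires when the stream switches to a rendition with a different
    // intrinsic size, e.g. after an adaptive bitrate switch.
    console.log('now ' + v.videoWidth + 'x' + v.videoHeight);
  });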
Re: [whatwg] Video feedback
-Original Message- From: whatwg-boun...@lists.whatwg.org [mailto:whatwg- boun...@lists.whatwg.org] On Behalf Of Eric Carlson Sent: Wednesday, June 08, 2011 9:34 AM To: Silvia Pfeiffer; Philip Jägenstedt Cc: whatwg@lists.whatwg.org Subject: Re: [whatwg] Video feedback On Jun 8, 2011, at 3:35 AM, Silvia Pfeiffer wrote: Nothing exposed via the current API would change, AFAICT. Thus, after a change mid-stream to, say, a smaller video width and height, would the video.videoWidth and video.videoHeight attributes represent the width and height of the previous stream or the current one? I agree that if we start exposing things like sampling rate or want to support arbitrary chained Ogg, then there is a problem. I think we already have a problem with width and height for chained Ogg and we cannot stop people from putting chained Ogg into the @src. I actually took this discussion away from MPEG PTM, which is where Eric's question came from, because I don't understand how it works with MPEG. But I can see that it's not just a problem of MPEG, but also of Ogg (and possibly of WebM which can have multiple Segments). So, I think we need a generic solution for it. The characteristics of an Apple HTTP live stream can change on the fly. For example if the user's bandwidth to the streaming server changes, the video width and height can change as the stream resolution is switched up or down, or the number of tracks can change when a stream switches from video+audio to audio only. In addition, a server can insert segments with different characteristics into a stream on the fly, eg. inserting an ad or emergency announcement. It is not possible to predict these changes before they occur. eric For commercial video providers, the tracks in a live stream change all the time; this is not limited to audio and video tracks but would include text tracks as well. Bob Lund
Re: [whatwg] Video feedback
On Thu, Jun 9, 2011 at 1:57 AM, Bob Lund b.l...@cablelabs.com wrote: -Original Message- From: whatwg-boun...@lists.whatwg.org [mailto:whatwg- boun...@lists.whatwg.org] On Behalf Of Eric Carlson Sent: Wednesday, June 08, 2011 9:34 AM To: Silvia Pfeiffer; Philip Jägenstedt Cc: whatwg@lists.whatwg.org Subject: Re: [whatwg] Video feedback On Jun 8, 2011, at 3:35 AM, Silvia Pfeiffer wrote: Nothing exposed via the current API would change, AFAICT. Thus, after a change mid-stream to, say, a smaller video width and height, would the video.videoWidth and video.videoHeight attributes represent the width and height of the previous stream or the current one? I agree that if we start exposing things like sampling rate or want to support arbitrary chained Ogg, then there is a problem. I think we already have a problem with width and height for chained Ogg and we cannot stop people from putting chained Ogg into the @src. I actually took this discussion away from MPEG PTM, which is where Eric's question came from, because I don't understand how it works with MPEG. But I can see that it's not just a problem of MPEG, but also of Ogg (and possibly of WebM which can have multiple Segments). So, I think we need a generic solution for it. The characteristics of an Apple HTTP live stream can change on the fly. For example if the user's bandwidth to the streaming server changes, the video width and height can change as the stream resolution is switched up or down, or the number of tracks can change when a stream switches from video+audio to audio only. In addition, a server can insert segments with different characteristics into a stream on the fly, eg. inserting an ad or emergency announcement. It is not possible to predict these changes before they occur. eric For commercial video providers, the tracks in a live stream change all the time; this is not limited to audio and video tracks but would include text tracks as well. OK, all this indicates to me that we probably want a metadatachanged event to indicate there has been a change and that JS may need to check some of its assumptions. Silvia.
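A sketch of how such an event might be consumed; note that 'metadatachange' here is hypothetical and was never added to the spec:

  var v = document.querySelector('video');
  v.addEventListener('metadatachange', function () { // hypothetical event
    // Re-check everything that was cached at loadedmetadata time:
    // track constitution, intrinsic size, duration, and so on.
    console.log(v.videoWidth, v.videoHeight, v.duration);
  });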
Re: [whatwg] Video feedback
On Sat, 04 Jun 2011 03:39:58 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: On Fri, Jun 3, 2011 at 9:28 AM, Ian Hickson i...@hixie.ch wrote: On Thu, 16 Dec 2010, Silvia Pfeiffer wrote: I do not know how technically the change of stream composition works in MPEG, but in Ogg we have to end a current stream and start a new one to switch compositions. This has been called sequential multiplexing or chaining. In this case, stream setup information is repeated, which would probably lead to creating a new steam handler and possibly a new firing of loadedmetadata. I am not sure how chaining is implemented in browsers. Per spec, chaining isn't currently supported. The closest thing I can find in the spec to this situation is handling a non-fatal error, which causes the unexpected content to be ignored. On Fri, 17 Dec 2010, Eric Winkelman wrote: The short answer for changing stream composition is that there is a Program Map Table (PMT) that is repeated every 100 milliseconds and describes the content of the stream. Depending on the programming, the stream's composition could change entering/exiting every advertisement. If this is something that browser vendors want to support, I can specify how to handle it. Anyone? Icecast streams have chained files, so streaming Ogg to an audio element would hit this problem. There is a bug in FF for this: https://bugzilla.mozilla.org/show_bug.cgi?id=455165 (and a duplicate bug at https://bugzilla.mozilla.org/show_bug.cgi?id=611519). There's also a webkit bug for icecast streaming, which is probably related https://bugs.webkit.org/show_bug.cgi?id=42750 . I'm not sure how Opera is able to deal with icecast streams, but it seems to deal with it. The thing is: you can implement playback and seeking without any further changes to the spec. But then the browser-internal metadata states will change depending on the chunk you're on. Should that also update the exposed metadata in the API then? Probably yes, because otherwise the JS developer may deal with contradictory information. Maybe we need a metadatachange event for this? An Icecast stream is conceptually just one infinite audio stream, even though at the container level it is several chained Ogg streams. duration will be Infinity and currentTime will be constantly increasing. This doesn't seem to be a case where any spec change is needed. Am I missing something? -- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Video feedback
On Fri, 03 Jun 2011 01:28:45 +0200, Ian Hickson i...@hixie.ch wrote: On Fri, 22 Oct 2010, Simon Pieters wrote: Actually it was me, but that's OK :) There was also some discussion about metadata. Language is sometimes necessary for the font engine to pick the right glyph. Could you elaborate on this? My assumption was that we'd just use CSS, which doesn't rely on language for this. It's not in any spec that I'm aware of, but some browsers (including Opera) pick different glyphs depending on the language of the text, which really helps when rendering CJK when you have several CJK fonts on the system. Browsers will already know the language from track srclang, so this would be for external players. How is this problem solved in SRT players today? Not at all, it seems. Both VLC and Totem allow setting the character encoding and font used for subtitles in the (global) preferences menu, so presumably you would change that if the default doesn't work. Font switching seems to mainly be an issue when your system has other default fonts than the text you're reading, and it appears that is rare enough that very little software does anything about it, browsers perhaps being an exception. On Mon, 3 Jan 2011, Philip Jägenstedt wrote: * The bad cue handling is stricter than it should be. After collecting an id, the next line must be a timestamp line. Otherwise, we skip everything until a blank line, so in the following the parser would jump to bad cue on line 2 and skip the whole cue:

1
2
00:00:00.000 --> 00:00:01.000
Bla

This doesn't match what most existing SRT parsers do, as they simply look for timing lines and ignore everything else. If we really need to collect the id instead of ignoring it like everyone else, this should be more robust, so that a valid timing line always begins a new cue. Personally, I'd prefer if it were simply ignored and we used some form of in-cue markup for styling hooks. The IDs are useful for referencing cues from script, so I haven't removed them. I've also left the parsing as is for when neither the first nor second line is a timing line, since that gives us a lot of headroom for future extensions (we can do anything so long as the second line doesn't start with a timestamp and "-->" and another timestamp). In the case of feeding future extensions to current parsers, it's way better fallback behavior to simply ignore the unrecognized second line than to discard the entire cue. The current behavior seems unnecessarily strict and makes the parser more complicated than it needs to be. My preference is to just ignore anything preceding the timing line, but even if we must have IDs it can still be made simpler and more robust than what is currently spec'ed. If we just ignore content until we hit a line that happens to look like a timing line, then we are much more constrained in what we can do in the future. For example, we couldn't introduce a comment block syntax, since any comment containing a timing line wouldn't be ignored. On the other hand, if we keep the syntax as it is now, we can introduce a comment block just by having its first line include a "-->" but not have it match the timestamp syntax, e.g. by having it be "--> COMMENT" or some such. One of us must be confused; do you mean something like this?

1
--> COMMENT
00:00.000 --> 00:01.000
Cue text

Adding this syntax would break the *current* parser, as it would fail in step 39 (Collect WebVTT cue timings and settings) and then skip the rest of the cue. 
If we want any room for extensions along these lines, then multiple lines preceding the timing line must be handled gracefully. Looking at the parser more closely, I don't really see how doing anything more complex than skipping the block entirely would be simpler than what we have now, anyway. I suggest:

* Step 31: Try to collect WebVTT cue timings and settings instead of checking for the substring "-->". If it succeeds, jump to what is now step 40. If it fails, continue at what is now step 32. (This allows adding any syntax as long as it doesn't exactly match a timing line, including "--> COMMENT". As a bonus, one can fail faster when trying to parse an entire timing line rather than doing a substring search for "-->".)

* Step 32: Only set the id line if it's not already set. (Assuming we want the first line to be the id line in future extensions.)

* Step 39: Jump to the new step 31.

In case not every detail is correct, the idea is to first try to match a timing line and to take the first line that is not a timing line (if any) as the id, leaving everything in between open for future syntax changes, even if they use "-->". I think it's fairly important that we handle this. Doubled id lines are an easy mistake to make when copying things around. Silently dropping those cues would be worse than what many existing (line-based, id-ignoring) SRT parsers do. On Sat, 22 Jan 2011,
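A sketch of the proposed strategy in script form; this illustrates the idea rather than the spec's actual parser, and the timing-line pattern is simplified:

  // True if the line parses as a WebVTT timing line (settings ignored).
  function isTimingLine(line) {
    return /^(\d+:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(\d+:)?\d{2}:\d{2}\.\d{3}/.test(line);
  }

  // Given the lines of one cue block, take the first non-timing line as
  // the id and skip any other pre-timing lines, leaving them open for
  // future syntax (comment blocks, doubled ids, etc.).
  function parseCueBlock(lines) {
    var id = null;
    for (var i = 0; i < lines.length; i++) {
      if (isTimingLine(lines[i])) {
        return { id: id, timing: lines[i], text: lines.slice(i + 1).join('\n') };
      }
      if (id === null) id = lines[i];
    }
    return null; // no timing line at all: not a cue
  }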
Re: [whatwg] Video feedback
I'll be replying to WebVTT related stuff in a separate thread. Here just feedback on the other stuff. (Incidentally: why is there details element feedback in here with video? I don't really understand the connection.) On Fri, Jun 3, 2011 at 9:28 AM, Ian Hickson i...@hixie.ch wrote: On Thu, 16 Dec 2010, Silvia Pfeiffer wrote: I do not know how technically the change of stream composition works in MPEG, but in Ogg we have to end a current stream and start a new one to switch compositions. This has been called sequential multiplexing or chaining. In this case, stream setup information is repeated, which would probably lead to creating a new stream handler and possibly a new firing of loadedmetadata. I am not sure how chaining is implemented in browsers. Per spec, chaining isn't currently supported. The closest thing I can find in the spec to this situation is handling a non-fatal error, which causes the unexpected content to be ignored. On Fri, 17 Dec 2010, Eric Winkelman wrote: The short answer for changing stream composition is that there is a Program Map Table (PMT) that is repeated every 100 milliseconds and describes the content of the stream. Depending on the programming, the stream's composition could change entering/exiting every advertisement. If this is something that browser vendors want to support, I can specify how to handle it. Anyone? Icecast streams have chained files, so streaming Ogg to an audio element would hit this problem. There is a bug in FF for this: https://bugzilla.mozilla.org/show_bug.cgi?id=455165 (and a duplicate bug at https://bugzilla.mozilla.org/show_bug.cgi?id=611519). There's also a webkit bug for icecast streaming, which is probably related https://bugs.webkit.org/show_bug.cgi?id=42750 . I'm not sure how Opera is able to deal with icecast streams, but it seems to deal with it. The thing is: you can implement playback and seeking without any further changes to the spec. But then the browser-internal metadata states will change depending on the chunk you're on. Should that also update the exposed metadata in the API then? Probably yes, because otherwise the JS developer may deal with contradictory information. Maybe we need a metadatachange event for this? On Tue, 24 May 2011, Silvia Pfeiffer wrote: Ian and I had a brief conversation recently where I mentioned a problem with extended text descriptions with screen readers (and worse still with braille devices), and the suggestion was that the "paused for user interaction" state of a media element may be the solution. I would like to pick this up and discuss in detail how that would work to confirm my sketchy understanding. *The use case:* In the specification for media elements we have a track kind of "descriptions", which are: Textual descriptions of the video component of the media resource, intended for audio synthesis when the visual component is unavailable (e.g. because the user is interacting with the application without a screen while driving, or because the user is blind). Synthesized as a separate audio track. I'm for now assuming that the synthesis will be done through a screen reader and not through the browser itself, thus making the descriptions available to users as synthesized audio or as braille if the screen reader is set up for a braille device. The textual descriptions are provided as chunks of text with a start and an end time (so-called cues). The cues are processed during video playback as the video's playback time starts to fall within the time frame of the cue. 
Thus, it is expected that the cues are consumed during the cue's time frame and are not present any more when the end time of the cue is reached, so they don't conflict with the video's normal audio. However, on many occasions, it is not possible to consume the cue text in the given time frame. In particular not in the following situations:

1. The screen reader takes longer to read out the cue text than the cue's time frame provides for. This is particularly the case with long cue text, but also when the screen reader's reading rate is slower than what the author of the cue text expected.

2. The braille device is used for reading. Since reading braille is much slower than listening to read-out text, the cue time frame will invariably be too short.

3. The user seeked right into the middle of a cue and thus the time frame that is available for reading out the cue text is shorter than the cue author calculated with.

Correct me if I'm wrong, but it seems that what we need is a way for the screen reader to pause the video element from continuing to play while the screen reader is still busy delivering the cue text. (In a11y talk: what is required is a means to deal with extended descriptions, which extend the timeline of the video.) Once it's finished presenting, it can resume the video element's playback. Is it a requirement that the user be able to use the regular
Re: [whatwg] Video feedback
On Thu, Jun 2, 2011 at 7:28 PM, Ian Hickson i...@hixie.ch wrote: We can add comments pretty easily (e.g. we could say that '<!' starts a comment and '>' ends it -- that's already being ignored by the current parser), if people really need them. But are comments really that useful? Did SRT have problems due to not supporting inline comments? (Or did it support inline comments?) I've only worked with SSA subtitles (fansubbing), where {text in braces} effectively worked as a comment. We used them a lot to communicate between editors on a phrase-by-phrase basis. But for that use case, using hidden spans makes more sense, since you can toggle them on and off to view them inline, etc. Given that, I'd be fine with a comment format that doesn't allow mid-cue comments, if it makes the format simpler. The text on the left is a transcription, the top is a transliteration, and the bottom is a translation. Aren't these three separate text tracks? They're all in the same track, in practice, since media players don't play multiple subtitle tracks. It's true that having them in separate tracks would be better, so they can be disabled individually. This is probably rare enough that it should just be sorted out with scripts, at least to start. It's not clear to me that we need language information to apply proper font selection and word wrapping, since CSS doesn't do it. But it doesn't have to, since HTML does this with @lang. Mixing one CJK language with one non-CJK language seems fine. That should always work, assuming you specify good fonts in the CSS. The font is ultimately in the user's control. I tell Firefox to always use Tahoma for Western text and MS Gothic for Japanese text, ignoring the often ugly site-specified fonts. The only control sites have over my fonts is the language they say the text is (or which the whole page is detected as). The same principle seems to apply for captions. (That's not to say that it's important enough to add yet and I'm fine with punting on this, at least for now. I just don't think specifying fonts is the right solution.) The most straightforward solution would seem to be having @lang be a CSS property; I don't know the rationale for this being done by HTML instead. I don't understand why we can't have good typography for CJK and non-CJK together. Surely there are fonts that get both right? I've never seen a Japanese font that didn't look terrible for English text. Also, I don't want my font selection to be severely limited due to the need to use a single font for both languages, instead of using the right font for the right text. One example of how this can be tricky: at 0:17, a caption on the bottom wraps and takes two lines, which then pushes the line at 0:19 upward (that part's simple enough). If instead the top part had appeared first, the renderer would need to figure out in advance to push it upwards, to make space for the two-line caption underneath it. Otherwise, the captions would be forced to switch places. Right, without lookahead I don't know how you'd solve it. With lookahead things get pretty dicey pretty quickly. The problem is that, at least here, the whole scene is nearly incomprehensible if the top/bottom arrangement isn't maintained. Lacking anything better, I suspect authors would use similar brittle hacks with WebVTT. Anyway, I don't have a simple solution either. I think that, no matter what you do, people will insert line breaks in cues. 
I'd follow the HTML model here: convert newlines to spaces and have a separate, explicit line break like <br> if needed, so people don't manually line-break unless they actually mean to. The line-breaks-are-line-breaks feature is one of the features that originally made SRT seem like a good idea. It still seems like the neatest way of having a line break. But does this matter? Line breaks within a cue are relatively uncommon in my experience (perhaps it's different for other languages), compared to how many people will insert line breaks in a text editor simply to break lines while authoring. If you do this while testing on a large monitor, it's likely to look reasonable when rendered; the brokenness won't show up until it's played in a smaller window. Anyone using a non-programmer's text editor that doesn't handle long lines cleanly is likely to do this. Wrapping lines manually in SRTs also appears to be common (even standard) practice, perhaps due to inadequate line wrapping in SRT renderers. Making line breaks explicit should help keep people from translating this habit to WebVTT. Related to line breaking, should there be an &nbsp; escape? Inserting NBSP characters literally into files is somewhat annoying for authoring, since they're indistinguishable from regular spaces. How common would &nbsp; be? I guess the main cases I've used &nbsp; for don't apply so much to captions, eg. ©&nbsp;2011 (likely to come at the start of a caption, so not likely to be wrapped anyway). We
Re: [whatwg] video feedback
On Feb 9, 2010, at 9:03 PM, Ian Hickson wrote: On Sat, 31 Oct 2009, Brian Campbell wrote: As a multimedia developer, I am wondering about the purpose of the timeupdate event on media elements. Its primary use is keeping the UIs updated (specifically the timers and the scrubber bars). On first glance, it would appear that this event would be useful for synchronizing animations, bullets, captions, UI, and the like. Synchronising accompanying slides and animations won't work that well with an event, since you can't guarantee the timing of the event or anything like that. For anything where we want reliable synchronisation of multiple media, I think we need a more serious solution -- either something like SMIL, or the SMIL subset found in SVG, or some other solution. Yes, but that doesn't exist at the moment, so our current choices are to use timeupdate and to use setInterval(). At 4 timeupdate events per second, it isn't all that useful. I can replace it with setInterval, at whatever rate I want, query the time, and get the synchronization I need, but that makes the timeupdate event seem to be redundant. The important thing with timeupdate is that it also fires whenever the time changes in a significant way, e.g. immediately after a seek, or when reaching the end of the resource, etc. Also, the user agent can start lowering the rate in the face of high CPU load, which makes it more user-friendly than setInterval(). I agree, it is important to be able to reduce the rate in the face of high CPU load, but as currently implemented in WebKit, if you use timeupdate to keep anything in sync with the video, it feels fairly laggy and jerky. This means that for higher quality synchronization, you need to use setInterval, which defeats the purpose of making timeupdate more user friendly. Perhaps this is just a bug I should file against WebKit, as they are choosing an update interval at the extreme end of the allowed range for their default behavior; but I figured that it might make sense to mention a reasonable default value (such as 30 times per second, or once per frame displayed) in the spec, to give some guidance to browser vendors about what authors will be expecting. On Thu, 5 Nov 2009, Brian Campbell wrote: Would something like video firing events for every frame rendered help you out? This would also help fix the canvas over/under painting issue and improve synchronization. Yes, this would be considerably better than what is currently specced. There surely is a better solution than copying data from the video element to a canvas on every frame, for whatever problem that solves. What is the actual use case where you'd do that? This was not my use case (my use case was just synchronizing bullets, slide transitions, and animations to video), but an example I can think of is using this to composite video. Most (if not all) video formats supported by video in the various browsers do not store alpha channel information. In order to composite video against a dynamic background, authors may copy video data to a canvas, then paint transparent to all pixels matching a given color. This use case would clearly be better served by video formats that include alpha information, and implementations that support compositing video over other content, but given that we're having trouble finding any video format at all that the browsers can agree on, this seems to be a long way off, so stop-gap measures may be useful in the interim. 
Compositing video over dynamic content is actually an extremely important use case for rich, interactive multimedia, which I would like to encourage browser vendors to implement, but I'm not even sure where to start, given the situation on formats and codecs. I believe I've seen this discussed in Theora, but never went anywhere, and I don't have any idea how I'd even start getting involved in the MPEG standardization process. On Thu, 5 Nov 2009, Andrew Scherkus wrote: I'll see if we can do something for WebKit based browsers, because today it literally is hardcoded to 250ms for all ports. http://trac.webkit.org/browser/trunk/WebCore/html/HTMLMediaElement.cpp#L1254 Maybe we'll end up firing events based on frame updates for video, and something arbitrary for audio (as it is today). I strongly recommend making the ontimeupdate rate be sensitive to system load, and no faster than one frame per second. I'm assuming that you mean no faster than once per frame? On Fri, 6 Nov 2009, Philip Jägenstedt wrote: We've considered firing it for each frame, but there is one problem. If people expect that it fires once per frame they will probably write scripts which do frame-based animations by moving things n pixels per frame or similar. Some animations are just easier to do this way, so there's no reason to think that people won't do it. This will break horribly if a browser is
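For reference, a sketch of the two synchronisation approaches being compared in this exchange; the cue list is invented for illustration, and backward seeks are ignored for brevity:

  var v = document.querySelector('video');
  var cues = [{ time: 1.0, text: 'first bullet' },
              { time: 3.5, text: 'second bullet' }];
  var next = 0;

  function sync() {
    while (next < cues.length && v.currentTime >= cues[next].time) {
      console.log(cues[next++].text); // show bullet, start animation, etc.
    }
  }

  // Coarse but load-friendly: the firing rate is implementation-defined.
  v.addEventListener('timeupdate', sync);
  // The finer-grained alternative discussed above: poll on a timer.
  // setInterval(sync, 33);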
Re: [whatwg] video feedback
On Feb 10, 2010, at 8:01 AM, Brian Campbell wrote: On Feb 9, 2010, at 9:03 PM, Ian Hickson wrote: On Sat, 31 Oct 2009, Brian Campbell wrote: At 4 timeupdate events per second, it isn't all that useful. I can replace it with setInterval, at whatever rate I want, query the time, and get the synchronization I need, but that makes the timeupdate event seem to be redundant. The important thing with timeupdate is that it also fires whenever the time changes in a significant way, e.g. immediately after a seek, or when reaching the end of the resource, etc. Also, the user agent can start lowering the rate in the face of high CPU load, which makes it more user-friendly than setInterval(). I agree, it is important to be able to reduce the rate in the face of high CPU load, but as currently implemented in WebKit, if you use timeupdate to keep anything in sync with the video, it feels fairly laggy and jerky. This means that for higher quality synchronization, you need to use setInterval, which defeats the purpose of making timeupdate more user friendly. Perhaps this is just a bug I should file against WebKit, as they are choosing an update interval at the extreme end of the allowed range for their default behavior; but I figured that it might make sense to mention a reasonable default value (such as 30 times per second, or once per frame displayed) in the spec, to give some guidance to browser vendors about what authors will be expecting. I disagree that 30 times per second is a reasonable default. I understand that it would be useful for what you want to do, but your use case is not typical. I think most pages won't listen for 'timeupdate' events at all, so instead of making every page incur the extra overhead of waking up, allocating, queueing, and firing an event 30 times per second, WebKit sticks with the minimum frequency the spec mandates, figuring that people like you who need something more can roll their own. On Thu, 5 Nov 2009, Brian Campbell wrote: Would something like video firing events for every frame rendered help you out? This would also help fix the canvas over/under painting issue and improve synchronization. Yes, this would be considerably better than what is currently specced. There surely is a better solution than copying data from the video element to a canvas on every frame, for whatever problem that solves. What is the actual use case where you'd do that? This was not my use case (my use case was just synchronizing bullets, slide transitions, and animations to video), but an example I can think of is using this to composite video. Most (if not all) video formats supported by video in the various browsers do not store alpha channel information. In order to composite video against a dynamic background, authors may copy video data to a canvas, then paint transparent to all pixels matching a given color. This use case would clearly be better served by video formats that include alpha information, and implementations that support compositing video over other content, but given that we're having trouble finding any video format at all that the browsers can agree on, this seems to be a long way off, so stop-gap measures may be useful in the interim. Compositing video over dynamic content is actually an extremely important use case for rich, interactive multimedia, which I would like to encourage browser vendors to implement, but I'm not even sure where to start, given the situation on formats and codecs. 
I believe I've seen this discussed for Theora, but it never went anywhere, and I don't have any idea how I'd even start getting involved in the MPEG standardization process. Have you actually tried this? Rendering video frames to a canvas and processing every pixel from script is *extremely* processor intensive; you are unlikely to get a reasonable frame rate. H.264 does support alpha (see the AVC spec, 2nd edition, section 7.3.2.1.2, Sequence parameter set extension), but we do not support it correctly in WebKit at the moment. *Please* file bugs against WebKit if you would like to see this properly supported. QuickTime movies support alpha for a number of video formats (eg. PNG, Animation, Lossless, etc), you might give that a try. eric
Re: [whatwg] video feedback
On 2/10/10 1:37 PM, Eric Carlson wrote: Have you actually tried this? Rendering video frames to a canvas and processing every pixel from script is *extremely* processor intensive, you are unlikely to get reasonable frame rate. There's a demo that does just this at http://people.mozilla.com/~prouget/demos/green/green.xhtml -Boris
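A minimal sketch of the technique such demos use; the threshold values are arbitrary and a real key needs tuning (and the video must be same-origin for getImageData to work):

  var v = document.querySelector('video');
  var c = document.querySelector('canvas');
  var ctx = c.getContext('2d');

  function paint() {
    if (v.paused || v.ended) return;
    ctx.drawImage(v, 0, 0, c.width, c.height);
    var frame = ctx.getImageData(0, 0, c.width, c.height);
    var d = frame.data; // RGBA bytes
    for (var i = 0; i < d.length; i += 4) {
      // Crude green-screen test on r/g/b at d[i], d[i+1], d[i+2].
      if (d[i] < 100 && d[i + 1] > 150 && d[i + 2] < 100) {
        d[i + 3] = 0; // make the pixel transparent
      }
    }
    ctx.putImageData(frame, 0, 0);
    setTimeout(paint, 16); // roughly once per frame
  }
  v.addEventListener('play', paint);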
Re: [whatwg] video feedback
On Feb 10, 2010, at 1:37 PM, Eric Carlson wrote: On Feb 10, 2010, at 8:01 AM, Brian Campbell wrote: On Feb 9, 2010, at 9:03 PM, Ian Hickson wrote: On Sat, 31 Oct 2009, Brian Campbell wrote: At 4 timeupdate events per second, it isn't all that useful. I can replace it with setInterval, at whatever rate I want, query the time, and get the synchronization I need, but that makes the timeupdate event seem to be redundant. The important thing with timeupdate is that it also fires whenever the time changes in a significant way, e.g. immediately after a seek, or when reaching the end of the resource, etc. Also, the user agent can start lowering the rate in the face of high CPU load, which makes it more user-friendly than setInterval(). I agree, it is important to be able to reduce the rate in the face of high CPU load, but as currently implemented in WebKit, if you use timeupdate to keep anything in sync with the video, it feels fairly laggy and jerky. This means that for higher quality synchronization, you need to use setInterval, which defeats the purpose of making timeupdate more user friendly. Perhaps this is just a bug I should file to WebKit, as they are choosing an update interval at the extreme end of the allowed range for their default behavior; but I figured that it might make sense to mention a reasonable default value (such as 30 times per second, or once per frame displayed) in the spec, to give some guidance to browser vendors about what authors will be expecting. I disagree that 30 times per second is a reasonable default. I understand that it would be useful for what you want to do, but your use case is not a typical. I think most pages won't listen for 'timeupdate' events at all so instead of making every page incur the extra overhead of waking up, allocating, queueing, and firing an event 30 times per second, WebKit sticks with the minimum frequency the spec mandates figuring that people like you that need something more can roll their own. Do browsers fire events for which there are no listeners? It seems like it would be easiest to just not fire these events if no one is listening to them. And as Ian pointed out, just basic video UI can be better served by having at least 10 updates per second, if you want to show time at a resolution of tenths of a second. On Thu, 5 Nov 2009, Brian Campbell wrote: Would something like video firing events for every frame rendered help you out? This would help also fix the canvas over/under painting issue and improve synchronization. Yes, this would be considerably better than what is currently specced. There surely is a better solution than copying data from the video element to a canvas on every frame for whatever the problem that that solves is. What is the actual use case where you'd do that? This was not my use case (my use case was just synchronizing bullets, slide transitions, and animations to video), but an example I can think of is using this to composite video. Most (if not all) video formats supported by video in the various browsers do not store alpha channel information. In order to composite video against a dynamic background, authors may copy video data to a canvas, then paint transparent to all pixels matching a given color. 
This use case would clearly be better served by video formats that include alpha information, and implementations that support compositing video over other content, but given that we're having trouble finding any video format at all that the browsers can agree on, this seems to be a long way off, so stop-gap measures may be useful in the interim. Compositing video over dynamic content is actually an extremely important use case for rich, interactive multimedia, which I would like to encourage browser vendors to implement, but I'm not even sure where to start, given the situation on formats and codecs. I believe I've seen this discussed in Theora, but never went anywhere, and I don't have any idea how I'd even start getting involved in the MPEG standardization process. Have you actually tried this? Rendering video frames to a canvas and processing every pixel from script is *extremely* processor intensive, you are unlikely to get reasonable frame rate. Mozilla has a demo of this working, in Firefox only: https://developer.mozilla.org/samples/video/chroma-key/index.xhtml But no, this isn't something I would consider to be production quality. But perhaps if the WebGL typed arrays catch on, and start being used in more places, you might be able to start doing this with reasonable performance. The H.262 does support alpha (see AVC spec 2nd edition, section 7.3.2.1.2 Sequence parameter set extension), but we do not support it correctly in WebKit at the moment. *Please* file bugs against WebKit if you would like to see this properly supported. QuickTime movies support alpha for
Re: [whatwg] video feedback
On 2/10/10 2:19 PM, Brian Campbell wrote: Do browsers fire events for which there are no listeners? It varies. Gecko, for example, fires image load events no matter what, but only fires mutation events if there are listeners. -Boris
Re: [whatwg] video feedback
On Wed, Feb 10, 2010 at 11:29 AM, Boris Zbarsky bzbar...@mit.edu wrote: On 2/10/10 2:19 PM, Brian Campbell wrote: Do browsers fire events for which there are no listeners? It varies. Gecko, for example, fires image load events no matter what, but only fires mutation events if there are listeners. However, checking for listeners has a non-trivial cost. You have to walk the full parentNode chain and see if any of the parents has a listener. This applies to both bubbling and non-bubbling events due to the capture phase. Also, a feature which requires implementations to optimize for the feature not being used seems like a questionable feature to me. We want people to use the stuff we're creating; there's little point otherwise. / Jonas
Re: [whatwg] video feedback
On Thu, Feb 11, 2010 at 8:19 AM, Brian Campbell lam...@continuation.orgwrote: But no, this isn't something I would consider to be production quality. But perhaps if the WebGL typed arrays catch on, and start being used in more places, you might be able to start doing this with reasonable performance. With WebGL you could do the chroma-key processing on the GPU, and performance should be excellent. In fact you could probably prototype this today in Firefox. Rob -- He was pierced for our transgressions, he was crushed for our iniquities; the punishment that brought us peace was upon him, and by his wounds we are healed. We all, like sheep, have gone astray, each of us has turned to his own way; and the LORD has laid on him the iniquity of us all. [Isaiah 53:5-6]
Re: [whatwg] video feedback
On Wed, Feb 10, 2010 at 4:37 PM, Robert O'Callahan rob...@ocallahan.org wrote: On Thu, Feb 11, 2010 at 8:19 AM, Brian Campbell lam...@continuation.org wrote: But no, this isn't something I would consider to be production quality. But perhaps if the WebGL typed arrays catch on, and start being used in more places, you might be able to start doing this with reasonable performance. With WebGL you could do the chroma-key processing on the GPU, and performance should be excellent. In fact you could probably prototype this today in Firefox. You're not going to get solid professional-quality keying results just by depending on a client-side keying algorithm, even a computationally expensive one, without the ability to perform manual fixups. Being able to manipulate video data on the client is a powerful tool, but it's not necessarily the right tool for every purpose.
Re: [whatwg] video feedback
On Thu, Feb 11, 2010 at 3:01 AM, Brian Campbell lam...@continuation.org wrote: On Feb 9, 2010, at 9:03 PM, Ian Hickson wrote: On Sat, 7 Nov 2009, Silvia Pfeiffer wrote: I use timeupdate to register a callback that will update captions/subtitles. That's only a temporary situation, though, so it shouldn't inform our decision. We should in due course develop much better solutions for captions and time-synchronised animations. The problem is, due to the slow pace of standards and browser development, we can sometimes be stuck with a temporary feature for many years. How long until enough IE users support HTML6 (or whatever standard includes a time-synchronization feature) for it to be usable? 10, 15 years?

Even when we have a standard means of associating captions/subtitles with audio/video, we will still want to allow for overriding the default presentation and doing it all in JavaScript ourselves. I have just been pointed to a cool lyrics demo at http://svg-wow.org/audio/animated-lyrics.html which uses an audio file and essentially a caption file to display the lyrics in sync in SVG. Problem is: they are using setInterval and setTimeout on the audio, and that breaks synchronisation for me, probably because loading the audio over the network takes longer than no time at all. Honestly, you cannot use setInterval for synchronising with a/v. You really need timeupdate.

Maybe one option for pages that need a higher event firing rate than the browser's default is to introduce a JavaScript API that lets it be set to anything between once per frame (25Hz) and every 250ms (4Hz)? I'm just wary of what it may do to the responsiveness of the browser, and whether the browser could refuse if it knew it would kill performance. Cheers, Silvia.
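For illustration, a sketch of the pattern Silvia recommends: driving the display from the media clock via timeupdate, so a slow network load cannot throw it off. The cue data and element ids here are made up:

    // 'song' and 'lyrics' are hypothetical elements; cue times are made up.
    const audio = document.getElementById('song');
    const display = document.getElementById('lyrics');
    const cues = [            // [startTime, endTime, text], in seconds
      [0.0, 2.5, 'First line of the lyrics'],
      [2.5, 5.0, 'Second line of the lyrics'],
    ];
    audio.addEventListener('timeupdate', function () {
      // currentTime, not a wall-clock timer, decides which cue shows.
      const t = audio.currentTime;
      const cue = cues.find(c => t >= c[0] && t < c[1]);
      display.textContent = cue ? cue[2] : '';
    });

Because the handler reads currentTime, it stays correct even at the lowest timeupdate firing rate; a higher rate would only make cue changes less laggy, never wrong.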
Re: [whatwg] video feedback
On Thu, 26 Mar 2009, Matthew Gregan wrote: At 2009-03-25T10:16:32+0000, Ian Hickson wrote: On Fri, 13 Mar 2009, Matthew Gregan wrote: It's possible that neither a 'play' nor 'playing' event will be fired when a media element that has ended playback is played again. When first played, paused is set to false. When played again, playback has ended, so play() seeks to the beginning, but paused does not change (as it's already false), so the substeps that may fire play or playing are not run.

'playing' should fire, though, since the readyState will have dropped down to HAVE_CURRENT_DATA when the clip is ended, and will drop back up to HAVE_FUTURE_DATA after seeking.

Right, so your intention is to interpret it thusly: readyState becomes HAVE_CURRENT_DATA when playback ends because it's not possible for the playback position to advance any further, and thus it's not possible to have data beyond the current playback position (which HAVE_FUTURE_DATA is predicated upon). Makes sense, but can the spec be made clearer about the behaviour in this case? HAVE_FUTURE_DATA talks about advancing *without reverting to HAVE_METADATA*, which doesn't apply in this case because we have all the data available locally.

Clarified.

Based on that interpretation, when the user sets playbackRate to -1 after playback ends, the readyState would change from HAVE_CURRENT_DATA to HAVE_FUTURE_DATA because the current playback position can now advance.

I've made a bunch of changes to fix how things work when the direction of playback is backwards; there were some odd things in the way it was defined before (for example, the previous definition actually had the playback position go infinitely negative and didn't stop at the start of the clip!).

Following this logic, if playbackRate is set to 0 at any time, the readyState becomes HAVE_ENOUGH_DATA, as advancing the playback position by 0 units means the playback position can never overtake the available data before playback ends. Except this case seems to be specially handled by: The playbackRate can be 0.0, in which case the current playback position doesn't move, despite playback not being paused (paused doesn't become true, and the pause event doesn't fire). ...which uses the term move rather than advance, but suggests that advancing the playback position by 0 isn't considered advancing, which seems logical.

I've clarified the uses of advance that I could find. Let me know if the spec is still ambiguous. Thanks! -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
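A small harness for observing the behaviour under discussion; it is purely illustrative and just logs the event order and readyState transitions when an ended clip is replayed:

    const v = document.querySelector('video');
    ['play', 'playing', 'ended', 'seeking', 'seeked'].forEach(function (type) {
      v.addEventListener(type, function () {
        console.log(type, 'readyState=' + v.readyState, 'paused=' + v.paused);
      });
    });
    // Replaying after 'ended': per the interpretation above, 'playing'
    // should fire again (readyState dips to HAVE_CURRENT_DATA and
    // recovers), even though paused never changed back to true.
    v.addEventListener('ended', function () { v.play(); });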
Re: [whatwg] video feedback
At 2009-03-25T10:16:32+0000, Ian Hickson wrote: On Fri, 13 Mar 2009, Matthew Gregan wrote: It's possible that neither a 'play' nor 'playing' event will be fired when a media element that has ended playback is played again. When first played, paused is set to false. When played again, playback has ended, so play() seeks to the beginning, but paused does not change (as it's already false), so the substeps that may fire play or playing are not run.

'playing' should fire, though, since the readyState will have dropped down to HAVE_CURRENT_DATA when the clip is ended, and will drop back up to HAVE_FUTURE_DATA after seeking.

Right, so your intention is to interpret it thusly: readyState becomes HAVE_CURRENT_DATA when playback ends because it's not possible for the playback position to advance any further, and thus it's not possible to have data beyond the current playback position (which HAVE_FUTURE_DATA is predicated upon). Makes sense, but can the spec be made clearer about the behaviour in this case? HAVE_FUTURE_DATA talks about advancing *without reverting to HAVE_METADATA*, which doesn't apply in this case because we have all the data available locally. (Also, note that after the seek it'd return directly to HAVE_ENOUGH_DATA in the case I'm talking about, since the media is fully cached. That still requires a 'playing' event to fire, so that's fine.)

Based on that interpretation, when the user sets playbackRate to -1 after playback ends, the readyState would change from HAVE_CURRENT_DATA to HAVE_FUTURE_DATA because the current playback position can now advance. Following this logic, if playbackRate is set to 0 at any time, the readyState becomes HAVE_ENOUGH_DATA, as advancing the playback position by 0 units means the playback position can never overtake the available data before playback ends. Except this case seems to be specially handled by: The playbackRate can be 0.0, in which case the current playback position doesn't move, despite playback not being paused (paused doesn't become true, and the pause event doesn't fire). ...which uses the term move rather than advance, but suggests that advancing the playback position by 0 isn't considered advancing, which seems logical.

Cheers, -mjg -- Matthew Gregan |/ /|kine...@flim.org
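For reference when reading the exchange above, these are the numeric readyState values HTML defines for media elements:

    // readyState constants on HTMLMediaElement (per the HTML spec):
    // HAVE_NOTHING      = 0  (no information about the media resource yet)
    // HAVE_METADATA     = 1  (duration and dimensions known, no frame data)
    // HAVE_CURRENT_DATA = 2  (data for the current position, none beyond it)
    // HAVE_FUTURE_DATA  = 3  (data beyond the current position; can advance)
    // HAVE_ENOUGH_DATA  = 4  (enough data that playback can likely keep up)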