Re: [whatwg] Video with MIME type application/octet-stream

2010-12-08 Thread Ian Hickson

Long story short: I haven't changed the spec where it talks about video, 
source type, Content-Type, and direct file inspection for type 
determination. My plan is to just wait and see what browsers do and update 
the spec accordingly in due course. This is mostly because we clearly have 
a wide range of opinions regarding what the right behaviour is, 
implementations are still changing, and implementors often disagree with 
their own implementations at this stage.


On Tue, 31 Aug 2010, Boris Zbarsky wrote:
 On 8/31/10 3:36 AM, Ian Hickson wrote:
   You might say "Hey, but aren't you content sniffing then to find the 
   codecs?" and you'd be right. But in this case we're respecting the 
   MIME type sent by the server - it tells the browser, to whatever 
   level of detail it wants (including codecs if needed), what type it 
   is sending. If the server sends 'text/plain' or 'video/x-matroska' I 
   wouldn't expect a browser to sniff it for Ogg content.
  
  The Microsoft guys responded to my suggestion that they might want to 
  implement something like this with "what's the benefit of doing 
  that?".
 
 One obvious benefit is that videos with the wrong type will not work, 
 and hence videos will be sent with the right type.
 
 If the question is what the benefits of that are, one is that the "view 
 video in new window" context menu option actually works.
 Another benefit is that you can send someone the link to the video, 
 instead of the embedding page, and it will work.
 Another is that when you save the video to disk the browser will fix up 
 the extension correctly, if needed.

I think that they would argue that these should work either way, with the 
same sniffing being used to ensure it works in all of these places.


  It seems that sniffing is context-sensitive.
 
 Yes, but one issue is that we really do want resources to be usable 
 outside the context the page happens to want to put them in.
 
 The ship has sailed on <img>, clearly, and is working on sailing on 
 <video>, but I feel that the behavior IE and Chrome are implementing 
 here is highly detrimental to the web.  Not that they care much.

  Sadly, the boat has sailed for text/html and XML at this point, but 
  for binary types, and for contexts where text/plain isn't a contender, 
  why bother doing anything but sniff?
 
 See above.  As long as some contexts are sniffing and some are not, we 
 have a problem.  If it were all-sniff (with the same algorithm across 
 the board!) or all-not-sniff, we might be ok.

I could go either way, but I think the road to all-sniff is less steep.


On Tue, 31 Aug 2010, Boris Zbarsky wrote:
 On 8/31/10 9:57 AM, Anne van Kesteren wrote:
   
   If the question is what the benefits of that are, one is that the 
   "view video in new window" context menu option actually works.
  
  If you sniff you can sniff there too.
 
 Not really, since it's just rendering in a toplevel browser window.  Or 
 rather... one could, but sniffing or not depending on something other 
 than the state of the url bar and the server response in toplevel 
 browser windows is extremely poor UI.

I'm not sure I follow. It works fine for sniffing JPEGs sent with the 
wrong type; why wouldn't it work for videos too?


   Another benefit is that you can send someone the link to the video, 
   instead of the embedding page, and it will work.
  
  If you sniff you can sniff there too. (Unless that user uses a 
  competitor's browser, but that would be an incentive to encourage that 
  user to use the sniffing browser.)
 
 You can't sniff in a toplevel browser window.  Not the same way that 
 people are sniffing in <video>.  It would break the web.

How so?


On Tue, 31 Aug 2010, Aryeh Gregor wrote:
 
 If you can't come up with any actual problems with what IE is doing, 
 then why is anything else even being considered?

Because a number of people I respect, such as Boris, who also happen to 
have more influence than I, since they are implementors, would rather we 
not determine types based on leading byte comparisons but on the MIME 
type.


 There's a very clear-cut problem with relying on MIME types: MIME types 
 are often wrong and hard for authors to configure, and this is not going 
 to change anytime soon.

Certainly it won't change if we have any sniffing going on. :-)


  Sadly, the boat has sailed for text/html and XML at this point, but 
  for binary types, and for contexts where text/plain isn't a contender, 
  why bother doing anything but sniff?
 
 If this is your position, why doesn't the spec match it?

The spec doesn't reflect my position. It would be quite different if it 
did. :-) It reflects what can be implemented interoperably within the 
constraints put forward by implementors.


On Tue, 31 Aug 2010, Boris Zbarsky wrote:
 On 8/31/10 3:59 PM, Aryeh Gregor wrote:
  On Tue, Aug 31, 2010 at 10:35 AM, Boris Zbarsky bzbar...@mit.edu 
  wrote:
   You can't sniff in a toplevel browser window.  Not the same way that 
   people are 

Re: [whatwg] Video with MIME type application/octet-stream

2010-12-08 Thread Boris Zbarsky

On 12/8/10 8:19 PM, Ian Hickson wrote:

You can't sniff in a toplevel browser window.  Not the same way that
people are sniffing in <video>.  It would break the web.


How so?


People actually rely on the not-sniffing behavior of UAs in actual 
browser windows in some cases.  For example, application/octet-stream at 
toplevel is somewhat commonly used to force downloads without a 
corresponding Content-Disposition header (poor practice, but support for 
Content-Disposition hasn't been historically great either).
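The reliance described above can be illustrated with a minimal sketch. It assumes a top-level window that consults only the two response headers and never inspects the body; the function name and the returned strings are hypothetical, not taken from any browser.

```python
def toplevel_action(content_type, content_disposition=None):
    """Decide what a top-level browser window does with a response.

    Simplified assumption: only the two headers are consulted and the
    body is never sniffed.
    """
    if content_disposition is not None:
        # Only the disposition token matters here; parameters such as
        # filename= are ignored by this sketch.
        disposition = content_disposition.split(";", 1)[0].strip().lower()
        if disposition == "attachment":
            return "download"
    if content_type is not None:
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type == "application/octet-stream":
            # The "force a download" convention that sites rely on.
            return "download"
    return "render"
```

If the top-level window sniffed the body the way a video element does, the second branch would start playing an Ogg file that the site intended to be saved to disk.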



(Note that the
spec as it stands takes a compromise position: the content is only
accepted if the Content-Type and type= values are supported types (if
present) and the content sniffs as a supported type, but nothing in the
spec checks that all three values are the same.)
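As a rough illustration of the compromise described in the note above, the following sketch accepts a resource only when every declared type that is present is supported and the bytes sniff as a supported type, without comparing the three values to each other. The `SUPPORTED` set and the function name are assumptions made for this example, not text from the spec.

```python
# Hypothetical set of types this user agent can play; not taken from the spec.
SUPPORTED = {"video/ogg", "video/webm", "audio/wav"}


def accept_resource(content_type, type_attr, sniffed_type):
    """Return True if the resource is accepted under the compromise rule.

    content_type and type_attr may be None when absent; sniffed_type is
    whatever the UA's own byte inspection reported (None if unrecognised).
    """
    # Each declared value, if present, must itself be a supported type.
    for declared in (content_type, type_attr):
        if declared is not None and declared not in SUPPORTED:
            return False
    # The bytes must look like something playable. Note that nothing
    # here checks that the three values agree with one another.
    return sniffed_type in SUPPORTED
```

With these assumptions, `accept_resource("video/ogg", "video/webm", "audio/wav")` is accepted even though all three values differ, which is the property the note points out.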


Ah, I see.  So similar to the way <img> is handled...

I can't quite decide whether this is the best of both worlds, or the 
worst.  ;)


It certainly makes it simpler to implement <video> by delegating to 
QuickTime or the like, though I suspect such an implementation would 
also end up sniffing types the UA doesn't necessarily claim to 
support so maybe it's not simpler after all.


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-16 Thread Mikko Rantalainen
2010-09-13 16:44 EEST: Roger Hågensen:
  On 2010-09-13 15:03, Mikko Rantalainen wrote:
 And why do we need this? Because web servers are not behaving correctly
 and are sending incorrect Content-Type headers? What makes you believe
 that BINID will not be incorrectly used?
 
 Because if they add a binary id then they obviously are aware of the
 standard.

And because Apache developers were obviously aware of the Content-Type
header they implemented it correctly? Unfortunately, that was not the
case. They even thought that Content-Type was important enough that no
response should be without it. Unfortunately, that did not work out. I
also fail to see a future where server software vendors provide perfect
implementations.

The best we can do is to make such errors visible. I think a good UA
could fix such errors automatically but such fixes should not be applied
silently. On the contrary: a good UA should assume that any fix is only
a best effort and there's a good possibility that the resulting content
is not equal to the one the original author tried to provide. As a
result, a good UA should inform the user and possibly give a probability
for the correctness of the content.

 Old servers/software would just pass the file through as they are
 unaware so content type issues still exist there,
 eventually old servers/software are rotated out until most are binary id
 aware.
 This is how rolling out new standards works.
 A server would only add a binary id if none exists and it's certain (by
 previous sniffing) that its guess is correct,

How about we add a new parameter to Content-Type header instead? For
example, the server could send a file with a header such as

Content-Type: text/plain; charset=iso-8859-1; accuracy=0.9

and a conforming user agent should assume that there's a 90% probability
that the given content type is correct. If accuracy is 1.0 (100%) then
sniffing MUST NOT be done. If the sniffing the UA is doing has a 95% hit
rate, the results from such sniffing should probably be used instead of
the HTTP-provided content type if the server-provided accuracy is less than
0.95. I'll make it explicit that any sniffing done by a UA should have a
probability of error attached to the result. A sniffing result without
probability for error has no value because otherwise a literal
text/plain is a good heuristic for any file (see also: Apache).
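A minimal sketch of how a UA might apply the proposed accuracy parameter. The parameter is only a proposal in this thread; the function names are hypothetical, and the behaviour when the parameter is absent or malformed (keep the declared type) is an assumption of this sketch, since the proposal does not state it.

```python
def parse_content_type(header):
    """Split a Content-Type value into (media_type, params)."""
    parts = [p.strip() for p in header.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            name, value = part.split("=", 1)
            params[name.strip().lower()] = value.strip().strip('"')
    return media_type, params


def choose_type(header, sniffed_type, sniff_accuracy):
    """Pick between the declared type and the UA's sniffed type.

    sniff_accuracy is the UA's own estimated hit rate (e.g. 0.95).
    """
    media_type, params = parse_content_type(header)
    raw = params.get("accuracy")
    if raw is None:
        # Assumption: with no accuracy given, keep the declared type.
        return media_type
    try:
        declared_accuracy = float(raw)
    except ValueError:
        # Assumption: an unparsable value is treated like a missing one.
        return media_type
    if declared_accuracy >= 1.0:
        # accuracy=1.0 means sniffing MUST NOT be done.
        return media_type
    if sniffed_type is not None and declared_accuracy < sniff_accuracy:
        return sniffed_type
    return media_type
```

For the header shown above and a sniffer with a 0.95 hit rate that reports `video/ogg`, `choose_type` returns the sniffed type because 0.9 is below 0.95.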

This way server administrators could opt-out from any sniffing and an
incompetent server administrator could specify global accuracy of 0.1 or
something like that. Hopefully new web servers would then either provide
no default accuracy at all or specify some low enough value that allows
for sniffing.

My point is that there's no need for BINID. My suggestion above is
compatible with existing servers, content and UAs, as far as I know. In
addition, it would provide a way to declare that the given content type
should be trusted even if UA thinks that honoring the content type
causes problems for viewing the content.

 Any sniffing would be as a fallback only if the UA suspects the content
 type is wrong (i.e. video of type text for example) or similar,
 and it would not hurt to have some standardized behavior in those cases
 that sniff for something simple like a short binary id rather than parse
 potentially several kilobytes of the stream (which was where this
 discussion took off originally).

Why do you think that such incorrectly transferred videos should be
working? I think we should just specify that such videos will never
work. That would be interoperable and an author of such video would then
have some incentive to fix the Content-Type if he wants to distribute
the content.

I know that this has issues with already existing content which may be
working with some UAs regardless of invalid content type. See the
accuracy parameter above for a possible solution.

-- 
Mikko





Re: [whatwg] Video with MIME type application/octet-stream

2010-09-16 Thread Roger Hågensen

 On 2010-09-16 15:17, Mikko Rantalainen wrote:

2010-09-13 16:44 EEST: Roger Hågensen:

  On 2010-09-13 15:03, Mikko Rantalainen wrote:

And why do we need this? Because web servers are not behaving correctly
and are sending incorrect Content-Type headers? What makes you believe
that BINID will not be incorrectly used?

Because if they add a binary id then they obviously are aware of the
standard.

And because Apache developers were obviously aware of the Content-Type
header they implemented it correctly?... I
also fail to see a future where server software vendors provide perfect
implementations.

We can dream can't we? *smiles*

Old servers/software would just pass the file through as they are
unaware so content type issues still exist there,
eventually old servers/software are rotated out until most are binary id
aware.
This is how rolling out new standards works.
A server would only add a binary id if none exists and it's certain (by
previous sniffing) that its guess is correct,

How about we add a new parameter to Content-Type header instead? For
example, the server could send a file with a header such as

Content-Type: text/plain; charset=iso-8859-1; accuracy=0.9

and a conforming user agent should assume that there's a 90% probability
that the given content type is correct. If accuracy is 1.0 (100%) then
sniffing MUST NOT be done. If the sniffing the UA is doing has a 95% hit
rate, the results from such sniffing should probably be used instead of
the HTTP-provided content type if the server-provided accuracy is less than
0.95. I'll make it explicit that any sniffing done by a UA should have a
probability of error attached to the result. A sniffing result without
probability for error has no value because otherwise a literal
text/plain is a good heuristic for any file (see also: Apache).

This way server administrators could opt-out from any sniffing and an
incompetent server administrator could specify global accuracy of 0.1 or
something like that. Hopefully new web servers would then either provide
no default accuracy at all or specify some low enough value that allows
for sniffing.

My point is that there's no need for BINID. My suggestion above is
compatible with existing servers, content and UAs, as far as I know. In
addition, it would provide a way to declare that the given content type
should be trusted even if UA thinks that honoring the content type
causes problems for viewing the content.


Now we're getting somewhere, I really like this proposal.
Actually, with your idea a binary id would complement this as a server 
supporting it could provide accuracy=1.0 in that case.


So I have to say that your accuracy parameter seems quick to add/support 
(both in http header, and in HTML5 meta and other appropriate places?)


And I doubt the Apache Foundation will have many issues supporting this 
either.


I guess we'll have to see what the rest of this list thinks, but a 
solution this slick certainly has my vote. Nice work Mikko *thumbs up*.

--
Roger Rescator Hågensen.
Freelancer - http://EmSai.net/



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-14 Thread Julian Reschke

On 13.09.2010 23:51, Aryeh Gregor wrote:

...

And for heavens sake, do not specify any sniffing as official.
Instead, explicitly specify all sniffing as UA specific and possibly
suggest that UAs should inform the user that content is broken and the
current rendering is best effort if any sniffing is required.


This is totally incompatible with the compelling interoperability and
security benefits of all browsers using the exact same sniffing
algorithm.
...


Again, there's more than browsers. And even for video in browsers, the 
actual component playing the video may not be part of the browser at all.


So there's *much* more that would need to implement the exact same 
sniffing.


Has anybody talked to the people responsible for VLC, Windows Media 
Player, and Quicktime?


Best regards, Julian



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-14 Thread Roger Hågensen

 On 2010-09-13 15:55, Nils Dagsson Moskopp wrote:

Mikko Rantalainen mikko.rantalai...@peda.net wrote on Mon, 13 Sep
2010 16:03:27 +0300:


[…]

Basically, this sounds like all the issues of BOM for all binary
files.

And why do we need this? Because web servers are not behaving
correctly and are sending incorrect Content-Type headers? What makes
you believe that BINID will not be incorrectly used?

(If you really believe that you can force content authors to provide
correct BINIDs, why can't you force content authors to provide
correct Content-Types? Hopefully the goal is not to sniff if BINIDs
seem okay and ignore clearly incorrect ones in the future...)

This. BINID may be a well-intended idea, but would be an essentially
useless additional layer of abstraction that provides no more
safeguards against misuse than the Content-Type header.

The latter also requires no changes to current binary file handling,
which for BINID would need to be universally updated in every
conceivable device that could ever get a BINID file.


Yeah! That is the one short-term drawback, while the long-term benefit is 
that file extensions and content type would be redundant (as the files 
themselves would inform what they are, in a standard way).
Oh well! I can always dream that some form of binary id will come about 
in the next decade or so I guess...*laughs*


--
Roger Rescator Hågensen.
Freelancer - http://EmSai.net/



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-14 Thread Roger Hågensen

 On 2010-09-14 08:37, Julian Reschke wrote:

On 13.09.2010 23:51, Aryeh Gregor wrote:

...

And for heavens sake, do not specify any sniffing as official.
Instead, explicitly specify all sniffing as UA specific and possibly
suggest that UAs should inform the user that content is broken and the
current rendering is best effort if any sniffing is required.


This is totally incompatible with the compelling interoperability and
security benefits of all browsers using the exact same sniffing
algorithm.
...


Again, there's more than browsers. And even for video in browsers, 
the actual component playing the video may not be part of the browser 
at all.


So there's *much* more that would need to implement the exact same 
sniffing.


Has anybody talked to the people responsible for VLC, Windows Media 
Player, and Quicktime?


Best regards, Julian




Good question. I can only speak for myself as a developer, but I imagine 
that anything that allows a media player to stop sniffing sooner in a 
file is very welcome indeed,
as that saves resources (memory, disk access, faster initialization of 
the codec and user-related interface feedback, etc.).
Legacy files will always be an issue obviously, but there is no reason 
to let future files remain just as difficult; eventually legacy files 
will vanish or be transcoded or have their beginning patched to take 
advantage of it.
(In the case of my proposal one could easily add it by hand using a hex 
editor, so it's certainly not difficult to support in that regard.)


--
Roger Rescator Hågensen.
Freelancer - http://EmSai.net/



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-13 Thread Mikko Rantalainen
2010-09-11 01:51 EEST: Roger Hågensen:
  On 2010-09-09 09:24, Philip Jägenstedt wrote:
 For at least WAVE, Ogg and WebM it's not possible as they begin with
 different magic bytes.
 
 Then why not define a new magic that is universal, so that if a proper
 content type is not stated then a sniffing for a standardized universal
 magic is done?
 
 Yep, I'm referring to my BINID proposal.
 If a content type is missing, sniff the first 265 bytes and see if it is
 a BINID; if it is a BINID, check if it's a supported/expected one, and if
 it is then play away, all is good.

From the "what could possibly go wrong" department of thought:

- a web server blindly prefixes files with BINID if it knows the file
suffix and as a result, a file ends up with a double BINID (server
assumes that no files contain BINID by default)
- a file has double BINID with contradicting content ids
- some internal API assumes that caller wants BINID in the stream, the
caller assumes that the stream has no BINID - as a result, the caller
will pass content with BINIDs embedded in the middle of stream.

Basically, this sounds like all the issues of BOM for all binary files.

And why do we need this? Because web servers are not behaving correctly
and are sending incorrect Content-Type headers? What makes you believe
that BINID will not be incorrectly used?

(If you really believe that you can force content authors to provide
correct BINIDs, why can't you force content authors to provide correct
Content-Types? Hopefully the goal is not to sniff if BINIDs seem okay
and ignore clearly incorrect ones in the future...)


I'd like to specify that the only cases where a UA is allowed to sniff the
content type are:

- Content-Type header is missing (because the server clearly does not
know the type), or
- Content-Type is literal text/plain, text/plain;
charset=iso-8859-1, text/plain; charset=ISO-8859-1 or text/plain;
charset=UTF-8 (to deal with historical mess caused by IIS and Apache), or
- Content-Type is literal application/octet-stream

(In all these cases, the server clearly has no real knowledge. If a file
is meant for downloading, the server should use Content-Disposition:
attachment header instead of hacks such as using
application/x-download for Content-Type.)

For any other value of Content-Type, honor the type specified in HTTP
level. And provide no overrides of any kind on any level above the HTTP.
Levels above HTTP may provide HINTS about the content that can be used
to aid or override *sniffing* but nothing should override any
*explicitly specified Content-Type*. [This is a simplified version of the
logic that Mozilla/Firefox already applies:
http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#684]
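The rule proposed above can be sketched as a single predicate. This is a minimal illustration of the list as written in this message, not the Mozilla logic it links to; the names are hypothetical, and the comparison is byte-for-byte because the proposal speaks of literal header values.

```python
# The literal Content-Type values for which the proposal allows sniffing.
SNIFFABLE_LITERALS = frozenset({
    "text/plain",
    "text/plain; charset=iso-8859-1",
    "text/plain; charset=ISO-8859-1",
    "text/plain; charset=UTF-8",
    "application/octet-stream",
})


def may_sniff(content_type):
    """Return True if the UA is allowed to sniff under the proposed rule.

    content_type is the raw header value, or None if the header is missing.
    """
    if content_type is None:
        # No header at all: the server clearly does not know the type.
        return True
    # Any other explicitly specified type is honored as-is.
    return content_type in SNIFFABLE_LITERALS
```

Because the match is literal, a value such as `text/plain; charset=utf-8` (lowercase) is not in the list and would be honored rather than sniffed.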

And for heavens sake, do not specify any sniffing as official.
Instead, explicitly specify all sniffing as UA specific and possibly
suggest that UAs should inform the user that content is broken and the
current rendering is best effort if any sniffing is required.

-- 
Mikko





Re: [whatwg] Video with MIME type application/octet-stream

2010-09-13 Thread Roger Hågensen

 On 2010-09-13 15:03, Mikko Rantalainen wrote:

2010-09-11 01:51 EEST: Roger Hågensen:

  On 2010-09-09 09:24, Philip Jägenstedt wrote:

For at least WAVE, Ogg and WebM it's not possible as they begin with
different magic bytes.

Then why not define a new magic that is universal, so that if a proper
content type is not stated then a sniffing for a standardized universal
magic is done?

Yep, I'm referring to my BINID proposal.
If a content type is missing, sniff the first 265 bytes and see if it is
a BINID; if it is a BINID, check if it's a supported/expected one, and if
it is then play away, all is good.

 From the "what could possibly go wrong" department of thought:

- a web server blindly prefixes files with BINID if it knows the file
suffix and as a result, a file ends up with a double BINID (server
assumes that no files contain BINID by default)
- a file has double BINID with contradicting content ids
- some internal API assumes that caller wants BINID in the stream, the
caller assumes that the stream has no BINID - as a result, the caller
will pass content with BINIDs embedded in the middle of stream.

Basically, this sounds like all the issues of BOM for all binary files.

And why do we need this? Because web servers are not behaving correctly
and are sending incorrect Content-Type headers? What makes you believe
that BINID will not be incorrectly used?


Because if they add a binary id then they obviously are aware of the 
standard.
Old servers/software would just pass the file through as they are 
unaware, so content type issues still exist there;
eventually old servers/software are rotated out until most are binary id 
aware.

This is how rolling out new standards works.
A server would only add a binary id if none exists and it's certain (by 
previous sniffing) that its guess is correct,
though I guess the standard could state that if no binary id exists on a 
file then none should be added by the server at all (legacy behavior?)
By the server adding it, I meant services like YouTube 
(if YouTube transcodes a video to MP4 then the server knows it's 
delivering just that),
likewise streaming radio or video or similar. A regular webserver 
would have no more right (or reason) to modify a file it serves than it 
would a .zip or .mp3 that a user downloads.
We are mainly talking about streaming here, right? (where a short max 
length sniffing would be a huge benefit)



(If you really believe that you can force content authors to provide
correct BINIDs, why can't you force content authors to provide correct
Content-Types? Hopefully the goal is not to sniff if BINIDs seem okay
and ignore clearly incorrect ones in the future...)


I do not see why web authors (or users at all) would need to mess with 
the binary id at all,

it's authoring software or transcoding software that would add them.

My BINID proposal is just that, a proposal for a binary id, it does not 
define how servers and browsers should handle it
as that is a different scope altogether. Something like a binary id 
would need a proper RFC writeup or similar.



I'd like to specify that the only cases where a UA is allowed to sniff the
content type are:

- Content-Type header is missing (because the server clearly does not
know the type), or
- Content-Type is literal text/plain, text/plain;
charset=iso-8859-1, text/plain; charset=ISO-8859-1 or text/plain;
charset=UTF-8 (to deal with historical mess caused by IIS and Apache), or
- Content-Type is literal application/octet-stream

(In all these cases, the server clearly has no real knowledge. If a file
is meant for downloading, the server should use Content-Disposition:
attachment header instead of hacks such as using
application/x-download for Content-Type.)

Yes! But if the UA in those cases also checked for a binary ID (and 
found such) there would hardly be any ambiguity.

For any other value of Content-Type, honor the type specified in HTTP
level. And provide no overrides of any kind on any level above the HTTP.
Levels above HTTP may provide HINTS about the content that can be used
to aid or override *sniffing* but nothing should override any
*explicitly specified Content-Type*. [This is a simplified version of the
logic that Mozilla/Firefox already applies:
http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#684]

And for heavens sake, do not specify any sniffing as official.
Instead, explicitly specify all sniffing as UA specific and possibly
suggest that UAs should inform the user that content is broken and the
current rendering is best effort if any sniffing is required.


Any sniffing would be as a fallback only if the UA suspects the content 
type is wrong (i.e. video of type text for example) or similar,
and it would not hurt to have some standardized behavior in those cases 
that sniff for something simple like a short binary id rather than parse 
potentially several kilobytes of the stream (which was where this 
discussion took 

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-13 Thread Nils Dagsson Moskopp
Mikko Rantalainen mikko.rantalai...@peda.net wrote on Mon, 13 Sep
2010 16:03:27 +0300:

[…]

 Basically, this sounds like all the issues of BOM for all binary
 files.
 
 And why do we need this? Because web servers are not behaving
 correctly and are sending incorrect Content-Type headers? What makes
 you believe that BINID will not be incorrectly used?

 (If you really believe that you can force content authors to provide
 correct BINIDs, why can't you force content authors to provide
 correct Content-Types? Hopefully the goal is not to sniff if BINIDs
 seem okay and ignore clearly incorrect ones in the future...)

This. BINID may be a well-intended idea, but would be an essentially
useless additional layer of abstraction that provides no more
safeguards against misuse than the Content-Type header.

The latter also requires no changes to current binary file handling,
which for BINID would need to be universally updated in every
conceivable device that could ever get a BINID file.

-- 
Nils Dagsson Moskopp // erlehmann
http://dieweltistgarnichtso.net




Re: [whatwg] Video with MIME type application/octet-stream

2010-09-13 Thread Aryeh Gregor
On Mon, Sep 13, 2010 at 9:03 AM, Mikko Rantalainen
mikko.rantalai...@peda.net wrote:
 For any other value of Content-Type, honor the type specified in HTTP
 level. And provide no overrides of any kind on any level above the HTTP.
 Levels above HTTP may provide HINTS about the content that can be used
 to aid or override *sniffing* but nothing should override any
 *explicitly specified Content-Type*. [This is a simplified version of the
 logic that Mozilla/Firefox already applies:
 http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#684]

This is not feasible at least for some legacy cases, like image MIME
types.  It's a good strategy to try for new cases, though.  If it
could be specced for video and audio, maybe we could get everyone to
converge sooner rather than later.

 And for heavens sake, do not specify any sniffing as official.
 Instead, explicitly specify all sniffing as UA specific and possibly
 suggest that UAs should inform the user that content is broken and the
 current rendering is best effort if any sniffing is required.

This is totally incompatible with the compelling interoperability and
security benefits of all browsers using the exact same sniffing
algorithm.


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-11 Thread Roger Hågensen

 On 2010-09-11 03:40, Silvia Pfeiffer wrote:
[snip...]


And yeah, this kinda stretched beyond the scope of HTML5 specs,
but you'd be swatting two flies at once, solving the sniffing
issue with video and audio, but also the sniffing issue that
every OS has had for the last couple of um... decades?! (poke your
OS/Filesystem colleagues and ask them what they think of something
like this.)
Then again, HTML5 is kinda an OS in its own right, being an app
platform (not to mention supporting local storage of databases and
files even), so maybe it's not that far outside the scope anyway
to define something like this?

-- 
Roger Rescator Hågensen.

Freelancer - http://EmSai.net/



Is there a link to your BINID proposal? From reading this I wonder: 
Would it entail having to re-write all existing files with an extra 
identifier at the start?


Silvia.



http://www.emsai.net/projects/binid/details/
(it really needs to be rewritten as it's way too wordy and repetitive to 
explain something so simple, I was planning to rewrite the document 
later this fall but...)


And to answer your question, unfortunately yes, but that is the only way 
to solve the issue.
Some current fileformats would allow such a binary id header to be added 
without any issues (as they scan past ID3v2 or similar meta information 
anyway).
Most existing software would have no issues adding a check for such a 
binary id, in the long run it will save CPU cycles also.
Certain streaming/transfer protocols could be updated too, and this is 
where video and audio could leap ahead.


The thing is, as I said, that a browser could easily strip off the 
binary id before passing it on, so a codec or an OS filesystem or local 
software would be completely unaware,

but in time they too would support it (hopefully).
A serverside script (PHP or Python for example) could easily add the 
binary id to the start of a file or stream when sent to the browser, or 
even added to the file during transcoding,
so even if the server or .htaccess is set to only 
application/octet-stream proper file format identification would be 
still possible by browser only checking the binary id header.
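A rough sketch of the add-on-the-server, strip-in-the-browser idea described above. The real BINID layout is defined in the linked proposal and is not reproduced here: `MARKER` and the newline-terminated id line are made-up placeholders, and both function names are hypothetical. The check in `add_id` is there to avoid the double-prefix problem raised elsewhere in this thread.

```python
MARKER = b"HYPOTHETICAL-ID:"  # placeholder, NOT the real BINID layout


def add_id(payload, type_id):
    """Prefix payload with an id line unless one is already present."""
    if payload.startswith(MARKER):
        # Already tagged: do not prefix a second time.
        return payload
    return MARKER + type_id.encode("ascii") + b"\n" + payload


def strip_id(data):
    """Return (type_id or None, payload without the id prefix)."""
    if not data.startswith(MARKER):
        return None, data
    header, _, payload = data.partition(b"\n")
    return header[len(MARKER):].decode("ascii"), payload
```

A server-side script would call `add_id` before sending; the browser would call `strip_id` and hand only the payload to the codec, so the codec never sees the prefix.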


--
Roger Rescator Hågensen.
Freelancer - http://EmSai.net/



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-10 Thread Roger Hågensen

 On 2010-09-09 09:24, Philip Jägenstedt wrote:
On Thu, 09 Sep 2010 02:15:27 +0200, David Singer sin...@apple.com 
wrote:



On Wed, Sep 8, 2010 at 3:13 PM, And Clover and...@doxdesk.com wrote:

Perhaps I *meant* to serve a non-video
file with something that looks like a fingerprint from a video format at 
the top.


Anything's possible, but it's vastly more likely that you just made 
a mistake.


It may be possible to make one file that is valid under two formats.  
Kinda like those old competitions: "write a single file that, when 
compiled and run through as many languages as possible, prints 'hello, 
world!'" :-).


For at least WAVE, Ogg and WebM it's not possible as they begin with 
different magic bytes.
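For illustration, a minimal sketch of telling these three containers apart by their leading bytes. The signatures used are the well-known ones (an Ogg page starts with "OggS", WebM uses the EBML header shared with Matroska, and WAVE is a RIFF file whose form type is "WAVE"); the function name and return labels are assumptions for this example.

```python
def sniff_container(head):
    """Guess the container from the leading bytes, or return None."""
    if head[:4] == b"OggS":
        return "ogg"
    if head[:4] == b"\x1a\x45\xdf\xa3":
        # EBML header: shared by WebM and Matroska, so the first four
        # bytes alone cannot tell the two apart.
        return "webm-or-matroska"
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        # Bytes 4..7 hold the RIFF chunk size and are not checked here.
        return "wave"
    return None
```

Since the three signatures differ in the very first byte, no single file can start as more than one of them.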




Then why not define a new magic that is universal, so that if a proper 
content type is not stated then a sniffing for a standardized universal 
magic is done?


Yep, I'm referring to my BINID proposal.
If a content type is missing, sniff the first 265 bytes and see if it is 
a BINID; if it is a BINID, check if it's a supported/expected one, and if 
it is then play away, all is good.
If a content type is given, then just in case sniff the first 265 bytes 
and see if it is a BINID; if it is a BINID, check if it's a 
supported/expected one, and if it is then play away, all is good.
If a content type is missing, and the sniffing of the first 265 bytes 
shows it is not a BINID or not a supported one, then it can only be 
treated as unknown binary and would fail (though in the case of an 
unsupported BINID the user would be shown what the BINID is so they 
won't be fully stuck if they miss a particular codec or the browser 
doesn't support it).
If a content type is given, and sniffing the first 265 bytes shows it's 
not a BINID or not a supported one, then treat it as per the context 
(video or audio) and hope the video or audio codec layer is able to find 
out what it is (what should happen currently right?).
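
The four cases above reduce to a small decision function, sketched here in Python. This is only an illustration under stated assumptions: the BINID byte layout is not given in this message, so the parser is passed in as a parameter, and the toy_parse helper uses an invented placeholder format that is not the real BINID layout. The names decide, Outcome and SNIFF_WINDOW are likewise made up for the example.

```python
from enum import Enum
from typing import Callable, Optional, Set

SNIFF_WINDOW = 265  # byte count quoted in the message above


class Outcome(Enum):
    PLAY = "play"                        # a supported BINID was found
    FAIL = "fail"                        # nothing identifies the data
    FALL_BACK_TO_CODEC = "codec-guess"   # let the audio/video layer try


def decide(content_type: Optional[str],
           head: bytes,
           parse_binid: Callable[[bytes], Optional[str]],
           supported: Set[str]) -> Outcome:
    """Apply the four rules; parse_binid returns an ID string or None."""
    binid = parse_binid(head[:SNIFF_WINDOW])
    # Cases 1 and 2: a supported BINID wins, with or without a content type.
    if binid is not None and binid in supported:
        return Outcome.PLAY
    # Case 3: no content type and no usable BINID, so treat as unknown binary.
    if content_type is None:
        return Outcome.FAIL
    # Case 4: a content type was given, so defer to the codec layer.
    return Outcome.FALL_BACK_TO_CODEC


def toy_parse(head: bytes) -> Optional[str]:
    # Invented placeholder format b"BINID:<name>\n"; NOT the real BINID layout.
    if not head.startswith(b"BINID:"):
        return None
    name, sep, _ = head[len(b"BINID:"):].partition(b"\n")
    return name.decode("ascii", "replace") if sep else None
```

In case 3 with an unsupported BINID, a caller would still have the parsed ID available to show to the user, as the message suggests.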


It would be very easy to add support for something like BINID as it can 
be output at the start of a file or stream as the server sends it, a 
script could even output it or it could be at the start of the actual 
file itself,
and in the case of live streaming a server could easily add it to the 
start of the stream even if it's mid-stream. Even a wrongly configured 
webserver wouldn't be able to mess up the handling of this.
The benefit is that the browser would see that, "Oh, this is a BINID and 
it's WebM, I'll pass this on to the video codec then."
Or, for audio, if the browser sees it is a BINID and it's MP3, it would 
pass it to the MP3 audio codec.
In time something like BINID might even propagate elsewhere beyond just 
video and audio.


I'm not saying that BINID must be used, but at least something very 
close to it (as unknown formats can be shown to a human user and make 
sense and be searchable), and maybe the first 8 bytes should be 
constructed slightly differently?
Oh, and although I haven't tested this, I suspect that most current 
codecs would ignore the first 265 bytes when they sniff for the start of 
the data anyway, so a BINID would be partially backwards compatible, 
and in any case it would certainly be easy to patch in support.
And the best part is that the browser could easily strip or skip past 
the BINID when passing the data to the OS or codecs (if such do not 
support BINID at all), or if saving the audio or video locally per user 
request.


Something like BINID (short for Binary Identification actually) is 
needed, and there is nothing wrong with HTML5 and video audio 
standard defining it,
it wouldn't be the first time a web standard has been adopted elsewhere 
later, it would surely see adoption outside of this, I certainly would 
use it elsewhere.


I invented BINID for a reason: .*** file extensions just aren't 
good enough, and sniffing binary files is a real pain, the same pain that 
the video and audio discussion here is pointing out right now.


So if sniffing is bad, but sniffing can't be avoided, then why not 
simply standardize the sniffing by defining a universal, simple and end 
user friendly identifier (the BINID can be displayed to the user, even if 
unknown/unsupported)?
The sniffing would be limited to the first 265 bytes (in the case of 
the BINID proposal), and if this limited sniffing can't determine what 
something is, and the context and extra info (like content type) do not 
clarify what it is or what to do with it, then simply fail and inform the 
user. It doesn't have to be more complicated than that.


As simple as possible, but no simpler. Isn't that the ideal mantra of 
all coders here?


Remember, I'm not saying you must use BINID (but hey it's there and 
fleshed out already), if you must change the name, do so, if you must 
change the 8 byte sequence, do so, just make sure it has a max length, 
and the ID is humanly displayable if the format is unsupported. Just 
make it into an RFC or something, and spec it in the HTML standard that 
it must be supported, 

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-10 Thread Silvia Pfeiffer
On Sat, Sep 11, 2010 at 8:51 AM, Roger Hågensen resca...@emsai.net wrote:

  On 2010-09-09 09:24, Philip Jägenstedt wrote:

 On Thu, 09 Sep 2010 02:15:27 +0200, David Singer sin...@apple.com
 wrote:

  On Wed, Sep 8, 2010 at 3:13 PM, And Clover and...@doxdesk.com wrote:

 Perhaps I *meant* to serve a non-video
 file with something that looks a fingerprint from a video format at the
 top.


 Anything's possible, but it's vastly more likely that you just made a
 mistake.


 It may be possible to make one file that is valid under two formats.
  Kinda like those old competitions write a single file that when compiled
 and run through as many languages as possible prints hello, world! :-).


 For at least WAVE, Ogg and WebM it's not possible as they begin with
 different magic bytes.


 Then why not define a new magic that is universal, so that if a proper
 content type is not stated then a sniffing for a standardized universal
 magic is done?

 Yep, I'm referring to my BINID proposal.
 If a content type is missing, sniff the first 265 bytes and see if it is a
 BINID, if it is a BINID check if it's a supported/expected one, and it is
 then play away, all is good.
 If a content type is given, then just in case sniff the first 265 bytes and
 see if it is a BINID, if it is a BINID check if it's a supported/expected
 one, and it is then play away, all is good.
 If a content type is missing, and the sniffing of the first 265 bytes shows
 it is not a BINID or not a supported one, then it can only be treated as
 unknown binary and would fail (though in the case of a unsupported BINID the
 user would be shown what the BINID is so they won't be fully stuck if they
 miss a particular codec or the browser doesn't support it).
 If a content type is given, and sniffing the first 265 bytes shows it's not
 a BINID or not a supported one, then treat it as per the context (video or
 audio) and hope the video or audio codec layer is able to find out what it
 is (what should happen currently right?).

 It would be very easy to add support for something like BINID as it can be
 output at the start of a file or stream as the server sends it, a script
 could even output it or it could be at the start of the actual file itself,
 and in the case of live streaming a server could easily add it to the start
 of the stream even if it's mid-stream. Even a wrongly configured webserver
 wouldn't be able to mess up the handling of this.
 The benefit is that the browser would see that, Oh, this is a BINID and
 it's Webm, I'll pass this on to the video codec then.
 Or if audio and the browser sees it is a BINID and it's MP3 it would pass
 it to the mp3 audio codec.
 In time something like BINID might even propagate elsewhere beyond just
 video and audio.

 I'm not saying that BINID must be used, but at least something very close
 to it (as unknown formats can be shown to a human user and make sense and be
 searchable), and maybe the first 8 bytes should be constructed slightly
 differently?.
 Oh and although I haven't tested this, I suspect that most current codecs
 would ignore the first 265 bytes when they sniff for the start of the data
 anyway so a BINID would be partially backwards compatible,
 and in any case certainly easy to patch in support for quite easily.
 And the best part is that the browser could easily strip or skip past the
 BINID when passing the data to the OS or codecs (if such do not support
 BINID at all), or if saving the audio or video locally per user request.

 Something like BINID (short for Binary Identification actually) is needed,
 and there is nothing wrong with HTML5 and video audio standard defining
 it,
 it wouldn't be the first time a web standard has been adopted elsewhere
 later, it would surely see adoption outside of this, I certainly would use
 it elsewhere.

 I invented BINID for a reason, because .*** file extensions just isn't good
 enough, and sniffing binary files is a real pain, the same pain as the
 video and audio discussion here is pointing out right now.

 So if sniffing is bad, but sniffing can't be avoided, then why not simply
 standardize the sniffing by defining a universal, simple and end user
 friendly (the BINID can be displayed to the user, even if
 unknown/unsupported),
 and the sniffing would be limited to the first 265 bytes (in the case of
 the BINID proposal), and this limited sniffing can't determine what
 something is and the context and extra info (like content type) does not
 clarify what it is or what to do with it then simply fail and inform the
 user, it doesn't have to be more complicated than that.

 As simple as possible, but no simpler. Isn't that the ideal mantra of all
 coders here?

 Remember, I'm not saying you must use BINID (but hey it's there and fleshed
 out already), if you must change the name, do so, if you must change the 8
 byte sequence, do so, just make sure it has a max length, and the ID is
 humanly displayable if the format is unsupported. Just make it 

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-09 Thread Philip Jägenstedt

I think we should always sniff or never sniff, for simplicity.

Philip

On Wed, 08 Sep 2010 19:14:48 +0200, David Singer sin...@apple.com wrote:

what about don't sniff if the HTML gave you a mime type (i.e. a source  
element with a type attribute), or at least don't sniff for the  
purposes of determining CanPlay, dispatch, if the HTML source gave you a  
mime type?



On Sep 8, 2010, at 2:33 , Philip Jägenstedt wrote:

On Tue, 07 Sep 2010 22:00:55 +0200, Boris Zbarsky bzbar...@mit.edu  
wrote:



On 9/7/10 3:29 PM, Aryeh Gregor wrote:

* Sniff only if Content-Type is typical of what popular browsers serve
for unrecognized filetypes.  E.g., only for no Content-Type,
text/plain, or application/octet-stream, and only if the encoding is
either not present or is UTF-8 or ISO-8859-1.  Or whatever web servers
do here.
* Sniff the same both for video tags and top-level browsing contexts,
so open video in new tab doesn't mysteriously fail on some setups.


I could probably live with those, actually.


* If a file in a top-level browsing context is sniffed as video but
then some kind of error is returned before the video plays the first
frame, fall back to allowing the user to download it, or whatever the
usual action would be if no sniffing had occurred.


This might be pretty difficult to implement, since the video decoder  
might consume arbitrary amounts of data before saying that there was  
an error.


I agree with Boris, the first two points are OK but the third I'd  
rather not implement, it's too much work for something that ought to  
happen very, very rarely.


--
Philip Jägenstedt
Core Developer
Opera Software


David Singer
Multimedia and Software Standards, Apple Inc.




--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-09 Thread David Singer
I can't think why always sniffing is simple, or cheap, or desirable.  I'd love 
to get to never-sniff, but am not sanguine.


On Sep 9, 2010, at 0:07 , Philip Jägenstedt wrote:

 I think we should always sniff or never sniff, for simplicity.
 
 Philip
 
 On Wed, 08 Sep 2010 19:14:48 +0200, David Singer sin...@apple.com wrote:
 
 what about don't sniff if the HTML gave you a mime type (i.e. a source 
 element with a type attribute), or at least don't sniff for the purposes of 
 determining CanPlay, dispatch, if the HTML source gave you a mime type?
 
 
 On Sep 8, 2010, at 2:33 , Philip Jägenstedt wrote:
 
 On Tue, 07 Sep 2010 22:00:55 +0200, Boris Zbarsky bzbar...@mit.edu wrote:
 
 On 9/7/10 3:29 PM, Aryeh Gregor wrote:
 * Sniff only if Content-Type is typical of what popular browsers serve
 for unrecognized filetypes.  E.g., only for no Content-Type,
 text/plain, or application/octet-stream, and only if the encoding is
 either not present or is UTF-8 or ISO-8859-1.  Or whatever web servers
 do here.
 * Sniff the same both for video tags and top-level browsing contexts,
 so open video in new tab doesn't mysteriously fail on some setups.
 
 I could probably live with those, actually.
 
 * If a file in a top-level browsing context is sniffed as video but
 then some kind of error is returned before the video plays the first
 frame, fall back to allowing the user to download it, or whatever the
 usual action would be if no sniffing had occurred.
 
 This might be pretty difficult to implement, since the video decoder might 
 consume arbitrary amounts of data before saying that there was an error.
 
 I agree with Boris, the first two points are OK but the third I'd rather 
 not implement, it's too much work for something that ought to happen very, 
 very rarely.
 
 --
 Philip Jägenstedt
 Core Developer
 Opera Software
 
 David Singer
 Multimedia and Software Standards, Apple Inc.
 
 
 
 -- 
 Philip Jägenstedt
 Core Developer
 Opera Software

David Singer
Multimedia and Software Standards, Apple Inc.



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-09 Thread Andy Berkheimer
Much of this discussion has focused on the careless server operator.  What
about the careful ones?

Given the past history of content sniffing and security warts, it is useful
- or at least comforting - to have a path for the careful server to indicate
"I know this file really is intended to be handled as this type, please
don't sniff it."  This is particularly true for a server handling sanitized
files from unknown sources, as no sanitizer will be perfect.

Today we approximate this through accurate use of Content-Type and a recent
addition of X-Content-Type-Options: nosniff.

Never sniffing sounds idyllic and always sniffing makes life a bit riskier
for careful server operators.  The proposals of limiting video/audio
sniffing to a few troublesome Content-Types are quite reasonable.

-Andy

On Thu, Sep 9, 2010 at 3:07 AM, Philip Jägenstedt phil...@opera.com wrote:

 I think we should always sniff or never sniff, for simplicity.

 Philip


 On Wed, 08 Sep 2010 19:14:48 +0200, David Singer sin...@apple.com wrote:

  what about don't sniff if the HTML gave you a mime type (i.e. a source
 element with a type attribute), or at least don't sniff for the purposes of
 determining CanPlay, dispatch, if the HTML source gave you a mime type?


 On Sep 8, 2010, at 2:33 , Philip Jägenstedt wrote:

  On Tue, 07 Sep 2010 22:00:55 +0200, Boris Zbarsky bzbar...@mit.edu
 wrote:

  On 9/7/10 3:29 PM, Aryeh Gregor wrote:

 * Sniff only if Content-Type is typical of what popular browsers serve
 for unrecognized filetypes.  E.g., only for no Content-Type,
 text/plain, or application/octet-stream, and only if the encoding is
 either not present or is UTF-8 or ISO-8859-1.  Or whatever web servers
 do here.
 * Sniff the same both for video tags and top-level browsing contexts,
 so open video in new tab doesn't mysteriously fail on some setups.


 I could probably live with those, actually.

  * If a file in a top-level browsing context is sniffed as video but
 then some kind of error is returned before the video plays the first
 frame, fall back to allowing the user to download it, or whatever the
 usual action would be if no sniffing had occurred.


 This might be pretty difficult to implement, since the video decoder
 might consume arbitrary amounts of data before saying that there was an
 error.


 I agree with Boris, the first two points are OK but the third I'd rather
 not implement, it's too much work for something that ought to happen very,
 very rarely.

 --
 Philip Jägenstedt
 Core Developer
 Opera Software


 David Singer
 Multimedia and Software Standards, Apple Inc.



 --
 Philip Jägenstedt
 Core Developer
 Opera Software



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-09 Thread David Singer

On Sep 9, 2010, at 16:38 , Andy Berkheimer wrote:

 Much of this discussion has focused on the careless server operator.  What 
 about the careful ones?
 
 Given the past history of content sniffing and security warts, it is useful - 
 or at least comforting - to have a path for the careful server to indicate I 
 know this file really is intended to be handled as this type, please don't 
 sniff it.  This is particularly true for a server handling sanitized files 
 from unknown sources, as no sanitizer will be perfect.
 
 Today we approximate this through accurate use of Content-Type and a recent 
 addition of X-Content-Type-Options: nosniff.
 
 Never sniffing sounds idyllic and always sniffing makes life a bit riskier 
 for careful server operators.  The proposals of limiting video/audio sniffing 
 to a few troublesome Content-Types are quite reasonable.

I think I agree.  

The minimum I can think of is

sniff when (a) suspect types are supplied and (b) they are 'auto-generated' 
(e.g. by a web server).  If either is not true, you shouldn't need to sniff.  
Someone who writes

 <source ... type="video/frubotziger" ... /> 

causes both tests to fail and deserves to be believed (and get the 
consequences). (Have you SEEN frubotziger format video :-))
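
The two-part test above could look roughly like the following Python sketch. It rests on assumptions that are not in the message: the set of "suspect" types is borrowed from elsewhere in this thread, a missing Content-Type is treated as suspect, and the function and parameter names are invented for illustration.

```python
from typing import Optional

# Assumption: which HTTP types count as "suspect" is not spelled out in the
# message; these two are borrowed from elsewhere in the thread.
SUSPECT_HTTP_TYPES = {"text/plain", "application/octet-stream"}


def needs_sniff(http_content_type: Optional[str],
                source_type_attr: Optional[str]) -> bool:
    """Return True only for suspect, presumably auto-generated types."""
    # Test (b) fails: the page author wrote a type by hand, so believe it.
    if source_type_attr:
        return False
    # Assumption: a missing Content-Type is treated as suspect too.
    if http_content_type is None:
        return True
    media_type = http_content_type.split(";", 1)[0].strip().lower()
    # Test (a): only suspect types coming from the server are sniffed.
    return media_type in SUSPECT_HTTP_TYPES
```

With this shape, an author-supplied type attribute such as video/frubotziger short-circuits the decision, whatever the server sent.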

 
 -Andy
 
 On Thu, Sep 9, 2010 at 3:07 AM, Philip Jägenstedt phil...@opera.com wrote:
 I think we should always sniff or never sniff, for simplicity.
 
 Philip

David Singer
Multimedia and Software Standards, Apple Inc.



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-08 Thread Philip Jägenstedt

On Tue, 07 Sep 2010 22:00:55 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 9/7/10 3:29 PM, Aryeh Gregor wrote:

* Sniff only if Content-Type is typical of what popular browsers serve
for unrecognized filetypes.  E.g., only for no Content-Type,
text/plain, or application/octet-stream, and only if the encoding is
either not present or is UTF-8 or ISO-8859-1.  Or whatever web servers
do here.
* Sniff the same both for video tags and top-level browsing contexts,
so open video in new tab doesn't mysteriously fail on some setups.


I could probably live with those, actually.


* If a file in a top-level browsing context is sniffed as video but
then some kind of error is returned before the video plays the first
frame, fall back to allowing the user to download it, or whatever the
usual action would be if no sniffing had occurred.


This might be pretty difficult to implement, since the video decoder  
might consume arbitrary amounts of data before saying that there was an  
error.


I agree with Boris, the first two points are OK but the third I'd rather  
not implement, it's too much work for something that ought to happen very,  
very rarely.
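
For reference, the first two points quoted above can be sketched in Python as follows. This is a minimal illustration, not anyone's implementation; the names should_sniff, SUSPECT_TYPES and SUSPECT_CHARSETS are invented. The function takes no "context" argument, which is one way to express the second point: video elements and top-level browsing contexts would get the same answer.

```python
from typing import Optional

SUSPECT_TYPES = {"text/plain", "application/octet-stream"}
SUSPECT_CHARSETS = {"utf-8", "iso-8859-1"}


def should_sniff(content_type_header: Optional[str]) -> bool:
    """Sniff only when the Content-Type looks like a server default."""
    # No Content-Type at all: sniffing is allowed.
    if content_type_header is None or not content_type_header.strip():
        return True
    media_type, _, params = content_type_header.partition(";")
    if media_type.strip().lower() not in SUSPECT_TYPES:
        return False
    # Look for a charset parameter; absent or a typical default is fine.
    charset = None
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"').lower()
    return charset is None or charset in SUSPECT_CHARSETS
```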


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-08 Thread Julian Reschke

On 07.09.2010 22:00, Boris Zbarsky wrote:

...

* If a file in a top-level browsing context is sniffed as video but
then some kind of error is returned before the video plays the first
frame, fall back to allowing the user to download it, or whatever the
usual action would be if no sniffing had occurred.


This might be pretty difficult to implement, since the video decoder
might consume arbitrary amounts of data before saying that there was an
error.
...


It's not that hard if it's acceptable to restart the network request 
(just do it again, with a flag not-to-sniff).


Best regards, Julian


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-08 Thread David Singer
What about: don't sniff if the HTML gave you a MIME type (i.e. a source 
element with a type attribute), or at least don't sniff for the purposes of 
determining CanPlay, dispatch, if the HTML source gave you a MIME type?


On Sep 8, 2010, at 2:33 , Philip Jägenstedt wrote:

 On Tue, 07 Sep 2010 22:00:55 +0200, Boris Zbarsky bzbar...@mit.edu wrote:
 
 On 9/7/10 3:29 PM, Aryeh Gregor wrote:
 * Sniff only if Content-Type is typical of what popular browsers serve
 for unrecognized filetypes.  E.g., only for no Content-Type,
 text/plain, or application/octet-stream, and only if the encoding is
 either not present or is UTF-8 or ISO-8859-1.  Or whatever web servers
 do here.
 * Sniff the same both for video tags and top-level browsing contexts,
 so open video in new tab doesn't mysteriously fail on some setups.
 
 I could probably live with those, actually.
 
 * If a file in a top-level browsing context is sniffed as video but
 then some kind of error is returned before the video plays the first
 frame, fall back to allowing the user to download it, or whatever the
 usual action would be if no sniffing had occurred.
 
 This might be pretty difficult to implement, since the video decoder might 
 consume arbitrary amounts of data before saying that there was an error.
 
 I agree with Boris, the first two points are OK but the third I'd rather not 
 implement, it's too much work for something that ought to happen very, very 
 rarely.
 
 -- 
 Philip Jägenstedt
 Core Developer
 Opera Software

David Singer
Multimedia and Software Standards, Apple Inc.



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-08 Thread Boris Zbarsky

On 9/8/10 11:05 AM, Julian Reschke wrote:

It's not that hard if it's acceptable to restart the network request
(just do it again, with a flag not-to-sniff).


It's common enough to not be ok to restart, though.  And even the 
restart behavior can be pretty complicated, since it requires not just 
redoing the network request but has interactions with session history, 
etc, etc.


It's a huge can of worms.

-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-08 Thread And Clover

On 09/07/2010 09:29 PM, Aryeh Gregor wrote:


I'm not a fan of sniffing, but I'm also not a fan of blindly believing
clearly wrong MIME types


Who decides what is "clearly wrong"? Perhaps I *meant* to serve a 
non-video file with something that looks like a fingerprint from a video 
format at the top. In fact, given that HTML5 does not endorse or limit 
implementation to a particular video format, there could be any number 
of video formats whose header-words I have to avoid using in the first N 
bytes of my file.


If I were serving an image/png with no PNG header, you could say it was 
clearly wrong. But there's no way you can say any sequence of bytes is 
clearly not application/octet-stream or some other anything-goes type.



I'm not yet sure what the correct tradeoff is here, but I'm pretty sure it's
not no sniffing at all under any conditions.


I suggest:

1. Resources with no Content-Type continue to be fair game.

2. By far the most prevalent "maybe wrong" Content-Type that is widely 
deployed is text/plain, due to inappropriate web server defaults (both 
IIS and Apache <2.3). Allow sniffing of text/plain resources, but provide 
an override for server operators to say "I mean it, this is really 
text/plain", i.e. standardise X-Content-Type-Options or something like it.


3. Sniffing should otherwise not occur in any context.
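
The three rules above can be written down as a short Python sketch. It is only an illustration: may_sniff is an invented name, and the opt-out is modelled on the existing X-Content-Type-Options: nosniff header, which the suggestion names as a candidate for standardisation.

```python
from typing import Mapping


def may_sniff(headers: Mapping[str, str]) -> bool:
    """Decide whether sniffing is permitted under the three rules."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    content_type = h.get("content-type")
    # Rule 1: resources with no Content-Type are fair game.
    if content_type is None:
        return True
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Rule 2: text/plain may be sniffed unless the server opted out.
    if media_type == "text/plain":
        options = h.get("x-content-type-options", "")
        return options.strip().lower() != "nosniff"
    # Rule 3: no sniffing in any other case.
    return False
```

Under these rules application/octet-stream is never sniffed, which is stricter than the other proposals in this thread.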

It is unfortunate that sniffing will continue to be needed for the 
text/plain case for a very long time yet. But we should be aiming to 
mitigate and deprecate this historical error, rather than make the 
problem an order of magnitude worse by requiring potentially-limitless 
new sniffing cases.



it's unreliable in the same way across all browsers.


It's already different in different browsers, even with the small number 
of filetypes we currently have. As previously commented, undistinctive 
fingerprints, starting mid-stream and other oddities like ID3 tags make 
sniffing for media filetypes even more troublesome than it is for other 
types.


In any case, any sniffing solution will always be inconsistent as 
different browsers, OSes, installed codecs and options expose different 
media filetypes to the net.


Never mind just browsers, or even browsers that simply pass the resource 
to their underlying media frameworks for sniffing: there are far more 
already-deployed media players with HTTP capability than there are 
browsers with video/audio support. There is no chance we will ever be 
able to standardise the implementation of sniffing amongst this wide 
range of agents!


So there will always be non-compliant UAs. In the face of this, we might 
as well standardise the 'good' solution - minimal sniffing - and hope to 
drag a few modern browsers along with that, instead of mandating an 
unreliable sniffing approach that *still* isn't implemented universally.



This is particularly essential for security -- undocumented
sniffing behavior has caused more than one vulnerability in the past.


Yes. Undocumented sniffing behaviour has caused many vulnerabilities, as 
even well-known sniffing behaviour continues to do (see the current 
publicised difficulties with CSS-inclusion attacks). Lack of sniffing 
behaviour, however, has never caused a vulnerability. It fails safe.


--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-08 Thread Aryeh Gregor
On Tue, Sep 7, 2010 at 4:00 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 9/7/10 3:29 PM, Aryeh Gregor wrote:

 * Sniff only if Content-Type is typical of what popular browsers serve
 for unrecognized filetypes.  E.g., only for no Content-Type,
 text/plain, or application/octet-stream, and only if the encoding is
 either not present or is UTF-8 or ISO-8859-1.  Or whatever web servers
 do here.
 * Sniff the same both for video tags and top-level browsing contexts,
 so open video in new tab doesn't mysteriously fail on some setups.

 I could probably live with those, actually.

On Wed, Sep 8, 2010 at 5:33 AM, Philip Jägenstedt phil...@opera.com wrote:
 I agree with Boris, the first two points are OK but the third I'd rather not
 implement, it's too much work for something that ought to happen very, very
 rarely.

That sounds promising.  What do other implementers think?

 * If a file in a top-level browsing context is sniffed as video but
 then some kind of error is returned before the video plays the first
 frame, fall back to allowing the user to download it, or whatever the
 usual action would be if no sniffing had occurred.

 This might be pretty difficult to implement, since the video decoder might
 consume arbitrary amounts of data before saying that there was an error.

And the problem is that you don't want to keep the data handy in case
it fails?  Hopefully it makes no big difference, then.

On Wed, Sep 8, 2010 at 1:14 PM, David Singer sin...@apple.com wrote:
 what about don't sniff if the HTML gave you a mime type (i.e. a source 
 element with a type attribute), or at least don't sniff for the purposes of 
 determining CanPlay, dispatch, if the HTML source gave you a mime type?

What advantage does this serve?

On Wed, Sep 8, 2010 at 3:13 PM, And Clover and...@doxdesk.com wrote:
 Perhaps I *meant* to serve a non-video
 file with something that looks a fingerprint from a video format at the top.

Anything's possible, but it's vastly more likely that you just made a mistake.

 I suggest:

 1. Resources with no Content-Type continue to be fair game.

 2. By far the most prevalent maybe wrong Content-Type that is widely
 deployed is text/plain, due to inappropriate web server defaults (both IIS
 and Apache2.3). Allow sniffing of text/plain resources, but provide an
 override for server operators to say I mean it, this is really text/plain.
 ie. standardise X-Content-Type-Options or something like it.

 3. Sniffing should otherwise not occur in any context.

This is basically the same as what I suggested in my last post, right?
Except for allowing opting out of sniffing, which would be great
but is orthogonal.

 It's already different in different browsers, even with the small number of
 filetypes we currently have.

Because it's not standardized.

 In any case, any sniffing solution will always be inconsistent as different
 browsers, OSes, installed codecs and options expose different media
 filetypes to the net.

I don't follow.  A standardized sniffing solution can be implemented
across the board.

 Never mind just browsers, or even browsers that simply pass the resource to
 their underlying media frameworks for sniffing: there are far more
 already-deployed media players with HTTP capability than there are browsers
 with video/audio support. There is no chance we will ever be able to
 standardise the implementation of sniffing amongst this wide range of
 agents!

 So there will always be non-compliant UAs. In the face of this, we might as
 well standardise the 'good' solution - minimal sniffing - and hope to drag a
 few modern browsers along with that, instead of mandating an unreliable
 sniffing approach that *still* isn't implemented universally.

I don't follow this logic.  If these media players want to work the
same as browsers, they'll implement the spec.  If they don't implement
the spec, it makes no difference what the spec says, so why even
consider them?

 Yes. Undocumented sniffing behaviour has caused many vulnerabilities, as
 even well-known sniffing behaviour continues to do (see the current
 publicised difficulties with CSS-inclusion attacks). Lack of sniffing
 behaviour, however, has never caused a vulnerability. It fails safe.

The CSS-inclusion attacks that I'm aware of involve @import-ing an
HTML document and observing what syntax errors occur.  There is no
sniffing that occurs there.  What attacks were you thinking of?


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-08 Thread Boris Zbarsky

On 9/8/10 3:58 PM, Aryeh Gregor wrote:

And the problem is that you don't want to keep the data handy in case
it fails?


Yes.  The problem is that I don't want to have to buffer up 
potentially-arbitrary amounts of data.



Yes. Undocumented sniffing behaviour has caused many vulnerabilities, as
even well-known sniffing behaviour continues to do (see the current
publicised difficulties with CSS-inclusion attacks). Lack of sniffing
behaviour, however, has never caused a vulnerability. It fails safe.


The CSS-inclusion attacks that I'm aware of involve @import-ing an
HTML document and observing what syntax errors occur.  There is no
sniffing that occurs there.


There sort of is.  There's the fact that for quirks documents the 
Content-Type for style sheet resources was ignored.  (Note that the 
syntax errors are not what the issue was about, btw.)


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-08 Thread David Singer

On Sep 8, 2010, at 12:58 , Aryeh Gregor wrote:
 
 On Wed, Sep 8, 2010 at 1:14 PM, David Singer sin...@apple.com wrote:
 what about don't sniff if the HTML gave you a mime type (i.e. a source 
 element with a type attribute), or at least don't sniff for the purposes of 
 determining CanPlay, dispatch, if the HTML source gave you a mime type?
 
 What advantage does this serve?

It both significantly reduces the footprint of sniffing (knocks out a whole 
load of cases), and clarifies that 'canplay' decisions don't need to sniff (so 
you don't sniff a whole bunch of different files).  'Non-configured servers' is 
a valid excuse for HTTP content-type being wrong (for a few cases), but I can't 
think of any reason to disbelieve the page author, can you?

 
 On Wed, Sep 8, 2010 at 3:13 PM, And Clover and...@doxdesk.com wrote:
 Perhaps I *meant* to serve a non-video
 file with something that looks a fingerprint from a video format at the top.
 
 Anything's possible, but it's vastly more likely that you just made a mistake.

It may be possible to make one file that is valid under two formats.  Kinda 
like those old competitions: "write a single file that, when compiled and run 
through as many languages as possible, prints hello, world!" :-).


David Singer
Multimedia and Software Standards, Apple Inc.



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Philip Jägenstedt
On Tue, 07 Sep 2010 02:46:29 +0200, Gregory Maxwell gmaxw...@gmail.com  
wrote:


On Mon, Sep 6, 2010 at 3:19 PM, Aryeh Gregor simetrical+...@gmail.com  
wrote:
On Mon, Sep 6, 2010 at 4:14 AM, Philip Jägenstedt phil...@opera.com  
wrote:
The Ogg page begins with the 4 bytes OggS, which is what Opera  
(GStreamer)
checks for. For additional safety, one could also check for the  
trailing
version indicator, which ought to be a NULL byte for current Ogg. [1]  
[2]


OggS\0 as the first five bytes seems safe to check for.  It's rather
short, I guess because it's repeated on every page, but five bytes is
long enough that it should occur by random only negligibly often, in
either text or binary files.


Um... If you do that you will fail to capture on files that most other
ogg reading tools will happily capture on.  Common software will read
forward until it hits OggS then it will check the page CRC (in total,
9 bytes of capture).  For example, here is a file which begins with a
kilobyte of \0: http://myrandomnode.dyndns.org:8080/~gmaxwell/test.ogg
 Everything I had handy played it.

This could fail to capture on a live stream that didn't ensure new
listeners began at a page boundary. I don't know if any of these
exist.

I don't know if breaking these cases would matter much but herein lies
the danger of sniffing— everyone thinks they're an expert but no one
really has a handle on the implications.



Your test file is too short, perhaps it was truncated? I made my own one  
by adding 1024 NULL bytes to the beginning of  
http://v2v.cc/~j/theora_testsuite/320x240.ogg


That file doesn't play in Totem, because it (GStreamer) relies on  
sniffing. It also won't play in Opera for this reason, but I haven't seen  
any bug reports about failure to play similar files since Opera introduced  
support for Ogg. It does play in Firefox, but not in Chrome. Just like  
with WebM, I think browsers should not support files that begin with  
arbitrary amounts of garbage, as it requires reading the whole file before  
failing.
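
A minimal sketch of the trade-off described above, assuming a hypothetical helper name and a hypothetical `max_scan` limit (neither comes from any browser or from GStreamer). It only locates the capture pattern; the page CRC check that Gregory mentions is not implemented here.

```python
def find_ogg_capture(data: bytes, max_scan: int = 0) -> int:
    """Return the offset of the first b'OggS' that starts within the first
    max_scan bytes of leading junk, or -1 if there is none.

    max_scan=0 accepts the pattern only at offset 0 (strict sniffing).
    A larger max_scan tolerates leading garbage, at the cost of reading
    that many bytes before giving up.
    """
    # bytes.find only reports matches that lie entirely inside data[0:end].
    return data.find(b"OggS", 0, max_scan + 4)


# A file that begins with a kilobyte of NUL bytes, as in the test case above.
padded = bytes(1024) + b"OggS\x00\x02"
strict_result = find_ogg_capture(padded)          # strict: not found
lenient_result = find_ogg_capture(padded, 1024)   # lenient: found at 1024
```

Bounding the scan keeps the failure case cheap, which is the concern raised about having to read a whole file before failing.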


The file doesn't play in VLC or MPlayer, but does play in xine.

--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Philip Jägenstedt

On Tue, 07 Sep 2010 03:56:54 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 9/6/10 3:19 PM, Aryeh Gregor wrote:
On Mon, Sep 6, 2010 at 4:14 AM, Philip Jägenstedt phil...@opera.com
wrote:
The Ogg page begins with the 4 bytes OggS, which is what Opera  
(GStreamer)
checks for. For additional safety, one could also check for the  
trailing
version indicator, which ought to be a NULL byte for current Ogg. [1]  
[2]


OggS\0 as the first five bytes seems safe to check for.  It's rather
short, I guess because it's repeated on every page, but five bytes is
long enough that it should occur by random only negligibly often, in
either text or binary files.


So if a text file starts with U+4F67 U+6753 (both CJK ideographs) and  
any ASCII character (can this happen in the real world?) you're OK with  
treating it as Ogg?  Same for files starting with U+674F U+5367 (both CJK
ideographs) and any plane-0 character whose Unicode codepoint is 0 mod
2^8 (plenty of CJK stuff like that)?  Is your CJK good enough that you
know text files would never start like this, or are you just assuming  
that people who are silly enough to use UTF-16 for their text files and  
aren't in Europe don't matter?  Or that you don't care about people who  
happen to not use a BOM?


Thanks for pointing out these cases. I hadn't thought about it, but my CJK  
is good enough to say something about them:


'佧杓A' encoded in UTF-16BE is 'OggS\x00A'. However, 佧杓 is nonsensical  
in at least Chinese, neither character is among the 3000 most common  
characters [1]. Search results on Google (4) and Baidu (3) are nonsense  
too. I don't know if things are any different for Japanese, but given the  
Google results I doubt it.


'杏卧' encoded in UTF-16LE is 'OggS', and both of these characters are in  
the top 3000, but together they're nonsense: apricot crouch. (That's the  
same crouch as in Crouching Tiger, Hidden Dragon, but the order is wrong  
so it doesn't mean Crouching Apricot). In the Google and Baidu results,  
the only occurrence of the string seems to be in 一衫红杏卧江亭, which  
appears to be a theme of an apricot tree by a pavilion that appears in  
several paintings [2] [3] [4].


All in all, I wouldn't be more worried about this than the risk of random  
binary data matching. Also, UTF-16 isn't a very common encoding for  
simplified Chinese (卧 is a simplified character), GBK is dominant.
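
The two collisions can be reproduced directly. This is only an illustration of the byte layouts discussed above; the variable names are arbitrary, and the code points are written as escapes so the example does not depend on how this message itself is encoded.

```python
# U+4F67 U+6753 followed by ASCII 'A' (佧杓A), big-endian, no BOM.
be = "\u4f67\u6753A".encode("utf-16-be")

# U+674F U+5367 (杏卧), little-endian, no BOM.
le = "\u674f\u5367".encode("utf-16-le")

print(be)  # b'OggS\x00A'
print(le)  # b'OggS'
```

In the big-endian case the ASCII character supplies the zero byte that a five-byte OggS\0 check looks for; in the little-endian case the zero byte would have to come from the low byte of the following character.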


We could also add checking of the 6th byte, which should normally be 0x02  
for first page of logical bitstream (bos).
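
A rough sketch of what such a prefix check could look like, assuming a hypothetical function name and a hypothetical `require_bos` switch. Comparing the sixth byte for exact equality with 0x02 follows the suggestion above; whether other flag combinations should also be accepted is left open.

```python
OGG_CAPTURE = b"OggS"
BOS = 0x02  # header_type value for the first page of a logical bitstream


def looks_like_ogg(prefix: bytes, require_bos: bool = False) -> bool:
    """Return True if prefix starts like an Ogg page header."""
    if len(prefix) < 6:
        return False
    if prefix[:4] != OGG_CAPTURE:   # bytes 1-4: capture pattern
        return False
    if prefix[4] != 0:              # byte 5: stream structure version
        return False
    if require_bos and prefix[5] != BOS:   # byte 6: header type
        return False
    return True


real_page = b"OggS\x00\x02" + bytes(20)
text_lookalike = b"OggS\x00A"   # the UTF-16BE text collision discussed above
```

With `require_bos=True`, `real_page` is still accepted while `text_lookalike` is rejected, because its sixth byte is 0x41 rather than 0x02.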



It looks like you could check for 0x1a 0x45 0xdf 0xa3 as the first
four bytes


U+1A45 is Thai, looks like.  DFA3 is a surrogate, so you're ok there.

U+451A is CJK.  U+A3DF looks like a Yi syllable, so you're more or less  
ok there too.  I'm assuming you've already checked this byte sequence  
out in UTF-8 and some other common encodings?


It's garbage in at least UTF-8, Big5 and GBK.
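
For the encodings where the outcome is easy to verify by hand, the four bytes can be checked as follows. The helper name is made up for this illustration, and Big5 and GBK are deliberately left out because their results are not asserted here.

```python
EBML_MAGIC = bytes([0x1A, 0x45, 0xDF, 0xA3])


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Big-endian: U+1A45 followed by 0xDFA3, a lone low surrogate, so invalid.
utf16_be = try_decode(EBML_MAGIC, "utf-16-be")

# Little-endian: U+451A U+A3DF, both valid BMP code points.
utf16_le = try_decode(EBML_MAGIC, "utf-16-le")

# UTF-8: well-formed, but it is a SUB control character, 'E', and U+07E3.
utf8 = try_decode(EBML_MAGIC, "utf-8")
```

Note that "garbage" here means implausible text rather than invalid bytes: the sequence is well-formed UTF-8, it just starts with a control character.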

I'm not sure what infrastructure is in place, but perhaps one could *not*  
sniff if Content-Type also indicates an encoding? That way there's a  
solution for those who really want to display the hypothetical false  
positives as text.


[1] http://www.zein.se/patrick/3000char.html
[2]  
http://hi.baidu.com/%BC%C5%D5%AB/blog/item/f0ee8a4c5a5d0c02b3de05aa.html

[3] http://blog.sina.com.cn/s/blog_475be8240100ew5q.html
[4] http://www.zgddhj.cn/zj/bh/zhouhongyi/201007/32053.html

--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread And Clover

On 09/07/2010 03:56 AM, Boris Zbarsky wrote:


P.S. Sniffing is harder than you seem to think. It really is...


Quite. It surprises and saddens me that anyone wants to argue for *more* 
sniffing, and even enshrining it in a web standard.


Sniffing is a perpetual disaster that, after several security-sensitive 
problems, web browsers have been moving to deprecate/mitigate. If 
browsers want to guess types when no Content-Type is specified(*) then 
fine, but there is no good reason to ignore an explicitly-set type. I 
don't want my `application/octet-stream` file download service to be 
repurposeable as a video player for some other party!


For reasons already argued about here, you will never make the results 
of content-sniffing reliable, so why bother to standardise it? A 
standardised unreliable feature is no better than an unstandardised one.


The typing mechanism of the web (and more) is Content-Type, period. 
There should be no confusion of this with officially-endorsed sniffing. 
That it is 'hard' for web authors to ensure the correct Content-Types 
are set is:


* not W3/WHATWG's problem. If web servers make adding Content-Type 
information hard, then web servers need to be updated to make it easier;


* not really true, at least for Apache which can allow AddType et al in 
the .htaccess files that low-end shared hosts use. This may not be 
widely-known or practised, but that doesn't really merit changing the 
standards for everyone else to cope with.


(*: or, the traditional reason for sniffing, `text/plain`, due to Apache 
inappropriately sending this type for unknown files by default, bug 
13986. That doesn't seem to apply here.)


--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Julian Reschke

On 07.09.2010 11:51, And Clover wrote:

On 09/07/2010 03:56 AM, Boris Zbarsky wrote:


P.S. Sniffing is harder than you seem to think. It really is...


Quite. It surprises and saddens me that anyone wants to argue for *more*
sniffing, and even enshrining it in a web standard.


+1


Sniffing is a perpetual disaster that, after several security-sensitive
problems, web browsers have been moving to deprecate/mitigate. If
browsers want to guess types when no Content-Type is specified(*) then
fine, but there is no good reason to ignore an explicitly-set type. I
don't want my `application/octet-stream` file download service to be
repurposeable as a video player for some other party!


Hmm, that's what Content-Disposition: attachment is for...


...


Best regards, Julian


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Philip Jägenstedt

On Tue, 07 Sep 2010 11:51:55 +0200, And Clover and...@doxdesk.com wrote:


On 09/07/2010 03:56 AM, Boris Zbarsky wrote:


P.S. Sniffing is harder than you seem to think. It really is...


Quite. It surprises and saddens me that anyone wants to argue for *more*  
sniffing, and even enshrining it in a web standard.


IE9, Safari and Chrome ignore Content-Type in a video context and rely  
on sniffing. If you want Content-Type to be respected, convince the  
developers of those 3 browsers to change. If not, it's quite inevitable  
that Opera and Firefox will eventually have to follow.


Sniffing is a perpetual disaster that, after several security-sensitive  
problems, web browsers have been moving to deprecate/mitigate.


For reasons already argued about here, you will never make the results  
of content-sniffing reliable, so why bother to standardise it? A  
standardised unreliable feature is no better than an unstandardised one.


Unless all browsers agree to respect Content-Type, the next best thing is  
to agree on the same sniffing. Why would leaving it undefined be better?



The typing mechanism of the web (and more) is Content-Type, period.


Only in theory. In practice, Content-Type is an unreliable indicator of  
the type of a resource. Sniffing is already part of the web architecture,  
with all its problems.


(*: or, the traditional reason for sniffing, `text/plain`, due to Apache  
inappropriately sending this type for unknown files by default, bug  
13986. That doesn't seem to apply here.)


It hasn't been explicitly stated, but I assume that the only cases where  
sniffing for video formats would be employed would be for missing  
Content-Type, text/plain and application/octet-stream.


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Julian Reschke

On 07.09.2010 12:52, Philip Jägenstedt wrote:

...
IE9, Safari and Chrome ignore Content-Type in a video context and rely
on sniffing. If you want Content-Type to be respected, convince the
developers of those 3 browsers to change. If not, it's quite inevitable
that Opera and Firefox will eventually have to follow.
...


We have heard that Safari sniffs for compatibility with content 
previously consumed by Quicktime, and that IE9 may sniff because they 
(currently) can't pass the content-type to the decoding machinery (or 
something like that).


So you really would have to standardize sniffing in the browsers, but 
also in the components they delegate video display to. Good luck with that.


Best regards, Julian


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 6:52 AM, Philip Jägenstedt wrote:

It hasn't been explicitly stated, but I assume that the only cases where
sniffing for video formats would be employed would be for missing
Content-Type, text/plain and application/octet-stream.


That's not what at least Aryeh is proposing, no.  Also not what at least 
some of the browsers implement.


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 6:01 AM, Julian Reschke wrote:

Hmm, that's what Content-Disposition: attachment is for...


This header is currently ignored in non-toplevel browsing contexts in 
web browsers, last I checked.


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 4:11 AM, Philip Jägenstedt wrote:

It's garbage in at least UTF-8, Big5 and GBK.


Thanks.  I assume that applies to the OggS\0 sequence too, right?  I 
appreciate the data!



I'm not sure what infrastructure is in place, but perhaps one could
*not* sniff if Content-Type also indicates an encoding?


As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1 
(thanks, Apache!), that should be reasonable, I think.


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Philip Jägenstedt

On Tue, 07 Sep 2010 14:54:15 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 9/7/10 6:52 AM, Philip Jägenstedt wrote:

It hasn't been explicitly stated, but I assume that the only cases where
sniffing for video formats would be employed would be for missing
Content-Type, text/plain and application/octet-stream.


That's not what at least Aryeh is proposing, no.  Also not what at least  
some of the browsers implement.


Oops, I was talking about top-level contexts here. In a video context,  
always ignoring the Content-Type and always sniffing is the most sane  
solution (apart from always respecting Content-Type).


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 9:03 AM, Philip Jägenstedt wrote:

On Tue, 07 Sep 2010 14:54:15 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 9/7/10 6:52 AM, Philip Jägenstedt wrote:

It hasn't been explicitly stated, but I assume that the only cases where
sniffing for video formats would be employed would be for missing
Content-Type, text/plain and application/octet-stream.


That's not what at least Aryeh is proposing, no. Also not what at
least some of the browsers implement.


Oops, I was talking about top-level contexts here. In a video context,
always ignoring the Content-Type and always sniffing is the most sane
solution (apart from always respecting Content-Type).


Yes, the suggestion Aryeh is making is that toplevel contexts should use 
the same sniffing algorithm as the video context and should sniff 
everything for video, completely ignoring the Content-Type header.


-Boris



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Philip Jägenstedt

On Tue, 07 Sep 2010 14:56:38 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 9/7/10 4:11 AM, Philip Jägenstedt wrote:

It's garbage in at least UTF-8, Big5 and GBK.


Thanks.  I assume that applies to the OggS\0 sequence too, right?  I  
appreciate the data!


UTF-8, Big5 and GBK are all (as far as I know) ASCII supersets. Do  
real-world text documents include \0 bytes? (I don't know.)



I'm not sure what infrastructure is in place, but perhaps one could
*not* sniff if Content-Type also indicates an encoding?


As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1  
(thanks, Apache!), that should be reasonable, I think.


Are you saying that Apache has, at various times, set the default  
character encoding to UTF-8 or ISO-8859-1? I was hoping that no encoding  
parameter at all would be sent :/


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 9:16 AM, Philip Jägenstedt wrote:

UTF-8, Big5 and GBK are all (as far as I know) ASCII supersets. Do
real-world text documents include \0 bytes?


Yes.  Real-world text documents include all sorts of gunk.  Just rarely.


As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1
(thanks, Apache!), that should be reasonable, I think.


Are you saying that Apache has, at various times, set the default
character encoding to UTF-8 or ISO-8859-1?


Yes, precisely.  Though the UTF-8 stuff was Linux distros, I think, not 
Apache itself (in that Apache just sent the thing passed to 
AddDefaultCharset and they changed the value of that from ISO-8859-1 to 
UTF-8 in their distro packages).  Here's the relevant comment from the 
Gecko source where we do our text-or-binary sniffing for toplevel contexts:


 Make sure to do a case-sensitive exact match comparison here.  Apache
 1.x just sends text/plain for unknown, while Apache 2.x sends
 text/plain with a ISO-8859-1 charset.  Debian's Apache version, just to
 be different, sends text/plain with iso-8859-1 charset.  For extra fun,
 FC7, RHEL4, and Ubuntu Feisty send charset=UTF-8.  Don't do general
 case-insensitive comparison, since we really want to apply this crap as
 rarely as we can.
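
The exact-match approach in that comment could be sketched like this. The precise header spellings below (spacing after the semicolon, no quoting) are assumptions inferred from the comment, not copied from the Gecko source, and the names are hypothetical.

```python
# Assumed literal Content-Type values for the server defaults named above.
SNIFFABLE_TEXT_PLAIN = frozenset({
    "text/plain",                      # Apache 1.x
    "text/plain; charset=ISO-8859-1",  # Apache 2.x
    "text/plain; charset=iso-8859-1",  # Debian's Apache
    "text/plain; charset=UTF-8",       # FC7, RHEL4, Ubuntu Feisty
})


def may_sniff_text_plain(content_type: str) -> bool:
    """Case-sensitive exact match, so sniffing applies as rarely as possible."""
    return content_type in SNIFFABLE_TEXT_PLAIN
```

Anything an author is likely to have typed by hand, such as a lowercase `charset=utf-8`, falls outside the set and is therefore left alone.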


I was hoping that no encoding parameter at all would be sent :/


Heh.  I've long since given up all hope of reason on this stuff; I just 
try to keep it as sane and predictable and simple as possible.  :(


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Maciej Stachowiak

On Sep 7, 2010, at 3:52 AM, Philip Jägenstedt wrote:

 On Tue, 07 Sep 2010 11:51:55 +0200, And Clover and...@doxdesk.com wrote:
 
 On 09/07/2010 03:56 AM, Boris Zbarsky wrote:
 
 P.S. Sniffing is harder than you seem to think. It really is...
 
 Quite. It surprises and saddens me that anyone wants to argue for *more* 
 sniffing, and even enshrining it in a web standard.
 
 IE9, Safari and Chrome ignore Content-Type in a video context and rely on 
 sniffing. If you want Content-Type to be respected, convince the developers 
 of those 3 browsers to change. If not, it's quite inevitable that Opera and 
 Firefox will eventually have to follow.

At least in the case of Safari, we initially added sniffing for the benefit of 
video types likely to be played with the QuickTime plugin - mainly .mov and 
various flavors of MPEG. It is common for these to be served with an incorrect 
MIME type. And we did not want to impose a high transition cost on content 
already being served via the QuickTime plugin. The QuickTime plugin may be a 
slightly less relevant consideration now than when we first thought about this, 
but at this point it is possible content has been migrated to video while 
still carrying broken MIME types.

Ogg and WebM are probably not yet poisoned by a mass of unlabeled data. It 
might be possible to treat those types more strictly - i.e. only play Ogg or 
WebM when labeled as such, and not ever sniff content with those MIME types as 
anything else.

In Safari's case this would have limited impact since a non-default codec 
plugin would need to be installed to play either Ogg or WebM. I'm also not sure 
it's sensible to have varying levels of strictness for different types. But 
it's an option, if we want to go there.

Regards,
Maciej



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread David Singer

On Sep 7, 2010, at 2:51 , And Clover wrote:

 On 09/07/2010 03:56 AM, Boris Zbarsky wrote:
 
 P.S. Sniffing is harder than you seem to think. It really is...
 
 Quite. It surprises and saddens me that anyone wants to argue for *more* 
 sniffing, and even enshrining it in a web standard.

Yes.  We should be striving for a world in which as little sniffing as possible 
happens (and is needed).  Basically, we have the problem because of 
mis-configured or (from the author's point of view) unconfigurable web servers. 
 

So I wonder:
* whether the presence of a source element with a type attribute should be 
believed (at least for the purposes of dispatch and 'canplay' decisions)? If 
the author of the page got it wrong or lied, surely they can accept (and deal 
with) the consequences?
* whether we should only really sniff the two types in HTTP headers that tend 
to get used as fallbacks (application/octet-stream and text/plain)?  Though I 
note that I have sometimes *wanted* a file displayed as text (and not 
interpreted) and been defeated by sniffing (though not as often as watching 
binary dumped on my screen as if it were text).



David Singer
Multimedia and Software Standards, Apple Inc.



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread David Singer
And like I said before, please be careful of assuming our intent and desires 
from the way things currently work.  We are thinking, listening, and 
implementing (and fixing bugs, and re-inspecting older behavior in lower-level 
code), so there is some...flexibility...I think.

On Sep 7, 2010, at 9:12 , Maciej Stachowiak wrote:

 
 On Sep 7, 2010, at 3:52 AM, Philip Jägenstedt wrote:
 
 On Tue, 07 Sep 2010 11:51:55 +0200, And Clover and...@doxdesk.com wrote:
 
 On 09/07/2010 03:56 AM, Boris Zbarsky wrote:
 
 P.S. Sniffing is harder than you seem to think. It really is...
 
 Quite. It surprises and saddens me that anyone wants to argue for *more* 
 sniffing, and even enshrining it in a web standard.
 
 IE9, Safari and Chrome ignore Content-Type in a video context and rely on 
 sniffing. If you want Content-Type to be respected, convince the developers 
 of those 3 browsers to change. If not, it's quite inevitable that Opera and 
 Firefox will eventually have to follow.
 
 At least in the case of Safari, we initially added sniffing for the benefit 
 of video types likely to be played with the QuickTime plugin - mainly .mov 
 and various flavors of MPEG. It is common for these to be served with an 
 incorrect MIME type. And we did not want to impose a high transition cost on 
 content already being served via the QuickTime plugin. The QuickTime plugin 
 may be a slightly less relevant consideration now than when we first thought 
 about this, but at this point it is possible content has been migrated to 
 video while still carrying broken MIME types.
 
 Ogg and WebM are probably not yet poisoned by a mass of unlabeled data. It 
 might be possible to treat those types more strictly - i.e. only play Ogg or 
 WebM when labeled as such, and not ever sniff content with those MIME types 
 as anything else.
 
 In Safari's case this would have limited impact since a non-default codec 
 plugin would need to be installed to play either Ogg or WebM. I'm also not 
 sure it's sensible to have varying levels of strictness for different types. 
 But it's an option, if we want to go there.
 
 Regards,
 Maciej
 

David Singer
Multimedia and Software Standards, Apple Inc.



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Adam Barth
On Tue, Sep 7, 2010 at 3:01 AM, Julian Reschke julian.resc...@gmx.de wrote:
 On 07.09.2010 11:51, And Clover wrote:
 On 09/07/2010 03:56 AM, Boris Zbarsky wrote:
 P.S. Sniffing is harder than you seem to think. It really is...

 Quite. It surprises and saddens me that anyone wants to argue for *more*
 sniffing, and even enshrining it in a web standard.

 +1

-1

It saddens me when standards bodies ignore reality and leave
implementors to invent their own non-interoperable algorithms for
security-critical behavior.

Adam


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 3:19 PM, Adam Barth wrote:

It saddens me when standards bodies ignore reality and leave
implementors to invent their own non-interoperable algorithms for
security-critical behavior.


Of course nothing prevents us from saying UAs MUST NOT sniff but if they 
do anyway they MUST use a given algorithm, right?


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Aryeh Gregor
On Tue, Sep 7, 2010 at 5:51 AM, And Clover and...@doxdesk.com wrote:
 Quite. It surprises and saddens me that anyone wants to argue for *more*
 sniffing, and even enshrining it in a web standard.

I'm not a fan of sniffing, but I'm also not a fan of blindly believing
clearly wrong MIME types and thereby forcing authors to do needless
configuration work, which they might not even be able to do.  I'm not
yet sure what the correct tradeoff is here, but I'm pretty sure it's
not "no sniffing at all under any conditions".

 Sniffing is a perpetual disaster that, after several security-sensitive
 problems, web browsers have been moving to deprecate/mitigate. If browsers
 want to guess types when no Content-Type is specified(*) then fine, but
 there is no good reason to ignore an explicitly-set type. I don't want my
 `application/octet-stream` file download service to be repurposeable as a
 video player for some other party!

If you don't want that, you should be using access control, not MIME types.

 For reasons already argued about here, you will never make the results of
 content-sniffing reliable, so why bother to standardise it? A standardised
 unreliable feature is no better than an unstandardised one.

Sure it is, because it's unreliable in the same way across all
browsers.  That means that in any given case, all browsers will work
the same.  This is particularly essential for security -- undocumented
sniffing behavior has caused more than one vulnerability in the past.

 The typing mechanism of the web (and more) is Content-Type, period. There
 should be no confusion of this with officially-endorsed sniffing.

We already have officially endorsed sniffing where web compat requires it:

http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#content-type-sniffing
http://tools.ietf.org/html/draft-abarth-mime-sniff-05

The question is if we can avoid it for new content types like
video/audio.  If not, we should spec it in advance so we at least have
something that's as sane as possible under the circumstances.

 That it is
 'hard' for web authors to ensure the correct Content-Types are set is:

 * not W3/WHATWG's problem. If web servers make adding Content-Type
 information hard, then web servers need to be updated to make it easier;

I don't know about the W3C, but reality is the WHATWG's problem.  We
can't let things be broken and just say it's someone else's fault.  We
need to institute workarounds at our level for failures on other
levels if that's what's necessary to get good security and a good
user/author experience.

 * not really true, at least for Apache which can allow AddType et al in the
 .htaccess files that low-end shared hosts use. This may not be widely-known
 or practised, but that doesn't really merit changing the standards for
 everyone else to cope with.

Creating a .htaccess file is a technical procedure that most users
will not know how to do, particularly since the problem will probably
just manifest itself as "the video doesn't work".  It's also not
possible on some hosts -- although it's certainly possible on the
large majority of cheap shared hosts, and of course on hosts where the
author has root access.

On Tue, Sep 7, 2010 at 6:52 AM, Philip Jägenstedt phil...@opera.com wrote:
 It hasn't been explicitly stated, but I assume that the only cases where
 sniffing for video formats would be employed would be for missing
 Content-Type, text/plain and application/octet-stream.

If those are the only common MIME types incorrectly served for unknown
file types, that seems reasonable.  (Some files might be actively
misidentified, like if I have an Ogg file saved as .jpeg, but
hopefully this will be very rare.)

On Tue, Sep 7, 2010 at 8:56 AM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 9/7/10 4:11 AM, Philip Jägenstedt wrote:
 It's garbage in at least UTF-8, Big5 and GBK.

 Thanks.  I assume that applies to the OggS\0 sequence too, right?  I
 appreciate the data!

 I'm not sure what infrastructure is in place, but perhaps one could
 *not* sniff if Content-Type also indicates an encoding?

 As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1
 (thanks, Apache!), that should be reasonable, I think.

So at least for Ogg and WebM, how about:

* Sniff only if Content-Type is typical of what popular web servers send
for unrecognized filetypes.  E.g., only for no Content-Type,
text/plain, or application/octet-stream, and only if the encoding is
either not present or is UTF-8 or ISO-8859-1.  Or whatever web servers
do here.
* Sniff the same both for video tags and top-level browsing contexts,
so "open video in new tab" doesn't mysteriously fail on some setups.
* If a file in a top-level browsing context is sniffed as video but
then some kind of error is returned before the video plays the first
frame, fall back to allowing the user to download it, or whatever the
usual action would be if no sniffing had occurred.
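
The first rule in this list can be sketched roughly as follows. The function and constant names are hypothetical, and matching the type and charset case-insensitively is a simplifying assumption rather than part of the proposal.

```python
from typing import Optional

FALLBACK_TYPES = {"text/plain", "application/octet-stream"}
FALLBACK_CHARSETS = {"utf-8", "iso-8859-1"}


def should_sniff(content_type: Optional[str]) -> bool:
    """Decide whether a response is eligible for Ogg/WebM sniffing."""
    if content_type is None or not content_type.strip():
        return True  # no Content-Type at all
    media_type, *params = [part.strip() for part in content_type.split(";")]
    if media_type.lower() not in FALLBACK_TYPES:
        return False  # an explicit, non-fallback type is respected
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            # Only charsets that servers are known to add by default.
            return value.strip().strip('"').lower() in FALLBACK_CHARSETS
    return True  # fallback type with no charset parameter
```

Under this sketch `should_sniff("text/plain; charset=Big5")` is false, which gives authors who really want the bytes shown as text a way to opt out.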

Within these constraints, false positives in the sniffing 

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 3:29 PM, Aryeh Gregor wrote:

* Sniff only if Content-Type is typical of what popular web servers send
for unrecognized filetypes.  E.g., only for no Content-Type,
text/plain, or application/octet-stream, and only if the encoding is
either not present or is UTF-8 or ISO-8859-1.  Or whatever web servers
do here.
* Sniff the same both for video tags and top-level browsing contexts,
so "open video in new tab" doesn't mysteriously fail on some setups.


I could probably live with those, actually.


* If a file in a top-level browsing context is sniffed as video but
then some kind of error is returned before the video plays the first
frame, fall back to allowing the user to download it, or whatever the
usual action would be if no sniffing had occurred.


This might be pretty difficult to implement, since the video decoder 
might consume arbitrary amounts of data before saying that there was an 
error.


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Adam Barth
On Tue, Sep 7, 2010 at 12:21 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 9/7/10 3:19 PM, Adam Barth wrote:
 It saddens me when standards bodies ignore reality and leave
 implementors to invent their own non-interoperable algorithms for
 security-critical behavior.

 Of course nothing prevents us from saying UAs MUST NOT sniff but if they do
 anyway they MUST use a given algorithm, right?

That's a contrary-to-duty imperative, which is something that's been
puzzling philosophers for centuries.  A more sensible requirement
would be that user agents SHOULD NOT sniff (for reasons XYZ), but, if
they do, they MUST use the following algorithm.

Adam


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

Of course nothing prevents us from saying UAs MUST NOT sniff but if they do
anyway they MUST use a given algorithm, right?


That's a contrary-to-duty imperative, which is something that's been
puzzling philosophers for centuries.  A more sensible requirement
would be that user agents SHOULD NOT sniff (for reasons XYZ), but, if
they do, they MUST use the following algorithm.


Except that in practice SHOULD NOT is treated as carte blanche to do the 
undesirable thing.  It has no teeth.  MUST NOT doesn't much either, but 
it's _something_ at least (in the sense that one can clearly claim that 
violating a MUST NOT is a bug).


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Adam Barth
On Tue, Sep 7, 2010 at 2:13 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 Of course nothing prevents us from saying UAs MUST NOT sniff but if they
 do
 anyway they MUST use a given algorithm, right?

 That's a contrary-to-duty imperative, which is something that's been
 puzzling philosophers for centuries.  A more sensible requirement
 would be that user agents SHOULD NOT sniff (for reasons XYZ), but, if
 they do, they MUST use the following algorithm.

 Except that in practice SHOULD NOT is treated as carte blanche to do the
 undesirable thing.  It has no teeth.  MUST NOT doesn't much either, but it's
 _something_ at least (in the sense that one can clearly claim that violating
 a MUST NOT is a bug).

In any case, lawyering the requirement level in the spec isn't the way
to solve these problems.  You need to change the underlying incentives
to actually affect what gets implemented.

Adam


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 5:35 PM, Adam Barth wrote:

In any case, lawyering the requirement level in the spec isn't the way
to solve these problems.  You need to change the underlying incentives
to actually affect what gets implemented.


The incentive structure for pretty much any sort of sniffing is a 
prisoner's dilemma.  Life's hard.


-Boris



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-06 Thread Philip Jägenstedt
On Sun, 05 Sep 2010 21:59:09 +0200, Aryeh Gregor  
simetrical+...@gmail.com wrote:



On Fri, Sep 3, 2010 at 11:48 PM, Boris Zbarsky bzbar...@mit.edu wrote:


Is this a reasonable supposition?  What are these byte sequences for the
container formats at hand?  (Say WebM's restricted Matroska container,
whatever container format is supported for H.264 by IE and Chrome, and Ogg;
we'll ignore the generic Matroska weirdness for now.)


I don't know, which is why I'm considering a hypothetical.  If someone
who knows better could step up with this piece of info, that would be
helpful.


The Ogg page begins with the 4 bytes OggS, which is what Opera  
(GStreamer) checks for. For additional safety, one could also check for  
the trailing version indicator, which ought to be a NULL byte for current  
Ogg. [1] [2]


For WebM, the first 4 bytes are the EBML header: the bytes 0x1A, 0x45,  
0xDF, 0xA3. [3] The EBML DocType in the header must be webm. Since  
parsing the EBML header is a little bit complicated, Opera (GStreamer)  
simply checks for the string webm somewhere in the header. I've heard  
rumors that WebM files are allowed to contain arbitrary garbage before the  
EBML header, but this is something we happily ignore, i.e., such files  
would fail to play in Opera, regardless of MIME type. I haven't  
encountered any such files yet, and think that browsers should not support  
this feature.


[1] http://www.xiph.org/ogg/doc/framing.html#page_header
[2] http://www.xiph.org/ogg/doc/rfc3533.txt
[3] http://ebml.sourceforge.net/specs/

--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-06 Thread Aryeh Gregor
On Mon, Sep 6, 2010 at 4:14 AM, Philip Jägenstedt phil...@opera.com wrote:
 The Ogg page begins with the 4 bytes OggS, which is what Opera (GStreamer)
 checks for. For additional safety, one could also check for the trailing
 version indicator, which ought to be a NULL byte for current Ogg. [1] [2]

OggS\0 as the first five bytes seems safe to check for.  It's rather
short, I guess because it's repeated on every page, but five bytes is
long enough that it should occur by random only negligibly often, in
either text or binary files.
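
A minimal sketch of that five-byte check, in Python. The function and
constant names are hypothetical and not taken from any browser; the only
facts used are the ones stated above (the OggS capture pattern followed
by a zero version byte).

```python
OGG_CAPTURE = b"OggS"  # capture pattern that starts every Ogg page
OGG_VERSION = 0        # stream structure version byte for current Ogg


def looks_like_ogg(data: bytes) -> bool:
    """Return True if data starts with 'OggS' followed by a zero version byte."""
    return (
        len(data) >= 5
        and data[:4] == OGG_CAPTURE
        and data[4] == OGG_VERSION
    )
```

Note that this only inspects the very start of the resource, so a file
with anything in front of the first page would not match.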

 For WebM, the first 4 bytes are the EBML header: the bytes 0x1A, 0x45, 0xDF,
 0xA3. [3] The EBML DocType in the header must be webm. Since parsing the
 EBML header is a little bit complicated, Opera (GStreamer) simply checks for
 the string webm somewhere in the header. I've heard rumors that WebM files
 are allowed to contain arbitrary garbage before the EBML header, but this is
 something we happily ignore, i.e., such files would fail to play in Opera,
 regardless of MIME type. I haven't encountered any such files yet, and think
 that browsers should not support this feature.

 [1] http://www.xiph.org/ogg/doc/framing.html#page_header
 [2] http://www.xiph.org/ogg/doc/rfc3533.txt
 [3] http://ebml.sourceforge.net/specs/

It looks like you could check for 0x1a 0x45 0xdf 0xa3 as the first
four bytes, followed by 0x42 0x82 0x84 "webm" somewhere in the first
255 bytes or whatever.  (0x42 0x82 is the DocType marker, and 0x84 is
the length, encoded UTF-8 style: a leading 1 bit for a one-byte length,
then 0000100 for the actual length, 4.)  That seems very safe.  If WebM
allows degenerate
stuff that makes sniffing hard, we can just prohibit it in the WebM
spec, I assume.
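
As a rough sketch of that check in Python: the names and the 255-byte
window are assumptions for illustration ("255 bytes or whatever" above),
and this deliberately does not parse the EBML header, it only looks for
the byte patterns described.

```python
EBML_MAGIC = b"\x1a\x45\xdf\xa3"    # EBML header element ID
WEBM_DOCTYPE = b"\x42\x82\x84webm"  # DocType ID, one-byte length 4, "webm"
SNIFF_WINDOW = 255                  # hypothetical limit on bytes inspected


def looks_like_webm(data: bytes) -> bool:
    """Return True if data starts with the EBML magic and the first
    SNIFF_WINDOW bytes contain a DocType element whose value is 'webm'."""
    head = data[:SNIFF_WINDOW]
    return head.startswith(EBML_MAGIC) and WEBM_DOCTYPE in head[4:]
```

The length byte 0x84 is 10000100 in binary: the leading 1 says the
length field is one byte long, and the remaining bits (0x84 & 0x7f)
give 4, the length of "webm".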


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-06 Thread Gregory Maxwell
On Mon, Sep 6, 2010 at 3:19 PM, Aryeh Gregor simetrical+...@gmail.com wrote:
 On Mon, Sep 6, 2010 at 4:14 AM, Philip Jägenstedt phil...@opera.com wrote:
 The Ogg page begins with the 4 bytes OggS, which is what Opera (GStreamer)
 checks for. For additional safety, one could also check for the trailing
 version indicator, which ought to be a NULL byte for current Ogg. [1] [2]

 OggS\0 as the first five bytes seems safe to check for.  It's rather
 short, I guess because it's repeated on every page, but five bytes is
 long enough that it should occur by random only negligibly often, in
 either text or binary files.

Um... If you do that you will fail to capture on files that most other
ogg reading tools will happily capture on.  Common software will read
forward until it hits OggS then it will check the page CRC (in total,
9 bytes of capture).  For example, here is a file which begins with a
kilobyte of \0: http://myrandomnode.dyndns.org:8080/~gmaxwell/test.ogg
Everything I had handy played it.

This could fail to capture on a live stream that didn't ensure new
listeners began at a page boundary. I don't know if any of these
exist.
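
The read-forward-and-check-CRC behaviour can be sketched in Python as
below. This is an illustration, not the code of any particular tool:
the function names and the 64 KiB scan limit are assumptions. The page
layout (27-byte header, CRC at offset 22, segment table after the
header) and the CRC parameters (polynomial 0x04c11db7, initial value 0,
computed with the CRC field zeroed) follow the Ogg framing
documentation.

```python
import struct


def ogg_crc(data: bytes) -> int:
    """CRC-32 as used by Ogg pages: polynomial 0x04c11db7, most significant
    bit first, initial value 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def find_ogg_page(data: bytes, limit: int = 65536) -> int:
    """Return the offset of the first 'OggS' within the first `limit` bytes
    that begins a complete page with a valid CRC, or -1 if there is none."""
    pos = data.find(b"OggS", 0, limit)
    while pos != -1:
        header = data[pos:pos + 27]
        if len(header) == 27:
            nsegs = header[26]  # number of entries in the segment table
            lacing = data[pos + 27:pos + 27 + nsegs]
            end = pos + 27 + nsegs + sum(lacing)
            if len(lacing) == nsegs and end <= len(data):
                page = bytearray(data[pos:end])
                stored = struct.unpack_from("<I", page, 22)[0]
                page[22:26] = b"\x00\x00\x00\x00"  # CRC is computed over a zeroed field
                if ogg_crc(bytes(page)) == stored:
                    return pos
        pos = data.find(b"OggS", pos + 1, limit)
    return -1
```

Unlike a fixed prefix check, this accepts a file that begins with a
kilobyte of \0, at the cost of reading an unbounded prefix unless a
limit is imposed.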

I don't know if breaking these cases would matter much but herein lies
the danger of sniffing: everyone thinks they're an expert but no one
really has a handle on the implications.


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-05 Thread Aryeh Gregor
On Fri, Sep 3, 2010 at 5:05 PM, David Singer sin...@apple.com wrote:
 Um, I think that in some cases the code that is supporting video/audio has 
 ... historical artefacts ... which may not be entirely in line with the 
 specs.  I think it's dangerous to make assumptions in this area, especially 
 if you then go and ask for a change in a spec. based on assumptions.

Okay, okay, I'll try to avoid stating assumptions like that, at least
about people on the list.  :)  So never mind that point.  (Although I
was mostly thinking of IE, not Chrome and certainly not Safari.)  I
think sniffing is a good idea even if we could get everyone to agree
not to sniff.

On Fri, Sep 3, 2010 at 11:48 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 Okay, you're being too theoretical for me.  Let's say we have
 fingerprints for all the major video types, of the form "check if the
 first X bytes match this very simple pattern".  Let's say the spec
 says that whenever processing the response to an HTTP request,
 browsers must act as though they executed the sniffing algorithm and,
 if it sniffs as a video type, they must treat it the same as if the
 Content-Type matched the sniffed type.

 OK, so context-independent?  Note that not a single browser implements this
 today.

Either context-independent, or specified to occur only in certain key
contexts like video/top-level browsing context.  No browser
implements my suggested behavior today, but I think we all agree it's
confusing/harmful to only sniff for video and not top-level browsing
contexts too, because it breaks all sorts of expected behavior (open
in new tab, copy video URL, etc.).

 Is this a reasonable supposition?  What are these byte sequences for the
 container formats at hand?  (Say WebM's restricted Matroska container,
 whatever container format is supported for H.264 by IE and Chrome, and Ogg;
 we'll ignore the generic Matroska weirdness for now.)

I don't know, which is why I'm considering a hypothetical.  If someone
who knows better could step up with this piece of info, that would be
helpful.

 Might be a good idea to ask the IE team, the Chrome team, and the Safari
 team why they're not sniffing in toplevel browsing contexts...  I believe
 there's been at least one answer from a Chrome developer on that already,
 though.

That would also be helpful information.  Andrew Scherkus made it sound
like Chrome wouldn't necessarily object to sniffing on top-level
browsing contexts, just that it would have to be sandboxed (although
I'm not sure why).

 Sure, but it's early days in implementation.  Note, also, that I believe
 it's 3 browsers, not 2.

 . . .

 Some of these changes take time (e.g. having to rejigger QuickTime to allow
 you to not sniff while using it).  So is it that they have not changed, or
 that they have no plans to change, ever?

 . . .

 Such changes have happened in the past (e.g. for stylesheets, and for
 toplevel browsing contexts).  Why is this case different?

Okay, so maybe I'm too pessimistic.  :)  Regardless of this point, I
still think sniffing consistently is the best solution, *if* it can be
done reliably -- i.e., given the assumptions I gave in my sketch of a
proposal (easily-checked fingerprints that make text matches
impossible and binary matches of negligible likelihood).  If those
assumptions hold, would you agree that consistently sniffing is a
better idea than honoring clearly incorrect MIME types, assuming we
could get implementers to agree one way or the other?  If not, why
not?  I don't see significant downsides, and the upside of actually
being able to have stuff work without configuring MIME types seems
big.


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-03 Thread Aryeh Gregor
On Thu, Sep 2, 2010 at 4:41 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 Well, serving up data as text/plain for it to be readable is one.  I agree
 that for the specific case of video this is not a big deal.

Yes, I'm talking specifically about that.  Sniffing in other cases (in
particular, text formats) may be a bad idea.

 Why are you assuming that?

Because blocking an entire MIME type seems like it would be massive
overkill . . . but if that's a real use-case, well, okay.  It still
can't be *too* hard to check the first few bytes of the contents.
They must do it anyway if they implement this for images, right?

 There are proposals for standardizing several different types of sniffing,
 with the one used being context-dependent.  A proxy wouldn't have the
 context.

 It can all be made to work by erring on the side of blocking more stuff, but
 then you get to the point where the proxy makes it impossible to use the
 browser altogether, and then it's not a viable solution to the problem at
 hand.

Okay, you're being too theoretical for me.  Let's say we have
fingerprints for all the major video types, of the form "check if the
first X bytes match this very simple pattern".  Let's say the spec
says that whenever processing the response to an HTTP request,
browsers must act as though they executed the sniffing algorithm and,
if it sniffs as a video type, they must treat it the same as if the
Content-Type matched the sniffed type.  (You could limit the scope of
that somewhat for ease of implementation if you like, but at least for
video plus top-level browsing contexts.)  Also suppose that the
fingerprints include byte sequences that cannot occur in normal text
encodings, and that they're long enough that random false positives
are extremely unlikely.  What's the problem with this specific
proposal?
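
The behaviour proposed above can be sketched in Python as follows. This
is an illustration under stated assumptions, not an algorithm from any
spec: the names, the 255-byte window, the two fingerprints, and the
choice of video/ogg and video/webm as the sniffed types are all
hypothetical.

```python
SNIFF_WINDOW = 255  # hypothetical: how many leading bytes are inspected


def _is_ogg(head: bytes) -> bool:
    # 'OggS' capture pattern followed by a zero version byte
    return head[:5] == b"OggS\x00"


def _is_webm(head: bytes) -> bool:
    # EBML magic at the start, DocType 'webm' somewhere in the window
    return head[:4] == b"\x1a\x45\xdf\xa3" and b"\x42\x82\x84webm" in head


# (sniffed type, fingerprint predicate) pairs; illustrative only
FINGERPRINTS = [
    ("video/ogg", _is_ogg),
    ("video/webm", _is_webm),
]


def effective_type(declared, body: bytes):
    """Return the type to act on: a matching video fingerprint overrides
    the declared Content-Type; otherwise the declared type is kept."""
    head = body[:SNIFF_WINDOW]
    for mime, matches in FINGERPRINTS:
        if matches(head):
            return mime
    return declared
```

Because the result depends only on the response bytes and the declared
type, a proxy running the same function would reach the same answer as
the browser.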

 Put another way: the problem here is not that browsers sniff.  It's
 that browsers don't behave interoperably or predictably.  Speccing a
 precise sniffing algorithm that everyone's willing to follow allows
 proxies to reliably know what browsers will do with it.  What will
 cause problems is what you seem to be arguing for -- *not* speccing
 sniffing

 Er... Where did I propose this?  I proposed speccing that there MUST NOT be
 any sniffing, with browsers that sniff therefore being nonconformant.  I
 didn't propose allowing ad-hoc sniffing.

Right.  But the spec never allowed sniffing, and two browsers do it
anyway.  Ian has spoken to those browsers' implementers, and the
browsers have not changed, despite knowing that they aren't following
the spec.  Do you have any particular reason to believe that they'll
change?  If not, then the situation I described is exactly what your
proposal (i.e., the status quo) will result in, no?

 Only if consistent includes consistent across all contexts (which no
 one is proposing to either specify or implement).

Could you comment specifically on the behavior I outlined above?  It's
entirely possible that I'm missing a lot of subtleties here.


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-03 Thread David Singer

On Sep 3, 2010, at 12:48 , Aryeh Gregor wrote:

 Er... Where did I propose this?  I proposed speccing that there MUST NOT be
 any sniffing, with browsers that sniff therefore being nonconformant.  I
 didn't propose allowing ad-hoc sniffing.
 
 Right.  But the spec never allowed sniffing, and two browsers do it
 anyway.  Ian has spoken to those browsers' implementers, and the
 browsers have not changed, despite knowing that they aren't following
 the spec.  Do you have any particular reason to believe that they'll
 change?  If not, then the situation I described is exactly what your
 proposal (i.e., the status quo) will result in, no?
 

Um, I think that in some cases the code that is supporting video/audio has ... 
historical artefacts ... which may not be entirely in line with the specs.  I 
think it's dangerous to make assumptions in this area, especially if you then 
go and ask for a change in a spec. based on assumptions.

David Singer
Multimedia and Software Standards, Apple Inc.



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-03 Thread Boris Zbarsky

On 9/3/10 3:48 PM, Aryeh Gregor wrote:

Why are you assuming that?


Because blocking an entire MIME type seems like it would be massive
overkill . . . but if that's a real use-case, well, okay.  It still
can't be *too* hard to check the first few bytes of the contents.
They must do it anyway if they implement this for images, right?


Yes.

But note that for some video formats checking the first few bytes is not 
sufficient.  In fact, some video container formats can have arbitrary 
length prefixes before the actual video data starts.  Of course if 
sniffers are just restricted to the first few bytes that might be ok.


Okay, you're being too theoretical for me.  Let's say we have
fingerprints for all the major video types, of the form "check if the
first X bytes match this very simple pattern".  Let's say the spec
says that whenever processing the response to an HTTP request,
browsers must act as though they executed the sniffing algorithm and,
if it sniffs as a video type, they must treat it the same as if the
Content-Type matched the sniffed type.


OK, so context-independent?  Note that not a single browser implements 
this today.



Also suppose that the fingerprints include byte sequences that cannot occur in
normal text encodings


Is this a reasonable supposition?  What are these byte sequences for the 
container formats at hand?  (Say WebM's restricted Matroska container, 
whatever container format is supported for H.264 by IE and Chrome, and 
Ogg; we'll ignore the generic Matroska weirdness for now.)



and that they're long enough that random false positives
are extremely unlikely.  What's the problem with this specific
proposal?


Might be a good idea to ask the IE team, the Chrome team, and the Safari 
team why they're not sniffing in toplevel browsing contexts...  I 
believe there's been at least one answer from a Chrome developer on that 
already, though.



Er... Where did I propose this?  I proposed speccing that there MUST NOT be
any sniffing, with browsers that sniff therefore being nonconformant.  I
didn't propose allowing ad-hoc sniffing.


Right.  But the spec never allowed sniffing, and two browsers do it
anyway.


Sure, but it's early days in implementation.  Note, also, that I believe 
it's 3 browsers, not 2.



Ian has spoken to those browsers' implementers, and the
browsers have not changed, despite knowing that they aren't following
the spec.


Some of these changes take time (e.g. having to rejigger QuickTime to
allow you to not sniff while using it).  So is it that they have not
changed, or that they have no plans to change, ever?



Do you have any particular reason to believe that they'll
change?


Such changes have happened in the past (e.g. for stylesheets, and for 
toplevel browsing contexts).  Why is this case different?



Only if consistent includes consistent across all contexts (which no
one is proposing to either specify or implement).


Could you comment specifically on the behavior I outlined above?


The behavior you outlined above is consistent in this sense, yes.

-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-02 Thread Aryeh Gregor
On Thu, Sep 2, 2010 at 12:21 AM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 9/1/10 4:46 PM, Aryeh Gregor wrote:
 Is this realistically possible unless the author deliberately crafts
 the file?

 I'm not an audio/video format expert; I have no idea.  Does it matter?

Yes.  If false positives were realistically possible by accident, that
would count strongly against sniffing.  If they're not, that at least
is not an issue.

 Why is it not a problem if there are suddenly use cases that are impossible
 because the browser will ignore the author's intent?

Which use-cases?

 have any issues ever been caused by this kind of sniffing problem?

 As far as I know, yes (of the remotely take control of the computer kind).

 Are there clear problems that have arisen in other cases?

 See above.

 The problem can't plausibly arise with media
 files -- if you can execute a vulnerability via getting the user to
 view a media file, it's probably via arbitrary code execution.  In
 that case you don't need to disguise yourself, just get the viewer to
 go to your own website and do whatever you want, since there are no
 same-domain restrictions.

 See above about people who take steps to protect themselves when problems
 like this arise and would be screwed over by sniffing.

Okay, but we're talking about standardizing sniffing in a spec.  As
long as browsers' behavior in processing a given resource is
well-defined and reliable, a proxy could work fine by just
implementing the same algorithm.  There's no reason that the proxy has
to only look at MIME types, is there?  It simplifies the proxy a bit,
but not much.  It will already have to do some content sniffing to
identify what content is dangerous, unless it's just going to block
everything of that file type (which I'm assuming isn't the case).

Put another way: the problem here is not that browsers sniff.  It's
that browsers don't behave interoperably or predictably.  Speccing a
precise sniffing algorithm that everyone's willing to follow allows
proxies to reliably know what browsers will do with it.  What will
cause problems is what you seem to be arguing for -- *not* speccing
sniffing, so that browsers that sniff do so in an ad hoc, undefined
manner that's difficult to predict.  For the use-case of filtering
exploits, it doesn't really matter what the behavior is, so long as
it's consistent.  Or am I missing something here?


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-02 Thread Boris Zbarsky

On 9/2/10 3:53 PM, Aryeh Gregor wrote:

Why is it not a problem if there are suddenly use cases that are impossible
because the browser will ignore the author's intent?


Which use-cases?


Well, serving up data as text/plain for it to be readable is one.  I 
agree that for the specific case of video this is not a big deal.



Okay, but we're talking about standardizing sniffing in a spec.  As
long as browsers' behavior in processing a given resource is
well-defined and reliable, a proxy could work fine by just
implementing the same algorithm.  There's no reason that the proxy has
to only look at MIME types, is there?  It simplifies the proxy a bit,
but not much.  It will already have to do some content sniffing to
identify what content is dangerous, unless it's just going to block
everything of that file type (which I'm assuming isn't the case).


Why are you assuming that?

There are proposals for standardizing several different types of 
sniffing, with the one used being context-dependent.  A proxy wouldn't 
have the context.


It can all be made to work by erring on the side of blocking more stuff, 
but then you get to the point where the proxy makes it impossible to use 
the browser altogether, and then it's not a viable solution to the 
problem at hand.



Put another way: the problem here is not that browsers sniff.  It's
that browsers don't behave interoperably or predictably.  Speccing a
precise sniffing algorithm that everyone's willing to follow allows
proxies to reliably know what browsers will do with it.  What will
cause problems is what you seem to be arguing for -- *not* speccing
sniffing


Er... Where did I propose this?  I proposed speccing that there MUST NOT 
be any sniffing, with browsers that sniff therefore being nonconformant. 
 I didn't propose allowing ad-hoc sniffing.



For the use-case of filtering
exploits, it doesn't really matter what the behavior is, so long as
it's consistent.


Only if consistent includes consistent across all contexts 
(which no one is proposing to either specify or implement).


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Philip Jägenstedt

On Tue, 31 Aug 2010 09:36:00 +0200, Ian Hickson i...@hixie.ch wrote:


On Mon, 19 Jul 2010, Philip Jägenstedt wrote:


I've tested Firefox 3.6.4, Firefox 4.0b1 and Chrome 5.0.375.99 and none
return "maybe" for canPlayType("application/octet-stream"). I couldn't
get meaningful results from Safari on Windows (requires restart to
detect QuickTime, perhaps?).

It would appear that Opera is the only browser that supports
application/octet-stream. At the time I added this, it was simply
because it is true, maybe we can play it. However, I see no practical
benefit of this spec-wise or implementation-wise. Since no other
browsers have implemented it, I am going to remove it from Opera and
hope that the spec will be changed to match this.


Agreed. I've changed the spec to match.


I never did make that change, instead waiting for the outcome of this  
discussion. Note that since Opera uses the same code path for checking the  
argument to canPlayType and for the Content-Type header, the change would  
also have meant that videos served as application/octet-stream would stop  
working, in violation of the spec.



On Thu, 22 Jul 2010, Philip Jägenstedt wrote:


Chrome and Safari ignore the MIME type altogether, in my opinion if we
align with that we should do it full out, not just by adding text/plain
to the whitelist, as that would either require (a)
canPlayType(text/plain) to return maybe or (b) different code paths
for checking the MIME type in Content-Type and for canPlayType.


On Thu, 22 Jul 2010, Maciej Stachowiak wrote:


I don't think canPlayType(text/plain) has to return maybe. It's not
useful for a Web developer to test for the browser's ability to sniff to
overcome a bad MIME type. canPlayType should be thought of as testing
whether the browser could play a media resource that is really of a
given type, rather than labeled with that type over HTTP.


On Fri, 23 Jul 2010, Philip Jägenstedt wrote:


Right, it certainly isn't useful, I'm just pointing out that this is
what happens if one adds text/plain to the list of maybe codecs rather
than ignoring Content-Type altogether, which is the only thing you can
do within the bounds of the current spec to get text/plain to play. The
only 3 serious options I know are still the ones I outlined in my
earlier email.


canPlayType() is now hardcoded as not supporting application/octet-stream
even though that type is otherwise not considered one that isn't supported
(i.e. is a type that sniffs).


I'm not very happy with special-casing application/octet-stream only for
canPlayType, especially as it only handles the exact string
"application/octet-stream", not e.g. "application/octet-stream;", which
would instead be put through the same code path as Content-Type and return
"maybe".
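
A small Python model of why the exact-string special case is awkward.
Everything here is hypothetical: it is not Opera's code or the spec's
algorithm, only a sketch of a shared code path that treats
application/octet-stream as "maybe" plus a special case that compares
the raw argument string.

```python
# Types the shared Content-Type code path answers "maybe" for (assumed)
SNIFFABLE = {"application/octet-stream"}


def content_type_verdict(mime: str) -> str:
    """Shared code path: ignore parameters and compare the bare type."""
    essence = mime.split(";", 1)[0].strip().lower()
    return "maybe" if essence in SNIFFABLE else ""


def can_play_type(mime: str) -> str:
    """canPlayType model with an exact-string special case in front."""
    if mime == "application/octet-stream":  # matches this exact string only
        return ""
    return content_type_verdict(mime)
```

In this model can_play_type("application/octet-stream") returns the
empty string, while can_play_type("application/octet-stream;") falls
through to the shared path and returns "maybe".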


At this point the least complex solution seems to be to ignore the
Content-Type header, and unless the teams behind Chrome, Safari and IE9
have a sudden change of heart it's the only realistic outcome. Perhaps we
should also encourage authors to not send the Content-Type header at all,
to remove any illusions of it having an effect.


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Philip Jägenstedt
On Wed, 01 Sep 2010 02:59:54 +0200, Andrew Scherkus  
scher...@chromium.org wrote:



On Tue, Aug 31, 2010 at 12:59 PM, Aryeh Gregor
simetrical+...@gmail.com wrote:


On Tue, Aug 31, 2010 at 10:35 AM, Boris Zbarsky bzbar...@mit.edu wrote:

 You can't sniff in a toplevel browser window.  Not the same way that people
 are sniffing in video.  It would break the web.

How so?  For the sake of argument, suppose you sniff only for known
binary video/audio types, and fall back to existing behavior if the
type isn't one of those (e.g., not video or audio).  Do people do
things like link to MP3 files with incorrect MIME types and no
Content-Disposition, and expect them to download?  If so, don't people
also link to MP3 files with correct MIME types and expect the same?  I
don't see how sniffing vs. using MIME type makes a compatibility
difference here, since media support in browsers is so new -- surely
whatever bad thing happens, sniffing will make it happen more often,
at worst.

What do Chrome and IE do here?



We use the incoming MIME type to determine whether we render the audio/video
in the browser versus download.  We would never want to execute multimedia
sniffing code in the trusted/browser process so implementing sniffing for a
top level browser window would involve sending the bytes to a sandboxed
process for inspection first.


Can you elaborate on this? What would be the problem with sniffing in this  
context?


This does have a side effect where a video may play fine on a page with a
bogus MIME type (due to sniffing), but viewing the video URL in the browser
itself would prompt a download.


If we start ignoring the Content-Type I expect we would also add sniffing  
so that opening a video served with the wrong (or missing) Content-Type  
still works in a top-level browsing context, as it does for images (I  
think).


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Brian Campbell
On Aug 31, 2010, at 9:40 AM, Boris Zbarsky wrote:

 On 8/31/10 3:36 AM, Ian Hickson wrote:
 You might say Hey, but aren't you content sniffing then to find the
 codecs and you'd be right. But in this case we're respecting the MIME
 type sent by the server - it tells the browser to whatever level of
 detail it wants (including codecs if needed) what type it is sending. If
 the server sends 'text/plain' or 'video/x-matroska' I wouldn't expect a
 browser to sniff it for Ogg content.
 
 The Microsoft guys responded to my suggestion that they might want to
 implement something like this with what's the benefit of doing that?.
 
 One obvious benefit is that videos with the wrong type will not work, and 
 hence videos will be sent with the right type.

What makes you say this? Even if they are sent with the right type initially, 
the correct types are at high risk of bitrotting.

The big problem with MIME types is that they don't stick to files very well. 
So, while someone might get them working when they initially use video, if they 
move to a different web server, or upgrade their server, or someone mirrors 
their video, or any of a number of other things, they might lose the proper 
association of files and MIME types.

The real problem is that there is no standard way of storing and transmitting 
file type metadata on the majority of filesystems and majority of internet 
protocols, meaning that people need to maintain separate databases of MIME 
types, which are extremely easy to lose when moving between web servers. Until 
this problem is fixed (and this is a pretty big problem, even Apple gave up on 
tracking file type metadata years ago due to its incompatibility with how
other systems work), it will simply be too hard to maintain working 
Content-Type headers, and sniffing will be much more likely to produce the 
effects that the authors intended.

It seems that periodically, web standards bodies decide this time, if we're 
strict, people will just get the content right or it won't work (such as XHTML 
with XML parsing rules), and invariably, people manage to screw it up anyhow. 
Sure, when the author tests their page the first time it's fine, but a mistaken 
lack of quoting in a comments field breaks the whole page. This causes people 
to migrate to the browsers or technologies that are less strict, and actually 
show the user what they want to see, rather than just breaking due to something 
out of the user's control.

-- Brian

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Boris Zbarsky

On 9/1/10 4:12 AM, Philip Jägenstedt wrote:

If we start ignoring the Content-Type I expect we would also add
sniffing so that opening a video served with the wrong (or missing)
Content-Type still works in a top-level browsing context, as it does for
images (I think).


It can't possibly work for images.  If I send a file as text/html, and
you load it from an <img> then you will render it as an image (possibly
a broken one).  If you load it from a toplevel browsing context you will
render it as text/html, even if it's image data (where "you" possibly
excludes IE/Windows, which will do some sniffing in that situation).


-Boris



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Philip Jägenstedt

On Wed, 01 Sep 2010 15:14:10 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 9/1/10 4:12 AM, Philip Jägenstedt wrote:

If we start ignoring the Content-Type I expect we would also add
sniffing so that opening a video served with the wrong (or missing)
Content-Type still works in a top-level browsing context, as it does for
images (I think).


It can't possibly work for images.  If I send a file as text/html, and
you load it from an <img> then you will render it as an image (possibly
a broken one).  If you load it from a toplevel browsing context you will
render it as text/html, even if it's image data (where "you" possibly
excludes IE/Windows, which will do some sniffing in that situation).


Huh, I guessed incorrectly: serving a PNG as neither text/plain nor
text/html makes it be sniffed and rendered in a top-level browsing context
in Opera. However, both work in IE8.


Why do you say that it can't possibly work? Are there any security risks  
with the browser potentially interpreting a plain text or HTML document  
and failing to decode it? Anything else?


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Boris Zbarsky

On 9/1/10 10:23 AM, Philip Jägenstedt wrote:

Huh, I guessed incorrectly: serving a PNG as neither text/plain nor
text/html makes it be sniffed and rendered in a top-level browsing
context in Opera. However, both work in IE8.

Why do you say that it can't possibly work?


That was a statement about the current implementation state of Opera,
not about future possibilities.



Are there any security risks
with the browser potentially interpreting a plain text or HTML document


Yes, actually, if there's a filtering proxy trying to screen out video 
or image data that's trying to exploit known OS-level bugs, say.  But I 
had assumed, based on the rest of this discussion, that people simply 
didn't care about that.


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Boris Zbarsky

On 9/1/10 9:13 AM, Brian Campbell wrote:

It seems that periodically, web standards bodies decide "this time, if we're strict, 
people will just get the content right or it won't work" (such as XHTML with XML 
parsing rules), and invariably, people manage to screw it up anyhow. Sure, when the 
author tests their page the first time it's fine, but a mistaken lack of quoting in a 
comments field breaks the whole page. This causes people to migrate to the browsers or 
technologies that are less strict, and actually show the user what they want to see, 
rather than just breaking due to something out of the user's control.


It hasn't actually happened for MIME types in toplevel documents (modulo 
the one known workaround for a common server issue with text/plain).  By 
and large, browsers don't sniff toplevel browsing contexts, and the one 
browser that does has been losing market share.


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Julian Reschke

On 01.09.2010 10:12, Philip Jägenstedt wrote:

...
If we start ignoring the Content-Type I expect we would also add
sniffing so that opening a video served with the wrong (or missing)
Content-Type still works in a top-level browsing context, as it does for
images (I think).
...


Sniffing in the *absence* of a content type is fine. The interesting 
question is what to do when it's present, but wrong.


Best regards, Julian



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Julian Reschke

On 01.09.2010 16:23, Philip Jägenstedt wrote:

...
Huh, I guessed incorrectly, neither serving a PNG as text/plain or
text/html makes it be sniffed and rendered in a top-level browsing
context in Opera. However, both work in IE8.
...


Please don't say "work" when talking about something that's not supposed 
to happen...


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Julian Reschke

On 01.09.2010 15:13, Brian Campbell wrote:

On Aug 31, 2010, at 9:40 AM, Boris Zbarsky wrote:


On 8/31/10 3:36 AM, Ian Hickson wrote:

You might say Hey, but aren't you content sniffing then to find the
codecs and you'd be right. But in this case we're respecting the MIME
type sent by the server - it tells the browser to whatever level of
detail it wants (including codecs if needed) what type it is sending. If
the server sends 'text/plain' or 'video/x-matroska' I wouldn't expect a
browsers to sniff it for Ogg content.


The Microsoft guys responded to my suggestion that they might want to
implement something like this with what's the benefit of doing that?.


One obvious benefit is that videos with the wrong type will not work, and hence 
videos will be sent with the right type.


What makes you say this? Even if they are sent with the right type initially, 
the correct types are at high risk of bitrotting.

The big problem with MIME types is that they don't stick to files very well. 
So, while someone might get them working when they initially use video, if they 
move to a different web server, or upgrade their server, or someone mirrors 
their video, or any of a number of other things, they might lose the proper 
association of files and MIME types.
...


That's true, and the reason why people still use file extensions.

That's not super elegant, but it works.

Best regards, Julian


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Adrian Sutton
On 1 Sep 2010, at 15:45, Julian Reschke wrote:
 The big problem with MIME types is that they don't stick to files very well. 
 So, while someone might get them working when they initially use video, if 
 they move to a different web server, or upgrade their server, or someone 
 mirrors their video, or any of a number of other things, they might lose the 
 proper association of files and MIME types.
 ...
 
 That's true, and the reason why people still use file extensions.
 
 That's not super elegant, but it works.


Given that there is a very limited set of video formats that are supported 
anyway, wouldn't it be reasonable to just identify or define the standard 
file extensions then work with server vendors to update their standard file 
extension to mime type definitions to include that.  While adoption and 
upgrading to the new versions would obviously take time, that applies to the 
video tag itself anyway and is just a temporary source of pain.

Regards,

Adrian Sutton.
__
Adrian Sutton, CTO
UK: +44 1 628 353 032  US: +1 (650) 292 9659 x717
Ephox http://www.ephox.com/
Ephox Blogs http://people.ephox.com/, Personal Blog http://www.symphonious.net/






Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Zachary Ozer
On Wed, Sep 1, 2010 at 10:51 AM, Adrian Sutton adrian.sut...@ephox.com wrote:
 Given that there is a very limited set of video formats that are supported
 anyway, wouldn't it be reasonable to just identify or define the standard
 file extensions then work with server vendors to update their standard file
 extension to mime type definitions to include that.  While adoption and
 upgrading to the new versions would obviously take time, that applies to the
 video tag itself anyway and is just a temporary source of pain.

At first glance, my eyes almost popped out of my sockets when I saw
this suggestion. Using the file extension?! He must be mad!

Then I remembered that our Flash player *has* to use file extension
since the MIME type isn't available in Flash. Turns out that file
extension is a pretty good indicator, but it doesn't work for custom
server configurations where videos don't have extensions, ala YouTube.
For that reason, we allow users to override whatever we detect with a
type configuration parameter.
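
As a rough illustration of the approach described above (guess from the file extension, but let the publisher override the guess), here is a minimal Python sketch. The extension table, the function name `guess_media_type`, and the `type_override` parameter are hypothetical and are not taken from any actual player.

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical extension table; a real player would carry its own list.
EXTENSION_TYPES = {
    ".flv": "video/x-flv",
    ".mp4": "video/mp4",
    ".ogv": "video/ogg",
    ".webm": "video/webm",
}


def guess_media_type(url: str, type_override: Optional[str] = None) -> Optional[str]:
    """Return the explicit override if given, else a guess from the URL path."""
    if type_override:
        return type_override
    # Only the path is inspected, so query strings do not confuse the guess.
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return EXTENSION_TYPES.get(extension)
```

A URL without an extension yields no guess at all, which is the case the override exists for.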

Ultimately, the question is, What are we trying to accomplish?

I think we're trying to make it easy for content creators to guarantee
that their content is available to all viewers regardless of their
browser.

If that's the case, I'd actually suggest that the browsers *strictly*
follow the MIME type, with the source type as an override, and
eliminating all sniffing (assuming that the file container format
contains the codec meta-data). If a publisher notices that their video
isn't working, they can either update their server's MIME type
mapping, or just hard code the type in the HTML. Neither is that time
consuming / difficult.

Moreover, as Adrian suggested, it's probably quite easy to get the big
HTTP servers (Apache, IIS, nginx, lighttpd) to add the new extensions
(if they haven't already), so this would gradually become less and
less of an issue.

Best,

Zach
--
Zachary Ozer
Developer, LongTail Video

w: longtailvideo.com • e: z...@longtailvideo.com • p: 212.244.0140 •
f: 212.656.1335
JW Player  |  Bits on the Run  |  AdSolution


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Eric Carlson

On Aug 31, 2010, at 4:01 PM, Ian Hickson wrote:

 On Tue, 31 Aug 2010, Eric Carlson wrote:
 On Aug 31, 2010, at 12:36 AM, Ian Hickson wrote:
 
 Safari does crazy things right now that we won't go into; for the 
 purposes of this discussion we'll assume Safari can change.
 
 What crazy things does Safari do that it should not?
 
 I forget the details, but IIRC one of the main problems was that it was 
 based on the URL's file extension exclusively.
 
  No, I don't see how you came to that conclusion. 

  QuickTime knows how to create a movie from a text file (to make it easy to 
create captions, chapters, etc), but it also assumes a file served as 
text/plain may be coming from a misconfigured server. Therefore, when it gets 
a file served as text/plain it first looks at the file content and/or  the 
file extension to see if it is a movie file. It opens it as text only if it 
doesn't look like a movie.

  In your test page (http://hixie.ch/tests/adhoc/html/video/002.html), all four 
movies have correct extensions but are served as text/plain:

<!DOCTYPE HTML>
<title>text/plain video files</title>
<p> <video autoplay controls src=resources/text.txt></video>
<p> <video autoplay controls src=resources/text.webm></video>
<p> <video autoplay controls src=resources/text.m4v></video>
<p> <video autoplay controls src=resources/text.ogv></video>

  When the shipping version of Safari opens this page, the MPEG-4 file opens 
correctly and the other three open as text (if you wait long enough) 
because by default QuickTime doesn't know how to open the Ogg or WebM files. If 
you add QuickTime importers for WebM and Ogg, those files will be opened as 
movies instead of as text because of the file extensions, despite the fact 
that they are served as text.

  FWIW, in nightly builds we are now configuring QuickTime so it won't ever 
open files it identifies as text.

eric




Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Eric Carlson

On Sep 1, 2010, at 9:07 AM, Zachary Ozer wrote:

 On Wed, Sep 1, 2010 at 10:51 AM, Adrian Sutton adrian.sut...@ephox.com 
 wrote:
 Given that there is a very limited set of video formats that are supported
 anyway, wouldn't it be reasonable to just identify or define the standard
 file extensions then work with server vendors to update their standard file
 extension to mime type definitions to include that.  While adoption and
 upgrading to the new versions would obviously take time, that applies to the
 video tag itself anyway and is just a temporary source of pain.
 
 At first glance, my eyes almost popped out of my sockets when I saw
 this suggestion. Using the file extension?! He must be mad!
 
 Then I remembered that our Flash player *has* to use file extension
 since the MIME type isn't available in Flash. Turns out that file
 extension is a pretty good indicator, but it doesn't work for custom
 server configurations where videos don't have extensions, ala YouTube.
 For that reason, we allow users to override whatever we detect with a
 type configuration parameter.
 
 Ultimately, the question is, What are we trying to accomplish?
 
 I think we're trying to make it easy for content creators to guarantee
 that their content is available to all viewers regardless of their
 browser.
 
 If that's the case, I'd actually suggest that the browsers *strictly*
 follow the MIME type, with the source type as an override, and
 eliminating all sniffing (assuming that the file container format
 contains the codec meta-data). If a publisher notices that their video
 isn't working, they can either update their server's MIME type
 mapping, or just hard code the type in the HTML.
 

  Hard coding the type is only possible if the element uses a source element, 
@type isn't allowed on audio or video.

 Neither is that time consuming / difficult.
 
  It isn't hard to update a server if you control it, but it can be *very* 
difficult and time consuming if you don't (as is the case with most web 
developers, I assume).


 Moreover, as Adrian suggested, it's probably quite easy to get the big
 HTTP servers (Apache, IIS, nginx, lighttpd) to add the new extensions
 (if they haven't already), so this would gradually become less and
 less of an issue.
 
  Really? Your company specializes in web video and flv files have been around 
for years, but your own server still isn't configured for it:

eric% curl -I http://content.longtailvideo.com/videos/flvplayer.flv;
HTTP/1.1 200 OK
Server-Status: load=0
Content-Type: application/octet-stream
Accept-Ranges: bytes
ETag: 4288394655
Last-Modified: Wed, 23 Jun 2010 20:42:28 GMT
Content-Length: 2533148
Date: Wed, 01 Sep 2010 16:16:28 GMT
Server: bit_asic/3.8/r8s1-bitcast-b


eric



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Zachary Ozer
On Wed, Sep 1, 2010 at 12:29 PM, Eric Carlson eric.carl...@apple.com wrote:
   Hard coding the type is only possible if the element uses a source
 element, @type isn't allowed on audio or video.

Why isn't type allowed for video and audio? I know it doesn't
strictly make sense (since the tag doesn't have a type per-se), but
perhaps it could be an alias for the current item's type, much in the
same way src is the current source.

   It isn't hard to update a server if you control it, but it can be *very*
 difficult and time consuming if you don't (as is the case with most web
 developers, I assume).

Correct - but being able to manually specify type should be fine for
those situations, since that can be written into the HTML itself.

   Really? Your company specializes in web video and flv files have been
 around for years, but your own server still isn't configured for it:

Thanks for the heads up on this. However, I think this reemphasizes my
original point: The Flash platform *isn't* strict about MIME types, so
we've never bothered to do anything about it.


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Ian Hickson
On Wed, 1 Sep 2010, Julian Reschke wrote:
 On 01.09.2010 16:23, Philip Jägenstedt wrote:
  ...
  Huh, I guessed incorrectly, neither serving a PNG as text/plain or
  text/html makes it be sniffed and rendered in a top-level browsing
  context in Opera. However, both work in IE8.
 
 Please don't say "work" when talking about something that's not supposed 
 to happen...

For the record, in the context of the WHATWG mailing list, saying "work" 
here is fine. What's important is the user experience, not strict 
adherence to specifications.

In the case of the HTML spec, I'll change it to match what user agents 
implement. As mentioned earlier in the thread, for now I'm happy to give 
cover to Firefox and Opera (and hopefully Chrome and Safari) to more 
closely honour the Content-Type headers, but if the conclusion from 
implementors is that we should follow Microsoft's route towards simply 
ignoring Content-Type with video as we do with img, that's fine.

As far as sniffing for top-level browsing contexts goes, my understanding 
is that Adam is still working on the relevant spec, and it would not be a 
problem to add common video formats to that algorithm so that we can get 
interoperable handling of mislabeled content.

(Currently, text/html won't ever sniff as binary IIRC, but text/plain, in 
certain cases, will. We could also make text/html sniff as binary if it 
turns out that this would be particularly helpful for Web compat.)

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Boris Zbarsky

On 9/1/10 2:51 PM, Ian Hickson wrote:

(Currently, text/html won't ever sniff as binary IIRC, but text/plain, in
certain cases, will.


Will sniff as binary so as not to render as text but will NOT, last I 
checked, render as an image or whatnot (for good security reasons, imho).


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Aryeh Gregor
On Tue, Aug 31, 2010 at 4:13 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 The issue would be someone linking to text or HTML or a binary blob that
 happens to have some bits at the beginning that look like an audio/video
 types and expecting them to be rendered respectively as text or HTML or be
 downloaded.

Is this realistically possible unless the author deliberately crafts
the file?  We're talking quite a few bytes that have to be exactly
right, no?  If the author does deliberately craft the file, is there
any security risk in displaying it unexpectedly, given that media
isn't scriptable?

 The big danger with sniffing, as always, is that the server will think one
 thing will happen and suddenly the browser will do something totally
 different.

As long as what the browser is doing is almost certain to be closer to
the author's/user's/webmaster's intent, that's not a problem.
Sniffing is a problem if you risk false positives or security issues,
but I can't see how that's an issue in this specific case.  We have a
lot of experience with the perils of sniffing -- have any issues ever
been caused by this kind of sniffing problem?  The only sniffing
problems I know of are when

1) The sniffing is unreliable, so false identifications happen by
accident.  They're common with MIME types too, but at least with MIME
they're more predictable.  This will hold for pretty much any text
format, if only because you might want to serve the file as text/plain
to mean "let the user view the source code instead of executing it".
But with binary formats it doesn't have to be plausible, if the string
you're sniffing for is reasonably long.

2) The MIME type is safe (e.g., not scriptable), and the type it's
sniffed as is not safe (e.g., it's HTML or JAR).  Then even if false
identifications are overwhelmingly improbable by accident, they'll
happen when people upload malicious files posing as an image or
whatever to get code to execute from a domain they don't control.

Are there clear problems that have arisen in other cases?

On Tue, Aug 31, 2010 at 8:59 PM, Andrew Scherkus scher...@chromium.org wrote:
 We use the incoming MIME type to determine whether we render the audio/video
 in the browser versus download.  We would never want to execute multimedia
 sniffing code in the trusted/browser process so implementing sniffing for a
 top level browser window would involve sending the bytes to a sandboxed
 process for inspection first.

Why can't you do media sniffing in the trusted process?  It must be a
lot simpler than parsing HTTP headers -- just a memcmp() or two per
format, if the format is designed so it can be sniffed well.
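
The "memcmp() or two per format" idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of any browser or of the MIME sniffing draft: the table holds only three well-known container signatures, the labels are illustrative, and the names `SIGNATURES` and `sniff_media` are hypothetical. The EBML magic number identifies the container family only; telling WebM from other Matroska files needs more than a fixed prefix.

```python
from typing import Optional

# (offset, magic bytes, label). Simplified illustration only.
SIGNATURES = [
    (0, b"OggS", "application/ogg"),                    # Ogg page capture pattern
    (0, b"\x1a\x45\xdf\xa3", "EBML (Matroska or WebM)"),  # EBML header ID
    (4, b"ftyp", "video/mp4"),                          # ISO base media 'ftyp' box
]


def sniff_media(head: bytes) -> Optional[str]:
    """Compare the first bytes of a resource against each known signature."""
    for offset, magic, label in SIGNATURES:
        # Slicing past the end of `head` yields a short result, so a
        # truncated input simply fails to match instead of raising.
        if head[offset:offset + len(magic)] == magic:
            return label
    return None
```

Each check is a fixed-offset byte comparison, which is the whole cost of this kind of sniffing for formats that have a fixed signature.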

On Wed, Sep 1, 2010 at 12:27 AM, Gregory Maxwell gmaxw...@gmail.com wrote:
 Aggressive sniffing can and has resulted in some pretty nasty security bugs.

 E.g. an attacker crafts an input that a website identifies as video
 and permits the upload but which a browser sniffs out to be a java jar
 which can then access the source URL with the permissions of the user.

This is problem (2) above.  The solution is never to sniff for
scriptable content.  The problem can't plausibly arise with media
files -- if you can execute a vulnerability via getting the user to
view a media file, it's probably via arbitrary code execution.  In
that case you don't need to disguise yourself, just get the viewer to
go to your own website and do whatever you want, since there are no
same-domain restrictions.

 The sniffing rules, in some contexts and some browsers can also end up
 causing surprising failures... e.g. I've seen older versions of some
 sniffing heavy browsers automatically switch into UCS-2LE encoding at
 wrong and surprising times. Perhaps this is irrelevant in a video
 specific discussion of sniffing— but it is a hazard with sniffing in
 general.

Is this plausible in practice for common media formats?  I didn't find
info on sniffing media by quick Googling, but for instance, GIF starts
with GIF87a or GIF89a, and PNG has an eight-byte signature.
Random binary data is going to hit these one time in 2^48 or 2^64,
about 10^14 and 10^19 respectively.  The actual figure is likely to be
even lower, because most binary formats don't have arbitrary data in
their first few bytes.  Is this really something we should worry
about, given how obviously hard it is to get MIME types right?
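
The orders of magnitude quoted above can be checked with a few lines of Python. This assumes, as the paragraph does, that the leading bytes of a non-image file are uniformly random, which real files are not; the helper name `one_in` is hypothetical. Because either of the two GIF signatures counts as a hit, the GIF figure comes out at 2^47 rather than 2^48, which is still on the order of 10^14.

```python
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")   # 6 bytes each
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"      # 8 bytes


def one_in(signature_len_bytes: int, accepted: int = 1) -> int:
    """Return N such that random bytes match with probability 1/N."""
    return 2 ** (8 * signature_len_bytes) // accepted


gif_odds = one_in(len(GIF_SIGNATURES[0]), accepted=len(GIF_SIGNATURES))
png_odds = one_in(len(PNG_SIGNATURE))

print(f"GIF: 1 in {gif_odds:.2e}")   # 1 in 1.41e+14
print(f"PNG: 1 in {png_odds:.2e}")   # 1 in 1.84e+19
```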

 Moreover, it'll never be consistent from implementation to
 implementation, which seems to me to be pretty antithetical to
 standardization in general.

The exact sniffing algorithm needs to be precisely specced.  In fact,
there's work under way to do that right now, for other types of
sniffing:

http://tools.ietf.org/html/draft-abarth-mime-sniff-05

There's no reason it can't be perfectly consistent.  The reason it's
historically been inconsistent is because specs have tried to claim
that no sniffing is allowed, so implementers had no spec to follow.
Which is what's in the HTML5 spec now, and it's a mistake.

On Wed, Sep 1, 2010 at 10:37 

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Silvia Pfeiffer
On Thu, Sep 2, 2010 at 12:38 AM, Boris Zbarsky bzbar...@mit.edu wrote:

 On 9/1/10 9:13 AM, Brian Campbell wrote:

 It seems that periodically, web standards bodies decide "this time, if
 we're strict, people will just get the content right or it won't work" (such
 as XHTML with XML parsing rules), and invariably, people manage to screw it
 up anyhow. Sure, when the author tests their page the first time it's fine,
 but a mistaken lack of quoting in a comments field breaks the whole page.
 This causes people to migrate to the browsers or technologies that are less
 strict, and actually show the user what they want to see, rather than just
 breaking due to something out of the user's control.


 It hasn't actually happened for MIME types in toplevel documents (modulo the
 one known workaround for a common server issue with text/plain).  By and
 large, browsers don't sniff toplevel browsing contexts, and the one browser
 that does has been losing market share.


surely that's not the reason it's losing market share ;-)

S.


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Boris Zbarsky

On 9/1/10 10:59 PM, Silvia Pfeiffer wrote:

It hasn't actually happened for MIME types in toplevel documents
(modulo the one known workaround for a common server issue with
text/plain).  By and large, browsers don't sniff toplevel browsing
contexts, and the one browser that does has been losing market share.

surely that's not the reason it's losing market share ;-)


My point is that the "if you don't sniff all your users will leave" 
argument is overly simplistic.


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Boris Zbarsky

On 9/1/10 4:46 PM, Aryeh Gregor wrote:

On Tue, Aug 31, 2010 at 4:13 PM, Boris Zbarsky bzbar...@mit.edu  wrote:

The issue would be someone linking to text or HTML or a binary blob that
happens to have some bits at the beginning that look like an audio/video
types and expecting them to be rendered respectively as text or HTML or be
downloaded.


Is this realistically possible unless the author deliberately crafts
the file?


I'm not an audio/video format expert; I have no idea.  Does it matter?


If the author does deliberately craft the file, is there
any security risk in displaying it unexpectedly, given that media
isn't scriptable?


Yes; media codecs (including image decoders) are one of the most common 
sources of remote attacks on operating systems via web browsers.  So 
showing some image or video is always a risk.  If you have a known 
unpatched vulnerability and try to defend against it by blocking the 
relevant content, then sniffing can defeat the block, leading to exploits.


Note that this means that only people/organizations that are proactive 
about their security will have their vulnerability window made bigger by 
sniffing; everyone else is already screwed in the above situation no 
matter whether browsers sniff.



As long as what the browser is doing is almost certain to be closer to
the author's/user's/webmaster's intent, that's not a problem.


Why is it not a problem if there are suddenly use cases that are 
impossible because the browser will ignore the author's intent?



Sniffing is a problem if you risk false positives or security issues,


That's one case where it's a problem, yes.


but I can't see how that's an issue in this specific case.


See above.


have any issues ever been caused by this kind of sniffing problem?


As far as I know, yes (of the "remotely take control of the computer" kind).


Are there clear problems that have arisen in other cases?


See above.

The problem can't plausibly arise with media
files -- if you can execute a vulnerability via getting the user to
view a media file, it's probably via arbitrary code execution.  In
that case you don't need to disguise yourself, just get the viewer to
go to your own website and do whatever you want, since there are no
same-domain restrictions.


See above about people who take steps to protect themselves when 
problems like this arise and would be screwed over by sniffing.



Yes, actually, if there's a filtering proxy trying to screen out video or
image data that's trying to exploit known OS-level bugs, say.


It seems like such a proxy would be unreliable in any event, since you
could do all sorts of things to obfuscate it, not least of all just
using HTTPS.


There's no reason such a proxy couldn't block https (and it has to, to 
work correctly, as you point out).



Do any such proxies exist?


In the past, yes.  I haven't checked in the last year or two.


On Wed, Sep 1, 2010 at 3:54 PM, Boris Zbarsky bzbar...@mit.edu  wrote:

Will sniff as binary so as not to render as text but will NOT, last I
checked, render as an image or whatnot (for good security reasons, imho).


What reasons are these?


See above about not sneaking dangerous file formats past filtering software.

-Boris


[whatwg] Video with MIME type application/octet-stream

2010-08-31 Thread Ian Hickson

Quick terminology point: in this e-mail, I use the term "sniff" to mean 
"examine the first few bytes of a resource, and determine its type 
heuristically", in contrast with assuming that the type of a file is that 
given by its MIME type (or, heaven forfend, the file extension).

On Thu, 20 May 2010, Robert O'Callahan wrote:

 I just became aware that application/octet-stream is excluded from being 
 a type the user agent knows it cannot render. 
 http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#a-type-that-the-user-agent-knows-it-cannot-render
  
 Apparently this was done in response to a bug report: 
 http://www.w3.org/Bugs/Public/show_bug.cgi?id=7977
 Neither the bug report nor the editor's response give any indication why 
 this change was made.

On Thu, 20 May 2010, Simon Pieters wrote:
 
 Apparently this was done in response to:
 
 http://lists.w3.org/Archives/Public/public-html/2009Jul/0888.html
 http://html5.org/tools/web-apps-tracker?from=3497&to=3498

On Thu, 20 May 2010, Robert O'Callahan wrote:
 
 This change means files served with application/octet-stream will make 
 it all the way to the step If the media data can be fetched but is 
 found by inspection to be in an unsupported format ..., so 
 implementations have to add support for binary sniffing for all the 
 types they support. We didn't need this before in Gecko. What was the 
 motivation for adding this implementation requirement?

On Thu, 20 May 2010, Robert O'Callahan wrote:
 
 Hmm. I guess it doesn't add any implementation requirements beyond what 
 you need to handle the complete absence of a Content-Type (which we 
 currently don't handle, but I suppose we should). So this isn't really a 
 problem.

As Simon says, it was added based on a request from Mozilla engineers. :-)


On Thu, 20 May 2010, Philip Jägenstedt wrote:
 
 For the record, Opera implements canPlayType("application/octet-stream") and
 canPlayType("application/octet-stream; codecs=foo") as per spec ("maybe" and
 "" respectively), but I don't have any strong opinions about it.

On Thu, 20 May 2010, Simon Pieters wrote:

 This bug report was about application/octet-stream *with parameters*, 
 e.g. application/octet-stream; codecs=theora, vorbis. The spec had the 
 requirement about application/octet-stream before that bug report.

On Thu, 20 May 2010, Simon Pieters wrote:
 
 The spec requires binary sniffing for all the types you support even 
 without the application/octet-stream requirement, since a WebM file 
 labelled as video/ogg should play if both video/webm and video/ogg are 
 supported.

Currently that is indeed the case, but only Opera does it.

I spoke to various browser vendor engineers about this.

Microsoft's position is, as far as I can tell, that there's no point 
looking at the Content-Type header, so they always sniff the types of the 
video data once they've decided to try downloading it. I have no idea what 
algorithms they use, or how they would handle cases like Matroska (which 
doesn't have a guaranteed finite signature at the start of the file).

Safari does crazy things right now that we won't go into; for the 
purposes of this discussion we'll assume Safari can change.

Chrome right now sniffs like IE, modulo some bugs with text/plain that 
affect (only) the UI, but engineers tell me they're willing to change.

Opera does what the spec suggests. That is, if the type isn't supported 
then it is treated as a signal to give up without waiting for data to 
sniff; if the type _is_ supported then it isn't trusted, the data is 
obtained and examined to determine its real type.

Mozilla respects the Content-Type religiously, even if it gets data in a 
type it supports labelled with another type it supports, but engineers 
tell me they're willing to change.

As I see it we have two possible destinations: we can do what the spec 
says now, which is a somewhat reasonable (IMHO) compromise between 
religiously following the Content-Type, and being pragmatic about people 
getting the type wrong sometimes; or we can do what IE9 will do, and just 
always sniff once we've decided to try looking at the data, always 
ignoring the Content-Type headers. I think we all know where we're going 
to end up, but for now I've left the spec as is.
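
The positions summarized above can be compared side by side in a short Python sketch. It is only a reading of the descriptions in this message, not any vendor's code: the `SUPPORTED` set, the function names, and the `sniff` callback (which stands in for inspecting the fetched data and returns the detected type or None) are all hypothetical.

```python
from typing import Callable, Optional

SUPPORTED = {"video/ogg", "video/webm"}  # hypothetical set of playable types

Sniffer = Callable[[], Optional[str]]


def always_sniff(content_type: Optional[str], sniff: Sniffer) -> Optional[str]:
    """IE9 as described: ignore Content-Type once the data is being fetched."""
    return sniff()


def trust_header(content_type: Optional[str], sniff: Sniffer) -> Optional[str]:
    """Mozilla as described: the label decides, the data is never consulted."""
    return content_type if content_type in SUPPORTED else None


def spec_compromise(content_type: Optional[str], sniff: Sniffer) -> Optional[str]:
    """Opera / current spec as described: an unsupported label means give up
    early; a supported (or absent) label is not trusted and the data is sniffed."""
    if content_type == "application/octet-stream":
        content_type = None  # treated like a missing Content-Type
    if content_type is not None and content_type not in SUPPORTED:
        return None
    return sniff()
```

For a WebM file labelled video/ogg, `trust_header` reports video/ogg (and decoding then fails), while the other two strategies report video/webm. For the same file labelled text/plain, only `always_sniff` plays it.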


On Thu, 20 May 2010, David Singer wrote:

 Did anyone revise the registration of application/octet-stream to add 
 parameters?

On Thu, 20 May 2010, Simon Pieters wrote:
 
 No. It's just error handling.

On Thu, 20 May 2010, David Singer wrote:

 It's an error to have a parameter that isn't valid for the mime type, so 
 are you suggesting (a) that you throw away the parameter as it's invalid 
 or (b) since it's an error to supply application/octet-stream as the 
 mime type in the first place, we may as well process its invalid 
 parameter in an attempt to recover?

On Thu, 20 May 2010, Simon Pieters wrote:
 
 I'm just suggesting that it should be defined what to do when you get 
 application/octet-stream with 

Re: [whatwg] Video with MIME type application/octet-stream

2010-08-31 Thread Julian Reschke

On 31.08.2010 09:36, Ian Hickson wrote:

From http://greenbytes.de/tech/webdav/rfc2046.html#rfc.section.1:

Parameters are modifiers of the media subtype, and as such do not
fundamentally affect the nature of the content. The set of meaningful
parameters depends on the media type and subtype. Most parameters are
associated with a single specific subtype. However, a given top-level
media type may define parameters which are applicable to any subtype of
that type. Parameters may be required by their defining media type or
subtype or they may be optional. MIME implementations must also ignore
any parameters whose names they do not recognize.

So, as codecs is not defined on application/octet-stream, the
parameter simply should be ignored, thus the advice [...]:

The MIME type application/octet-stream with no parameters is never a
type that the user agent knows it cannot render. User agents must treat
that type as equivalent to the lack of any explicit Content-Type
metadata when it is used to label a potential media resource.

Note: In the absence of a specification to the contrary, the MIME type
application/octet-stream when used with parameters, e.g.
application/octet-stream;codecs=theora, is a type that the user agent
knows it cannot render.

is incorrect, because it requires handling application/octet-stream
and application/octet-stream;codecs=theora differently.
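
To make the disputed distinction concrete, the quoted spec text and its note can be read as the following Python sketch. The parser is deliberately simplified (it does not handle a ";" inside a quoted parameter value), and all three function names are hypothetical rather than taken from the spec.

```python
from typing import Dict, Tuple


def parse_media_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split 'type/subtype; name=value' into (type/subtype, parameters)."""
    essence, _, rest = value.partition(";")
    params: Dict[str, str] = {}
    for part in rest.split(";"):
        name, sep, val = part.partition("=")
        if sep:  # skip empty or malformed segments
            params[name.strip().lower()] = val.strip().strip('"')
    return essence.strip().lower(), params


def treated_as_unlabelled(value: str) -> bool:
    """Spec text: bare application/octet-stream equals no Content-Type."""
    essence, params = parse_media_type(value)
    return essence == "application/octet-stream" and not params


def known_unrenderable_per_note(value: str) -> bool:
    """Spec note: application/octet-stream with any parameter cannot be rendered."""
    essence, params = parse_media_type(value)
    return essence == "application/octet-stream" and bool(params)
```

Under the RFC 2046 reading argued for here, the unrecognized parameter would be dropped first, so both forms of application/octet-stream would fall into `treated_as_unlabelled`.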


That's not incorrect. The type with no parameters is a special case that
corresponds to a common configuration default. The case with parameters is
not that case, and represents likely intentional configuration and thus
clearly not a video format the UA supports.


My point is that it's incorrect to make this distinction, and that it's 
furthermore misleading to mention the codecs parameter in the context 
of a type that doesn't define it.



It's also not clear whether the note applies to all parameters or just
codecs.


The normative text you quote doesn't mention any specific parameters.


In which case it would be a *bit* clearer if the note used a parameter 
that doesn't suggest that codecs has any meaning on a/o.



Regarding codecs= in particular, it's an implementation reality that
user agents that support it are likely to support it regardless of the
type, so there's really no point trying to maintain an artificial boundary
of which types it has semantics for and which it doesn't.


David Singer pointed out in 
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10202#c11 that this is 
the wrong thing to do.


Do you have any evidence that UAs already use codecs on types on which 
they aren't defined, *and*, if this is the case, they can't be changed 
anymore?


Best regards, Julian


Re: [whatwg] Video with MIME type application/octet-stream

2010-08-31 Thread Boris Zbarsky

On 8/31/10 3:36 AM, Ian Hickson wrote:

You might say "Hey, but aren't you content sniffing then to find the
codecs" and you'd be right. But in this case we're respecting the MIME
type sent by the server - it tells the browser to whatever level of
detail it wants (including codecs if needed) what type it is sending. If
the server sends 'text/plain' or 'video/x-matroska' I wouldn't expect a
browser to sniff it for Ogg content.


The Microsoft guys responded to my suggestion that they might want to
implement something like this with "what's the benefit of doing that?".


One obvious benefit is that videos with the wrong type will not work, 
and hence videos will be sent with the right type.


If the question is what the benefits of that are, one is that the "view 
video in new window" context menu option actually works.


Another benefit is that you can send someone the link to the video, 
instead of the embedding page, and it will work.


Another is that when you save the video to disk the browser will fix up 
the extension correctly, if needed.


Basically, getting the types right means that use of the video outside 
the video tag won't be broken.  Inside the video tag there's 
probably no difference.



It seems that sniffing is context-sensitive.


Yes, but one issue is that we really do want resources to be usable 
outside the context the page happens to want to put them in.


The ship has sailed on img, clearly, and is working on sailing on 
video, but I feel that the behavior IE and Chrome are implementing 
here is highly detrimental to the web.  Not that they care much.



Sadly, the boat has sailed for text/html and XML at this point, but for
binary types, and for contexts where text/plain isn't a contender, why
bother doing anything but sniff?


See above.  As long as some contexts are sniffing and some are not, we 
have a problem.  If it were all-sniff (with the same algorithm across 
the board!) or all-not-sniff, we might be ok.


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-08-31 Thread Anne van Kesteren

Devil's advocate.

On Tue, 31 Aug 2010 15:40:18 +0200, Boris Zbarsky bzbar...@mit.edu wrote:

On 8/31/10 3:36 AM, Ian Hickson wrote:

The Microsoft guys responded to my suggestion that they might want to
implement something like this with "what's the benefit of doing that?".


One obvious benefit is that videos with the wrong type will not work,  
and hence videos will be sent with the right type.


If the question is what the benefits of that are, one is that the "view  
video in new window" context menu option actually works.


If you sniff you can sniff there too.


Another benefit is that you can send someone the link to the video,  
instead of the embedding page, and it will work.


If you sniff you can sniff there too. (Unless that user uses a  
competitor's browser, but that would be an incentive to encourage that  
user to use the sniffing browser.)



Another is that when you save the video to disk the browser will fix up  
the extension correctly, if needed.


If you sniff you can fix it up correctly too.


--
Anne van Kesteren
http://annevankesteren.nl/


Re: [whatwg] Video with MIME type application/octet-stream

2010-08-31 Thread Julian Reschke

On 31.08.2010 15:57, Anne van Kesteren wrote:

...

Another is that when you save the video to disk the browser will fix
up the extension correctly, if needed.


If you sniff you can fix it up correctly too.
...


Then let's hope that sniffing doesn't recognize Windows binaries.

Best regards, Julian


Re: [whatwg] Video with MIME type application/octet-stream

2010-08-31 Thread Aryeh Gregor
On Tue, Aug 31, 2010 at 3:36 AM, Ian Hickson i...@hixie.ch wrote:
 The Microsoft guys responded to my suggestion that they might want to
 implement something like this with "what's the benefit of doing that?".
 It's a tough question, in this context, given that there's no possibility
 of script execution or other privilege escalation with the types we're
 talking about (currently, anyway).

If you can't come up with any actual problems with what IE is doing,
then why is anything else even being considered?  There's a very
clear-cut problem with relying on MIME types: MIME types are often
wrong and hard for authors to configure, and this is not going to
change anytime soon.

 Sadly, the boat has sailed for text/html and XML at this point, but for
 binary types, and for contexts where text/plain isn't a contender, why
 bother doing anything but sniff?

If this is your position, why doesn't the spec match it?

On Tue, Aug 31, 2010 at 10:35 AM, Boris Zbarsky bzbar...@mit.edu wrote:
 You can't sniff in a toplevel browser window.  Not the same way that people
 are sniffing in video.  It would break the web.

How so?  For the sake of argument, suppose you sniff only for known
binary video/audio types, and fall back to existing behavior if the
type isn't one of those (e.g., not video or audio).  Do people do
things like link to MP3 files with incorrect MIME types and no
Content-Disposition, and expect them to download?  If so, don't people
also link to MP3 files with correct MIME types and expect the same?  I
don't see how sniffing vs. using MIME type makes a compatibility
difference here, since media support in browsers is so new -- surely
whatever bad thing happens, sniffing will make it happen more often,
at worst.

What do Chrome and IE do here?


Re: [whatwg] Video with MIME type application/octet-stream

2010-08-31 Thread Boris Zbarsky

On 8/31/10 3:59 PM, Aryeh Gregor wrote:

On Tue, Aug 31, 2010 at 10:35 AM, Boris Zbarsky bzbar...@mit.edu wrote:

You can't sniff in a toplevel browser window.  Not the same way that people
are sniffing in video.  It would break the web.


How so?  For the sake of argument, suppose you sniff only for known
binary video/audio types, and fall back to existing behavior if the
type isn't one of those (e.g., not video or audio).  Do people do
things like link to MP3 files with incorrect MIME types and no
Content-Disposition, and expect them to download?


The issue would be someone linking to text or HTML or a binary blob that 
happens to have some bits at the beginning that look like an audio/video 
type and expecting them to be rendered respectively as text or HTML or 
be downloaded.



I don't see how sniffing vs. using MIME type makes a compatibility
difference here, since media support in browsers is so new -- surely
whatever bad thing happens, sniffing will make it happen more often,
at worst.


The big danger with sniffing, as always, is that the server will think 
one thing will happen and suddenly the browser will do something totally 
different.



What do Chrome and IE do here?


Good question.

-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-08-31 Thread Andrew Scherkus
On Tue, Aug 31, 2010 at 12:59 PM, Aryeh Gregor
simetrical+...@gmail.com wrote:

 On Tue, Aug 31, 2010 at 10:35 AM, Boris Zbarsky bzbar...@mit.edu wrote:
  You can't sniff in a toplevel browser window.  Not the same way that
 people
  are sniffing in video.  It would break the web.

 How so?  For the sake of argument, suppose you sniff only for known
 binary video/audio types, and fall back to existing behavior if the
 type isn't one of those (e.g., not video or audio).  Do people do
 things like link to MP3 files with incorrect MIME types and no
 Content-Disposition, and expect them to download?  If so, don't people
 also link to MP3 files with correct MIME types and expect the same?  I
 don't see how sniffing vs. using MIME type makes a compatibility
 difference here, since media support in browsers is so new -- surely
 whatever bad thing happens, sniffing will make it happen more often,
 at worst.

 What do Chrome and IE do here?


We use the incoming MIME type to determine whether we render the audio/video
in the browser versus download.  We would never want to execute multimedia
sniffing code in the trusted/browser process so implementing sniffing for a
top level browser window would involve sending the bytes to a sandboxed
process for inspection first.

This does have a side effect where a video may play fine on a page with a
bogus MIME type (due to sniffing), but viewing the video URL in the browser
itself would prompt a download.
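
The split described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Chrome's code: the function names, the context labels, the PLAYABLE_TYPES set, and the short magic-number table are all hypothetical, and a real sniffer would inspect far more than the first few bytes.

```python
# Hypothetical set of types the browser plays when navigated to directly.
PLAYABLE_TYPES = {"video/ogg", "audio/ogg", "video/webm"}

# Leading byte signatures of a few container formats (illustrative subset).
MAGIC = (
    (b"OggS", "ogg"),
    (b"\x1a\x45\xdf\xa3", "webm-or-matroska"),  # EBML header
    (b"ID3", "mp3-with-id3-tag"),
)


def sniff_media(head):
    """Return a container name if the leading bytes match, else None."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return None


def handle_resource(context, content_type, head):
    """Decide what to do with a resource in a given context.

    context is "media-element" (inside video/audio) or "top-level".
    """
    if context == "media-element":
        # Inside a media element the bytes decide; the label is not consulted.
        return "play" if sniff_media(head) else "error"
    # In a top-level window only the declared MIME type decides.
    essence = (content_type or "").split(";")[0].strip().lower()
    return "play" if essence in PLAYABLE_TYPES else "download"
```

With an Ogg file mislabelled as text/plain, this sketch returns "play" for the media-element context and "download" for the top-level context, which is the side effect mentioned above.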

Andrew


Re: [whatwg] Video with MIME type application/octet-stream

2010-08-31 Thread Gregory Maxwell
On 8/31/10, Aryeh Gregor simetrical+...@gmail.com wrote:
 If you can't come up with any actual problems with what IE is doing,
 then why is anything else even being considered?  There's a very
 clear-cut problem with relying on MIME types: MIME types are often
 wrong and hard for authors to configure, and this is not going to
 change anytime soon.

Aggressive sniffing can and has resulted in some pretty nasty security bugs.

E.g. an attacker crafts an input that a website identifies as video
and permits the upload but which a browser sniffs out to be a java jar
which can then access the source URL with the permissions of the user.

The sniffing rules, in some contexts and some browsers can also end up
causing surprising failures... e.g. I've seen older versions of some
sniffing heavy browsers automatically switch into UCS-2LE encoding at
wrong and surprising times. Perhaps this is irrelevant in a video
specific discussion of sniffing— but it is a hazard with sniffing in
general.  Moreover, it'll never be consistent from implementation to
implementation, which seems to me to be pretty antithetical to
standardization in general.
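
As a rough illustration of the upload scenario above, here is a minimal sketch of how one byte string can satisfy two different checks. It assumes, purely for illustration, that the site only looks at the leading bytes and that the client-side sniffer only looks for a ZIP end-of-central-directory record near the end; both function names are hypothetical and neither models any particular site or browser.

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # ZIP end-of-central-directory record


def server_check_is_ogg(data):
    """Hypothetical upload filter: accepts anything starting like Ogg."""
    return data.startswith(b"OggS")


def client_sniff_is_zip(data):
    """Hypothetical sniffer: looks for a ZIP trailer near the end.

    The record is 22 bytes plus an optional comment of up to 65535 bytes.
    """
    return EOCD_SIGNATURE in data[-(22 + 65535):]


def build_polyglot():
    """Prepend Ogg-looking bytes to a small, harmless ZIP archive."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("Placeholder.class", b"not real bytecode")
    return b"OggS" + b"\x00" * 64 + archive.getvalue()
```

Both checks accept the output of build_polyglot(), because the ZIP format is located from the end of the file while the Ogg check looks only at the start.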


Re: [whatwg] Video with MIME type application/octet-stream

2010-08-31 Thread Adam Barth
On Tue, Aug 31, 2010 at 9:27 PM, Gregory Maxwell gmaxw...@gmail.com wrote:
 On 8/31/10, Aryeh Gregor simetrical+...@gmail.com wrote:
 If you can't come up with any actual problems with what IE is doing,
 then why is anything else even being considered?  There's a very
 clear-cut problem with relying on MIME types: MIME types are often
 wrong and hard for authors to configure, and this is not going to
 change anytime soon.

 Aggressive sniffing can and has resulted in some pretty nasty security bugs.

 E.g. an attacker crafts an input that a website identifies as video
 and permits the upload but which a browser sniffs out to be a java jar
 which can then access the source URL with the permissions of the user.

Indeed.  However, that would be an issue with the browser sniffing for
jars, not an issue with the browser sniffing for video.

 The sniffing rules, in some contexts and some browsers can also end up
 causing surprising failures... e.g. I've seen older versions of some
 sniffing heavy browsers automatically switch into UCS-2LE encoding at
 wrong and surprising times. Perhaps this is irrelevant in a video
 specific discussion of sniffing— but it is a hazard with sniffing in
 general.  Moreover, it'll never be consistent from implementation to
 implementation, which seems to me to be pretty antithetical to
 standardization in general.

Why will sniffing never be consistent?  We need only step up as a
community and spec things that implementors are willing to implement.
Interoperability suffers when we insist on specing things that
implementors refuse to implement.

Adam


Re: [whatwg] Video with MIME type application/octet-stream

2010-08-19 Thread Julian Reschke

On 18.08.2010 13:47, Julian Reschke wrote:




In the meantime, Ian did some tests, see

  http://krijnhoetmer.nl/irc-logs/whatwg/20100819#l-28

and

  http://hixie.ch/tests/adhoc/html/video/001.html

Ian, any chance you could test for an *absent* content type?

Best regards, Julian


Re: [whatwg] Video with MIME type application/octet-stream

2010-08-18 Thread Julian Reschke

On 20.05.2010 20:53, Simon Pieters wrote:

On Thu, 20 May 2010 20:18:43 +0200, David Singer sin...@apple.com wrote:


It's an error to have a parameter that isn't valid for the mime type,
so are you suggesting (a) that you throw away the parameter as it's
invalid or (b) since it's an error to supply application/octet-stream
as the mime type in the first place, we may as well process its
invalid parameter in an attempt to recover?


I'm just suggesting that it should be defined what to do when you get
application/octet-stream with parameters. I don't care which handling
that is, or whether it's valid or why the specific handling was chosen.


Picking up an old thread because of 
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10202.


From http://greenbytes.de/tech/webdav/rfc2046.html#rfc.section.1:

Parameters are modifiers of the media subtype, and as such do not 
fundamentally affect the nature of the content. The set of meaningful 
parameters depends on the media type and subtype. Most parameters are 
associated with a single specific subtype. However, a given top-level 
media type may define parameters which are applicable to any subtype of 
that type. Parameters may be required by their defining media type or 
subtype or they may be optional. MIME implementations must also ignore 
any parameters whose names they do not recognize.


So, as codecs is not defined on application/octet-stream, the 
parameter simply should be ignored, thus the advice in 
http://dev.w3.org/html5/spec/Overview.html#rel-archives:


The MIME type application/octet-stream with no parameters is never a 
type that the user agent knows it cannot render. User agents must treat 
that type as equivalent to the lack of any explicit Content-Type 
metadata when it is used to label a potential media resource.


In the absence of a specification to the contrary, the MIME type 
application/octet-stream when used with parameters, e.g. 
application/octet-stream;codecs=theora, is a type that the user agent 
knows it cannot render.


is incorrect, because it requires handling application/octet-stream 
and application/octet-stream;codecs=theora differently (*).


Best regards, Julian

(*) It's also not clear whether the note applies to all parameters or 
just codecs.
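
The two readings being contrasted here can be sketched side by side. This is a minimal sketch under assumptions: the helper names and the SUPPORTED set are hypothetical, and the parameter parser is deliberately simplified (it does not handle semicolons inside quoted values).

```python
# Hypothetical set of media types this user agent can render.
SUPPORTED = {"video/ogg", "video/webm"}


def parse_media_type(value):
    """Split a Content-Type value into (type/subtype, {param: value})."""
    essence, _, rest = value.partition(";")
    params = {}
    for piece in rest.split(";"):
        name, sep, val = piece.partition("=")
        if sep:
            params[name.strip().lower()] = val.strip().strip('"')
    return essence.strip().lower(), params


def spec_knows_cannot_render(value, supported=SUPPORTED):
    """The draft text quoted above: parameters change the outcome."""
    essence, params = parse_media_type(value)
    if essence == "application/octet-stream":
        # Bare type is treated like a missing Content-Type;
        # with any parameter it counts as known-unrenderable.
        return bool(params)
    return essence not in supported


def rfc2046_knows_cannot_render(value, supported=SUPPORTED):
    """The alternative reading: undefined parameters are ignored."""
    essence, _ignored = parse_media_type(value)
    if essence == "application/octet-stream":
        return False
    return essence not in supported
```

The two functions agree on bare application/octet-stream and differ only on application/octet-stream;codecs=theora, which is exactly the case under dispute.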


[whatwg] Video with MIME type application/octet-stream

2010-05-20 Thread Robert O'Callahan
I just became aware that application/octet-stream is excluded from being a
type the user agent knows it cannot render.
http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#a-type-that-the-user-agent-knows-it-cannot-render
Apparently this was done in response to a bug report:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=7977
Neither the bug report nor the editor's response give any indication why
this change was made.

This change means files served with application/octet-stream will make it
all the way to the step "If the media data
<http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#media-data>
can be fetched but is found by inspection to be in an unsupported format
...", so implementations have to add support for binary sniffing for all the
types they support. We didn't need this before in Gecko. What was the
motivation for adding this implementation requirement?

Thanks,
Rob
-- 
He was pierced for our transgressions, he was crushed for our iniquities;
the punishment that brought us peace was upon him, and by his wounds we are
healed. We all, like sheep, have gone astray, each of us has turned to his
own way; and the LORD has laid on him the iniquity of us all. [Isaiah
53:5-6]


Re: [whatwg] Video with MIME type application/octet-stream

2010-05-20 Thread Robert O'Callahan
On Thu, May 20, 2010 at 9:55 PM, Robert O'Callahan rob...@ocallahan.org wrote:

 I just became aware that application/octet-stream is excluded from being a
 type the user agent knows it cannot render.

 http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#a-type-that-the-user-agent-knows-it-cannot-render
 Apparently this was done in response to a bug report:
 http://www.w3.org/Bugs/Public/show_bug.cgi?id=7977
 Neither the bug report nor the editor's response give any indication why
 this change was made.

 This change means files served with application/octet-stream will make it
 all the way to the step "If the media data
 <http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#media-data>
 can be fetched but is found by inspection to be in an unsupported format
 ...", so implementations have to add support for binary sniffing for all the
 types they support. We didn't need this before in Gecko. What was the
 motivation for adding this implementation requirement?


Hmm. I guess it doesn't add any implementation requirements beyond what you
need to handle the complete absence of a Content-Type (which we currently
don't handle, but I suppose we should). So this isn't really a problem.

I'd still like to know why application/octet-stream has been added here,
though.

Rob
-- 
He was pierced for our transgressions, he was crushed for our iniquities;
the punishment that brought us peace was upon him, and by his wounds we are
healed. We all, like sheep, have gone astray, each of us has turned to his
own way; and the LORD has laid on him the iniquity of us all. [Isaiah
53:5-6]


Re: [whatwg] Video with MIME type application/octet-stream

2010-05-20 Thread Philip Jägenstedt
On Thu, 20 May 2010 17:59:42 +0800, Robert O'Callahan  
rob...@ocallahan.org wrote:


On Thu, May 20, 2010 at 9:55 PM, Robert O'Callahan  
rob...@ocallahan.org wrote:


I just became aware that application/octet-stream is excluded from being a
type the user agent knows it cannot render.

http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#a-type-that-the-user-agent-knows-it-cannot-render
Apparently this was done in response to a bug report:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=7977
Neither the bug report nor the editor's response give any indication why
this change was made.

This change means files served with application/octet-stream will make it
all the way to the step "If the media data
<http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#media-data>
can be fetched but is found by inspection to be in an unsupported format
...", so implementations have to add support for binary sniffing for all the
types they support. We didn't need this before in Gecko. What was the
motivation for adding this implementation requirement?



Hmm. I guess it doesn't add any implementation requirements beyond what you
need to handle the complete absence of a Content-Type (which we currently
don't handle, but I suppose we should). So this isn't really a problem.

I'd still like to know why application/octet-stream has been added here,
though.


For the record, Opera implements canPlayType("application/octet-stream")  
and canPlayType("application/octet-stream; codecs=foo") as per spec  
("maybe" and "" respectively), but I don't have any strong opinions about  
it.


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-05-20 Thread Simon Pieters
On Thu, 20 May 2010 11:55:01 +0200, Robert O'Callahan  
rob...@ocallahan.org wrote:


I just became aware that application/octet-stream is excluded from being a
type the user agent knows it cannot render.
http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#a-type-that-the-user-agent-knows-it-cannot-render
Apparently this was done in response to a bug report:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=7977
Neither the bug report nor the editor's response give any indication why
this change was made.


This bug report was about application/octet-stream *with parameters*, e.g.  
application/octet-stream; codecs="theora, vorbis". The spec had the  
requirement about application/octet-stream before that bug report.




This change means files served with application/octet-stream will make it
all the way to the step "If the media data
<http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#media-data>
can be fetched but is found by inspection to be in an unsupported format
...", so implementations have to add support for binary sniffing for all the
types they support. We didn't need this before in Gecko. What was the
motivation for adding this implementation requirement?

Thanks,
Rob



--
Simon Pieters
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-05-20 Thread Simon Pieters

On Thu, 20 May 2010 12:36:36 +0200, Simon Pieters sim...@opera.com wrote:

On Thu, 20 May 2010 11:55:01 +0200, Robert O'Callahan  
rob...@ocallahan.org wrote:


I just became aware that application/octet-stream is excluded from being a
type the user agent knows it cannot render.
http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#a-type-that-the-user-agent-knows-it-cannot-render
Apparently this was done in response to a bug report:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=7977
Neither the bug report nor the editor's response give any indication why
this change was made.


This bug report was about application/octet-stream *with parameters*,  
e.g. application/octet-stream; codecs="theora, vorbis". The spec had the  
requirement about application/octet-stream before that bug report.



This change means files served with application/octet-stream will make it
all the way to the step "If the media data
<http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#media-data>
can be fetched but is found by inspection to be in an unsupported format
...", so implementations have to add support for binary sniffing for all the
types they support. We didn't need this before in Gecko. What was the
motivation for adding this implementation requirement?


The spec requires binary sniffing for all the types you support even  
without the application/octet-stream requirement, since a WebM file  
labelled as video/ogg should play if both video/webm and video/ogg are  
supported.


--
Simon Pieters
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-05-20 Thread Simon Pieters

On Thu, 20 May 2010 12:46:16 +0200, Simon Pieters sim...@opera.com wrote:

On Thu, 20 May 2010 12:36:36 +0200, Simon Pieters sim...@opera.com  
wrote:


On Thu, 20 May 2010 11:55:01 +0200, Robert O'Callahan  
rob...@ocallahan.org wrote:


I just became aware that application/octet-stream is excluded from being a
type the user agent knows it cannot render.
http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#a-type-that-the-user-agent-knows-it-cannot-render


Apparently this was done in response to:

http://lists.w3.org/Archives/Public/public-html/2009Jul/0888.html
http://html5.org/tools/web-apps-tracker?from=3497&to=3498

--
Simon Pieters
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-05-20 Thread Boris Zbarsky

On 5/20/10 5:59 AM, Robert O'Callahan wrote:

Hmm. I guess it doesn't add any implementation requirements beyond what
you need to handle the complete absence of a Content-Type (which we
currently don't handle, but I suppose we should).


For what it's worth, the above-necko layer in Gecko never sees absence 
of a Content-Type.  If there isn't one, necko will sniff, period.  Of 
course that sniffing knows nothing about video at the moment, and will 
likely just detect it as application/octet-stream (modulo the 
extension-sniffing bits).


-Boris


