Subject: [whatwg] Video with MIME type application/octet-stream

On Tue, 07 Sep 2010 02:46:29 +0200, Gregory Maxwell gmaxw...@gmail.com  
wrote:


On Mon, Sep 6, 2010 at 3:19 PM, Aryeh Gregor simetrical+...@gmail.com  
wrote:
On Mon, Sep 6, 2010 at 4:14 AM, Philip Jägenstedt phil...@opera.com  
wrote:
The Ogg page begins with the 4 bytes OggS, which is what Opera (GStreamer)
checks for. For additional safety, one could also check for the trailing
version indicator, which ought to be a NULL byte for current Ogg. [1] [2]


OggS\0 as the first five bytes seems safe to check for.  It's rather
short, I guess because it's repeated on every page, but five bytes is
long enough that it should occur by chance only negligibly often, in
either text or binary files.


Um... If you do that you will fail to capture on files that most other
ogg reading tools will happily capture on.  Common software will read
forward until it hits OggS then it will check the page CRC (in total,
9 bytes of capture).  For example, here is a file which begins with a
kilobyte of \0: http://myrandomnode.dyndns.org:8080/~gmaxwell/test.ogg
 Everything I had handy played it.

This could fail to capture on a live stream that didn't ensure new
listeners began at a page boundary. I don't know if any of these
exist.

I don't know if breaking these cases would matter much but herein lies
the danger of sniffing— everyone thinks they're an expert but no one
really has a handle on the implications.



Your test file is too short, perhaps it was truncated? I made my own one  
by adding 1024 NULL bytes to the beginning of  
http://v2v.cc/~j/theora_testsuite/320x240.ogg


That file doesn't play in Totem, because it (GStreamer) relies on  
sniffing. It also won't play in Opera for this reason, but I haven't seen  
any bug reports about failure to play similar files since Opera introduced  
support for Ogg. It does play in Firefox, but not in Chrome. Just like  
with WebM, I think browsers should not support files that begin with  
arbitrary amounts of garbage, as it requires reading the whole file before  
failing.


The file doesn't play in VLC or MPlayer, but does play in xine.
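
To make the trade-off above concrete, here is a minimal Python sketch of the two strategies under discussion: a strict check for OggS\0 at offset zero, and a scan that tolerates leading junk. The function names and the 4096-byte scan limit are assumptions made for illustration, and the page CRC check that common Ogg tools are said to perform is deliberately left out.

```python
OGG_CAPTURE = b"OggS"  # capture pattern at the start of every Ogg page


def sniff_ogg_strict(data: bytes) -> bool:
    """Match only if the resource begins with OggS plus a zero version byte."""
    return data[:5] == OGG_CAPTURE + b"\x00"


def sniff_ogg_scan(data: bytes, limit: int = 4096) -> bool:
    """Search for OggS\\0 anywhere in the first `limit` bytes.

    The limit is a hypothetical bound; without one, a sniffer may have to
    read the whole resource before it can give up.
    """
    return data[:limit].find(OGG_CAPTURE + b"\x00") != -1


# A page header fragment preceded by 1024 NUL bytes, like the test file above.
padded = b"\x00" * 1024 + b"OggS\x00\x02"
print(sniff_ogg_strict(padded))  # False
print(sniff_ogg_scan(padded))    # True
```

The strict variant decides after five bytes; the scanning variant needs some cutoff, which is the cost being argued about here.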

--
Philip Jägenstedt
Core Developer
Opera Software

Re: [whatwg] Video with MIME type application/octet-stream


On Tue, 07 Sep 2010 03:56:54 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 9/6/10 3:19 PM, Aryeh Gregor wrote:
On Mon, Sep 6, 2010 at 4:14 AM, Philip Jägenstedt phil...@opera.com wrote:
The Ogg page begins with the 4 bytes OggS, which is what Opera (GStreamer)
checks for. For additional safety, one could also check for the trailing
version indicator, which ought to be a NULL byte for current Ogg. [1] [2]


OggS\0 as the first five bytes seems safe to check for.  It's rather
short, I guess because it's repeated on every page, but five bytes is
long enough that it should occur by chance only negligibly often, in
either text or binary files.


So if a text file starts with U+4F67 U+6753 (both CJK ideographs) and  
any ASCII character (can this happen in the real world?) you're OK with  
treating it as Ogg?  Same for files starting with U+674F U+5367 (both CJK  
ideographs) and any plane-0 character whose Unicode codepoint is 0 mod  
2^8 (plenty of CJK stuff like that)?  Is your CJK good enough that you  
know text files would never start like this, or are you just assuming  
that people who are silly enough to use UTF-16 for their text files and  
aren't in Europe don't matter?  Or that you don't care about people who  
happen to not use a BOM?


Thanks for pointing out these cases. I hadn't thought about it, but my CJK  
is good enough to say something about them:


'佧杓A' encoded in UTF-16BE is 'OggS\x00A'. However, 佧杓 is nonsensical  
in at least Chinese, neither character is among the 3000 most common  
characters [1]. Search results on Google (4) and Baidu (3) are nonsense  
too. I don't know if things are any different for Japanese, but given the  
Google results I doubt it.


'杏卧' encoded in UTF-16LE is 'OggS', and both of these characters are in  
the top 3000, but together they're nonsense: apricot crouch. (That's the  
same crouch as in Crouching Tiger, Hidden Dragon, but the order is wrong  
so it doesn't mean Crouching Apricot). In the Google and Baidu results,  
the only occurrence of the string seems to be in 一衫红杏卧江亭, which  
appears to be a theme of an apricot tree by a pavilion that appears in  
several paintings [2] [3] [4].


All in all, I wouldn't be more worried about this than the risk of random  
binary data matching. Also, UTF-16 isn't a very common encoding for  
simplified Chinese (卧 is a simplified character); GBK is dominant.
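
The two collisions are easy to reproduce. The short Python sketch below is only an illustration: the variable names are made up, and the characters are written as code point escapes so the result does not depend on how this message is encoded.

```python
# Code points are written as escapes so the example does not depend on
# how the surrounding text is encoded.
be_text = "\u4f67\u6753A"  # the UTF-16BE case discussed above
le_text = "\u674f\u5367"   # the UTF-16LE case discussed above

be_bytes = be_text.encode("utf-16-be")
le_bytes = le_text.encode("utf-16-le")

print(be_bytes)  # b'OggS\x00A'
print(le_bytes)  # b'OggS'

# In GBK the same two characters use lead bytes with the high bit set,
# so they cannot spell the ASCII sequence OggS.
print(le_text.encode("gbk") == b"OggS")  # False
```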


We could also add checking of the 6th byte, which should normally be 0x02  
for the first page of a logical bitstream (bos).
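
A six-byte check along those lines might look like the sketch below. The function name is hypothetical, and requiring the header-type byte to be exactly 0x02 is an assumption based on "normally"; a real sniffer might prefer to test only the bos bit.

```python
def sniff_ogg_first_page(data: bytes) -> bool:
    """Check the capture pattern, version byte, and header-type byte."""
    if len(data) < 6:
        return False
    return (
        data[0:4] == b"OggS"  # capture pattern
        and data[4] == 0x00   # stream structure version
        and data[5] == 0x02   # beginning-of-stream flag only
    )


print(sniff_ogg_first_page(b"OggS\x00\x02" + b"\x00" * 21))  # True
print(sniff_ogg_first_page(b"OggS\x00A"))                     # False
```

The second call is the UTF-16BE text case from above: its sixth byte is 0x41, so the stricter check rejects it.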



It looks like you could check for 0x1a 0x45 0xdf 0xa3 as the first
four bytes


U+1A45 is Thai, looks like.  DFA3 is a surrogate, so you're ok there.

U+451A is CJK.  U+A3DF looks like a Yi syllable, so you're more or less  
ok there too.  I'm assuming you've already checked this byte sequence  
out in UTF-8 and some other common encodings?


It's garbage in at least UTF-8, Big5 and GBK.
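
For the UTF-16 readings mentioned above, the code units can be computed directly instead of looked up. This is a small sketch with made-up helper names; it only reinterprets the four bytes as 16-bit units and says nothing about Big5 or GBK.

```python
import struct
from typing import List

EBML_MAGIC = bytes([0x1A, 0x45, 0xDF, 0xA3])


def utf16_units(data: bytes, big_endian: bool) -> List[int]:
    """Interpret `data` as a sequence of 16-bit UTF-16 code units."""
    count = len(data) // 2
    fmt = (">" if big_endian else "<") + "H" * count
    return list(struct.unpack(fmt, data[: count * 2]))


def is_surrogate(unit: int) -> bool:
    """True for code units in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= unit <= 0xDFFF


be = utf16_units(EBML_MAGIC, big_endian=True)
le = utf16_units(EBML_MAGIC, big_endian=False)
print([hex(u) for u in be], [is_surrogate(u) for u in be])
# ['0x1a45', '0xdfa3'] [False, True]
print([hex(u) for u in le], [is_surrogate(u) for u in le])
# ['0x451a', '0xa3df'] [False, False]
```

In the big-endian reading the second unit is a low surrogate with no high surrogate before it, so that reading is not well-formed UTF-16; the little-endian reading yields two ordinary code points.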

I'm not sure what infrastructure is in place, but perhaps one could *not*  
sniff if Content-Type also indicates an encoding? That way there's a  
solution for those who really want to display the hypothetical false  
positives as text.


[1] http://www.zein.se/patrick/3000char.html
[2]  
http://hi.baidu.com/%BC%C5%D5%AB/blog/item/f0ee8a4c5a5d0c02b3de05aa.html

[3] http://blog.sina.com.cn/s/blog_475be8240100ew5q.html
[4] http://www.zgddhj.cn/zj/bh/zhouhongyi/201007/32053.html

--
Philip Jägenstedt
Core Developer
Opera Software

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread And Clover


On 09/07/2010 03:56 AM, Boris Zbarsky wrote:


P.S. Sniffing is harder than you seem to think. It really is...


Quite. It surprises and saddens me that anyone wants to argue for *more* 
sniffing, and even enshrining it in a web standard.


Sniffing is a perpetual disaster that, after several security-sensitive 
problems, web browsers have been moving to deprecate/mitigate. If 
browsers want to guess types when no Content-Type is specified(*) then 
fine, but there is no good reason to ignore an explicitly-set type. I 
don't want my `application/octet-stream` file download service to be 
repurposeable as a video player for some other party!


For reasons already argued about here, you will never make the results 
of content-sniffing reliable, so why bother to standardise it? A 
standardised unreliable feature is no better than an unstandardised one.


The typing mechanism of the web (and more) is Content-Type, period. 
There should be no confusion of this with officially-endorsed sniffing. 
That it is 'hard' for web authors to ensure the correct Content-Types 
are set is:


* not W3/WHATWG's problem. If web servers make adding Content-Type 
information hard, then web servers need to be updated to make it easier;


* not really true, at least for Apache which can allow AddType et al in 
the .htaccess files that low-end shared hosts use. This may not be 
widely-known or practised, but that doesn't really merit changing the 
standards for everyone else to cope with.


(*: or, the traditional reason for sniffing, `text/plain`, due to Apache 
inappropriately sending this type for unknown files by default, bug 
13986. That doesn't seem to apply here.)


--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Julian Reschke


On 07.09.2010 11:51, And Clover wrote:

On 09/07/2010 03:56 AM, Boris Zbarsky wrote:


P.S. Sniffing is harder than you seem to think. It really is...


Quite. It surprises and saddens me that anyone wants to argue for *more*
sniffing, and even enshrining it in a web standard.


+1


Sniffing is a perpetual disaster that, after several security-sensitive
problems, web browsers have been moving to deprecate/mitigate. If
browsers want to guess types when no Content-Type is specified(*) then
fine, but there is no good reason to ignore an explicitly-set type. I
don't want my `application/octet-stream` file download service to be
repurposeable as a video player for some other party!


Hmm, that's what Content-Disposition: attachment is for...


...


Best regards, Julian

Re: [whatwg] Video with MIME type application/octet-stream


On Tue, 07 Sep 2010 11:51:55 +0200, And Clover and...@doxdesk.com wrote:


On 09/07/2010 03:56 AM, Boris Zbarsky wrote:


P.S. Sniffing is harder than you seem to think. It really is...


Quite. It surprises and saddens me that anyone wants to argue for *more*  
sniffing, and even enshrining it in a web standard.


IE9, Safari and Chrome ignore Content-Type in a video context and rely  
on sniffing. If you want Content-Type to be respected, convince the  
developers of those 3 browsers to change. If not, it's quite inevitable  
that Opera and Firefox will eventually have to follow.


Sniffing is a perpetual disaster that, after several security-sensitive  
problems, web browsers have been moving to deprecate/mitigate.


For reasons already argued about here, you will never make the results  
of content-sniffing reliable, so why bother to standardise it? A  
standardised unreliable feature is no better than an unstandardised one.


Unless all browsers agree to respect Content-Type, the next best thing is  
to agree on the same sniffing. Why would leaving it undefined be better?



The typing mechanism of the web (and more) is Content-Type, period.


Only in theory. In practice, Content-Type is an unreliable indicator of  
the type of a resource. Sniffing is already part of the web architecture,  
with all its problems.


(*: or, the traditional reason for sniffing, `text/plain`, due to Apache  
inappropriately sending this type for unknown files by default, bug  
13986. That doesn't seem to apply here.)


It hasn't been explicitly stated, but I assume that the only cases where  
sniffing for video formats would be employed would be for missing  
Content-Type, text/plain and application/octet-stream.


--
Philip Jägenstedt
Core Developer
Opera Software

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Julian Reschke


On 07.09.2010 12:52, Philip Jägenstedt wrote:

...
IE9, Safari and Chrome ignore Content-Type in a video context and rely
on sniffing. If you want Content-Type to be respected, convince the
developers of those 3 browsers to change. If not, it's quite inevitable
that Opera and Firefox will eventually have to follow.
...


We have heard that Safari sniffs for compatibility with content 
previously consumed by Quicktime, and that IE9 may sniff because they 
(currently) can't pass the content-type to the decoding machinery (or 
something like that).


So you really would have to standardize sniffing in the browsers, but 
also in the components they delegate video display to. Good luck with that.


Best regards, Julian

Re: [whatwg] Video with MIME type application/octet-stream


On 9/7/10 6:52 AM, Philip Jägenstedt wrote:

It hasn't been explicitly stated, but I assume that the only cases where
sniffing for video formats would be employed would be for missing
Content-Type, text/plain and application/octet-stream.


That's not what at least Aryeh is proposing, no.  Also not what at least 
some of the browsers implement.


-Boris

Re: [whatwg] Video with MIME type application/octet-stream


On 9/7/10 6:01 AM, Julian Reschke wrote:

Hmm, that's what Content-Disposition: attachment is for...


This header is currently ignored in non-toplevel browsing contexts in 
web browsers, last I checked.


-Boris

Re: [whatwg] Video with MIME type application/octet-stream


On 9/7/10 4:11 AM, Philip Jägenstedt wrote:

It's garbage in at least UTF-8, Big5 and GBK.


Thanks.  I assume that applies to the OggS\0 sequence too, right?  I 
appreciate the data!



I'm not sure what infrastructure is in place, but perhaps one could
*not* sniff if Content-Type also indicates an encoding?


As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1 
(thanks, Apache!), that should be reasonable, I think.


-Boris

Re: [whatwg] Video with MIME type application/octet-stream


On Tue, 07 Sep 2010 14:54:15 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 9/7/10 6:52 AM, Philip Jägenstedt wrote:

It hasn't been explicitly stated, but I assume that the only cases where
sniffing for video formats would be employed would be for missing
Content-Type, text/plain and application/octet-stream.


That's not what at least Aryeh is proposing, no.  Also not what at least  
some of the browsers implement.


Oops, I was talking about top-level contexts here. In a video context,  
always ignoring the Content-Type and always sniffing is the most sane  
solution (apart from always respecting Content-Type).


--
Philip Jägenstedt
Core Developer
Opera Software

Re: [whatwg] Video with MIME type application/octet-stream


On 9/7/10 9:03 AM, Philip Jägenstedt wrote:

On Tue, 07 Sep 2010 14:54:15 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 9/7/10 6:52 AM, Philip Jägenstedt wrote:

It hasn't been explicitly stated, but I assume that the only cases where
sniffing for video formats would be employed would be for missing
Content-Type, text/plain and application/octet-stream.


That's not what at least Aryeh is proposing, no. Also not what at
least some of the browsers implement.


Oops, I was talking about top-level contexts here. In a video context,
always ignoring the Content-Type and always sniffing is the most sane
solution (apart from always respecting Content-Type).


Yes, the suggestion Aryeh is making is that toplevel contexts should use 
the same sniffing algorithm as the video context and should sniff 
everything for video, completely ignoring the Content-Type header.


-Boris

Re: [whatwg] Video with MIME type application/octet-stream


On Tue, 07 Sep 2010 14:56:38 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 9/7/10 4:11 AM, Philip Jägenstedt wrote:

It's garbage in at least UTF-8, Big5 and GBK.


Thanks.  I assume that applies to the OggS\0 sequence too, right?  I  
appreciate the data!


UTF-8, Big5 and GBK are all (as far as I know) ASCII supersets. Do  
real-world text documents include \0 bytes? (I don't know.)



I'm not sure what infrastructure is in place, but perhaps one could
*not* sniff if Content-Type also indicates an encoding?


As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1  
(thanks, Apache!), that should be reasonable, I think.


Are you saying that Apache has, at various times, set the default  
character encoding to UTF-8 or ISO-8859-1? I was hoping that no encoding  
parameter at all would be sent :/


--
Philip Jägenstedt
Core Developer
Opera Software

Re: [whatwg] Video with MIME type application/octet-stream


On 9/7/10 9:16 AM, Philip Jägenstedt wrote:

UTF-8, Big5 and GBK are all (as far as I know) ASCII supersets. Do
real-world text documents include \0 bytes?


Yes.  Real-world text documents include all sorts of gunk.  Just rarely.


As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1
(thanks, Apache!), that should be reasonable, I think.


Are you saying that Apache has, at various times, set the default
character encoding to UTF-8 or ISO-8859-1?


Yes, precisely.  Though the UTF-8 stuff was Linux distros, I think, not 
Apache itself (in that Apache just sent the thing passed to 
AddDefaultCharset and they changed the value of that from ISO-8859-1 to 
UTF-8 in their distro packages).  Here's the relevant comment from the 
Gecko source where we do our text-or-binary sniffing for toplevel contexts:


 Make sure to do a case-sensitive exact match comparison here.  Apache
 1.x just sends text/plain for unknown, while Apache 2.x sends
 text/plain with a ISO-8859-1 charset.  Debian's Apache version, just to
 be different, sends text/plain with iso-8859-1 charset.  For extra fun,
 FC7, RHEL4, and Ubuntu Feisty send charset=UTF-8.  Don't do general
 case-insensitive comparison, since we really want to apply this crap as
 rarely as we can.
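
The policy that comment describes can be sketched as an exact, case-sensitive lookup. The header strings below are reconstructed from the comment's wording and are assumptions, not copied from the Gecko source; `is_default_text_plain` is a hypothetical name.

```python
# Exact Content-Type values assumed to be server defaults for unknown files.
DEFAULT_TEXT_PLAIN_HEADERS = frozenset({
    "text/plain",                      # Apache 1.x
    "text/plain; charset=ISO-8859-1",  # Apache 2.x
    "text/plain; charset=iso-8859-1",  # Debian's Apache
    "text/plain; charset=UTF-8",       # FC7, RHEL4, Ubuntu Feisty
})


def is_default_text_plain(content_type: str) -> bool:
    """True only for an exact string match; no case folding, no trimming."""
    return content_type in DEFAULT_TEXT_PLAIN_HEADERS


print(is_default_text_plain("text/plain; charset=UTF-8"))  # True
print(is_default_text_plain("text/plain; charset=utf-8"))  # False
```

Keeping the match exact means a header that an author typed by hand, with different casing or spacing, is less likely to trigger sniffing.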


I was hoping that no encoding parameter at all would be sent :/


Heh.  I've long since given up all hope of reason on this stuff; I just 
try to keep it as sane and predictable and simple as possible.  :(


-Boris

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Maciej Stachowiak


On Sep 7, 2010, at 3:52 AM, Philip Jägenstedt wrote:

 On Tue, 07 Sep 2010 11:51:55 +0200, And Clover and...@doxdesk.com wrote:
 
 On 09/07/2010 03:56 AM, Boris Zbarsky wrote:
 
 P.S. Sniffing is harder than you seem to think. It really is...
 
 Quite. It surprises and saddens me that anyone wants to argue for *more* 
 sniffing, and even enshrining it in a web standard.
 
 IE9, Safari and Chrome ignore Content-Type in a video context and rely on 
 sniffing. If you want Content-Type to be respected, convince the developers 
 of those 3 browsers to change. If not, it's quite inevitable that Opera and 
 Firefox will eventually have to follow.

At least in the case of Safari, we initially added sniffing for the benefit of 
video types likely to be played with the QuickTime plugin - mainly .mov and 
various flavors of MPEG. It is common for these to be served with an incorrect 
MIME type. And we did not want to impose a high transition cost on content 
already being served via the QuickTime plugin. The QuickTime plugin may be a 
slightly less relevant consideration now than when we first thought about this, 
but at this point it is possible content has been migrated to video while 
still carrying broken MIME types.

Ogg and WebM are probably not yet poisoned by a mass of unlabeled data. It 
might be possible to treat those types more strictly - i.e. only play Ogg or 
WebM when labeled as such, and not ever sniff content with those MIME types as 
anything else.

In Safari's case this would have limited impact since a non-default codec 
plugin would need to be installed to play either Ogg or WebM. I'm also not sure 
it's sensible to have varying levels of strictness for different types. But 
it's an option, if we want to go there.

Regards,
Maciej

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread David Singer


On Sep 7, 2010, at 2:51 , And Clover wrote:

 On 09/07/2010 03:56 AM, Boris Zbarsky wrote:
 
 P.S. Sniffing is harder than you seem to think. It really is...
 
 Quite. It surprises and saddens me that anyone wants to argue for *more* 
 sniffing, and even enshrining it in a web standard.

Yes.  We should be striving for a world in which as little sniffing as possible 
happens (and is needed).  Basically, we have the problem because of 
mis-configured or (from the author's point of view) unconfigurable web servers. 
 

So I wonder:
* whether the presence of a source element with a type attribute should be 
believed (at least for the purposes of dispatch and 'canplay' decisions)? If 
the author of the page got it wrong or lied, surely they can accept (and deal 
with) the consequences?
* whether we should only really sniff the two types in HTTP headers that tend 
to get used as fallbacks (application/octet-stream and text/plain)?  Though I 
note that I have sometimes *wanted* a file displayed as text (and not 
interpreted) and been defeated by sniffing (though not as often as watching 
binary dumped on my screen as if it were text).



David Singer
Multimedia and Software Standards, Apple Inc.

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread David Singer

And like I said before, please be careful of assuming our intent and desires 
from the way things currently work.  We are thinking, listening, and 
implementing (and fixing bugs, and re-inspecting older behavior in lower-level 
code), so there is some...flexibility...I think.

On Sep 7, 2010, at 9:12 , Maciej Stachowiak wrote:

 
 On Sep 7, 2010, at 3:52 AM, Philip Jägenstedt wrote:
 
 On Tue, 07 Sep 2010 11:51:55 +0200, And Clover and...@doxdesk.com wrote:
 
 On 09/07/2010 03:56 AM, Boris Zbarsky wrote:
 
 P.S. Sniffing is harder than you seem to think. It really is...
 
 Quite. It surprises and saddens me that anyone wants to argue for *more* 
 sniffing, and even enshrining it in a web standard.
 
 IE9, Safari and Chrome ignore Content-Type in a video context and rely on 
 sniffing. If you want Content-Type to be respected, convince the developers 
 of those 3 browsers to change. If not, it's quite inevitable that Opera and 
 Firefox will eventually have to follow.
 
 At least in the case of Safari, we initially added sniffing for the benefit 
 of video types likely to be played with the QuickTime plugin - mainly .mov 
 and various flavors of MPEG. It is common for these to be served with an 
 incorrect MIME type. And we did not want to impose a high transition cost on 
 content already being served via the QuickTime plugin. The QuickTime plugin 
 may be a slightly less relevant consideration now than when we first thought 
 about this, but at this point it is possible content has been migrated to 
 video while still carrying broken MIME types.
 
 Ogg and WebM are probably not yet poisoned by a mass of unlabeled data. It 
 might be possible to treat those types more strictly - i.e. only play Ogg or 
 WebM when labeled as such, and not ever sniff content with those MIME types 
 as anything else.
 
 In Safari's case this would have limited impact since a non-default codec 
 plugin would need to be installed to play either Ogg or WebM. I'm also not 
 sure it's sensible to have varying levels of strictness for different types. 
 But it's an option, if we want to go there.
 
 Regards,
 Maciej
 

David Singer
Multimedia and Software Standards, Apple Inc.

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Adam Barth

On Tue, Sep 7, 2010 at 3:01 AM, Julian Reschke julian.resc...@gmx.de wrote:
 On 07.09.2010 11:51, And Clover wrote:
 On 09/07/2010 03:56 AM, Boris Zbarsky wrote:
 P.S. Sniffing is harder than you seem to think. It really is...

 Quite. It surprises and saddens me that anyone wants to argue for *more*
 sniffing, and even enshrining it in a web standard.

 +1

-1

It saddens me when standards bodies ignore reality and leave
implementors to invent their own non-interoperable algorithms for
security-critical behavior.

Adam

Re: [whatwg] Video with MIME type application/octet-stream


On 9/7/10 3:19 PM, Adam Barth wrote:

It saddens me when standards bodies ignore reality and leave
implementors to invent their own non-interoperable algorithms for
security-critical behavior.


Of course nothing prevents us from saying UAs MUST NOT sniff but if they 
do anyway they MUST use a given algorithm, right?


-Boris

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Aryeh Gregor

On Tue, Sep 7, 2010 at 5:51 AM, And Clover and...@doxdesk.com wrote:
Quite. It surprises and saddens me that anyone wants to argue for *more*
sniffing, and even enshrining it in a web standard.

I'm not a fan of sniffing, but I'm also not a fan of blindly believing
clearly wrong MIME types and thereby forcing authors to do needless
configuration work, which they might not even be able to do. I'm not
yet sure what the correct tradeoff is here, but I'm pretty sure it's
not no sniffing at all under any conditions.

Sniffing is a perpetual disaster that, after several security-sensitive
problems, web browsers have been moving to deprecate/mitigate. If browsers
want to guess types when no Content-Type is specified(*) then fine, but
there is no good reason to ignore an explicitly-set type. I don't want my
`application/octet-stream` file download service to be repurposeable as a
video player for some other party!

If you don't want that, you should be using access control, not MIME types.

For reasons already argued about here, you will never make the results of
content-sniffing reliable, so why bother to standardise it? A standardised
unreliable feature is no better than an unstandardised one.

Sure it is, because it's unreliable in the same way across all
browsers. That means that in any given case, all browsers will work
the same. This is particularly essential for security -- undocumented
sniffing behavior has caused more than one vulnerability in the past.

The typing mechanism of the web (and more) is Content-Type, period. There
should be no confusion of this with officially-endorsed sniffing.

We already have officially endorsed sniffing where web compat requires it:

http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#content-type-sniffing
http://tools.ietf.org/html/draft-abarth-mime-sniff-05

The question is if we can avoid it for new content types like
video/audio. If not, we should spec it in advance so we at least have
something that's as sane as possible under the circumstances.

That it is
'hard' for web authors to ensure the correct Content-Types are set is:

* not W3/WHATWG's problem. If web servers make adding Content-Type
information hard, then web servers need to be updated to make it easier;

I don't know about the W3C, but reality is the WHATWG's problem. We
can't let things be broken and just say it's someone else's fault. We
need to institute workarounds at our level for failures on other
levels if that's what's necessary to get good security and a good
user/author experience.

* not really true, at least for Apache which can allow AddType et al in the
.htaccess files that low-end shared hosts use. This may not be widely-known
or practised, but that doesn't really merit changing the standards for
everyone else to cope with.

Creating a .htaccess file is a technical procedure that most users
will not know how to do, particularly since the problem will probably
just manifest itself as the video doesn't work. It's also not
possible on some hosts -- although it's certainly possible on the
large majority of cheap shared hosts, and of course on hosts where the
author has root access.

On Tue, Sep 7, 2010 at 6:52 AM, Philip Jägenstedt phil...@opera.com wrote:
It hasn't been explicitly stated, but I assume that the only cases where
sniffing for video formats would be employed would be for missing
Content-Type, text/plain and application/octet-stream.

If those are the only common MIME types incorrectly served for unknown
file types, that seems reasonable. (Some files might be actively
misidentified, like if I have an Ogg file saved as .jpeg, but
hopefully this will be very rare.)

On Tue, Sep 7, 2010 at 8:56 AM, Boris Zbarsky bzbar...@mit.edu wrote:
On 9/7/10 4:11 AM, Philip Jägenstedt wrote:
It's garbage in at least UTF-8, Big5 and GBK.

Thanks. I assume that applies to the OggS\0 sequence too, right? I
appreciate the data!

I'm not sure what infrastructure is in place, but perhaps one could
*not* sniff if Content-Type also indicates an encoding?

As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1
(thanks, Apache!), that should be reasonable, I think.

So at least for Ogg and WebM, how about:

* Sniff only if Content-Type is typical of what popular browsers serve
for unrecognized filetypes. E.g., only for no Content-Type,
text/plain, or application/octet-stream, and only if the encoding is
either not present or is UTF-8 or ISO-8859-1. Or whatever web servers
do here.
* Sniff the same both for video tags and top-level browsing contexts,
so open video in new tab doesn't mysteriously fail on some setups.
* If a file in a top-level browsing context is sniffed as video but
then some kind of error is returned before the video plays the first
frame, fall back to allowing the user to download it, or whatever the
usual action would be if no sniffing had occurred.
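
The first bullet can be sketched as a small gate that runs before any byte sniffing. This is only an illustration of the proposal, with a hypothetical function name; it compares case-insensitively for brevity, which is looser than the exact matching Gecko is described as doing earlier in the thread.

```python
from typing import Optional

# Types commonly sent as fallbacks for unrecognized files.
SNIFFABLE_TYPES = {"text/plain", "application/octet-stream"}
# Charsets that servers are known to add by default.
DEFAULT_CHARSETS = {"utf-8", "iso-8859-1"}


def should_sniff(content_type: Optional[str]) -> bool:
    """Decide whether a response is a candidate for video sniffing."""
    if content_type is None or not content_type.strip():
        return True  # no Content-Type at all
    parts = [p.strip() for p in content_type.split(";")]
    if parts[0].lower() not in SNIFFABLE_TYPES:
        return False  # an explicit, non-fallback type is respected
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower() in DEFAULT_CHARSETS
    return True  # fallback type with no charset parameter


print(should_sniff(None))                              # True
print(should_sniff("text/plain; charset=ISO-8859-1"))  # True
print(should_sniff("text/plain; charset=GBK"))         # False
print(should_sniff("video/ogg"))                       # False
```

A non-default charset is treated as a sign that the author set the type deliberately, which leaves a way to serve the hypothetical false positives as text.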

Within these constraints, false positives in the sniffing

Re: [whatwg] Video with MIME type application/octet-stream


On 9/7/10 3:29 PM, Aryeh Gregor wrote:

* Sniff only if Content-Type is typical of what popular browsers serve
for unrecognized filetypes.  E.g., only for no Content-Type,
text/plain, or application/octet-stream, and only if the encoding is
either not present or is UTF-8 or ISO-8859-1.  Or whatever web servers
do here.
* Sniff the same both for video tags and top-level browsing contexts,
so open video in new tab doesn't mysteriously fail on some setups.


I could probably live with those, actually.


* If a file in a top-level browsing context is sniffed as video but
then some kind of error is returned before the video plays the first
frame, fall back to allowing the user to download it, or whatever the
usual action would be if no sniffing had occurred.


This might be pretty difficult to implement, since the video decoder 
might consume arbitrary amounts of data before saying that there was an 
error.


-Boris

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Adam Barth

On Tue, Sep 7, 2010 at 12:21 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 9/7/10 3:19 PM, Adam Barth wrote:
 It saddens me when standards bodies ignore reality and leave
 implementors to invent their own non-interoperable algorithms for
 security-critical behavior.

 Of course nothing prevents us from saying UAs MUST NOT sniff but if they do
 anyway they MUST use a given algorithm, right?

That's a contrary-to-duty imperative, which is something that's been
puzzling philosophers for centuries.  A more sensible requirement
would be that user agents SHOULD NOT sniff (for reasons XYZ), but, if
they do, they MUST use the following algorithm.

Adam

Re: [whatwg] Video with MIME type application/octet-stream


Of course nothing prevents us from saying UAs MUST NOT sniff but if they do
anyway they MUST use a given algorithm, right?


That's a contrary-to-duty imperative, which is something that's been
puzzling philosophers for centuries.  A more sensible requirement
would be that user agents SHOULD NOT sniff (for reasons XYZ), but, if
they do, they MUST use the following algorithm.


Except that in practice SHOULD NOT is treated as carte blanche to do the 
undesirable thing.  It has no teeth.  MUST NOT doesn't much either, but 
it's _something_ at least (in the sense that one can clearly claim that 
violating a MUST NOT is a bug).


-Boris

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Adam Barth

On Tue, Sep 7, 2010 at 2:13 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 Of course nothing prevents us from saying UAs MUST NOT sniff but if they
 do
 anyway they MUST use a given algorithm, right?

 That's a contrary-to-duty imperative, which is something that's been
 puzzling philosophers for centuries.  A more sensible requirement
 would be that user agents SHOULD NOT sniff (for reasons XYZ), but, if
 they do, they MUST use the following algorithm.

 Except that in practice SHOULD NOT is treated as carte blanche to do the
 undesirable thing.  It has no teeth.  MUST NOT doesn't much either, but it's
 _something_ at least (in the sense that one can clearly claim that violating
 a MUST NOT is a bug).

In any case, lawyering the requirement level in the spec isn't the way
to solve these problems.  You need to change the underlying incentives
to actually affect what gets implemented.

Adam

Re: [whatwg] Video with MIME type application/octet-stream


On 9/7/10 5:35 PM, Adam Barth wrote:

In any case, lawyering the requirement level in the spec isn't the way
to solve these problems.  You need to change the underlying incentives
to actually affect what gets implemented.


The incentive structure for pretty much any sort of sniffing is a 
prisoner's dilemma.  Life's hard.


-Boris

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-06 Thread Philip Jägenstedt

On Sun, 05 Sep 2010 21:59:09 +0200, Aryeh Gregor  
simetrical+...@gmail.com wrote:



On Fri, Sep 3, 2010 at 11:48 PM, Boris Zbarsky bzbar...@mit.edu wrote:


Is this a reasonable supposition?  What are these byte sequences for the
container formats at hand?  (Say WebM's restricted Matroska container,
whatever container format is supported for H.264 by IE and Chrome, and Ogg;
we'll ignore the generic Matroska weirdness for now.)


I don't know, which is why I'm considering a hypothetical.  If someone
who knows better could step up with this piece of info, that would be
helpful.


The Ogg page begins with the 4 bytes OggS, which is what Opera  
(GStreamer) checks for. For additional safety, one could also check for  
the trailing version indicator, which ought to be a NULL byte for current  
Ogg. [1] [2]


For WebM, the first 4 bytes are the EBML header: the bytes 0x1A, 0x45,  
0xDF, 0xA3. [3] The EBML DocType in the header must be webm. Since  
parsing the EBML header is a little bit complicated, Opera (GStreamer)  
simply checks for the string webm somewhere in the header. I've heard  
rumors that WebM files are allowed to contain arbitrary garbage before the  
EBML header, but this is something we happily ignore, i.e., such files  
would fail to play in Opera, regardless of MIME type. I haven't  
encountered any such files yet, and think that browsers should not support  
this feature.
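
The two checks described above can be sketched roughly as follows. This is only an illustration of the idea, not Opera's or GStreamer's actual code; the function names and the 64-byte search window for the DocType string are assumptions made for the example.

```python
OGG_MAGIC = b"OggS"
EBML_MAGIC = bytes([0x1A, 0x45, 0xDF, 0xA3])


def looks_like_ogg(data: bytes, check_version: bool = True) -> bool:
    """True if data begins with the Ogg capture pattern."""
    if not data.startswith(OGG_MAGIC):
        return False
    if check_version:
        # Byte 4 is the stream structure version, 0 for current Ogg.
        return len(data) > 4 and data[4] == 0
    return True


def looks_like_webm(data: bytes, header_window: int = 64) -> bool:
    """True if data begins with the EBML magic and "webm" appears nearby.

    header_window is a hypothetical bound on how far to search.
    """
    if not data.startswith(EBML_MAGIC):
        return False
    return b"webm" in data[len(EBML_MAGIC):header_window]
```

Note that neither function skips leading garbage, which matches the position taken above that such files need not be supported.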


[1] http://www.xiph.org/ogg/doc/framing.html#page_header
[2] http://www.xiph.org/ogg/doc/rfc3533.txt
[3] http://ebml.sourceforge.net/specs/

--
Philip Jägenstedt
Core Developer
Opera Software

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-06 Thread Aryeh Gregor

On Mon, Sep 6, 2010 at 4:14 AM, Philip Jägenstedt phil...@opera.com wrote:
 The Ogg page begins with the 4 bytes OggS, which is what Opera (GStreamer)
 checks for. For additional safety, one could also check for the trailing
 version indicator, which ought to be a NULL byte for current Ogg. [1] [2]

OggS\0 as the first five bytes seems safe to check for.  It's rather
short, I guess because it's repeated on every page, but five bytes is
long enough that it should occur by random only negligibly often, in
either text or binary files.
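
As a rough back-of-the-envelope check of "negligibly often": if the first five bytes of a file were independent and uniformly random (a simplifying assumption; real text and binary files are not uniform), the chance of matching a fixed five-byte signature would be 256^-5, i.e. 2^-40, a little under one in a trillion.

```python
from fractions import Fraction

SIGNATURE = b"OggS\x00"

# Probability that len(SIGNATURE) uniformly random bytes equal the signature.
p_match = Fraction(1, 256) ** len(SIGNATURE)

print(p_match)         # exact value as a fraction
print(float(p_match))  # roughly 9.1e-13
```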

 For WebM, the first 4 bytes are the EBML header: the bytes 0x1A, 0x45, 0xDF,
 0xA3. [3] The EBML DocType in the header must be webm. Since parsing the
 EBML header is a little bit complicated, Opera (GStreamer) simply checks for
 the string webm somewhere in the header. I've heard rumors that WebM files
 are allowed to contain arbitrary garbage before the EBML header, but this is
 something we happily ignore, i.e., such files would fail to play in Opera,
 regardless of MIME type. I haven't encountered any such files yet, and think
 that browsers should not support this feature.

 [1] http://www.xiph.org/ogg/doc/framing.html#page_header
 [2] http://www.xiph.org/ogg/doc/rfc3533.txt
 [3] http://ebml.sourceforge.net/specs/

It looks like you could check for 0x1a 0x45 0xdf 0xa3 as the first
four bytes, followed by 0x42 0x82 0x84 "webm" somewhere in the first
255 bytes or whatever.  (0x42 0x82 is the DocType marker, and 0x84 is
the length, encoded UTF-8 style: a leading 1 bit for a one-byte length,
then 0000100 for the actual length, 4.)  That seems very safe.  If WebM allows degenerate
stuff that makes sniffing hard, we can just prohibit it in the WebM
spec, I assume.
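
A minimal sketch of that check, assuming the 255-byte window mentioned above; the function names are made up for the example and this is not taken from any browser.

```python
EBML_MAGIC = b"\x1a\x45\xdf\xa3"
# DocType element ID (0x42 0x82), one-byte length 0x84, then the value.
DOCTYPE_WEBM = b"\x42\x82\x84webm"


def decode_one_byte_length(b: int) -> int:
    """Decode a one-byte EBML length: a leading 1 bit, then 7 value bits."""
    if not b & 0x80:
        raise ValueError("not a one-byte length")
    return b & 0x7F


def sniff_webm(data: bytes, window: int = 255) -> bool:
    """True if data starts with the EBML magic and declares DocType "webm"
    somewhere within the first `window` bytes."""
    head = data[:window]
    if not head.startswith(EBML_MAGIC):
        return False
    return DOCTYPE_WEBM in head[len(EBML_MAGIC):]
```

Here decode_one_byte_length(0x84) gives 4, the length of "webm". A byte search like this does not parse the EBML structure, so it is looser than a real parser would be.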

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-06 Thread Gregory Maxwell

On Mon, Sep 6, 2010 at 3:19 PM, Aryeh Gregor simetrical+...@gmail.com wrote:
 On Mon, Sep 6, 2010 at 4:14 AM, Philip Jägenstedt phil...@opera.com wrote:
 The Ogg page begins with the 4 bytes OggS, which is what Opera (GStreamer)
 checks for. For additional safety, one could also check for the trailing
 version indicator, which ought to be a NULL byte for current Ogg. [1] [2]

 OggS\0 as the first five bytes seems safe to check for.  It's rather
 short, I guess because it's repeated on every page, but five bytes is
 long enough that it should occur by random only negligibly often, in
 either text or binary files.

Um... If you do that you will fail to capture on files that most other
ogg reading tools will happily capture on.  Common software will read
forward until it hits OggS then it will check the page CRC (in total,
9 bytes of capture).  For example, here is a file which begins with a
kilobyte of \0: http://myrandomnode.dyndns.org:8080/~gmaxwell/test.ogg
 Everything I had handy played it.
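
For comparison, the "read forward until OggS, then check the page CRC" behaviour could look roughly like the sketch below. It is not taken from any particular Ogg tool; the page layout and CRC parameters follow RFC 3533 (polynomial 0x04c11db7, initial value 0, no final XOR, CRC field zeroed while computing), while the function names and the 64 KiB scan limit are assumptions for the example.

```python
import struct

OGG_CRC_POLY = 0x04C11DB7  # generator polynomial given in RFC 3533


def _build_crc_table():
    """Precompute the 256-entry table for the unreflected CRC-32."""
    table = []
    for i in range(256):
        r = i << 24
        for _ in range(8):
            r = ((r << 1) ^ OGG_CRC_POLY) if r & 0x80000000 else (r << 1)
            r &= 0xFFFFFFFF
        table.append(r)
    return table


_CRC_TABLE = _build_crc_table()


def ogg_crc(page: bytes) -> int:
    """CRC of a page whose CRC field has already been zeroed."""
    crc = 0
    for byte in page:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ _CRC_TABLE[((crc >> 24) ^ byte) & 0xFF]
    return crc


def page_is_valid(data: bytes, pos: int) -> bool:
    """Check that a complete, CRC-valid Ogg page starts at data[pos]."""
    header = data[pos:pos + 27]
    if len(header) < 27 or header[4] != 0:  # byte 4: version, must be 0
        return False
    nsegs = header[26]
    seg_table = data[pos + 27:pos + 27 + nsegs]
    if len(seg_table) < nsegs:
        return False
    page_len = 27 + nsegs + sum(seg_table)
    page = data[pos:pos + page_len]
    if len(page) < page_len:
        return False
    stored = struct.unpack_from("<I", page, 22)[0]  # CRC is little-endian
    zeroed = page[:22] + b"\x00\x00\x00\x00" + page[26:]
    return ogg_crc(zeroed) == stored


def find_ogg_page(data: bytes, limit: int = 65536) -> int:
    """Offset of the first CRC-valid page starting within `limit`, else -1."""
    pos = data.find(b"OggS", 0, limit)
    while pos != -1:
        if page_is_valid(data, pos):
            return pos
        pos = data.find(b"OggS", pos + 1, limit)
    return -1
```

A sniffer that only compares the first five bytes returns "not Ogg" for a file with a kilobyte of leading zeros, whereas find_ogg_page would report offset 1024 for it.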

This could fail to capture on a live stream that didn't ensure new
listeners began at a page boundary. I don't know if any of these
exist.

I don't know if breaking these cases would matter much but herein lies
the danger of sniffing— everyone thinks they're an expert but no one
really has a handle on the implications.

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-05 Thread Aryeh Gregor

On Fri, Sep 3, 2010 at 5:05 PM, David Singer sin...@apple.com wrote:
 Um, I think that in some cases the code that is supporting video/audio has 
 ... historical artefacts ... which may not be entirely in line with the 
 specs.  I think it's dangerous to make assumptions in this area, especially 
 if you then go and ask for a change in a spec. based on assumptions.

Okay, okay, I'll try to avoid stating assumptions like that, at least
about people on the list.  :)  So never mind that point.  (Although I
was mostly thinking of IE, not Chrome and certainly not Safari.)  I
think sniffing is a good idea even if we could get everyone to agree
not to sniff.

On Fri, Sep 3, 2010 at 11:48 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 Okay, you're being too theoretical for me.  Let's say we have
 fingerprints for all the major video types, of the form "check if the
 first X bytes match this very simple pattern".  Let's say the spec
 says that whenever processing the response to an HTTP request,
 browsers must act as though they executed the sniffing algorithm and,
 if it sniffs as a video type, they must treat it the same as if the
 Content-Type matched the sniffed type.

 OK, so context-independent?  Note that not a single browser implements this
 today.

Either context-independent, or specified to occur only in certain key
contexts like <video>/top-level browsing context.  No browser
implements my suggested behavior today, but I think we all agree it's
confusing/harmful to only sniff for <video> and not top-level browsing
contexts too, because it breaks all sorts of expected behavior (open
in new tab, copy video URL, etc.).

 Is this a reasonable supposition?  What are these byte sequences for the
 container formats at hand?  (Say WebM's restricted Matroska container,
 whatever container format is supported for H.264 by IE and Chrome, and Ogg;
 we'll ignore the generic Matroska weirdness for now.)

I don't know, which is why I'm considering a hypothetical.  If someone
who knows better could step up with this piece of info, that would be
helpful.

 Might be a good idea to ask the IE team, the Chrome team, and the Safari
 team why they're not sniffing in toplevel browsing contexts...  I believe
 there's been at least one answer from a Chrome developer on that already,
 though.

That would also be helpful information.  Andrew Scherkus made it sound
like Chrome wouldn't necessarily object to sniffing on top-level
browsing contexts, just that it would have to be sandboxed (although
I'm not sure why).

 Sure, but it's early days in implementation.  Note, also, that I believe
 it's 3 browsers, not 2.

 . . .

 Some of these changes take time (e.g. having to rejigger QuickTime to allow
 you to not sniff while using it).  So is it that they have not changed, or
 that they have no plans to change, ever?

 . . .

 Such changes have happened in the past (e.g. for stylesheets, and for
 toplevel browsing contexts).  Why is this case different?

Okay, so maybe I'm too pessimistic.  :)  Regardless of this point, I
still think sniffing consistently is the best solution, *if* it can be
done reliably -- i.e., given the assumptions I gave in my sketch of a
proposal (easily-checked fingerprints that make text matches
impossible and binary matches of negligible likelihood).  If those
assumptions hold, would you agree that consistently sniffing is a
better idea than honoring clearly incorrect MIME types, assuming we
could get implementers to agree one way or the other?  If not, why
not?  I don't see significant downsides, and the upside of actually
being able to have stuff work without configuring MIME types seems
big.

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-03 Thread Aryeh Gregor

On Thu, Sep 2, 2010 at 4:41 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 Well, serving up data as text/plain for it to be readable is one.  I agree
 that for the specific case of video this is not a big deal.

Yes, I'm talking specifically about that.  Sniffing in other cases (in
particular, text formats) may be a bad idea.

 Why are you assuming that?

Because blocking an entire MIME type seems like it would be massive
overkill . . . but if that's a real use-case, well, okay.  It still
can't be *too* hard to check the first few bytes of the contents.
They must do it anyway if they implement this for images, right?

 There are proposals for standardizing several different types of sniffing,
 with the one used being context-dependent.  A proxy wouldn't have the
 context.

 It can all be made to work by erring on the side of blocking more stuff, but
 then you get to the point where the proxy makes it impossible to use the
 browser altogether, and then it's not a viable solution to the problem at
 hand.

Okay, you're being too theoretical for me.  Let's say we have
fingerprints for all the major video types, of the form "check if the
first X bytes match this very simple pattern".  Let's say the spec
says that whenever processing the response to an HTTP request,
browsers must act as though they executed the sniffing algorithm and,
if it sniffs as a video type, they must treat it the same as if the
Content-Type matched the sniffed type.  (You could limit the scope of
that somewhat for ease of implementation if you like, but at least for
video plus top-level browsing contexts.)  Also suppose that the
fingerprints include byte sequences that cannot occur in normal text
encodings, and that they're long enough that random false positives
are extremely unlikely.  What's the problem with this specific
proposal?
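
To make the proposal concrete, here is a minimal sketch of "act as though the Content-Type matched the sniffed type". The fingerprint table is an assumption for illustration: it borrows the Ogg and EBML prefixes mentioned elsewhere in this thread, and the mapping to video/ogg and video/webm is a simplification (an Ogg file may be audio-only, and the EBML prefix alone does not prove WebM). The function name is hypothetical.

```python
# (prefix, type to assume) pairs; hypothetical table for the example.
VIDEO_FINGERPRINTS = [
    (b"OggS\x00", "video/ogg"),
    (b"\x1a\x45\xdf\xa3", "video/webm"),
]


def effective_type(content_type, body: bytes):
    """Return the type the browser would act on under the proposal.

    If the body matches a known video fingerprint, the sniffed type wins,
    whatever the server sent.  Otherwise the Content-Type header value
    (possibly None when absent) is used unchanged.
    """
    for prefix, sniffed in VIDEO_FINGERPRINTS:
        if body.startswith(prefix):
            return sniffed
    return content_type
```

Under this sketch a text/plain response containing ordinary text is untouched, since neither prefix can begin a normal text document, while an Ogg file served as application/octet-stream is handled as video.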

 Put another way: the problem here is not that browsers sniff.  It's
 that browsers don't behave interoperably or predictably.  Speccing a
 precise sniffing algorithm that everyone's willing to follow allows
 proxies to reliably know what browsers will do with it.  What will
 cause problems is what you seem to be arguing for -- *not* speccing
 sniffing

 Er... Where did I propose this?  I proposed speccing that there MUST NOT be
 any sniffing, with browsers that sniff therefore being nonconformant.  I
 didn't propose allowing ad-hoc sniffing.

Right.  But the spec never allowed sniffing, and two browsers do it
anyway.  Ian has spoken to those browsers' implementers, and the
browsers have not changed, despite knowing that they aren't following
the spec.  Do you have any particular reason to believe that they'll
change?  If not, then the situation I described is exactly what your
proposal (i.e., the status quo) will result in, no?

 Only if "consistent" includes "consistent across all contexts" (which no
 one is proposing to either specify or implement).

Could you comment specifically on the behavior I outlined above?  It's
entirely possible that I'm missing a lot of subtleties here.

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-03 Thread David Singer


On Sep 3, 2010, at 12:48 , Aryeh Gregor wrote:

 Er... Where did I propose this?  I proposed speccing that there MUST NOT be
 any sniffing, with browsers that sniff therefore being nonconformant.  I
 didn't propose allowing ad-hoc sniffing.
 
 Right.  But the spec never allowed sniffing, and two browsers do it
 anyway.  Ian has spoken to those browsers' implementers, and the
 browsers have not changed, despite knowing that they aren't following
 the spec.  Do you have any particular reason to believe that they'll
 change?  If not, then the situation I described is exactly what your
 proposal (i.e., the status quo) will result in, no?
 

Um, I think that in some cases the code that is supporting video/audio has ... 
historical artefacts ... which may not be entirely in line with the specs.  I 
think it's dangerous to make assumptions in this area, especially if you then 
go and ask for a change in a spec. based on assumptions.

David Singer
Multimedia and Software Standards, Apple Inc.

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-03 Thread Boris Zbarsky


On 9/3/10 3:48 PM, Aryeh Gregor wrote:

Why are you assuming that?


Because blocking an entire MIME type seems like it would be massive
overkill . . . but if that's a real use-case, well, okay.  It still
can't be *too* hard to check the first few bytes of the contents.
They must do it anyway if they implement this for images, right?


Yes.

But note that for some video formats checking the first few bytes is not 
sufficient.  In fact, some video container formats can have arbitrary 
length prefixes before the actual video data starts.  Of course if 
sniffers are just restricted to the first few bytes that might be ok.


Okay, you're being too theoretical for me.  Let's say we have
fingerprints for all the major video types, of the form "check if the
first X bytes match this very simple pattern".  Let's say the spec
says that whenever processing the response to an HTTP request,
browsers must act as though they executed the sniffing algorithm and,
if it sniffs as a video type, they must treat it the same as if the
Content-Type matched the sniffed type.


OK, so context-independent?  Note that not a single browser implements 
this today.



Also suppose that the fingerprints include byte sequences that cannot
occur in normal text encodings


Is this a reasonable supposition?  What are these byte sequences for the 
container formats at hand?  (Say WebM's restricted Matroska container, 
whatever container format is supported for H.264 by IE and Chrome, and 
Ogg; we'll ignore the generic Matroska weirdness for now.)



and that they're long enough that random false positives
are extremely unlikely.  What's the problem with this specific
proposal?


Might be a good idea to ask the IE team, the Chrome team, and the Safari 
team why they're not sniffing in toplevel browsing contexts...  I 
believe there's been at least one answer from a Chrome developer on that 
already, though.



Er... Where did I propose this?  I proposed speccing that there MUST NOT be
any sniffing, with browsers that sniff therefore being nonconformant.  I
didn't propose allowing ad-hoc sniffing.


Right.  But the spec never allowed sniffing, and two browsers do it
anyway.


Sure, but it's early days in implementation.  Note, also, that I believe 
it's 3 browsers, not 2.



Ian has spoken to those browsers' implementers, and the
browsers have not changed, despite knowing that they aren't following
the spec.


Some of these changes take time (e.g. having to rejigger QuickTime to
allow you to not sniff while using it).  So is it that they have not
changed, or that they have no plans to change, ever?



Do you have any particular reason to believe that they'll
change?


Such changes have happened in the past (e.g. for stylesheets, and for 
toplevel browsing contexts).  Why is this case different?



Only if "consistent" includes "consistent across all contexts" (which no
one is proposing to either specify or implement).


Could you comment specifically on the behavior I outlined above?


The behavior you outlined above is consistent in this sense, yes.

-Boris

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-02 Thread Aryeh Gregor

On Thu, Sep 2, 2010 at 12:21 AM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 9/1/10 4:46 PM, Aryeh Gregor wrote:
 Is this realistically possible unless the author deliberately crafts
 the file?

 I'm not an audio/video format expert; I have no idea.  Does it matter?

Yes.  If false positives were realistically possible by accident, that
would count strongly against sniffing.  If they're not, that at least
is not an issue.

 Why is it not a problem if there are suddenly use cases that are impossible
 because the browser will ignore the author's intent?

Which use-cases?

 have any issues ever been caused by this kind of sniffing problem?

 As far as I know, yes (of the "remotely take control of the computer" kind).

 Are there clear problems that have arisen in other cases?

 See above.

 The problem can't plausibly arise with media
 files -- if you can execute a vulnerability via getting the user to
 view a media file, it's probably via arbitrary code execution.  In
 that case you don't need to disguise yourself, just get the viewer to
 go to your own website and do whatever you want, since there are no
 same-domain restrictions.

 See above about people who take steps to protect themselves when problems
 like this arise and would be screwed over by sniffing.

Okay, but we're talking about standardizing sniffing in a spec.  As
long as browsers' behavior in processing a given resource is
well-defined and reliable, a proxy could work fine by just
implementing the same algorithm.  There's no reason that the proxy has
to only look at MIME types, is there?  It simplifies the proxy a bit,
but not much.  It will already have to do some content sniffing to
identify what content is dangerous, unless it's just going to block
everything of that file type (which I'm assuming isn't the case).

Put another way: the problem here is not that browsers sniff.  It's
that browsers don't behave interoperably or predictably.  Speccing a
precise sniffing algorithm that everyone's willing to follow allows
proxies to reliably know what browsers will do with it.  What will
cause problems is what you seem to be arguing for -- *not* speccing
sniffing, so that browsers that sniff do so in an ad hoc, undefined
manner that's difficult to predict.  For the use-case of filtering
exploits, it doesn't really matter what the behavior is, so long as
it's consistent.  Or am I missing something here?

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-02 Thread Boris Zbarsky


On 9/2/10 3:53 PM, Aryeh Gregor wrote:

Why is it not a problem if there are suddenly use cases that are impossible
because the browser will ignore the author's intent?


Which use-cases?


Well, serving up data as text/plain for it to be readable is one.  I 
agree that for the specific case of video this is not a big deal.



Okay, but we're talking about standardizing sniffing in a spec.  As
long as browsers' behavior in processing a given resource is
well-defined and reliable, a proxy could work fine by just
implementing the same algorithm.  There's no reason that the proxy has
to only look at MIME types, is there?  It simplifies the proxy a bit,
but not much.  It will already have to do some content sniffing to
identify what content is dangerous, unless it's just going to block
everything of that file type (which I'm assuming isn't the case).


Why are you assuming that?

There are proposals for standardizing several different types of 
sniffing, with the one used being context-dependent.  A proxy wouldn't 
have the context.


It can all be made to work by erring on the side of blocking more stuff, 
but then you get to the point where the proxy makes it impossible to use 
the browser altogether, and then it's not a viable solution to the 
problem at hand.



Put another way: the problem here is not that browsers sniff.  It's
that browsers don't behave interoperably or predictably.  Speccing a
precise sniffing algorithm that everyone's willing to follow allows
proxies to reliably know what browsers will do with it.  What will
cause problems is what you seem to be arguing for -- *not* speccing
sniffing


Er... Where did I propose this?  I proposed speccing that there MUST NOT 
be any sniffing, with browsers that sniff therefore being nonconformant. 
 I didn't propose allowing ad-hoc sniffing.



For the use-case of filtering
exploits, it doesn't really matter what the behavior is, so long as
it's consistent.


Only if "consistent" includes "consistent across all contexts"
(which no one is proposing to either specify or implement).


-Boris

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Philip Jägenstedt


On Tue, 31 Aug 2010 09:36:00 +0200, Ian Hickson i...@hixie.ch wrote:


On Mon, 19 Jul 2010, Philip Jägenstedt wrote:


I've tested Firefox 3.6.4, Firefox 4.0b1 and Chrome 5.0.375.99 and none
return "maybe" for canPlayType("application/octet-stream"). I couldn't
get meaningful results from Safari on Windows (requires restart to
detect QuickTime, perhaps?).

It would appear that Opera is the only browser that supports
application/octet-stream. At the time I added this, it was simply
because it is true, maybe we can play it. However, I see no practical
benefit of this spec-wise or implementation-wise. Since no other
browsers have implemented it, I am going to remove it from Opera and
hope that the spec will be changed to match this.


Agreed. I've changed the spec to match.


I never did make that change, instead waiting for the outcome of this  
discussion. Note that since Opera uses the same code path for checking the  
argument to canPlayType and for the Content-Type header, the change would  
also have meant that videos served as application/octet-stream would stop  
working, in violation of the spec.



On Thu, 22 Jul 2010, Philip Jägenstedt wrote:


Chrome and Safari ignore the MIME type altogether; in my opinion, if we
align with that we should do it full out, not just by adding text/plain
to the whitelist, as that would either require (a)
canPlayType("text/plain") to return "maybe" or (b) different code paths
for checking the MIME type in Content-Type and for canPlayType.


On Thu, 22 Jul 2010, Maciej Stachowiak wrote:


I don't think canPlayType("text/plain") has to return "maybe". It's not
useful for a Web developer to test for the browser's ability to sniff to
overcome a bad MIME type. canPlayType should be thought of as testing
whether the browser could play a media resource that is really of a
given type, rather than labeled with that type over HTTP.


On Fri, 23 Jul 2010, Philip Jägenstedt wrote:


Right, it certainly isn't useful, I'm just pointing out that this is
what happens if one adds text/plain to the list of "maybe" codecs rather
than ignoring Content-Type altogether, which is the only thing you can
do within the bounds of the current spec to get text/plain to play. The
only 3 serious options I know are still the ones I outlined in my
earlier email.


canPlayType() is now hardcoded as not supporting application/octet-stream
even though that type is otherwise not considered one that isn't
supported (i.e. is a type that sniffs).


I'm not very happy with special-casing application/octet-stream only for
canPlayType, especially as it only handles the exact string
"application/octet-stream", not e.g. "application/octet-stream;" which
would instead be put through the same code path as Content-Type and return
"maybe".
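
The exact-string problem can be illustrated with a small sketch. This is not Opera's code or the spec's algorithm; both function names are hypothetical, and the "essence" extraction is a deliberately simplified stand-in for real MIME type parsing.

```python
def is_octet_stream_exact(type_string: str) -> bool:
    """Special case that only fires on the exact string."""
    return type_string == "application/octet-stream"


def mime_essence(type_string: str) -> str:
    """Drop parameters and normalise case: "A/B; x=y" becomes "a/b"."""
    return type_string.split(";", 1)[0].strip().lower()


def is_octet_stream_parsed(type_string: str) -> bool:
    """Special case applied after stripping parameters."""
    return mime_essence(type_string) == "application/octet-stream"
```

With the exact comparison, "application/octet-stream;" slips past the special case and would fall through to the shared Content-Type code path; comparing the parsed essence treats both spellings the same way.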


At this point the least complex solution seems to be to ignore the
Content-Type header, and unless the teams behind Chrome, Safari and IE9
have a sudden change of heart it's the only realistic outcome. Perhaps we
should also encourage authors to not send the Content-Type header at all,  
to remove any illusions of it having an effect.


--
Philip Jägenstedt
Core Developer
Opera Software

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Philip Jägenstedt

On Wed, 01 Sep 2010 02:59:54 +0200, Andrew Scherkus  
scher...@chromium.org wrote:



On Tue, Aug 31, 2010 at 12:59 PM, Aryeh Gregor simetrical+...@gmail.com
wrote:


On Tue, Aug 31, 2010 at 10:35 AM, Boris Zbarsky bzbar...@mit.edu  
wrote:

 You can't sniff in a toplevel browser window.  Not the same way that people
 are sniffing in <video>.  It would break the web.

How so?  For the sake of argument, suppose you sniff only for known
binary video/audio types, and fall back to existing behavior if the
type isn't one of those (e.g., not video or audio).  Do people do
things like link to MP3 files with incorrect MIME types and no
Content-Disposition, and expect them to download?  If so, don't people
also link to MP3 files with correct MIME types and expect the same?  I
don't see how sniffing vs. using MIME type makes a compatibility
difference here, since media support in browsers is so new -- surely
whatever bad thing happens, sniffing will make it happen more often,
at worst.

What do Chrome and IE do here?



We use the incoming MIME type to determine whether we render the audio/video
in the browser versus download.  We would never want to execute multimedia
sniffing code in the trusted/browser process so implementing sniffing for a
top level browser window would involve sending the bytes to a sandboxed
process for inspection first.


Can you elaborate on this? What would be the problem with sniffing in this  
context?


This does have a side effect where a video may play fine on a page with a
bogus MIME type (due to sniffing), but viewing the video URL in the browser
itself would prompt a download.


If we start ignoring the Content-Type I expect we would also add sniffing  
so that opening a video served with the wrong (or missing) Content-Type  
still works in a top-level browsing context, as it does for images (I  
think).


--
Philip Jägenstedt
Core Developer
Opera Software

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Brian Campbell

On Aug 31, 2010, at 9:40 AM, Boris Zbarsky wrote:

 On 8/31/10 3:36 AM, Ian Hickson wrote:
 You might say "Hey, but aren't you content sniffing then to find the
 codecs" and you'd be right. But in this case we're respecting the MIME
 type sent by the server - it tells the browser to whatever level of
 detail it wants (including codecs if needed) what type it is sending. If
 the server sends 'text/plain' or 'video/x-matroska' I wouldn't expect a
 browser to sniff it for Ogg content.
 
 The Microsoft guys responded to my suggestion that they might want to
 implement something like this with "what's the benefit of doing that?".
 
 One obvious benefit is that videos with the wrong type will not work, and 
 hence videos will be sent with the right type.

What makes you say this? Even if they are sent with the right type initially, 
the correct types are at high risk of bitrotting.

The big problem with MIME types is that they don't stick to files very well. 
So, while someone might get them working when they initially use video, if they 
move to a different web server, or upgrade their server, or someone mirrors 
their video, or any of a number of other things, they might lose the proper 
association of files and MIME types.

The real problem is that there is no standard way of storing and transmitting 
file type metadata on the majority of filesystems and majority of internet 
protocols, meaning that people need to maintain separate databases of MIME 
types, which are extremely easy to lose when moving between web servers. Until 
this problem is fixed (and this is a pretty big problem, even Apple gave up on 
tracking file type metadata years ago due to its incompatibility with how 
other systems work), it will simply be too hard to maintain working 
Content-Type headers, and sniffing will be much more likely to produce the 
effects that the authors intended.

It seems that periodically, web standards bodies decide "this time, if we're
strict, people will just get the content right or it won't work" (such as XHTML
with XML parsing rules), and invariably, people manage to screw it up anyhow.
Sure, when the author tests their page the first time it's fine, but a mistaken 
lack of quoting in a comments field breaks the whole page. This causes people 
to migrate to the browsers or technologies that are less strict, and actually 
show the user what they want to see, rather than just breaking due to something 
out of the user's control.

-- Brian

Re: [whatwg] Video with MIME type application/octet-stream


On 9/1/10 4:12 AM, Philip Jägenstedt wrote:

If we start ignoring the Content-Type I expect we would also add
sniffing so that opening a video served with the wrong (or missing)
Content-Type still works in a top-level browsing context, as it does for
images (I think).


It can't possibly work for images.  If I send a file as text/html, and 
you load it from an <img> then you will render it as an image (possibly
a broken one).  If you load it from a toplevel browsing context you will
render it as text/html, even if it's image data (where "you" possibly
excludes IE/Windows, which will do some sniffing in that situation).


-Boris

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Philip Jägenstedt


On Wed, 01 Sep 2010 15:14:10 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 9/1/10 4:12 AM, Philip Jägenstedt wrote:

If we start ignoring the Content-Type I expect we would also add
sniffing so that opening a video served with the wrong (or missing)
Content-Type still works in a top-level browsing context, as it does for
images (I think).


It can't possibly work for images.  If I send a file as text/html, and  
you load it from an <img> then you will render it as an image (possibly
a broken one).  If you load it from a toplevel browsing context you will
render it as text/html, even if it's image data (where "you" possibly
excludes IE/Windows, which will do some sniffing in that situation).


Huh, I guessed incorrectly: serving a PNG as neither text/plain nor
text/html makes it be sniffed and rendered in a top-level browsing context
in Opera. However, both work in IE8.


Why do you say that it can't possibly work? Are there any security risks  
with the browser potentially interpreting a plain text or HTML document  
and failing to decode it? Anything else?


--
Philip Jägenstedt
Core Developer
Opera Software

Re: [whatwg] Video with MIME type application/octet-stream


On 9/1/10 10:23 AM, Philip Jägenstedt wrote:

Huh, I guessed incorrectly: serving a PNG as neither text/plain nor
text/html makes it be sniffed and rendered in a top-level browsing
context in Opera. However, both work in IE8.

Why do you say that it can't possibly work?


That was a statement about the current implementation state of Opera, 
not about future possibilities.



Are there any security risks
with the browser potentially interpreting a plain text or HTML document


Yes, actually, if there's a filtering proxy trying to screen out video 
or image data that's trying to exploit known OS-level bugs, say.  But I 
had assumed, based on the rest of this discussion, that people simply 
didn't care about that.


-Boris

Re: [whatwg] Video with MIME type application/octet-stream


On 9/1/10 9:13 AM, Brian Campbell wrote:

It seems that periodically, web standards bodies decide this time, if we're strict, 
people will just get the content right or it won't work (such as XHTML with XML 
parsing rules), and invariably, people manage to screw it up anyhow. Sure, when the 
author tests their page the first time it's fine, but a mistaken lack of quoting in a 
comments field breaks the whole page. This causes people to migrate to the browsers or 
technologies that are less strict, and actually show the user what they want to see, 
rather than just breaking due to something out of the user's control.


It hasn't actually happened for MIME types in toplevel documents (modulo
the one known workaround for a common server issue with text/plain).  By 
and large, browsers don't sniff toplevel browsing contexts, and the one 
browser that does has been losing market share.


-Boris

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Julian Reschke


On 01.09.2010 10:12, Philip Jägenstedt wrote:

...
If we start ignoring the Content-Type I expect we would also add
sniffing so that opening a video served with the wrong (or missing)
Content-Type still works in a top-level browsing context, as it does for
images (I think).
...


Sniffing in the *absence* of a content type is fine. The interesting 
question is what to do when it's present, but wrong.


Best regards, Julian

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Julian Reschke


On 01.09.2010 16:23, Philip Jägenstedt wrote:

...
Huh, I guessed incorrectly, neither serving a PNG as text/plain or
text/html makes it be sniffed and rendered in a top-level browsing
context in Opera. However, both work in IE8.
...


Please don't say "work" when talking about something that's not supposed
to happen...

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Julian Reschke


On 01.09.2010 15:13, Brian Campbell wrote:

On Aug 31, 2010, at 9:40 AM, Boris Zbarsky wrote:


On 8/31/10 3:36 AM, Ian Hickson wrote:

You might say "Hey, but aren't you content sniffing then to find the
codecs?" and you'd be right. But in this case we're respecting the MIME
type sent by the server - it tells the browser to whatever level of
detail it wants (including codecs if needed) what type it is sending. If
the server sends 'text/plain' or 'video/x-matroska' I wouldn't expect a
browser to sniff it for Ogg content.


The Microsoft guys responded to my suggestion that they might want to
implement something like this with "what's the benefit of doing that?".


One obvious benefit is that videos with the wrong type will not work, and hence 
videos will be sent with the right type.


What makes you say this? Even if they are sent with the right type initially, 
the correct types are at high risk of bitrotting.

The big problem with MIME types is that they don't stick to files very well. 
So, while someone might get them working when they initially use video, if they 
move to a different web server, or upgrade their server, or someone mirrors 
their video, or any of a number of other things, they might lose the proper 
association of files and MIME types.
...


That's true, and the reason why people still use file extensions.

That's not super elegant, but it works.

Best regards, Julian

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Adrian Sutton

On 1 Sep 2010, at 15:45, Julian Reschke wrote:
 The big problem with MIME types is that they don't stick to files very well. 
 So, while someone might get them working when they initially use video, if 
 they move to a different web server, or upgrade their server, or someone 
 mirrors their video, or any of a number of other things, they might lose the 
 proper association of files and MIME types.
 ...
 
 That's true, and the reason why people still use file extensions.
 
 That's not super elegant, but it works.


Given that there is a very limited set of video formats that are supported 
anyway, wouldn't it be reasonable to just identify or define the standard 
file extensions then work with server vendors to update their standard file 
extension to mime type definitions to include that.  While adoption and 
upgrading to the new versions would obviously take time, that applies to the 
video tag itself anyway and is just a temporary source of pain.
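
As a rough illustration of the extension-to-type mapping suggested above, here is a minimal Python sketch built on the standard library's mimetypes module. The three extension/type pairs and the helper names (build_type_table, content_type_for) are assumptions made for the example; they are not taken from any particular server's defaults.

```python
import mimetypes

# Assumed extension-to-type pairs for the formats discussed in this thread;
# a real server would ship and maintain its own table.
VIDEO_TYPES = {
    ".ogv": "video/ogg",
    ".webm": "video/webm",
    ".mp4": "video/mp4",
}


def build_type_table():
    """Return a private MimeTypes table with the video extensions registered."""
    table = mimetypes.MimeTypes()
    for ext, mime in VIDEO_TYPES.items():
        table.add_type(mime, ext)
    return table


def content_type_for(path, table=None):
    """Guess a Content-Type from the file extension alone."""
    if table is None:
        table = build_type_table()
    mime, _encoding = table.guess_type(path)
    # Paths without a known extension fall back to the generic binary type.
    return mime or "application/octet-stream"
```

A path with no extension at all comes back as application/octet-stream here, which is the mislabelling this thread is about.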

Regards,

Adrian Sutton.
__
Adrian Sutton, CTO
UK: +44 1 628 353 032  US: +1 (650) 292 9659 x717
Ephox http://www.ephox.com/
Ephox Blogs http://people.ephox.com/, Personal Blog http://www.symphonious.net/

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Zachary Ozer

On Wed, Sep 1, 2010 at 10:51 AM, Adrian Sutton adrian.sut...@ephox.com wrote:
 Given that there is a very limited set of video formats that are supported
 anyway, wouldn't it be reasonable to just identify or define the standard
 file extensions then work with server vendors to update their standard file
 extension to mime type definitions to include that.  While adoption and
 upgrading to the new versions would obviously take time, that applies to the
 video tag itself anyway and is just a temporary source of pain.

At first glance, my eyes almost popped out of my sockets when I saw
this suggestion. Using the file extension?! He must be mad!

Then I remembered that our Flash player *has* to use file extension
since the MIME type isn't available in Flash. Turns out that file
extension is a pretty good indicator, but it doesn't work for custom
server configurations where videos don't have extensions, ala YouTube.
For that reason, we allow users to override whatever we detect with a
type configuration parameter.

Ultimately, the question is, What are we trying to accomplish?

I think we're trying to make it easy for content creators to guarantee
that their content is available to all viewers regardless of their
browser.

If that's the case, I'd actually suggest that the browsers *strictly*
follow the MIME type, with the source type as an override, and
eliminating all sniffing (assuming that the file container format
contains the codec meta-data). If a publisher notices that their video
isn't working, they can either update their server's MIME type
mapping, or just hard code the type in the HTML. Neither is that time
consuming / difficult.
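
A minimal sketch of that resolution order, assuming the author-supplied type always wins over the server's Content-Type and that nothing is ever sniffed. The function names (media_type, effective_type) are hypothetical, and parameters such as codecs are simply dropped for brevity.

```python
def media_type(value):
    """Return the bare type/subtype of a MIME string, lowercased, or None."""
    if not value:
        return None
    bare = value.split(";", 1)[0].strip().lower()
    return bare or None


def effective_type(source_type, content_type):
    """Pick the type a strict, non-sniffing browser would act on.

    source_type:  the author's type hint from the markup, if any
    content_type: the Content-Type header sent by the server, if any
    """
    return media_type(source_type) or media_type(content_type)
```

With this rule a publisher whose server sends application/octet-stream can still get playback by declaring the type in the markup, and a resource with neither hint is simply not played.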

Moreover, as Adrian suggested, it's probably quite easy to get the big
HTTP servers (Apache, IIS, nginx, lighttpd) to add the new extensions
(if they haven't already), so this would gradually become less and
less of an issue.

Best,

Zach
--
Zachary Ozer
Developer, LongTail Video

w: longtailvideo.com • e: z...@longtailvideo.com • p: 212.244.0140 •
f: 212.656.1335
JW Player  |  Bits on the Run  |  AdSolution

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Eric Carlson


On Aug 31, 2010, at 4:01 PM, Ian Hickson wrote:

 On Tue, 31 Aug 2010, Eric Carlson wrote:
 On Aug 31, 2010, at 12:36 AM, Ian Hickson wrote:
 
 Safari does crazy things right now that we won't go into; for the 
 purposes of this discussion we'll assume Safari can change.
 
 What crazy things does Safari do that it should not?
 
 I forget the details, but IIRC one of the main problems was that it was 
 based on the URL's file extension exclusively.
 
  No, I don't see how you came to that conclusion. 

  QuickTime knows how to create a movie from a text file (to make it easy to 
create captions, chapters, etc), but it also assumes a file served as 
text/plain may be coming from a misconfigured server. Therefore, when it gets 
a file served as text/plain it first looks at the file content and/or the
file extension to see if it is a movie file. It opens it as text only if it 
doesn't look like a movie.
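
The fallback described above can be sketched roughly as follows. This is not QuickTime's actual logic: the extension list, the single content check (MP4 files usually start with an "ftyp" box, so bytes 4 to 8 read "ftyp"), and the function names are all assumptions chosen for illustration.

```python
import os
from urllib.parse import urlparse

# Assumed set of extensions treated as movies; purely illustrative.
MOVIE_EXTENSIONS = {".m4v", ".mp4", ".mov"}


def looks_like_mpeg4(head):
    """True if the leading bytes carry an 'ftyp' box type at offset 4."""
    return len(head) >= 8 and head[4:8] == b"ftyp"


def open_text_plain_as(url, head):
    """Decide how to open a resource that was served as text/plain.

    Returns "movie" if the content or the URL's extension looks like a
    movie, and "text" otherwise.
    """
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    if looks_like_mpeg4(head) or ext in MOVIE_EXTENSIONS:
        return "movie"
    return "text"
```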

  In your test page (http://hixie.ch/tests/adhoc/html/video/002.html), all four 
movies have correct extensions but are served as text/plain:

<!DOCTYPE HTML>
<title>text/plain video files</title>
<p> <video autoplay controls src=resources/text.txt></video>
<p> <video autoplay controls src=resources/text.webm></video>
<p> <video autoplay controls src=resources/text.m4v></video>
<p> <video autoplay controls src=resources/text.ogv></video>

  When the shipping version of Safari opens this page, it opens the MPEG-4 file
correctly and opens the other three as text (if you wait long enough),
because by default QuickTime doesn't know how to open the Ogg or WebM files. If
you add QuickTime importers for WebM and Ogg, those files will be opened as
movies instead of as text because of the file extensions, despite the fact
that they are served as text.

  FWIW, in nightly builds we are now configuring QuickTime so it won't ever 
open files it identifies as text.

eric

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Eric Carlson


On Sep 1, 2010, at 9:07 AM, Zachary Ozer wrote:

 On Wed, Sep 1, 2010 at 10:51 AM, Adrian Sutton adrian.sut...@ephox.com 
 wrote:
 Given that there is a very limited set of video formats that are supported
 anyway, wouldn't it be reasonable to just identify or define the standard
 file extensions then work with server vendors to update their standard file
 extension to mime type definitions to include that.  While adoption and
 upgrading to the new versions would obviously take time, that applies to the
 video tag itself anyway and is just a temporary source of pain.
 
 At first glance, my eyes almost popped out of my sockets when I saw
 this suggestion. Using the file extension?! He must be mad!
 
 Then I remembered that our Flash player *has* to use file extension
 since the MIME type isn't available in Flash. Turns out that file
 extension is a pretty good indicator, but it doesn't work for custom
 server configurations where videos don't have extensions, ala YouTube.
 For that reason, we allow users to override whatever we detect with a
 type configuration parameter.
 
 Ultimately, the question is, What are we trying to accomplish?
 
 I think we're trying to make it easy for content creators to guarantee
 that their content is available to all viewers regardless of their
 browser.
 
 If that's the case, I'd actually suggest that the browsers *strictly*
 follow the MIME type, with the source type as an override, and
 eliminating all sniffing (assuming that the file container format
 contains the codec meta-data). If a publisher notices that their video
 isn't working, they can either update their server's MIME type
 mapping, or just hard code the type in the HTML.
 

  Hard coding the type is only possible if the element uses a <source> element;
@type isn't allowed on <audio> or <video>.

 Neither is that time consuming / difficult.
 
  It isn't hard to update a server if you control it, but it can be *very* 
difficult and time consuming if you don't (as is the case with most web 
developers, I assume).


 Moreover, as Adrian suggested, it's probably quite easy to get the big
 HTTP servers (Apache, IIS, nginx, lighttpd) to add the new extensions
 (if they haven't already), so this would gradually become less and
 less of an issue.
 
  Really? Your company specializes in web video and flv files have been around 
for years, but your own server still isn't configured for it:

eric% curl -I http://content.longtailvideo.com/videos/flvplayer.flv
HTTP/1.1 200 OK
Server-Status: load=0
Content-Type: application/octet-stream
Accept-Ranges: bytes
ETag: 4288394655
Last-Modified: Wed, 23 Jun 2010 20:42:28 GMT
Content-Length: 2533148
Date: Wed, 01 Sep 2010 16:16:28 GMT
Server: bit_asic/3.8/r8s1-bitcast-b


eric

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Zachary Ozer

On Wed, Sep 1, 2010 at 12:29 PM, Eric Carlson eric.carl...@apple.com wrote:
   Hard coding the type is only possible if the element uses a <source>
 element; @type isn't allowed on <audio> or <video>.

Why isn't type allowed for video and audio? I know it doesn't
strictly make sense (since the tag doesn't have a type per-se), but
perhaps it could be an alias for the current item's type, much in the
same way src is the current source.

   It isn't hard to update a server if you control it, but it can be *very*
 difficult and time consuming if you don't (as is the case with most web
 developers, I assume).

Correct - but being able to manually specify type should be fine for
those situations, since that can be written into the HTML itself.

   Really? Your company specializes in web video and flv files have been
 around for years, but your own server still isn't configured for it:

Thanks for the heads up on this. However, I think this reemphasizes my
original point: The Flash platform *isn't* strict about MIME types, so
we've never bothered to do anything about it.

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Ian Hickson

On Wed, 1 Sep 2010, Julian Reschke wrote:
 On 01.09.2010 16:23, Philip Jägenstedt wrote:
  ...
  Huh, I guessed incorrectly, neither serving a PNG as text/plain or
  text/html makes it be sniffed and rendered in a top-level browsing
  context in Opera. However, both work in IE8.
 
 Please don't say "work" when talking about something that's not supposed
 to happen...

For the record, in the context of the WHATWG mailing list, saying "work"
here is fine. What's important is the user experience, not strict 
adherence to specifications.

In the case of the HTML spec, I'll change it to match what user agents 
implement. As mentioned earlier in the thread, for now I'm happy to give 
cover to Firefox and Opera (and hopefully Chrome and Safari) to more 
closely honour the Content-Type headers, but if the conclusion from
implementors is to follow Microsoft's route towards simply ignoring
Content-Type with <video> as we do with <img>, that's fine.

As far as sniffing for top-level browsing contexts goes, my understanding 
is that Adam is still working on the relevant spec, and it would not be a 
problem to add common video formats to that algorithm so that we can get 
interoperable handling of mislabeled content.

(Currently, text/html won't ever sniff as binary IIRC, but text/plain, in 
certain cases, will. We could also make text/html sniff as binary if it 
turns out that this would be particularly helpful for Web compat.)

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] Video with MIME type application/octet-stream


On 9/1/10 2:51 PM, Ian Hickson wrote:

(Currently, text/html won't ever sniff as binary IIRC, but text/plain, in
certain cases, will.


Will sniff as binary so as not to render as text but will NOT, last I 
checked, render as an image or whatnot (for good security reasons, imho).


-Boris

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Aryeh Gregor

On Tue, Aug 31, 2010 at 4:13 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 The issue would be someone linking to text or HTML or a binary blob that
 happens to have some bits at the beginning that look like an audio/video
 type and expecting them to be rendered respectively as text or HTML or be
 downloaded.

Is this realistically possible unless the author deliberately crafts
the file?  We're talking quite a few bytes that have to be exactly
right, no?  If the author does deliberately craft the file, is there
any security risk in displaying it unexpectedly, given that media
isn't scriptable?

 The big danger with sniffing, as always, is that the server will think one
 thing will happen and suddenly the browser will do something totally
 different.

As long as what the browser is doing is almost certain to be closer to
the author's/user's/webmaster's intent, that's not a problem.
Sniffing is a problem if you risk false positives or security issues,
but I can't see how that's an issue in this specific case.  We have a
lot of experience with the perils of sniffing -- have any issues ever
been caused by this kind of sniffing problem?  The only sniffing
problems I know of are when

1) The sniffing is unreliable, so false identifications happen by
accident.  They're common with MIME types too, but at least with MIME
they're more predictable.  This will hold for pretty much any text
format, if only because you might want to serve the file as text/plain
to mean "let the user view the source code instead of executing it".
But with binary formats it doesn't have to be plausible, if the string
you're sniffing for is reasonably long.

2) The MIME type is safe (e.g., not scriptable), and the type it's
sniffed as is not safe (e.g., it's HTML or JAR).  Then even if false
identifications are overwhelmingly improbable by accident, they'll
happen when people upload malicious files posing as an image or
whatever to get code to execute from a domain they don't control.

Are there clear problems that have arisen in other cases?

On Tue, Aug 31, 2010 at 8:59 PM, Andrew Scherkus scher...@chromium.org wrote:
 We use the incoming MIME type to determine whether we render the audio/video
 in the browser versus download.  We would never want to execute multimedia
 sniffing code in the trusted/browser process so implementing sniffing for a
 top level browser window would involve sending the bytes to a sandboxed
 process for inspection first.

Why can't you do media sniffing in the trusted process?  It must be a
lot simpler than parsing HTTP headers -- just a memcmp() or two per
format, if the format is designed so it can be sniffed well.
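
A minimal sketch of that kind of prefix comparison, written in Python rather than C. The GIF and PNG signatures are the standard ones and OggS followed by a zero byte is the check proposed elsewhere in this thread; labelling the four-byte EBML header as video/webm is an assumption, since the same header begins any Matroska file. The table and the function name sniff are illustrative only.

```python
# (signature, type) pairs; each check is the equivalent of one memcmp().
SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"OggS\x00", "application/ogg"),
    (b"\x1a\x45\xdf\xa3", "video/webm"),  # EBML header; assumed to mean WebM
]


def sniff(head):
    """Return the type whose signature matches the first bytes, else None."""
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    return None
```

Nothing here parses the container, so a file with leading padding before its first signature would not be recognised by this check.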

On Wed, Sep 1, 2010 at 12:27 AM, Gregory Maxwell gmaxw...@gmail.com wrote:
 Aggressive sniffing can and has resulted in some pretty nasty security bugs.

 E.g. an attacker crafts an input that a website identifies as video
 and permits the upload but which a browser sniffs out to be a java jar
 which can then access the source URL with the permissions of the user.

This is problem (2) above.  The solution is never to sniff for
scriptable content.  The problem can't plausibly arise with media
files -- if you can execute a vulnerability via getting the user to
view a media file, it's probably via arbitrary code execution.  In
that case you don't need to disguise yourself, just get the viewer to
go to your own website and do whatever you want, since there are no
same-domain restrictions.
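
The rule "never sniff for scriptable content" could be expressed as a guard like the one below. The set of scriptable types and the function name accept_sniffed_type are assumptions for illustration, not a list from any specification.

```python
# Assumed examples of types that can run script or code in the origin.
SCRIPTABLE = {
    "text/html",
    "application/xhtml+xml",
    "image/svg+xml",
    "application/java-archive",
}


def accept_sniffed_type(declared, sniffed):
    """Return the type to act on, refusing any sniffed scriptable type."""
    if sniffed is None or sniffed in SCRIPTABLE:
        # Sniffing found nothing, or found something dangerous: keep the label.
        return declared
    return sniffed
```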

 The sniffing rules, in some contexts and some browsers can also end up
 causing surprising failures... e.g. I've seen older versions of some
 sniffing heavy browsers automatically switch into UCS-2LE encoding at
 wrong and surprising times. Perhaps this is irrelevant in a video
 specific discussion of sniffing— but it is a hazard with sniffing in
 general.

Is this plausible in practice for common media formats?  I didn't find
info on sniffing media by quick Googling, but for instance, GIF starts
with GIF87a or GIF89a, and PNG has an eight-byte signature.
Random binary data is going to hit these one time in 2^48 or 2^64,
about 10^14 and 10^19 respectively.  The actual figure is likely to be
even lower, because most binary formats don't have arbitrary data in
their first few bytes.  Is this really something we should worry
about, given how obviously hard it is to get MIME types right?
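
The arithmetic behind those figures, as a small Python sketch. It assumes the leading bytes of a non-matching file are uniformly random, which the paragraph above already notes is pessimistic; the function name one_in is hypothetical.

```python
import math


def one_in(signature_bytes):
    """Number of equally likely prefixes for a signature of this many bytes."""
    return 2 ** (8 * signature_bytes)


gif_space = one_in(6)  # "GIF87a" / "GIF89a" are six bytes: 2**48 prefixes
png_space = one_in(8)  # the PNG signature is eight bytes: 2**64 prefixes

# Orders of magnitude, matching the 10^14 and 10^19 quoted above.
gif_magnitude = math.floor(math.log10(gif_space))
png_magnitude = math.floor(math.log10(png_space))
```

Since GIF has two valid six-byte signatures, the chance of an accidental GIF match under this assumption is 2 in 2^48 rather than 1 in 2^48, which does not change the order of magnitude.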

 Moreover, it'll never be consistent from implementation to
 implementation, which seems to me to be pretty antithetical to
 standardization in general.

The exact sniffing algorithm needs to be precisely specced.  In fact,
there's work under way to do that right now, for other types of
sniffing:

http://tools.ietf.org/html/draft-abarth-mime-sniff-05

There's no reason it can't be perfectly consistent.  The reason it's
historically been inconsistent is because specs have tried to claim
that no sniffing is allowed, so implementers had no spec to follow.
Which is what's in the HTML5 spec now, and it's a mistake.

On Wed, Sep 1, 2010 at 10:37

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-01 Thread Silvia Pfeiffer

On Thu, Sep 2, 2010 at 12:38 AM, Boris Zbarsky bzbar...@mit.edu wrote:

 On 9/1/10 9:13 AM, Brian Campbell wrote:

 It seems that periodically, web standards bodies decide this time, if
 we're strict, people will just get the content right or it won't work (such
 as XHTML with XML parsing rules), and invariably, people manage to screw it
 up anyhow. Sure, when the author tests their page the first time it's fine,
 but a mistaken lack of quoting in a comments field breaks the whole page.
 This causes people to migrate to the browsers or technologies that are less
 strict, and actually show the user what they want to see, rather than just
 breaking due to something out of the user's control.


 It hasn't actually happened for MIME types in toplevel documents (modulo the
 one known workaround for a common server issue with text/plain).  By and
 large, browsers don't sniff toplevel browsing contexts, and the one browser
 that does has been losing market share.


surely that's not the reason it's losing market share ;-)

S.

Re: [whatwg] Video with MIME type application/octet-stream


On 9/1/10 10:59 PM, Silvia Pfeiffer wrote:

It hasn't actually happened for MIME types in toplevel documents
(modulo the one known workaround for a common server issue with
text/plain).  By and large, browsers don't sniff toplevel browsing
contexts, and the one browser that does has been losing market share.

surely that's not the reason it's losing market share ;-)


My point is that the "if you don't sniff all your users will leave"
argument is overly simplistic.


-Boris

Re: [whatwg] Video with MIME type application/octet-stream