Re: [whatwg] How to determine content-type of file: protocol

2014-08-15 Thread 段垚

On 2014/8/14 21:23, Nils Dagsson Moskopp wrote:

duanyao duan...@ustc.edu writes:


On 07/28/2014 22:08, Gordon P. Hemsley wrote:

On 07/28/2014 08:01 AM, duanyao wrote:

On 07/28/2014 06:34, Gordon P. Hemsley wrote:

Sorry for the delay in responding. Your message fell through the
cracks in my e-mail filters.

On 07/17/2014 08:26 AM, duanyao wrote:

Hi,

My first question is about a rule in MIME Sniffing specification
(http://mimesniff.spec.whatwg.org):

 5.1 Interpreting the resource metadata
 ...
 If the resource is retrieved directly from the file system, set
supplied-type to the MIME type
 provided by the file system.

As far as I know, no main-stream file systems record MIME type for
files. Does the spec actually want to say provided by the operating
system or
provided by the file name extension?

Yeah, you've hit a known (though apparently unrecorded) bug in the
spec, originally pointed out to me by Boris Zbarsky via IRC many
months ago. The intent here is basically just whatever the computer
says it is—whether that be via the file system, the operating system,
or whatever, and whether it uses magic bytes, file extensions, or
whatever.

In other words, feel free to read that as the correct behavior is
undefined/unknown at this point.

Thanks for the explanation.

Recently, file: protocol becomes more and more important due to the
popularity of packaged web applications, including PhoneGap app, Chrome
app, Firefox OS app, Window 8 HTML app, etc (not all of them use file:
protocol directly, but underlying mechanisms are similar).
So If we can't specify a interoperable way to determine a local file's
mime type, porting of packaged web applications can be problematic in
some situations (actually my team already hit this).

I know that currently there is no standard way to determine a local
file's mime type, this may be one of the reason that mimesniff spec has
not defined a behavior here.

Well, the most basic reason is because I never delved into how it
actually works, because I was primarily concerned with HTTP connections.

It's possible that there is no interoperable way to determine a local
file's MIME type, but see below.


I'd like to propose a simple way to resolve this problem:
For mime types that has already been standardized by IANA and used in
web standards, determine a local file's supplied-type according to its
file extension.
This list could include htm, html, xhtml, xml, svg, css, js, jpeg, jpg,
png, mp4, webm, woff, etc. Otherwise, UAs can determine supplied-type by
any means.

I think this rule should resolve most of the interoperability problems,
and largely maintain compatibility with current UAs' implementations.

There is already a standard in place to detect file types on the
operating system level:

http://www.freedesktop.org/wiki/Specifications/shared-mime-info-spec/
http://cgit.freedesktop.org/xdg/shared-mime-info/

I could just refer to that and be done with it. Do you think that
would work? (That specification has complex rules for detecting files,
including magic bytes and whatnot, and is already used on a number of
Linux distros and probably other operating systems.)


Maybe no.
(1) it's a standard of *nix desktops, I doubt MS Windows will adopt it,

I see this as pure speculation.

MS Windows has never had a mechanism like freedesktop's.
It can only determine the MIME type from the filename extension, not from
file content; and the mapping between extension and type
is not even shipped with Windows itself -- it relies on installed
applications to register extensions and MIME types.
See
http://stackoverflow.com/questions/3442607/mime-types-in-the-windows-registry
.


Do you have any indication that Windows will change this in the near future?

and maybe it's a bit heavy for mobile OS;

Widely used mobile operating systems are based on Unix (e.g. iOS,
Android). Based on your measurements, how long does file(1) take?
Android does have a MIME type database and can guess the MIME type from both
extension and content, i.e.

 java.net.URLConnection.guessContentTypeFromName(String filename)
 java.net.URLConnection.guessContentTypeFromStream(java.io.InputStream in)

However, iOS doesn't have such a thing, and can only guess from the
extension. See

http://stackoverflow.com/questions/1363813/how-can-you-read-a-files-mime-type-in-objective-c

Not to mention Windows Phone.

Sniffing the MIME type from file content using a MIME type database is
always much slower than guessing from the extension,
because much more data has to be read from disk, and many more CPU
cycles are needed to analyze that data.

This is why web servers only guess types from extensions.

Also, because browsers have already implemented MIME type sniffing, it would
be wasteful to do it twice.
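
To make the cost difference concrete, here is a minimal sketch (not from the thread) of content-based detection. It has to open the file and read its leading bytes, whereas an extension lookup needs no I/O at all. The function names and the small signature table are hypothetical; a real database such as shared-mime-info has far more rules.

```python
from typing import Optional

# Leading byte signatures for a few well-known image formats.
# This tiny table is illustrative only, not a complete database.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_from_content(head: bytes) -> Optional[str]:
    """Return a MIME type if the leading bytes match a known signature."""
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    return None


def sniff_file(path: str, limit: int = 512) -> Optional[str]:
    """Content-based detection: requires opening and reading the file."""
    with open(path, "rb") as f:
        return sniff_from_content(f.read(limit))
```

Even this reduced version costs one open and one read per file, which is the overhead the paragraph above refers to.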


However, if most mobile OSs ship with a MIME type database in the future
and browsers are willing to use it, I'm OK with that.



(2) many packaged web apps are ported from (and share codes with) normal
web apps, and most web servers simply deduce mime 

Re: [whatwg] How to determine content-type of file: protocol

2014-08-14 Thread Nils Dagsson Moskopp
duanyao duan...@ustc.edu writes:

 On 07/28/2014 22:08, Gordon P. Hemsley wrote:
 On 07/28/2014 08:01 AM, duanyao wrote:
 On 07/28/2014 06:34, Gordon P. Hemsley wrote:
 Sorry for the delay in responding. Your message fell through the
 cracks in my e-mail filters.

 On 07/17/2014 08:26 AM, duanyao wrote:
 Hi,

 My first question is about a rule in MIME Sniffing specification
 (http://mimesniff.spec.whatwg.org):

 5.1 Interpreting the resource metadata
 ...
 If the resource is retrieved directly from the file system, set
 supplied-type to the MIME type
 provided by the file system.

 As far as I know, no main-stream file systems record MIME type for
 files. Does the spec actually want to say provided by the operating
 system or
 provided by the file name extension?

 Yeah, you've hit a known (though apparently unrecorded) bug in the
 spec, originally pointed out to me by Boris Zbarsky via IRC many
 months ago. The intent here is basically just whatever the computer
 says it is—whether that be via the file system, the operating system,
 or whatever, and whether it uses magic bytes, file extensions, or
 whatever.

 In other words, feel free to read that as the correct behavior is
 undefined/unknown at this point.
 Thanks for the explanation.

 Recently, file: protocol becomes more and more important due to the
 popularity of packaged web applications, including PhoneGap app, Chrome
 app, Firefox OS app, Window 8 HTML app, etc (not all of them use file:
 protocol directly, but underlying mechanisms are similar).
 So If we can't specify a interoperable way to determine a local file's
 mime type, porting of packaged web applications can be problematic in
 some situations (actually my team already hit this).

 I know that currently there is no standard way to determine a local
 file's mime type, this may be one of the reason that mimesniff spec has
 not defined a behavior here.

 Well, the most basic reason is because I never delved into how it 
 actually works, because I was primarily concerned with HTTP connections.

 It's possible that there is no interoperable way to determine a local 
 file's MIME type, but see below.

 I'd like to propose a simple way to resolve this problem:
 For mime types that has already been standardized by IANA and used in
 web standards, determine a local file's supplied-type according to its
 file extension.
 This list could include htm, html, xhtml, xml, svg, css, js, jpeg, jpg,
 png, mp4, webm, woff, etc. Otherwise, UAs can determine supplied-type by
 any means.

 I think this rule should resolve most of the interoperability problems,
 and largely maintain compatibility with current UAs' implementations.

 There is already a standard in place to detect file types on the 
 operating system level:

 http://www.freedesktop.org/wiki/Specifications/shared-mime-info-spec/
 http://cgit.freedesktop.org/xdg/shared-mime-info/

 I could just refer to that and be done with it. Do you think that 
 would work? (That specification has complex rules for detecting files, 
 including magic bytes and whatnot, and is already used on a number of 
 Linux distros and probably other operating systems.)

 Maybe no.
 (1) it's a standard of *nix desktops, I doubt MS Windows will adopt it,

I see this as pure speculation.

 and maybe it's a bit heavy for mobile OS;

Widely used mobile operating systems are based on Unix (e.g. iOS,
Android). Based on your measurements, how long does file(1) take?

 (2) many packaged web apps are ported from (and share codes with) normal 
 web apps, and most web servers simply deduce mime type from file extension,
 so doing the same thing in UAs probably results in better
 compatibility.

It may not be possible to deduce the media type from the file extension
alone, since there can be parameters to the media type like “charset” or
“codecs”, e.g. “text/html; charset=UTF-8” or “audio/ogg; codecs=vorbis”.
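
As an illustration of why parameters matter, here is a minimal sketch (my own, not from the thread) that splits a media type string into its type and its parameters. The function name is hypothetical, and the parsing is deliberately naive: it assumes no quoted parameter value contains a semicolon.

```python
from typing import Dict, Tuple


def parse_media_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split 'type/subtype; name=value' into (type/subtype, parameters)."""
    parts = [p.strip() for p in value.split(";")]
    essence = parts[0].lower()  # type/subtype is case-insensitive
    params: Dict[str, str] = {}
    for part in parts[1:]:
        if "=" not in part:
            continue  # skip malformed parameters
        name, _, val = part.partition("=")
        params[name.strip().lower()] = val.strip().strip('"')
    return essence, params
```

A file extension can at best supply the first element of that pair; the charset or codecs information has to come from somewhere else.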

 (3) UAs are already required to do mime type sniffing, which should be 
 enough to correct most wrong supplied-type.

Is this interoperable enough yet for the purpose at hand?

-- 
Nils Dagsson Moskopp // erlehmann
http://dieweltistgarnichtso.net


Re: [whatwg] How to determine content-type of file: protocol

2014-07-31 Thread duanyao

On 2014-07-31 02:02, Anne van Kesteren wrote:

On Tue, Jul 29, 2014 at 4:26 PM, 段垚 duan...@ustc.edu wrote:

On 2014/7/29 18:48, Anne van Kesteren wrote:

There's an enormous amount of tricky things to define around file
URLs, this being one of them.

Are there some resources on those tricky things?

No, not really. But it's a short list:

1) Parsing
2) Mapping a parsed file URL to an OS-specific filesystem
(case-sensitivity, case folding, ...)
3) Turning the resource into something that looks like a HTTP response

1 is for the URL Standard and would ideally be agnostic of OS. 2 and 3
would be for the Fetch Standard, if we were to define the details. I'm
hoping to get 1 done at least.
I feel that case handling is somewhat out of scope, because it is
OS-dependent, and even http URLs may break
when migrating between OSs with different case sensitivity.
What are the tricky parts of 3? I'm aware of content-type and status code.

I agree that file protocol is less important than http. However packaged web
applications (PhoneGap app, Chrome app, Firefox OS app, Window 8 HTML app,
etc) are increasing their popularity, and they are using file: protocol or
similar things to access their local assets. So I think it's worthwhile to
work on file
protocol to reduce porting issues of packaged web applications.

Well, "or similar" is important. Because those things are not really
similar at all but instead something that's actually portable across
systems and something we can reasonably standardize.
I don't think the URL schemes used by packaged web apps are much more
portable than file: for now.
Actually, they usually behave very similarly to file: on the
corresponding browsers.
For example, Firefox OS apps use the app: scheme, and XHR treats any file as
XML; Chrome apps
use the chrome-extension: scheme, and XHR deduces the MIME type from the file
extension, while the Content-Type
header is missing.

Also, some of these schemes are designed to be private and may not be
standardized.
In contrast, the file: scheme has been standardized to some extent. If we
could fully standardize file:
first, schemes like app: and chrome-extension: would probably mimic its
behavior.





Re: [whatwg] How to determine content-type of file: protocol

2014-07-30 Thread Anne van Kesteren
On Tue, Jul 29, 2014 at 4:26 PM, 段垚 duan...@ustc.edu wrote:
 On 2014/7/29 18:48, Anne van Kesteren wrote:
 There's an enormous amount of tricky things to define around file
 URLs, this being one of them.

 Are there some resources on those tricky things?

No, not really. But it's a short list:

1) Parsing
2) Mapping a parsed file URL to an OS-specific filesystem
(case-sensitivity, case folding, ...)
3) Turning the resource into something that looks like a HTTP response

1 is for the URL Standard and would ideally be agnostic of OS. 2 and 3
would be for the Fetch Standard, if we were to define the details. I'm
hoping to get 1 done at least.


 I agree that file protocol is less important than http. However packaged web
 applications (PhoneGap app, Chrome app, Firefox OS app, Window 8 HTML app,
 etc) are increasing their popularity, and they are using file: protocol or
 similar things to access their local assets. So I think it's worthwhile to
 work on file
 protocol to reduce porting issues of packaged web applications.

Well, "or similar" is important. Because those things are not really
similar at all but instead something that's actually portable across
systems and something we can reasonably standardize.


 Firefox developers said they won't change their implementation of XHR with
 file: before the spec explicitly define the behavior,
 so it looks like a chicken-egg problem to me.

I guess.


 Also I'd like to know some general principles of introducing new URL schemes
 (like file:) into web standards:
 (1) Should new URLs mimic http's behaviors as much as possible? Such as
 status codes, content-type, etc.
 (2) Should XHR and static resource fetching behave consistently with new
 URLs?
 As a web developer, my personal answers are all yes.

Sure.


-- 
http://annevankesteren.nl/


Re: [whatwg] How to determine content-type of file: protocol

2014-07-29 Thread Anne van Kesteren
On Thu, Jul 17, 2014 at 2:26 PM, duanyao duan...@ustc.edu wrote:
 I think rule 5.1 should be applied to both static fetching and XHR 
 consistently. Browsers should set Content-Type header to local files' actual 
 type for XHR, and interpret
 them accordingly. But firefox developers think this would break some existing 
 codes that already rely on firefox's behavior
 (see https://bugzilla.mozilla.org/show_bug.cgi?id=1037762).

 What do you think?

Basically, this comes down to what
http://fetch.spec.whatwg.org/#basic-fetch should do. For now,
unfortunate as it is, file and ftp URLs are left as an exercise for
the reader.

There's an enormous amount of tricky things to define around file
URLs, this being one of them. My theory to date has been that defining
those things has less benefit than defining other things, such as
parsing URLs or the way fetching works in general. If someone were to
sort the issues out and get implementations to converge I would
certainly not be opposed to including the result of such work in the
specification.


-- 
http://annevankesteren.nl/


Re: [whatwg] How to determine content-type of file: protocol

2014-07-29 Thread 段垚

On 2014/7/29 18:48, Anne van Kesteren wrote:

On Thu, Jul 17, 2014 at 2:26 PM, duanyao duan...@ustc.edu wrote:

I think rule 5.1 should be applied to both static fetching and XHR 
consistently. Browsers should set Content-Type header to local files' actual 
type for XHR, and interpret
them accordingly. But firefox developers think this would break some existing 
codes that already rely on firefox's behavior
(see https://bugzilla.mozilla.org/show_bug.cgi?id=1037762).

What do you think?

Basically, this comes down to what
http://fetch.spec.whatwg.org/#basic-fetch should do. For now,
unfortunate as it is, file and ftp URLs are left as an exercise for
the reader.

There's an enormous amount of tricky things to define around file
URLs, this being one of them.

Are there some resources on those tricky things?

My theory to date has been that defining
those things has less benefit than defining other things, such as
parsing URLs or the way fetching works in general.
I agree that the file protocol is less important than http. However, packaged
web applications (PhoneGap apps, Chrome apps, Firefox OS apps, Windows 8
HTML apps, etc.) are increasing in popularity, and they use the file:
protocol or similar things to access their local assets. So I think it's
worthwhile to work on the file
protocol to reduce porting issues for packaged web applications.

If someone were to
sort the issues out and get implementations to converge I would
certainly not be opposed to including the result of such work in the
specification.
Firefox developers said they won't change their implementation of XHR
with file: until the spec explicitly defines the behavior,
so it looks like a chicken-and-egg problem to me.

Also, I'd like to know some general principles for introducing new URL
schemes (like file:) into web standards:
(1) Should new URLs mimic http's behavior as much as possible, such as
status codes, content-type, etc.?
(2) Should XHR and static resource fetching behave consistently with new
URLs?

As a web developer, my personal answer to both is yes.
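
To show what "mimic http's behaviors" could look like for file:, here is a minimal sketch under my own assumptions; none of it is specified anywhere. The function name, the tiny type table, and the choice of 404 and 403 for missing and unreadable files are all hypothetical.

```python
import os
from typing import Dict, Tuple

# Illustrative extension table; a real UA would use a larger mapping.
TYPES = {
    ".htm": "text/html",
    ".html": "text/html",
    ".xhtml": "application/xhtml+xml",
    ".png": "image/png",
}


def file_response(path: str) -> Tuple[int, Dict[str, str], bytes]:
    """Wrap a local file in an HTTP-like (status, headers, body) triple."""
    if not os.path.isfile(path):
        return 404, {}, b""  # assumption: a missing file maps to 404
    try:
        with open(path, "rb") as f:
            body = f.read()
    except OSError:
        return 403, {}, b""  # assumption: an unreadable file maps to 403
    headers = {"Content-Length": str(len(body))}
    ext = os.path.splitext(path)[1].lower()
    if ext in TYPES:
        headers["Content-Type"] = TYPES[ext]
    return 200, headers, body
```

If both XHR and static fetching consumed a response built this way, they would see the same status and Content-Type for the same file.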

Regards,
Duan Yao.




Re: [whatwg] How to determine content-type of file: protocol

2014-07-28 Thread duanyao

On 07/28/2014 06:34, Gordon P. Hemsley wrote:
Sorry for the delay in responding. Your message fell through the 
cracks in my e-mail filters.


On 07/17/2014 08:26 AM, duanyao wrote:

Hi,

My first question is about a rule in MIME Sniffing specification 
(http://mimesniff.spec.whatwg.org):


5.1 Interpreting the resource metadata
...
If the resource is retrieved directly from the file system, set 
supplied-type to the MIME type

provided by the file system.

As far as I know, no main-stream file systems record MIME type for 
files. Does the spec actually want to say provided by the operating 
system or

provided by the file name extension?


Yeah, you've hit a known (though apparently unrecorded) bug in the 
spec, originally pointed out to me by Boris Zbarsky via IRC many 
months ago. The intent here is basically just whatever the computer 
says it is—whether that be via the file system, the operating system, 
or whatever, and whether it uses magic bytes, file extensions, or 
whatever.


In other words, feel free to read that as the correct behavior is 
undefined/unknown at this point.

Thanks for the explanation.

Recently, the file: protocol has become more and more important due to the
popularity of packaged web applications, including PhoneGap apps, Chrome
apps, Firefox OS apps, Windows 8 HTML apps, etc. (not all of them use the file:
protocol directly, but the underlying mechanisms are similar).
So if we can't specify an interoperable way to determine a local file's
MIME type, porting packaged web applications can be problematic in
some situations (my team has actually already hit this).


I know that there is currently no standard way to determine a local
file's MIME type; this may be one of the reasons that the mimesniff spec has
not defined a behavior here.


I'd like to propose a simple way to resolve this problem:
For MIME types that have already been standardized by IANA and are used in
web standards, determine a local file's supplied-type according to its
file extension.
This list could include htm, html, xhtml, xml, svg, css, js, jpeg, jpg,
png, mp4, webm, woff, etc. Otherwise, UAs can determine supplied-type by
any means.


I think this rule should resolve most of the interoperability problems, 
and largely maintain compatibility with current UAs' implementations.
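
A minimal sketch of the proposed rule, for illustration only. The extensions are the ones listed above; the type strings are my own picks, based on the currently registered names as I understand them, and the function name is hypothetical. Returning None stands for the "UAs can determine supplied-type by any means" fallback.

```python
import os
from typing import Optional

# Extension -> supplied-type for the extensions named in the proposal.
# The type strings are illustrative assumptions, not taken from the thread.
EXTENSION_TYPES = {
    "htm": "text/html",
    "html": "text/html",
    "xhtml": "application/xhtml+xml",
    "xml": "application/xml",
    "svg": "image/svg+xml",
    "css": "text/css",
    "js": "text/javascript",
    "jpeg": "image/jpeg",
    "jpg": "image/jpeg",
    "png": "image/png",
    "mp4": "video/mp4",
    "webm": "video/webm",
    "woff": "font/woff",
}


def supplied_type_from_extension(path: str) -> Optional[str]:
    """Return the supplied-type for a known extension, else None."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return EXTENSION_TYPES.get(ext)
```

The lookup is case-insensitive and touches only the file name, so it gives the same answer on any OS for the same path.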


My second question is: does above rule apply equally to both fetching 
static resources (top level, iframe, img, etc) and XMLHttpRequest?


It seems all browsers try to figure out actual type for local static 
resources, so that .htm and .xhtml files are rendered as HTML and 
XHTML respectively,

so far so good.

But when it comes to XHR, things are different.

Firefox (31) sets the Content-Type header to 'application/xml' for local
files of any type; if xhr.responseType = 'document' is set, the
response is parsed as XML;
and if xhr.responseType = 'blob' is set, blob.type is always
'application/xml'. This differs significantly from the static fetching
behavior.


Chromium (34) sets the Content-Type header to null for local files of any
type; but if xhr.responseType = 'document' is set, the response is
parsed according to its actual type,
i.e. .htm as HTML and .xhtml as XHTML; and if
xhr.responseType = 'blob' is set, blob.type is the file's actual type, i.e.
'text/html' for .htm and 'application/xhtml+xml'
for .xhtml. This is similar to the static fetching behavior; however, the
Content-Type header is missing.


I think rule 5.1 should be applied consistently to both static fetching
and XHR. Browsers should set the Content-Type header to a local file's
actual type for XHR, and interpret
the file accordingly. But Firefox developers think this would break some
existing code that already relies on Firefox's behavior
(see https://bugzilla.mozilla.org/show_bug.cgi?id=1037762).

What do you think?

Regards,
 Duan Yao.




Anne's the person to ask about XHR first, I think. I don't want to 
make any judgements or claims until I hear his view on the situation.


That being said, I created the Contexts wiki article [1] and began 
splitting up the mimesniff spec according to contexts [2] in an effort 
to clarify this situation and make sure that all bases were covered. 
It's still a work in progress, awaiting feedback from implementers and 
other spec writers.


I agree that there's a hole in how mimesniff, XHR, and Contexts 
intersect, and I'll be happy to update mimesniff to fill it, if that's 
determined to be the best course of action.


HTH,
Gordon

[1] http://wiki.whatwg.org/wiki/Contexts
[2] http://mimesniff.spec.whatwg.org/#context-specific-sniffing

I note that in the Contexts wiki article, the connection context (which
XHR belongs to) has no sniffing algorithm specified.
Does this mean UAs should not sniff in the case of XHR, or just that the
algorithm has not been specified yet?
Personally, I'd like the connection context to use the same algorithm as
the browsing context, because client JS code isn't always
sure about the MIME types sent via XHR, much like

Re: [whatwg] How to determine content-type of file: protocol

2014-07-28 Thread Gordon P. Hemsley

On 07/28/2014 08:01 AM, duanyao wrote:

On 07/28/2014 06:34, Gordon P. Hemsley wrote:

Sorry for the delay in responding. Your message fell through the
cracks in my e-mail filters.

On 07/17/2014 08:26 AM, duanyao wrote:

Hi,

My first question is about a rule in MIME Sniffing specification
(http://mimesniff.spec.whatwg.org):

5.1 Interpreting the resource metadata
...
If the resource is retrieved directly from the file system, set
supplied-type to the MIME type
provided by the file system.

As far as I know, no main-stream file systems record MIME type for
files. Does the spec actually want to say provided by the operating
system or
provided by the file name extension?


Yeah, you've hit a known (though apparently unrecorded) bug in the
spec, originally pointed out to me by Boris Zbarsky via IRC many
months ago. The intent here is basically just whatever the computer
says it is—whether that be via the file system, the operating system,
or whatever, and whether it uses magic bytes, file extensions, or
whatever.

In other words, feel free to read that as the correct behavior is
undefined/unknown at this point.

Thanks for the explanation.

Recently, file: protocol becomes more and more important due to the
popularity of packaged web applications, including PhoneGap app, Chrome
app, Firefox OS app, Window 8 HTML app, etc (not all of them use file:
protocol directly, but underlying mechanisms are similar).
So If we can't specify a interoperable way to determine a local file's
mime type, porting of packaged web applications can be problematic in
some situations (actually my team already hit this).

I know that currently there is no standard way to determine a local
file's mime type, this may be one of the reason that mimesniff spec has
not defined a behavior here.


Well, the most basic reason is because I never delved into how it 
actually works, because I was primarily concerned with HTTP connections.


It's possible that there is no interoperable way to determine a local 
file's MIME type, but see below.



I'd like to propose a simple way to resolve this problem:
For mime types that has already been standardized by IANA and used in
web standards, determine a local file's supplied-type according to its
file extension.
This list could include htm, html, xhtml, xml, svg, css, js, jpeg, jpg,
png, mp4, webm, woff, etc. Otherwise, UAs can determine supplied-type by
any means.

I think this rule should resolve most of the interoperability problems,
and largely maintain compatibility with current UAs' implementations.


There is already a standard in place to detect file types on the 
operating system level:


http://www.freedesktop.org/wiki/Specifications/shared-mime-info-spec/
http://cgit.freedesktop.org/xdg/shared-mime-info/

I could just refer to that and be done with it. Do you think that would 
work? (That specification has complex rules for detecting files, 
including magic bytes and whatnot, and is already used on a number of 
Linux distros and probably other operating systems.)



My second question is: does above rule apply equally to both fetching
static resources (top level, iframe, img, etc) and XMLHttpRequest?

It seems all browsers try to figure out actual type for local static
resources, so that .htm and .xhtml files are rendered as HTML and
XHTML respectively,
so far so good.

But when it comes to XHR, things are different.

Firefox(31) set Content-Type header to 'application/xml' for local
files of any type; and if setting xhr.responseType = 'document',
response is parsed as XML;
also if setting xhr.responseType = 'blob', blob.type is always
'application/xml'. This is significantly diverse from static fetching
behavior.

Chromium(34) set Content-Type header to null for local files of any
type; but if setting xhr.responseType = 'document', response is
parsed according to its actual type,
i.e. .htm as HTML and .xhtml as XHTML; and if setting
xhr.responseType = 'blob', blob.type is the file's actual type, i.e.
'text/html' for .htm and 'application/xhtml+xml'
for .xhtml. This is similar to static fetching behavior, however
Content-Type header is missing.

I think rule 5.1 should be applied to both static fetching and XHR
consistently. Browsers should set Content-Type header to local files'
actual type for XHR, and interpret
them accordingly. But firefox developers think this would break some
existing codes that already rely on firefox's behavior
(see https://bugzilla.mozilla.org/show_bug.cgi?id=1037762).

What do you think?

Regards,
 Duan Yao.




Anne's the person to ask about XHR first, I think. I don't want to
make any judgements or claims until I hear his view on the situation.

That being said, I created the Contexts wiki article [1] and began
splitting up the mimesniff spec according to contexts [2] in an effort
to clarify this situation and make sure that all bases were covered.
It's still a work in progress, awaiting feedback from implementers and
other spec writers.

I 

Re: [whatwg] How to determine content-type of file: protocol

2014-07-28 Thread duanyao

On 07/28/2014 22:08, Gordon P. Hemsley wrote:

On 07/28/2014 08:01 AM, duanyao wrote:

On 07/28/2014 06:34, Gordon P. Hemsley wrote:

Sorry for the delay in responding. Your message fell through the
cracks in my e-mail filters.

On 07/17/2014 08:26 AM, duanyao wrote:

Hi,

My first question is about a rule in MIME Sniffing specification
(http://mimesniff.spec.whatwg.org):

5.1 Interpreting the resource metadata
...
If the resource is retrieved directly from the file system, set
supplied-type to the MIME type
provided by the file system.

As far as I know, no main-stream file systems record MIME type for
files. Does the spec actually want to say provided by the operating
system or
provided by the file name extension?


Yeah, you've hit a known (though apparently unrecorded) bug in the
spec, originally pointed out to me by Boris Zbarsky via IRC many
months ago. The intent here is basically just whatever the computer
says it is—whether that be via the file system, the operating system,
or whatever, and whether it uses magic bytes, file extensions, or
whatever.

In other words, feel free to read that as the correct behavior is
undefined/unknown at this point.

Thanks for the explanation.

Recently, file: protocol becomes more and more important due to the
popularity of packaged web applications, including PhoneGap app, Chrome
app, Firefox OS app, Window 8 HTML app, etc (not all of them use file:
protocol directly, but underlying mechanisms are similar).
So If we can't specify a interoperable way to determine a local file's
mime type, porting of packaged web applications can be problematic in
some situations (actually my team already hit this).

I know that currently there is no standard way to determine a local
file's mime type, this may be one of the reason that mimesniff spec has
not defined a behavior here.


Well, the most basic reason is because I never delved into how it 
actually works, because I was primarily concerned with HTTP connections.


It's possible that there is no interoperable way to determine a local 
file's MIME type, but see below.



I'd like to propose a simple way to resolve this problem:
For mime types that has already been standardized by IANA and used in
web standards, determine a local file's supplied-type according to its
file extension.
This list could include htm, html, xhtml, xml, svg, css, js, jpeg, jpg,
png, mp4, webm, woff, etc. Otherwise, UAs can determine supplied-type by
any means.

I think this rule should resolve most of the interoperability problems,
and largely maintain compatibility with current UAs' implementations.


There is already a standard in place to detect file types on the 
operating system level:


http://www.freedesktop.org/wiki/Specifications/shared-mime-info-spec/
http://cgit.freedesktop.org/xdg/shared-mime-info/

I could just refer to that and be done with it. Do you think that 
would work? (That specification has complex rules for detecting files, 
including magic bytes and whatnot, and is already used on a number of 
Linux distros and probably other operating systems.)



Maybe not.
(1) It's a standard for *nix desktops; I doubt MS Windows will adopt it,
and it may be a bit heavy for mobile OSs;
(2) Many packaged web apps are ported from (and share code with) normal
web apps, and most web servers simply deduce the MIME type from the file extension,
so doing the same thing in UAs probably results in better compatibility.
(3) UAs are already required to do MIME type sniffing, which should be
enough to correct most wrong supplied-types.

My second question is: does above rule apply equally to both fetching
static resources (top level, iframe, img, etc) and XMLHttpRequest?

It seems all browsers try to figure out actual type for local static
resources, so that .htm and .xhtml files are rendered as HTML and
XHTML respectively,
so far so good.

But when it comes to XHR, things are different.

Firefox(31) set Content-Type header to 'application/xml' for local
files of any type; and if setting xhr.responseType = 'document',
response is parsed as XML;
also if setting xhr.responseType = 'blob', blob.type is always
'application/xml'. This is significantly diverse from static fetching
behavior.

Chromium(34) set Content-Type header to null for local files of any
type; but if setting xhr.responseType = 'document', response is
parsed according to its actual type,
i.e. .htm as HTML and .xhtml as XHTML; and if setting
xhr.responseType = 'blob', blob.type is the file's actual type, i.e.
'text/html' for .htm and 'application/xhtml+xml'
for .xhtml. This is similar to static fetching behavior, however
Content-Type header is missing.

I think rule 5.1 should be applied to both static fetching and XHR
consistently. Browsers should set Content-Type header to local files'
actual type for XHR, and interpret
them accordingly. But firefox developers think this would break some
existing codes that already rely on firefox's behavior
(see 

Re: [whatwg] How to determine content-type of file: protocol

2014-07-27 Thread Gordon P. Hemsley
Sorry for the delay in responding. Your message fell through the cracks 
in my e-mail filters.


On 07/17/2014 08:26 AM, duanyao wrote:

Hi,

My first question is about a rule in MIME Sniffing specification 
(http://mimesniff.spec.whatwg.org):

5.1 Interpreting the resource metadata
...
If the resource is retrieved directly from the file system, set 
supplied-type to the MIME type
provided by the file system.

As far as I know, no main-stream file systems record MIME type for files. Does the spec 
actually want to say provided by the operating system or
provided by the file name extension?


Yeah, you've hit a known (though apparently unrecorded) bug in the spec, 
originally pointed out to me by Boris Zbarsky via IRC many months ago. 
The intent here is basically just whatever the computer says it 
is—whether that be via the file system, the operating system, or 
whatever, and whether it uses magic bytes, file extensions, or whatever.


In other words, feel free to read that as the correct behavior is 
undefined/unknown at this point.



My second question is: does above rule apply equally to both fetching static 
resources (top level, iframe, img, etc) and XMLHttpRequest?

It seems all browsers try to figure out actual type for local static resources, 
so that .htm and .xhtml files are rendered as HTML and XHTML respectively,
so far so good.

But when it comes to XHR, things are different.

Firefox(31) set Content-Type header to 'application/xml' for local files of any 
type; and if setting xhr.responseType = 'document', response is parsed as XML;
also if setting xhr.responseType = 'blob', blob.type is always 
'application/xml'. This is significantly diverse from static fetching behavior.

Chromium(34) set Content-Type header to null for local files of any type; but 
if setting xhr.responseType = 'document', response is parsed according to its 
actual type,
i.e. .htm as HTML and .xhtml as XHTML; and if setting xhr.responseType = 
'blob', blob.type is the file's actual type, i.e. 'text/html' for .htm and 
'application/xhtml+xml'
for .xhtml. This is similar to static fetching behavior, however Content-Type 
header is missing.

I think rule 5.1 should be applied to both static fetching and XHR 
consistently. Browsers should set Content-Type header to local files' actual 
type for XHR, and interpret
them accordingly. But firefox developers think this would break some existing 
codes that already rely on firefox's behavior
(see https://bugzilla.mozilla.org/show_bug.cgi?id=1037762).

What do you think?

Regards,
 Duan Yao.




Anne's the person to ask about XHR first, I think. I don't want to make 
any judgements or claims until I hear his view on the situation.


That being said, I created the Contexts wiki article [1] and began 
splitting up the mimesniff spec according to contexts [2] in an effort 
to clarify this situation and make sure that all bases were covered. 
It's still a work in progress, awaiting feedback from implementers and 
other spec writers.


I agree that there's a hole in how mimesniff, XHR, and Contexts 
intersect, and I'll be happy to update mimesniff to fill it, if that's 
determined to be the best course of action.


HTH,
Gordon

[1] http://wiki.whatwg.org/wiki/Contexts
[2] http://mimesniff.spec.whatwg.org/#context-specific-sniffing

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/