Re: [whatwg] How to determine content-type of file: protocol

2014-07-28 Thread Gordon P. Hemsley

On 07/28/2014 08:01 AM, duanyao wrote:

On 07/28/2014 06:34, Gordon P. Hemsley wrote:

Sorry for the delay in responding. Your message fell through the
cracks in my e-mail filters.

On 07/17/2014 08:26 AM, duanyao wrote:

Hi,

My first question is about a rule in MIME Sniffing specification
(http://mimesniff.spec.whatwg.org):

5.1 Interpreting the resource metadata
...
If the resource is retrieved directly from the file system, set
supplied-type to the MIME type
provided by the file system.

As far as I know, no main-stream file systems record MIME type for
files. Does the spec actually want to say "provided by the operating
system" or
"provided by the file name extension"?


Yeah, you've hit a known (though apparently unrecorded) bug in the
spec, originally pointed out to me by Boris Zbarsky via IRC many
months ago. The intent here is basically just "whatever the computer
says it is"—whether that be via the file system, the operating system,
or whatever, and whether it uses magic bytes, file extensions, or
whatever.

In other words, feel free to read that as "the correct behavior is
undefined/unknown" at this point.

Thanks for the explanation.

Recently, the file: protocol has become more and more important due to the
popularity of packaged web applications, including PhoneGap apps, Chrome
apps, Firefox OS apps, Windows 8 HTML apps, etc. (not all of them use the
file: protocol directly, but the underlying mechanisms are similar).
So if we can't specify an interoperable way to determine a local file's
MIME type, porting packaged web applications can be problematic in
some situations (my team has actually already hit this).

I know that there is currently no standard way to determine a local
file's MIME type; this may be one of the reasons that the mimesniff spec
has not defined a behavior here.


Well, the most basic reason is because I never delved into how it 
actually works, because I was primarily concerned with HTTP connections.


It's possible that there is no interoperable way to determine a local 
file's MIME type, but see below.



I'd like to propose a simple way to resolve this problem:
For MIME types that have already been standardized by IANA and are used in
web standards, determine a local file's supplied-type according to its
file extension.
This list could include htm, html, xhtml, xml, svg, css, js, jpeg, jpg,
png, mp4, webm, woff, etc. Otherwise, UAs can determine the supplied-type
by any means.

I think this rule should resolve most of the interoperability problems,
and largely maintain compatibility with current UAs' implementations.
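
To make the proposal above concrete, here is a minimal sketch in Python of
an extension-based lookup. The table name, the function name, and the
fallback hook are hypothetical, and the MIME strings are present-day
registrations that I am assuming; the thread itself only lists the
extensions.

```python
import os

# Hypothetical extension table. The MIME strings are assumptions based on
# current registrations; the proposal only names the extensions.
EXTENSION_TYPES = {
    "htm": "text/html",
    "html": "text/html",
    "xhtml": "application/xhtml+xml",
    "xml": "application/xml",
    "svg": "image/svg+xml",
    "css": "text/css",
    "js": "text/javascript",
    "jpeg": "image/jpeg",
    "jpg": "image/jpeg",
    "png": "image/png",
    "mp4": "video/mp4",
    "webm": "video/webm",
    "woff": "font/woff",
}


def supplied_type_for(path, fallback=None):
    """Return a supplied-type for a local file based on its extension.

    `fallback` is a hypothetical callable standing in for the "any means"
    a UA may use when the extension is not in the table.
    """
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext in EXTENSION_TYPES:
        return EXTENSION_TYPES[ext]
    return fallback(path) if fallback else None
```

The lookup is case-insensitive on the extension, which is a design choice
of the sketch rather than something the proposal states.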


There is already a "standard" in place to detect file types on the 
operating system level:


http://www.freedesktop.org/wiki/Specifications/shared-mime-info-spec/
http://cgit.freedesktop.org/xdg/shared-mime-info/

I could just refer to that and be done with it. Do you think that would 
work? (That specification has complex rules for detecting files, 
including magic bytes and whatnot, and is already used on a number of 
Linux distros and probably other operating systems.)



My second question is: does the above rule apply equally to both fetching
static resources (top level, iframe, img, etc.) and XMLHttpRequest?

It seems all browsers try to figure out actual type for local static
resources, so that .htm and .xhtml files are rendered as HTML and
XHTML respectively,
so far so good.

But when it comes to XHR, things are different.

Firefox (31) sets the Content-Type header to 'application/xml' for local
files of any type; and if xhr.responseType = 'document' is set, the
response is parsed as XML;
also, if xhr.responseType = 'blob' is set, blob.type is always
'application/xml'. This differs significantly from the static fetching
behavior.

Chromium (34) sets the Content-Type header to null for local files of any
type; but if xhr.responseType = 'document' is set, the response is
parsed according to its actual type,
i.e. .htm as HTML and .xhtml as XHTML; and if
xhr.responseType = 'blob' is set, blob.type is the file's actual type, i.e.
'text/html' for .htm and 'application/xhtml+xml'
for .xhtml. This is similar to the static fetching behavior; however, the
Content-Type header is missing.

I think rule 5.1 should be applied to both static fetching and XHR
consistently. Browsers should set the Content-Type header to a local
file's actual type for XHR, and interpret
it accordingly. But Firefox developers think this would break some
existing code that already relies on Firefox's behavior
(see https://bugzilla.mozilla.org/show_bug.cgi?id=1037762).

What do you think?

Regards,
 Duan Yao.




Anne's the person to ask about XHR first, I think. I don't want to
make any judgements or claims until I hear his view on the situation.

That being said, I created the Contexts wiki article [1] and began
splitting up the mime

Re: [whatwg] How to determine content-type of file: protocol

2014-07-27 Thread Gordon P. Hemsley
Sorry for the delay in responding. Your message fell through the cracks 
in my e-mail filters.


On 07/17/2014 08:26 AM, duanyao wrote:

Hi,

My first question is about a rule in MIME Sniffing specification 
(http://mimesniff.spec.whatwg.org):

5.1 Interpreting the resource metadata
...
If the resource is retrieved directly from the file system, set 
supplied-type to the MIME type
provided by the file system.

As far as I know, no main-stream file systems record MIME type for files. Does the spec 
actually want to say "provided by the operating system" or
"provided by the file name extension"?


Yeah, you've hit a known (though apparently unrecorded) bug in the spec, 
originally pointed out to me by Boris Zbarsky via IRC many months ago. 
The intent here is basically just "whatever the computer says it 
is"—whether that be via the file system, the operating system, or 
whatever, and whether it uses magic bytes, file extensions, or whatever.


In other words, feel free to read that as "the correct behavior is 
undefined/unknown" at this point.



My second question is: does the above rule apply equally to both fetching static 
resources (top level, iframe, img, etc.) and XMLHttpRequest?

It seems all browsers try to figure out actual type for local static resources, 
so that .htm and .xhtml files are rendered as HTML and XHTML respectively,
so far so good.

But when it comes to XHR, things are different.

Firefox (31) sets the Content-Type header to 'application/xml' for local files 
of any type; and if xhr.responseType = 'document' is set, the response is 
parsed as XML; also, if xhr.responseType = 'blob' is set, blob.type is always 
'application/xml'. This differs significantly from the static fetching behavior.

Chromium (34) sets the Content-Type header to null for local files of any type; 
but if xhr.responseType = 'document' is set, the response is parsed according 
to its actual type,
i.e. .htm as HTML and .xhtml as XHTML; and if xhr.responseType = 'blob' is set, 
blob.type is the file's actual type, i.e. 'text/html' for .htm and 
'application/xhtml+xml'
for .xhtml. This is similar to the static fetching behavior; however, the 
Content-Type header is missing.

I think rule 5.1 should be applied to both static fetching and XHR 
consistently. Browsers should set the Content-Type header to a local file's 
actual type for XHR, and interpret
it accordingly. But Firefox developers think this would break some existing 
code that already relies on Firefox's behavior
(see https://bugzilla.mozilla.org/show_bug.cgi?id=1037762).

What do you think?

Regards,
 Duan Yao.




Anne's the person to ask about XHR first, I think. I don't want to make 
any judgements or claims until I hear his view on the situation.


That being said, I created the Contexts wiki article [1] and began 
splitting up the mimesniff spec according to contexts [2] in an effort 
to clarify this situation and make sure that all bases were covered. 
It's still a work in progress, awaiting feedback from implementers and 
other spec writers.


I agree that there's a hole in how mimesniff, XHR, and Contexts 
intersect, and I'll be happy to update mimesniff to fill it, if that's 
determined to be the best course of action.


HTH,
Gordon

[1] http://wiki.whatwg.org/wiki/Contexts
[2] http://mimesniff.spec.whatwg.org/#context-specific-sniffing

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/


Re: [whatwg] [mimesniff] The Apache workaround should not sniff random types

2014-01-16 Thread Gordon P. Hemsley

On 08/27/2013 12:26 PM, Boris Zbarsky wrote:

The current mimesniff spec says that when the Apache workaround is
applied sniffing should still be able to detect the content as
PostScript, images, videos, archives, audio formats, etc.

I feel that this poses an unacceptable security risk due to allowing
content through firewalls that is then interpreted differently by a UA.
  In particular, postscript and media formats can be used to attack
viewers and decoders.

Web compat does not require this behavior: Gecko only allows
"text/plain" and "application/octet-stream" as output types when the
Apache workaround is being applied, and we have been successfully
shipping this for a while.  I would strongly oppose changing the Gecko
behavior here due to the security implications.

Given the security risks and the lack of web compat issues, I believe
the spec should not require the behavior it currently requires.

-Boris


I have finally made this change. Please confirm that this is what you 
had in mind:


https://github.com/whatwg/mimesniff/commit/d7bafc16ee480a5dea4c27d60dd5272388e022ce

http://mimesniff.spec.whatwg.org/#rules-for-text-or-binary
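
For readers following along, here is a rough Python sketch of the
restricted behavior Boris describes, where the only possible outputs are
"text/plain" and "application/octet-stream". The function name and the
exact set of "binary data bytes" are my assumptions (the set is my reading
of the spec), so treat this as an illustration and not as the text of the
commit.

```python
# Assumed "binary data bytes": C0 control bytes other than TAB, LF, FF,
# CR and ESC. This set is my reading of the spec, not quoted from it.
BINARY_DATA_BYTES = frozenset(
    list(range(0x00, 0x09)) + [0x0B] + list(range(0x0E, 0x1B)) + list(range(0x1C, 0x20))
)

# UTF-16BE, UTF-16LE and UTF-8 byte order marks.
BOMS = (b"\xfe\xff", b"\xff\xfe", b"\xef\xbb\xbf")


def sniff_text_or_binary(header: bytes) -> str:
    """Classify a resource header as text or binary, and nothing else."""
    if header.startswith(BOMS):
        return "text/plain"
    if any(byte in BINARY_DATA_BYTES for byte in header):
        return "application/octet-stream"
    return "text/plain"
```

Note that a PostScript signature such as b"%!PS-Adobe-" comes back as
"text/plain" here and is never reported as PostScript, which is the point
of the change.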

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/


Re: [whatwg] [mimesniff] The Apache workaround should not sniff random types

2013-11-16 Thread Gordon P. Hemsley

On 8/27/13 12:26 PM, Boris Zbarsky wrote:

The current mimesniff spec says that when the Apache workaround is
applied sniffing should still be able to detect the content as
PostScript, images, videos, archives, audio formats, etc.

I feel that this poses an unacceptable security risk due to allowing
content through firewalls that is then interpreted differently by a UA.
  In particular, postscript and media formats can be used to attack
viewers and decoders.

Web compat does not require this behavior: Gecko only allows
"text/plain" and "application/octet-stream" as output types when the
Apache workaround is being applied, and we have been successfully
shipping this for a while.  I would strongly oppose changing the Gecko
behavior here due to the security implications.

Given the security risks and the lack of web compat issues, I believe
the spec should not require the behavior it currently requires.

-Boris


I'm inclined to agree.

Having heard no objection (or, indeed, any discussion whatsoever) in the 
last 3 months, I plan to move ahead with this proposed change.


Anyone else have anything to say before I do?

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Gordon P. Hemsley

On 8/28/13 9:32 AM, Anne van Kesteren wrote:

We have thought of three approaches for zip URL design thus far:

* Using a sub-scheme (zip) with a zip-path (after !):
zip:http://www.example.org/zip!image.gif
* Introducing a zip-path (after %!): http://www.example.org/zip%!image.gif
* Using media fragments: http://www.example.org/zip#path=image.gif

High-level drawbacks:

* Sub-scheme: requires changing the URL syntax with both sub-scheme
and zip-path.
* Zip-path: requires changing the URL syntax.
* Fragments: fail to work well for URLs relative to a zip archive.

Fragments are conceptually the cleanest as the only part of a URL
that's supposed to depend on the Content-Type is the fragment.
However, if you want to link to an ID inside an HTML resource you'd
have to do #path=test.html&id=test which would require adding
knowledge to the HTML resource that it is contained in a zip archive
and have special processing based on that. And not just HTML, same
goes for CSS or JavaScript.

I'm not sure we need to consider sub-scheme if zip-path can work as
it's more complex and not very well thought out. E.g. imagine
view-source:zip:http://www.example.org/zip!test.html. (I hope we never
need to standardize view-source and that it can be restricted to the
address bar in browsers.)

zip-path makes zip archive packaging by far the easiest. If we use %!
as separator that would cause a network error in some existing
browsers (due to an illegal %), which means it's extensible there,
though not backwards compatible.

We'd adjust the URL parser to build a zip-path once %! is encountered.
And relative URLs would first look if there's a zip-path and work
against that, and use path otherwise.

Fetching would always use the path. If there's a zip-path and the
returned resource is not a zip archive it would cause a network error.

As for nested zip archives: Andrea suggested we should support this,
but that would require zip-path to be a sequence of paths. I think we
never want to allow relative URLs to escape the top-most zip archive.
But I suppose we could support it in a way that

   %!test.zip!test.html

goes one level deeper. And "../image.gif" in test.html looks in the
enclosing zip. And "../../image.gif" in test.html looks in the
enclosing zip as well because it cannot ever be relative to the path,
only the zip-path.
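
As an illustration of the zip-path idea above, here is a small Python
sketch that splits a URL at the first "%!" and treats further "!"
characters as separators for nested archives. The function name is
hypothetical, and relative-URL resolution is deliberately left out; this
only shows the splitting step under those assumptions.

```python
def split_zip_url(url: str):
    """Split a URL into (URL to fetch, list of zip-path segments).

    Assumes "%!" introduces the zip-path and "!" separates nested
    archive levels, as in the examples quoted above.
    """
    fetch_url, separator, zip_path = url.partition("%!")
    if not separator:
        # No zip-path present: the whole string is an ordinary URL.
        return url, []
    return fetch_url, zip_path.split("!")
```

For "http://www.example.org/zip%!test.zip!test.html" this yields the URL
to fetch plus the two segments "test.zip" and "test.html".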



As the following URLs suggest, the %! (or %-anything) will likely not 
work for ZIP files generated by a script using the query portion of the 
URL, as the path information will be subsumed into the last value 
without causing a network error:


http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1%!example.png
http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1%/example.png
http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1?example.png

(And feel free to use that script to try out any other combos.)
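
The same effect can be reproduced without my test script. The sketch
below uses Python's standard urllib.parse as a stand-in for a server-side
query parser (an assumption; the script above is PHP and may differ in
details). The helper name is made up for the example.

```python
from urllib.parse import parse_qs, urlsplit


def query_values(url: str) -> dict:
    """Return the query parameters of a URL as a dict of value lists."""
    return parse_qs(urlsplit(url).query)


# The "%!example.png" part is not split off; it ends up inside the
# value of the last parameter, "spacer".
values = query_values(
    "http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1%!example.png"
)
print(values["spacer"])
```

With this parser the invalid percent sequence is passed through untouched,
so nothing signals an error to the script.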

However, since fragments (i.e. anything beginning with '#') are already 
not sent to the server, what if you modified the URL parser to use a 
special hash-prefix combo that indicates the path? Then you could avoid 
the problem of having to make documents aware of the fact that they're 
in a ZIP because the hash-prefix combo would come before the plain hash 
which holds the ID.


So, for example:

http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1#/example.html#middle

Then you could also take the opportunity to spec the #! prefix (and 
other hash-combo prefixes) that is used by a lot of sites nowadays.
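
To sketch what I mean, assuming "#/" is the hash-prefix combo (that choice
is only taken from my example URL, and the function name is hypothetical),
the splitting could look like this in Python:

```python
def split_archive_fragment(url: str):
    """Split a URL into (base URL, path inside the archive, element ID).

    Assumes a fragment starting with "/" names a path inside the ZIP,
    and that a second "#" introduces the ordinary ID fragment.
    """
    base, _, fragment = url.partition("#")
    if not fragment.startswith("/"):
        # Ordinary fragment (or none): no in-archive path.
        return base, None, fragment or None
    inner_path, _, element_id = fragment[1:].partition("#")
    return base, inner_path, element_id or None
```

Nothing after the first "#" is sent to the server, so the script
generating the ZIP never sees the in-archive path or the ID.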


--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/


Re: [whatwg] [mimesniff] More issues on the MIME Sniffing spec

2013-06-06 Thread Gordon P. Hemsley
On Thu, Jun 6, 2013 at 5:42 AM, Peter Occil  wrote:
> I want to respond to the following issues in the MIME Sniffing spec:
>
> Resources
>
> I suggest the following wording for the issue box starting with "A resource
> is..."
>
>A resource is a data item or message, such as a file or an HTTP response.
>
> I believe this covers the cases that would normally be associated with a
> MIME type.

I already have an idea about how to define "resource".

The reason it's not currently in the spec is because I recall Hixie
expressing some concern about complexity beyond "bag of bits" and I'm
waiting on feedback from him.

> Contexts
>
> I don't think the word "context" needs to be specially defined.  The start
> of section 8
> could be rewritten to remove the definition:
>
> [[
> In certain cases, it is only useful to identify resources that belong to a
> certain subset of MIME types. In these cases, it is appropriate to use a
> context-specific sniffing algorithm in place of the MIME type sniffing
> algorithm in order to determine the sniffed MIME type of a resource.
>
> This specification defines the following context-specific sniffing
> algorithms.
> ]]

On the contrary, I think it may be important to define "context", as
it is the only lens through which to see fetching and sniffing and the
like.

Currently, the HTML spec only defines "(nested) browsing context", so
I put together a wiki page that lists all the other ones that exist
implicitly:

http://wiki.whatwg.org/wiki/Contexts

I plan to rewrite the whole second half of the spec to be in terms of
contexts soon.

> Apache Bug
>
> As for the Apache bug flag, would it be useful to additionally check the
> HTTP
> headers for a Server header and check if it contains "Apache/"?  I don't
> know which
> version of Apache the bug involved was fixed in, so I can't suggest a more
> accurate
> string check.

That thought had crossed my mind, but the handling of the situation
mostly predates my editing of the spec, so I haven't given much
thought to whether the current method is the ideal one.

> MP3 Sniffing
>
> Finally, the Firefox team has recently included a patch to support sniffing
> MP3
> files better [1] and would like to document it and add it to the MIME
> Sniffing
> spec. [2]  The disadvantage, though, is that more than 512 bytes
> are required for an accurate detection.
>
> --Peter
>
> [1]: https://bugzilla.mozilla.org/show_bug.cgi?id=862088
> [2]: https://bugzilla.mozilla.org/show_bug.cgi?id=879429
>

I'm aware of this. I was told that a proposal would be made in due
course, so I'm waiting on that.

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Review request: Parsing a MIME type

2013-06-01 Thread Gordon P. Hemsley
(Re-added the list; I hope that's OK.)

The canPlayType method (and similar mechanisms) are only
approximations of what the browser can support. The "codecs" parameter
is generally not strictly necessary when the UA goes to actually play
the file: if the "codecs" parameter is missing, it can generally be
recovered by parsing/processing the file. Thus, it is not an
especially reliable testing method.

On Sat, Jun 1, 2013 at 8:17 PM, Peter Occil  wrote:
>> However, in order to test parameters, I have been
>> using 'charset' (because that's the only one I'm aware of that has a
>> Web-visible effect), and certain implementations may be sniffing
>> specifically for the string "charset=", which would cloud the results
>> of my testing.
>>
>
> There are other parameters that are significant in MIME types, such as
> "codecs",
> which is used in certain newer HTML5 APIs. For example, some very
> recent browsers support the canPlayType method of the  element,
> which takes a MIME type as a parameter (though it doesn't work well in OS X
> versions of
> Firefox 21, apparently [1]).  The parameters, especially the "codecs"
> parameter,
> can make a difference in what value is returned by the API.
>
> [1]: https://bugzilla.mozilla.org/show_bug.cgi?id=875385
>



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Review request: Parsing a MIME type

2013-06-01 Thread Gordon P. Hemsley
On Sat, Jun 1, 2013 at 11:41 AM, Gordon P. Hemsley  wrote:
> On Fri, May 31, 2013 at 11:50 PM, Peter Occil  wrote:
>> * The word "base64" can only appear at the end of the MIME type, so that a
>> data URL like
>>   "data:application/example;base64;foo=bar,AA==" will not be encoded in
>> base64, strictly speaking. A parameter name (base64 or otherwise)
>>   cannot otherwise appear without a parameter value.
>
> As I mentioned, "strictly speaking" doesn't matter, as all browsers do
> the same thing, according to the resource you linked: base64
> parameters with values are fine; base64 boolean parameters in other
> than last place are warnings. (Not sure what the reasoning behind that
> distinction is, but that's what reality is.)

It seems I read the purpose of the test wrong for base64 parameters
with values: They're fine insofar as they're allowed, but they don't
trigger base64 decoding (except in Safari?), unlike if the boolean
base64 parameter is in a non-last position.

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Review request: Parsing a MIME type

2013-06-01 Thread Gordon P. Hemsley
On Fri, May 31, 2013 at 11:50 PM, Peter Occil  wrote:
>
>> * Another important point to notice is the fact that this algorithm
>> allows parameter names to appear without values. This is useful in
>> situations such as the "base64" option in data: URLs that use the mere
>> presence or absence of a parameter to set its boolean value.
>
>
> Since you mention data URLs I should note that data URLs can be percent
> encoded, which HTTP
> and MIME headers can't be. This raises additional considerations when
> parsing a data URL's MIME type correctly;
> see reference [1] for test cases.  In particular:
>
> [1]: http://greenbytes.de/tech/tc/datauri/

This is a very useful resource; thank you for pointing it out to me.

I realize now that that's the only thing that matters: What do the browsers do?

(And percent encoding doesn't matter, as that gets handled before the
parsing begins.)

> * A data URL that begins with "data:," or "data:;base64," (with no MIME
> type) is assumed to have the MIME type
>  "text/plain;charset=us-ascii" under RFC2397.
> * A data URL that begins with  "data:;" (with no type or subtype, but with
> parameters) is assumed to have the MIME type
>  "text/plain" under RFC2397.

An empty or invalid MIME type will get treated as unknown and will
eventually be sniffed (if it isn't already). I'll have to consider
what to do with the base64 and other parameters parts, though.

> * The word "base64" can only appear at the end of the MIME type, so that a
> data URL like
>   "data:application/example;base64;foo=bar,AA==" will not be encoded in
> base64, strictly speaking. A parameter name (base64 or otherwise)
>   cannot otherwise appear without a parameter value.

As I mentioned, "strictly speaking" doesn't matter, as all browsers do
the same thing, according to the resource you linked: base64
parameters with values are fine; base64 boolean parameters in other
than last place are warnings. (Not sure what the reasoning behind that
distinction is, but that's what reality is.)

So it seems the only issue I have to worry about is what to do with
MIME types which only have parameters.
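
For reference, here is a Python sketch of the strict RFC 2397 reading that
Peter describes, where "base64" only counts as the final token before the
comma and a missing type defaults to text/plain. This is not what browsers
do (see above), the function name is hypothetical, and the sketch ignores
percent-encoding, letter case, and commas inside quoted parameter values.

```python
def describe_data_url(url: str):
    """Return (media type, is_base64) for a data: URL, read strictly."""
    if not url.startswith("data:"):
        raise ValueError("not a data: URL")
    header, comma, _data = url[len("data:"):].partition(",")
    if not comma:
        raise ValueError("missing comma before the data")
    # Strictly, ";base64" is only recognized as the last token.
    is_base64 = header.endswith(";base64")
    if is_base64:
        header = header[: -len(";base64")]
    if header == "":
        media_type = "text/plain;charset=US-ASCII"
    elif header.startswith(";"):
        # Parameters without a type/subtype: text/plain is implied.
        media_type = "text/plain" + header
    else:
        media_type = header
    return media_type, is_base64
```

Under this reading, "data:application/example;base64;foo=bar,AA==" is not
treated as base64, which matches Peter's point.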

Regards,
Gordon

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


[whatwg] [mimesniff] Review request: Parsing a MIME type

2013-05-31 Thread Gordon P. Hemsley
Hello all,

This is a request seeking feedback and review on the MIME Sniffing
algorithm to "parse a MIME type":

http://mimesniff.spec.whatwg.org/#parse-a-mime-type

After numerous iterations, I think it is in a state that accurately
reflects the best current practices for interoperability.

As is common with such things, there are numerous points in this
algorithm where implementations do not agree. In general, Firefox and
Chrome tend to pattern together, as do IE and Opera. Safari often
patterns on its own, in favor of a more literal interpretation of the
various RFCs on the matter.

At times, I have had to make a decision as to which was the best
approach. This usually results in half of the implementations being in
violation of the spec; I hope, in those instances, the implementations
in question can be updated to become interoperable with the rest.

With that being said, there are two specific points I want to raise:

(1) The more recent RFCs on the matter restrict type, subtype, and
parameter names to 127 characters. No implementation actually enforces
this limit, but I have included it in the algorithm (relevant points
appear in red) because I think it would be better and safer for both
the user and the user agent to do so.

(2) Based on my analysis of existing implementations, anything that
occurs between the semicolon (and any first whitespace) and the equals
sign is treated as the parameter name, including any whitespace before
the equals sign. However, in order to test parameters, I have been
using 'charset' (because that's the only one I'm aware of that has a
Web-visible effect), and certain implementations may be sniffing
specifically for the string "charset=", which would cloud the results
of my testing. Any enlightenment into this issue would be much
appreciated.

I also have a few general points:

* You may notice in the algorithm that I am using hybrid terminology,
sometimes talking about bytes and sometimes talking about characters.
This is mostly because I haven't decided/determined whether to treat a
MIME type as ASCII or as UTF-8. I think there are arguments on both
sides of the issue, but I'm eager to hear your opinions and advice
(especially about how I might phrase the algorithm if it were written
in terms of characters instead of bytes).

* One of the most controversial parts of this algorithm might be the
issue of what to do when a parameter appears more than once. (The RFCs
suggest that the MIME type should be treated as invalid in such a
case, but no implementation actually treats it that way.) I have opted
to make a later appearance of a parameter override and replace an
earlier appearance of a parameter. Modulo caveat (2) above, this is
only done in half the implementations; in particular, IE and Opera
appear to use the first instance of the parameter as the canonical
value.

* Another important point to notice is the fact that this algorithm
allows parameter names to appear without values. This is useful in
situations such as the "base64" option in data: URLs that use the mere
presence or absence of a parameter to set its boolean value. Note,
however, that a parameter that has been given an explicit value (even
if that value is the empty string) does not get overridden by the
later appearance of a boolean parameter of the same name.
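
To make the parameter rules above easier to review, here is a simplified
Python sketch of them. It is not the spec algorithm: the function name is
made up, quoted values are not handled, lowercasing names is my
assumption, and skipping (rather than rejecting) over-long names is also
an assumption.

```python
def parse_parameters(parameter_text: str) -> dict:
    """Parse ";name=value;flag" text; valueless names map to None."""
    parameters = {}
    for piece in parameter_text.split(";"):
        # Only whitespace right after the semicolon is skipped; any
        # whitespace before "=" stays part of the name (point 2).
        piece = piece.lstrip(" \t")
        if not piece:
            continue
        name, has_value, value = piece.partition("=")
        name = name.lower()  # assumption: names compare case-insensitively
        if not name or len(name) > 127:
            continue  # assumption: skip the parameter, keep the type
        if has_value:
            # A later explicit value overrides and replaces an earlier one.
            parameters[name] = value
        elif parameters.get(name) is None:
            # A boolean parameter never overrides an explicit value,
            # even an empty one.
            parameters[name] = None
    return parameters
```

For example, ";charset=utf-8;charset=iso-8859-1" yields iso-8859-1, while
";base64=;base64" keeps the explicit empty string.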

I think those are the important points of background information you
need to know in order to evaluate this algorithm.

I look forward to your response.

Regards,
Gordon

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Complete MIME type parsing algorithm for section 5

2013-05-28 Thread Gordon P. Hemsley
Peter,

The main reason I haven't yet responded to your e-mails is because I'm
still actively working on improving and testing the algorithm.

But I do want you to know that your comments are valuable to me,
because they point out the areas I need to consider and test.

And while you should continue to bring inconsistencies with RFCs to my
attention, you should keep in mind that some of these inconsistencies
may be "willful violations".

The IETF has the power to restrict the format of the MIME types that
are formally registered, but they have little power over what winds up
deployed in the wild.

Browsers, on the other hand, need to know how to handle all sorts of
things that the IETF would consider invalid—and in many cases existing
browsers do things in violation of the RFCs.

Since one of the main goals of this spec, and the WHATWG as a whole,
is to improve interoperability, making the spec consistent with a
majority of browsers overrides making the spec consistent with
existing RFCs.

One specific comment I have about your latest e-mail: I think you
should read the algorithm again, because I'm fairly sure that it does
guard against empty values for type, subtype, and parameter names.
(But I'll check again.)

Regards,
Gordon

On Tue, May 28, 2013 at 4:25 PM, Peter Occil  wrote:
>
> I see you've updated the MIME sniffing algorithm in response to my feedback.
> Here
> I'll go over the difference and I want you to comment on these.
>
> 1. I assume the term "whitespace character" means the same as a "whitespace
> byte" under
> the MIME Sniffing spec.  As such the use of that term is inadequate for the
> following reasons.
>
>   * A whitespace character includes 0x0C, form feed (FF), which is not
> considered whitespace
>  in either HTTP or the Internet Message Format (IMF, RFC5322).
>
>  For example, the following would not be well-formed under HTTP or IMF:
>
>  text/plain{FF}; charset=utf-8
>
>  But the current algorithm would consider that string well-formed
> anyway.
>
>   * All steps in the document that are the same as step 7 skip all
> whitespace characters, even
>  if the whitespace isn't well formed under HTTP or IMF.  For example, a
> bare carriage
>  return (CR) or line feed character (LF) is not allowed, and a CR-LF
> pair not followed by either
>  SPACE or TAB is also not allowed. IMF also allows comments within
> whitespace.
>
>  For example, the following would not be well-formed under HTTP or IMF:
>
>  text/plain;{CR} charset=utf-8
>  text/plain;{LF} charset=utf-8
>  text/plain;{CR}{LF}charset=utf-8
>
>  (Note the lack of space in the last example. Note also that folding
> whitespace is deprecated
>  under the current HTTP draft.)
>
>  And the following examples would be allowed under IMF, but not HTTP:
>
>  (comment) text/plain; charset=utf-8
>  text/plain; (comment) charset=utf-8
>  text/plain; (comment (nested)) charset=utf-8
>  text/plain; charset=utf-8 (comment)
>  text/plain; {CR}{LF} (comment) charset=utf-8
>
> 2. While the type, subtype, and parameter name are checked for their length,
> the other rules
>  for wellformedness are not checked in your version, namely, that they must
> not be empty,
>  contain a byte that isn't a MIME type byte (see my original message), or
> begin with a byte that
>  isn't an ASCII alphanumeric.
>
>  For example, the following would not be well-formed under RFC6838:
>
>  te*xt/plain;charset=utf-8
>  text/pl*ain;charset=utf-8
>  text/plain;ch*arset=utf-8
>  text/plain;=utf-8
>  text/;charset=utf-8
>  /plain;charset=utf-8
>
>  The first three examples are because "*" isn't a MIME type byte.
>
>
> 3. Unquoted parameter values are not checked to ensure that they are not
> empty and do
>  not contain a byte that isn't a parameter value byte (see my original
> message).
>
>  For example, the following would not be well-formed under HTTP or MIME:
>
>  text/plain;charset=ut?f-8
>  text/plain;charset=utf=8
>
> 4. Quoted parameter values are not checked to ensure that they do not
> contain a 0x7F byte
>  or a byte other than TAB (0x09) that is less than 0x20.
>
>  For example, the following would not be well-formed under HTTP or MIME:
>
>  text/plain;charset="utf{LF}-8"
>  text/plain;charset="utf{0x7F}-8"
>  text/plain;charset="utf\{LF}-8"
>  text/plain;charset="utf\{0x7F}-8"
>
> Please give your comments.
>
> --Peter

Re: [whatwg] [mimesniff] Complete MIME type parsing algorithm for section 5

2013-05-25 Thread Gordon P. Hemsley
On Sat, May 25, 2013 at 12:46 PM, Peter Occil  wrote:
> My algorithm skips only SPACE and TAB instead of all whitespace characters
> because it assumes that the field value was already extracted from
> Content-Type according to the HTTP/HTTPbis spec (0x0C, form feed, is never
> considered whitespace in HTTP headers). In particular, it assumes that
> folding whitespace (obs-fold) was replaced with spaces (or the message with
> obs-fold rejected) before the Content-Type value was interpreted.
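
The difference described above can be shown with a short Python sketch
that skips whitespace under either definition. The constant and function
names are hypothetical; the two character sets are my understanding of
HTTP whitespace (SPACE and TAB) and of the spec's whitespace bytes (TAB,
LF, FF, CR, SPACE).

```python
HTTP_WHITESPACE = " \t"          # SPACE and TAB only
SPEC_WHITESPACE = "\t\n\x0c\r "  # TAB, LF, FF, CR and SPACE


def skip_whitespace(value: str, pointer: int, whitespace: str) -> int:
    """Advance pointer past any characters found in `whitespace`."""
    while pointer < len(value) and value[pointer] in whitespace:
        pointer += 1
    return pointer


# A form feed after the subtype: skipped under the broader definition,
# but left in place when only SPACE and TAB count as whitespace.
value = "text/plain\x0c; charset=utf-8"
start = len("text/plain")
print(skip_whitespace(value, start, HTTP_WHITESPACE))
print(skip_whitespace(value, start, SPEC_WHITESPACE))
```

Which set is appropriate depends on where the MIME type string came from,
which is why the source of the string matters here.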

Thanks for your detailed explanation.

It'll take me a little while to evaluate what you've proposed here,
but in the meantime: Keep in mind that the Content-Type header is not
the only source for a MIME type. This algorithm needs to consider MIME
types from all possible sources.

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Complete MIME type parsing algorithm for section 5

2013-05-25 Thread Gordon P. Hemsley
Peter,

The burden is on you to describe your proposals and what their purpose
and benefit would be.

How does this proposed algorithm differ from what is already in the
spec? How is it better?

Regards,
Gordon

On Sat, May 25, 2013 at 3:58 AM, Peter Occil  wrote:
> I present this draft of the complete algorithm for parsing a MIME type.  I 
> would appreciate comments.
>
> --Peter
>
> 
>
> An ASCII alphanumeric is a byte or character in the ranges 0x41-0x5A, 
> 0x61-0x7A, and 0x30-0x39.
> A MIME type byte is an ASCII alphanumeric or one of the following bytes: ! # 
> $ & ^ _ . + -
> A parameter value byte is a MIME type byte or one of the following bytes: % ' 
> * ` | ~
>
> To parse a MIME type, run the following steps:
>
> 1. Let length be the length of the byte sequence of the MIME type.
> 2. If length is less than 1, return undefined.
> 3. Let pointer be 0.  Pointer is a zero-based index to the current byte in 
> the byte sequence.
> 4. Advance pointer to the next byte other than 0x20 (SPACE) or 0x09 (TAB).
> 5. Let type be the byte string from the current byte up to but not including 
> the next "/" byte. Advance pointer to the next "/" byte.
> 6. If the current byte isn't "/", return undefined.
> 7. Increment pointer by 1.
> 8. Let subtype be the byte string from the current byte up to but not 
> including the next 0x20 (SPACE), 0x09 (TAB), or ";" byte.  Advance pointer to 
> the next 0x20 (SPACE), 0x09 (TAB), or ";" byte.
> 9. If type is empty, contains a byte that isn't a MIME type byte, or doesn't
> begin with an ASCII alphanumeric, or is longer than 127 bytes, return 
> undefined.
> 10. If subtype is empty, contains a byte that isn't a MIME type byte, or 
> doesn't begin with an ASCII alphanumeric, or is longer than 127 bytes, return 
> undefined.
> 11. Convert type and subtype to ASCII lowercase.
> 12. Let parameters be an empty dictionary.
> 13. Run the following substeps in a loop.
>  1. Advance pointer to the next byte other than 0x20 (SPACE) or 0x09 
> (TAB).
>  2. If pointer is equal to length, return type, subtype, and parameters.
>  3. If the current byte isn't ";", return undefined.
>  4. Increment pointer by 1.
>  5. If pointer is equal to length, return type, subtype, and parameters.
>  6. Let parameter be the byte string from the current byte up to but not 
> including the next "=" byte. Advance pointer to the next "=" byte.
>  7. If parameter is empty, contains a byte that isn't a MIME type byte, 
> or doesn't begin with an ASCII alphanumeric, or is longer than 127 bytes, 
> return undefined.
>  8. If parameters contains a mapping for parameter, return undefined.
>  9. Convert parameter to ASCII lowercase.
>  10. If the current byte isn't "=", return undefined.
>  11. Increment pointer by 1.
>  12. If the current byte equals 0x22 (quotation mark), run the following 
> substeps:
>   1. Let value be an empty byte string.
>   2. Increment pointer by 1.
>   3. Run these substeps in a loop.
>   1. If pointer is equal to length, return type, subtype, 
> and parameters.
>   2. If the current byte equals 0x7F or is less than 
> 0x20, and the current byte isn't TAB (0x09), return type, subtype, and 
> parameters.
>   3. If the current byte equals 0x22 (quotation mark), 
> increment pointer by 1 and terminate this loop.
>   4. Otherwise, if the current byte is "\", increment 
> pointer by 1. Then, if there is a current byte, append that byte to value.
>   5. Otherwise, append the current byte to value.
>   6. Increment pointer by 1.
>   4. Add the mapping of parameter to value to the parameters 
> dictionary.
>  13. Otherwise, run these substeps:
>   1. Let value be the byte string from the current byte up to but 
> not including the next 0x20 (SPACE), 0x09 (TAB), or ";" byte.  Advance 
> pointer to the next 0x20 (SPACE), 0x09 (TAB), or ";" byte.
>   2. If value is empty or contains a byte that isn't a parameter 
> value byte, return undefined.
>   3. Add the mapping of parameter to value to the parameters 
> dictionary.
>
> ---
>
>
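
A minimal Python sketch of the quoted steps may help when comparing them with the spec. It assumes "return undefined" maps to None and that the result is a (type, subtype, parameters) tuple of byte strings; the names parse_mime_type, _valid_token, _skip_ws and _take_until are illustrative only and appear in neither the proposal nor the spec.

```python
ALNUM = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789")
MIME_BYTES = ALNUM | frozenset(b"!#$&^_.+-")
VALUE_BYTES = MIME_BYTES | frozenset(b"%'*`|~")


def _valid_token(token: bytes) -> bool:
    # Steps 9, 10 and 13.7: non-empty, at most 127 bytes, alphanumeric first byte.
    return (0 < len(token) <= 127 and token[0] in ALNUM
            and all(b in MIME_BYTES for b in token))


def _skip_ws(data: bytes, i: int) -> int:
    # Advance past SPACE and TAB only, as the proposal specifies.
    while i < len(data) and data[i] in b" \t":
        i += 1
    return i


def _take_until(data: bytes, i: int, stops: bytes):
    # Return the bytes from i up to (not including) the next stop byte, and its index.
    j = i
    while j < len(data) and data[j] not in stops:
        j += 1
    return data[i:j], j


def parse_mime_type(data: bytes):
    n = len(data)
    if n < 1:                                   # step 2
        return None
    i = _skip_ws(data, 0)                       # step 4
    type_, i = _take_until(data, i, b"/")       # step 5
    if i == n:                                  # step 6: no "/" byte found
        return None
    i += 1                                      # step 7
    subtype, i = _take_until(data, i, b" \t;")  # step 8
    if not (_valid_token(type_) and _valid_token(subtype)):
        return None                             # steps 9-10
    type_, subtype = type_.lower(), subtype.lower()
    params = {}
    while True:                                 # step 13
        i = _skip_ws(data, i)
        if i == n:
            return type_, subtype, params
        if data[i] != 0x3B:                     # not ";"
            return None
        i += 1
        if i == n:
            return type_, subtype, params
        name, i = _take_until(data, i, b"=")
        if not _valid_token(name):
            return None
        if name in params:                      # checked before lowercasing, as written
            return None
        name = name.lower()
        if i == n:                              # no "=" byte found
            return None
        i += 1
        if i < n and data[i] == 0x22:           # quoted value
            i += 1
            value = bytearray()
            while True:
                if i == n:
                    return type_, subtype, params
                b = data[i]
                if (b == 0x7F or b < 0x20) and b != 0x09:
                    return type_, subtype, params
                if b == 0x22:                   # closing quotation mark
                    i += 1
                    break
                if b == 0x5C:                   # backslash escapes the next byte
                    i += 1
                    if i < n:
                        value.append(data[i])
                else:
                    value.append(b)
                i += 1
            params[name] = bytes(value)
        else:                                   # unquoted value
            value, i = _take_until(data, i, b" \t;")
            if not value or any(b not in VALUE_BYTES for b in value):
                return None
            params[name] = value
```

Read literally, the steps do not skip whitespace after ";", so under this sketch b"text/html; charset=utf-8" is rejected while b"text/html;charset=utf-8" parses.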



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] An alternative approach to section 9 of Mime Sniffing

2013-05-25 Thread Gordon P. Hemsley
Section 5 is highlighted with all that red warning stuff precisely
because it is known to be incomplete and insufficient. I haven't yet
decided how I'm going to go about writing that up (and it isn't
inherently obvious that what is there now is bad). So that's not the
best example; and it certainly doesn't have anything to do with
section 9 (at least, not with regard to formatting).

I still don't understand what problem you're trying to solve (and if I
don't understand the problem, I can't come up with a solution). Are
you just having trouble reading and understanding what's there?

MIME Sniffing and WebVTT have very different usecases and, in some
ways, very different audiences. I don't think you can directly compare
the two.

Gordon

On Sat, May 25, 2013 at 1:58 AM, Peter Occil  wrote:
> What I think is that even if an ABNF won't be the normative definition of a
> syntax format, it can help put the format's syntax into a higher-level
> perspective and aid understanding of its syntax: once we understand, for
> example, what the Content-Type header field value ought to contain, in the
> form of an ABNF or in some other way, it will be easier to write processing
> rules for that field value in the spec.  (Right now I'm in the process of
> rewriting section 5 of the MIME sniffing spec.)
>
> Take the WebVTT spec for example.  For each part of the WebVTT format
> there's a definition of what that part contains in terms of characters, and
> the actual processing rules for parsing that part.  For example, the
> definition for "WebVTT cue timings" and the algorithm to "collect WebVTT cue
> timings and settings." The definition aids understanding of the syntax for
> WebVTT cue timings and informs how the rules for collecting WebVTT cue
> timings are written in the WebVTT spec.
>
>
> --Peter
>
> -Original Message- From: Anne van Kesteren
> Sent: Friday, May 24, 2013 1:28 AM
>
> To: Peter Occil
> Cc: WHATWG
> Subject: Re: [whatwg] An alternative approach to section 9 of Mime Sniffing
>
> On Thu, May 23, 2013 at 2:49 PM, Peter Occil  wrote:
>>
>> Explain further why you don't recommend ABNF for this case.
>
>
> We don't recommend ABNF in general because often ABNF results in a
> mismatch between prescribed and actual processing. E.g. Content-Type
> is defined as an ABNF and technically "text/html;" does not match that
> ABNF, but everyone (logically) processes that as "text/html" without
> parameters.
>
> It's much better to define the actual processing so implementers are
> less inclined to take shortcuts when implementing (test suites also
> help, but they're typically written way-after-the-fact).
>
>
>> You should also explain whether another change to make section 9 more
>> readable is
>> appropriate (though it currently is relatively readable as is).
>
>
> I'll leave that to Gordon.
>
>
> --
> http://annevankesteren.nl/



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] An alternative approach to section 9 of Mime Sniffing

2013-05-23 Thread Gordon P. Hemsley
The pattern matching algorithm is used because certain patterns
require other-than-exact matching. That is why the "pattern mask"
exists. This is particularly important for the "rules for identifying
an unknown MIME type" (defined in 10.1), which matches ASCII
characters case-insensitively; it is also important for a number of
patterns that contain unimportant bytes that should be ignored (like
WebP, in your example).

The algorithm lays out the information in tabular form because that
makes clearer the separation between the important bytes and the
unimportant (or case-insensitive) bytes. Keep in mind that
implementations may read one byte at a time; using ABNF would give
them no benefit, and would likely make things more confusing.
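
As a rough illustration of what the pattern mask buys, here is a small Python sketch of masked prefix matching. It assumes the usual formulation in which each input byte is ANDed with the corresponding mask byte and compared with the pattern byte; the function name and the constants below are illustrative, not taken from the spec text.

```python
def pattern_matches(data: bytes, pattern: bytes, mask: bytes) -> bool:
    """Return True if the start of data matches pattern under mask."""
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    if len(data) < len(pattern):
        return False
    # A mask byte of 0x00 ignores the input byte; 0xDF folds ASCII letter case.
    return all((d & m) == p for d, m, p in zip(data, mask, pattern))


# "RIFF", four ignored bytes, then "WEBPVP".
WEBP_PATTERN = b"RIFF\x00\x00\x00\x00WEBPVP"
WEBP_MASK = b"\xff" * 4 + b"\x00" * 4 + b"\xff" * 6

# "<HTML" matched case-insensitively: 0xDF clears the ASCII case bit.
HTML_PATTERN = b"<HTML"
HTML_MASK = b"\xff\xdf\xdf\xdf\xdf"
```

Both the ignored bytes and the case-insensitive bytes fall out of the same byte-at-a-time comparison, which is the point of the tabular layout.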

I wonder: What problem are you trying to solve with this proposal?

(In the future, please add "[mimesniff]" to the beginning of your
subject line for MIME Sniffing discussions; this will ensure that I
see them and pay attention to them more quickly.)

Regards,
Gordon

On Thu, May 23, 2013 at 2:10 AM, Peter Occil  wrote:
> I propose rewriting section 9 and parts of section 10 in a different way, to 
> use the ABNF format in RFC 5234. (Note that ABNFs are already  used in the 
> current Fetch specification.) With this approach, the definitions for "byte 
> pattern",  "pattern mask", and the "pattern matching algorithm" can be 
> eliminated (all of which are found before section 9.1).
>
> An example for the image pattern matching algorithm is given below.
>
> ---
>
> 9.1  Matching an image type pattern
>
> The image pattern matching algorithm takes a byte sequence as input.  The 
> algorithm goes through the following image types in the order given.  For 
> each image MIME type given below, if the start of the byte sequence matches 
> its ABNF, return the concatenation of "image/" and the name of the ABNF (in 
> lowercase), and terminate the image pattern matching algorithm.
>
> vnd.microsoft.icon = %x00.00.01.00
>; A Windows Icon signature.
> bmp = %x42.4D
>; The string "BM", a BMP signature.
> gif = %x47.49.46.38 (%x37 / %x39) %x61
>; The string "GIF87a" or "GIF89a", a GIF signature.
> webp = %x52.49.46.46 4OCTET %x57.45.42.50.56.50
>; The string "RIFF" followed by four bytes followed by the string "WEBPVP".
> png = %x89.50.4E.47.0D.0A.1A.0A
>; The byte 0x89 followed by the string "PNG"
>; followed by CR LF SUB LF, the PNG signature.
> jpeg = %xFF.D8.FF
>; The JPEG Start of Image marker followed by the indicator
>; byte of another marker.
>
> If the start of the byte sequence doesn't match any ABNF given above, return 
> undefined.
>
> ---
>
> I would appreciate comments.
>
> --Peter



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] Priority between <a download> and content-disposition

2013-05-08 Thread Gordon P. Hemsley
On Wed, May 8, 2013 at 12:21 PM, Boris Zbarsky  wrote:
> On 5/8/13 12:15 PM, Gordon P. Hemsley wrote:
>>
>> Perhaps. But maybe I'm not clear on what exactly the alternate
>> proposal is. Are you suggesting not supporting the @download
>> attribute? Or just ignoring it when Content-Disposition specifies a
>> filename? (I would suggest that neither is the appropriate response.)
>
>
> What Gecko implements right now is:
>
> 1)  @download is ignored for non-same-origin links.
> 2)  If Content-Disposition specifies a filename, that filename is used
> no matter what @download says.
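
The two Gecko rules quoted above can be sketched in Python as follows. This is only an illustration: the function and parameter names are hypothetical, and falling back to the URL's filename when neither source supplies one is an assumption, not something stated in the message.

```python
from typing import Optional


def gecko_suggested_filename(download_attr: Optional[str],
                             same_origin: bool,
                             cd_filename: Optional[str],
                             url_filename: str) -> str:
    # Rule 1: @download is ignored for non-same-origin links.
    if not same_origin:
        download_attr = None
    # Rule 2: a Content-Disposition filename is used no matter what @download says.
    if cd_filename is not None:
        return cd_filename
    # Otherwise use a non-empty @download value if one survives rule 1.
    if download_attr:
        return download_attr
    # Assumed fallback: the filename derived from the URL.
    return url_filename
```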

I understand now the motivation for this, but I would think that it
would remove a lot of the usefulness of the @download attribute: If
you have the same origin, you probably already have access to (a) name
the file appropriately in the first place, or (b) set the
Content-Disposition header to send the appropriate filename. No?

>>> This is not trivial, since sniffing can easily fail on files that are
>>> both
>>> HTML and png or both HTML and exe at the same time.  There's a good bit
>>> of
>>> research on things like this.
>>
>>
>> Yes, and that research has already gone into creating the mimesniff
>> standard, has it not? I'm suggesting use the existing algorithm(s) in
>> an additional arena, not creating a new, separate algorithm.
>
>
> The mimesniff standard doesn't try to sniff for types UAs don't render
> natively, which is what would be needed here.

I'm not so sure about that, but I'll leave it to someone else to
argue. (If you determine a file to be a PNG, then you suggest a .png
extension, regardless of whether there might be an embedded
executable; if you don't support the file format, then how do you know
that it isn't supposed to be an executable in the first place? —and
what is it doing on the Web?)

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] Priority between <a download> and content-disposition

2013-05-08 Thread Gordon P. Hemsley
On Wed, May 8, 2013 at 12:01 PM, Boris Zbarsky  wrote:
> On 5/8/13 10:45 AM, Gordon P. Hemsley wrote:
>>
>> I still think @download takes priority.
>>
>> The Content-Disposition header says, "Nevermind what filename the URL
>> shows; this is really file B.txt."
>>
>> The @download attribute says, "Nevermind what filename this link would
>> normally be; let's just consider it A.txt."
>
>
> OK, that's at least a reasonable argument for the behavior.  ;)
>
>
>> That seems like quite a sophisticated attack that relies on a lot of
>> things falling into place all at once.
>
>
> Uh... yes.  Like most browser exploits.

Perhaps. But maybe I'm not clear on what exactly the alternate
proposal is. Are you suggesting not supporting the @download
attribute? Or just ignoring it when Content-Disposition specifies a
filename? (I would suggest that neither is the appropriate response.)

>> Then I think it is the responsibility of the UA to sniff the file and
>> protect the user from such attempts to mislead.
>
>
> This is not trivial, since sniffing can easily fail on files that are both
> HTML and png or both HTML and exe at the same time.  There's a good bit of
> research on things like this.

Yes, and that research has already gone into creating the mimesniff
standard, has it not? I'm suggesting use the existing algorithm(s) in
an additional arena, not creating a new, separate algorithm.

If a file from an image sharing site is served as (or determined to
be, via the sniffing algorithms) image/png, for example, then the UA
should suggest a filename with a .png extension, ignoring any
suggestion by the author for a .exe extension. (Whether you want to
change it to "A.png" or "A.exe.png" is debatable, I suppose.)
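
To make the "A.png" versus "A.exe.png" choice concrete, here is a small Python sketch. The mapping table, the function name and the keep_original_ext flag are all hypothetical; a real UA would consult its own MIME-type-to-extension registry.

```python
import posixpath

# Hypothetical, deliberately tiny mapping from MIME type to preferred extension.
EXTENSION_FOR_TYPE = {
    "image/png": ".png",
    "image/gif": ".gif",
    "image/jpeg": ".jpg",
}


def safe_filename(suggested: str, mime_type: str,
                  keep_original_ext: bool = False) -> str:
    wanted = EXTENSION_FOR_TYPE.get(mime_type)
    if wanted is None:
        return suggested              # unknown type: leave the name alone
    stem, ext = posixpath.splitext(suggested)
    if ext.lower() == wanted:
        return suggested              # extension already matches the type
    # Either replace the extension ("A.png") or append to it ("A.exe.png").
    return (suggested if keep_original_ext else stem) + wanted
```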

>> I'm not sure I have the resources to do extensive real-world testing
>> of this (and that documentation suggests it has been superseded in
>> more modern OSes), but I don't think it would be unreasonable for the
>> UA to override or augment the filename suggested by the @download
>> attribute if it determines that it would not be in the best interest
>> of the user to use the suggested filename unchanged.
>
>
> Phrased that way, using the Content-Disposition filename is a perfectly
> valid "override if not in the best interest of the user" behavior, fwiw.
>
> -Boris
>

True. But doesn't that imply a rejection of my aforementioned
"reasonable argument"?

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] Priority between <a download> and content-disposition

2013-05-08 Thread Gordon P. Hemsley
On Wed, May 8, 2013 at 9:43 AM, Boris Zbarsky  wrote:
> On 5/8/13 6:53 AM, Gordon P. Hemsley wrote:
>>
>> It's not clear to me which of the two factors you take issue with.
>
>
> The question of which filename takes priority.
>
>
>> The second sentence very clearly suggests
>> that "A.txt" would be the filename presented to the user by default in
>> the save dialog.
>
>
> No, it suggests that A.txt is what the page author recommends.
>
> If, at the same time, B.txt is what the server author recommends, what
> should happen?

I still think @download takes priority.

The Content-Disposition header says, "Nevermind what filename the URL
shows; this is really file B.txt."

The @download attribute says, "Nevermind what filename this link would
normally be; let's just consider it A.txt."

>>> There is if you allow cross-origin @download.
>>>
>>> There is if you allow untrusted markup on your server and don't sanitize
>>> away @download (should it be sanitized away?  Unclear).
>>
>>
>> I'm still not seeing what the problem is. All this does is make the
>> browser treat the link as if the user followed it and then went File >
>> Save Page As
>
>
> No, because in that case the browser will definitely use the
> Content-Disposition filename, not the one from @download.

OK, technically, the way I phrased it, yes. But what I meant was that
it rolls a bunch of steps into one, telling the browser that the link
should be downloaded and named per suggestion.

>> What are the security concerns, cross-origin or otherwise?
>
>
> One concern is being able to do this:
>
> <a download="known-location.pdf" href="http://some-bank/statement.pdf">
>
> cross-site and combining it with something that lets you read
> known-location.pdf (e.g. a file://-specific privacy hole that only applies
> to some filenames, or an  that the user has already filled
> in).

That seems like quite a sophisticated attack that relies on a lot of
things falling into place all at once. I'm not sure that should block
the use of the attribute in and of itself.

> Another concern is if you upload a file to an image-sharing site, but it
> happens to be a Windows executable.  Then you link to it with:
>
>   http://image-sharing-site/whatever";>
>
> and wait for the user to download and double-click.  This relies on the user
> thinking the file came from image-sharing-site so must be an image.  UAs may
> do mitigations here by changing the suggested filename, of course.

Then I think it is the responsibility of the UA to sniff the file and
protect the user from such attempts to mislead.

At the very least, the download UI could specify the actual type of
the file that is being downloaded. (More on how to protect users who
don't read that below.)

> Generally, allowing this sort of thing opens up several new phishing and
> social engineering attack vectors, and it's not clear that we want that.

There is a price to freedom, as they say. We shouldn't let a few
rotten apples spoil the whole bunch.

>> Well, what I should have said is, there is no content sniffing beyond
>> what is already done for regular page saves. (The UI can show the MIME
>> type or format of the file in the download box, as it would for any
>> file it doesn't handle natively.)
>
>
> It can, and users routinely ignore that.
>
>
>> Ah, I admit, I'm a bit biased towards Mac in that regard. It's been a
>> while since I used Windows. But I'd be surprised to find out that the
>> browser (Firefox, in the case I have in mind) changes the extension in
>> the suggested filename (e.g. "example.php" for an HTML file) on
>> Windows but not on Mac
>
>
> It sure used to in some cases, partially in concert with the Windows
> filepicker.  See the (scant) documentation for lpstrDefExt at
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms646839%28v=vs.85%29.aspx
> and I suggest actually doing some experimentation across the different save
> variants (save image, save link as, save page as, click on something with
> content-disposition:attachment) on several OSes to see the behavior.  There
> is certainly a good bit of code in the various file-saving codepaths in
> Firefox that attempts to ensure extensions match MIME types, to forbid
> saving things with certain extensions, etc.
>
> Also note that Chrome will change extensions on at least @download filenames
> to match the MIME type; I haven't experimented in detail with its behavior
> for other cases.  And I haven't experimented much with other browsers in
> this area, though I expect all have some interesting behavior.

Re: [whatwg] Priority between <a download> and content-disposition

2013-05-08 Thread Gordon P. Hemsley
On Tue, May 7, 2013 at 10:18 PM, Boris Zbarsky  wrote:
> On 5/7/13 5:54 PM, Gordon P. Hemsley wrote:
>>
>> A @download attribute with a value would override both factors, like so:
>> (1) Download it.
>> (2) "A.txt"
>
> Why?
>
> You say this as if it were obvious, but it's not obvious to me at all...
> What's the reasoning that makes this the desirable behavior?

It's not clear to me which of the two factors you take issue with.

Here's what the spec says:

"The download attribute, if present, indicates that the author intends
the hyperlink to be used for downloading a resource. The attribute may
have a value; the value, if any, specifies the default file name that
the author recommends for use in labeling the resource in a local file
system."

I interpret that first sentence to mean that the file should be
downloaded (disposition type = attachment) rather than displayed
(disposition type = inline). The second sentence very clearly suggests
that "A.txt" would be the filename presented to the user by default in
the save dialog.

>> I don't see what the security concerns might be: There is no
>> difference here than what is already available
>
> There is if you allow cross-origin @download.
>
> There is if you allow untrusted markup on your server and don't sanitize
> away @download (should it be sanitized away?  Unclear).

I'm still not seeing what the problem is. All this does is make the
browser treat the link as if the user followed it and then went File >
Save Page As

What are the security concerns, cross-origin or otherwise?

>> AFAICT, there are no content
>> sniffing or cross-domain issues at play.
>
> But there are; see above.

Well, what I should have said is, there is no content sniffing beyond
what is already done for regular page saves. (The UI can show the MIME
type or format of the file in the download box, as it would for any
file it doesn't handle natively.)

>> results when saving a file; they don't do any file extension vs. file
>> format checking.
>
> Uh... that depends on exactly how you save and your OS.  Browsers commonly
> do file extension vs MIME type checking on Windows.  Behavior on other OSes
> varies, and varies across browsers.
>
> -Boris

Ah, I admit, I'm a bit biased towards Mac in that regard. It's been a
while since I used Windows. But I'd be surprised to find out that the
browser (Firefox, in the case I have in mind) changes the extension in
the suggested filename (e.g. "example.php" for an HTML file) on
Windows but not on Mac, and I would argue that that perhaps should not
be the case.

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] Priority between <a download> and content-disposition

2013-05-07 Thread Gordon P. Hemsley
I realize this is an old thread, so apologies if this has already been
resolved. The discussion that originally followed seemed to have
gotten off track, so I wanted to try to clarify things.

First off, there are two factors to consider:
(1) Whether to download the file or display it.
(2) What filename to suggest for the file when it is downloaded.

In the general case, with a normal <a href> and no Content-Disposition
header (or the plain 'Content-Disposition: inline' header, listed as
(1) originally), the answers are:
(1) Display it.
(2) Whatever the filename on the server is (e.g. "page.txt" or
"example.php"), modulo OS restrictions.

In the case of a normal <a href> and a 'Content-Disposition: inline;
filename="B.txt"' header (listed as (2) originally), the answers are:
(1) Display it.
(2) "B.txt"

Changing the disposition type doesn't change much, with a normal
<a href> and a 'Content-Disposition: attachment; filename="B.txt"' header
(listed as (3) originally):
(1) Download it.
(2) "B.txt"

So now, the question is, what effect does a @download attribute have?
Nothing too surprising.

An empty @download attribute would override the first factor above so
that it is always "Download it."

A @download attribute with a value would override both factors, like so:
(1) Download it.
(2) "A.txt"

Thus, the @download attribute acts to override the Content-Disposition
header, giving the following hierarchy:


@download > Content-Disposition > URL


Or, in pseudocode (with the assumption that if X has Y, then X is also present):


disposition_type = ( @download is present ) ? "attachment" : ( (
Content-Disposition header is present ) ? Content-Disposition
disposition type : "inline" );
suggested_filename = ( @download has a value ) ? value of @download :
( ( Content-Disposition has filename parameter ) ? Content-Disposition
filename value : filename from URL );
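
The same hierarchy as a runnable Python sketch. The function and parameter names are illustrative; None stands for "not present", and an empty string stands for a @download attribute that is present but has no value.

```python
from typing import Optional, Tuple


def resolve_download(download_attr: Optional[str],
                     cd_type: Optional[str],
                     cd_filename: Optional[str],
                     url_filename: str) -> Tuple[str, str]:
    # Factor 1: download or display.
    if download_attr is not None:       # @download present, with or without a value
        disposition = "attachment"
    elif cd_type is not None:           # Content-Disposition header present
        disposition = cd_type
    else:
        disposition = "inline"

    # Factor 2: which filename to suggest.
    if download_attr:                   # @download has a non-empty value
        filename = download_attr
    elif cd_filename is not None:       # Content-Disposition filename parameter
        filename = cd_filename
    else:
        filename = url_filename         # filename taken from the URL
    return disposition, filename
```

For example, resolve_download("A.txt", "inline", "B.txt", "page.txt") gives ("attachment", "A.txt"), matching the last case listed above.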


I don't see what the security concerns might be: There is no
difference here than what is already available, except that there's
now an additional way to specify it. AFAICT, there are no content
sniffing or cross-domain issues at play. Browsers already give strange
results when saving a file; they don't do any file extension vs. file
format checking. (For example, the output of a .php or .cgi or .py
file on a server is usually HTML, yet browsers don't generally make
any attempt to change the file extension to .html when saving the
file, IME.)

Does this make sense? Am I missing anything?

Regards,
Gordon


On Sat, Mar 16, 2013 at 9:49 PM, Jonas Sicking  wrote:
> It's currently unclear what to do if a page contains markup like
> <a href="page.txt" download="A.txt"> if the resource at audio.wav
> responds with either
>
> 1) Content-Disposition: inline
> 2) Content-Disposition: inline; filename="B.txt"
> 3) Content-Disposition: attachment; filename="B.txt"
>
> People generally seem to have a harder time with getting header data
> right, than getting markup right, and so I think that in all cases we
> should display the "save as" dialog (or display equivalent download
> UI) and suggest the filename "A.txt".
>
> The spec is currently defining something else at least for 3.
>
> Potentially there are reasons to do something different in the case
> when the linked resource lives off of a different origin since in that
> case there might be security reasons to use the filename or
> disposition of the server that is actually serving up the content.
> However I don't think we can expect people to indicate
> "Content-Disposition: inline" in order to protect resources. Nor do I
> think that simply using a different filename is going to meaningfully
> protect downloaded content. So I think a stronger UI warning is needed
> in this scenario.
>
> Firefox currently doesn't support cross-origin @download references,
> so I don't have any meaningful implementation experience to share
> regarding that scenario.
>
> / Jonas



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] HTML differences from HTML4 document updated

2013-05-07 Thread Gordon P. Hemsley
Simon,

I think it would be good to consider the target audiences, of which
there are probably many:

You have the audience who is worried that HTML5 is some grand
departure from the HTML 4.01 they (think they) know and love. For
them, you'll want to describe what exactly has been removed and why,
instilling the idea of a separation between semantic and
presentational markup.

Then you have the audience that is excited to see what they can do now
with HTML5 that they couldn't do with HTML 4.01. For them, you'd list
the new elements and attributes and such.

Then you probably have some other incidentals such as things that were
removed or changed just because they were never implemented or people
never used them. These probably don't fall into either of the two
categories above.

But you also have another issue to consider: For this document, the
difference between the W3C's concept of specification snapshots and
WHATWG's concept of a living standard is not trivial. For the former,
you can have snapshot documents detailing the differences between each
snapshot specification; for the latter, you need a living document
that is anchored by a fixed point at one end (HTML 4.01).

This raises the question of the purpose of this document: Is it to
simplify the transition from HTML 4.01 to HTML5+? Or is it to act as
an HTML changelog from here on out? Because I think attempting to do
both within a single document will become unwieldy as time goes on.

Regards,
Gordon


On Tue, May 7, 2013 at 5:00 AM, Simon Pieters  wrote:
> On Mon, 06 May 2013 16:50:03 +0200, Jukka K. Korpela 
> wrote:
>
>>> I don't think this is of particular importance.
>>
>>
>> If it isn't, why not use the correct spelling?
>
>
> Mostly to be consistent with "HTML5".
>
>
>> When referring to specifications, it is usually a good idea to use their
>> own spelling, even when it is odd and confusing.
>>
>>> HTML 4.01 is intended. The differences between revisions of HTML4 are out
>>> of scope.
>>
>>
>> Then the heading should say "HTML 4.01".
>
>
> It's longer, and it's not clear to me that people are actually confused
> about what "HTML4" refers to.
>
>
>>> "Modern HTML differences from HTML4"? I'm not convinced that's a win.
>>> "Near-future" seems wrong since it's more like "current".
>>
>>
>> The difficulty here directly reflects the vague nature of HTML5: it partly
>> tries to describe HTML as actually implemented and partly specifies features
>> that should (or "shall") be implemented. Hence it is both modern and
>> (intended to be) near-future.
>>
>> But the fundamental difficulty is that you are trying to describe a
>> specific version, or set of versions, of HTML without giving it a proper
>> name or version number.
>>
>> Since WHATWG does not use a proper name for its version (the title is just
>> "HTML"), I think the only way to refer to it properly is to prefix it with
>> "WHATWG". This would lead to the title
>>
>> "Differences of HTML5 and WHATWG HTML from HTML 4.01"
>
>
> Here "HTML5" is supposed to refer to "W3C HTML5 and W3C HTML5.1"?
>
> How about I go back to the original title "Differences from HTML4"?
> http://wiki.whatwg.org/wiki/Differences_from_HTML4
>
>
>
>>> Such a document would be useful, but it's not this document. The primary
>>> focus for this document is what is different from HTML4.
>>
>>
>> But why? What is the purpose of this document? This is relevant to naming
>> it, and to the content too, of course. Now it is neither a reliable
>> comparison with links to the relevant clauses nor an overview - it has too many
>> details, to begin with.
>
>
> It's more intended to be an overview. Can you give an example of something
> that is too detailed and suggest the level of detail that would be more
> appropriate?
>
>
>> Is this for authors who consider moving from HTML 4.01 to HTML 5?
>
>
> Yes.
>
>
>> Then I think it should primarily specify what HTML 4.01 features are
>> forbidden in HTML 5, then the extensions.
>
>
> Thanks, that's useful feedback.
>
>
> --
> Simon Pieters
> Opera Software



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] HTML differences from HTML4 document updated

2013-05-03 Thread Gordon P. Hemsley
It is my understanding that the W3C version lists "HTML5" and the
WHATWG version uses "HTML". That was what I intended by "HTML(5)". I
didn't mean the parentheses were included literally.

Gordon

On Fri, May 3, 2013 at 2:19 PM, Xaxio Brandish  wrote:
> Ah.  The document scope [1] explains why it uses "HTML" in the title as
> opposed to HTML5 or HTML(5).
>
> --Xaxio
>
> References:
> [1] http://html-differences.whatwg.org/#scope
>
>
>
> On Fri, May 3, 2013 at 11:16 AM, Gordon P. Hemsley 
> wrote:
>>
>> The way I interpreted it, Jukka meant that the title could be
>> something more flowing, like "Differences between HTML4 and HTML(5)".
>>
>> Gordon
>>
>> On Fri, May 3, 2013 at 2:10 PM, Xaxio Brandish 
>> wrote:
>> > Good day,
>> >
>> > Let us start with a definition:
>> >
>> > es·o·ter·ic
>> > /ˌesəˈterik/
>> > Adjective
>> > Intended for or likely to be understood by only a small number of people
>> > with a specialized knowledge or interest.
>> >
>> > The document Simon delivered and formatted is useful to a wide range of
>> > audiences interested in HTML and how it differs from a previous named
>> > release of the HTML roadmap, so I'm not sure calling the title of the
>> > document "esoteric" is accurate.
>> >
>> > Regardless of that, if the title is obscure, could you please offer up
>> > title suggestions so that your posting becomes more constructive?  Keep
>> > in
>> > mind that an existing document [1] on the whatwg.org site references
>> > HTML
>> > version 4 as "HTML4" already, so there is a precedent set for this.  I
>> > do
>> > not think this will confuse anybody, and it would have to be changed
>> > throughout documents on the entire site to be consistent.  I'd like to
>> > propose that both nomenclatures are valid when referring to the entire
>> > HTML
>> > 4 specification.
>> >
>> > The important thing (IMHO) to remember here regarding the title is that
>> > HTML released two subversions of HTML 4, HTML 4.0 [2] and HTML 4.01 [3].
>> > The document must be intended as a differentiation between the entire
>> > version of HTML4, since it does not specify a specific subversion to
>> > diff?
>> > However, it links to the HTML 4.01 specification in the "References"
>> > section.  If this is *only* a diff between HTML 4.01 and the living
>> > standard, perhaps the title should then be "HTML differences from HTML
>> > 4.01" so that the document has additional meaning.  If there are
>> > differences between HTML 4.0, HTML 4.01, *and* HTML5 in the same section
>> > of
>> > the document, those should probably be appropriately marked.
>> >
>> > --Xaxio
>> >
>> > References:
>> > [1]
>> >
>> > http://www.whatwg.org/specs/web-apps/current-work/multipage/introduction.html#history-1
>> > [2] http://www.w3.org/TR/1998/REC-html40-19980424/
>> > [3] http://www.w3.org/TR/REC-html40/
>> >
>> >
>> > On Fri, May 3, 2013 at 9:20 AM, Jukka K. Korpela 
>> > wrote:
>> >
>> >> 2013-05-03 18:37, Simon Pieters wrote:
>> >>
>> >>  The past few days I've been working on updating the HTML differences
>> >>> from HTML4 document, which is a deliverable of the W3C HTML WG but is
>> >>> now also available as a version with the WHATWG style sheet:
>> >>>
>> >>>
>> >>> http://html-differences.whatwg.org/
>> >>>
>> >>
>> >> I think you should start from making the title sensible. "HTML
>> >> differences
>> >> from HTML4" is too esoteric even in this context.
>> >>
>> >> Think about a heading "FOO differences from FOO9". Wouldn't you say
>> >> that
>> >> some FOOist is writing very obscurely?
>> >>
>> >> Besides, the spelling is "HTML 4". Especially if you think HTML 4 is
>> >> ancient history, retain the historical spelling.
>> >>
>> >> Yucca
>> >>
>> >>
>> >>
>>
>>
>>
>> --
>> Gordon P. Hemsley
>> m...@gphemsley.org
>> http://gphemsley.org/ • http://gphemsley.org/blog/
>
>



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] HTML differences from HTML4 document updated

2013-05-03 Thread Gordon P. Hemsley
The way I interpreted it, Jukka meant that the title could be
something more flowing, like "Differences between HTML4 and HTML(5)".

Gordon

On Fri, May 3, 2013 at 2:10 PM, Xaxio Brandish  wrote:
> Good day,
>
> Let us start with a definition:
>
> es·o·ter·ic
> /ˌesəˈterik/
> Adjective
> Intended for or likely to be understood by only a small number of people
> with a specialized knowledge or interest.
>
> The document Simon delivered and formatted is useful to a wide range of
> audiences interested in HTML and how it differs from a previous named
> release of the HTML roadmap, so I'm not sure calling the title of the
> document "esoteric" is accurate.
>
> Regardless of that, if the title is obscure, could you please offer up
> title suggestions so that your posting becomes more constructive?  Keep in
> mind that an existing document [1] on the whatwg.org site references HTML
> version 4 as "HTML4" already, so there is a precedent set for this.  I do
> not think this will confuse anybody, and it would have to be changed
> throughout documents on the entire site to be consistent.  I'd like to
> propose that both nomenclatures are valid when referring to the entire HTML
> 4 specification.
>
> The important thing (IMHO) to remember here regarding the title is that
> HTML released two subversions of HTML 4, HTML 4.0 [2] and HTML 4.01 [3].
> The document must be intended as a differentiation between the entire
> version of HTML4, since it does not specify a specific subversion to diff?
> However, it links to the HTML 4.01 specification in the "References"
> section.  If this is *only* a diff between HTML 4.01 and the living
> standard, perhaps the title should then be "HTML differences from HTML
> 4.01" so that the document has additional meaning.  If there are
> differences between HTML 4.0, HTML 4.01, *and* HTML5 in the same section of
> the document, those should probably be appropriately marked.
>
> --Xaxio
>
> References:
> [1]
> http://www.whatwg.org/specs/web-apps/current-work/multipage/introduction.html#history-1
> [2] http://www.w3.org/TR/1998/REC-html40-19980424/
> [3] http://www.w3.org/TR/REC-html40/
>
>
> On Fri, May 3, 2013 at 9:20 AM, Jukka K. Korpela  wrote:
>
>> 2013-05-03 18:37, Simon Pieters wrote:
>>
>>  The past few days I've been working on updating the HTML differences
>>> from HTML4 document, which is a deliverable of the W3C HTML WG but is
>>> now also available as a version with the WHATWG style sheet:
>>>
>>> http://html-differences.whatwg.org/
>>>
>>
>> I think you should start from making the title sensible. "HTML differences
>> from HTML4" is too esoteric even in this context.
>>
>> Think about a heading "FOO differences from FOO9". Wouldn't you say that
>> some FOOist is writing very obscurely?
>>
>> Besides, the spelling is "HTML 4". Especially if you think HTML 4 is
>> ancient history, retain the historical spelling.
>>
>> Yucca
>>
>>
>>



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] use of article to markup comments

2013-01-28 Thread Gordon P. Hemsley
List elements and sectioning elements both represent hierarchical
relationships. They differ in how they emphasize that relationship:
lists emphasize the hierarchy outside the content, while sectioning
emphasizes the hierarchy within the content.

If the question is specifically about how to mark up comments on a
blog post or something, there's no reason you can't combine the two
methods: Each comment is a self-contained , with
relationships between comments represented by .

One example:
http://jsbin.com/edewoy/1

That example presumes you consider blog post comments (or replies to
comments) as a section within the content that is being commented on
(or replied to). You could also modify the markup to have two
s (one for the blog post and one for the comments) packaged
within a single parent , but the principle is the same.

Note that the key here is that there is no restriction on combining
lists and sectioning elements, and thereby no need to modify the
semantics of  or  (as proposed in [2] in the root message).

Gordon

On Mon, Jan 28, 2013 at 12:13 PM, Steve Faulkner
 wrote:
>> Brucel wrote:
>>
>> On Sat, 26 Jan 2013 10:56:10 -, Steve Faulkner
>>  wrote:
>>
>>
>> > Lists are appropriate for indicating nested tree structures. The use
>> > of lists to markup comments is a common mark up pattern used in
>> > blogging software such as wordpress. The code verbosity is not
>> > dissimilar to  the use of article, less so even option end  tags
>> > are omitted. Besides comments are generated code not hand authored so
>> > I don't see a problem with code verbosity
>>
>> [...]
>>
>> >
>> >> (It makes some sense, I suppose, to think of comments as a "list", but
>> >> *unordered*? If you're going to group them at all, wouldn't the order
>> >> be important? Bruce Lawson (
>> >> http://lists.w3.org/Archives/Public/public-html/2013Jan/0111.html)'s
>> >> observation that comments are "heavily dependent on context" would seem
>> >> to support the idea that it *is* important, especially since some
>> >> comments are responses to others.)
>> >
>> > agreed it would be better to use order lists.
>> >
>>
>>   Wordpress blogs, for example, have comments like
>>
>> "Bob Smith said at 9.55 on 31 Febtember: LOL"
>>
>> Thus, every comment has a link that a UA can use to jump from comment to
>> comment. The order is implied via the timestamp. So what's wrong with
>>
>> 
>> Witty blogpost
>> lorem ipsum
>>
>> 
>> 35 erudite and well-reasoned comments
>> Bob Smith said at 9.55 on 31 Febtember: Can
>> I use DRM in Polyglot documents?
>> Hixie said at 9.57 on 1 June: What's your
>> use case?
>> ...
>> 
>>
>> 
>>
>> In short, why should the spec suggest any specific method of marking up
>> comments?
>
> Good question, in the case of  recommended to mark up comments
> it seems like it's an element in search of a use case.
>
> For users who consume article semantics it appear to cause issues when
> used for any piece of content ranging from a one sentence comment to
> an article containing thousands of words or an interactive widget.
>
>
> regards
> SteveF



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Sniffing archives

2012-12-05 Thread Gordon P. Hemsley
(It seems I somehow managed to not send this to the list the first
time around. Addendum included.)

On Tue, Dec 4, 2012 at 2:40 AM, Adam Barth  wrote:
> On Mon, Dec 3, 2012 at 12:39 PM, Julian Reschke  wrote:
>> On 2012-11-29 20:25, Adam Barth wrote:
>>> These are supported in Chrome.  That's what causes the download.  From
>>
>> Can you elaborate about what you mean by "supported"? Chrome sniffs for the
>> type, and then offers to download as a result of that sniffing? How is that
>> different from not sniffing in the first place?
>
> They might otherwise be treated as a type that can be displayed
> (rather than downloaded).

But isn't the whole point of the spec to eliminate such accidental
sniffing? Anything not explicitly sniffed based on the first bytes of
the file will be assumed to be either 'application/octet-stream' or
'text/plain', depending on whether there are binary bytes present.
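
To make that fallback concrete, here is a minimal Python sketch. The function name `fallback_type` is hypothetical, and the set of "binary" bytes is an assumption modeled on the spec's binary data byte ranges rather than something stated in this message.

```python
# Bytes counted as "binary" in this sketch. This set is an assumption
# modeled on the spec's binary data byte ranges (control characters
# other than tab, LF, FF, CR, and ESC).
BINARY_BYTES = frozenset(
    list(range(0x00, 0x09)) + [0x0B] + list(range(0x0E, 0x1B)) + list(range(0x1C, 0x20))
)


def fallback_type(header: bytes) -> str:
    """Pick the default type for a resource that matched no signature."""
    if any(byte in BINARY_BYTES for byte in header):
        return "application/octet-stream"
    return "text/plain"
```

Under these assumptions, neither outcome is a scriptable type, which is the point being made above.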

The old IE behavior that you were investigating in your 2009 paper,
where you sniff beyond the first few bytes to find embedded HTML, is
eliminated with this sniffing algorithm. There is no case where you
would accidentally sniff something as scriptable, if you were
following the algorithm correctly.

Or am I missing something?

P.S.

Note also that I have previously defined what it means to be
"supported by the user agent":

"A valid media type is supported by the user agent if the user agent
has the capability to interpret a resource of that media type and
present it to the user."

http://mimesniff.spec.whatwg.org/#supported-by-the-user-agent

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Sniffing archives

2012-12-04 Thread Gordon P. Hemsley
On Tue, Dec 4, 2012 at 11:07 AM, Adam Barth  wrote:
> On Mon, Dec 3, 2012 at 11:59 PM, Julian Reschke  wrote:
>> On 2012-12-04 08:40, Adam Barth wrote:
>>> They might otherwise be treated as a type that can be displayed
>>> (rather than downloaded).  Also, some user agents treat downloads of
>>
>> Do you have an example for that case?
>>
>>> ZIP archives differently than other sorts of download (e.g., they
>>> might offer to unzip them).
>>
>> Out of curiosity: which?
>
> Safari.
>
> Adam

To be more specific:

(1) Safari doesn't appear to prompt the user for any downloads. It
just automatically downloads any file it can't handle.
(2) If you allow Safari to open "safe" files that it downloads, ZIP
appears to be one of them. Gzip and RAR, however, do not.

So this isn't the most convincing argument.

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Treating application/octet-stream as unknown for sniffing

2012-11-29 Thread Gordon P. Hemsley
On Thu, Nov 29, 2012 at 2:30 PM, Adam Barth  wrote:
> On Wed, Nov 28, 2012 at 10:30 PM, Gordon P. Hemsley  
> wrote:
>> Based on my reading of the source code, it seems that Gecko treats a
>> resource served as 'application/octet-stream' as an unknown type which
>> is sniffed as if no Content-Type was specified.
>>
>> Are there security implications with doing this?
>
> Yes, there are very large security consequences.  I'm sorry that I
> don't have time to respond to all of these threads in detail, but I'm
> worried that you don't understand the consequences of the changes
> you're proposing to this specification.
>
> I'm not sure how to help you succeed here, but tweaking things in the
> spec without a compelling reason for doing so is not likely to lead to
> a useful specification.  I spent a great deal of time and effort
> studying the behaviors of many user agents and of a massive amount of
> content on the web.  I'm certainly willing to believe that the spec
> can be improved, but if you don't understand these sorts of basic
> things about content sniffing, I worry that changes that you make to
> the spec won't be improvements.
>
> Adam

I and others have already made clear that I was misreading the Mozilla
source code.

I'm aware of the security implications of interpreting a resource as
something other than what the Content-Type header says. The whole
reason I sent the original e-mail was because I thought Mozilla was
sniffing "application/octet-stream" in a way that it shouldn't, and I
wanted to clarify whether there was something I was missing.

I think you need to tone down your worry about my changes to the spec.
If I didn't have concern for the security implications for a change, I
wouldn't be sending an e-mail to the list about them, would I?

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Sniffing archives

2012-11-29 Thread Gordon P. Hemsley
To be clear, I'm asking this because I would like to remove the
sniffing of archive types from the mimesniff spec if there aren't any
valid use cases.

On Wed, Nov 28, 2012 at 12:18 PM, Gordon P. Hemsley  wrote:
> The mimesniff spec currently includes signatures for ZIP, gzip, and
> RAR archive formats. However, no major browser seems to support them
> natively (they all prompt for download), and it's not clear whether
> the type detection is a product of the browser code or the OS, or
> whether it is used beyond choosing an appropriate file extension for
> the download.
>
> Are there any valid use cases for explicitly sniffing archive formats
> instead of letting them default to application/octet-stream like other
> binary files would? Note that Henri Sivonen has previously raised the
> issue that ZIP-based formats (like office suite documents), for
> example, would be misleadingly sniffed as ZIP files, and there is no
> easy way around that.
>
> --
> Gordon P. Hemsley
> m...@gphemsley.org
> http://gphemsley.org/ • http://gphemsley.org/blog/



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Treating application/octet-stream as unknown for sniffing

2012-11-29 Thread Gordon P. Hemsley
On Thu, Nov 29, 2012 at 12:57 PM, Boris Zbarsky  wrote:
> canPlayType is not called "against a file".  It's called with a single
> argument which is a string MIME type.  If you pass
> "application/octet-stream", it will return "".  Its behavior does not depend
> on any state of the element it's called on (like what it's actually pointing
> to, etc); only on the string passed in.
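
The distinction described above can be sketched in Python as two unrelated functions: one that looks only at a type string, and one that looks only at bytes. The names `can_play_type`, `sniff_media`, and `KNOWN_UNPLAYABLE` are hypothetical, the contents of the set are made up for illustration, and the "maybe" return value is a simplification of what a real user agent would report.

```python
from typing import Optional

# Hypothetical set of types this imaginary user agent knows it cannot render.
KNOWN_UNPLAYABLE = frozenset({"text/html", "application/zip"})


def can_play_type(mime_type: str) -> str:
    """Answer based on the type string alone; no resource is inspected."""
    if mime_type == "application/octet-stream" or mime_type in KNOWN_UNPLAYABLE:
        return ""
    return "maybe"  # simplified: a real UA distinguishes "maybe" and "probably"


def sniff_media(header: bytes) -> Optional[str]:
    """Answer based on the resource's first bytes alone; no label is consulted."""
    if header.startswith(b"OggS"):
        return "application/ogg"
    return None
```

In this sketch, `can_play_type("application/octet-stream")` is empty while a resource served with that label can still be sniffed as Ogg when it is actually loaded, so the two statements do not conflict.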

Oh, I see. My mistake. (One should never attempt to understand
something after 2 AM.)

So... are there any additional places where "application/octet-stream"
should be treated as if the media type was undefined? Or is this
conversation moot now?

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Treating application/octet-stream as unknown for sniffing

2012-11-29 Thread Gordon P. Hemsley
On Thu, Nov 29, 2012 at 3:02 AM, Boris Zbarsky  wrote:
> On 11/29/12 2:53 AM, Gordon P. Hemsley wrote:
>>
>> At one point it says, "The MIME type "application/octet-stream" with
>> no parameters is never a type that the user agent knows it cannot
>> render. User agents must treat that type as equivalent to the lack of
>> any explicit Content-Type metadata when it is used to label a
>> potential media resource."
>>
>> But later it says, "The canPlayType(type) method must return the empty
>> string if type is a type that the user agent knows it cannot render or
>> is the type "application/octet-stream";"
>
>
> What's the contradiction?  We have set S = { types the user agent knows it
> cannot render }.  We have set T = S union { application/octet-stream }
>
> What the above statements tell us so far is:
>
> 1)  T != S
> 2)  canPlayType(type) must return empty string for all types in T.
>
> But later on in the resource selection algorithm there are certain actions
> taken for elements of S only.
>
>
>> This seems to me to be unclear as to when sniffing of the audio/video
>> resource occurs, and what it is used for.
>
>
> It's used for actually showing the video even if it's sent as
> application/octet-stream.

The apparent contradiction occurs when, e.g., an Opus file is tagged
as "application/octet-stream".

If I understand correctly, a UA would return "" when canPlayType() is
called against such a file—but then the file would actually play
because it is later sniffed as "application/ogg".

Am I missing something?

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Treating application/octet-stream as unknown for sniffing

2012-11-29 Thread Gordon P. Hemsley
On Thu, Nov 29, 2012 at 2:32 AM, Boris Zbarsky  wrote:
> On 11/29/12 2:07 AM, Gordon P. Hemsley wrote:
>>
>> So perhaps a more useful question would be what to do in situations
>> like that—should mimesniff treat "application/octet-stream" as a type
>> "supported by the browser" for the purposes of sniffing images, audio
>> or video, fonts, or other media types?
>
>
> The way it works right now is that
> http://www.whatwg.org/specs/web-apps/current-work/#mime-types says:
>
>   The MIME type "application/octet-stream" with no parameters is never
>   a type that the user agent knows it cannot render. User agents must
>   treat that type as equivalent to the lack of any explicit
>   Content-Type metadata when it is used to label a potential media
>   resource.
>
> So for the purpose of sniffing media loads specifically, that type is
> treated just like no type at all.
>
> But first you have to know it's a media load.

Oh, this is probably the location where the HTML spec doesn't
currently, but eventually should, reference the "rules for sniffing
audio and video specifically" in mimesniff. (Is this where Opera
implements such rules?)

Is it just me (and my late-night reading), or is that section
contradictory on how to treat "application/octet-stream"?

At one point it says, "The MIME type "application/octet-stream" with
no parameters is never a type that the user agent knows it cannot
render. User agents must treat that type as equivalent to the lack of
any explicit Content-Type metadata when it is used to label a
potential media resource."

But later it says, "The canPlayType(type) method must return the empty
string if type is a type that the user agent knows it cannot render or
is the type "application/octet-stream";"

This seems to me to be unclear as to when sniffing of the audio/video
resource occurs, and what it is used for.

>> I imagine this ties in, too, to the issues with sniffing CSS files
>> that have been raised elsewhere:
>>
>> https://bugzilla.mozilla.org/show_bug.cgi?id=560388
>> https://bugzilla.mozilla.org/show_bug.cgi?id=562377
>
> Neither one of those has anything to do with application/octet-stream as far
> as I can tell.  Those cover cases in which data is sent with either no
> Content-Type header or with such a header which can't even be parsed as
> "major/minor".  Neither of which is true if the data says
> "application/octet-stream".

I was grouping them together because they both rely on context clues
for modifying the sniffing (fallback) behavior, but we can discuss
them separately if that's easier.

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Treating application/octet-stream as unknown for sniffing

2012-11-28 Thread Gordon P. Hemsley
On Thu, Nov 29, 2012 at 1:30 AM, Gordon P. Hemsley  wrote:
> Based on my reading of the source code, it seems that Gecko treats a
> resource served as 'application/octet-stream' as an unknown type which
> is sniffed as if no Content-Type was specified.

Oh, wait, I forgot what I was reading—Gecko does this specifically in
the context of sniffing for an audio or video resource. So, if a
resource tagged as 'application/octet-stream' is included in <audio>
or <video>, for example, it will be treated as unknown for the
purposes of identifying its true nature. This never follows a path of
scriptable privilege escalation, AFAICT.

So perhaps a more useful question would be what to do in situations
like that—should mimesniff treat "application/octet-stream" as a type
"supported by the browser" for the purposes of sniffing images, audio
or video, fonts, or other media types?

I imagine this ties in, too, to the issues with sniffing CSS files
that have been raised elsewhere:

https://bugzilla.mozilla.org/show_bug.cgi?id=560388
https://bugzilla.mozilla.org/show_bug.cgi?id=562377
https://bugzilla.mozilla.org/show_bug.cgi?id=808593

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


[whatwg] [mimesniff] Treating application/octet-stream as unknown for sniffing

2012-11-28 Thread Gordon P. Hemsley
Based on my reading of the source code, it seems that Gecko treats a
resource served as 'application/octet-stream' as an unknown type which
is sniffed as if no Content-Type was specified.

Are there security implications with doing this? Or should I add
'application/octet-stream' to the list of unknown types that currently
includes 'unknown/unknown', 'application/unknown', and '*/*' (step 2
of the "media type sniffing algorithm")? Or, given that that step
calls the "rules for identifying an unknown media type" with the
sniff-scriptable flag set, should it get its own call, with the
sniff-scriptable flag unset? Are there other options here?

I haven't checked what UAs actually do in practice, but I don't
believe the spec currently allows anything but leaving resources
tagged as 'application/octet-stream' as they are.
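
As a rough illustration of the current behavior described above, the check in step 2 might look like the following Python sketch. The name `sniff_as_unknown` is hypothetical, and treating a missing Content-Type the same as the listed types is an assumption of this sketch.

```python
from typing import Optional

# The three "unknown" types named in the message above.
UNKNOWN_TYPES = frozenset({"unknown/unknown", "application/unknown", "*/*"})


def sniff_as_unknown(supplied_type: Optional[str]) -> bool:
    """Return True if the resource should go through unknown-type sniffing."""
    if supplied_type is None:
        # Assumption: no Content-Type at all is handled like an unknown type.
        return True
    return supplied_type.strip().lower() in UNKNOWN_TYPES
```

With this set, 'application/octet-stream' is left as labeled; adding it to `UNKNOWN_TYPES` is the change being asked about.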

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Audio and video sniffing

2012-11-27 Thread Gordon P. Hemsley
Done: 
https://github.com/whatwg/mimesniff/commit/77ee676c8852f4e76facd7d6c1174ac0ec41696e

Note that this also affects the "media type sniffing algorithm" and
the "rules for identifying an unknown media type".

On Tue, Nov 27, 2012 at 12:51 AM, Simon Pieters  wrote:
> On Mon, 26 Nov 2012 23:38:02 +0100, Gordon P. Hemsley 
> wrote:
>
>> Upon looking through the code for Gecko's media sniffing, I noticed
>> that they seem to combine sniffing for audio and video elements. Given
>> that Opera has said that it uses the specific sniffing algorithms, and
>> that some media containers (like Ogg) can be used for either audio or
>> video, I wonder if it would make sense to combine audio and video
>> sniffing under a single audiovisual category? This would affect the
>> "matching audio/video type pattern" sections and the "sniffing
>> audio/video specifically" sections.
>>
>> Any objections? Other thoughts?
>
>
> Yes, I think it makes sense to have the same sniffing for both. <audio> is
> like <video> without the rendering area.
>
> --
> Simon Pieters
> Opera Software



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Handling container formats like Ogg

2012-11-27 Thread Gordon P. Hemsley
On Tue, Nov 27, 2012 at 4:39 AM, Henri Sivonen  wrote:
> On Tue, Nov 27, 2012 at 12:59 AM, Gordon P. Hemsley  
> wrote:
>> Would this be something UAs would prefer to handle in their Ogg
>> library, or should I spec it as part of sniffing?
>
> What would be the use case for handling it as part of sniffing layer?

I don't know; that's why I'm asking! :)

Is it sufficient to sniff just for "application/ogg" and then let the
UA's Ogg library determine whether or not the contents of the file can
be handled? (I'm sensing the consensus is yes.)

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


[whatwg] [mimesniff] Handling container formats like Ogg

2012-11-26 Thread Gordon P. Hemsley
Container formats like Ogg can be used to store many different audio
and video formats, all of which can be identified generically as
"application/ogg". Determining which individual format to use (which
can be identified interchangeably as the slightly-less-generic
"audio/ogg" or "video/ogg", or using a 'codecs' parameter, or using a
dedicated media type) is much more complex, because they all use the
same "OggS" signature. It would require actually attempting to parse
the Ogg container to determine which audio or video format it is using
(perhaps not dissimilar to what is done for MP4 video and what might
have to be done with MP3 files without ID3 tags).

Would this be something UAs would prefer to handle in their Ogg
library, or should I spec it as part of sniffing?
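
To give a sense of what "parsing the container" would involve, here is a minimal Python sketch that looks past the shared "OggS" signature into the first page's payload. The function name `first_codec` and the `CODEC_MAGIC` table are hypothetical; the sketch assumes the codec identification header sits at the start of the first page, ignores multiplexed streams, and does not verify the page checksum.

```python
from typing import Optional

# Identification prefixes that open the first packet for a few codecs.
CODEC_MAGIC = {
    b"\x01vorbis": "vorbis",
    b"OpusHead": "opus",
    b"\x80theora": "theora",
}

OGG_FIXED_HEADER_LEN = 27  # capture pattern through the segment-count byte


def first_codec(data: bytes) -> Optional[str]:
    """Return the codec named in the first Ogg page, or None if unrecognized."""
    if len(data) < OGG_FIXED_HEADER_LEN or not data.startswith(b"OggS"):
        return None
    segment_count = data[26]  # number of entries in the segment table
    payload = data[OGG_FIXED_HEADER_LEN + segment_count:]
    for magic, name in CODEC_MAGIC.items():
        if payload.startswith(magic):
            return name
    return None
```

Even this simplified version needs more than a fixed byte pattern at offset zero, which is why it may fit better in an Ogg library than in the sniffing rules.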

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


[whatwg] [mimesniff] Audio and video sniffing

2012-11-26 Thread Gordon P. Hemsley
Upon looking through the code for Gecko's media sniffing, I noticed
that they seem to combine sniffing for audio and video elements. Given
that Opera has said that it uses the specific sniffing algorithms, and
that some media containers (like Ogg) can be used for either audio or
video, I wonder if it would make sense to combine audio and video
sniffing under a single audiovisual category? This would affect the
"matching audio/video type pattern" sections and the "sniffing
audio/video specifically" sections.

Any objections? Other thoughts?

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


[whatwg] [mimesniff] The X-Content-Type-Options header

2012-11-16 Thread Gordon P. Hemsley
https://www.w3.org/Bugs/Public/show_bug.cgi?id=19865

Microsoft introduced the X-Content-Type-Options header in IE8 back in 2008:

http://blogs.msdn.com/b/ie/archive/2008/09/02/ie8-security-part-vi-beta-2-update.aspx

I would like to integrate the header into mimesniff and describe its
proper usage.

Right now, it allows one parameter: 'nosniff'. I would like to allow
the presence of this parameter to set the 'no-sniff flag' that I just
introduced into mimesniff (in addition to that flag's existing
duties):

http://mimesniff.spec.whatwg.org/#no-sniff-flag

But I would also like to fully spec the header, while leaving open the
possibility that other values may be added in the future.

In addition, I would like to, if I could, also allow the header to be
specified without the 'X-' prefix (so as 'Content-Type-Options'), for
that reason (and because of best current practice).
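
A minimal Python sketch of the proposed behavior follows. The function name `no_sniff_flag` is hypothetical, the unprefixed 'Content-Type-Options' name is only the proposal made above (not an existing header), and ignoring unrecognized values is an assumption meant to leave room for future additions.

```python
from typing import Mapping

# Header names to honor; the second one is the proposal above, not current practice.
HEADER_NAMES = ("x-content-type-options", "content-type-options")


def no_sniff_flag(headers: Mapping[str, str]) -> bool:
    """Return True if a recognized header carries the 'nosniff' value."""
    for name, value in headers.items():
        if name.lower() in HEADER_NAMES and value.strip().lower() == "nosniff":
            return True
    # Unrecognized values are ignored so that new ones can be added later.
    return False
```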

Does anyone have any questions, comments, or objections about this issue?

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] Proposal for a debugging information API

2012-11-14 Thread Gordon P. Hemsley
Recent blog posts that coincidentally may be useful in this discussion:

http://vocamus.net/dave/?p=1532
http://www.twobraids.com/2012/11/socorro-as-service.html

On Thu, Nov 15, 2012 at 12:07 AM, David Barrett-Kahn  wrote:

> Hi whatwg.  I have a proposal for a new web standard, and would value your
> feedback.  This is based on my experiences working on Google Docs, which
> has a well developed ability to send crash reports back to the server for
> analysis.  We often find these crash reports to be lacking in crucial
> information though, because that information is not available on the JS
> APIs.
>
> My proposal is to have a class of information which can be made available
> to an app only after the display of a generic 'this application has
> crashed' dialog, which could be drilled into to show what is being
> disclosed, and which of course can be denied.
>
> Good examples of the information in question are the system's precise
> hardware and network configuration, what Chrome extensions it has
> installed, and perhaps a screenshot of the failed application.
>
> I've fleshed this out in the following document, and would value opinions
> on the value of a feature of this kind, and the merits of this particular
> approach.
>
>
> https://docs.google.com/document/pub?id=1pw2Bzvy6OEn8YY3fAcZiReJPmgB79swkx-NJAdcemPk
>
> Thanks!
>
> -Dave
>



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Review requested on MIME Sniffing Standard

2012-11-12 Thread Gordon P. Hemsley
On Mon, Nov 12, 2012 at 6:08 PM, Ian Hickson  wrote:
> On Mon, 12 Nov 2012, Gordon P. Hemsley wrote:
>> But if everyone vows to just wait for 512 bytes (or EOF), then that's
>> fine with me.
>
> I don't think we should require tools to wait for 512 bytes. This is an
> area where if we have the requirement, some user agents are just going to
> have a timeout anyway and ignore the spec; we gain nothing by making it
> non-conforming to have a timeout.

I'm inclined to agree with you, but I'm curious what other
implementers have to say on the issue.

>> > What are the use cases for ‘Sniffing archives specifically’?
>>
>> No idea. I only included it for completeness.
>
> Please don't spec things for completeness without use cases. :-)

In that case, I need to know which you think you might want for HTML
and which you know you won't. (I don't know of any other specs reliant
on mimesniff.)

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Review requested on MIME Sniffing Standard

2012-11-12 Thread Gordon P. Hemsley
On Mon, Nov 12, 2012 at 10:06 AM, Henri Sivonen  wrote:
> Resending feedback previously written at
> https://bugzilla.mozilla.org/show_bug.cgi?id=808593#c10 :
>
> I think the bits ‘type is equal to "font" or’ and ‘type is equal to
> "archive" or’ are highly questionable. The most popular font types are
> in the process of getting application/ types and the most popular
> archives already have application/ types.

Buzzkill. ;(

> I suspect the ‘a reasonable amount of time has elapsed, as determined
> by the user agent.’ is unnecessary. The HTML spec has the same
> provision for the <meta> prescan. Firefox didn’t implement it, a
> couple of people complained, then fixed their code, and the sky didn’t
> fall.

This line was present in a previous draft of the spec, as was the
seeming allowance to begin matching the resource header before it had
finished loading. For simplicity in the algorithm, I removed the
latter, so I left the former in as an escape hatch for those who
wanted to emulate that behavior.

But if everyone vows to just wait for 512 bytes (or EOF), then that's
fine with me.
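
For reference, "wait for 512 bytes (or EOF)" amounts to something like the following Python sketch. The function name `read_resource_header` is hypothetical, and the timeout escape hatch discussed above is deliberately left out.

```python
import io
from typing import BinaryIO


def read_resource_header(stream: BinaryIO, limit: int = 512) -> bytes:
    """Read until `limit` bytes have arrived or the stream ends."""
    chunks = []
    remaining = limit
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # EOF before the limit was reached
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

For example, `read_resource_header(io.BytesIO(b"abc"))` returns the three available bytes and stops at EOF instead of blocking for more.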

> What are the use cases for ‘Sniffing archives specifically’?

No idea. I only included it for completeness.

The 'rules for sniffing * specifically' are intended as hooks for
other specs to tie into. If no spec requires you to implement it, then
you have no need to implement it. HTML uses 'rules for sniffing images
specifically' (and 'rules for distinguishing if a resource is text or
binary'), and I imagine it could also find uses for 'rules for
sniffing audio specifically' and 'rules for sniffing video
specifically' (and maybe even 'rules for sniffing fonts
specifically').

> It
> appears that it sniffs ODF-style files
> (http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part3.html#__RefHeading__752809_826425813
> ; EPUB, ODF, InDesign, etc.) and Open Packaging Conventions-based
> files (https://en.wikipedia.org/wiki/Open_Packaging_Conventions ;
> OOXML, XPS, etc.) files as zip archives. Is that intended and a
> desirable outcome in the light of use cases? (In general, it would be
> easier to review if the spec makes sense if the use cases and callers
> of various sniffing functions were known.)

I don't think that's intended, but I don't know. The selection of
which bytes to sniff predates me, and I don't know what the use cases
are.
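
To make the concern concrete, here is a small Python sketch of why ZIP-based container formats all look like plain zip archives to a signature check. It assumes the check is a match on the ZIP local file header magic ("PK" 03 04); the helper names and the sample containers are hypothetical and built in memory.

```python
import io
import zipfile

ZIP_SIGNATURE = b"PK\x03\x04"  # local file header magic of a non-empty ZIP


def looks_like_zip(header: bytes) -> bool:
    """Signature-only check: says nothing about what the archive contains."""
    return header.startswith(ZIP_SIGNATURE)


def make_container(mimetype: str) -> bytes:
    """Build an ODF/EPUB-style container: first entry is an uncompressed
    file named 'mimetype' holding the real media type."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("mimetype", mimetype)
    return buf.getvalue()


epub_like = make_container("application/epub+zip")
plain_zip = make_container("not a real type")

# Both containers pass the same signature test, so a byte-pattern
# sniffer cannot tell them apart without reading further.
both_match = looks_like_zip(epub_like[:512]) and looks_like_zip(plain_zip[:512])
```

Distinguishing the two would require looking past the signature, for example at the first entry's name and contents, which is a different kind of sniffing from a fixed byte pattern.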

> Otherwise, looks good to me.

Thanks for the review!

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


[whatwg] [mimesniff] Review requested on MIME Sniffing Standard

2012-11-05 Thread Gordon P. Hemsley
Hey all,

As you might have heard, I have taken over editorship of the MIME Sniffing
Standard from Adam Barth.

As a first step in my editorship, I have taken the opportunity to rewrite
the document in a more procedural and modular way (IMO). The content and
meaning itself is not supposed to have changed, and I need your help to
verify that that is the case:

http://mimesniff.spec.whatwg.org/

In addition, this now means that I am open to hearing your suggestions
about how to improve the document beyond its current (i.e. former)
semantics.

You can file bugs here:

https://www.w3.org/Bugs/Public/enter_bug.cgi?product=WHATWG&component=MIME

As this document was originally an IETF document, there are also old issues
here:

http://trac.tools.ietf.org/wg/websec/trac/query?component=mime-sniff

It's not clear to me which of those remain outstanding on the current
version of the document, and it would be helpful to me if individuals with
a vested interest in them could migrate them to Bugzilla (with updated
descriptions that reflect the current state of the document). This will
ensure that I address them in a timely manner.

Also, it would be helpful if you could mark them as blocking the general
bug here:

https://www.w3.org/Bugs/Public/show_bug.cgi?id=19746

And if you want to follow the commits as they happen, you can follow
@mimesniff on Twitter:

https://twitter.com/mimesniff

Thanks!

Gordon

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


[whatwg] [wiki] The WHATWG Wiki has been upgraded

2012-10-28 Thread Gordon P. Hemsley
For those who missed the announcement on IRC and Twitter last week:

The WHATWG Wiki has been upgraded to MediaWiki 1.19.2:

http://wiki.whatwg.org/

This update brings with it a lot of the changes you're probably
already used to from Wikipedia, including the new Vector theme.

Over the many years since the WHATWG Wiki was first set up, a lot of
cruft has accumulated in its configuration files. I have attempted to
remove a lot of that, in order to allow the modern default values to
come through. I don't know if this will have much effect on the everyday
use of the wiki, but I thought I'd let you know.

In addition to the primary software update, I have also installed a
number of extensions, and these will have an effect on your use of the
wiki.

There are three extensions that I want to bring your attention to specifically.

The first one is ParserFunctions, which allows you to use some logical
functions in pages to (for example) create conditional output. This is
most useful, IMO, in templates, so you can condition the display of
the template based on the presence, absence, or value of template
parameters. See [[Template:Obsolete]] for an example:

http://wiki.whatwg.org/wiki/Template:Obsolete

The second extension I want to bring your attention to is
SyntaxHighlight. This allows you to use the  element
in a wiki page to automatically highlight whatever source code you
include. Given who we are, I've set it up to assume the language you
are highlighting is 'html5', but you can also specify another language
using the 'lang' attribute. (Note: This is not the same 'lang'
attribute that you would normally find in HTML. It's looking for a
programming language, not a BCP47 language tag.)

And the third, and potentially the most useful, extension is Gadgets.
This extension allows any administrator to install JavaScript and CSS
gadgets directly onto the wiki, for use by all. I've installed a
subset of the gadgets installed on Wikipedia which I think are the
most useful. I've also turned many of them on by default; you can see
the full list of available gadgets (and edit your personal gadget
availability) by going to My preferences > Gadgets.

To see the full list of installed extensions, go to [[Special:Version]]:

http://wiki.whatwg.org/wiki/Special:Version

If you know of any useful extensions or gadgets that you think are
missing from the WHATWG Wiki, let me know and I'll be happy to install
them. And, as I am now the caretaker of the wiki (taking over, I
believe, from AryehGregor), let me know about any other wiki issues
you might have.

By the way, I think the wiki is a particularly useful place to store
information that might otherwise get lost in the shuffle of IRC logs
and e-mail archives, so if you have any such tidbits, head over to the
wiki and write them down!

If you don't yet have a wiki account, you'll have to ask someone for
help, as we've had some issues with spam accounts. But don't worry,
it's very simple to get help, as I've set it up so that any
autoconfirmed user can register an account. All they need is your
e-mail address and your desired username. If you just let out a cry on
IRC, someone should be able to help you, or you can contact one of the
permanent autoconfirmed users listed here:

http://wiki.whatwg.org/index.php?title=Special:ListUsers&group=autoconfirmed

Happy wikiing!

Gordon

P.S. If you think you should be a permanent autoconfirmed member (and
you're not), ping me on IRC or drop me a line off-list and I'll see
what I can do. ;)

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] base64 entities

2010-08-27 Thread Gordon P. Hemsley
On Fri, Aug 27, 2010 at 2:44 PM, Aryeh Gregor  wrote:
> > PHP offers no JS-string-literal-escape function. `addslashes` is very close,
> > but won't handle some cases with non-ASCII characters correctly. Better to
> > use `json_encode` to transfer the string, then write as text:
> >
> >    elmt.textContent =  > JSON_HEX_TAG); ?>
> >
> > (assuming innerText or Text Node backup for IE/older browsers.)
>
> Interesting, that's useful.  Too bad it only works in PHP 5.2 or higher.

PHP 5.2.0 came out in 2006. I don't see anything "too bad" about using
PHP 5.2 or higher with new technology.[1]
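
For anyone unfamiliar with the technique quoted above, here is a rough Python analogue. PHP's JSON_HEX_TAG option makes json_encode emit "<" and ">" as \u003C and \u003E; Python's json module has no such flag, so this sketch does the replacement by hand. The function name is made up for the example.

```python
import json


def js_string_literal(value: str) -> str:
    """Encode `value` as a JSON string that is safe to embed in a <script>.

    json.dumps escapes quotes, backslashes and (by default) non-ASCII
    characters; the two replace() calls mimic PHP's JSON_HEX_TAG.
    """
    encoded = json.dumps(value)
    return encoded.replace("<", "\\u003C").replace(">", "\\u003E")


literal = js_string_literal("</script>")
# The escaped form still decodes to the original string.
round_trip = json.loads(literal)
```

The point is that the markup parser never sees a literal "</script>" inside the string, while the script still receives the original text.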

Regards,
Gordon

[1] See also: http://gophp5.org/

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] Proposal: @srctype or @type on <iframe>

2010-07-13 Thread Gordon P. Hemsley
On Tue, Jul 13, 2010 at 3:26 AM, Boris Zbarsky  wrote:
> On 7/12/10 11:31 PM, Gordon P. Hemsley wrote:
>>
>> The particular use case that prompted me to think about this is
>> including a PDF via <iframe>. In Firefox (last I checked), one is
>> required to install a separate add-on in order to support in-browser
>> display of PDF files on Mac OS X, since there is no native or integrated
>> Adobe Reader support available.
>
> I'm pretty sure you can install the Adobe Reader plug-in on Mac if you want
> to.

Perhaps now, but that wasn't always the case—at least not for Firefox.
I admit that my experience is somewhat outdated. Installing the
third-party PDF viewer add-on is one of the first things I did, in a
"set it and forget it" kind of way. (Plus, I'm still on Tiger.)

But, again, the PDF example was just one possible use case. I'm sure
there are plenty of other file types that cause similar situations,
including the TIFF issue that I mentioned.

>> Without the add-on, the user will be prompted to download the PDF file
>
> Which is exactly what would happen for a type="application/pdf" iframe, no?
>  Silently not showing the content doesn't seem acceptable.
>
> -Boris
>
Well, the idea is to have the browser operate more intelligently than
that. The page in the iframe is (by definition) not the primary
document that the user is trying to load, so it shouldn't have the
power to steal the user's attention immediately upon page load. It would
be very disorienting, and would likely cause the user to lose their
train of thought.

I was thinking more along the lines of what Flashblock does or what happens
when the window in an <iframe> can't load: The content would be
replaced somehow by a message and a button/link to allow the user to
manually download the contents of the iframe, if they so choose. It
shouldn't make that decision for the user, as it's not the user's
fault that their browser does not support the format of some ancillary
document.

At least, that's how I see it.

Gordon

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] Proposal: @srctype or @type on <iframe>

2010-07-12 Thread Gordon P. Hemsley
Nils,

I don't hate the HTTP Content-Type header. In fact, I like it very much.

But this proposal was intended to guide the user agent before they
ever receive the HTTP Content-Type header. ;)

Cheers,
Gordon

On Tue, Jul 13, 2010 at 2:48 AM, Nils Dagsson Moskopp
 wrote:
> "Gordon P. Hemsley"  schrieb am Tue, 13 Jul 2010
> 02:31:19 -0400:
>
>> It should not be assumed that whatever resource included via <iframe>
>> is going to be of type 'text/html' or another easily parsable type.
>> Thus, it could be helpful for the author to give the user agent a
>> hint as to what type of document it is requesting be displayed
>> inline, and allow the user agent to choose not to display the
>> contents of the <iframe> if it feels it cannot support it.
>
> Have you thought of using HTTP Content-Type headers and classic MIME
> type handling to determine compatibility ?
>
>> […]
>>
>> Now, I'm not a spec implementor by any means, but I am a web author
>> and a web user, so I've been on both sides of this issue. And it
>> doesn't appear that it would be too complicated to extend the
>> existing support of @type.
>
> AFAIK, implementors could use HTTP Content-Type headers for the given
> purpose.
>
>> Thoughts?
>
> Why do you hate HTTP Content-Type headers ? ;)
>
>
> Cheers,
> Nils
>



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


[whatwg] Proposal: @srctype or @type on <iframe>

2010-07-12 Thread Gordon P. Hemsley
Hello all.

There are a number of attributes that are designed to give the user agent a
preview of what MIME type to expect for a referenced resource. (And there are
also attributes like @hreflang that preview other things.) And yet,
<iframe>, which has to load a full document, has no ability to allow the
user agent to determine compatibility.

Thus, I propose doing one of the following:
(1) add @srctype to <iframe>
(2) extend the meaning of @type that applies to , , and  to
apply to <iframe>, as well

I'm more inclined to believe that option (2) is the better option.

But now for the reasoning.

It should not be assumed that whatever resource included via <iframe> is
going to be of type 'text/html' or another easily parsable type. Thus, it
could be helpful for the author to give the user agent a hint as to what
type of document it is requesting be displayed inline, and allow the user
agent to choose not to display the contents of the <iframe> if it feels it
cannot support it.

The particular use case that prompted me to think about this is including a
PDF via <iframe>. In Firefox (last I checked), one is required to install a
separate add-on in order to support in-browser display of PDF files on Mac
OS X, since there is no native or integrated Adobe Reader support available.
Without the add-on, the user will be prompted to download the PDF file,
which can be very disconcerting if the user wasn't even expecting a PDF
file. And I'm sure there are plenty of other instances where this same
situation occurs. (TIFF files, perhaps? Like on the U.S. Patent Office's
website?)

Now, I'm not a spec implementor by any means, but I am a web author and a
web user, so I've been on both sides of this issue. And it doesn't appear
that it would be too complicated to extend the existing support of @type.

Thoughts?

Gordon

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] select element should have a required attribute

2010-06-18 Thread Gordon P. Hemsley
I'm not sure how you interpreted it, but I wanted to clarify, in case it wasn't
clear.

I'm pretty sure this person is asking why @required isn't allowed on
<select> elements.

As in:
http://dev.w3.org/html5/markup/forms-attributes.html#shared-form.attrs.required

I don't know what the exact reasoning is for it not being on there, nor do I
know exactly how @required is supposed to be enforced, but I do think that
the method suggested in the bug is a bad one. Sometimes, authors will
include an empty <option> on purpose in order to allow for an empty option
to be selected.

Thus, as you've said, Ash, there will always be some sort of value sent from
a <select> element. And, including the option of an empty string, I can't
think of any way that there wouldn't be a value sent.

Gordon

On Fri, Jun 18, 2010 at 7:04 AM, Ashley Sheridan
wrote:

>  On Fri, 2010-06-18 at 11:35 +0200, Mounir Lamouri wrote:
>
> Hi,
>
> I'm wondering why select element do not have a required attribute. It
> seems to be perfectly suitable. With the required attribute, select
> element would be able to suffer from being missing and the :required
> pseudo-class could apply.
>
> Is there a reason why the select element has no required attribute or
> it's only an omission?
>
> Related bug:http://www.w3.org/Bugs/Public/show_bug.cgi?id=9625
>
> Thanks,
> --
> Mounir
>
>
> Required as in it should always have a value sent? If so, then it always
> does. The default value for a select element is not an empty string as an
> <option> is always there (unless someone has been stupid enough to create an
> empty select list.)
>
> As such, some sort of value will always be sent.
>
>   Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>
>


-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] Is there a way to stop scrolling when pressing directional arrows?

2010-06-14 Thread Gordon P. Hemsley
For what it's worth, I am actually of the opposite opinion, Ash.

I like it when Flash steals the focus of the keyboard, and here's why:
Besides the arrow keys, which are available to everyone, I also use the
"Find As You Type" feature in Firefox. However, that usually means that I
can't play any HTML5 games that use letters as play keys. Because the HTML5
game usually doesn't steal the focus of the keyboard, typing a letter key
activates the FAYT feature and distracts me from the game.

With that being said, Bespin (from Mozilla Labs) uses <canvas>, and it has
no problem stealing the keyboard focus (with JavaScript) for most
keypresses.

Gordon

2010/6/14 Ashley Sheridan 

>  On Mon, 2010-06-14 at 13:38 -0600, Carlos Andrés Solís wrote:
>
> Hello! I've been noticing a problem in many HTML5 test apps, very
> especially games. When the directional arrow buttons are pressed, the screen
> scrolls. This is a problem that, as far as I know, Flash had solved by
> changing the focus of the application to the app. Is this doable in HTML5?
> - Carlos Solís
>
>
> I don't think it's something that was 'solved'  by Flash. To be honest, I'm
> often annoyed at the way Flash steals the focus of all my key presses making
> it almost impossible to navigate using only the keyboard.
>
> You could use Javascript to put the focus onto an object, capture all the
> key presses on that and return false for them all maybe.
>
>   Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>
>


-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] <% text %> and in corporate intranet html content

2010-02-15 Thread Gordon P. Hemsley
On Tue, Feb 9, 2010 at 10:05 PM, Biju  wrote:

> What should a user agent display when html content is...
>
> 
> <%@ page language="java" %>
> 
>
> At present IE and Safari display blank
>
> Firefox display <%@ page language="java" %>
>
> And for document.body.innerHTML browsers give
> Firefox --> <%@ page language="java" %>
> IE --><%@ page language="java" %>
> and Safari gives blank
>
> Also for
> 
> 
> 
>
> Firefox gives blank
>
> But for
> 
> abc "  ?> xyz
> 
>
> Firefox display...
> abc " ?> xyz
>
> ie, all the contents after first ">"
> with .innerHTML --> abc "  ?> xyz
>
> IE in this case again hide all content till "?>"
> as well as preserve content including the white space in innerHTML
>
> Due to these problems browsing corporate intranet with Firefox is
> little irritating.
> Calling help desk and asking to provide fix will get a reply that
> company has standardized on IE6, so please use IE.
>
>
> So per HTML standard in both case what should user agent display and
> as well as content of .innerHTML
>
> Thanks
> Biju
>

For what it's worth, I filed a Mozilla bug on a similar issue, and it was
marked INVALID.

https://bugzilla.mozilla.org/show_bug.cgi?id=477455
Parser does not wait for "?>" to close blocks that begin with "

http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] the cite element

2009-10-06 Thread Gordon P. Hemsley
On Tue, Oct 6, 2009 at 4:15 PM, Erik Vorhes  wrote:
> On Tue, Oct 6, 2009 at 2:52 PM, Gordon P. Hemsley  wrote:
>> I also propose allowing parenthetical citations and footnote markers
>> (as is used in the various W3C/WHATWG specifications) to also be
>> marked up with <cite>, though I'm not sure if TabAtkins agrees with me
>> on that point.
>
> I suppose  allows for more functionality in current UAs, but this
> is an interesting proposition, especially if there were a way to
> crosslink  used in this way to the original source (or whatever
> it would point to). Would it be something along the lines of  for="aside-id">, or did you have something else in mind?
>
> Erik

Hmm... I hadn't given much thought to the implementation of that, as I
was more worried about the other part of the debate, but I think
treating  as analogous to  in that situation is indeed a
good idea.

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] the cite element

2009-10-06 Thread Gordon P. Hemsley
(I'm ignoring all of the unproductive back-and-forth that has occurred
thus far. This is meant to start the discussion off fresh.)

I was discussing the <cite> element with TabAtkins on IRC and I
proposed analyzing the actual word 'cite'. Using it as a verb, the
definition of 'cite' applies to quotes/quotations, titles, and people,
depending on the context. TabAtkins noted that the first use case is
so far off of legacy implementations, that it wouldn't even be worth
considering for <cite> (especially because we have other elements that
function as such).

That leaves usages of 'cite' for both titles of works and authors of
works. Putting aside the issue of styling for a moment, these two
pieces of data both fall under the semantic meaning of 'cite'. Thus,
they should fall under the semantic meaning of <cite>. If an author
should have the need to differentiate between the two, I propose that
they use  and .

Thus, I propose the following (which TabAtkins generally agrees with):

Leave the default styling of <cite> to be italicized for legacy
implementations and allow any reference to any work or author, with
the granularity decided by the individual web developer.

I also propose allowing parenthetical citations and footnote markers
(as is used in the various W3C/WHATWG specifications) to also be
marked up with <cite>, though I'm not sure if TabAtkins agrees with me
on that point.

I hope this message can help bring the discussion back to a neutral
zone that will lead to an amicable resolution of this long debate.

Regards,
Gordon

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] [html5] r4029 - [e] (0) Example of use without .

2009-09-29 Thread Gordon P. Hemsley
Ah. I was afraid you might say that.

On Tue, Sep 29, 2009 at 6:54 PM, Ian Hickson  wrote:

> On Tue, 29 Sep 2009, Gordon P. Hemsley wrote:
> >
> > s/Html/html/
>
> Actually that was intentional in that example. I like to show a variety of
> syntaxes so that people can see that they can do whichever one they
> prefer.
>
> --
> Ian Hickson   U+1047E)\._.,--,'``.fL
> http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] [html5] r4029 - [e] (0) Example of use without .

2009-09-29 Thread Gordon P. Hemsley
s/Html/html/

On Tue, Sep 29, 2009 at 4:30 AM, Simon Pieters  wrote:

> On Tue, 29 Sep 2009 07:57:21 +0200,  wrote:
>
>  Author: ianh
>> Date: 2009-09-28 22:57:20 -0700 (Mon, 28 Sep 2009)
>> New Revision: 4029
>>
>> Modified:
>>   index
>>   source
>> Log:
>> [e] (0) Example of  use without .
>>
>> Modified: index
>> ===
>> --- index   2009-09-29 02:41:23 UTC (rev 4028)
>> +++ index   2009-09-29 05:57:20 UTC (rev 4029)
>> @@ -13031,7 +13031,60 @@
>>  
>> +  
>> +   Here is a graduation programme with two sections, one for the
>> +   list of people graduating, and one for the description of the
>> +   ceremony.
>> +
>> +   <!DOCTPE Html>
>>
>
> s/DOCTPE/DOCTYPE/
>
> --
> Simon Pieters
> Opera Software
>



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] article/section/details naming/definition problems

2009-09-16 Thread Gordon P. Hemsley
I'd sent this earlier, but it got caught in the message queue that
apparently nobody checks. Let's see if it works this time.

---------- Forwarded message ----------
From: Gordon P. Hemsley 
Date: Tue, Sep 15, 2009 at 11:31 PM
Subject: Re: [whatwg] article/section/details naming/definition problems
To: whatwg List 


On Tue, Sep 15, 2009 at 9:08 PM, Ian Hickson  wrote:

> On Tue, 15 Sep 2009, Jeremy Keith wrote:
> > In that blog post, I point out that  and  were once
> more
> > divergent but have converged over time (since the @cite and @pubdate
> > attributes were dropped from ).
> >
> > I've also seen a lot of confusion from authors wondering when to use
> 
> > and when to use . Bruce wrote an article on HTML5 doctor
> recently to
> > address this:
> > http://html5doctor.com/the-section-element/
> >
> > Probably the best tutorial I've seen on this issue is from Ted:
> > http://edward.oconnor.cx/2009/09/using-the-html5-sectioning-elements
> >
> > ...but even so, the confusion remains. The very fact that tutorials are
> > required for what should be intuitive structural elements is worrying — I
> > don't see the same issues around ,  or  (now that
> the
> > content model has been changed) ...although there is continuing confusion
> > around .
>
> I'd like to rename <article>, if someone can come up with a better word
> that means "blog post, blog comment, forum post, or widget". I do think
> there is an important difference between a subpart of a page that is
> a potential candidate for syndication, and a subsection of a page that
> only makes sense with the rest of the page.
>

What about <item>? (Directly, it's a coincidence that RSS happens to have
the same-named element, as I just used a thesaurus. But perhaps [indirectly]
there's a reason RSS uses <item> to begin with. And, after all, it's
supposed to be used as a hint that it could be syndicated content, right?)

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/