On 11.11.2010 20:10, Alexey Proskuryakov wrote:

On 11.11.2010, at 9:19, Julian Reschke wrote:

As far as the Chromium request goes, please consider feature parity with 
Safari. We've supported non-ASCII file names in Content-Disposition for a while 
now, and judging by the lack of bug reports, our approach[*] is sufficient for 
Web compatibility. The only issue I know is with GMail, which blocks Safari 
server-side, replacing non-ASCII characters with question marks.

Do you have information on how frequently it's used?

Raw bytes seem to be the most common representation for non-ASCII file names on 
the Web. Implementing that fixed all bug reports I had about Web compatibility 
in that respect (except for GMail, of course), and didn't cause new ones. Some 
examples were Yahoo! Mail, several file sharing services, and several Korean 
forums.

This is not surprising, as that's the only way to make a download link that 
works in both IE and Firefox (at least for target audiences, see below).

Judging from

<http://greenbytes.de/tech/webdav/draft-ietf-httpbis-content-disp-03.html#rfc.section.C.4>

it's not supported in IE, Opera, and Konqueror, so it's definitely not 
interoperable today (besides, it conflicts with the existing definitions for 
the header).

The "Encoding Sniffing" column looks somewhat misleading to me, because browsers 
interpret raw bytes differently. I don't know if any browser "sniffs" encoding in the 
common sense of the word. But both IE and Firefox support raw bytes in Content-Disposition, 
although in different ways.

Oh, I called it "sniffing" because according to HTTP/1.1 it's ISO-8859-1, and some browsers "sniff" for different encodings.

IE: Uses the "Language for non-Unicode programs" setting. So with the system 
language set to Russian, Content-Disposition is interpreted as windows-1251 (Cyrillic). 
I'm not sure what it does if decoding fails.
Firefox: Tries UTF-8, then referring document's encoding, then Latin-1.
Safari: Tries UTF-8, then referring document's encoding, then browser default 
encoding, and then Latin-1, which can never fail.
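
The fallback orders listed above can be sketched roughly as follows. This is only an illustration of the "try encodings in order, end with Latin-1" idea, not any browser's actual code; the function name decode_raw_filename and the candidates parameter are hypothetical, and the caller is assumed to supply the referring document's encoding and browser default.

```python
def decode_raw_filename(raw: bytes, candidates: list[str]) -> str:
    """Decode raw header bytes by trying each candidate encoding in order.

    `candidates` is a hypothetical parameter: e.g. ["utf-8", referrer_encoding]
    for a Firefox-like order, with a browser default appended for a
    Safari-like order.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Bytes are invalid in this encoding, or the name is unknown.
            continue
    # Latin-1 maps every byte to a code point, so this step cannot fail.
    return raw.decode("latin-1")


# A Cyrillic name sent as windows-1251 is not valid UTF-8, so the
# second candidate is used.
raw = "отчёт.txt".encode("cp1251")
print(decode_raw_filename(raw, ["utf-8", "cp1251"]))
```

Note that a successful decode only means the bytes were valid in that encoding, not that the guess was right, which is why the order of candidates matters.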

Thanks for the details, my data in <http://greenbytes.de/tech/tc2231/#attwithisofnplain> and <http://greenbytes.de/tech/tc2231/#attwithutf8fnplain> was based on blackbox testing. The observable effect, from testing in a Western European locale, is that the UAs do not interoperate; some stick to 8859-1 (Konq, Opera, IE), some "sniff" (Safari, Chrome, FF).


IE's mechanism is obviously the weakest - it only works if the file name 
encoding happens to match the local user's default. But that's almost always the case 
for end users. Anyway, if a certain Content-Disposition with raw bytes works in 
IE _or_ Firefox, it's almost certain to work in Safari, too. If the link works 
in both, it's pretty certain to work in Safari.

Indeed; and I wasn't even aware of that because I'm testing with the locale I'm in.

I don't think the IETF will ever approve a standard where the encoding depends on the recipient's locale, with no reliable way to find out upfront what that locale is.

Having two sources of file name information in HTTP headers sounds like a very 
weird idea to me.
...

It's the format that has been an IETF standard for a VERY long time.

If you have concerns with this format then you *really* should raise them in 
the IETF HTTPbis WG, which is revising the spec for Content-Disposition, and 
plans to submit it for publication soon (it's already past IETF Working Group 
Last Call).

Perhaps I misunderstood your comment or was unclear myself - I don't have a 
strong opinion about RFC2231-style encoding. It seems cleaner than raw bytes, 
but with the de facto standard being raw bytes, it also seems superfluous.

I disagree that "raw bytes" are a de facto standard; they do not interoperate across UAs (see above)...

I would welcome it if the standard described what to do with raw bytes, because 
that's the practical case both browser and server developers need to work with. 
Obviously, I think that the Safari solution is best for browsers (with the possible 
addition of RFC2231/5988 support).

The spec (RFC 2616) already says that raw bytes are ISO-8859-1, so UAs overriding this are in violation of the spec (IMHO).

Introducing a separate parameter (filename*) that doesn't carry the legacy problems is in my opinion the best way to move forward.
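
For illustration, a server could emit both parameters along these lines. This is a minimal sketch, assuming the RFC 5987 charset'language'value form with UTF-8 and an empty language tag; the helper name content_disposition and the "?" replacement in the plain filename fallback are assumptions made for the example, not something the draft prescribes.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build a header value with a legacy `filename` and an RFC 5987 `filename*`."""
    # Legacy parameter: ASCII only, non-ASCII characters become "?".
    ascii_fallback = filename.encode("ascii", "replace").decode("ascii")
    # Escape backslashes and quotes for the quoted-string form.
    ascii_fallback = ascii_fallback.replace("\\", "\\\\").replace('"', '\\"')
    # Extended parameter: percent-encode the UTF-8 bytes. safe="" encodes
    # more characters than strictly required, which is permitted.
    encoded = quote(filename, safe="", encoding="utf-8")
    return f"attachment; filename=\"{ascii_fallback}\"; filename*=UTF-8''{encoded}"


print(content_disposition("€ rates.txt"))
# attachment; filename="? rates.txt"; filename*=UTF-8''%E2%82%AC%20rates.txt
```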

It would seem very weird and unfortunate to me if file names were looked up in both 
Content-Disposition and Link header fields. This is what I referred to as "two 
sources".

Ah, so that was a misunderstanding.

I was referring to the fact that "Link:" uses the same *encoding* (RFC 5987) for the "title" parameter (not "filename"). So if a UA were to process Link headers for, for instance, chapter titles, it could parse "title*" to discover I18Nized chapter titles.

So no overlap with C-D, except that maybe the library for decoding RFC5987-encoded parameters could be re-used.
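
A shared decoder of that kind might look roughly like the sketch below. It assumes the value has already been extracted from the header and has the charset'language'value shape; the name decode_ext_value is hypothetical, and a real implementation would also need to validate the charset and the percent-encoding more carefully.

```python
from urllib.parse import unquote


def decode_ext_value(ext_value: str) -> str:
    """Decode an RFC 5987 extended value such as UTF-8''%e2%82%ac%20rates."""
    # Split into charset, (possibly empty) language tag, and encoded value.
    charset, _language, encoded = ext_value.split("'", 2)
    # Percent-decode and interpret the bytes in the declared charset;
    # errors="strict" raises instead of silently substituting characters.
    return unquote(encoded, encoding=charset, errors="strict")


# The same function serves "filename*" in Content-Disposition and
# "title*" in Link.
print(decode_ext_value("UTF-8''%e2%82%ac%20rates"))   # € rates
print(decode_ext_value("iso-8859-1'en'%A3%20rates"))  # £ rates
```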

Best regards, Julian
