On 11.11.2010 20:10, Alexey Proskuryakov wrote:
On 11.11.2010, at 9:19, Julian Reschke wrote:
As far as the Chromium request goes, please consider feature parity with
Safari. We've supported non-ASCII file names in Content-Disposition for a while
now, and judging by the lack of bug reports, our approach[*] is sufficient for
Web compatibility. The only issue I know is with GMail, which blocks Safari
server-side, replacing non-ASCII characters with question marks.
Do you have information on how frequently it's used?
Raw bytes seem to be the most common representation for non-ASCII file names on
the Web. Implementing that fixed all bug reports I had about Web compatibility
in that respect (except for GMail, of course), and didn't cause new ones. Some
examples were Yahoo! Mail, several file sharing services, and several Korean
forums.
This is not surprising, as that's the only way to make a download link that
works in both IE and Firefox (at least for target audiences, see below).
Judging from
<http://greenbytes.de/tech/webdav/draft-ietf-httpbis-content-disp-03.html#rfc.section.C.4>
it's not supported in IE, Opera, and Konqueror, so it's definitely not
interoperable today (besides, it conflicts with the existing definitions for
the header).
The "Encoding Sniffing" column looks somewhat misleading to me, because browsers
interpret raw bytes differently. I don't know if any browser "sniffs" encoding in the
common sense of the word. But both IE and Firefox support raw bytes in Content-Disposition,
although in different ways.
Oh, I called it "sniffing" because according to HTTP/1.1 it's
ISO-8859-1, and some browsers "sniff" for different encodings.
IE: Uses "Language for non-Unicode programs" setting. So with the system
language set to Russian, Content-Disposition is interpreted as windows-1251 (Cyrillic).
I'm not sure what it does if decoding fails.
Firefox: Tries UTF-8, then referring document's encoding, then Latin-1.
Safari: Tries UTF-8, then referring document's encoding, then browser default
encoding, and then Latin-1, which can never fail.
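The Safari fallback order described above could be sketched roughly as follows. This is only an illustration of the order of attempts, not WebKit's actual code; the function name and its parameters are hypothetical.

```python
def decode_raw_filename(raw, referrer_encoding=None, default_encoding=None):
    """Hypothetical helper: try UTF-8, then the referring document's
    encoding, then the browser default, and finally Latin-1."""
    for enc in ("utf-8", referrer_encoding, default_encoding):
        if not enc:
            continue  # this step of the chain is not available
        try:
            return raw.decode(enc)  # strict decoding: fails on invalid bytes
        except (UnicodeDecodeError, LookupError):
            continue  # invalid bytes or unknown encoding label, try the next one
    # Latin-1 maps every byte to a code point, so this step cannot fail.
    return raw.decode("latin-1")


cyrillic = "тест".encode("windows-1251")
print(decode_raw_filename("тест".encode("utf-8")))
print(decode_raw_filename(cyrillic, referrer_encoding="windows-1251"))
print(decode_raw_filename(cyrillic))  # no hints: falls through to Latin-1
```

The Firefox order as described would be the same chain with default_encoding left out.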
Thanks for the details, my data in
<http://greenbytes.de/tech/tc2231/#attwithisofnplain> and
<http://greenbytes.de/tech/tc2231/#attwithutf8fnplain> was based on
blackbox testing. The observable effect, from testing in a Western
European locale, is that the UAs do not interoperate; some stick to
8859-1 (Konq, Opera, IE), some "sniff" (Safari, Chrome, FF).
IE's mechanism is obviously the weakest - it only works if the file name
encoding happens to match the local user's default. But that's almost always the case
for end users. Anyway, if a certain Content-Disposition with raw bytes works in
IE _or_ Firefox, it's almost certain to work in Safari, too. If the link works
in both, it's pretty certain to work in Safari.
Indeed; and I wasn't even aware of that because I'm testing with the
locale I'm in.
I don't think the IETF will ever approve a standard where the encoding
depends on the recipient's locale, with no reliable way to find out
upfront what that locale is.
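To illustrate the locale dependence, here is a small simulation of what a recipient sees when the raw bytes are decoded with a locale-specific code page. It is a sketch only: the function name is made up, and picking windows-1251 and windows-1252 as the Russian and Western European code pages is an assumption for the example.

```python
def decode_with_locale(raw, locale_encoding):
    """Hypothetical model of a UA that decodes raw header bytes with the
    recipient's system code page."""
    # errors="replace" is an assumption; it keeps the sketch from raising
    # when the bytes are not valid in the chosen code page.
    return raw.decode(locale_encoding, errors="replace")


# A server sends a Cyrillic file name as raw windows-1251 bytes.
raw = "тест.txt".encode("windows-1251")

print(decode_with_locale(raw, "windows-1251"))  # Russian system locale
print(decode_with_locale(raw, "windows-1252"))  # Western European system locale
```

The same header bytes yield the intended name in one locale and mojibake in the other, and the sender cannot tell which recipient it is talking to.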
Having two sources of file name information in HTTP headers sounds like a very
weird idea to me.
...
It's the format that has been an IETF standard for a VERY long time.
If you have concerns with this format then you *really* should raise them in
the IETF HTTPbis WG, which is revising the spec for Content-Disposition, and
plans to submit it for publication soon (it's already past IETF Working Group
Last Call).
Perhaps I misunderstood your comment or was unclear myself - I don't have a
strong opinion about RFC2231-style encoding. It seems cleaner than raw bytes,
but with the de facto standard being raw bytes, it also seems superfluous.
I disagree that "raw bytes" are a de facto standard; they do not
interoperate across UAs (see above)...
I would welcome it if the standard described what to do with raw bytes, because
that's the practical case both browser and server developers need to work with.
Obviously, I think that Safari's solution is best for browsers (with the possible
addition of RFC2231/5988 support).
The spec (RFC 2616) already says that raw bytes are ISO-8859-1, so UAs
overriding this are in violation of the spec (IMHO).
Introducing a separate parameter (filename*) that doesn't carry the
legacy problems is in my opinion the best way to move forward.
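For reference, a minimal sketch of what producing and consuming the filename* parameter could look like, using the RFC 5987 ext-value form (charset, optional language tag, percent-encoded value, separated by single quotes). The helper names are hypothetical, and the sketch assumes the ASCII fallback name contains no quotes or backslashes.

```python
from urllib.parse import quote, unquote


def encode_ext_value(value, charset="UTF-8", lang=""):
    # Percent-encode everything except unreserved characters; that is a
    # subset of what RFC 5987 allows unencoded, so it over-encodes safely.
    return f"{charset}'{lang}'{quote(value, safe='', encoding=charset)}"


def decode_ext_value(ext_value):
    # Split into charset, language tag (ignored here) and the encoded value.
    charset, _lang, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset, errors="strict")


def content_disposition(filename, ascii_fallback):
    # Send both parameters: legacy UAs read filename, newer ones filename*.
    # Assumes ascii_fallback needs no quoted-string escaping.
    return (f'attachment; filename="{ascii_fallback}"; '
            f"filename*={encode_ext_value(filename)}")


print(content_disposition("€ rates", "EURO rates"))
print(decode_ext_value("UTF-8''%e2%82%ac%20rates"))
```

Because the charset travels with the value, the recipient does not need to guess, regardless of its locale.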
It would seem very weird and unfortunate to me if file names were looked up in both
Content-Disposition and Link header fields. This is what I referred to as "two
sources".
Ah, so that was a misunderstanding.
I was referring to the fact that "Link:" uses the same *encoding* (RFC
5987) for the "title" parameter (not "filename"). So if a UA were to
process Link headers for, say, chapter titles, it could parse
"title*" to discover internationalized chapter titles.
So no overlap with C-D, except that maybe the library for decoding
RFC5987-encoded parameters could be re-used.
Best regards, Julian