And Clover wrote:
> Manlio Perillo wrote:
>
>> Words of *TEXT MAY contain characters from character sets other than
>> ISO-8859-1 [22] only when encoded according to the rules of RFC 2047
>
> Yeah, this is, unfortunately, a lie. The rules of RFC 2047 apply only to
> RFC*822-family 'atoms' and not elsewhere; indeed, RFC2047 itself
> specifically denies that an encoded-word can go in a quoted-string.
>
> RFC2047 encoded-words are not on-topic in an HTTP header(*); this has
> been confirmed by newer development work on HTTPbis by Reschke et al.
> (http://tools.ietf.org/wg/httpbis/).
>

Thanks.
HTTPbis seems to fix all these problems:

  "Historically, HTTP has allowed field content with text in the
   ISO-8859-1 [ISO-8859-1] character encoding and supported other
   character sets only through use of [RFC2047] encoding.  In practice,
   most HTTP header field values use only a subset of the US-ASCII
   character encoding [USASCII].  Newly defined header fields SHOULD
   limit their field values to US-ASCII characters.  Recipients SHOULD
   treat other (obs-text) octets in field content as opaque data."

This is the new rule for `quoted-string`:

  quoted-string  = DQUOTE *( qdtext / quoted-pair ) DQUOTE
  qdtext         = OWS / %x21 / %x23-5B / %x5D-7E / obs-text
                 ; OWS / <VCHAR except DQUOTE and "\"> / obs-text
  obs-text       = %x80-FF
  quoted-pair    = "\" ( WSP / VCHAR / obs-text )

> The "correct" way of escaping header parameters in an RFC*822-family
> protocol would be RFC2231's complex encoding scheme, but HTTP is
> explicitly not an 822-family protocol despite sharing many of the same
> constructs. See
> http://tools.ietf.org/html/draft-reschke-rfc2231-in-http-06 for a
> strategy for how 2231 should interact with HTTP, but note that for now
> RFC2231-in-HTTP simply does not exist in any deployed tools.
>

It seems reasonable.

> So for now there is basically nothing useful WSGI can do other than
> provide direct, byte-oriented (even if wrapped in 8859-1 unicode
> strings) access to headers.
>

Yes, this is what I think.
I have some doubts about wrapping the headers in 8859-1 unicode strings,
but luckily there is surrogateescape.


Regards
Manlio
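
P.S. To make the difference between the two wrappings concrete, here is a minimal sketch in Python 3. The helper names and the sample header value are hypothetical, made up only for illustration; nothing here is part of WSGI or of any server. It assumes the header arrives as raw bytes that happen to contain UTF-8, i.e. obs-text octets that should be treated as opaque data.

```python
# Hypothetical helpers, not part of WSGI or any server API.

def wrap_latin1(raw: bytes) -> str:
    """Wrap header bytes in a str, one code point (U+0000-U+00FF) per octet."""
    return raw.decode("iso-8859-1")

def unwrap_latin1(value: str) -> bytes:
    """Recover the original octets from a latin-1 wrapped header."""
    return value.encode("iso-8859-1")

def wrap_escaped(raw: bytes) -> str:
    """Decode as ASCII; non-ASCII octets become lone surrogates (PEP 383)."""
    return raw.decode("ascii", errors="surrogateescape")

def unwrap_escaped(value: str) -> bytes:
    """Recover the original octets from a surrogateescape'd header."""
    return value.encode("ascii", errors="surrogateescape")

# Made-up header value: the octets C3 AF are the UTF-8 encoding of U+00EF.
raw = b'attachment; filename="na\xc3\xafve.txt"'

as_latin1 = wrap_latin1(raw)
as_escaped = wrap_escaped(raw)

# Both wrappings are lossless: the original octets always come back.
assert unwrap_latin1(as_latin1) == raw
assert unwrap_escaped(as_escaped) == raw

# The latin-1 string looks like ordinary text, so it can be mistaken for
# already-decoded data and re-encoded into something other than the input.
assert as_latin1.encode("utf-8") != raw

# The surrogateescape string refuses to be treated as ordinary text.
try:
    as_escaped.encode("utf-8")
    leaked = True
except UnicodeEncodeError:
    leaked = False
```

Both forms give byte-oriented access, which is all that can be promised for now. The difference is only in how loudly a mistake shows up: with the 8859-1 wrapping the wrong text passes silently, while with surrogateescape an accidental encode fails immediately.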