Charsets in HTTP (was: the CGI.pm in Perl 6)

A. Pagaltzis Sat, 16 Sep 2006 10:38:12 -0700

* Darren Duncan <[EMAIL PROTECTED]> [2006-09-09 20:40]:
> 4.  Make UTF-8 the default HTTP response character encoding,
> and the default declared charset for text/* MIME types, and
> explicitly declare that this is what the charset is.  The only
> time that output should be anything else, even Latin-1, is if
> the programmer specifies such.


No, please don’t. For unknown MIME types, the charset should be
undeclared. In particular, `application/octet-stream` should
never have a charset forced on it if one is not assigned by the
client code explicitly. Likewise, for `application/xml` and
`application/*+xml`, a charset should NEVER be explicitly
declared, as XML documents are self-describing, whereas a charset
in the HTTP header overrides the encoding the document declares.
This is very unwise (cf. Ruby’s Postulate).
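
To make the policy above concrete, here is a minimal Python sketch of
how a response header might be built. The function name
`content_type_header` is hypothetical, and two choices are assumptions
rather than anything settled in this thread: the UTF-8 default for
`text/*` is taken from the quoted proposal, and a charset supplied for
an XML type is silently dropped.

```python
def content_type_header(mime_type, charset=None):
    """Return a Content-Type value; declare a charset only where it is safe."""
    mime_type = mime_type.strip().lower()

    # application/xml and application/*+xml: never declare a charset,
    # since the document's own XML declaration carries the encoding.
    is_xml = mime_type == "application/xml" or (
        mime_type.startswith("application/") and mime_type.endswith("+xml")
    )
    if is_xml:
        return mime_type  # assumption: a caller-supplied charset is ignored

    # The client code asked for a charset explicitly: honour it.
    if charset is not None:
        return f"{mime_type}; charset={charset}"

    # Assumption: text/* gets the UTF-8 default from the quoted proposal.
    if mime_type.startswith("text/"):
        return f"{mime_type}; charset=UTF-8"

    # Unknown or binary types stay undeclared.
    return mime_type
```

With this sketch, `application/octet-stream` and `image/png` come back
unchanged, while `text/html` gains `; charset=UTF-8`.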

> 5.  Similarly, default to trying to treat the HTTP request as
> UTF-8 if it doesn't specify a character encoding; fallback to
> Latin-1 only if the text parts of the HTTP request don't look
> like valid UTF-8.

This is not just unwise, it is actually wrong. Latin-1 is the
default for `text/*` MIME types if no charset is declared. Using
a different charset in violation of the HTTP RFCs is __BROKEN__.
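
As an illustration of that default, the following Python sketch picks
the charset for a request body from its `Content-Type` value. The
function name `request_charset` is hypothetical; the parsing relies on
the standard library's `email.message.Message`, and returning `None`
for non-text types without a declared charset is an assumption about
how a caller would want "undeclared" reported.

```python
from email.message import Message


def request_charset(content_type):
    """Return the charset to use for a body with this Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter always wins (returned lowercased).
    declared = msg.get_content_charset()
    if declared is not None:
        return declared

    # No charset declared: text/* falls back to Latin-1, not UTF-8.
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"

    # Assumption: other types are reported as undeclared.
    return None
```

Note that there is no sniffing step here: the body is never inspected
to guess whether it "looks like" UTF-8.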

In fact, now that I’m writing all this out, I am starting to
think that maybe CGI.pm6 should simply punt on charsets as CGI.pm
does. Otherwise, the code and API would have to be able to deal
with the full complexity of charsets in HTTP, and the docs would
have to explain it, which is no picnic at all.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
