On 2/1/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
...the only real question in my mind is what to do if user supplied data
has *NO* charset information of any kind ... for XML the spec seems very
clear that in that case you test for UTF-8 or UTF-16 ... but for arbitrary
streams of character data in other formats (CSV, JSON, etc...) it seems
like trusting the servlet container to tell us the default encoding is the
right way to go.

For XML, I think trusting the XML parser, and not the servlet
container, is the better way to go.
That means handing the XML parser an InputStream instead of a Reader.
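
A minimal sketch of that idea, using the JDK's built-in DOM parser. The
class and method names (XmlEncodingSketch, rootText) are made up for
illustration and are not from any existing code. The point is only that
the parser sees raw bytes, so it can look at the BOM and the XML
declaration itself instead of relying on a charset chosen upstream:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlEncodingSketch {
    // Hypothetical helper: parse from bytes (not a Reader) so the XML
    // parser performs its own encoding detection.
    static String rootText(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new RuntimeException("XML parse failed", e);
        }
    }

    public static void main(String[] args) {
        // The document declares ISO-8859-1 and its bytes really are ISO-8859-1.
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>caf\u00e9</a>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(new ByteArrayInputStream(latin1)));
    }
}
```

Had the same bytes been wrapped in a Reader with some container default
charset first, the declaration inside the document would no longer have
any say in how the bytes are decoded.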

There *is* one place I think we should use UTF-8 when there isn't a
charset specified:
a POST with "Content-Type: application/x-www-form-urlencoded".

a) You can't get browsers to put a charset there.
b) Browsers by default encode the form data in the charset of the form.
c) We know more than the servlet container in this instance... we know
   at least that our admin pages use UTF-8, and that a POST coming from
   them will be UTF-8.
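
Roughly what I have in mind, as a standalone sketch rather than actual
servlet code. FormCharsetSketch, charsetOf and parseForm are
hypothetical names; the only real API used is java.net.URLDecoder. If
the Content-Type carries a charset parameter we honor it, otherwise we
fall back to UTF-8:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormCharsetSketch {
    // Return the charset parameter of a Content-Type header, or UTF-8
    // when none is given (the fallback proposed above).
    static String charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    return p.substring(8).replace("\"", "").trim();
                }
            }
        }
        return "UTF-8";
    }

    // Decode an application/x-www-form-urlencoded body into name/value pairs.
    static Map<String, String> parseForm(String body, String contentType) {
        String cs = charsetOf(contentType);
        Map<String, String> params = new LinkedHashMap<>();
        try {
            for (String pair : body.split("&")) {
                if (pair.isEmpty()) continue;
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                params.put(URLDecoder.decode(name, cs), URLDecoder.decode(value, cs));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + cs, e);
        }
        return params;
    }

    public static void main(String[] args) {
        // No charset in the header, so the percent-escapes are read as UTF-8.
        System.out.println(parseForm("q=caf%C3%A9", "application/x-www-form-urlencoded"));
    }
}
```

Inside a servlet, the equivalent would presumably be to call
request.setCharacterEncoding("UTF-8") when getCharacterEncoding()
returns null, before any parameter is read.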

-Yonik
