On 2/1/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> ...the only real question in my mind is what to do if user supplied data has *NO* charset information of any kind ... for XML the spec seems very clear that in that case you test for UTF-8 or UTF-16 ... but for arbitrary streams of character data in other formats (CSV, JSON, etc...) it seems like trusting the servlet container to tell us the default encoding is the right way to go.
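The XML 1.0 spec (Appendix F) describes how a parser can guess an entity's encoding from its first few bytes when no external charset information is given. Below is a minimal Java sketch of a simplified subset of that detection. It assumes only byte-order marks and the `<?` pattern matter: UTF-32, EBCDIC, and reading the encoding declaration itself are left out. `XmlEncodingSniffer` and `sniff` are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class XmlEncodingSniffer {
    /**
     * Guess the encoding of an XML entity from its leading bytes, following a
     * simplified subset of XML 1.0 Appendix F (BOMs and the "<?" pattern only;
     * UTF-32 and EBCDIC are NOT handled). Falls back to UTF-8, the spec's
     * default for an entity with neither a BOM nor an encoding declaration.
     */
    public static Charset sniff(byte[] b) {
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;           // UTF-8 BOM
        if (startsWith(b, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;               // UTF-16 big-endian BOM
        if (startsWith(b, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;               // UTF-16 little-endian BOM
        if (startsWith(b, 0x00, 0x3C, 0x00, 0x3F)) return StandardCharsets.UTF_16BE;   // "<?" as UTF-16BE, no BOM
        if (startsWith(b, 0x3C, 0x00, 0x3F, 0x00)) return StandardCharsets.UTF_16LE;   // "<?" as UTF-16LE, no BOM
        return StandardCharsets.UTF_8;                                                 // spec default
    }

    // True if the first bytes of b equal the given unsigned byte values.
    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

A real parser goes one step further: once it has a provisional guess, it reads any `encoding="..."` declaration and switches accordingly. That only works if the parser sees the raw bytes rather than characters someone else already decoded.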
For XML, I think trusting the XML parser, and not the servlet container, is a better way to go. That means handing the XML parser an InputStream instead of a Reader.

There *is* one place I think we should use UTF-8 when there isn't a charset specified: a POST with "Content-Type: application/x-www-form-urlencoded".
a) You can't get browsers to put a charset there.
b) Browsers by default encode the form data in the charset of the form.
c) We know more than the servlet container in this instance... we know at least that our admin pages use UTF-8, and that a POST coming from them will be UTF-8.

-Yonik
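The policy proposed in this message can be sketched as a small decision function. This is an illustration only, not Solr code. It assumes the caller passes in the request's Content-Type header string and whatever default charset the servlet container would report. `RequestCharsetPolicy` and `choose` are hypothetical names.

A `null` result means "don't decode here." In that case the caller would hand the raw request stream (e.g. `HttpServletRequest.getInputStream()`) to something like `DocumentBuilder.parse(InputStream)`, so the XML parser detects the encoding itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class RequestCharsetPolicy {
    /**
     * Hypothetical sketch of the charset policy discussed in the thread.
     * Returns null for XML with no charset: leave decoding to the XML parser.
     * containerDefault stands in for the servlet container's default encoding.
     * Charset.forName throws for unknown or illegal charset names.
     */
    public static Charset choose(String contentType, Charset containerDefault) {
        if (contentType == null) return containerDefault;
        String[] parts = contentType.split(";");
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);

        // 1. An explicit charset parameter always wins.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String name = param.substring(eq + 1).trim().replace("\"", "");
                return Charset.forName(name);
            }
        }
        // 2. XML with no charset: hand raw bytes to the parser (signalled by null).
        if (mediaType.equals("text/xml") || mediaType.equals("application/xml")
                || mediaType.endsWith("+xml")) {
            return null;
        }
        // 3. Browsers won't send a charset on form posts; assume UTF-8 because
        //    that is what the admin pages use.
        if (mediaType.equals("application/x-www-form-urlencoded")) {
            return StandardCharsets.UTF_8;
        }
        // 4. Anything else (CSV, JSON, ...): trust the container's default.
        return containerDefault;
    }
}
```

Returning `null` instead of a guessed charset keeps the XML case from ever being wrapped in a Reader, which is the point of trusting the parser rather than the container.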