-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

André,

On 11/27/2009 11:05 AM, André Warnier wrote:
> A bit more detail : in java, if you open a text input stream without
> specifying in which encoding it is, it will default to the "platform"
> encoding, which in this case is the locale setting of the process which
> runs the Java JVM which runs tomcat.

Yes! This is likely to be half of the problem.
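
To make that concrete, here is a minimal sketch (assuming Java 8 or
later; the class and method names are made up for illustration) of
reading text with the charset named explicitly, so the platform default
never comes into play:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetReader {
    // Reads the first line of the stream, decoding with the charset the
    // caller names instead of whatever the platform default happens to be.
    static String readFirstLine(InputStream in, Charset charset) {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "\u00e9" is e-acute, written as an escape so this source file
        // does not itself depend on an encoding.
        byte[] utf8Bytes = "Andr\u00e9".getBytes(StandardCharsets.UTF_8);

        // What new InputStreamReader(in) would silently use:
        System.out.println("platform default: " + Charset.defaultCharset());

        // Explicit charset: same result on every machine.
        System.out.println(readFirstLine(
            new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8));
    }
}
```

The one-argument InputStreamReader constructor is the variant that falls
back to the platform encoding.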

> That applies also to webapps which read posted input, unless you are
> careful.

No! The default encoding for servlets is ISO-8859-1 unless the client
specifies the encoding in the request (which many do not). The value of
file.encoding is not used here, unless there is a horrendous bug in Tomcat.
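
The rule can be sketched in plain Java. This is only an illustration of
the fallback logic, not Tomcat's actual code; RequestCharset and
resolve() are hypothetical names, and falling back on an unknown charset
name is a choice made for this sketch:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class RequestCharset {
    // Hypothetical helper: returns the charset named in a Content-Type
    // header value, or ISO-8859-1 when the header is missing or carries
    // no usable charset parameter.
    static Charset resolve(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                                   .trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Illegal or unsupported name: fall through to
                        // the default (an assumption of this sketch).
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/xml; charset=UTF-8"));
        System.out.println(resolve("application/x-www-form-urlencoded"));
        System.out.println(resolve(null));
    }
}
```

Note that file.encoding appears nowhere in that decision. In a real
webapp, calling request.setCharacterEncoding("UTF-8") before reading any
parameters or the request body is the usual way to override the default
when the client is known to send UTF-8.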

> You will not see this issue with XML input, because XML contains either
> an explicit charset declaration, or defaults to UTF-8.

Yes!

> So the XML parser always knows.

Not always. If your webapp is already reading from an
incorrectly-encoded reader (say, because the client is supplying UTF-8
characters but didn't tell the server and the server is assuming
ISO-8859-1) and you pass that reader on to an XML parser, the XML parser
may die trying to read characters from the reader (it's actually the
reader that dies) or it may complain that an invalid character has been
read.
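
A small sketch of the difference, using only the JDK's built-in parser
(the class and method names are made up for illustration). When the
parser gets the raw bytes it can honor the XML declaration; when it gets
a Reader, the decoding has already happened and the declaration is
ignored:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class XmlEncodingDemo {
    // A UTF-8 document containing one non-ASCII character (e-acute).
    static final byte[] XML_UTF8 =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>Andr\u00e9</name>"
            .getBytes(StandardCharsets.UTF_8);

    // The parser sees raw bytes and works out the encoding itself.
    static String parseFromBytes(byte[] xml) {
        return parse(new InputSource(new ByteArrayInputStream(xml)));
    }

    // The bytes are decoded as ISO-8859-1 before the parser sees them,
    // which is what happens when a mis-configured reader is passed along.
    static String parseFromLatin1Reader(byte[] xml) {
        Reader reader = new InputStreamReader(
            new ByteArrayInputStream(xml), StandardCharsets.ISO_8859_1);
        return parse(new InputSource(reader));
    }

    private static String parse(InputSource source) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(source).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseFromBytes(XML_UTF8));
        System.out.println(parseFromLatin1Reader(XML_UTF8));
    }
}
```

With this particular input the second call does not fail at all: it
returns the name with two wrong characters in place of the e-acute,
which is arguably worse than an exception because nothing flags it.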

I predict the problem is:

1) The client is using a file.encoding that is /not/ ISO-8859-1
2) The client is not supplying the encoding in the Content-Type header
3) The server is drawing the conclusion that ISO-8859-1 should be used
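
If that prediction is right, the three steps can be reproduced without
any network at all. The sketch below assumes the client's encoding is
UTF-8 (a guess on my part; the names are hypothetical) and shows what
the server ends up with, plus a round trip that can help confirm the
diagnosis:

```java
import java.nio.charset.StandardCharsets;

class MismatchDemo {
    // Step 1: the client encodes text with its own default
    // (assumed to be UTF-8 for this sketch).
    static byte[] clientEncode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Steps 2 and 3: no charset arrives with the request, so the server
    // decodes the body as ISO-8859-1.
    static String serverDecode(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    // Diagnostic: ISO-8859-1 maps every byte to exactly one character,
    // so re-encoding recovers the original bytes, which can then be
    // decoded as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "Andr\u00e9";
        String received = serverDecode(clientEncode(sent));
        System.out.println(sent.length() + " chars sent, "
            + received.length() + " chars received");
        System.out.println("repaired equals sent: "
            + repair(received).equals(sent));
    }
}
```

If the garbled strings in the webapp come back clean after repair(),
the mismatch described above is very likely the cause. The real fix is
still to have the client send the charset, or to set it on the request
before reading.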

- -chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksUJ2AACgkQ9CaO5/Lv0PB1xACeL8lXhPhnH3Jv3dkDPgVyy4ry
9fYAnRCIvd9qeOkErvl+mRDSwyjdV8WC
=HZnr
-----END PGP SIGNATURE-----
