Hello,
I have recently completed the torturous process of translating my web-site into 16
European languages. Having had lots of advice from this list and other sources, I have
come to a few conclusions about what a Java / Tomcat web-site needs in order to
fully support UTF-8.
These are:
1.
JSP pages must include the header:
<%@ page
contentType="text/html; charset=UTF-8"
%>
2.
In catalina.bat (Windows), catalina.sh (Unix) or apache$jakarta_config.com (OpenVMS),
a switch must be added to the call to java (java.exe on Windows). The switch is:
-Dfile.encoding=UTF-8
This is a Java system property rather than an environment variable: it sets the JVM's
default character encoding. I could not find it documented anywhere, but it is essential.
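A quick way to confirm that the switch has taken effect is to print what the JVM reports. This is only a minimal sketch; the class name DefaultEncodingCheck and the describe() helper are made up for illustration, and on recent JVMs the default may already be UTF-8 without the switch.

```java
import java.nio.charset.Charset;

public class DefaultEncodingCheck {
    /** Returns a one-line summary of the JVM's encoding settings. */
    static String describe() {
        // The system property as the JVM sees it (set by -Dfile.encoding=...).
        String property = System.getProperty("file.encoding");
        // The charset used when no encoding is given explicitly,
        // e.g. by new String(bytes) or String.getBytes().
        Charset defaultCharset = Charset.defaultCharset();
        return "file.encoding=" + property
                + ", default charset=" + defaultCharset.name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it as "java -Dfile.encoding=UTF-8 DefaultEncodingCheck" and again without the switch shows whether the setting changes anything on your platform.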
3.
For inputs coming back from the browser there must be a method that converts them from
ISO-8859-1 to UTF-8. ISO-8859-1 seems to be what arrives in all regions: I have had
people in countries such as Greece and Bulgaria test this, and their input always came
back in ISO-8859-1. (As far as I can tell, this is because the server decodes request
parameters as ISO-8859-1 by default, so the method recovers the original bytes and
decodes them again as UTF-8.) The method, which you will use constantly, should go
something like this:
import java.io.UnsupportedEncodingException;

/**
 * Convert a string that was decoded as ISO-8859-1 (the default for input
 * sent by IE) to the UTF-8 format that the database is in.
 */
public String toUTF8(String isoString)
{
    String utf8String = null;
    if (null != isoString && !isoString.equals(""))
    {
        try
        {
            // Recover the raw bytes, then decode them as UTF-8.
            byte[] stringBytesISO = isoString.getBytes("ISO-8859-1");
            utf8String = new String(stringBytesISO, "UTF-8");
        }
        catch (UnsupportedEncodingException e)
        {
            // As we can't translate, just send back the best guess.
            System.out.println("UnsupportedEncodingException is: " + e.getMessage());
            utf8String = isoString;
        }
    }
    else
    {
        // Null or empty input is returned unchanged.
        utf8String = isoString;
    }
    return utf8String;
}
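To see why the conversion works, here is a small round-trip sketch. It assumes the garbling happens because UTF-8 bytes are decoded as ISO-8859-1 somewhere on the way in. The class MojibakeDemo and its misdecode() and repair() helpers are hypothetical names, and StandardCharsets (Java 7 and later) stands in for the charset-name strings so that no checked exception is needed.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    /** Simulates UTF-8 request bytes being decoded as ISO-8859-1. */
    static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Same repair as toUTF8 above: recover the bytes, decode as UTF-8. */
    static String repair(String misdecoded) {
        byte[] rawBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Greek "Kalimera" (8 characters), written with escapes so the
        // source file's own encoding does not matter.
        String original = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1";
        String garbled = misdecode(original);
        String repaired = repair(garbled);

        // Each Greek letter is 2 bytes in UTF-8, so the garbled form
        // has twice as many characters.
        System.out.println("lengths: " + original.length() + " -> " + garbled.length());
        // prints "lengths: 8 -> 16"
        System.out.println("round trip ok: " + original.equals(repaired));
        // prints "round trip ok: true"
    }
}
```

The repair is lossless because ISO-8859-1 maps every byte value to exactly one character, so the original bytes can always be recovered from the garbled string.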
I have found that these three steps are all that is necessary to make your site accept
any language that UTF-8 can work with. I extend my thanks to those of you on the
Tomcat users list who helped me find these little gems.
Kind regards,
Andoni.