Hi all.
I'm developing a web application that serves textual data for Central and Eastern European languages.
The text lives in a database which is internally Unicode; I also have another DB instance
which is ISO-8859-2 encoded, so all options are in play.
I thought I could set contentType="text/html; charset=ISO-8859-2", declare the page
encoding to match (just in case), and "sit back and enjoy myself". Unfortunately, I was wrong.
Not only is the Latin-2 support in both IE and Netscape buggy (they won't display
"s-caron" and "z-caron", but show the capital versions of those characters instead),
but Java is bugging me, too. Instead of the letters specific to our alphabet, I'm getting "?".
With the help of a dedicated PostgreSQL JDBC developer, I have tracked this problem
down to the JVM, whose default encoding is "ISO-8859-1". In a standalone Java
application I can do the conversion explicitly, like this:
System.out.write( testString.getBytes( "ISO-8859-2" ) );
and it will print the characters I expect, instead of "?".
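For reference, here is a minimal standalone sketch of that experiment (the class name and the test characters are my own; š is U+0161 and ž is U+017E, which map to bytes 0xB9 and 0xBE in Latin-2):

```java
import java.io.UnsupportedEncodingException;

public class Latin2Test {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // What the JVM uses when no encoding is given explicitly --
        // on my setup this reports ISO-8859-1
        System.out.println(System.getProperty("file.encoding"));

        // s-caron and z-caron, the characters that come out as "?"
        String testString = "\u0161\u017e";

        // Explicit Unicode -> Latin-2 conversion; yields bytes 0xB9 and 0xBE
        byte[] latin2 = testString.getBytes("ISO-8859-2");
        System.out.write(latin2, 0, latin2.length);
        System.out.flush();
    }
}
```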
What do I do in Tomcat?
I have set contentType to "text/html; charset=ISO-8859-2", and the generated servlet
code really does contain:
response.setContentType("text/html; charset=ISO-8859-2");
So, no trouble there. How do I get a (Unicode) string converted into an ISO-8859-2
encoded byte stream? Because that, eventually, is what the browser should receive. I
cannot use the method above, since JspWriter doesn't accept byte[] as an
argument.
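The only workaround I can think of (untested, and admittedly a hack -- the helper name is one I made up) is to exploit the default encoding instead of fighting it: convert the string to Latin-2 bytes, then wrap those bytes back into a String as if they were ISO-8859-1, so that the writer's implicit ISO-8859-1 encoding step emits the Latin-2 byte values unchanged:

```java
import java.io.UnsupportedEncodingException;

public final class Latin2Hack {
    // Re-encode so that a writer applying the JVM default ISO-8859-1
    // encoding will put the ISO-8859-2 byte values on the wire.
    public static String toLatin2(String s) throws UnsupportedEncodingException {
        return new String(s.getBytes("ISO-8859-2"), "ISO-8859-1");
    }
}
```

In the JSP that would be something like out.print(Latin2Hack.toLatin2(dbString)); -- ugly, and I'd still prefer a proper way to tell Tomcat which encoding its writer should use.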
Nix.