On Sat, 2003-02-22 at 17:45, THG wrote:
> with request.getParameter("test") you get the iso-8859-1 encoded utf-8 data (which 
> is encoded with us_ascii) - a double-encoded string.

Ok, I think I'm beginning to get it.  I did this in a JSP (with
charset=UTF-8):

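// The container decoded the request bytes as ISO-8859-1, one char per byte.
String parameter = request.getParameter("test");
// Map each char back to the byte it came from.
byte[] ba = parameter.getBytes("ISO-8859-1");
// Decode those bytes as what they really are: UTF-8.
String result = new String(ba, "UTF-8");
out.println("<p>This is the text you entered: '" + result + "'</p>");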

and it works perfectly.  It seems that the browser is sending the bytes
as UTF-8, but Java doesn't know that, so the container decodes them as
ISO-8859-1, which basically means "raw 8-bit stuff (don't touch it)":
every byte becomes one char with the same value.  When I call
getBytes("ISO-8859-1") I am undoing that step, turning each char back
into the byte it came from, so the byte[] contains the original bytes,
which are actually UTF-8.  In other words, I am tricking Java into
treating the String as raw bytes, instead of a sequence of chars.
Great.  Now I want to send it back out, so I need to create a string.

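So, I create a new String from that byte[], telling it that the bytes in
the byte[] are actually UTF-8 bytes.  This creates a String which, like
ALL native strings within Java, is UTF-16 encoded.  The string returned
by request.getParameter() was UTF-16 internally too; the difference is
that each of its chars held one raw UTF-8 byte, whereas the new String
holds the characters that were actually typed.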
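
To convince myself, here is the same round trip as a standalone sketch
outside the container.  The class and method names are made up for
illustration, and a plain byte[] stands in for the request body; I am
assuming the container decodes it as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Stand-in for the container: decode the wire bytes as ISO-8859-1,
    // so every byte becomes one char with the same numeric value.
    static String decodeAsLatin1(byte[] wireBytes) {
        return new String(wireBytes, StandardCharsets.ISO_8859_1);
    }

    // The repair from the JSP: recover the raw bytes, then decode them as UTF-8.
    static String repair(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "caf\u00e9";                            // "cafe" with e-acute
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8);  // what the browser sends
        String misdecoded = decodeAsLatin1(wire);              // what getParameter() would return
        System.out.println(misdecoded.length());               // 5: one char per UTF-8 byte
        System.out.println(repair(misdecoded).equals(typed));  // true
    }
}
```

The trick only works because ISO-8859-1 maps all 256 byte values to
chars and back without loss.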

Then, I use out.println.  The JSP writer knows what the content type and
encoding of the page are, and therefore it knows how to convert a Java
string (which is always UTF-16) into the correct sequence of bytes for
the OutputStream, and of course that conversion depends on the charset.
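
Here is a rough sketch of that conversion using plain java.io classes
instead of the JSP writer.  I am assuming the container's writer behaves
like a Writer wrapped around the response stream with the page's
charset; the class and method names are only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WriterEncodingDemo {
    // Writes the text through a charset-aware writer and returns the bytes produced.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(sink, charset));
        out.print(text);
        out.flush(); // push buffered chars through the encoder
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // e-acute, a single char in the Java String
        System.out.println(write(text, StandardCharsets.UTF_8).length);      // 2
        System.out.println(write(text, StandardCharsets.ISO_8859_1).length); // 1
    }
}
```

Same String, different bytes on the wire, depending only on the charset
given to the writer.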

This means that if I have a database full of strings which are all
UTF-8, I can load them into Java (making them UTF-16), and create a JSP
with charset=Shift_JIS for example, and then out.println() will
convert those UTF-16 strings to Shift_JIS as it writes them to the
response (assuming the strings contain only characters which can be
represented in Shift_JIS)?
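
One way to check the "can be represented" part outside a JSP is to ask
a CharsetEncoder directly.  This is just a sketch: the class and method
names are mine, and it assumes the JRE ships the Shift_JIS charset
(hence the isSupported guard).

```java
import java.nio.charset.Charset;

class ShiftJisDemo {
    // True if every character of the text has a mapping in the given charset.
    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("Shift_JIS")) {
            System.out.println("Shift_JIS is not available in this JRE");
            return;
        }
        Charset sjis = Charset.forName("Shift_JIS");
        String japanese = "\u65e5\u672c"; // two kanji
        String euro = "\u20ac";           // euro sign, not part of Shift_JIS

        System.out.println(canRepresent(japanese, sjis));
        System.out.println(canRepresent(euro, sjis));
        // Round trip: encode to Shift_JIS bytes, then decode back to a String.
        byte[] encoded = japanese.getBytes(sjis);
        System.out.println(new String(encoded, sjis).equals(japanese));
    }
}
```

As far as I can tell, String.getBytes() does not fail on characters that
have no mapping; it substitutes a replacement byte, so checking first
avoids silently losing data.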

If I'm right in this, then I understand how it is working, and I might
want to write an addition to the Tomcat docs to explain this.  It's a
bit tricky, using ISO-8859-1 when I'm not actually using ISO-8859-1 as
the encoding.

Thanks!


