On Sat, 2003-02-22 at 17:45, THG wrote:
> with request.getParameter("test") you get the iso-8859-1 encoded utf-8 data (which
> is encoded with us_ascii) - an double encoded string.

Ok, I think I'm beginning to get it. I did this in a JSP (with
charset=UTF-8):

    String parameter = request.getParameter("test");
    byte[] ba = parameter.getBytes("ISO-8859-1");
    String result = new String(ba, "UTF-8");
    out.println("<p>This is the text you entered: '" + result + "'");

and it works perfectly.

It seems that the browser is sending the bytes as UTF-8, but Java doesn't
know that, so it decodes them as ISO-8859-1. That maps every byte to the
char with the same numeric value, which basically means "raw 8-bit stuff
(don't touch it)". When I call getBytes("ISO-8859-1") I reverse that
mapping and get a byte[] holding exactly the bytes the browser sent, which
are really UTF-8. In other words, I am tricking Java into treating the
String as raw bytes instead of a sequence of chars.

Great. Now I want to send it back out, so I need to create a string. I
create a new String from that byte[], telling Java that the bytes in it
are actually UTF-8. Like ALL native strings in Java, the result is UTF-16
internally, but UNLIKE the string returned by request.getParameter() in
this case, its chars are the characters I actually typed rather than one
char per UTF-8 byte.

Then I use out.println, which is smart enough to know the content type and
encoding of the page. It therefore knows how to convert a Java string
(which is always UTF-16) into the correct sequence of bytes for the
OutputStream, and of course that conversion depends on the charset.

Does this mean that if I have a database full of strings which are all
UTF-8, I can load them into Java (making them UTF-16), create a JSP with
charset=Shift_JIS for example, and out.println() will magically convert
those UTF-16 strings to Shift_JIS as it displays them (assuming the
strings contain only characters which can be represented in Shift_JIS)?

If I'm right about this, then I understand how it works, and I might want
to write an addition to the Tomcat docs to explain it. It's a bit tricky,
using ISO-8859-1 when I'm not actually using ISO-8859-1 as the encoding.

Thanks!
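
P.S. For anyone who wants to try this outside a servlet container, here is
a minimal standalone sketch of the round trip described above. It only
simulates the container with plain byte arrays, so it says nothing about
what Tomcat itself does internally. The class and method names
(ParameterRoundTrip, misdecodeAsLatin1, recoverUtf8, encodeForResponse)
are made up for illustration, and StandardCharsets assumes Java 7 or
later. Shift_JIS is not one of the charsets every JVM must provide, so the
sketch checks for it at runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ParameterRoundTrip {

    // Simulates what happens when UTF-8 request bytes are decoded as
    // ISO-8859-1: every byte becomes one char with the same numeric value.
    static String misdecodeAsLatin1(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // The fix from the message above: map the chars back to the original
    // bytes, then decode those bytes as the UTF-8 they really are.
    static String recoverUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Stand-in for the output side: turn a Java string into bytes in the
    // page's charset. String.getBytes(Charset) substitutes the charset's
    // replacement byte for characters it cannot represent.
    static byte[] encodeForResponse(String text, Charset pageCharset) {
        return text.getBytes(pageCharset);
    }

    public static void main(String[] args) {
        // Three kanji, written as escapes so the source file stays ASCII.
        String typed = "\u65e5\u672c\u8a9e";

        byte[] wire = typed.getBytes(StandardCharsets.UTF_8);
        String fromContainer = misdecodeAsLatin1(wire);
        String recovered = recoverUtf8(fromContainer);

        System.out.println("chars before fix: " + fromContainer.length());
        System.out.println("chars after fix:  " + recovered.length());
        System.out.println("round trip ok:    " + recovered.equals(typed));

        // Output side, only if this JVM ships the Shift_JIS charset.
        if (Charset.isSupported("Shift_JIS")) {
            byte[] out = encodeForResponse(recovered, Charset.forName("Shift_JIS"));
            System.out.println("Shift_JIS bytes:  " + out.length);
        }
    }
}
```

The trick depends on ISO-8859-1 mapping all 256 byte values to chars
one-for-one, which is why no information is lost on the way in and back
out. A charset without that property would not be safe to use for the
intermediate step.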