hey folks,
to show you what it is all about, i wrote a small app that shows the
html utf-8 codes of the entered string. this is the jsp code:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<form action="/tests/utf.jsp" method="post">
<input type="text" name="source">
<input type="submit">
</form>
<p>
<%
if (request.getParameter("source") != null) {
    request.setCharacterEncoding("UTF-8");
    // length of the parameter as received
    out.println(request.getParameter("source").length() + "<p>");
    // the parameter itself
    out.println(request.getParameter("source"));
    // the parameter with '&' escaped, so html codes show up as text
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < request.getParameter("source").length(); i++) {
        if (request.getParameter("source").charAt(i) == '&')
            sb.append("&amp;");
        else
            sb.append(request.getParameter("source").charAt(i));
    }
    out.println("<p>" + sb.toString());
}
%>
</body>
</html>
well, as you see, this code block gets a utf-8 encoded parameter from
a request, outputs its length, the parameter itself, and its html
utf-8 codes.
to test it i send the hebrew letter ALEF. on tomcat 4.xx everything
works perfectly and i get the following response:
7
א
&#1488;
(in case you don't see it here: it's 7, alef (from its html code &#1488;),
and the same code escaped so that it shows up as text in the browser)
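
the 7 is consistent with the parameter arriving as a decimal html code rather than as the letter itself: alef is U+05D0, which is 1488 in decimal, and "&#1488;" is 7 characters long. a minimal sketch of that arithmetic (the class and method names are made up for illustration, they are not part of the jsp above):

```java
class AlefReference {
    // Builds the decimal html code (numeric character reference) for one char.
    static String toNumericReference(char c) {
        return "&#" + (int) c + ";";
    }

    public static void main(String[] args) {
        char alef = '\u05D0'; // HEBREW LETTER ALEF
        String ref = toNumericReference(alef);
        System.out.println(ref);          // prints &#1488;
        System.out.println(ref.length()); // prints 7
    }
}
```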
cool. then i run the same code on tomcat 5.0.16 and KABOOM. this is
what i get:
2
א
א
(in case you don't see it here: it's 2, and alef twice, as it would be
passed in windows-1255 or iso... where the hell has utf-8 gone?)
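
for what it's worth, a length of 2 is what you would get if the two utf-8 bytes of alef (0xD7 0x90) were each decoded as one character of a single-byte charset. a minimal sketch, assuming ISO-8859-1 is that charset (that is my assumption, not something confirmed for tomcat 5 here); the class and method names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

class AlefMisdecoded {
    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1,
    // so every byte becomes one char.
    static String decodeUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recover the original bytes and decode them as UTF-8.
    static String reencode(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String alef = "\u05D0"; // HEBREW LETTER ALEF, one char
        String broken = decodeUtf8AsLatin1(alef);
        System.out.println(alef.length());                 // prints 1
        System.out.println(broken.length());               // prints 2
        System.out.println(reencode(broken).equals(alef)); // prints true
    }
}
```

if the bytes arrive intact and are only decoded with the wrong charset, the second method shows the string can still be recovered.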
all this makes me think that tomcat 5 has some bug affecting
its utf-8 support. how come the parameter length of one char is 2?!
thanks in advance.