Hey folks,

To show you what this is all about, I wrote a small app that shows the
HTML character codes (numeric character references) of the entered string. This is the JSP code:

<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head><body>
<form action="/tests/utf.jsp" method="post"><input type="text" name="source"><input type="submit"></form>
<p>
<%
  if (request.getParameter("source") != null) {
    request.setCharacterEncoding("UTF-8"); // note: runs after the first getParameter() call
    String source = request.getParameter("source");
    out.println(source.length() + "<p>"); // number of characters received
    out.println(source);                  // the parameter as-is
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < source.length(); i++) {
      // escape '&' so an entity such as &#1488; shows up literally instead of being rendered
      if (source.charAt(i) == '&') sb.append("&amp;");
      else sb.append(source.charAt(i));
    }
    out.println("<p>" + sb.toString());
  }
%>
</body></html>

Well, as you can see, this code reads a UTF-8 encoded parameter from
the request and prints its length, the parameter itself, and its HTML
character codes (with the & escaped so the codes show up as text).
To test it I send the Hebrew letter ALEF. On Tomcat 4.xx everything
works perfectly and I get the following response:

7
א
&#1488;

(In case you don't see it here: it's 7, then alef as the browser renders
its code, then alef's code &#1488; itself, made visible by escaping the &.)
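The 7 is consistent with the parameter having arrived as the literal entity string &#1488; rather than as the letter itself. The standalone Java sketch below (not part of the JSP; the class name AlefCodes and the toEntity helper are hypothetical, made up for illustration) shows where the numbers come from: alef is U+05D0, decimal 1488, so its entity is seven characters long, while the same letter encoded as UTF-8 is two bytes.

```java
import java.nio.charset.StandardCharsets;

public class AlefCodes {
    // Builds HTML decimal numeric character references, one per code point
    static String toEntity(String s) {
        StringBuilder sb = new StringBuilder();
        // iterate by code point so characters outside the BMP stay whole
        s.codePoints().forEach(cp -> sb.append("&#").append(cp).append(';'));
        return sb.toString();
    }

    public static void main(String[] args) {
        String alef = "\u05D0"; // HEBREW LETTER ALEF, U+05D0

        String entity = toEntity(alef);
        System.out.println(alef.codePointAt(0)); // 1488
        System.out.println(entity);              // &#1488;
        System.out.println(entity.length());     // 7
        // the same letter encoded as UTF-8 takes two bytes, 0xD7 0x90
        System.out.println(alef.getBytes(StandardCharsets.UTF_8).length); // 2
    }
}
```

Iterating by code point rather than by char keeps a character outside the Basic Multilingual Plane as one entity instead of two surrogate halves.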

Cool. Then I run the same code on Tomcat 5.0.16 and KABOOM. This is
what I get:

2
א
א

(In case you don't see it here: it's 2, then alef twice, as if it had been
passed in windows-1255 or iso... Where the hell did UTF-8 go?)

All this makes me think that Tomcat 5 has some bug affecting its
UTF-8 support. How can a one-character parameter have a length of 2?!
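One way to see how a single letter can come back with length 2: form values arrive URL-encoded, and the container has to pick a charset to turn the decoded bytes back into characters. The sketch below assumes alef was posted as its UTF-8 bytes (%D7%90) and decodes that value once as UTF-8 and once as ISO-8859-1, the servlet default when no request encoding is in effect. The class name FormDecodeDemo and its decode helper are hypothetical; java.net.URLDecoder is the only real API used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodeDemo {
    // Decode one application/x-www-form-urlencoded value with the given charset name
    static String decode(String formValue, String charset) {
        try {
            return URLDecoder.decode(formValue, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // assumption: the browser posted alef as its two UTF-8 bytes, URL-encoded
        String posted = "%D7%90";

        String asUtf8 = decode(posted, "UTF-8");
        String asLatin1 = decode(posted, "ISO-8859-1");

        System.out.println(asUtf8.length());   // 1: the single letter alef (U+05D0)
        System.out.println(asLatin1.length()); // 2: U+00D7 and U+0090, one char per byte
    }
}
```

For reference, the Servlet API documents that ServletRequest.setCharacterEncoding must be called before any request parameters are read; otherwise it has no effect.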

Thanks in advance.
