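This is exactly what should happen. You are working with characters, not bytes, hence you see a length of 1 for one character. Even counted in bytes, U+05D0 takes only 2 bytes in UTF-8; the 7 you got on Tomcat 4.0 is the length of the seven-character string "&#1488;", which is exactly what your third output line displayed.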
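A quick standalone illustration, in plain Java outside Tomcat (the class and method names are made up for this example): it compares the char count of the Hebrew letter alef (U+05D0, decimal 1488) with its UTF-8 byte count and with the length of the reference "&#1488;".

```java
import java.nio.charset.StandardCharsets;

public class CharVsBytes {
    // Number of UTF-16 chars; this is what String.length() reports,
    // and so what request.getParameter("source").length() reports too.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes the same text occupies once encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String alef = "\u05D0";   // Hebrew letter alef, code point 1488
        String ncr = "&#1488;";   // numeric character reference for the same letter

        System.out.println(charCount(alef));      // 1
        System.out.println(utf8ByteCount(alef));  // 2
        System.out.println(charCount(ncr));       // 7
    }
}
```

So a correctly decoded single letter gives 1, and a 7 means the parameter held the reference text rather than the letter.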
Mark

> -----Original Message-----
> From: Asher Tarnopolski [mailto:[EMAIL PROTECTED]
> Sent: Sunday, July 04, 2004 11:18 PM
> To: Tomcat Users List
> Subject: Re: utf-8 with tomcat 5: second round
>
> hey mark, thanks for the response.
> i ran the code i pasted below.
> for example, i enter one hebrew letter. its utf
> code is 1488.
> on tc 4.0.xx i get the following results:
>
> 7 (the length of its utf-8 code)
> א (the letter itself in utf-8 encoding)
> &#1488; (same as above, escaped to be visible in the browser)
>
> in tc 5 i get this:
> 1 (which already lets me know that this is not really utf-8)
> the entered hebrew letter
> the entered hebrew letter (nothing is escaped, so the '&' sign
> wasn't even met)
> this is it.
>
> ----- Original Message -----
> From: "Mark Thomas" <[EMAIL PROTECTED]>
> To: "'Tomcat Users List'" <[EMAIL PROTECTED]>; "'Asher
> Tarnopolski'" <[EMAIL PROTECTED]>
> Sent: Sunday, July 04, 2004 8:46 PM
> Subject: RE: utf-8 with tomcat 5: second round
>
> > Asher,
> >
> > A few questions...
> >
> > What do you put in the text box on the form and what output do you see?
> >
> > Are you really using "<form act="/tests/utf.jsp" method=post>" or do you
> > mean <form action="/tests/utf.jsp" method=post>?
> >
> > When I did my test I copied your UTF-8 character from the bugzilla report
> > and pasted it into the text box. I was seeing question marks in the output
> > until I added the <%@ page pageEncoding="UTF-8"%>. The test was on XP (as
> > per the bug report) and I assume you used IE as the browser.
> >
> > The URI encoding is a red herring in this case. Because you are using POST,
> > it is only the request encoding that matters.
> >
> > The full text of my test JSP is below.
> >
> > Mark
> >
> > <%@ page language="java" import="java.lang.*,java.util.*" %>
> > <%@ page pageEncoding="UTF-8" %>
> > <html>
> > <body>
> >
> > <form action="bug29900.jsp" method=post>
> > <input type=text name=source>
> > <input type=submit>
> > </form>
> > <p>
> >
> > <%
> > // must be called before the first getParameter() call
> > request.setCharacterEncoding("UTF-8");
> >
> > if(request.getParameter("source")!=null)
> > {
> >   out.println(request.getParameter("source").length()+"<p>");
> >
> >   out.println(request.getParameter("source"));
> >
> >   StringBuffer sb = new StringBuffer();
> >   for(int i=0; i<request.getParameter("source").length(); i++)
> >   {
> >     // escape '&' so a literal "&#1488;" stays visible in the browser
> >     if(request.getParameter("source").charAt(i) == '&')
> >       sb.append("&amp;");
> >     else
> >       sb.append(request.getParameter("source").charAt(i));
> >   }
> >   out.println("<p>"+ sb.toString());
> > }
> > %>
> >
> > </body>
> > </html>
> >
> > > -----Original Message-----
> > > From: Asher Tarnopolski [mailto:[EMAIL PROTECTED]
> > > Sent: Sunday, July 04, 2004 6:25 PM
> > > To: [EMAIL PROTECTED]
> > > Subject: utf-8 with tomcat 5: second round
> > >
> > > hi folks,
> > > i've posted a question about it a couple of days ago, but
> > > didn't get any responses.
> > > i've tried some things i found in bugzilla, but they didn't
> > > help. so, i wanna try to get your help once more.
> > > once more about my problem:
> > > i try to send utf-8 encoded parameters in the POST body, but they
> > > arrive encoded in ISO...
> > > this worked perfectly with tomcat 4.0.x.
> > > from the info i got from a developer at bugzilla i learned
> > > that the difference between tc4.0 and tc5
> > > that causes the change is actually in the coyote http1.1
> > > connector. there is an attribute
> > > called useBodyEncodingForURI which was set to "true" in tc4,
> > > but became "false" in tc5.
> > > setting it to "true" together with <%@ page
> > > pageEncoding="UTF-8" %> and
> > > <%request.setCharacterEncoding("UTF-8");%> will make the difference.
> > > i made the change, the jsp tags are in the code and the coyote
> > > settings look like this now:
> > >
> > > <code>
> > > <!-- Define a non-SSL Coyote HTTP/1.1 Connector on port 8080 -->
> > > <Connector port="8080"
> > >            maxThreads="150" minSpareThreads="25"
> > >            maxSpareThreads="75"
> > >            enableLookups="false" redirectPort="8443"
> > >            acceptCount="100"
> > >            debug="0" connectionTimeout="20000"
> > >            useBodyEncodingForURI="true"
> > >            disableUploadTimeout="true" />
> > > </code>
> > >
> > > but this doesn't help! another request to bugzilla didn't
> > > help either; i was told that this is not a bug in tomcat,
> > > so they are not going to deal with the question. well, maybe
> > > it's not a tomcat bug, but it should be some kind of bug.
> > > any ideas?
> > >
> > > my testing code comes here:
> > >
> > > <code>
> > >
> > > <%@ page contentType="text/html; charset=utf-8"%>
> > > <%@ page pageEncoding="utf-8"%>
> > > <html>
> > > <head>
> > > </head>
> > > <body>
> > >
> > > <form act="/tests/utf.jsp" method=post>
> > > <input type=text name=source>
> > > <input type=submit>
> > > </form>
> > > <p>
> > >
> > > <%
> > > request.setCharacterEncoding("UTF-8");
> > >
> > > if(request.getParameter("source")!=null)
> > > {
> > >   out.println(request.getParameter("source").length()+"<p>");
> > >
> > >   out.println(request.getParameter("source"));
> > >
> > >   StringBuffer sb = new StringBuffer();
> > >   for(int i=0; i<request.getParameter("source").length(); i++)
> > >   {
> > >     if(request.getParameter("source").charAt(i) == '&')
> > >       sb.append("&amp;");
> > >     else
> > >       sb.append(request.getParameter("source").charAt(i));
> > >   }
> > >   out.println("<p>"+ sb.toString());
> > > }
> > > %>
> > >
> > > </body>
> > > </html>
> > >
> > > </code>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
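A sketch of why only the request (body) encoding matters for this POST, again in plain Java outside Tomcat with hypothetical names. A browser that submits the form as UTF-8 sends alef as the percent-encoded bytes %D7%90, and the servlet container turns those bytes into chars using the request character encoding, which defaults to ISO-8859-1 unless request.setCharacterEncoding("UTF-8") is called before the first getParameter() call. Here java.net.URLDecoder (its Charset overload needs Java 10 or later) stands in for that decoding step; it is an approximation, not Tomcat's own code.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormBodyDecoding {
    // Decode one application/x-www-form-urlencoded value using the given charset,
    // standing in for what the container does when it parses POST parameters.
    static String decodeValue(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset); // Charset overload: Java 10+
    }

    public static void main(String[] args) {
        // Hypothetical POST body for <input name=source> holding alef (U+05D0),
        // as sent by a browser when the form page was served as UTF-8.
        String body = "source=%D7%90";
        String value = body.substring(body.indexOf('=') + 1);

        // No request encoding set (ISO-8859-1 default): each byte becomes its own char.
        String asLatin1 = decodeValue(value, StandardCharsets.ISO_8859_1);
        // setCharacterEncoding("UTF-8") first: the two bytes become one letter.
        String asUtf8 = decodeValue(value, StandardCharsets.UTF_8);

        System.out.println(asLatin1.length());        // 2
        System.out.println(asUtf8.length());          // 1
        System.out.println(asUtf8.equals("\u05D0"));  // true
    }
}
```

useBodyEncodingForURI only affects how the query string in the URI is decoded, which is why it does not change anything for a form posted in the request body.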