> From: Krzysztof Cieniuch [mailto:[EMAIL PROTECTED] 
> Could someone clearly explain to me
> how exactly Tomcat 5.0.28 handles
> character encodings in requests?

This is not a trivial thing to explain. The short answer is "as the spec
requires".

If you are only interested in request parameters then the following will work (I
have tested this on a clean build of 5.0.x from CVS).

1. Make sure URIEncoding="UTF-8" is set on your connector. Be careful that you
set URIEncoding, not URLEncoding.

2. You can use the following test JSP (saved using default ASCII encoding).
<%@ page language="java" import="java.lang.*,java.util.*"
contentType="text/html; charset=UTF-8" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <title>UTF-8 Encoding</title>
  </head>
  <body>
    <p>Text from JSP page (which is ASCII encoded).</p>
    <form action="utf8.jsp" method="post">
      <input type="text" name="anyText" />
      <input type="submit" value="Post form data" />
    </form>
    <p>Text obtained from parameter is:</p>
    <%
      // Must be called before the first parameter is read, or it has no effect.
      request.setCharacterEncoding("UTF-8");
      String param = request.getParameter("anyText");
      out.println(param);
    %>
  </body>
</html>

I have pasted a range of weird and wonderful characters into this and they all
appear as expected.
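
To illustrate what the encoding setting actually changes, here is a minimal,
standalone sketch (plain Java, not Tomcat code). It decodes the same
percent-encoded bytes once as UTF-8 and once as ISO-8859-1, which is roughly
the choice that URIEncoding makes for the URI and that
request.setCharacterEncoding makes for a POST body. The class and method names
are made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class QueryDecodingDemo {

    // Decode a percent-encoded parameter value using the named charset.
    // Hypothetical helper: it only mimics the decoding choice, it is not
    // how Tomcat is implemented.
    public static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // A UTF-8 page submits the character U+0142 as the two bytes C5 82.
        String onTheWire = "%C5%82";

        String asUtf8 = decode(onTheWire, "UTF-8");        // one character
        String asLatin1 = decode(onTheWire, "ISO-8859-1"); // two characters

        System.out.println("UTF-8 decoded length:      " + asUtf8.length());
        System.out.println("ISO-8859-1 decoded length: " + asLatin1.length());
        System.out.println("UTF-8 code point:          "
                + String.format("U+%04X", (int) asUtf8.charAt(0)));
    }
}
```

The bytes on the wire are identical in both cases; only the charset used to
turn them into a String differs, which is why the setting has to match what
the browser sent.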

> set meta header content-type to UTF-8
Meta headers have no effect in Tomcat; they are only read by the browser.

> pass -Dfile.encoding=UTF-8 jvm in startup parameters
Not necessary for the above test case. If you want to use UTF-8 text in your JSP
files, things get a little more interesting. I haven't looked into this very
much.

> I've downloaded tomcat 4.1.30 did the same test no difference
The TC4.1.x branch will also work using the example above.

> I really don't understand why it is so hard to set the character encoding
> correctly.
Again, the short answer is that the early internet technologies didn't handle
this at all. Support for this has been added subsequently whilst retaining
backwards compatibility, hence the complexities. The moral here is if you
don't want your application to get into the same mess, design it for i18n from
the start and test it carefully.

> In some post I've read that the ISO-8859-1 character encoding was
> hardcoded into Tomcat 4.1.29.
> Is this really true?
It was in the early 4.1.x releases. It was fixed somewhere between 4.1.24 and
4.1.30, but I can't remember exactly when (and don't really have the time to
search the CVS changelog / mail archives to find out).

> Do I really always have to write:
> new String(request.getParameter("parameter_name").getBytes("ISO-8859-1"),
>            "whatever_encoding_i_need")?
No, you don't ;)  - see above.
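
For reference, here is a minimal sketch of why that workaround recovers the
text when a container has decoded UTF-8 bytes as ISO-8859-1. It is plain Java
with made-up class and method names, and the "misdecode" step only simulates
the wrong decoding; it is not Tomcat code. ISO-8859-1 maps every byte to
exactly one character, so the original bytes can be recovered and decoded
again.

```java
import java.nio.charset.StandardCharsets;

class MisdecodedParameterDemo {

    // Simulates a container that read UTF-8 bytes as if they were ISO-8859-1.
    public static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // The workaround from the question: turn the garbled String back into
    // the raw bytes, then decode those bytes with the charset actually sent.
    public static String recover(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample text with non-ASCII characters, written as escapes so the
        // source file itself can stay ASCII.
        String original = "\u0142\u00f3d\u017a";

        String garbled = misdecode(original);
        String recovered = recover(garbled);

        System.out.println("original length: " + original.length());
        System.out.println("garbled length:  " + garbled.length());
        System.out.println("round trip ok:   " + original.equals(recovered));
    }
}
```

With the connector and request encoding set as described earlier, the
parameter arrives correctly decoded and this extra step is not needed.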

> PS. OT: I'm thinking about switching to Jetty, or does that web
> container have the same problems?
Feel free. 


