There are lots of potential pitfalls when using non-default character encodings. It is easy to make mistakes both with Tomcat settings and with your code.

To sort out the Tomcat settings, get the following index.jsp working for whatever text you enter in the form. I have tested this with the latest TC4 and TC5 code, and it works for me with any text I choose to enter.

Once you have this working, you can look at your application and see what is different.

Mark

<%@ page contentType="text/html; charset=UTF-8" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <title>Encoding fun</title>
  </head>
  <body>
    <p>Data posted to this form was:
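    <%
      // Must be called before the first getParameter() call, otherwise it has no effect.
      request.setCharacterEncoding("UTF-8");
      String mydata = request.getParameter("mydata");
      // Nothing has been posted on the first visit, so avoid printing "null".
      // Output is not HTML-escaped: acceptable for a local encoding test only.
      if (mydata != null) {
        out.print(mydata);
      }
    %>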

    </p>
    <form method="post" action="index.jsp"
          enctype="application/x-www-form-urlencoded">
      <input type="text" name="mydata">
      <input type="submit" value="Submit" />
      <input type="reset" value="Reset" />
    </form>
  </body>
</html>

Fadwa Barham wrote:
While I was searching for a solution to the encoding problem, I found this:
There is a standard for encoding URIs (http://www.w3.org/International/O-URL-code.html), but this standard is not consistently followed by clients. This causes a number of problems.


The functionality provided by Tomcat (4 and 5) to handle this less than ideal situation is described below.

1. The Coyote HTTP/1.1 connector has a useBodyEncodingForURI attribute which, if set to true, will use the request body encoding to decode the URI query parameters.
- The default value is true for TC4 (breaks spec but gives consistent behaviour across TC4 versions)
- The default value is false for TC5 (spec compliant but there may be migration issues for some apps)
2. The Coyote HTTP/1.1 connector has a URIEncoding attribute which defaults to ISO-8859-1.
3. The parameters class (o.a.t.u.http.Parameters) has a QueryStringEncoding field which defaults to the URIEncoding. It must be set before the parameters are parsed to have an effect.
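The effect of the ISO-8859-1 default for URIEncoding can be reproduced outside Tomcat. The following Java sketch (Java 10 or later, for the Charset overloads of URLEncoder/URLDecoder) percent-encodes an Arabic word as UTF-8, the way a browser typically submits a GET form from a UTF-8 page, then decodes it once as ISO-8859-1 and once as UTF-8. It uses java.net.URLDecoder only as a stand-in for the connector's own decoding; the class and method names are illustrative.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UriDecodingDemo {
    // Arabic "marhaba" (hello), written with Unicode escapes to avoid source-file encoding issues.
    static final String WORD = "\u0645\u0631\u062D\u0628\u0627";

    // Percent-encode the UTF-8 bytes of a string, as a UTF-8 page typically does for GET forms.
    static String encodeUtf8(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Decode a query-string value with the given charset (a stand-in for the connector's URIEncoding).
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    // Workaround: recover the raw bytes from an ISO-8859-1 decode and re-read them as UTF-8.
    static String repair(String latin1Decoded) {
        return new String(latin1Decoded.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String query = encodeUtf8(WORD);
        String wrong = decode(query, StandardCharsets.ISO_8859_1); // the ISO-8859-1 default
        String right = decode(query, StandardCharsets.UTF_8);      // URIEncoding="UTF-8"
        System.out.println(query);                      // %D9%85%D8%B1%D8%AD%D8%A8%D8%A7
        System.out.println(wrong.equals(WORD));         // false
        System.out.println(right.equals(WORD));         // true
        System.out.println(repair(wrong).equals(WORD)); // true
    }
}
```

The repair() trick only works because ISO-8859-1 maps every byte to a character and back without loss. Assuming clients really send UTF-8, setting URIEncoding="UTF-8" on the Coyote connector should make it unnecessary.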


Things to note regarding the servlet API:
1. HttpServletRequest.setCharacterEncoding() normally only applies to the request body NOT the URI.
2. HttpServletRequest.getPathInfo() is decoded by the web container.
3. HttpServletRequest.getRequestURI() is not decoded by the container.
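Note 1, together with the remark that the encoding must be set before the parameters are parsed, can be modelled with a tiny hypothetical class. LazyParameters below is not the Servlet API or Tomcat's Parameters class; it only mimics the behaviour described: the request body is parsed once, on the first getParameter() call, using whatever encoding is current at that moment, and later setCharacterEncoding() calls are ignored. It models the body only, not the URI.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for container parameter handling; not the real Servlet API.
public class LazyParameters {
    private final String body;                              // raw x-www-form-urlencoded body
    private Charset encoding = StandardCharsets.ISO_8859_1; // default when nothing is set
    private Map<String, String> parsed;                     // null until first lookup

    public LazyParameters(String body) {
        this.body = body;
    }

    // Like setCharacterEncoding(): only affects parsing that has not happened yet.
    public void setCharacterEncoding(String enc) {
        if (parsed == null) {
            encoding = Charset.forName(enc);
        }
    }

    // Like getParameter(): the first call parses the whole body with the current encoding.
    public String getParameter(String name) {
        if (parsed == null) {
            parsed = new HashMap<>();
            for (String pair : body.split("&")) {
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    continue; // skip malformed pairs in this simplified parser
                }
                parsed.put(URLDecoder.decode(pair.substring(0, eq), encoding),
                           URLDecoder.decode(pair.substring(eq + 1), encoding));
            }
        }
        return parsed.get(name);
    }

    public static void main(String[] args) {
        String body = "mydata=%D9%85%D8%B1%D8%AD%D8%A8%D8%A7";
        String word = "\u0645\u0631\u062D\u0628\u0627";

        LazyParameters early = new LazyParameters(body);
        early.setCharacterEncoding("UTF-8");                    // set before reading: takes effect
        System.out.println(word.equals(early.getParameter("mydata"))); // true

        LazyParameters late = new LazyParameters(body);
        late.getParameter("mydata");                            // parses as ISO-8859-1
        late.setCharacterEncoding("UTF-8");                     // too late: ignored
        System.out.println(word.equals(late.getParameter("mydata")));  // false
    }
}
```

This is why the test page above calls request.setCharacterEncoding("UTF-8") before its first request.getParameter().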


Other tips:
1. Use POST for forms, so that the parameters are sent in the request body (where setCharacterEncoding() applies) rather than in the URI.



Does this mean that the encoding changes between TC4 and TC5 are the reason I can't get the right encoding in the newer versions of Tomcat? And if so, how can I solve the problem?
Thanks


----- Original Message ----- From: "Fadwa Barham" To: "Tomcat Users List" Sent: Tuesday, March 01, 2005 3:24 AM
Subject: Re: Arabic encoding




As Tomcat 4.1.31 handles Arabic correctly, and it seems so far that Tomcat 4.1.31 has solved the JNDI datasource problems (intermittent DB connection failures and random "Connection closed" exceptions),
I will use Tomcat 4.1.31 until I can configure the latest versions of Tomcat.
I am not feeling lucky.
----- Original Message ----- From: "Fadwa Barham" To: "Tomcat Users List" Sent: Tuesday, March 01, 2005 2:39 AM
Subject: Re: Arabic encoding




I tested many Tomcat versions and found no problems with Arabic up to Tomcat 4.1.31, but when I tried tomcat-4.1.18 and newer versions, I faced the same problem.

----- Original Message ----- From: "Benson Margulies" To: "Tomcat Users List" Sent: Sunday, February 27, 2005 4:08 PM
Subject: RE: Arabic encoding




It depends on what the Oracle JDBC driver does with byte values that are
not legitimate US7ASCII. If, for some reason, it treated the data as
ISO-8859-1 instead of US7ASCII, then it might have streamed out through
tomcat, and the browser would have auto-detected the CP1256 pretending
to be ISO-8859-1.
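This hypothesis can be illustrated in a few lines of Java. The sketch assumes, purely for illustration, that the bytes stored in the database are CP1256 (Java's "windows-1256" charset, which needs a JDK that ships the extended charsets) and that every layer between the database and the browser treats them as ISO-8859-1. It says nothing about what the Oracle driver actually does.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Cp1256PassThrough {
    static final Charset CP1256 = Charset.forName("windows-1256");

    // Simulate a layer that reads stored CP1256 bytes as if they were ISO-8859-1.
    static String readAsLatin1(byte[] stored) {
        return new String(stored, StandardCharsets.ISO_8859_1);
    }

    // Simulate the response writer encoding the mislabelled string back to ISO-8859-1.
    static byte[] writeAsLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String word = "\u0645\u0631\u062D\u0628\u0627";   // Arabic "marhaba"
        byte[] stored = word.getBytes(CP1256);             // assumed database contents
        String inJava = readAsLatin1(stored);              // wrong characters inside the JVM
        byte[] sent = writeAsLatin1(inJava);               // bytes reaching the browser

        System.out.println(Arrays.equals(stored, sent));          // true: bytes unchanged
        System.out.println(word.equals(inJava));                  // false: Java string is wrong
        System.out.println(word.equals(new String(sent, CP1256))); // true: CP1256 view is right
    }
}
```

Because ISO-8859-1 maps all 256 byte values one-to-one, the original bytes survive even though the string inside the JVM is wrong. That would be consistent with the symptom reported below, where switching Internet Explorer to Arabic (Windows) makes the text readable.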

-----Original Message-----
From: Fadwa Barham [mailto:[EMAIL PROTECTED]
Sent: Sunday, February 27, 2005 1:43 PM
To: Tomcat Users List
Subject: Re: Arabic encoding

But I wonder why the old Tomcat and Java displayed Arabic correctly, when
I use the same classes12.jar with both the old and the new setup.
I want to know what the difference is: which encoding did they stop
supporting? It looks as if Tomcat cannot understand the old Java, because
I have to change the encoding to Arabic (Windows) in Internet Explorer
each time I request the servlet, and when I do this, every Arabic
character is displayed correctly.
I think it is better to understand the problem and the changes, so I can
handle the problem if I face it again in newer versions of Tomcat
or Java.
I know that having the database in US7ASCII is not good, but changing the
database encoding each time I face the problem is not the right way. I
may change it this time, but I need to understand.
Thanks

----- Original Message -----
From: "Benson Margulies" To: "Tomcat Users List" Sent: Sunday, February 27, 2005 12:44 AM
Subject: RE: Arabic encoding




Oracle's JDBC driver will transcode from the database to UTF-16 based on
the database encoding. If the database is in US7ASCII, this is a
destructive process for Arabic. The only alternative I can think of is
to do all your database I/O in hex.
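Both points can be sketched in Java. throughAscii() uses Java's own US-ASCII encoder, which replaces every character it cannot represent with '?', as an analogy for a lossy database-side conversion (Oracle's conversion is not literally this code). toHex()/fromHex() are a hypothetical version of the "database I/O in hex" suggestion; hex-encoding the UTF-8 bytes, rather than CP1256 bytes, is an assumption.

```java
import java.nio.charset.StandardCharsets;

public class AsciiVsHex {
    // Encode to US-ASCII and back: unmappable characters become the replacement byte '?'.
    static String throughAscii(String s) {
        return new String(s.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
    }

    // Hex-encode the UTF-8 bytes so only the ASCII characters 0-9 and a-f are stored.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Reverse toHex(): parse each pair of hex digits into a byte, then decode as UTF-8.
    static String fromHex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "\u0645\u0631\u062D\u0628\u0627";
        System.out.println(throughAscii(word));        // ?????
        String hex = toHex(word);
        System.out.println(hex);                       // d985d8b1d8add8a8d8a7
        System.out.println(word.equals(fromHex(hex))); // true
    }
}
```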

-----Original Message-----
From: Fadwa Barham [mailto:[EMAIL PROTECTED]
Sent: Saturday, February 26, 2005 1:20 PM
To: Tomcat Users List
Subject: Re: Arabic encoding

I use an Oracle 7 database, and the NLS language is
American_America.US7ASCII; it is not easy to change it to UTF-8.
Besides, the question is: a servlet works fine on Tomcat 4.0.6, so why did it
stop working with the new versions? What changes were made to Tomcat's
encoding handling?
Do I need tomcat-i18n-ar.jar? If so, where can I get it?
I can't determine where the problem is: in the new Java or the
new Tomcat.
Thanks in advance

----- Original Message -----
From: "Benson Margulies" To: "Tomcat Users List" Sent: Wednesday, February 23, 2005 11:26 PM
Subject: RE: Arabic encoding




What database? Do you have the database set up to deliver Unicode, or
CP1256, correctly? Note that not all Arabic fits into CP1256, you might
really be better off with UTF-8.
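Whether particular text fits into CP1256 can be checked with java.nio.charset.CharsetEncoder.canEncode(). In this sketch the CJK character is only a stand-in for something outside CP1256; it does not claim which Arabic-script characters are missing from CP1256, so run fits() against your real data. The windows-1256 charset needs a JDK that includes the extended charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CharsetCoverage {
    // Returns true if every character of the text can be encoded in the given charset.
    static boolean fits(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder(); // fresh encoder, so no encoding is in progress
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        Charset cp1256 = Charset.forName("windows-1256");
        String arabic = "\u0645\u0631\u062D\u0628\u0627"; // basic Arabic letters
        String mixed = arabic + " \u4E2D";               // plus a CJK stand-in character
        System.out.println(fits(arabic, cp1256));                // true
        System.out.println(fits(mixed, cp1256));                 // false
        System.out.println(fits(mixed, StandardCharsets.UTF_8)); // true
    }
}
```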

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


