At 12:23 PM +0100 11/30/04, Guillaume Cottenceau wrote:
"Otto, Frank" <otto 'at' delta-barth.de> writes:

 Hi,

 I have an HTML form. The user can enter text in ISO-8859-2.

 After submitting the form, the characters are wrong.  I don't know why.

 I have set

 <%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

 and there is a filter (filters.SetCharacterEncodingFilter) that sets the request encoding to UTF-8.

How can I convert the Polish characters that the user has entered?

The question is which encoding the browser uses to send the data, and whether it says so (i.e. whether the server has a way to know the encoding of the data it's receiving).

Theoretically, when sending HTTP POST, browsers should put the
charset= parameter in Content-Type and everything should be fine
if the receiving server decodes that correctly. But, as far as I
know, Mozilla once tried to put the charset in the Content-Type:
field when submitting an HTTP POST but had to remove it because
it broke too many existing servers with bad configurations.
However, I can't find the bug in their Bugzilla again.
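
To illustrate what "decoding that correctly" would involve on the receiving side, here is a minimal sketch that pulls the charset= parameter out of a Content-Type value and falls back to a default when the browser omits it. The class and method names are made up for this example, and the ISO-8859-1 fallback is only an assumption about what a typical server default looks like; a servlet container does this work itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeCharset {

    // Hypothetical helper: returns the charset named in a Content-Type
    // header value, or the fallback when the header is null, carries no
    // charset parameter, or names a charset this JVM does not know.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            // Parameter names are case-insensitive.
            if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = param.substring(8).trim();
                // The value may be a quoted string.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1;
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8", fallback));
        System.out.println(charsetOf("application/x-www-form-urlencoded", fallback));
    }
}
```

The second call shows the common case described above: no charset parameter arrives, so the server has nothing to go on except its own default.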

The problem is worse with HTTP GET where there is really nothing
to tell the encoding of the parameters passed.
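
A small sketch of why GET is ambiguous: the query string carries only percent-encoded bytes, and the same bytes decode to different text depending on which charset the receiver guesses. The class name and the sample word are made up for illustration, and the sketch assumes the browser encoded the value as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class QueryDecoding {

    // Hypothetical helper: percent-decodes a query value with the given
    // charset, wrapping the checked exception for easier use.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The Polish word "łódź", written with escapes so the source file
        // encoding does not matter.
        String typed = "\u0142\u00f3d\u017a";

        // Assumption: the browser sends the value as UTF-8 bytes.
        String sent = URLEncoder.encode(typed, "UTF-8");
        System.out.println(sent);

        // Decoding with the same charset recovers the text...
        System.out.println(decode(sent, "UTF-8").equals(typed));
        // ...while a wrong guess silently produces different characters.
        System.out.println(decode(sent, "ISO-8859-1").equals(typed));
    }
}
```

Nothing in the percent-encoded string itself says which of the two decodings is the intended one.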

As far as I know, the most reliable way is to specify
"accept-charset" as UTF-8 in the <form> of the HTML (w3.org's
description of this parameter: "This attribute specifies the list
of character encodings for input data that is accepted by the
server processing this form"). Theoretically, this forces the
browser to send the data in UTF-8. As far as I know, tests showed
that this should work correctly with current browsers. The
problem is that this attribute is not available in <html:form>
from Struts, and I have no idea why.

This property was added in Struts 1.2.2, as noted on http://struts.apache.org/userGuide/struts-html.html#form


In my experience, accept-charset is not as well supported by browsers as one would wish. I have found that the most effective way to coax a browser into using a specific encoding when returning data is to send the page in that same encoding. It looks like that's what's being done -- but be careful not to count on a JSP "page" directive in a tile, since only the "outermost" tile (the first one which writes to the response stream) has the chance to set the page encoding.

Also note that Struts' multipart processing currently is rather clumsy in handling form encoding. If the request object returns null from "getCharacterEncoding()", then Struts assumes ISO-8859-1. I was setting the request encoding with a custom command in a chain-request processor, but the request was still returning a null value in CommonsMultipartRequestHandler where the non-uploaded fields were being processed. Of course, this may have nothing to do with your situation.
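
For what it's worth, when a value has already been decoded as ISO-8859-1 although the browser actually sent UTF-8, the original bytes can usually be recovered, because ISO-8859-1 maps every byte to exactly one character. The following is only a sketch of that workaround under those two assumptions; the class and method names are invented here and are not part of Struts.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Repair {

    // Hypothetical helper: undoes a wrong ISO-8859-1 decoding of bytes
    // that were really UTF-8. Only valid if both assumptions hold.
    static String redecodeAsUtf8(String misdecoded) {
        // ISO-8859-1 is a one-to-one byte/char mapping, so this step
        // returns exactly the bytes the browser sent.
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Polish word "żółw", written with escapes.
        String typed = "\u017c\u00f3\u0142w";

        // Simulate the failure: UTF-8 bytes read as ISO-8859-1.
        String misdecoded = new String(
                typed.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        System.out.println(misdecoded.equals(typed));
        System.out.println(redecodeAsUtf8(misdecoded).equals(typed));
    }
}
```

Applying this to a value that was decoded correctly in the first place would corrupt it, so it is better treated as a stopgap than as a substitute for getting the request encoding set before the parameters are read.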

In the case where I have a non-multipart form, I have consistently gotten suitable results simply by making sure that the form page is delivered to the browser with the character encoding I want to use.

Joe

--
Joe Germuska [EMAIL PROTECTED] http://blog.germuska.com
"Narrow minds are weapons made for mass destruction" -The Ex
