Title: RE: charset used for parameters decoding on HTTP request Tomcat3.x,4

> > You will still need to fix the actual parameter parsing routine to delay
> > applying the encoding until the name and parameter are parsed out of the
> > input stream...
>
> Yes, most of this is already done. It also has a very nice performance
> implication - since the String is converted and allocated only when and if
> it's needed.
>
> The only missing part is the "internationalization" module that detects
> the encoding ( charset and accept-language parsing doesn't look good
> either in the current code ), and putting the pieces together.

The problem is that browsers do not send the charset used to encode the form's parameters; they simply send the request with the Content-Type header application/x-www-form-urlencoded. The charset should follow the media type, e.g. "application/x-www-form-urlencoded; charset=UTF-8", but in most cases it does not.
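To illustrate the problem, here is a minimal Java sketch built around a hypothetical helper, `charsetFromContentType` (not part of Tomcat), that extracts the optional `charset` parameter from a Content-Type value. For the header most browsers actually send, it returns null, which is why header-based detection rarely helps. In the Servlet API, `ServletRequest.getCharacterEncoding()` plays a similar role and likewise returns null when the request does not specify an encoding.

```java
import java.util.Locale;

public class ContentTypeCharset {

    // Hypothetical helper: returns the value of the "charset" parameter of a
    // Content-Type header value, or null when the parameter is absent.
    public static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Parameters follow the media type, separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding quotes, e.g. charset="UTF-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // What most browsers actually send for a form POST: no charset at all.
        System.out.println(charsetFromContentType("application/x-www-form-urlencoded"));
        // The rarer case where the charset is declared.
        System.out.println(charsetFromContentType("application/x-www-form-urlencoded; charset=UTF-8"));
    }
}
```

Running `main` prints `null` followed by `UTF-8`.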

In my view, rather than implementing a routine that analyses the request headers to extract the charset of the data (which has little chance of working reliably), it would be better to adopt the following policy:

 * We assume that the request parameters are encoded with the same charset as the response to that request. If the servlet generates a result page encoded in Shift_JIS, it is reasonable to assume that the incoming form data used to generate that page is also encoded in Shift_JIS. This works because browsers typically encode submitted form data in the charset of the page containing the form, and that page is usually produced by the same application with the same charset.

 * During parameter decoding, instead of assuming that each URL-encoded entity (%XX) is a character to be decoded on its own, we append every character as a byte and then decode the full parameter string using the encoding set on the response (javax.servlet.http.HttpServletResponse.setCharacterEncoding(String)). This matters because a single multibyte character (in Shift_JIS or UTF-8, for example) spans several %XX escapes.

 * The response encoding must be set on the response object before the first call to any of the following methods (which is when the parameters are parsed):

    - javax.servlet.http.HttpServletRequest.getParameter(String)
    - javax.servlet.http.HttpServletRequest.getParameterNames()
    - javax.servlet.http.HttpServletRequest.getParameterValues(String)

   If the charset has not been set on the response object when one of the methods listed above is called, the parameters are decoded using the JVM's default encoding.
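The policy above can be sketched in Java as follows. This is a hypothetical, standalone illustration (the class `FormDecoder` and its `parse` method are names chosen for this example, not Tomcat or Resin code): each %XX escape contributes one raw byte, and the bytes of a name or value are turned into a String only once, using the charset taken from the response. The `charsetName` argument stands in for "the charset set on the response, or null if none has been set yet"; how a real engine obtains that value is left as an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FormDecoder {

    // Parses an application/x-www-form-urlencoded string. charsetName is the
    // (assumed) charset set on the response; null means "not set yet", in
    // which case the JVM default charset is used.
    public static Map<String, List<String>> parse(String encoded, String charsetName) {
        Charset charset = (charsetName != null)
                ? Charset.forName(charsetName)
                : Charset.defaultCharset();
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (encoded == null || encoded.isEmpty()) {
            return params;
        }
        for (String pair : encoded.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            // Split name and value BEFORE decoding, so an escaped '=' or '&'
            // inside a value cannot break the structure.
            int eq = pair.indexOf('=');
            String rawName = (eq < 0) ? pair : pair.substring(0, eq);
            String rawValue = (eq < 0) ? "" : pair.substring(eq + 1);
            String name = decode(rawName, charset);
            String value = decode(rawValue, charset);
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    // Collects raw bytes first (each %XX is one byte, not one character)
    // and converts them to a String only once, with the chosen charset.
    static String decode(String s, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '+') {
                bytes.write(' ');
            } else if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    bytes.write((hi << 4) | lo);
                    i += 2;
                } else {
                    bytes.write(c); // malformed escape: keep '%' literally
                }
            } else {
                // Assumes plain ASCII outside escapes, as the form encoding requires.
                bytes.write(c);
            }
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        Map<String, List<String>> p = parse("name=%C3%A9t%C3%A9&q=a+b&q=c", "UTF-8");
        System.out.println(p.get("q")); // prints [a b, c]
    }
}
```

In a servlet engine this would run once, on the first call to getParameter(), getParameterNames() or getParameterValues(), with the result cached. The same bytes `%C3%A9` decode to "é" with UTF-8 but to "Ã©" with ISO-8859-1, which is why the charset must be known before parsing.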

NB: This policy is used in Caucho's Resin servlet engine and works well.
    The required changes to the Tomcat code are small, and the risk of affecting core request processing is low.
 
Adalbert

Adalbert Wysocki, software engineer
<mailto:[EMAIL PROTECTED]>
phone: +33 (0)1 71.00.68.67
fax: +33 (0)1 71.00.68.02

