Hi,

The problem is that browsers tend not to declare the character encoding
they used when posting data ... Don't ask me why ;-)

So we have to guess, something I really do not like.

But it looks like browsers send POST data in the same encoding in which
the form was received. So if the form was served as UTF-8, browsers
send the data back encoded in UTF-8.

Now, how does Sling know which encoding was used to send the form?
Short answer: It cannot know.

Hence the _charset_ request parameter.
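
To illustrate why decoding as ISO-8859-1 first loses nothing, here is a
minimal sketch (not Sling's actual code; the class and method names are
made up for this mail). ISO-8859-1 maps every byte to exactly one
character, so the raw bytes can be recovered and decoded again once
_charset_ tells us the real encoding:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Sling.
class ParameterRecoder {

    // Re-decodes a value that was first decoded as ISO-8859-1.
    // Encoding back to ISO-8859-1 yields the original request bytes,
    // which are then decoded with the charset the form really used.
    static String recode(String isoDecoded, Charset actual) {
        byte[] raw = isoDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute) sent as UTF-8 is the bytes 0xC3 0xA9.
        // Decoded as ISO-8859-1 they appear as the two chars below.
        String garbled = "\u00C3\u00A9";
        String fixed = recode(garbled, StandardCharsets.UTF_8);
        System.out.println(fixed.equals("\u00E9"));
    }
}
```

Without the _charset_ hint there is nothing to pass as the second
argument, which is exactly the guessing problem.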

But listening to our clients and users and understanding that most of
the time UTF-8 is used anyway, how about this solution:

  * We stick with the _charset_ parameter. Whatever that parameter
    conveys is used to decode parameters.
  * If the parameter does not exist, we support a new configuration
    option defining the default encoding to be used.
  * If the configuration option is also missing, we default to the
    same value as we do today, which is ISO-8859-1.

Of course the configuration option would not be set by default (for
backwards compatibility reasons).
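
Roughly, the lookup order proposed above could be sketched like this.
This is only an illustration: the class and method names are invented,
the configuration option does not exist yet, and treating an unknown
charset name like a missing one is my assumption, not settled
behaviour.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the proposed fallback chain; not Sling code.
class CharsetFallback {

    // formCharset: value of the _charset_ request parameter, or null.
    // configuredDefault: value of the proposed (not yet existing)
    // configuration option, or null if it is not set.
    static Charset resolve(String formCharset, String configuredDefault) {
        Charset c = lookup(formCharset);
        if (c != null) {
            return c;
        }
        c = lookup(configuredDefault);
        if (c != null) {
            return c;
        }
        // Today's behaviour, kept as the last resort.
        return StandardCharsets.ISO_8859_1;
    }

    // Returns null for missing, malformed, or unsupported names
    // (assumption: such names are treated as if absent).
    private static Charset lookup(String name) {
        if (name == null || name.trim().isEmpty()) {
            return null;
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and
            // UnsupportedCharsetException.
            return null;
        }
    }
}
```

With the option unset, resolve(null, null) still gives ISO-8859-1, so
existing installations would see no change.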

Would that help your case?

Regards
Felix

On Wednesday, 20.10.2010, 14:05 -0400, sam lee wrote:
> according to:
> http://download.oracle.com/javaee/6/api/javax/servlet/ServletRequest.html#getCharacterEncoding%28%29
> request.getCharacterEncoding() should return " the name of the character
> encoding used in the body of this request. ".
> 
> But request.getCharacterEncoding() always seems to return  ISO-8859-1.
> For example, my html.jsp looks like:
> <%@ page language="java" contentType="text/html; charset=UTF-8"
>     pageEncoding="UTF-8"%>
> ...
> <form method="POST" action="/some/path"
>     accept-charset="utf-8"
>     enctype="application/x-www-form-urlencoded; charset=utf-8">
>     <input type="hidden" name="_charset_" value="UTF-8" />
>     <input type="submit" value="Save" />
> ...
> 
> Then I would expect request.getCharacterEncoding()  (from POST.jsp) to
> return "UTF-8". But it still returns "ISO-8859-1".
> 
> Is this intended?
> 
> From the Sling documentation:
> http://sling.apache.org/site/request-parameters.html#RequestParameters-CharacterEncoding
> I don't get this part:  "This identity transformation happens to generate
> strings as the original data was generated with ISO-8859-1 encoding."
> 
> As long as I set _charset_ to the encoding of the rendered page (with
> <form>), I don't have a problem. But, I was wondering if
> .getCharacterEncoding() should be set to whatever request body was encoded
> as, not what sling used to perform "identity transform" with.
> 
> Also, wouldn't it be better if, when _charset_ is missing from the
> request, it were automatically set to the request body encoding? Or do
> browsers not send request body encoding information?
> 
> Thanks.
> Sam

