Hi,
The problem is that browsers generally do not declare the character
encoding they used when posting form data ... Don't ask me why ;-)
So we have to guess, something I really do not like.
But it looks like browsers send POST data in the same encoding in which
the form page was served. So if the form is delivered as UTF-8, the
browser sends the data back encoded as UTF-8.
Now, how does Sling know what encoding was used to send the form?
Short answer: It cannot know.
Hence the _charset_ request parameter.
But listening to our clients and users, and given that UTF-8 is used
most of the time anyway, how about this solution:
* We stick with the _charset_ parameter. Whatever encoding that
parameter names is used to decode the parameters.
* If the parameter does not exist, we support a new configuration
option defining the default encoding to be used.
* If the configuration option is also missing, we default to the
same value as we do today, which is ISO-8859-1.
Of course the configuration option would not be set by default (for
backwards compatibility reasons).
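
To make the lookup order concrete, here is a minimal sketch in Java. It
is not actual Sling code: the class name ParameterEncoding, the method
names and the configuredDefault argument are made up for illustration,
and it assumes the container handed us the raw parameter values decoded
as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper, not part of the Sling API.
class ParameterEncoding {

    // Returns the named charset, or null if the name is missing, illegal or unsupported.
    static Charset lookup(String name) {
        if (name == null) {
            return null;
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalArgumentException e) {
            // covers IllegalCharsetNameException and UnsupportedCharsetException
            return null;
        }
    }

    // Lookup order: _charset_ parameter, then configured default, then ISO-8859-1.
    static Charset resolve(Map<String, String> rawParams, String configuredDefault) {
        Charset fromRequest = lookup(rawParams.get("_charset_"));
        if (fromRequest != null) {
            return fromRequest;
        }
        Charset fromConfig = lookup(configuredDefault); // assumed new option, unset by default
        if (fromConfig != null) {
            return fromConfig;
        }
        return StandardCharsets.ISO_8859_1; // today's behaviour
    }

    // Recovers the original bytes from the ISO-8859-1 decoded string and
    // decodes them again with the resolved charset.
    static String decode(String rawValue, Charset charset) {
        byte[] bytes = rawValue.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, charset);
    }
}
```

With neither the parameter nor the option set, resolve() returns
ISO-8859-1 and decode() leaves the value unchanged, so existing
installations would behave as they do today.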
Would that help your case?
Regards
Felix
On Wednesday, 20.10.2010, at 14:05 -0400, sam lee wrote:
> according to:
> http://download.oracle.com/javaee/6/api/javax/servlet/ServletRequest.html#getCharacterEncoding%28%29
> request.getCharacterEncoding() should return " the name of the character
> encoding used in the body of this request. ".
>
> But request.getCharacterEncoding() always seems to return ISO-8859-1.
> For example, my html.jsp looks like:
> <%@ page language="java" contentType="text/html; charset=UTF-8"
> pageEncoding="UTF-8"%>
> ...
> <form method="POST" action="/some/path"
> accept-charset="utf-8"
> enctype="application/x-www-form-urlencoded; charset=utf-8">
> <input type="hidden" name="_charset_" value="UTF-8" />
> <input type="submit" value="Save" />
> ...
>
> Then I would expect request.getCharacterEncoding() (from POST.jsp) to
> return "UTF-8". But it still returns "ISO-8859-1".
>
> Is this intended?
>
> From the Sling documentation:
> http://sling.apache.org/site/request-parameters.html#RequestParameters-CharacterEncoding
> I don't get this part: "This identity transformation happens to generate
> strings as the original data was generated with ISO-8859-1 encoding."
>
> As long as I set _charset_ to the encoding of the rendered page (with
> <form>), I don't have a problem. But, I was wondering if
> .getCharacterEncoding() should be set to whatever request body was encoded
> as, not what sling used to perform "identity transform" with.
>
> Also, wouldn't it be better if _charset_ is missing from request, it's
> automatically set to request body encoding? Or, browsers don't send request
> body encoding information?
>
> Thanks.
> Sam