asbachb wrote:
Thanks for your reply.
I checked my client's request to Tomcat, which shows that the umlauts are
correctly replaced with their entities:
GET
"http://localhost:8080/wicket-umlauts-1.0-SNAPSHOT/page/param/v%C3%A4lue-xxx"
This request should be a valid ASCII request and shouldn't be a problem to
decode?
I understand what you mean, and you are right, but in a case like this you have to be very
careful in your use of vocabulary.
The term "ASCII" is usually reserved for talking about a character set (or alphabet) which
includes only 128 codes, each fitting in a single byte, and in which the only letters are
A-Z and a-z. Basically, the English alphabet.
An "umlaut" is a diacritic mark.
A "lowercase a with umlaut" is a letter of the German alphabet (and probably of
others).
The term "entity" is usually used in the context of XML or HTML, to denote something of
the form "&xxx;" where "xxx" represents the name of a symbol.
And "/wicket-umlauts-1.0-SNAPSHOT/page/param/v%C3%A4lue-xxx" seems to be the result of two
consecutive steps:
a) the client composes a URL as a Unicode String, and encodes it using the
UTF-8 encoding
b) after (a), it scans this URL for any byte/character that is not valid in a URL (as per
RFC 2396) and "URL-encodes" it, which consists of replacing the offending byte by its
encoding as "%xy", where "xy" is the hexadecimal representation of the byte value.
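Steps (a) and (b) can be sketched in a few lines of Java. This is only an illustration of the principle, not what any particular browser or client library really does; the class name PercentEncodeSketch, the method encodePathSegment and the conservative set of "safe" characters are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

public class PercentEncodeSketch {

    // Assumption for this sketch: only letters, digits and a few marks are
    // left untouched. RFC 2396 allows some more characters in a path, but
    // escaping too much is harmless here.
    private static final String SAFE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.";

    static String encodePathSegment(String segment) {
        // Step (a): turn the Unicode String into UTF-8 bytes.
        byte[] utf8 = segment.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder();
        for (byte b : utf8) {
            int value = b & 0xFF; // treat the byte as unsigned
            if (SAFE.indexOf(value) >= 0) {
                out.append((char) value);
            } else {
                // Step (b): replace the offending byte by "%xy" (hexadecimal).
                out.append('%').append(String.format("%02X", value));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "\u00e4" is the lowercase a with umlaut.
        String encoded = encodePathSegment("v\u00e4lue-xxx");
        System.out.println(encoded); // prints v%C3%A4lue-xxx
    }
}
```

The single letter with umlaut becomes two bytes in UTF-8 (C3 and A4), which is why two "%xy" sequences appear in the request.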
The server, when it receives this request,
c) "URL-decodes" the URL, replacing each "%xy" sequence by the corresponding
single-byte code
d) and then, it depends...
If you have told the server to decode the URL (after (c)) as UTF-8, then
the server will do that, to generate an internal Java Unicode String.
This is not the default. You have to tell the server to do that. With Tomcat, you do
that by using the 'URIEncoding="UTF-8"' attribute of the Connector (in server.xml).
(You cannot in this case use the "useBodyEncodingForURI" attribute, because for a GET
request, there is no body (and thus no body encoding of course)).
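To make steps (c) and (d) concrete, here is a small Java sketch of what happens on the receiving side. It is not Tomcat's code; the class PercentDecodeSketch and its methods are hypothetical names used only to show why the charset chosen in step (d) matters.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PercentDecodeSketch {

    // Step (c): replace each "%xy" sequence by the single byte it stands for.
    // Assumes well-formed input; a bad hex digit throws NumberFormatException.
    static byte[] percentDecodeToBytes(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                out.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c); // plain ASCII character, one byte
            }
        }
        return out.toByteArray();
    }

    // Step (d): interpret the bytes in some charset to get a Java String.
    static String decode(String encoded, Charset charset) {
        return new String(percentDecodeToBytes(encoded), charset);
    }

    public static void main(String[] args) {
        String path = "v%C3%A4lue-xxx";

        // Decoded as UTF-8: the two bytes C3 A4 become one letter.
        String asUtf8 = decode(path, StandardCharsets.UTF_8);
        System.out.println(asUtf8.equals("v\u00e4lue-xxx")); // prints true

        // Decoded as ISO-8859-1: each byte becomes its own character.
        String asLatin1 = decode(path, StandardCharsets.ISO_8859_1);
        System.out.println(asLatin1.equals("v\u00c3\u00a4lue-xxx")); // prints true
    }
}
```

The bytes are the same in both cases; only the interpretation in step (d) differs, and that is the part the Connector attribute controls.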
If you have done that, and your application asks Tomcat for the URL String directly, then
you should get the correct Java (Unicode) String in response.
(You should be able to check this easily with a simple JSP page).
Now if you get this path via a call specific to the "wicket" application you are using,
then you have to check in that application what happens, to make the result different.
Maybe this "wicket" thing does its own decoding of the path, resulting in a (wrong)
double-decoding?
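As an illustration of how a second decoding step can go wrong, here is a sketch in Java. What wicket really does is not known here; the class DoubleDecodeSketch and the method redecodeAsUtf8 are hypothetical and only mimic a common workaround, which takes a String that was decoded as ISO-8859-1 and re-reads its bytes as UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class DoubleDecodeSketch {

    // Hypothetical "repair" step: assume the container decoded the bytes as
    // ISO-8859-1, get those bytes back, and read them again as UTF-8.
    static String redecodeAsUtf8(String fromContainer) {
        byte[] raw = fromContainer.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String expected = "v\u00e4lue-xxx"; // "a with umlaut" in the middle

        // Case 1: the container decoded as ISO-8859-1, so the repair helps.
        String latin1Decoded = "v\u00c3\u00a4lue-xxx";
        System.out.println(redecodeAsUtf8(latin1Decoded).equals(expected)); // prints true

        // Case 2: the container already decoded as UTF-8 (URIEncoding="UTF-8"),
        // so the same repair now damages a String that was correct.
        String utf8Decoded = expected;
        System.out.println(redecodeAsUtf8(utf8Decoded).equals(expected)); // prints false
    }
}
```

If something like case 2 is happening, the String obtained directly from Tomcat would be correct and the one obtained through the application would not, which is what the JSP test above would help to find out.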