Re: ServletWebRequest.getServletPath() returns strange values on uris with german umlauts

André Warnier Thu, 27 Jan 2011 14:46:15 -0800

asbachb wrote:

Thanks for your reply.


I checked my clients request to tomcat which shows that the umlauts are
correctly replaced with their entities:

GET
"http://localhost:8080/wicket-umlauts-1.0-SNAPSHOT/page/param/v%C3%A4lue-xxx";

This request should be a valid ASCII request and shouldn't be a problem to
decode?

I understand what you mean, and you are right, but in a case like this you have to be very careful in your use of vocabulary.

The term "ASCII" is usually reserved for talking about a character set (or alphabet) which includes only 128 codes, represented by one byte per character, of which the letters are A-Z and a-z. Basically thus the English alphabet.

An "umlaut" is a diacritic mark.
A "lowercase a with umlaut" is a letter of the German alphabet (and probably
others).

The term "entity" is usually used in the context of XML or HTML, to denote something of the form "&xxx;" where "xxx" represents the name of a symbol.

And "/wicket-umlauts-1.0-SNAPSHOT/page/param/v%C3%A4lue-xxx" seems to be the result of 2 consecutive steps :

a) the client composes a URL as a Unicode String, and encodes it using the 
UTF-8 encoding

b) after (a), it scans this URL for any byte/character that is not valid in a URL (as per RFC 2396) and "URL-encodes" it, which consists of replacing the offending byte by its encoding as "%xy", where "xy" is the hexadecimal representation of the byte value.
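
To make steps (a) and (b) concrete, here is a minimal Java sketch of what the client presumably does. The class name, the method name and the exact set of characters left untouched are my own simplifications for illustration, not the full rule set of RFC 2396 and not what any particular browser implements.

```java
import java.nio.charset.StandardCharsets;

public class PercentEncodeSketch {
    // Assumption: a simplified set of characters that are left as-is.
    // The real list in the RFC is longer.
    private static final String SAFE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~/";

    static String encodePath(String path) {
        // Step (a): turn the Unicode string into UTF-8 bytes.
        byte[] bytes = path.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF; // treat the byte as an unsigned value
            if (v < 0x80 && SAFE.indexOf((char) v) >= 0) {
                out.append((char) v);
            } else {
                // Step (b): replace the offending byte by "%xy" (hexadecimal).
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00E4 is "lowercase a with umlaut"; its UTF-8 encoding is the bytes C3 A4.
        System.out.println(encodePath("/page/param/v\u00E4lue-xxx"));
        // prints /page/param/v%C3%A4lue-xxx
    }
}
```

Note that the single letter becomes two "%xy" sequences, because its UTF-8 encoding is two bytes long.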


The server, when it receives this request,
c) "URL-decodes" the URL, replacing each "%xy" sequence by the corresponding 
single-byte code
d) and then, it depends..

If you have told the server to decode the URL (after (c)) as if it was UTF-8/Unicode, then the server will do that, to generate an internal Java Unicode String.
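
The difference made by step (d) can be sketched in Java as follows. This is only an illustration of the principle, not Tomcat's actual code; the class and method names are made up, and I am assuming that a server which has not been told otherwise interprets the bytes as ISO-8859-1.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PercentDecodeSketch {
    // Step (c): replace each "%xy" sequence by the single byte it stands for.
    static byte[] percentDecode(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                out.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c); // plain ASCII character, one byte
            }
        }
        return out.toByteArray();
    }

    // Step (d): interpret the resulting bytes in some character set.
    static String decode(String encoded, Charset charset) {
        return new String(percentDecode(encoded), charset);
    }

    public static void main(String[] args) {
        String raw = "v%C3%A4lue-xxx";
        String asUtf8 = decode(raw, StandardCharsets.UTF_8);
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);
        // UTF-8 gives back the single letter U+00E4.
        System.out.println(asUtf8.equals("v\u00E4lue-xxx"));         // true
        // ISO-8859-1 turns the same two bytes into two characters, U+00C3 and U+00A4.
        System.out.println(asLatin1.equals("v\u00C3\u00A4lue-xxx")); // true
    }
}
```

The second result is the kind of "strange value" one typically sees when the bytes are correct but are interpreted in the wrong character set.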

This is not the default. You have to tell the server to do that. With Tomcat, you do that by using the 'URIEncoding="UTF-8"' attribute of the Connector.
(You cannot in this case use the "useBodyEncodingForURI" attribute, because for a GET request there is no body, and thus no body encoding of course.)

If you have done that, and your application asks Tomcat for the URL String directly, then you should get the correct Java (Unicode) String in response.

(You should be able to check this easily with a simple JSP page).

Now if you get this path via a call specific to the "wicket" application you are using, then you have to check in that application what happens, to make the result different.
Maybe this "wicket" thing does its own decoding of the path, resulting in a (wrong) double-decoding ?
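
To show why a second decoding pass is wrong in general, here is a small Java sketch. It is not a claim about what Wicket actually does; the class name, method name and sample value are invented for the illustration, and it uses the java.net.URLDecoder overload that takes a Charset (Java 10 or later).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DoubleDecodeSketch {
    // One decoding pass, as the server would do when it receives the request.
    static String decodeOnce(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical value on the wire: the client meant the literal text "a%20b",
        // so the percent sign itself was encoded as %25.
        String onTheWire = "a%2520b";

        String once = decodeOnce(onTheWire); // what the client meant
        String twice = decodeOnce(once);     // a second, unwanted pass

        System.out.println(once);  // prints a%20b
        System.out.println(twice); // prints a b
    }
}
```

After the first pass the String is already the final value. Anything in it that still looks like an escape sequence is data, and decoding it again changes that data.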


