Hello,

Last year we had to make WebFolder access work with Japanese (and German)
characters in Slide. Slide already worked with German characters and
mostly with Japanese ones, but not completely.

I want to share my experiences with you:

1) The URL standard currently states that only US-ASCII characters can be
used within a URL. There are nevertheless efforts to standardize an
additional encoding (UTF-8 will possibly be the final standard).
2) WebFolders sends all Japanese characters in the platform-specific
encoding (on a Japanese PC this is Shift_JIS, on a German/US PC it is
ISO-8859-1). The encoding used is not communicated to the server, e.g. via
an HTTP header.
3) IE sends URLs encoded either in UTF-8 (the default) or in the
platform-specific encoding.
4) Tomcat 3.3 uses ISO-8859-1 as the default encoding for all bytes
converted into a String (ByteChunk.toString method). When the requestUri()
method is called, Tomcat converts the bytes into a String using ISO-8859-1;
the String is then converted and masked (%xx notation) with the encoding
specified in server.xml [<DecodeInterceptor defaultEncoding="UTF-8" />].
Slide then unmasks the URL and encodes it again
(WebdavUtils.decodeURL(req.getRequestURI())). This did not work in every
case.
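
To illustrate points 2) to 4), here is a minimal sketch of why the server
cannot guess the client encoding. It is not Slide or Tomcat code; the class
and method names are made up for this example. It shows that the same
character produces different bytes depending on the client charset, and what
a server sees when it turns UTF-8 bytes into a String using ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Slide or Tomcat.
class EncodingMismatchDemo {

    // Percent-escapes every byte of the name (including ASCII, to keep the
    // sketch short), as a client would after choosing a charset.
    static String escape(String name, Charset clientCharset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(clientCharset)) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Mimics a server that converts the request bytes into a String using
    // ISO-8859-1, regardless of what the client actually sent.
    static String asServerSeesIt(String name, Charset clientCharset) {
        byte[] wireBytes = name.getBytes(clientCharset);
        return new String(wireBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "\u00fc"; // German u-umlaut

        // Same character, different bytes on the wire.
        System.out.println(escape(name, StandardCharsets.UTF_8));      // %C3%BC
        System.out.println(escape(name, StandardCharsets.ISO_8859_1)); // %FC

        // UTF-8 bytes read as ISO-8859-1 become two unrelated characters.
        String seen = asServerSeesIt(name, StandardCharsets.UTF_8);
        System.out.println(seen.length());     // 2
        System.out.println(seen.equals(name)); // false
    }
}
```

Because nothing in the request names the charset, both byte sequences are
equally plausible to the server.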

Changes applied to Slide (it now works with Tomcat 3.3 and with German and
Japanese characters):

1) Configuration.java: The URL encoding expected by the Slide server can
now be specified. Please note that the server is then limited to clients
using exactly this encoding. Previously the default, UTF-8, was hard-coded
in WebdavUtils.
2) WebdavUtils.java:
a. New method: fixTomcatURL(url). Converts the URL back into a byte array
using ISO-8859-1 and rebuilds the String from those bytes using the
Configuration.urlEncoding setting.
b. decodeURL(path) uses the Configuration.urlEncoding setting.
c. decodeURL(path, encoding) uses the encoding parameter.
d. encodeURL(path) uses the Configuration.urlEncoding setting.
e. encodeURL(path, encoding) uses the encoding parameter.
f. getRelativePath uses the fixTomcatURL method.
3) AbstractMultistatusResponseMethod.java:  The target URL (of copy/move) is
fixed and decoded.
4) PropFindMethod.java:
a. All URLs in the response XML document are UTF-8 encoded.
b. Uses req.getContextPath() instead of req.getRequestURI() plus processing.
Currently this implies that the context path is US-ASCII.
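
The idea behind fixTomcatURL in 2a) can be sketched as follows. This is not
the code from WebdavUtils, only an illustration under the assumption that
the container built the String from the raw bytes using ISO-8859-1. The
class name and the URL_ENCODING constant (a stand-in for
Configuration.urlEncoding) are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the actual WebdavUtils implementation.
class TomcatUrlFix {

    // Stand-in for the Configuration.urlEncoding setting (assumed UTF-8 here).
    static final Charset URL_ENCODING = StandardCharsets.UTF_8;

    // Undoes the container's ISO-8859-1 conversion and reinterprets the
    // original bytes using the encoding the client is assumed to have used.
    static String fixTomcatURL(String url, Charset clientEncoding) {
        // ISO-8859-1 maps chars 0..255 to bytes 0..255 one to one, so this
        // recovers exactly the bytes that arrived on the wire.
        byte[] wireBytes = url.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wireBytes, clientEncoding);
    }

    static String fixTomcatURL(String url) {
        return fixTomcatURL(url, URL_ENCODING);
    }

    public static void main(String[] args) {
        // What the container hands over when a client sends the UTF-8
        // bytes E6 97 A5 (one Japanese character, U+65E5).
        String fromContainer = "/files/\u00e6\u0097\u00a5";
        String fixed = fixTomcatURL(fromContainer);
        System.out.println(fixed.equals("/files/\u65e5")); // true
    }
}
```

The round trip is only lossless if the String really came from an
ISO-8859-1 conversion. If the client used a different encoding than the
configured one, the result is still wrong, which is the limitation
mentioned in 1).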

I have not yet tested with Tomcat 4. More testing with language-specific
characters should be done. Could one of the Tomcat experts please have a
look at 4b)?



During my testing and fixing I got the impression that Tomcat should offer
a similar feature. May I suggest some Tomcat extensions:

1) In server.xml the administrator can specify the encoding used to decode
URLs (and all headers). If not specified, the default is UTF-8. Maybe it is
possible to infer the encoding used from one of the headers sent.
2) All Strings (getRequestUri, etc.) returned by Tomcat are ready-to-use
Java Strings, i.e. there is no hidden encoding or masking (%xx) any more.
3) The servlet can query Tomcat for the encoding that was used.
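
To make the suggestion more concrete, here is a rough sketch of the
behaviour I have in mind. ConfiguredUriDecoder and its methods are invented
names for illustration; nothing like this exists in Tomcat as far as this
mail is concerned. The charset passed to the constructor stands for the
value an administrator would set in server.xml.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the proposed feature, not an existing Tomcat class.
class ConfiguredUriDecoder {

    private final Charset uriEncoding;

    ConfiguredUriDecoder(Charset uriEncoding) {
        this.uriEncoding = uriEncoding;
    }

    // Suggestion 3: the servlet can ask which encoding was used.
    Charset getUriEncoding() {
        return uriEncoding;
    }

    // Suggestions 1 and 2: unmask %xx sequences, then decode all bytes once
    // using the configured encoding, so the result is a ready-to-use String.
    String decode(String rawUri) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < rawUri.length()) {
            char c = rawUri.charAt(i);
            if (c == '%') {
                if (i + 2 >= rawUri.length()) {
                    throw new IllegalArgumentException("Truncated escape at " + i);
                }
                int hi = Character.digit(rawUri.charAt(i + 1), 16);
                int lo = Character.digit(rawUri.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Bad escape at " + i);
                }
                bytes.write((hi << 4) | lo);
                i += 3;
            } else if (c <= 0xFF) {
                // Unescaped character: treated as one raw byte from the wire.
                bytes.write(c);
                i++;
            } else {
                throw new IllegalArgumentException("Not a raw byte at " + i);
            }
        }
        return new String(bytes.toByteArray(), uriEncoding);
    }

    public static void main(String[] args) {
        ConfiguredUriDecoder decoder =
                new ConfiguredUriDecoder(StandardCharsets.UTF_8);
        String path = decoder.decode("/files/%E6%97%A5%E6%9C%AC.txt");
        System.out.println(path.equals("/files/\u65e5\u672c.txt")); // true
    }
}
```

The decoder is written by hand because java.net.URLDecoder also turns '+'
into a space, which is meant for form data rather than for URL paths.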

What do you think? Remy, do you want to discuss this with the Tomcat team?

Best regards

Juergen




