Short version:

Does Tomcat 5 no longer serve files with international characters in their filenames?


Long version:

Environment:  Tomcat 5.1.19 on WinXP Pro

I have a file located in: <tomcat-home>/<webapps>/MyWebApp/.  The filename contains 
international characters:  0x305f 0x3079 0x304f (a.k.a. E3-81-9F E3-81-B9 E3-81-8F in 
UTF-8).

When I navigate to the directory via http://<server>:8080/<webappname>/ I get a 
directory listing of the files in that directory.  I can access every file on that 
list except those that contain international characters.

When I click on a filename that contains international characters, I'm sent to 
http://<server>:8080/<webappname>/%E3%81%9F%E3%81%B9%E3%81%8F.xml.  This is the correct 
result of putting the filename through a URLEncoder with the UTF-8 character set, 
which is what I assume is being done behind the scenes by the server.  But the 
file doesn't appear; I get a 404 error.
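
To double-check the expected encoding outside the server, here is a minimal sketch. It assumes the link in the directory listing is built by percent-encoding the UTF-8 bytes of the filename, which I have not confirmed is what Tomcat does internally. The class name EncodeCheck and the encode helper are only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeCheck {
    // Percent-encodes a name using UTF-8. That the server builds its
    // links this way is an assumption, not something I have verified.
    static String encode(String name) {
        try {
            return URLEncoder.encode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+305F U+3079 U+304F written as escapes, so the encoding of
        // this source file does not matter.
        String name = "\u305f\u3079\u304f";
        System.out.println(encode(name) + ".xml");
        // prints %E3%81%9F%E3%81%B9%E3%81%8F.xml
    }
}
```

Note that URLEncoder does form-style encoding (a space becomes "+"), so it only approximates path encoding. For a name without spaces, like this one, the result is the same.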

So I made some Java testing code:

try {
    URL url = new URL("http://<server>:8080/<webapp>/%E3%81%9F%E3%81%B9%E3%81%8F.xml");
    HttpURLConnection conn = (HttpURLConnection)url.openConnection();

    // checking the headers
    String header;
    String key;
    int i = 0;
    while ((header = conn.getHeaderField(i)) != null) {
        key = conn.getHeaderFieldKey(i);
        System.out.println(key + " = " + header);
        i++;
    }

    // checking the content (reuse the connection opened above rather than
    // opening a second one)
    InputStream is = conn.getInputStream();
    InputStreamReader isr = new InputStreamReader(is);
    int chr;
    while ((chr = isr.read()) != -1) {
        System.out.print((char)chr);
    }
    System.out.println("success");
} catch (Throwable t) { t.printStackTrace(); }

The headers I get back are:
HTTP/1.1 404 /<webapp>/%E3%81%9F%E3%81%B9%E3%81%8F.scene.xml
Content-Type = text/html;charset=ISO-8859-1
Content-Language = en-US
Content-Length = 1091
Date = Wed, 10 Mar 2004 18:02:01 GMT
Server = Apache-Coyote/1.1

No help there because I get those same headers when I try to access a file that 
doesn't exist at all:

HTTP/1.1 404 /<webapp>/inexistent.xml
Content-Type = text/html;charset=ISO-8859-1
Content-Language = en-US
Content-Length = 1040
Date = Wed, 10 Mar 2004 18:03:22 GMT
Server = Apache-Coyote/1.1

When I try to access the input stream to read for content, I get a 
FileNotFoundException.

I'm pretty confident that this problem does not exist in Tomcat 4.

I'm also pretty confident that this problem is not related to the characters being 
3-byte UTF-8.  I've tested using 2-byte UTF-8 (D0-9F, D1-80) and the result is the 
same.
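
For anyone who wants to confirm the byte lengths, here is a minimal sketch that prints the UTF-8 bytes of a character. The code points U+041F and U+0440 are my reading of the D0-9F and D1-80 byte pairs above, and the class name Utf8Bytes and the hex helper are only for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Returns the UTF-8 bytes of s as dash-separated uppercase hex.
    static String hex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append('-');
            }
            // Mask to 0..255 so negative bytes format as two hex digits.
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("\u041f")); // 2-byte case: D0-9F
        System.out.println(hex("\u0440")); // 2-byte case: D1-80
        System.out.println(hex("\u305f")); // 3-byte case: E3-81-9F
    }
}
```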

Is this a bug?

-Ed Toro

