Christopher Schultz wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sylvie,

On 9/17/2009 9:12 AM, Sylvie Perrin wrote:
I have a shared directory on a windows system named SHAREDDIR and
containing one file named "fichié.txt"

Sylvie,
why don't you name your file "fichier.txt", as it should be written in French ? That would solve your problem immediately, save a lot of ink on this thread, and save you a lot of time in the end.

Seriously.

There are so many pieces in play here that you will never get a solution that is totally foolproof. On one side, there is a browser that you do not control, on a workstation that you do not control. In the middle, there are HTML and HTTP, for which the default character set is iso-8859-1; Java, for which the internal character set is Unicode; and a local Linux filesystem which is charset-agnostic. On the other side, there is a Windows system which stores its filenames in directories as Unicode. If you have to work with a web application which involves files on different platforms, stick with filenames made purely of US-ASCII characters.
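As a minimal sketch of that last piece of advice, an application could reject or flag names that are not pure US-ASCII before they ever reach the filesystem. The class name AsciiNames and the method isAsciiOnly are hypothetical, chosen only for this illustration; the check itself relies on the standard CharsetEncoder.canEncode call.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original thread.
final class AsciiNames {
    /** Returns true if every character of the name can be encoded as US-ASCII. */
    static boolean isAsciiOnly(String fileName) {
        // A fresh encoder per call keeps the method thread-safe.
        return StandardCharsets.US_ASCII.newEncoder().canEncode(fileName);
    }

    public static void main(String[] args) {
        // \u00e9 is "é", escaped so the source file encoding does not matter.
        System.out.println(isAsciiOnly("fichier.txt"));
        System.out.println(isAsciiOnly("fichi\u00e9.txt"));
    }
}
```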

André




Seriously now, let's start at the beginning.
You are, like many of us, the victim of these horrible English-speaking imperialists in the computer industry. They just don't understand alphabets with more than 26 letters, and get totally confused by our és and às and cédilles and scharfes S's. But since they got there first (mainly because of all the anti-competitive subsidies they gave to Boeing and GM), we are the ones who have to adapt.

So, you have a file, which on your Unix/Linux system looks like
/home/me/mountDir/fichié.txt.
Or, does it really ?

Try the following :
- open a console window on your Linux system
- enter the command "locale -a", and find 2 result lines like :
fr_FR.iso8859-1
fr_FR.utf8
(or something similar, the point being to have one looking like it contains 8859-1 and the other looking like it contains "utf8").

- now enter "export LC_CTYPE=fr_FR.iso8859-1"
(adapt this according to what you found above with locale -a)

- now enter "ls -l /home/me/mountDir/"
What does the filename look like ?

- now enter "export LC_CTYPE=fr_FR.utf8"
(adapt this according to what you found above with locale -a)

- now enter "ls -l /home/me/mountDir/" again
What does the filename look like now ?

I would bet the file name looks different.

Now go to your Windows system, open the Windows Explorer, and look at what this filename looks like. Then, still on your Windows system, open a command window, navigate to the same directory, do a "dir", and look at what the filename looks like.
Is there a difference there, too ?

Why is that ?
The filename itself did not change in the directory of your Windows system.

But the name of that file is going to "look" different, depending on how many "layers" of software there are between that directory entry and the process that uses that filename, and on the settings of each of these layers.

The above are simple cases, involving just a few layers : the original directory, the CIFS filesystem drivers on your Linux machine, the "ls" program itself, and the display interface between that program and your console. Now you add Java and Tomcat on top of that, and you add HTTP, and you add URI encoding/decoding, and you add the browser, and you add the encoding of your html pages.
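To make the "layers" effect concrete, here is a small sketch in which one layer writes the name using one charset and the next layer reads the same bytes using another. The class name FilenameLayers and the method reinterpret are hypothetical, and this only simulates the mismatch in memory; it does not reproduce what the CIFS driver or "ls" actually do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not part of the original thread.
final class FilenameLayers {
    /** Encodes the name with one charset, then decodes the same bytes with another. */
    static String reinterpret(String name, Charset writtenAs, Charset readAs) {
        byte[] raw = name.getBytes(writtenAs);
        return new String(raw, readAs);
    }

    public static void main(String[] args) {
        // \u00e9 is "é", escaped so the source file encoding does not matter.
        String name = "fichi\u00e9.txt";

        // Both layers agree: the name survives unchanged.
        System.out.println(reinterpret(name, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // Written as UTF-8 (bytes C3 A9), read as ISO-8859-1: "é" turns into
        // the two characters U+00C3 and U+00A9.
        System.out.println(reinterpret(name, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));

        // Written as ISO-8859-1 (byte E9), read as UTF-8: the lone byte is not
        // valid UTF-8, so the decoder substitutes U+FFFD.
        System.out.println(reinterpret(name, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
    }
}
```

How these lines appear on screen depends in turn on the console's own encoding, which is one more layer of the same kind.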

In other words, give it up.


I mount this shared directory on my Linux system with the following
command:
mount -t cifs -o iocharset=utf8 //IpWindows/SHAREDDIR /home/me/mountDir/
In a standalone Java application running on my Linux system, I can
create a FileInputStream from the file located in the remote directory
like this:

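import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

String mountPath = "/home/me/mountDir";
File[] list = new File(mountPath).listFiles();  // null if the directory cannot be listed
File file = list[0];
// try-with-resources closes the stream; FileNotFoundException is a subclass of IOException
try (FileInputStream fStream = new FileInputStream(file)) {
   // read from fStream here
}
catch (IOException e) {
   e.printStackTrace();
}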

Can you have your standalone Java program print the following information:

1. The full path of the file
2. The values for these system properties:
   a. file.encoding
   b. sun.jnu.encoding
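A minimal sketch of a program that reports this information might look like the following. The class name EncodingReport and the method describe are hypothetical. The default charset and the code points of the file name are additions not asked for above; they are included on the assumption that the printed name itself may be garbled by the console.

```java
import java.io.File;
import java.nio.charset.Charset;

// Hypothetical diagnostic, not part of the original thread.
final class EncodingReport {
    /** Builds a report with the file's path and the JVM's encoding settings. */
    static String describe(File file) {
        StringBuilder sb = new StringBuilder();
        sb.append("path             = ").append(file.getAbsolutePath()).append('\n');
        sb.append("file.encoding    = ").append(System.getProperty("file.encoding")).append('\n');
        sb.append("sun.jnu.encoding = ").append(System.getProperty("sun.jnu.encoding")).append('\n');
        sb.append("default charset  = ").append(Charset.defaultCharset()).append('\n');
        // Code points show what Java really holds, whatever the console displays.
        sb.append("name code points =");
        file.getName().codePoints()
            .forEach(cp -> sb.append(String.format(" U+%04X", cp)));
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        // Directory from the original post; pass another one as the first argument.
        File dir = new File(args.length > 0 ? args[0] : "/home/me/mountDir");
        File[] list = dir.listFiles();
        if (list == null) {
            System.out.println("cannot list " + dir);
            return;
        }
        for (File f : list) {
            System.out.print(describe(f));
        }
    }
}
```

Calling describe from inside the servlet and writing the result to the log would give the same report for the Tomcat environment, so the two can be compared line by line.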

When I execute the same code in a servlet running on the same machine,
the call to the FileInputStream constructor always throws a
FileNotFoundException because it doesn't recognize the "é" character in
the path of the file.

Please post the above values within your servlet environment, too.

Are you sure that it's because of the é, or is it because the user that
Tomcat is running under does not have permission to read that file?
Under what user /is/ Tomcat running?
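One way to start telling these two causes apart is a quick check run from inside the servlet, sketched below. The class name AccessCheck and the method diagnose are hypothetical. Note the limits of this check: File.exists() can also return false when the process lacks permission on a parent directory, so a "not found" result does not by itself prove an encoding problem.

```java
import java.io.File;

// Hypothetical diagnostic, not part of the original thread.
final class AccessCheck {
    /** Reports whether the file is visible and readable to the current process. */
    static String diagnose(File file) {
        String user = System.getProperty("user.name");
        if (!file.exists()) {
            // Either the name does not match a directory entry as Java sees it,
            // or a parent directory is not accessible to this user.
            return "not found as user " + user + ": " + file.getAbsolutePath();
        }
        if (!file.canRead()) {
            return "exists but not readable by user " + user + ": " + file.getAbsolutePath();
        }
        return "readable by user " + user + ": " + file.getAbsolutePath();
    }

    public static void main(String[] args) {
        // Path from the original post; \u00e9 is "é".
        System.out.println(diagnose(new File("/home/me/mountDir/fichi\u00e9.txt")));
    }
}
```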

Since I don't know what the problem is, I have had a hard time tracking
down a solution online. I took particular care to follow all the steps
described in the FAQ/CharacterEncoding part of the wiki. Here is my
configuration:

I set URIEncoding in my port 8080 connector to UTF-8 (I use this port to
execute my servlet)
<Connector port="8080" protocol="HTTP/1.1"
  connectionTimeout="20000"
  redirectPort="8443"
  URIEncoding="UTF-8"
  useBodyEncodingForURI="true" />

None of these settings matter. These are only relevant for HTTP
communication, and your code is not reading anything from the request.

I use a filter to set the default encoding to UTF-8, and the first line of
my doFilter method is
request.setCharacterEncoding("UTF-8");

Your filter sets /what/ default encoding? What does it set it to?

Setting the encoding of the request will not affect your code above.

In my servlet, I set the content type of responses to UTF-8, and
the first line of my doGet method is
response.setContentType("text/html;charset=UTF-8");

This will also have no effect.

My tomcat is started with CATALINA_OPTS=-Dfile.encoding=UTF-8

Okay. Let's see what your command-line program reports for
file.encoding, etc.

- -chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqyZxQACgkQ9CaO5/Lv0PArBACdGM53y+0/2L1lkf3gvngXpnAz
8D8An3pjgMT4jBOk6jg+zRNEXGORzJ1G
=v9Bf
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



