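I solved the problem by specifying WINDOWS-1252 as the Charset when reading the imported file on the Linux host (excerpt follows). I still don't understand why this wasn't necessary on the older Tomcat/JVM, unless the decoding of ISO-8859-1 just became stricter. Apparently WINDOWS-1252 is a superset of ISO-8859-1 as far as printable characters go: the two agree everywhere except the byte range 0x80-0x9F, where ISO-8859-1 has control codes and WINDOWS-1252 has printable characters such as curly quotes and the euro sign.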
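Before the excerpt, here is a minimal sketch of what the charset choice changes: it decodes the same bytes three ways. The class name CharsetDecodeDemo and the sample bytes are made up for illustration and are not from my application.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical demo class; the name and the sample bytes are illustrative only.
class CharsetDecodeDemo {

    // Decode the same raw bytes with the named charset.
    // Charset.decode replaces bytes it cannot map instead of throwing.
    static String decode(byte[] raw, String charsetName) {
        return Charset.forName(charsetName).decode(ByteBuffer.wrap(raw)).toString();
    }

    public static void main(String[] args) {
        // 0x93 and 0x94 are curly double quotes in windows-1252 but
        // control codes in ISO-8859-1; 0xE9 is e-acute in both.
        byte[] raw = { (byte) 0x93, 'c', 'a', 'f', (byte) 0xE9, (byte) 0x94 };

        String[] names = { "US-ASCII", "ISO-8859-1", "windows-1252" };
        for (String name : names) {
            String text = decode(raw, name);
            StringBuilder codePoints = new StringBuilder();
            for (char c : text.toCharArray()) {
                codePoints.append(String.format("U+%04X ", (int) c));
            }
            System.out.println(name + ": " + codePoints.toString().trim());
        }
    }
}
```

With windows-1252 the bytes 0x93 and 0x94 decode to the curly quotes U+201C and U+201D; with ISO-8859-1 they decode to the control codes U+0093 and U+0094; US-ASCII has no mapping for any byte above 0x7F, so Charset.decode substitutes U+FFFD for each of them.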
    File importFile;
    ...
    Reader importFileReader = new InputStreamReader(new FileInputStream(importFile), "WINDOWS-1252");

-----Original Message-----
From: Tad Woods [mailto:[EMAIL PROTECTED]
Sent: Monday, December 31, 2007 9:40 AM
To: users@tomcat.apache.org
Subject: Tomcat upgrade introduces character set problems.

Upgrading from Tomcat 5.5.14 to 5.5.23 has introduced character set encoding problems. Note the JVM also changed from 1.5.0_06 to 1.5.0_11 in the upgrade. In the earlier release I was able to post, persist, and reload special characters just fine. Now with 5.5.23 some special characters are being converted to "?" or other non-displayable characters.

After research and testing, I was able to solve the problem with regular form posts by doing two things: (1) ensuring that all of my pages specify content type = "text/html; charset=UTF-8" and (2) setting up a servlet Filter for all url-patterns that calls request.setCharacterEncoding("UTF-8").

The outstanding encoding problem is with multipart/form-data posts. For example: I upload a text file, process it with ServletFileUpload, save it to disk, then read that file back from disk, and special characters get converted to "?". I have tried specifying different character sets at various places in that process flow with no success.

This is where I am stuck with testing: The Linux host's JVM default character set is US-ASCII. I have tried the content type of the HTML multipart form as UTF-8, ISO-8859-1, and US-ASCII. In the servlet Filter for the multipart post I have tried calling setCharacterEncoding() with the various character sets. If I call DiskFileItem.getCharSet() on the uploaded file it returns null, and the default character set for DiskFileItem is ISO-8859-1. If I download the uploaded file via FTP (not via HTTP through Tomcat) back to my Windows client, the content looks fine (i.e. the special characters are there).
However, when I read the file inside my servlet and redisplay the content via an HTTP response, the special characters turn into "?" or other non-displayable characters. In the servlet I have tried reading the file several ways, including FileReader and a FileInputStream wrapped by an InputStreamReader specifying the various character sets.

To make this even more interesting (or frustrating), if I run the same tests solely on my Windows client, the multipart post works fine (as did the earlier Tomcat on the host)! The Windows client's default character set is WINDOWS-1252 (apparently a superset of ISO-8859-1). Note that the host's default character set remained US-ASCII for both versions of Tomcat, so I don't know whether that is a factor or not.

Tad
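One way to narrow down where characters get lost in a save-then-reload flow like the one described above is to name the charset explicitly on both the write and the read, so that the JVM default (US-ASCII on the host, WINDOWS-1252 on the client) never comes into play. The following is only a sketch under that assumption; the class ExplicitCharsetIo and its method names are hypothetical and are not part of Tomcat, Commons FileUpload, or the application discussed here.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper; class and method names are illustrative only.
class ExplicitCharsetIo {

    // Read the whole file, decoding with the named charset
    // instead of the JVM default charset.
    static String readAll(File file, String charsetName) throws IOException {
        InputStream in = new FileInputStream(file);
        try {
            Reader reader = new InputStreamReader(in, charsetName);
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        } finally {
            in.close();
        }
    }

    // Write the text, encoding with the named charset
    // instead of the JVM default charset.
    static void writeAll(File file, String text, String charsetName) throws IOException {
        OutputStream out = new FileOutputStream(file);
        try {
            Writer writer = new OutputStreamWriter(out, charsetName);
            writer.write(text);
            writer.flush(); // push encoded bytes to the stream before closing it
        } finally {
            out.close();
        }
    }
}
```

Reading a file back with the same charset name it was written with returns the original text regardless of the platform default. A FileReader constructed from just a File or file name, by contrast, decodes with the JVM default charset, so its result can differ between the Linux host and the Windows client.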