Character encoding question
Hi All,

I am having problems with Scandinavian characters on my system and am attempting to isolate the problem; any help would be greatly appreciated.

The problem manifests when forms are posted containing Scandinavian characters. In some cases these characters are decoded correctly, and in other cases they are replaced by strings of other special characters.

The system that I am working with aims to use UTF-8 encoding, and there are meta elements on all the pages to this effect. I have enabled the RequestDumperValve, which shows that the characters are already incorrectly decoded by the time they reach this stage of processing.

I have noticed a correlation (although not necessarily a causal relationship): the characters are correctly decoded when the request reports

    contentType=text/html;charset=ISO-8859-1

and incorrectly decoded when it reports

    contentType=text/html;charset=UTF-8

(This comes from the part of the RequestDumperValve output which is separated from the main request part by two dashed lines.)

I have tried modifying the Connector for the service by adding URIEncoding="UTF-8" and experimented with both:

--
Richard
---
Richard Jones | Overingeniør | Senior Engineer
Universitetsbiblioteket i Bergen | University of Bergen Library
e: [EMAIL PROTECTED]
t: +47 55 58 25 37
BORA: http://bora.uib.no/

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Character Encoding question
Hi All,

I am having problems with Scandinavian characters on my system and am attempting to isolate the problem; any help would be greatly appreciated.

The problem manifests when forms are posted containing Scandinavian characters. In some cases these characters are decoded correctly, and in other cases they are replaced by strings of other special characters.

The system that I am working with aims to use UTF-8 encoding, and there are meta elements on all the pages to this effect. I have enabled the RequestDumperValve, which shows that the characters are already incorrectly decoded by the time they reach this stage of processing.

I have noticed a correlation (although not necessarily a causal relationship): the characters are correctly decoded when the request reports

    contentType=text/html;charset=ISO-8859-1

and incorrectly decoded when it reports

    contentType=text/html;charset=UTF-8

(This comes from the part of the RequestDumperValve output which is separated from the main request part by two dashed lines.)

I have tried modifying the Connector for the service by adding URIEncoding="UTF-8" and experimented with both useBodyEncodingForURI="true" and useBodyEncodingForURI="false", with no luck.

Does anyone have any experience or advice that might point me in the right direction? It is entirely possible that this problem is /not/ a Tomcat issue, but I am running out of ideas here.

PS, sorry for the previous partial post - I just discovered a new keyboard shortcut by accident!

Best Wishes,
Richard
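The "strings of other special characters" described above are what typically shows up when UTF-8 bytes are decoded as ISO-8859-1. A minimal Java sketch of that effect follows. The class and method names are made up for illustration, and it uses only the JDK rather than Tomcat, so it shows the mechanism and not the place in the request path where the wrong charset is applied.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: hypothetical class, not part of Tomcat or the application.
class MojibakeDemo {

    // Encode the text as UTF-8 (what a browser sends for a UTF-8 page) and
    // decode those bytes as ISO-8859-1 (what happens if the receiver assumes
    // the wrong charset).
    static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encode and decode with the same charset; the text survives unchanged.
    static String roundTripUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00F8 is the letter o-with-stroke; the escape keeps this source file ASCII-only.
        String original = "Overingeni\u00F8r";
        System.out.println(decodeUtf8BytesAsLatin1(original)); // garbled form
        System.out.println(roundTripUtf8(original));           // intact form
    }
}
```

If a single typed character comes back as two characters, the first of which is an accented capital A, this kind of mismatch is the likely cause.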
Re: Character encoding question
On 8/29/05, Richard Jones [EMAIL PROTECTED] wrote:
> Hi All, I am having problems with Scandinavian characters on my system and am attempting to isolate the problem; any help would be greatly appreciated.

This is what I did to get this working with Tomcat.

a, Set up Tomcat first: to support UTF-8 encoded data sent as part of the URI, set the URIEncoding attribute of the Coyote Connector element in server.xml.

b, Use a filter to set the character encoding of the request before it is processed.

    // Needs java.io.IOException and the javax.servlet.* types.
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        // Must run before anything reads the request parameters.
        req.setCharacterEncoding("UTF-8");
        chain.doFilter(req, res);
    }

c, Set the following header in all JSP pages.

    <%@ page contentType="text/html; charset=UTF-8" %>

Reference: http://www.mail-archive.com/tomcat-user@jakarta.apache.org/msg153811.html

--
rgds
Anto Paul
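To illustrate why step (a) matters for data carried in the URI, here is a small sketch that percent-decodes the same query-string value with two different charsets. It uses java.net.URLDecoder from the JDK as a stand-in, not Tomcat's own decoding code, and the class name and sample value are invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustration only: hypothetical class; URLDecoder stands in for the container.
class UriDecodingDemo {

    // Percent-decode a query-string value using the named charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // %C3%B8 is how a browser on a UTF-8 page sends the letter o-with-stroke.
        String sent = "Bod%C3%B8";
        System.out.println(decode(sent, "UTF-8"));      // decoded as intended
        System.out.println(decode(sent, "ISO-8859-1")); // two garbled characters
    }
}
```

The bytes on the wire are identical in both calls; only the charset assumed by the receiver differs, which is what the URIEncoding attribute controls for the Connector.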
Re: Character Encoding question
Just a guess: check the file encoding of the .jsp files. I had similar problems and solved them by converting the JSP files to UTF-8.

- Manfred
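One way to do such a conversion is sketched below. It assumes the existing files are ISO-8859-1, which is only a guess about the original setup; the class name and command-line usage are invented for the example. Work on copies, since re-encoding a file that is already UTF-8 would corrupt it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration only: hypothetical helper, not something from this thread.
class JspReencode {

    // Decode the bytes with the source charset, then encode with the target one.
    static byte[] reencode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    // Read a whole file, re-encode it, and write the result to a new file.
    static void reencodeFile(Path source, Path target, Charset from, Charset to) {
        try {
            Files.write(target, reencode(Files.readAllBytes(source), from, to));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java JspReencode.java <source.jsp> <target.jsp>");
            return;
        }
        // Assumption: the source file is ISO-8859-1.
        reencodeFile(Paths.get(args[0]), Paths.get(args[1]),
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
    }
}
```

After converting, the page would still need to tell the container which encoding the file is in, for example through the pageEncoding attribute of the page directive.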
Re: Character encoding question
Hi,

> This is what I did to work with Tomcat.
> a, Set up Tomcat first: [...] set the URIEncoding attribute of the Coyote Connector element in server.xml.
> b, Use a filter to set the character encoding of the request before it is processed. [...]
> c, Set the following header in all JSP pages. [...]

I have delved into the source code, and after a minor fiddle with the Tomcat server.xml to ensure that URIEncoding is set correctly, I am certain that these three requirements are currently fulfilled, and were so before.

I have disabled RequestDumperValve after reading something about issues it causes with setCharacterEncoding, and am instead relying on our own log4j debug output. Annoyingly (or not, depending on how you look at it), I am now having trouble reproducing the problem! Perhaps I had fixed it with the URIEncoding setting and did not notice because I still had RequestDumperValve enabled.

This problem has been strangely elusive; sometimes I have problems and other times not, both within the same system and across other versions. I'll keep an eye on it, and see if there's any more information I can get out of it. I may be back later today ;)

Thanks all for your help,

Best Wishes,
Richard
Re: Character encoding question
setCharacterEncoding() must be called on the request before any getParameter() method is invoked on it. There may be other filters in the chain that process the request before the filter that calls setCharacterEncoding().

Also, what character encoding does the OS use?

The URIEncoding setting matters for GET requests, since the parameters are sent as part of the URL. Try using POST everywhere and check for problems. To be safe, I use hidden form fields and send data as POST instead of appending it to the URL.

--
rgds
Anto Paul
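The ordering rule can be shown with a toy model. ToyRequest below is a made-up stand-in that parses its form body lazily and caches the result; it is not the servlet API and not how Tomcat is implemented. It only mimics the documented behaviour that setting the encoding after parameters have been read has no effect, and it assumes ISO-8859-1 as the fallback encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

// Toy model only: a hypothetical class, NOT javax.servlet.ServletRequest.
class ToyRequest {
    private final String rawBody;            // form body, still percent-encoded
    private String encoding = "ISO-8859-1";  // assumed fallback when none is set
    private Map<String, String> params;      // filled on the first getParameter()

    ToyRequest(String rawBody) {
        this.rawBody = rawBody;
    }

    // Ignored once the parameters have been parsed, mirroring the servlet rule.
    void setCharacterEncoding(String enc) {
        if (params == null) {
            encoding = enc;
        }
    }

    // Parses the whole body on first use, with whatever encoding is set then.
    String getParameter(String name) {
        if (params == null) {
            params = new HashMap<>();
            for (String pair : rawBody.split("&")) {
                String[] kv = pair.split("=", 2);
                params.put(decode(kv[0]), decode(kv.length > 1 ? kv[1] : ""));
            }
        }
        return params.get(name);
    }

    private String decode(String s) {
        try {
            return URLDecoder.decode(s, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        ToyRequest early = new ToyRequest("title=%C3%B8&page=1");
        early.setCharacterEncoding("UTF-8");            // set before any read
        System.out.println(early.getParameter("title"));

        ToyRequest late = new ToyRequest("title=%C3%B8&page=1");
        late.getParameter("page");                      // something reads first
        late.setCharacterEncoding("UTF-8");             // too late, ignored
        System.out.println(late.getParameter("title"));
    }
}
```

In the second case the early getParameter("page") call plays the role of any valve or filter that touches the parameters before the encoding filter runs.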
Re: Character encoding question
Hi,

> The setCharacterEncoding() must be called on the request before any getParameter() method is invoked on it. There may be some filters that are processing the request in the filter chain before the setCharacterEncoding filter.

Yeah, I think this was the problem with the RequestDumperValve. I don't think there are any other filters which happen before the main servlet processing. We have a super-servlet which does setCharacterEncoding before the extending servlet touches anything, so it should be ok.

> Also what is the character encoding used by the OS?

My /etc/sysconfig/i18n file says:

    LANG=en_US.UTF-8
    SUPPORTED=en_US.UTF-8:en_US:en:nb_NO.UTF-8:nb_NO:nb

Perhaps nb_NO and nb should be suffixed with .UTF-8?

> URIEncoding method is useful with GET methods since the parameters are sent along with the URL. Try using POST methods everywhere and check for problems. To be safe I am using hidden variables and sending data as POST instead of appending it to the URL.

I generally prefer POST also, but we are working with a community developed package, and there are plenty of places where GET is used. I've just spent a moment testing an area of the system where the GET implementation was causing serious problems, and this now appears to be working correctly (now that URIEncoding is set in the Connector).

Cheers,
Richard
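On the OS encoding question, it may be quicker to ask the JVM what it actually picked up than to read the locale files. A small sketch, with a made-up class name, that would need to be run as the same user and with the same environment as Tomcat to be meaningful:

```java
import java.nio.charset.Charset;

// Illustration only: hypothetical class for inspecting the JVM's view of the locale.
class CharsetCheck {

    // Collect the values that show which charset the JVM treats as its default.
    static String describe() {
        return "default charset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding")
                + ", LANG=" + System.getenv("LANG");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

As far as I know, the platform default mostly affects things like log output and calls that omit an explicit charset, rather than how request parameters are decoded, so this is more a way of ruling the locale in or out than a fix.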