Character Encoding question

2005-08-29 Thread Richard Jones
Hi All,

I am having problems with Scandinavian characters on my system and am
attempting to isolate the problem; any help would be greatly
appreciated.

The problem manifests when forms are posted containing Scandinavian
characters.  In some cases these characters are de/encoded correctly and
in other cases they are replaced by strings of other special characters.
The system that I am working with aims to use UTF-8 encoding, and there
are meta elements on all the pages to this effect.
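
For illustration, the symptom of characters turning into "strings of other special characters" is what one would expect when UTF-8 bytes are decoded as ISO-8859-1 somewhere along the way. The following is a minimal sketch of that effect, not a diagnosis of this particular system; the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    /** Encodes text as UTF-8, then decodes those bytes as ISO-8859-1. */
    static String decodeUtf8AsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Reverses the damage, provided no bytes were lost along the way. */
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00f8 is the letter o-with-stroke; escapes keep this file ASCII-only.
        String original = "Overingeni\u00f8r";
        String garbled = decodeUtf8AsLatin1(original);
        // The single non-ASCII letter became two characters (U+00C3, U+00B8).
        System.out.println(original.length() + " -> " + garbled.length()); // 12 -> 13
        System.out.println(repair(garbled).equals(original));              // true
    }
}
```

If the garbled strings in the logs follow this two-characters-per-letter pattern, the data was most likely sent as UTF-8 and decoded with a single-byte charset.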

I have enabled the RequestDumperValve which shows that the characters
are incorrectly decoded by the time they reach this stage of processing.
I have noticed a correlation (although not necessarily a causal
relationship): the characters are decoded correctly when the request
reports

contentType=text/html;charset=ISO-8859-1

and incorrectly when it reports

contentType=text/html;charset=UTF-8

(This comes from the part of the RequestDumperValve output which is
separated from the main request part by two dashed lines).

I have tried modifying the Connector for the service by adding

URIEncoding="UTF-8"

and experimented with both

useBodyEncodingForURI="true" and useBodyEncodingForURI="false"

with no luck.

Does anyone have any experience or advice that might point me in the
right direction?  It is entirely possible that this problem is /not/ a
Tomcat issue, but I am running out of ideas here.

PS, sorry for the previous partial post - I just discovered a new
keyboard shortcut by accident!

Best Wishes,

-- 
Richard
---
Richard Jones|
Overingeniør | Senior Engineer
Universitetsbiblioteket i Bergen | University of Bergen Library

e: [EMAIL PROTECTED]
t: +47 55 58 25 37

BORA: http://bora.uib.no/



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Character encoding question

2005-08-29 Thread Anto Paul
On 8/29/05, Richard Jones [EMAIL PROTECTED] wrote:
> Hi All,
>
> I am having problems with Scandinavian characters on my system and am
> attempting to isolate the problem; any help would be greatly
> appreciated.

This is what I did to get it working with Tomcat:

a, Set up Tomcat first: to support UTF-8-encoded data sent as part of
the URI, set the URIEncoding attribute of the Coyote Connector
element in server.xml.
b, Use a filter to set the character encoding of the request before it
is processed.
public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
        throws IOException, ServletException {
    // Must run before anything reads a request parameter.
    req.setCharacterEncoding("UTF-8");
    chain.doFilter(req, res);
}
c, Set the following directive in all JSP pages.
<%@ page contentType="text/html; charset=UTF-8" %>

Reference:
http://www.mail-archive.com/tomcat-user@jakarta.apache.org/msg153811.html

-- 
rgds
Anto Paul




Re: Character Encoding question

2005-08-29 Thread Manfred Steurer

Just a guess:
Check the format of the .jsp-files. I had similar problems and solved 
them by converting the jsp-files to UTF-8.
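
One way to do such a conversion is sketched below. It assumes the existing files are ISO-8859-1, which the thread does not confirm, and the class name JspTranscoder is invented for this example. Running it on a file that is already UTF-8 would double-encode it, so check the current encoding (and keep a backup) first.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class JspTranscoder {
    /** Rewrites a file in place as UTF-8, reading its current bytes as sourceCharset. */
    static void convertToUtf8(Path file, Charset sourceCharset) throws IOException {
        String content = new String(Files.readAllBytes(file), sourceCharset);
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java JspTranscoder page1.jsp page2.jsp
        for (String name : args) {
            // Assumption: the files are currently ISO-8859-1.
            convertToUtf8(Paths.get(name), StandardCharsets.ISO_8859_1);
        }
    }
}
```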


- Manfred





Re: Character encoding question

2005-08-29 Thread Richard Jones
Hi,

>> I am having problems with Scandinavian characters on my system and am
>> attempting to isolate the problem; any help would be greatly
>> appreciated.
>
> This is what I did to get it working with Tomcat:
>
> a, Set up Tomcat first: to support UTF-8-encoded data sent as part of
> the URI, set the URIEncoding attribute of the Coyote Connector
> element in server.xml.
> b, Use a filter to set the character encoding of the request before it
> is processed.
> public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
>         throws IOException, ServletException {
>     req.setCharacterEncoding("UTF-8");
>     chain.doFilter(req, res);
> }
> c, Set the following directive in all JSP pages.
> <%@ page contentType="text/html; charset=UTF-8" %>

I have delved into the source code, and after a minor fiddle with the
tomcat server.xml to ensure that URIEncoding is set correctly I am
certain that these three requirements are currently fulfilled, and were
so before.  I have disabled RequestDumperValve after reading something
about issues it causes with setCharacterEncoding, and instead am relying
on our own log4j debug output.

Annoyingly (or not, depending on how you look at it), I am having
trouble reproducing the problem again!  Perhaps I had fixed it with the
URIEncoding setting, only not noticed because I still had
RequestDumperValve enabled.

This problem has been strangely elusive; sometimes I have problems and
other times not, both within the same system and across other versions.
I'll keep an eye on it, and see if there's any more information I can
get out of it.  I may be back later today ;)

Thanks all for your help,

Best Wishes,

-- 
Richard
---
Richard Jones|
Overingeniør | Senior Engineer
Universitetsbiblioteket i Bergen | University of Bergen Library

e: [EMAIL PROTECTED]
t: +47 55 58 25 37

BORA: http://bora.uib.no/






Re: Character encoding question

2005-08-29 Thread Anto Paul
setCharacterEncoding() must be called on the request before any
getParameter() method is invoked on it. There may be filters in the
filter chain that process the request before the filter that calls
setCharacterEncoding().
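
The ordering requirement can be illustrated with a small model. LazyRequest below is a hypothetical stand-in written for this example, not a Tomcat or Servlet API class; it only assumes that parameters are parsed once, on first access, with whatever encoding is set at that moment, and that ISO-8859-1 applies when none has been set.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a servlet request; not a real container class. */
class LazyRequest {
    private final String rawBody;            // e.g. "name=Bj%C3%B8rn"
    private String encoding = "ISO-8859-1";  // assumed default when none is set
    private Map<String, String> parsed;      // filled on the first getParameter()

    LazyRequest(String rawBody) {
        this.rawBody = rawBody;
    }

    void setCharacterEncoding(String encoding) {
        if (parsed == null) {                // ignored once parameters are parsed
            this.encoding = encoding;
        }
    }

    String getParameter(String name) {
        if (parsed == null) {
            parsed = new HashMap<>();
            for (String pair : rawBody.split("&")) {
                String[] kv = pair.split("=", 2);
                parsed.put(decode(kv[0]), kv.length > 1 ? decode(kv[1]) : "");
            }
        }
        return parsed.get(name);
    }

    private String decode(String s) {
        try {
            return URLDecoder.decode(s, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        LazyRequest early = new LazyRequest("name=Bj%C3%B8rn");
        early.setCharacterEncoding("UTF-8");
        System.out.println(early.getParameter("name").equals("Bj\u00f8rn")); // true

        LazyRequest late = new LazyRequest("name=Bj%C3%B8rn");
        late.getParameter("name");           // something upstream reads a parameter
        late.setCharacterEncoding("UTF-8");  // no effect any more
        System.out.println(late.getParameter("name").equals("Bj\u00f8rn"));  // false
    }
}
```

In this model, any component that touches a parameter before the encoding is set (a valve, an earlier filter, a logging hook) fixes the decoding for the rest of the request.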

Also, what character encoding does the OS use?
URIEncoding is useful with GET requests, since the parameters are
sent as part of the URL. Try using POST everywhere and check
for problems. To be safe, I use hidden fields and send data
by POST instead of appending it to the URL.




-- 
rgds
Anto Paul




Re: Character encoding question

2005-08-29 Thread Richard Jones
Hi,

> setCharacterEncoding() must be called on the request before any
> getParameter() method is invoked on it. There may be filters in the
> filter chain that process the request before the filter that calls
> setCharacterEncoding().

Yeah, I think this was the problem with the RequestDumperValve.  I don't
think there are any other filters which happen before the main servlet
processing.  We have a super-servlet which does setCharacterEncoding
before the extending servlet touches anything, so it should be ok.

> Also, what character encoding does the OS use?

My /etc/sysconfig/i18n file says:

LANG=en_US.UTF-8
SUPPORTED=en_US.UTF-8:en_US:en:nb_NO.UTF-8:nb_NO:nb

Perhaps nb_NO and nb should be suffixed with .UTF-8?
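
A quick way to see which charset the JVM itself has picked up is sketched below; on JVMs of this era the default is generally derived from the locale, so this complements the i18n file rather than answering the question about the SUPPORTED entries. The class name CharsetReport is invented for the example.

```java
import java.nio.charset.Charset;

class CharsetReport {
    /** Describes the charset settings the running JVM is using. */
    static String describe() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        // Run with the same user and environment as Tomcat for a meaningful result.
        System.out.println(describe());
    }
}
```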

> URIEncoding is useful with GET requests, since the parameters are
> sent as part of the URL. Try using POST everywhere and check
> for problems. To be safe, I use hidden fields and send data
> by POST instead of appending it to the URL.

I generally prefer POST also, but we are working with a community
developed package, and there are plenty of places where GET is used.
I've just spent a moment testing an area of the system where the GET
implementation was causing serious problems and this now appears to be
working correctly (now that URIEncoding is set in the Connector).
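
Why the charset used for the URI matters can be shown with the standard java.net classes. This is only an illustration of percent-encoding, not of Tomcat's internals; the class and method names are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class QueryStringDemo {
    /** Decodes one percent-escaped query value with the given charset name. */
    static String decodeValue(String escaped, String charsetName) {
        try {
            return URLDecoder.decode(escaped, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The same letter (\u00f8) is escaped differently depending on the charset.
        String utf8Form = URLEncoder.encode("\u00f8", "UTF-8");        // %C3%B8
        String latin1Form = URLEncoder.encode("\u00f8", "ISO-8859-1"); // %F8
        System.out.println(utf8Form + " " + latin1Form);

        // A matching charset recovers the letter.
        System.out.println(decodeValue(utf8Form, "UTF-8").equals("\u00f8")); // true
        // A mismatched charset yields two characters instead of one.
        System.out.println(decodeValue(utf8Form, "ISO-8859-1").length());    // 2
    }
}
```

A server that decodes the query string as ISO-8859-1 while the browser escapes it as UTF-8 would show this mismatch on every GET parameter containing non-ASCII letters.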

Cheers,

-- 
Richard
---
Richard Jones|
Overingeniør | Senior Engineer
Universitetsbiblioteket i Bergen | University of Bergen Library

e: [EMAIL PROTECTED]
t: +47 55 58 25 37

BORA: http://bora.uib.no/


