RE: Discrepancies between servlets and JSP on tomcat in handling UTF-8?

Bodycombe, Andrew Mon, 25 Nov 2002 07:37:06 -0800

Interesting. I have encountered a similar problem.

I have a servlet that connects to an XML application. The response from the
application is read using a SAX reader, and I encountered an error if the
response contained any non-ASCII characters (umlauts in particular, as I am
currently working in Germany).


I did a little investigation, and found that the content type was text/xml;
charset=ISO-8859-1 and the XML declaration was

<?xml version="1.0" encoding="UTF-8"?>

Now the XML I received in my servlet was ISO-8859-1 and not UTF-8, so I have
contacted the application developers to say "Please fix your application
because the XML I receive is not encoded as UTF-8, it is ISO-8859-1."

I have a workaround: I read the input using an ISO-8859-1
InputStreamReader and get the SAXReader to use this as its input. This is
working fine as a temporary measure.
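The workaround can be sketched roughly as follows. This uses the JDK's built-in JAXP SAX parser rather than the SAXReader mentioned above, and the class and method names are hypothetical. Because the parser is handed an InputSource built from a Reader, it consumes characters that are already decoded (here as ISO-8859-1) instead of raw bytes, so the encoding="UTF-8" declaration no longer decides how the bytes are decoded.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class Latin1SaxWorkaround {

    /** Collects all character data so the decoded text can be inspected. */
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /**
     * Decodes the byte stream as ISO-8859-1 ourselves and gives the parser a
     * character stream, so the parser does not decode bytes according to the
     * document's encoding declaration.
     */
    static String parseAsLatin1(InputStream in) throws Exception {
        Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1);
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector handler = new TextCollector();
        parser.parse(new InputSource(reader), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Simulated response: declares UTF-8, but the body bytes are ISO-8859-1.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>M\u00fcller</name>";
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseAsLatin1(new ByteArrayInputStream(latin1Bytes)));
    }
}
```

Hard-coding ISO-8859-1 only makes sense while the server keeps sending Latin-1. Once the application is fixed to send real UTF-8, this reader would garble the text instead.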

The original message in this thread suggests to me that this could actually
be a Tomcat problem and not necessarily a problem with the application I
connect to. There is clearly a discrepancy between the encoding declared in
the XML and the charset in the content type, and the SAX reader is using the
value of the encoding attribute to decode the text.

Is Tomcat doing something with the HTTP text, possibly converting it from
UTF-8 into ISO-8859-1?

I confess I've not tried this servlet in other servlet containers, just
Tomcat 4.1.12 running on Windows.

Andy


-----Original Message-----
From: Stephen Riek
To: Tomcat Users List
Sent: 24/11/2002 23:37
Subject: Discrepancies between servlets and JSP on tomcat in handling UTF-8?


I have a Form which is displayed in UTF-8. The form 
contains just one editable field, namely 
<textarea name="test"></textarea>. 

   When I submit this to a JSP:
   ----------------------------
   I can extract the value of the string using,
   <%
   String s = request.getParameter("test");
   %>

   I then write the value of s to a UTF-8 file with this:
   <%@ page import="java.io.*" %>
   <%
   // Write s (the submitted text) to output.html, encoded as UTF-8
   PrintWriter o = new PrintWriter(new OutputStreamWriter(
           new FileOutputStream("output.html"), "UTF-8"));
   o.write(s);
   o.flush();
   o.close();
   %>

Opening the file "output.html" with a browser, I 
see that the original UTF-8 text is still perfectly 
intact and encoded as recognizable UTF-8.  
Hooray, it works.

   
   But when I submit the form to a servlet:
   ----------------------------------------
If I submit the form to a servlet, and try the same code,

   String s = request.getParameter("test");
   // Same UTF-8 file output as in the JSP version
   PrintWriter o = new PrintWriter(new OutputStreamWriter(
           new FileOutputStream("output.html"), "UTF-8"));
   o.write(s);
   o.flush();
   o.close();
   
Then the text is no longer recognizable.
The same happens if I try to output 's' to the browser (after
setting the content type to text/html;charset=UTF-8).
   
It WILL however work if I do the following:   

   String s = new String(request.getParameter("test").getBytes("8859_1"), "UTF-8");
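Why that line helps can be shown without a servlet container. This sketch assumes the container decoded the UTF-8 request bytes as ISO-8859-1, which the Servlet specification uses as the default when the request names no charset. ISO-8859-1 maps every byte 0-255 to exactly one character, so re-encoding the garbled string as ISO-8859-1 returns the original bytes, and decoding those as UTF-8 gives back the typed text. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {

    /** Simulates the container decoding UTF-8 form bytes as ISO-8859-1. */
    static String misdecode(String typed) {
        byte[] wireBytes = typed.getBytes(StandardCharsets.UTF_8);
        return new String(wireBytes, StandardCharsets.ISO_8859_1);
    }

    /** Undoes it: ISO-8859-1 turns each char back into its original byte. */
    static String repair(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "M\u00fcller";
        String garbled = misdecode(typed); // "MÃ¼ller": each UTF-8 byte became its own char
        System.out.println(garbled.length());              // 7
        System.out.println(repair(garbled).equals(typed)); // true
    }
}
```

The more usual remedy is to call request.setCharacterEncoding("UTF-8") before the first getParameter() call, so the container decodes the parameters as UTF-8 in the first place (available since Servlet 2.3, which Tomcat 4 implements).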

This is screwing with my head. 
First off, I thought that all JSPs are compiled into servlets
anyway, so there should be no discrepancy between the first
and second sets of code.

Secondly, Tomcat seems pretty inconsistent
in that "s=request.getParameter()" works in a JSP but 
not in a servlet.

Has anybody else noticed this?
Or can anybody account for this behaviour?

Thank you,

Stephen.




