Here's What I Did
-----------------
In both versions of TC, I added an "em dash" character to the "/tomcat-docs/cgi-howto.html" document that comes with the TC documentation. The UTF-8 representation of the "em dash" character (U+2014) is the three bytes 0xE2 0x80 0x94. I also made sure both documents had the following META tag in their <head>:
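For reference, the byte sequence quoted above can be confirmed with a few lines of Python. Python is used here only for illustration; any language with a UTF-8 encoder would show the same thing.

```python
# U+2014 EM DASH, written as an escape so this file's own encoding does not matter.
EM_DASH = "\u2014"

encoded = EM_DASH.encode("utf-8")
print(encoded.hex(" "))  # e2 80 94
print(len(encoded))      # 3
```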
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'/>
This makes the HTML document correct with respect to both its actual and its announced encoding.
Here's What I Saw (TC v5.0.14)
------------------------------
Under TC v5.0.14 the "em dash" character was rendered as *THREE SEPARATE CHARACTERS* (one for each byte). Moreover, putting a sniffer on the HTTP stream showed that the v5.0.14 Coyote Connector was sending the following response header:
Content-Type: text/html;charset=ISO-8859-1
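The "three separate characters" symptom is what you get when UTF-8 bytes are decoded with the charset announced in that header. A small Python sketch of the effect follows; the windows-1252 step reflects how modern browsers treat the ISO-8859-1 label (per the WHATWG Encoding Standard) and is not necessarily what a 2003-era browser did.

```python
# The three bytes served for the em dash (U+2014 encoded as UTF-8).
raw = "\u2014".encode("utf-8")

# A client that trusts "charset=ISO-8859-1" maps each byte to its own character.
as_latin1 = raw.decode("iso-8859-1")
print(len(as_latin1))                    # 3
print([hex(ord(c)) for c in as_latin1])  # ['0xe2', '0x80', '0x94']

# Browsers following the WHATWG Encoding Standard treat the ISO-8859-1 label as
# windows-1252, so 0x80 and 0x94 show up as a euro sign and a closing quote.
as_cp1252 = raw.decode("cp1252")
print([f"U+{ord(c):04X}" for c in as_cp1252])  # ['U+00E2', 'U+20AC', 'U+201D']

# No bytes are lost: re-encoding with the wrong charset recovers the original text.
print(as_latin1.encode("iso-8859-1").decode("utf-8") == "\u2014")  # True
```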
First of all, was that an HTML page or a JSP? If it was a JSP, then unless you specify the page encoding in the JSP page directive, Tomcat will (and should) use the default encoding, ISO-8859-1, in the Content-Type response header.
Secondly, what is actually sent in the TC 5.0.12 case?
Conclusion (?)
--------------
It seems that v5.0.14 of the Coyote Connector is sending the wrong response header. I'm not sure what the HTTP spec says *should* be sent for the header if the document's <head> contains:
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'/>
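This is part of the HTML specification; it lets the page author declare the encoding inside the document itself. It does not override the server, though: per HTML 4.01, a charset given in the HTTP Content-Type header takes precedence over the <meta> tag, so clients are expected to fall back to <meta> only when the header carries no charset.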
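HTML 4.01 (section 5.2.2) lists the order in which a user agent determines a document's encoding: the charset parameter of the HTTP Content-Type header, then a <meta http-equiv="Content-Type"> declaration, then a charset attribute on the element that linked to the document. The sketch below models the first two steps; `resolve_charset` is a hypothetical helper, and its `fallback` value is an assumption rather than part of that list.

```python
from typing import Optional


def resolve_charset(header_charset: Optional[str],
                    meta_charset: Optional[str],
                    fallback: str = "iso-8859-1") -> str:
    """Hypothetical helper: choose a document encoding in HTML 4.01 priority order."""
    # 1. The charset parameter of the HTTP Content-Type header wins.
    if header_charset:
        return header_charset.lower()
    # 2. Otherwise use the <meta http-equiv="Content-Type"> declaration.
    if meta_charset:
        return meta_charset.lower()
    # 3. Otherwise use an assumed default (not taken from the spec's list).
    return fallback


# The TC 5.0.14 situation: header says ISO-8859-1, <meta> says utf-8.
print(resolve_charset("ISO-8859-1", "utf-8"))  # iso-8859-1
# No charset in the header: the <meta> tag is used.
print(resolve_charset(None, "utf-8"))          # utf-8
```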
For static content such as HTML pages, you cannot specify a page encoding other than the default on the fly. For dynamic content such as a JSP, you can set it in the JSP page directive, like this:
<%@ page info="A test page" contentType="text/html; charset=utf-8" %>
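To confirm what the connector actually announces after such a change, you can pull the charset parameter out of the captured Content-Type value. A minimal Python sketch, assuming the header value has already been captured (for example with a sniffer); `header_charset` is a hypothetical helper built on the standard library's `email.message.Message`, which is also the base class of the header objects that `http.client` returns.

```python
from email.message import Message
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lower-cased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None if it is absent.
    return msg.get_content_charset()


# Header observed from the v5.0.14 connector.
print(header_charset("text/html;charset=ISO-8859-1"))  # iso-8859-1
# Header expected from a JSP declaring contentType="text/html; charset=utf-8".
print(header_charset("text/html; charset=utf-8"))      # utf-8
# No charset parameter at all.
print(header_charset("text/html"))                     # None
```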
Nix.
