See inline.

----- Original Message ----- 
From: "Tony LaPaso" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, November 10, 2003 8:15 PM
Subject: TC 5.0.14 Breaks UTF-8 Content via HTTP Header


> Hi everyone,
>
> It seems a change to TC v5.0.14 may have broken the serving of UTF-8
> documents. Specifically, one of the HTTP headers seems wrong. I'd like to
> describe what I'm seeing with TC v5.0.14 compared with v5.0.12.
>
> For both v5.0.12 and v5.0.14 I'm running TC as it comes "out of the box"
> i.e., with no changes to the default configurations.
>
> In both cases I tested with four browsers (IE 5, IE 6, Netscape 7.1 and
> Firebird 0.7), all on Win 2K.
>
>
> Here's What I Did
> -----------------
> In both versions of TC, I added an "em dash" character to the
> "/tomcat-docs/cgi-howto.html" documents that come with the TC
> documentation.
> The UTF-8 representation of the "em dash" character is the three bytes
> 0xE28094. I also made sure both documents had the following META tag in
> their <head>:
>
> <meta http-equiv='Content-Type' content='text/html; charset=utf-8'/>
>
> I then saved the documents as UTF-8 (without a BOM). Finally, I brought
> the document into a hex editor to check that the em dash was properly
> encoded as three bytes (which it was). This indicated to me that the
> document was indeed encoded as UTF-8.
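A hex editor does the job; as a cross-check, the JDK reports the same bytes. Below is a minimal Java sketch (the class and helper names are only for illustration) that encodes U+2014 EM DASH as UTF-8 and prints the result in hex.

```java
import java.nio.charset.StandardCharsets;

public class EmDashBytes {
    // Formats bytes as uppercase hex pairs, e.g. {0xE2, 0x80} -> "E280".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String emDash = "\u2014"; // U+2014 EM DASH
        byte[] utf8 = emDash.getBytes(StandardCharsets.UTF_8);
        // Prints "3 bytes: 0xE28094"
        System.out.println(utf8.length + " bytes: 0x" + toHex(utf8));
    }
}
```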
>
>
> Here's What I Saw (TC v5.0.12)
> ------------------------------
> Under TC v5.0.12, everything looked great using all browsers -- the "em
> dash" was rendered correctly. I put a sniffer on the HTTP stream. The
> v5.0.12 Coyote Connector was sending this HTTP response header:
> Content-Type: text/html
>
>
> Here's What I Saw (TC v5.0.14)
> ------------------------------
> Under TC v5.0.14 the "em dash" character was rendered as *THREE SEPARATE
> CHARACTERS* (one for each byte). Moreover, putting a sniffer on the HTTP
> stream showed that the v5.0.14 Coyote Connector was sending the following
> response header:
> Content-Type: text/html;charset=ISO-8859-1
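Three characters is what you'd expect when the browser believes that header. ISO-8859-1 maps every byte to exactly one character, so the three UTF-8 bytes of the em dash come out as three characters. Below is a small Java sketch (names are illustrative) that decodes the same bytes both ways.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1Mojibake {
    // Encodes U+2014 EM DASH as UTF-8 (the bytes on the wire), then decodes
    // those bytes with the given charset, as a client trusting the header would.
    static String decodeEmDash(Charset charset) {
        byte[] body = "\u2014".getBytes(StandardCharsets.UTF_8);
        return new String(body, charset);
    }

    public static void main(String[] args) {
        String asUtf8 = decodeEmDash(StandardCharsets.UTF_8);
        String asLatin1 = decodeEmDash(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8:      " + asUtf8.length() + " char");   // 1 char
        System.out.println("ISO-8859-1: " + asLatin1.length() + " chars"); // 3 chars
        // One character per byte: U+00E2, U+0080, U+0094
        for (char c : asLatin1.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

Under strict ISO-8859-1, 0x80 and 0x94 are invisible C1 control characters. Many browsers actually decode the ISO-8859-1 label as windows-1252, which is why a mis-labelled em dash often shows up as "â€”".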
>
>
> Aside
> -----
> For the heck of it I re-saved the v5.0.14 UTF-8 document with a BOM
> (0xEFBBBF). Doing this made IE render it correctly, but Netscape and
> Firebird still had problems. I'm pretty sure that Unicode says the BOM is
> optional anyway.
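For reference, the BOM is just U+FEFF encoded as UTF-8, which is where 0xEFBBBF comes from. Below is a Java sketch with hypothetical helper names that builds the mark, checks whether a file's bytes start with it, and strips it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Bom {
    // U+FEFF (BYTE ORDER MARK) encoded in UTF-8: the three bytes EF BB BF.
    static final byte[] UTF8_BOM = "\uFEFF".getBytes(StandardCharsets.UTF_8);

    // True if the content starts with the UTF-8 BOM.
    static boolean startsWithBom(byte[] content) {
        return content.length >= UTF8_BOM.length
            && Arrays.equals(Arrays.copyOf(content, UTF8_BOM.length), UTF8_BOM);
    }

    // Returns the content without a leading BOM (unchanged if there is none).
    static byte[] stripBom(byte[] content) {
        return startsWithBom(content)
            ? Arrays.copyOfRange(content, UTF8_BOM.length, content.length)
            : content;
    }

    public static void main(String[] args) {
        byte[] withBom = "\uFEFF<html>\u2014</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(startsWithBom(withBom));           // true
        System.out.println(startsWithBom(stripBom(withBom))); // false
    }
}
```

In UTF-8 the mark carries no byte-order information; at most it acts as a signature, which fits the reading that it is optional.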
>
>
> Conclusion (?)
> --------------
> It seems that v5.0.14 of the Coyote Connector is sending the wrong
> response header. I'm not sure what the HTTP spec says *should* be sent
> for the header if the document's <head> contains:

The spec says nothing about META tags.  Tomcat (correctly) treats them as
just so much output text.

>
> <meta http-equiv='Content-Type' content='text/html; charset=utf-8'/>
>
> My guess is that either the response header in v5.0.14 needs to be changed
> to:
> Content-Type: text/html;charset=UTF-8
>
> or possibly:
>
> Content-Type: text/html
>
> as it was with TC v5.0.12.
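The practical difference between these headers is whether a charset parameter is present. Per HTML 4.01 (section 5.2.2), a charset parameter in the HTTP Content-Type header takes priority over a META http-equiv declaration. The META tag only comes into play when the header names no charset. That is consistent with what was observed: v5.0.12's bare "text/html" left the browser free to use the META tag, while v5.0.14's explicit ISO-8859-1 overrode it. Below is a simplified Java sketch of pulling that parameter out. The helper name is hypothetical, and it ignores quoted values that contain ';'.

```java
import java.util.Locale;
import java.util.Optional;

public class ContentTypeCharset {
    // Hypothetical helper: extracts the charset parameter from a Content-Type
    // value, e.g. "text/html;charset=UTF-8" -> "UTF-8". Empty if absent.
    static Optional<String> charsetOf(String contentType) {
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().toLowerCase(Locale.ROOT).equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding quotes: charset="UTF-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String[] headers = {
            "text/html;charset=ISO-8859-1", // sent by v5.0.14
            "text/html;charset=UTF-8",      // matches the document's real encoding
            "text/html"                     // sent by v5.0.12
        };
        for (String h : headers) {
            System.out.println(h + " -> " + charsetOf(h).orElse("(none: client may use META)"));
        }
    }
}
```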
>
> Can anyone comment? Is this a TC v5.0.14 bug? It would seem to be.

It looks like a 5.0.12 bug that was subsequently fixed :).  The 2.4
Servlet spec clearly states, regarding the response's character encoding:
<spec-quote version="Servlet-2.4-pfd3" section="14.2.22">
If no character encoding has been specified, ISO-8859-1
is returned.
</spec-quote>
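In servlet code the usual remedy is to state the encoding explicitly before obtaining the writer, e.g. response.setContentType("text/html; charset=UTF-8") or, with the 2.4 API, response.setCharacterEncoding("UTF-8"). So the sketch runs without a servlet container, it only models the quoted rule in plain Java. The class and method names are hypothetical, and this is not Tomcat's code.

```java
public class ResponseCharsetRule {
    // Per the quoted rule: with no encoding specified, ISO-8859-1 applies.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Hypothetical model: the encoding a response ends up with, given
    // what (if anything) the application specified.
    static String effectiveCharset(String specified) {
        return (specified == null || specified.isEmpty()) ? DEFAULT_CHARSET : specified;
    }

    // Hypothetical: a Content-Type header that always names the effective
    // charset, i.e. the shape of the header v5.0.14 was observed sending.
    static String contentTypeHeader(String mimeType, String specifiedCharset) {
        return mimeType + ";charset=" + effectiveCharset(specifiedCharset);
    }

    public static void main(String[] args) {
        // Nothing specified (the META tag is just body text): "text/html;charset=ISO-8859-1"
        System.out.println(contentTypeHeader("text/html", null));
        // Encoding stated explicitly by the application: "text/html;charset=UTF-8"
        System.out.println(contentTypeHeader("text/html", "UTF-8"));
    }
}
```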

>
> Thanks,
>
> Tony
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
