Jan Luehe wrote:

Bill,


luehe       2004/07/27 17:43:17

 Modified:    coyote/src/java/org/apache/coyote Response.java
 Log:
 Fixed Bugtraq 6152759 ("Default charset not included in Content-Type
 response header if no char encoding was specified").

 According to the Servlet 2.4 spec, calling:

   ServletResponse.setContentType("text/html");

 must yield these results:

   ServletResponse.getContentType() -> "text/html"

   Content-Type response header -> "text/html;charset=ISO-8859-1"

 Notice the absence of a charset in the result of getContentType(), but
 its presence (set to the default ISO-8859-1) in the Content-Type
 response header.

 Tomcat is currently not including the default charset in the
 Content-Type response header if no char encoding was specified.
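
For illustration, the distinction the log message draws can be sketched as below. This is a minimal sketch, not the actual Response.java change: the class ContentTypeSketch and the method getContentTypeHeader() are hypothetical names, and the sketch assumes the value passed to setContentType() carries no charset parameter of its own.

```java
// Hypothetical sketch; ContentTypeSketch is not a Tomcat class.
final class ContentTypeSketch {
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    private String contentType;        // value given to setContentType(), assumed to have no charset
    private String characterEncoding;  // stays null until the application specifies one

    void setContentType(String type) { this.contentType = type; }

    void setCharacterEncoding(String enc) { this.characterEncoding = enc; }

    // What the application sees: the charset is omitted unless one was specified.
    String getContentType() {
        if (contentType == null) {
            return null;
        }
        return characterEncoding == null
                ? contentType
                : contentType + ";charset=" + characterEncoding;
    }

    // What would go on the wire: falls back to the default charset.
    String getContentTypeHeader() {
        if (contentType == null) {
            return null;
        }
        String cs = characterEncoding != null ? characterEncoding : DEFAULT_CHARSET;
        return contentType + ";charset=" + cs;
    }
}
```

With only setContentType("text/html") called, getContentType() in this sketch gives "text/html" while getContentTypeHeader() gives "text/html;charset=ISO-8859-1". Note that the sketch appends the charset for every content type, which is exactly what the objection below is about.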



-1.  This gets us right back to the same old problem where we are sending
back "image/gif; charset=iso-8859-1", and nobody can read the response.


yes, sorry, I had forgotten about that case.

If we're not going to assume that the UA believes that the default encoding
is iso-8859-1 (which is what we are doing now),


I think the reason the spec added the requirement to clearly identify
the encoding in all cases (when using a writer) was because many
browsers let the user choose which encoding to apply to responses that
don't declare their encoding, which will result in data corruption if
the response was encoded in ISO-8859-1 and the user picks an
incompatible encoding.
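
The kind of corruption described above can be reproduced with a small sketch. The class and method names are made up for the example, and it only simulates a browser by decoding the bytes with a different charset than the one used to encode them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; simulates a client decoding ISO-8859-1 bytes with another charset.
final class CharsetMismatchSketch {
    // Encodes text as ISO-8859-1 (what the server writes) and decodes it
    // with the charset the client happens to assume.
    static String decodeLatin1BytesAs(String text, Charset assumedByClient) {
        byte[] wire = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, assumedByClient);
    }
}
```

Decoding "caf\u00e9" this way with StandardCharsets.ISO_8859_1 round-trips the text, while decoding it with StandardCharsets.UTF_8 does not, because the single byte 0xE9 is not a complete UTF-8 sequence.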

AFAIK browsers let the user choose the encoding even if it is specified.

And they do that exactly because some 'smart' servers send the wrong
encoding (like 8859-1) even when the content is in a different one.

If you are using a foreign charset, your data will be either 8859-x
(with x != 1) or UTF-8. In any case, it will never be 8859-1 (since the
foreign characters don't exist there). So the requirement basically
breaks any foreign language.




then I'd suggest simply
doing:
setCharacterEncoding(getCharacterEncoding());
in Response.getWriter (since the spec only requires that we identify the
charset when using a Writer, and we don't really know what it is when using
OutputStream).


The problem with this is that if you call getWriter() (with your
proposed fix) followed by getContentType(), the returned content type
will include a charset, which is against the spec of getContentType():


  * If no character encoding has been specified, the
  * charset parameter is omitted.

This is why we need to append the default charset to the value of the
Content-Type header, if no char encoding has been specified.
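
To make the conflict concrete, here is a minimal sketch that contrasts the two approaches. It is not the committed code: WriterCharsetSketch, the two getWriter variants, the flags, and getContentTypeHeader() are hypothetical names, and limiting the default charset in the header to the Writer case is only one assumed way of also avoiding the "image/gif; charset=iso-8859-1" problem.

```java
// Hypothetical sketch; not Tomcat's Response class.
final class WriterCharsetSketch {
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    private String contentType;
    private String characterEncoding = DEFAULT_CHARSET;
    private boolean charsetSet = false;   // true once the application specified an encoding
    private boolean usingWriter = false;  // true once getWriter() was called

    void setContentType(String type) { contentType = type; }

    void setCharacterEncoding(String enc) {
        characterEncoding = enc;
        charsetSet = true;
    }

    String getCharacterEncoding() { return characterEncoding; }

    // Variant 1: the setCharacterEncoding(getCharacterEncoding()) proposal.
    // Side effect: the default now looks as if the application had specified it.
    void getWriterPinningCharset() {
        setCharacterEncoding(getCharacterEncoding());
        usingWriter = true;
    }

    // Variant 2: only remember that a Writer is in use.
    void getWriterTrackingOnly() { usingWriter = true; }

    // Spec'd getContentType(): charset omitted unless explicitly specified.
    String getContentType() {
        return charsetSet ? contentType + ";charset=" + characterEncoding : contentType;
    }

    // Header value: charset added if specified, or (assumption) if a Writer is used.
    String getContentTypeHeader() {
        return (charsetSet || usingWriter)
                ? contentType + ";charset=" + characterEncoding
                : contentType;
    }
}
```

In this sketch, variant 1 makes getContentType() return "text/html;charset=ISO-8859-1" after getWriter(), which is the behavior objected to above. Variant 2 keeps getContentType() at "text/html" while the header still carries the default charset, and a response that never calls getWriter() (for example image/gif) gets no charset at all.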


On one hand it is required to identify the charset in all cases (to not
confuse browsers), but on the other you are not allowed to report the
real encoding used by the writer if it wasn't specified :-) ?

Costin


Jan


