2008/9/12 André Warnier <[EMAIL PROTECTED]>

> Konstantin Kolinko wrote:
>
>> 2008/9/12 André Warnier <[EMAIL PROTECTED]>:
>>
>>> Caldarale, Charles R wrote:
>>>
>>>> I'm not sure these days what the "normal web character set" really is.
>>>> If you're referring to ASCII (aka Basic Latin), then no, the Pound
>>>> Sterling symbol is not present.  However, for any of the ISO-8859-x
>>>> variants, it is present, using the 163 (0xA3) value you noted (same as
>>>> the Unicode code point).  It's also in UTF-8 of course, but requires two
>>>> bytes (0xC2 0xA3) to represent the code point.
>>>>
>>> I love these discussions about character sets. They seem to confuse so
>>> many people; even I, who have been involved in them for 30 years...
>>>
>>> Anyway, I have a related question, which I don't think constitutes a
>>> hijack of this thread, because the underlying cause is probably similar.
>>> Here it goes:
>>>
>>> Tomcat (v4.1, v5.0, v5.5; have not tried 6.x yet)
>>> The above Tomcats are running under the same Linux or Solaris,
>>> essentially set up the same way. The JVM may vary, but I don't think that
>>> is the problem, because of the consistency of the problem as explained
>>> below.
>>> I am running a webapp from an external supplier, always the same binary
>>> version.  I don't have the code, can't see what's in it.
>>> The pages served by that webapp are the same html pages, all of them
>>> having a declaration
>>> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">.
>>> The pages also *are* properly encoded as iso-8859-1 (100% positive, I
>>> know the difference).
>>> The browser receiving the pages is always the same one, same settings.
>>>
>>> Now,
>>>
>>> case a)
>>> in the Tomcat startup files, I do nothing, meaning I just take Tomcat
>>> out-of-the-box and run the webapp.
>>> Result: in any such html page that contains characters with an ISO-8859
>>> codepoint above \xA0 (meaning the displayable characters of the "high"
>>> part of the table, where one finds things like "uppercase A with
>>> umlaut"), these characters
>>>  - appear in the browser display as "?" (minus the quotes)
>>>  - also, if I save the page from the browser to disk and look at them
>>>    with an iso-8859-1 capable editor, they are effectively "?".
>>> (So it's not the browser misunderstanding them, it is Tomcat sending them
>>> that way).
>>>
>>> case b)
>>> In one of the Tomcat startup files (e.g. tomcat_dir/bin/startup.sh or
>>> even in /etc/init.d/tomcat5.5), I add the following line
>>> LC_CTYPE="en_us.iso88591"
>>> (or whatever is valid on that host to specify an iso-8859-1 LC_CTYPE)
>>> (before the actual start of Tomcat)
>>> and restart Tomcat.
>>> Then the same page displays properly in the browser, and also is correct
>>> iso-8859-1 when saved to disk and examined with the editor.
>>> (In other words, what previously were "?" characters are now the correct
>>> iso-8859-1 character bytes).
>>>
>>> Now my question is:
>>> How can it matter which LC_CTYPE Tomcat is started under, that would have
>>> the result above?
>>> The behaviour above is consistent across different hosts, across the same
>>> or different Tomcat versions; it is always the same webapp, always the
>>> same html pages, always the same browser, etc.  Only that LC_CTYPE line
>>> changes the behaviour.
>>> On the face of it, the only thing I can think of that would explain this
>>> is that the webapp in question does something wrong, but what exactly
>>> could it be doing?
>>> Any ideas?
>>>
>>>
>> It is <%@ page pageEncoding="..." %> that is missing from those pages.
>> Thus the JSP compiler does not know what encoding they are using for their
>> source and messes them up at compilation time.
>>
> [...]
>
> But these pages, as far as Tomcat and the webapp are concerned, are not
> dynamic in any way.  They are straight static html pages.
> So is the JSP stuff relevant ?
> (I'm genuinely asking, since I know nothing about JSP pages)
>
>
The static HTML pages, as well as all the other static files, are served by
the DefaultServlet. You should dig there. I think that the fileEncoding
initialization parameter of the servlet, as well as the <mime-mapping>
settings in web.xml, come into play.

JSP settings are irrelevant for them, of course.
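
To illustrate how the "?" bytes could arise, here is a minimal Java sketch.
It is only one plausible mechanism, not a trace of what DefaultServlet
actually does: it assumes the file bytes are decoded with a charset that
cannot represent them (such as US-ASCII, which a JVM may pick as its default
under a plain POSIX locale) and then re-encoded for the response. The class
and method names (CharsetDemo, hex, recode) are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    /** Formats bytes as space-separated uppercase hex, e.g. "C2 A3". */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Decodes raw bytes with one charset, then re-encodes with another. */
    public static byte[] recode(byte[] raw, Charset readAs, Charset writeAs) {
        return new String(raw, readAs).getBytes(writeAs);
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // Pound Sterling sign, U+00A3

        // The same character under three encodings.
        System.out.println(hex(pound.getBytes(StandardCharsets.ISO_8859_1))); // A3
        System.out.println(hex(pound.getBytes(StandardCharsets.UTF_8)));      // C2 A3
        System.out.println(hex(pound.getBytes(StandardCharsets.US_ASCII)));   // 3F ('?')

        // 0xC4 is "uppercase A with umlaut" in ISO-8859-1.
        byte[] onDisk = { (byte) 0xC4 };

        // Assumed failure path: read as US-ASCII, written as ISO-8859-1.
        System.out.println(hex(recode(onDisk,
                StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1)));    // 3F

        // Read with the right charset: the byte survives unchanged.
        System.out.println(hex(recode(onDisk,
                StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1)));  // C4

        // What this JVM would use when no charset is given explicitly.
        System.out.println(Charset.defaultCharset());
    }
}
```

If something like this is happening, the last line of output would differ
between a Tomcat started with and without the LC_CTYPE setting, which would
match the behaviour described in cases a) and b).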

Best regards,
Konstantin Kolinko
