2008/9/12 André Warnier <[EMAIL PROTECTED]>

> Konstantin Kolinko wrote:
>
>> 2008/9/12 André Warnier <[EMAIL PROTECTED]>:
>>
>>> Caldarale, Charles R wrote:
>>>
>>>> I'm not sure these days what the "normal web character set" really
>>>> is. If you're referring to ASCII (aka Basic Latin), then no, the
>>>> Pound Sterling symbol is not present. However, for any of the
>>>> ISO-8859-x variants, it is present, using the 163 (0xA3) value you
>>>> noted (same as the Unicode code point). It's also in UTF-8 of
>>>> course, but requires two bytes (0xC2 0xA3) to represent the code
>>>> point.
>>>
>>> I love these discussions about character sets. They seem to confuse
>>> so many people; even I, who have been involved in them for 30
>>> years...
>>>
>>> Anyway, I have a related question, which I don't think constitutes a
>>> hijack of this thread, because the underlying cause is probably
>>> similar.
>>> Here it goes :
>>>
>>> Tomcat (v 4.1, v 5.0, v5.5, have not tried yet in 6.x)
>>> The above Tomcat's running under the same Linux or Solaris,
>>> essentially set up the same way. The JVM may vary, but I don't think
>>> that is the problem, because of the consistency of the problem as
>>> explained below.
>>> I am running a webapp from an external supplier, always the same
>>> binary version. I don't have the code, can't see what's in it.
>>> The pages served by that webapp are the same html pages, all of them
>>> having a declaration <meta http-equiv="Content-Type"
>>> content="text/html; charset=iso-8859-1">.
>>> The pages also *are* properly encoded as iso-8859-1 (100% positive, I
>>> know the difference).
>>> The browser receiving the pages is always the same one, same
>>> settings.
>>>
>>> Now,
>>>
>>> case a)
>>> in the Tomcat startup files, I do nothing, meaning I just take Tomcat
>>> out-of-the-box and run the webapp.
>>> Result : in any such html page that contains characters with an
>>> ISO-8859 codepoint above \xA0 (meaning the displayable characters of
>>> the "high" part of the table, where one finds things like "uppercase
>>> A with umlaut"), these characters
>>> - appear in the browser display as "?" (minus the quotes)
>>> - also if I save the page from the browser to disk, and look at them
>>>   with an iso-8859-1 capable editor, they are effectively "?".
>>> (So it's not the browser misunderstanding them, it is Tomcat sending
>>> them that way).
>>>
>>> case b)
>>> In one of the Tomcat startup files (e.g. tomcat_dir/bin/startup.sh or
>>> even in /etc/init.d/tomcat5.5), I add the following line
>>> LC_CTYPE="en_us.iso88591"
>>> (or whatever is valid on that host to specify an iso-8859-1 LC_CTYPE)
>>> (before the actual start of Tomcat)
>>> and restart Tomcat
>>> then the same page displays properly in the browser, and also is
>>> correct iso-8859-1 when saved to disk and examined with the editor.
>>> (In other words, what previously were "?" characters, are now the
>>> correct iso-8859-1 character bytes).
>>>
>>> Now my question is :
>>> How can it matter which LC_CTYPE Tomcat is started under, that would
>>> have the result above ?
>>> The behaviour above is consistent across different hosts, across the
>>> same or different Tomcat versions, it is always the same webapp,
>>> always the same html pages, always the same browser, etc. Only that
>>> LC_CTYPE line changes the behaviour.
>>> On the face of it, the only thing I can think of that would explain
>>> this, is that the webapp in question does something wrong, but what
>>> exactly could it be doing ?
>>> Any ideas ?
>>>
>>
>> It is <%@ page pageEncoding="..." %> that is missing from those
>> pages.
>> Thus JSP compiler does not know what encoding they are using for their
>> source and messes them at compilation time.
>>
> [...]
>
> But these pages, as far as Tomcat and the webapp are concerned, are
> not dynamic in any way. They are straight static html pages.
> So is the JSP stuff relevant ?
> (I'm genuinely asking, since I know nothing about JSP pages)
>

The static HTML pages, as well as all the other static files, are
served by the DefaultServlet. You should dig there. I think that the
fileEncoding initialization parameter of the servlet, as well as the
<mime-mapping> settings in web.xml, come into play.
JSP settings are irrelevant for them, of course.

Best regards,
Konstantin Kolinko
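
P.S. To make the byte values quoted above concrete, and to show one plausible way a "?" can end up on the wire, here is a small Java sketch. It assumes (this is not confirmed anywhere in the thread) that the static files are decoded with the JVM's platform default charset, which the JVM derives from the locale it is started under, and are then re-encoded for the response. The class name CharsetDemo and the helpers hex() and transcode() are made up for this illustration and are not part of Tomcat.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Tomcat or the webapp in question.
public class CharsetDemo {

    // Render bytes as upper-case hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decode bytes using one charset, then encode the text using another.
    // Assumption: this mimics a server that reads a file with the platform
    // default charset and writes it out with the response charset.
    static byte[] transcode(byte[] input, Charset readAs, Charset writeAs) {
        return new String(input, readAs).getBytes(writeAs);
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // POUND SIGN, written as an escape

        // The same code point in the three encodings from the thread.
        System.out.println(hex(pound.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(hex(pound.getBytes(StandardCharsets.UTF_8)));
        System.out.println(hex(pound.getBytes(StandardCharsets.US_ASCII)));

        // One byte of an iso-8859-1 page: 0xC4, uppercase A with umlaut.
        byte[] page = { (byte) 0xC4 };

        // Read as US-ASCII: the byte is invalid, so it is lost.
        System.out.println(hex(transcode(page,
                StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1)));

        // Read as ISO-8859-1: the byte survives the round trip.
        System.out.println(hex(transcode(page,
                StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1)));

        // What this JVM actually uses by default.
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

Running it under different LC_CTYPE values and comparing the last line of output is a quick way to check what default charset a given JVM picks up. Note that from Java 18 onwards the default charset is UTF-8 regardless of locale, so the last line only varies on older JVMs such as the ones discussed here.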