If anybody was waiting for the solution to this conundrum with bated
breath (!), the solution is that one must have the LC_* environment
variables set up. Once this is done, &nbsp;s are handled correctly:
LC_COLLATE=en_UK
LC_CTYPE=en_UK
LC_MESSAGES=C
LC_MONETARY=en_UK
LC_NUMERIC=en_UK
LC_TIME=C
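
For anyone who wants to check what their own JVM has picked up, here is a
minimal sketch. The class name LocaleEncodingCheck and the roundTrip helper
are made up for illustration, and it assumes a much newer JDK than the 1.4
in this thread (Charset.defaultCharset and StandardCharsets arrived later).
My assumption, not something confirmed above, is that without the LC_*
variables the JVM falls back to an ASCII-like default encoding, which cannot
represent the non-breaking space (U+00A0):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class LocaleEncodingCheck {

    // Encode the text to bytes in the given charset, then decode it again.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's replacement byte, which is '?' for US-ASCII.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // What the JVM derived from its environment at startup.
        System.out.println("LC_CTYPE env:    " + System.getenv("LC_CTYPE"));
        System.out.println("default locale:  " + Locale.getDefault());
        System.out.println("default charset: " + Charset.defaultCharset());

        // "a", a non-breaking space, then "b".
        String text = "a\u00A0b";

        // US-ASCII cannot hold U+00A0, so it comes back as "a?b".
        System.out.println("US-ASCII:   " + roundTrip(text, StandardCharsets.US_ASCII));

        // ISO-8859-1 maps U+00A0 to byte 0xA0, so the text survives intact.
        System.out.println("ISO-8859-1 preserved: "
                + text.equals(roundTrip(text, StandardCharsets.ISO_8859_1)));

        // Whether the JVM's own default charset would preserve it.
        System.out.println("default preserved:    "
                + text.equals(roundTrip(text, Charset.defaultCharset())));
    }
}
```

Running it with and without the LC_* variables set should show whether the
default charset changes. Recent JDKs may report UTF-8 whatever the locale
says, so the difference is most likely to show up on older ones.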
Adam
On Mon, 30 Sep 2002, Swanson, Brion wrote:
|I still believe it to be a Java input issue, although this could get tricky
|because you are not creating your own InputSource and/or InputStream, you
|are relying on the implementation of the parse(String) method to do it
|properly.
|
|My inclination is that the parse(String) method is creating an InputSource
|object with some default encoding that is not completely compatible with the
|input you're giving it (i.e. you have characters outside the range of the
|default character encoding) which gives you the '?' characters.
|
|If you know your encoding (Unicode, UTF-8, ISO-8859-1, etc.) you can try
|creating the InputSource yourself and setting its encoding by using the
|following methods:
|
| InputSource urlSource = new InputSource(url.toString());
| urlSource.setEncoding("UTF-8");
| parser.parse(urlSource);
|
|I believe ISO-8859-1 is the encoding you're looking for, where &nbsp; = &#160;
|= non-breaking space
|(http://www.htmlhelp.com/reference/charset/iso160-191.html).
|
|Good luck!
|Brion
|
|-----Original Message-----
|From: Dr A.C. Marshall [mailto:[EMAIL PROTECTED]
|Sent: Monday, September 30, 2002 12:41 PM
|To: '[EMAIL PROTECTED]'
|Subject: RE: entity appears as ?
|
|
|On Mon, 30 Sep 2002, Swanson, Brion wrote:
|
||Have you tried explicitly setting the encoding to UTF-8?
|
|Yes - no joy.
|
||
||Another problem may be in your Java code. I had this issue a while ago
||when reading in characters using a character stream (as opposed to a
||byte stream). The JRE wants to convert all input in a character stream
||into some default encoding and when it cannot determine the value of a
||byte, it replaces it with a question mark (?).
|
|I use:
|
| LMLDocumentHandler myDocumentHandler =
|     new LMLDocumentHandler(this, url);
| DocumentHandler documentHandler = myDocumentHandler;
| parser.setDocumentHandler(documentHandler);
| LMLErrorHandler myErrorHandler = new LMLErrorHandler();
| ....
| try {
|     parser.parse(url.toString());
| ..... ETC
|
|so there are no issues with input. Admittedly this is the old API but, as
|I say, everything worked OK under jserv / jdk 1.1
|
|Could it be something to do with the character sets that the JVM (jre)
|understands? And if so, how do I tell it about other char sets?
|
|Adam
|
||Brion Swanson
||
||-----Original Message-----
||From: Dr A.C. Marshall [mailto:[EMAIL PROTECTED]
||Sent: Monday, September 30, 2002 9:43 AM
||To: [EMAIL PROTECTED]
||Subject: entity appears as ?
||
||
||Dear Esteemed colleagues,
||
||I have been using java servlets / xerces / jserv for a while now. We
||recently switched over to tomcat and have one very odd problem - connected
||with references to &nbsp; (which is defined in an entity file as &#160;).
||Under jserv things worked fine - under tomcat, xerces substitutes
||a ? whenever it encounters a &nbsp;. That is to say, the characters()
||method of the document handler has a ? in the string where the &nbsp;
||should be.
||
||I have tried other parsers, eg, aelfred, and get the same effect. Now I
||guess the change is related to us now using jdk 1.4 rather than the
||switch to tomcat. I have tried generating 1.1, 1.2, 1.3 and 1.4 target
||code but still get the ?'s!
||
||I'm sure this is a very simple problem .... but what is the solution?
||
||Adam Marshall
||
|
|
--
Dr AC Marshall ([EMAIL PROTECTED]). LUSID System Programmer,
Centre for Lifelong Learning, University of Liverpool.
Cheese of the Millenium: Quejo con Piri Piri