If anybody was waiting for the solution to this conundrum with bated
breath (!), the solution is that one must have the LC_* environment
variables set up. Once this is done, &nbsp; entities are handled correctly:

LC_COLLATE=en_UK
LC_CTYPE=en_UK
LC_MESSAGES=C
LC_MONETARY=en_UK
LC_NUMERIC=en_UK
LC_TIME=C
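
For anyone wanting to see why the locale matters, here is a minimal sketch (not the servlet code itself). It assumes a JVM with java.nio.charset.StandardCharsets (Java 7 or later, so not the jdk 1.4 used here), and the class and method names are made up for illustration. It prints the LC_CTYPE value the JVM sees and the default charset it picked, then shows that encoding a non-breaking space (U+00A0) in US-ASCII, the usual charset under the C locale, turns it into a ?, while ISO-8859-1 keeps it. On recent JVMs (Java 18 onwards) the default charset is UTF-8 regardless of the locale, so the first two lines of output may differ from what an older JVM reports.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LocaleEncodingCheck {

    /** Encodes the text in the given charset and decodes it again. */
    static String roundTrip(String text, Charset charset) {
        // getBytes() replaces unmappable characters with the charset's
        // replacement byte, which is '?' for US-ASCII.
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        // What the JVM inherited from the environment (null if unset).
        System.out.println("LC_CTYPE        = " + System.getenv("LC_CTYPE"));
        System.out.println("default charset = " + Charset.defaultCharset());

        String text = "a\u00A0b"; // 'a', non-breaking space, 'b'

        String ascii = roundTrip(text, StandardCharsets.US_ASCII);
        String latin1 = roundTrip(text, StandardCharsets.ISO_8859_1);

        System.out.println("US-ASCII keeps nbsp:   " + ascii.equals(text));
        System.out.println("ISO-8859-1 keeps nbsp: " + latin1.equals(text));
        System.out.println("US-ASCII result:       " + ascii);
    }
}
```

Running it once with the LC_* variables unset and once with them set is a quick way to check whether the JVM has really picked up the new locale.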

Adam



On Mon, 30 Sep 2002, Swanson, Brion wrote:

|I still believe it to be a Java input issue, although this could get tricky
|because you are not creating your own InputSource and/or InputStream, you
|are relying on the implementation of the parse(String) method to do it
|properly.
|
|My inclination is that the parse(String) method is creating an InputSource
|object with some default encoding that is not completely compatible with the
|input you're giving it (i.e. you have characters outside the range of the
|default character encoding) which gives you the '?' characters.
|
|If you know your encoding (Unicode, UTF-8, ISO-8859-1, etc.) you can try
|creating the InputSource yourself and setting its encoding by using the
|following methods:
|
|       InputSource urlSource = new InputSource(url.toString());
|       urlSource.setEncoding("UTF-8");
|       parser.parse(urlSource);
|
|I believe ISO-8859-1 is the encoding you're looking for, where &#160; = &nbsp;
|= non-breaking space
|(http://www.htmlhelp.com/reference/charset/iso160-191.html).
|
|Good luck!
|Brion
|
|-----Original Message-----
|From: Dr A.C. Marshall [mailto:[EMAIL PROTECTED]
|Sent: Monday, September 30, 2002 12:41 PM
|To: '[EMAIL PROTECTED]'
|Subject: RE: &nbsp; entity appears as ?
|
|
|On Mon, 30 Sep 2002, Swanson, Brion wrote:
|
||Have you tried explicitly setting the encoding to UTF-8?
|
|Yes - no joy.
|
||
||Another problem may be in your Java code.  I had this issue a while ago when
||reading in characters using a character stream (as opposed to a byte
||stream).  The JRE wants to convert all input in a character stream into some
||default encoding and when it cannot determine the value of a byte, it
||replaces it with a question mark (?).
|
|I use:
|
|      LMLDocumentHandler myDocumentHandler = new LMLDocumentHandler(this,url);
|      DocumentHandler documentHandler = myDocumentHandler;
|      parser.setDocumentHandler(documentHandler);
|      LMLErrorHandler myErrorHandler = new LMLErrorHandler();
|       ....
|      try {
|        parser.parse(url.toString());
|         ,..... ETC
|
|so there are no issues with input. Admittedly this is the old API but as I
|say - everything worked OK under jserv / jdk 1.1
|
|Could it be something to do with the character sets that the JVM (jre)
|understands? And if so, how do I tell it about other char sets?
|
|Adam
|
||Brion Swanson
||
||-----Original Message-----
||From: Dr A.C. Marshall [mailto:[EMAIL PROTECTED]
||Sent: Monday, September 30, 2002 9:43 AM
||To: [EMAIL PROTECTED]
||Subject: &nbsp; entity appears as ?
||
||
||Dear Esteemed colleagues,
||
||I have been using java servlets / xerces / jserv for a while now. We
||recently switched over to tomcat and have one very odd problem connected
||with references to &nbsp; (which is defined in an entity file as &#160;).
||Under jserv things worked fine; under tomcat, xerces substitutes
||a ? whenever it encounters a &nbsp;. That is to say, the characters()
||method of the document handler has a ? in the string where the &nbsp;
||should be.
||
||I have tried other parsers, eg, aelfred, and get the same effect. Now I
||guess the change is related to us now using jdk 1.4 rather than the
||switch to tomcat. I have tried generating 1.1, 1.2, 1.3 and 1.4 target
||code but still get the ?'s!
||
||I'm sure this is a very simple problem .... but what is the solution?
||
||Adam Marshall
||
|
|

-- 
   Dr AC Marshall ([EMAIL PROTECTED]). LUSID System Programmer,
   Centre for Lifelong Learning, University of Liverpool.

   Cheese of the Millenium: Quejo con Piri Piri

This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.

