I still believe it to be a Java input issue, although it could be tricky to
pin down: because you are not creating your own InputSource and/or
InputStream, you are relying on the implementation of the parse(String)
method to do it properly.

My suspicion is that the parse(String) method is creating an InputSource
object with some default encoding that is not completely compatible with the
input you're giving it (i.e. you have characters outside the range of the
default character encoding), which gives you the '?' characters.
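
To illustrate the effect I mean, here is a minimal sketch (the class and
method names are made up for the example, and it uses StandardCharsets, which
needs Java 7 or later). It pushes a non-breaking space through three
encodings; with an encoding that cannot represent the character, Java
substitutes '?':

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is illustrative only.
class NbspCharsetDemo {

    // Encode the text to bytes and decode it again with the same charset.
    // String.getBytes(Charset) replaces characters the charset cannot
    // represent with the charset's replacement byte ('?' for US-ASCII).
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String text = "a\u00A0b"; // U+00A0 is the non-breaking space

        // US-ASCII has no non-breaking space, so it is lost.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII));

        // ISO-8859-1 and UTF-8 can both represent U+00A0, so it survives.
        System.out.println(text.equals(roundTrip(text, StandardCharsets.ISO_8859_1)));
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_8)));
    }
}
```

The first line printed should be a?b and the other two should be true. Whether
this is exactly what happens inside parse(String) on your setup is an
assumption on my part.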

If you know your encoding (Unicode, UTF-8, ISO-8859-1, etc.) you can try
creating the InputSource yourself and setting its encoding by using the
following methods:

        InputSource urlSource = new InputSource(url.toString());
        urlSource.setEncoding("UTF-8");
        parser.parse(urlSource);
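
If the encoding hint alone does not help, another option is to decode the
bytes yourself and hand the parser a character stream. The following is only a
sketch under my own assumptions: it uses the SAX2 / JAXP API (DefaultHandler,
SAXParserFactory) rather than your DocumentHandler, an in-memory byte array
stands in for your URL, and the class and method names are invented for the
example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the name is illustrative only.
class ExplicitEncodingParse {

    // Decode the XML bytes with a charset we choose, parse the resulting
    // character stream, and return all character data seen by the handler.
    static String parseText(byte[] xmlBytes, Charset cs) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // The Reader does the byte-to-character conversion, so the parser
        // never has to guess the encoding.
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xmlBytes), cs);
        InputSource source = new InputSource(reader);
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // In ISO-8859-1 the non-breaking space is the single byte 0xA0.
        byte[] doc = "<p>a\u00A0b</p>".getBytes(StandardCharsets.ISO_8859_1);
        String text = parseText(doc, StandardCharsets.ISO_8859_1);
        System.out.println((int) text.charAt(1)); // code point of the middle character
    }
}
```

With the matching charset the middle character should come back as code point
160 rather than a '?'. For a real URL you would wrap url.openStream() in the
InputStreamReader instead of the byte array.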

I believe ISO-8859-1 is the encoding you're looking for, where &#160; = &nbsp;
= non-breaking space
(http://www.htmlhelp.com/reference/charset/iso160-191.html).

Good luck!
Brion

-----Original Message-----
From: Dr A.C. Marshall [mailto:[EMAIL PROTECTED]
Sent: Monday, September 30, 2002 12:41 PM
To: '[EMAIL PROTECTED]'
Subject: RE: &nbsp; entity appears as ?


On Mon, 30 Sep 2002, Swanson, Brion wrote:

|Have you tried explicitly setting the encoding to UTF-8?

Yes - no joy.

|
|Another problem may be in your Java code.  I had this issue a while ago
|when reading in characters using a character stream (as opposed to a byte
|stream).  The JRE wants to convert all input in a character stream into
|some default encoding and when it cannot determine the value of a byte, it
|replaces it with a question mark (?).

I use:

      LMLDocumentHandler myDocumentHandler = new LMLDocumentHandler(this,url);
      DocumentHandler documentHandler = myDocumentHandler;
      parser.setDocumentHandler(documentHandler);
      LMLErrorHandler myErrorHandler = new LMLErrorHandler();
       ....
      try {
        parser.parse(url.toString());
         ,..... ETC

so there are no issues with input. Admittedly this is the old API, but as I
say - everything worked OK under jserv / jdk 1.1

Could it be something to do with the character sets that the JVM (jre)
understands? And if so, how do I tell it about other char sets?

Adam

|Brion Swanson
|
|-----Original Message-----
|From: Dr A.C. Marshall [mailto:[EMAIL PROTECTED]
|Sent: Monday, September 30, 2002 9:43 AM
|To: [EMAIL PROTECTED]
|Subject: &nbsp; entity appears as ?
|
|
|Dear Esteemed colleagues,
|
|I have been using java servlets / xerces / jserv for a while now. We
|recently switched over to tomcat and have one very odd problem - connected
|with references to &nbsp; (which is defined in an entity file as &#160;).
|Under jserv things worked fine - under tomcat, xerces substitutes
|a ? whenever it encounters a &nbsp;. That is to say, the characters()
|method of the document handler has a ? in the string where the &nbsp;
|should be.
|
|I have tried other parsers, eg, aelfred, and get the same effect. Now I
|guess the change is related to us now using jdk 1.4 rather than the
|switch to tomcat. I have tried generating 1.1, 1.2, 1.3 and 1.4 target
|code but still get the ?'s!
|
|I'm sure this is a very simple problem .... but what is the solution?
|
|Adam Marshall
|

-- 
   Dr AC Marshall ([EMAIL PROTECTED]). LUSID System Programmer,
   Centre for Lifelong Learning, University of Liverpool.

   Cheese of the Millenium: Quejo con Piri Piri

This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
