Hi Ha,

In short, no.  In the Java version of SAX, on which the API we support is
very closely modeled, a SAXException may wrap another exception, and there
is a method (getException()) for retrieving the wrapped exception.
Presumably, applications with knowledge of the underlying implementation
can then do instanceof checks to see what kind of exception has been
wrapped.  Since we can't do that in C++, and we wouldn't want to make the
SAX API dependent on Xerces-internal concepts like XMLException, there
doesn't seem to be any way of duplicating this kind of thing here.

If you have any thoughts about this, though, it would be great to hear
them.  I wasn't involved with the project when the SAX interfaces were
ported over, so I'm only hypothesizing about why the exception-wrapping
portion wasn't ported; perhaps there is some way around this that I haven't
thought of, or that wasn't thought of then.

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  [EMAIL PROTECTED]




From:     "Huynh, Ha" <[EMAIL PROTECTED]>
Sent:     08/07/2003 06:54 PM
To:       "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
Subject:  RE: UTFDataFormatException (bitwise AND error in XMLUTF8Transcoder)
Please respond to xerces-c-dev



Thanks Neil. You're right, it doesn't work with UTF-8 encoding.  Looks like
my editor had changed the encoding when I edited the original file.

Another question.  In my ErrorHandler, the methods error, warning, and
fatalError are all passed a SAXParseException argument.  Is there any way I
can tell from the SAXParseException that it was a UTFDataFormatException?
I don't see any exposed methods to get the exception type, although it is
present in the message.  I want to direct the user to specify an encoding
(and possibly tell them the appropriate encoding too).
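
Since the exception type does show up in the message text, one stopgap is
to search the message string for it.  The following is only a minimal
sketch, assuming the message keeps the "Type:<name>, Message:<text>" layout
seen in the error output quoted in this thread.  The helper names
extractExceptionType and isUtfDataFormatError are hypothetical; they are
not part of Xerces-C or SAX, and the approach breaks if the message wording
ever changes.

```cpp
#include <string>

// Hypothetical helper (not part of Xerces-C or SAX): pulls the exception
// type name out of a message laid out as "... Type:<name>, Message:<text>".
// Returns an empty string when the "Type:" marker is absent.
std::string extractExceptionType(const std::string& message)
{
    const std::string marker = "Type:";
    const std::string::size_type start = message.find(marker);
    if (start == std::string::npos)
        return std::string();

    const std::string::size_type nameBegin = start + marker.size();
    // The type name runs up to the next comma, or to the end of the message.
    const std::string::size_type nameEnd = message.find(',', nameBegin);
    if (nameEnd == std::string::npos)
        return message.substr(nameBegin);
    return message.substr(nameBegin, nameEnd - nameBegin);
}

// True when the message text reports a UTFDataFormatException.
bool isUtfDataFormatError(const std::string& message)
{
    return extractExceptionType(message) == "UTFDataFormatException";
}
```

Inside an ErrorHandler, the message would first have to be converted from
the XMLCh string returned by SAXParseException::getMessage() to a narrow
string, for instance with XMLString::transcode, before being passed to
these helpers.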

Thanks,
Ha


-----Original Message-----
From: Neil Graham [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 07, 2003 3:45 PM
To: [EMAIL PROTECTED]
Subject: Re: UTFDataFormatException (bitwise AND error in XMLUTF8Transcoder)



Hi Ha,

There's no doubt that the attached file is not well-formed.  The 0xb7
characters are definitely not properly encoded in UTF-8; they'd need to be
encoded as 0xc2 0xb7 in order for the encoding to be proper (if I've done
the conversion correctly).  If no encoding declaration is specified, an XML
parser is required to treat a document as UTF-8 (unless it can determine
that it's actually UTF-16).
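
The conversion can be double-checked with a few lines of code.  This is a
minimal sketch, assuming the 0xb7 byte is meant as the code point U+00B7
(which is what that byte denotes in ISO-8859-1); the function name
encodeUtf8 is made up for illustration and is not a Xerces-C API.

```cpp
#include <string>

// Encodes one Unicode code point (up to U+10FFFF) as UTF-8 bytes.
// Sketch only: surrogates and out-of-range values are not rejected.
std::string encodeUtf8(unsigned long cp)
{
    std::string out;
    if (cp < 0x80) {
        // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+00B7 the two-byte branch applies: 0xB7 >> 6 is 2, giving a lead byte
of 0xC2, and 0xB7 & 0x3F is 0x37, giving a trailing byte of 0xB7.  A lone
0xb7 byte has the 10xxxxxx form of a trailing byte, so it cannot start a
UTF-8 sequence on its own.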

Note that all is well if you specify the document's encoding to be

      encoding="ISO-8859-1"

which is, I suspect, the actual encoding.  I was not able to reproduce the
behaviour you describe when the document is declared to be UTF-8:  the
parser still produced an error for me in this case.  If you continue to
observe this, please attach a test case declared to be UTF-8 that works.

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  [EMAIL PROTECTED]




From:     "Huynh, Ha" <[EMAIL PROTECTED]>
Sent:     08/07/2003 06:12 PM
To:       "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
Subject:  UTFDataFormatException (bitwise AND error in XMLUTF8Transcoder)
Please respond to xerces-c-dev




I am getting a UTFDataFormatException when using the following xml doc
(attached). It appears to be complaining about the "bullet" character.
Note that the xml doc contains a hidden character (LATIN A with circumflex)
right before the bullet.

If I add the encoding="UTF-8" there is no UTFDataFormatException. However,
without specifying any encoding I get the following error.  When I trace
through the code it looks like the default encoding for xerces 2.3 is
UTF-8.  The UTFDataFormatException is thrown in XMLUTF8Transcoder.cpp,
line 222:
        if ((gUTFByteIndicatorTest[trailingBytes] & *srcPtr) !=
             gUTFByteIndicator[trailingBytes]) { /* throw error here */ }

I checked the values:
gUTFByteIndicatorTest[trailingBytes] = 0
*srcPtr = 183
gUTFByteIndicator[trailingBytes] = 0

So we should not enter this if branch.  However the computation of the line:
gUTFByteIndicatorTest[trailingBytes] & *srcPtr = 128  //This should be 0.
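
The arithmetic here can be checked in isolation.  With *srcPtr = 183
(0xB7), a mask of 0 can only ever yield 0, so a result of 128 implies that
the mask really has its high bit set.  The sketch below assumes a mask of
0x80 for the 1-byte case; that value is a guess consistent with the
observed result, not something confirmed against the Xerces source here,
and failsLeadByteCheck is a hypothetical stand-in for the quoted test.

```cpp
// Mirrors the shape of the quoted check for a single source byte.
// Returns true when the byte fails the check, i.e. when the error
// would be thrown.
bool failsLeadByteCheck(unsigned char mask,
                        unsigned char expected,
                        unsigned char srcByte)
{
    return (mask & srcByte) != expected;
}
```

With mask 0x80 and expected value 0, the byte 183 fails the check
(0x80 & 0xB7 is 0x80, which is 128), while any plain ASCII byte passes.
With a mask of 0 the check could never fail for any byte.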

Another observation I made was that if I use the xml doc without
specifying an encoding AND move the bullet character and hidden character
to another element of the xml, this exception does not occur. Not sure
what's going on.

Fatal Error at file C:\temp\SAXSchemaParser\Debug/personal.xml, line 1, char 22
  Message: An exception occurred! Type:UTFDataFormatException, Message:invalid byte 1 (¬) of a 1-byte sequence.

I am running xerces 2.3 compiled with MSVS 7.0.
Any ideas?


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

#### personal.xml has been removed from this note on August 07 2003 by Neil Graham


