Hi David,

Section 4.3.3 of the XML 1.0 specification, 2nd Edition [*], states:

[[
It is a fatal error if an XML entity is determined (via default,
encoding declaration, or higher-level protocol) to be in a certain encoding
but contains octet sequences that are not legal in that encoding.
]]

So an option like this would run directly counter to the spec.  It would
also hurt interoperability with other XML processors, which can be
expected to reject such documents.
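To make "octet sequences that are not legal in that encoding" concrete, here is a minimal C++ sketch of the kind of strict check a conformant UTF-8 decoder applies. It follows the well-formed byte sequence rules of RFC 3629 and the Unicode Standard: it rejects stray continuation bytes, overlong forms, UTF-16 surrogates and code points above U+10FFFF. The function name `firstInvalidUtf8` is hypothetical and is not part of the Xerces-C API; something like it could be used to locate the offending bytes in already-shipped files so they can be corrected.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Xerces-C API): returns the byte offset of the
// first octet sequence that is not legal UTF-8, or std::string::npos if
// the whole buffer is well-formed.
std::size_t firstInvalidUtf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 <= 0x7F) { ++i; continue; }  // plain ASCII byte

        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of 2nd byte
        if (b0 >= 0xC2 && b0 <= 0xDF) len = 2;
        else if (b0 == 0xE0) { len = 3; lo = 0xA0; }  // no overlong 3-byte forms
        else if (b0 >= 0xE1 && b0 <= 0xEC) len = 3;
        else if (b0 == 0xED) { len = 3; hi = 0x9F; }  // no surrogates D800..DFFF
        else if (b0 >= 0xEE && b0 <= 0xEF) len = 3;
        else if (b0 == 0xF0) { len = 4; lo = 0x90; }  // no overlong 4-byte forms
        else if (b0 >= 0xF1 && b0 <= 0xF3) len = 4;
        else if (b0 == 0xF4) { len = 4; hi = 0x8F; }  // nothing above U+10FFFF
        else return i;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence

        if (i + len > n) return i;  // sequence truncated by end of buffer
        const unsigned char b1 = static_cast<unsigned char>(bytes[i + 1]);
        if (b1 < lo || b1 > hi) return i;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char bk = static_cast<unsigned char>(bytes[i + k]);
            if (bk < 0x80 || bk > 0xBF) return i;  // must be a continuation byte
        }
        i += len;
    }
    return std::string::npos;
}
```

For example, the trade-mark sign U+2122 correctly encoded as UTF-8 (bytes E2 84 A2) passes, whereas a lone 0x99 byte is reported at its offset. 0x99 is the trade-mark sign in Windows-1252; that such files were produced by a Windows-1252 editor is only an assumption, not something stated in the original report.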

It's just too bad that Xerces-C was non-conformant in this respect for so
long, and I regret that the fix will cause some people pain.  But it's pain
that would be felt anyway as soon as the documents had to be fed to some
other processor, so I guess it's just a matter of now or later...

Cheers,
Neil

[*]:  http://www.w3.org/TR/REC-xml#charencoding

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  [EMAIL PROTECTED]


From:     David Schulze <[EMAIL PROTECTED]>
Date:     08/28/2003 02:30 PM
Please respond to: xerces-c-dev
To:       "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
cc:
Subject:  12436 - UTF-8 transcoder is not strict (and therefore not secure)

12436 - UTF-8 transcoder is not strict (and therefore not secure).
Xerces-C 2.3.0 fixed bug #12436; unfortunately, my company has shipped some
XML files that do not conform to this (the problem is the trade-mark
symbol), so newer code that uses version 2.3.0 cannot parse these older
files.  Is there a way for me to turn off the validity checking of
multi-byte UTF-8 sequences that does not require modifying the XML files
themselves?  Perhaps some switch I can set before beginning a parse?
Thanks for any help.
David Schulze
DeLorme Mapping
Yarmouth, Maine, USA

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
