Don't all 5 of the predefined characters (& < > ' ") need to be encoded to
avoid problems in parsers?  I thought that was required for well-formed
XML.  For example, the apostrophe (') is used to delimit attribute values.
I do know that in our case the Xerces SAX parser threw exceptions (or just
returned errors?) if any of those 5 appeared in a value string.

Rick



"John Wilson" <[EMAIL PROTECTED]>
02/28/2002 02:51 PM
Please respond to rpc-dev

To:       <[EMAIL PROTECTED]>
cc:
Subject:  Re: DO NOT REPLY [Bug 6763] New:  -     XMLWriter doesn't escape enough characters




[snip]

> org.apache.xmlrpc.XmlRpc$XMLWriter.chardata escapes the characters &, <,
> and > in strings passed as arguments to execute().  If the string contains
> other characters that are not allowed in XML, then the XmlRpcServer fails
> with a SAXParseException on the other side of the wire.  In the example I
> encountered, the string contained the character 0x05, which should
> probably be escaped as &#0005;.  (I have worked around this by adding my
> own pass over the argument strings before calling execute, but this is
> obviously not ideal.)

This isn't a bug. You just can't legally have a Unicode character with the
value 5 in a well-formed XML document. Escaping it as &#0005; makes no
difference, because a character reference to an illegal character is
itself illegal.

The relevant part of the spec is Section 4.1, Character and Entity
References:
"Well-Formedness Constraint: Legal Character
Characters referred to using character references must match the production
for Char."
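
For anyone who wants to check strings before they reach the writer, here is
a minimal sketch of the Char production from the XML 1.0 spec. The class
and method names (XmlChars, isXmlChar, firstIllegalIndex) are made up for
illustration and are not part of org.apache.xmlrpc, Xerces or MinML.

```java
/**
 * Illustrative helper (hypothetical, not part of any library mentioned
 * in this thread) that tests code points against the XML 1.0 production:
 *   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
 *          | [#x10000-#x10FFFF]
 */
public final class XmlChars {
    private XmlChars() {}

    /** Returns true if the code point matches the XML 1.0 Char production. */
    public static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns the index of the first character in s that may not appear
     * in an XML 1.0 document (literally or as a character reference),
     * or -1 if every character is legal.
     */
    public static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            // codePointAt returns the lone surrogate value for an unpaired
            // surrogate, which falls outside Char and is rejected here.
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

A caller could run firstIllegalIndex over each argument string before
calling execute() and decide what to do with offending input (reject it,
strip it, or send it base64-encoded instead); which of those is right
depends on the application.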

MinML currently, and erroneously, allows this. I'm in the process of
tightening its checking and it will soon reject it.

John Wilson
The Wilson Partnership
http://www.wilson.co.uk






