On Wed, 2005-05-04 at 13:42 -0700, Steve Quint wrote: 
>I must first profess that I'm not very smart, and reading the XML 
>"recommendation" on W3C makes my head spin.  My apologies if my 
>interpretation of this "recommendation" is wrong.

I feel that.  There's a reason I sat on this patch for _months_.  ;-)

>At 4:45 PM -0700 5/3/05, Daniel Rall wrote:
>>
>>Hi Steve, can you elaborate on this?  Both the XML RFCs and certain
>>encodings dictate what constitutes valid content, and how content must
>>be represented.  For instance, certain multi-byte characters simply
>>aren't representable in 7 bit encoding like ASCII -- the only way to
>>deliver'em through an ASCII encoding is to use another encoding which
>>can be represented in ASCII (e.g. base-64).

Re-reading this and doing some research, I see that I've mis-remembered
part of the problem, and that my description here may be (debatably)
incorrect near the end given the removal of the "ASCII-only" restriction
from the XML-RPC spec.

>>I don't follow you.  The XML-RPC spec itself used to dictate that the
>>XML payload must be ASCII.  That changed only recently.
>
>Unless I'm missing something, can't any multi-byte character be 
>represented using entity encoding.  Why is this operation reserved 
>for UTF-8 and UTF-16?

Indeed, multi-byte characters can be represented using entity encoding.
Here's the deal...

On 2002/08/19, CVS rev 1.3 of XmlWriter introduced code to entity encode
characters outside the range 0x20 to 0xff, characters which are invalid as
un-encoded _XML_.  And so it was Good.

On 2002/08/20, CVS rev 1.4 of XmlWriter incorrectly changed the code
introduced in rev 1.3 to throw an exception when encountering characters
outside that same range of 0x20 to 0xff, claiming that such characters were
not valid in XML-RPC <string> payloads, because at that time, XML-RPC
allowed only ASCII data for its <string> data type.  Rev 1.4 _should've_
looked more like the change I just committed, which disallows
characters outside the range 0x20 to 0x7f when they occur within
<string> data.
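
To make the difference between the two policies concrete, here is a rough
Java sketch.  It is not the actual XmlWriter code: the class and method
names are made up, it assumes the checks test for characters below 0x20 or
above the upper bound, it works on UTF-16 chars (so surrogate pairs are
handled one half at a time), and it leaves out the escaping of '<' and '&'.

```java
// Hypothetical illustration only; not the real XmlWriter API.
final class CharDataSketch {

    /** Rev 1.3-style policy: write out-of-range chars as numeric character references. */
    static String entityEncode(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x20 || c > 0xff) {
                // e.g. U+20AC becomes "&#8364;"
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Strict ASCII policy: reject anything outside 0x20..0x7f in <string> data. */
    static void requireAscii(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x20 || c > 0x7f) {
                throw new IllegalArgumentException(
                    "Invalid character data corresponding to XML entity &#" + (int) c + ';');
            }
        }
    }
}
```

With the first method, 0xe9 passes through untouched while U+20AC is
written as a character reference; with the second, 0xe9 is already
rejected.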

On 2003/06/30, Dave Winer removed the ASCII-only restriction on <string>
payloads from the XML-RPC specification.


With the restriction on ASCII-only <string> payloads removed, do we want
to go back to the days of CVS rev 1.3, where all characters which are
not valid _XML_ are entity encoded, and no special handling is enforced
based on the XmlWriter's encoding?  (What does this mean for inter-op
with older XML-RPC clients/servers?)  Or do we -- as the code I just
checked in does -- assume that the XML parser receiving the content
generated by XmlWriter could be converting the data into ASCII
whenever it's declared to be ASCII?  (Again, keeping support for old
XML-RPC clients/servers in mind here.)
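
For what it's worth, the inter-op worry can be seen with any conforming
parser: a numeric character reference is expanded to the real character
even when the document declares an ASCII encoding, so the receiver ends up
holding non-ASCII data either way.  A small check using the JDK's built-in
DOM parser (the class and method names here are made up for illustration,
and this says nothing about how older XML-RPC implementations behave):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical illustration only; not part of the XML-RPC library.
final class AsciiDeclaredSketch {

    /** Parses a small XML document and returns the text of its root element. */
    static String parseRootText(String xml) {
        try {
            // The sample documents are pure ASCII on the wire.
            byte[] bytes = xml.getBytes(StandardCharsets.US_ASCII);
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }
}
```

Feeding it `<?xml version="1.0" encoding="US-ASCII"?><string>caf&#233;</string>`
yields a four-character string whose last char is 0xe9, which a client
that truly expects ASCII-only <string> values may not cope with.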

