On Wed, 2005-05-04 at 13:42 -0700, Steve Quint wrote:
>I must first profess that I'm not very smart, and reading the XML
>"recommendation" on W3C makes my head spin. My apologies if my
>interpretation of this "recommendation" is wrong.
I feel that. There's a reason I sat on this patch for _months_. ;-)

>At 4:45 PM -0700 5/3/05, Daniel Rall wrote:
>>
>>Hi Steve, can you elaborate on this? Both the XML RFCs and certain
>>encodings dictate what constitutes valid content, and how content must
>>be represented. For instance, certain multi-byte characters simply
>>aren't representable in 7 bit encoding like ASCII -- the only way to
>>deliver'em through an ASCII encoding is to use another encoding which
>>can be represented in ASCII (e.g. base-64).

Re-reading this and doing some research, I see that I misremembered part of the problem, and that my description here may be (debatably) incorrect near the end, given the removal of the "ASCII-only" restriction from the XML-RPC spec.

>>I don't follow you. The XML-RPC spec itself used to dictate that the
>>XML payload must be ASCII. That changed only recently.
>
>Unless I'm missing something, can't any multi-byte character be
>represented using entity encoding. Why is this operation reserved
>for UTF-8 and UTF-16?

Indeed, multi-byte characters can be represented using entity encoding. Here's the deal...

On 2002/08/19, CVS rev 1.3 of XmlWriter introduced code to entity encode characters in the range 0x20 to 0xff, characters which are invalid as un-encoded _XML_. And so it was Good.

On 2002/08/20, CVS rev 1.4 of XmlWriter incorrectly changed the code introduced in rev 1.3 to throw an exception when encountering characters in that same range of 0x20 to 0xff, claiming that such characters were not valid in XML-RPC <string> payloads, because at that time XML-RPC allowed only ASCII data for its <string> data type. Rev 1.4 _should've_ looked more like the change I just committed, which disallows characters outside the range 0x20 to 0x7f when they occur within <string> data.

On 2003/06/30, Dave Winer removed the restriction that only ASCII is allowed in <string> payloads from the XML-RPC specification.
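To make "entity encoding" concrete, here is a minimal sketch of the idea in Java. It is not the XmlWriter code from any CVS revision; the class and method names are hypothetical, and it assumes we simply want every code point above 0x7f written as a numeric character reference so the output stays pure ASCII.

```java
/** Hypothetical illustration only; not the actual XmlWriter implementation. */
final class EntityEncodeSketch {

    /** Escapes XML markup characters and writes non-ASCII code points as &#N; references. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);   // handles surrogate pairs as one code point
            i += Character.charCount(cp);
            switch (cp) {
                case '<': out.append("&lt;");  break;
                case '>': out.append("&gt;");  break;
                case '&': out.append("&amp;"); break;
                default:
                    if (cp > 0x7f) {
                        // Assumption: anything outside 7-bit ASCII becomes a
                        // decimal numeric character reference.
                        out.append("&#").append(cp).append(';');
                    } else {
                        // Control characters below 0x20 are passed through
                        // unchecked here; a real writer would need to vet them.
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00e9 <b>"));
    }
}
```

Because the result contains only ASCII bytes, it survives any encoding declaration, which is the point Steve raises above.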
With the restriction on ASCII-only <string> payloads removed, do we want to go back to the days of CVS rev 1.3, where all characters which are not valid _XML_ are entity encoded, and no special handling is enforced based on the XmlWriter's encoding? (What does this mean for inter-op with older XML-RPC clients/servers?) Or do we -- as the code I just checked in does -- assume that the XML parser receiving the content generated by XmlWriter could be converting the data into ASCII whenever it's declared to be ASCII? (Again, keeping support for old XML-RPC clients/servers in mind here.)
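For discussion, the two options might be sketched side by side like this. Everything here is hypothetical: the `Policy` enum, its constant names, and `writeString` are illustrative and do not exist in XmlWriter. The pass-through branch for non-ASCII encodings is my assumption, since that case is not spelled out above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical comparison of the two behaviors; not the committed code. */
final class WriterPolicySketch {

    /** Illustrative names for the two options under discussion. */
    enum Policy { ENTITY_ENCODE_ALWAYS, REJECT_WHEN_ASCII }

    static String writeString(String value, Charset declared, Policy policy) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp == '&') {
                out.append("&amp;");
            } else if (cp <= 0x7f) {
                out.appendCodePoint(cp);
            } else if (policy == Policy.ENTITY_ENCODE_ALWAYS) {
                // Rev 1.3 style: encode regardless of the declared encoding.
                out.append("&#").append(cp).append(';');
            } else if (StandardCharsets.US_ASCII.equals(declared)) {
                // Strict style: refuse data an ASCII-only peer might mangle.
                throw new IllegalArgumentException(
                        "Code point " + cp + " is not allowed in an ASCII <string>");
            } else {
                // Assumption: leave the character for the output encoder.
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

The first policy never fails but relies on the receiving parser resolving character references; the second fails early so that an old ASCII-only peer never sees data it cannot represent.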