----- Original Message -----
From: "Rune Froysa" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, November 29, 2002 11:50 AM
Subject: Bug: MinML silently ignores encoding

> Unless sax.driver is set, XmlRpc will default to the MinML sax driver.
> This driver silently ignores the encoding specification in the first
> line of the xml file, and the default character encoding does not seem
> to be utf-8.
>
> If this driver is supposed to still be the default driver for XmlRpc,
> then I suggest that it should detect utf-8 usage and emit a warning to
> save developers from having to do a lot of debugging to figure out why
> their utf-8 based code does not work.
>
> Python's xmlrpclib seems to default to utf-8. As far as I can see, the
> Xml-RPC spec does not specify any character set.

The XML-RPC spec says that the contents of the message must be ASCII
characters. The Apache XML-RPC implementation extends this spec by
supporting ISO-8859-1 encoding. Note that the encoding of ASCII
characters is identical in UTF-8 and ISO-8859-1.

If you want to use non-ASCII characters in a message, then the best and
safest way of doing so is to escape every character with a Unicode value
> 127 as &#nnnn; where nnnn is the character's code point. This will
maximise your chance of interoperating between XML implementations.
Even so, some implementations will fail when encountering these
entities.

I believe that code has been committed to generate the &#nnnn; escaping
in some circumstances, but I'm not sure that the XML writer currently
escapes all non-ASCII characters.

The next version of MinML will recognise and use the encoding
declaration.

John Wilson
The Wilson Partnership
http://www.wilson.co.uk
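
For illustration, the &#nnnn; escaping described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the code in Apache XML-RPC: the function name escape_non_ascii is made up for this example, and it deliberately leaves the usual XML escaping of &, < and > to whatever writer you already use.

```python
def escape_non_ascii(text):
    """Replace every character above code point 127 with a decimal
    XML character reference (&#nnnn;), leaving ASCII untouched.

    Hypothetical helper for illustration; it does NOT escape the XML
    special characters &, < and >.
    """
    return "".join(
        ch if ord(ch) < 128 else "&#{};".format(ord(ch))
        for ch in text
    )


if __name__ == "__main__":
    # \u00f8 is LATIN SMALL LETTER O WITH STROKE, code point 248.
    print(escape_non_ascii("Fr\u00f8ysa"))  # Fr&#248;ysa
```

Python's built-in "xmlcharrefreplace" error handler produces the same decimal references, so text.encode("ascii", "xmlcharrefreplace") gives an ASCII-only byte string that should pass through a parser regardless of which encoding it assumes.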
