Brief analysis:
- the lib tries to encode all chars outside of the ASCII range as XML character entities when serializing
- the main benefit is that such an XML document is valid regardless of the charset assumed by the parser (as long as that charset is ASCII-compatible), i.e. we do not need to add a 'charset' parameter to either the HTTP Content-Type header or the XML prologue
- it is also the best solution I could come up with for the long-standing problems with charset encodings (I also tried the other way round, i.e. explicitly stating the charset used for the XML, in a private fork of the lib I use for personal projects, but I would rather stick with the current approach, as it solves the problem more elegantly)
- unfortunately, as I work with non-mbstring-enabled installs by default, I assumed that the internal string representation was ISO-8859-1, and coded the xmlrpc_encode_entitites function accordingly
- I am now looking at the PHP man page for utf8_decode, and there are a few examples of correct utf8-to-xml-entities functions that might be of use
- basically, I see two options to extend the lib to work around your problem:
  + extend the xmlrpc_encode_entitites function to take into account the $xmlrpc_internalencoding global var, and use 2 different parsing algorithms (better solution, but slower)
  + add a 'workaround' solution: a class var of server/client objects that prevents non-ASCII chars from being escaped
  + note that both things could actually be combined...

Would you be willing to test the patches?

Bye
Gaetano

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of a.h.s. boy
> (lists)
> Sent: Tuesday, November 15, 2005 12:17 AM
> To: phpxmlrpc@lists.usefulinc.com
> Subject: [phpxmlrpc] xmlrpc_encode_entitites causing parse error
>
>
> I'm using the XML-RPC library to retrieve calendar listing records
> from a calendar website. Both the client and the server are using the
> latest XML-RPC library.
>
> Both client and server are using UTF-8 encoding all around, and I've
> adjusted $xmlrpc_internalencoding.
>
> Some of the calendar entries are in Japanese, input with UTF-8
> encoding, and displayed on the site with UTF-8 encoding. (See
> http://www.radicalendar.org/calendar/index.php?view=month&group=imcjapan).
>
> If I make an XMLRPC request to retrieve some Japanese entries, the
> library chokes and returns an "Invalid token" error. After what seems
> like 90 hours of debugging (checking the strings and arrays at
> various stages of encoding and parsing), I tracked the problem down
> to the default case of xmlrpc_encode_entitites()
>
>     default:
>         if ($code < 32 || $code > 159)
>             $character = ("&#".strval($code).";");
>
> If I simply comment out that code, leaving a blank default case, the
> XML is now valid and parses (and displays) exactly as expected. I
> have NOT debugged the code to the extent where I can tell exactly
> what character's entity reference might be the exact cause of the
> problem... it's all complicated by the fact that I don't read
> Japanese, so debugging is that much harder.
>
> Any idea why the entity conversion is causing the XML to become
> invalid? Is it feasible to leave off the
>
> There's an example page at
> http://dev.dadaimc.org/mod/calendar/index.php with debugging turned
> on, but it'll only be valid for today (11/14/05 -0500), after which
> time the Japanese entry will no longer be part of the results. But
> I'd be happy to reproduce the problem upon request.
>
> Cheers,
> spud.
>
> -------------------------------------------------------------------
> a.h.s. boy
> spud(at)nothingness.org "as yes is to if,love is to yes"
> http://www.nothingness.org/
> -------------------------------------------------------------------
>
> _______________________________________________
> phpxmlrpc mailing list
> phpxmlrpc@lists.usefulinc.com
> http://lists.usefulinc.com/cgi-bin/mailman/listinfo/phpxmlrpc
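A small reproduction of why the quoted default case breaks UTF-8 data. legacy_default_case() is a hypothetical stand-in that keeps only that branch (the lib's handling of markup characters is left out) and walks the string byte by byte, as if it were ISO-8859-1. Assuming the input is UTF-8, every multibyte character gets split: bytes above 159 (lead bytes and some continuation bytes) become numeric references to the wrong code points (&#227; is always U+00E3, whatever the document encoding), while continuation bytes in the 128..159 range are copied verbatim. The result is malformed UTF-8, which plausibly explains the "Invalid token" error a UTF-8 parser reports.

```php
<?php
// Hypothetical stand-in for ONLY the default case quoted above: each byte
// is treated as one ISO-8859-1 character (markup-char cases omitted).
function legacy_default_case($data)
{
    $out = '';
    $len = strlen($data);
    for ($i = 0; $i < $len; $i++) {
        $character = $data[$i];
        $code = ord($character);
        if ($code < 32 || $code > 159) {
            $character = "&#" . strval($code) . ";";
        }
        $out .= $character;
    }
    return $out;
}

// U+3042 HIRAGANA LETTER A is the three bytes E3 81 82 in UTF-8.
$input = "\xE3\x81\x82";
$output = legacy_default_case($input);

// 0xE3 (227) is escaped, but 0x81 and 0x82 fall in 128..159 and are
// copied as-is: orphaned continuation bytes, i.e. malformed UTF-8.
// preg_match() with the /u modifier returns false on malformed UTF-8.
var_dump(preg_match('//u', $input));  // int(1)
var_dump(preg_match('//u', $output)); // bool(false)
```

Commenting out the branch, as done above, hides the problem because the raw UTF-8 bytes then pass through untouched, but the output is then no longer pure ASCII, which is exactly the property the lib relies on to avoid declaring a charset.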
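A minimal sketch of the first option in the reply, for the case where $xmlrpc_internalencoding is 'UTF-8': decode each multibyte sequence to its code point and emit a single &#N; for it, so the serialized XML stays pure ASCII. utf8_to_xml_entities() is a hypothetical name, not an existing phpxmlrpc function; it uses only the PCRE extension (no mbstring) and simply returns false for malformed input. Wiring it into xmlrpc_encode_entitites, e.g. by branching on $xmlrpc_internalencoding, is not shown.

```php
<?php
// Hypothetical helper, NOT part of the phpxmlrpc API. Assumes UTF-8 input.
// Markup chars become predefined entities; each non-ASCII code point
// becomes ONE numeric reference, so the output is pure ASCII.
function utf8_to_xml_entities($data)
{
    // Reject malformed UTF-8 up front (preg_match fails on it with /u),
    // so the decoder below can assume well-formed sequences.
    if (preg_match('//u', $data) !== 1) {
        return false;
    }
    $specials = array('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                      '"' => '&quot;', "'" => '&apos;');
    $out = '';
    $len = strlen($data);
    $i = 0;
    while ($i < $len) {
        $byte = ord($data[$i]);
        if ($byte < 0x80) {
            // ASCII byte: copy it, escaping XML markup characters.
            $c = $data[$i];
            $out .= isset($specials[$c]) ? $specials[$c] : $c;
            $i++;
            continue;
        }
        // Lead byte: its high bits give the number of continuation bytes.
        if ($byte >= 0xF0) {
            $extra = 3;
            $cp = $byte & 0x07;
        } elseif ($byte >= 0xE0) {
            $extra = 2;
            $cp = $byte & 0x0F;
        } else {
            $extra = 1;
            $cp = $byte & 0x1F;
        }
        // Each continuation byte contributes its low 6 bits.
        for ($k = 1; $k <= $extra; $k++) {
            $cp = ($cp << 6) | (ord($data[$i + $k]) & 0x3F);
        }
        $out .= '&#' . $cp . ';';
        $i += 1 + $extra;
    }
    return $out;
}

echo utf8_to_xml_entities("caf\xC3\xA9 <\xE3\x81\x82>"), "\n"; // caf&#233; &lt;&#12354;&gt;
```

The per-character decoding loop is the "slower" part mentioned in the reply; for ISO-8859-1 internal encoding the existing single-byte path could stay as it is.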