Brief analysis:

- the lib tries to encode all chars outside the ASCII range as XML
character entities when serializing

- this has the main benefit that the resulting XML is valid regardless of the
charset assumed by the parser, i.e. we do not need to add a 'charset' parameter
to the HTTP Content-Type header or an 'encoding' declaration to the XML prologue

- it is also the best solution I could come up with for the long-standing
problems with charset encodings (I also tried the other way round, i.e.
explicitly stating the charset used for the XML, in a private fork of the lib I
use for personal projects, but I would rather stick with the current approach,
as it solves the problem in a more elegant way)

- unfortunately, as I work with non-mbstring enabled installs by default, I 
assumed that internal string representation was iso-8859-1, and coded the 
xmlrpc_encode_entitites function accordingly

- I am now looking at the PHP man page for utf8_decode, and there are a few
examples of correct utf8-to-xmlentities functions that might be of use

- basically, I see two options to extend the lib to solve your problem:
  + extend the xmlrpc_encode_entitites function to take into account the
xmlrpc_internalencoding global var, and use 2 different parsing algorithms
(better solution but slower)
  + add a 'workaround' solution: a class var of server/client objects that
prevents the escaping of non-ASCII chars from taking place.
  + note that both things could actually be combined...
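
A rough sketch of how the two options could be combined. This is not the actual patch: the function name encode_entities_sketch and its parameters are made up for illustration, and it assumes that only ISO-8859-1 and UTF-8 need to be handled. It avoids mbstring on purpose and decodes UTF-8 by hand.

```php
<?php
// Hypothetical sketch, NOT the library's code. Name and parameters are assumptions.
// $internalEncoding plays the role of the xmlrpc_internalencoding global;
// $escapeNonAscii plays the role of the proposed 'workaround' switch.
function encode_entities_sketch($data, $internalEncoding = 'ISO-8859-1', $escapeNonAscii = true)
{
    // The XML markup characters are always escaped ('&' first).
    $data = str_replace(
        array('&', '<', '>', '"', "'"),
        array('&amp;', '&lt;', '&gt;', '&quot;', '&apos;'),
        $data
    );
    if (!$escapeNonAscii) {
        // Workaround option: pass non-ASCII bytes through untouched.
        // The charset must then be declared to the receiving parser.
        return $data;
    }

    $isUtf8 = (strtoupper($internalEncoding) === 'UTF-8');
    $len = strlen($data);
    $out = '';
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($data[$i]);
        if ($byte < 0x80) {
            $out .= $data[$i];          // plain ASCII
            continue;
        }
        if (!$isUtf8) {
            $out .= '&#' . $byte . ';'; // ISO-8859-1: byte value == code point
            continue;
        }
        // UTF-8: the lead byte gives the sequence length.
        if (($byte & 0xE0) === 0xC0) {
            $extra = 1; $code = $byte & 0x1F;
        } elseif (($byte & 0xF0) === 0xE0) {
            $extra = 2; $code = $byte & 0x0F;
        } elseif (($byte & 0xF8) === 0xF0) {
            $extra = 3; $code = $byte & 0x07;
        } else {
            $out .= '?';                // stray continuation or invalid lead byte
            continue;
        }
        if ($i + $extra >= $len) {
            $out .= '?';                // truncated sequence at end of string
            break;
        }
        $valid = true;
        for ($j = 1; $j <= $extra; $j++) {
            $next = ord($data[$i + $j]);
            if (($next & 0xC0) !== 0x80) {
                $valid = false;         // not a continuation byte
                break;
            }
            $code = ($code << 6) | ($next & 0x3F);
        }
        if (!$valid) {
            $out .= '?';
            continue;
        }
        $i += $extra;                   // skip the continuation bytes just consumed
        // Note: overlong forms and surrogates are not rejected in this sketch.
        $out .= '&#' . $code . ';';     // one entity per character, not per byte
    }
    return $out;
}
```

With this, the UTF-8 bytes E6 97 A5 (a single Japanese character, U+65E5) would come out as the one entity &#26085; when the internal encoding is set to UTF-8, instead of being treated as three separate ISO-8859-1 chars.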

Would you be willing to test the patches?


> -----Original Message-----
> [mailto:[EMAIL PROTECTED] Behalf Of a.h.s. boy
> (lists)
> Sent: Tuesday, November 15, 2005 12:17 AM
> To:
> Subject: [phpxmlrpc] xmlrpc_encode_entitites causing parse error
> I'm using the XML-RPC library to retrieve calendar listing records
> from a calendar website. Both the client and the server are using the
> latest XML-RPC library.
> Both client and server are using UTF-8 encoding all around, and I've  
> adjusted $xmlrpc_internalencoding.
> Some of the calendar entries are in Japanese, input with UTF-8  
> encoding, and displayed on the site with UTF-8 encoding. (See http:// 
> If I make an XMLRPC request to retrieve some Japanese entries, the
> library chokes and returns an "Invalid token" error. After what seems
> like 90 hours of debugging (checking the strings and arrays at
> various stages of encoding and parsing), I tracked the problem down
> to the default case of xmlrpc_encode_entitites()
> default:
>     if ($code < 32 || $code > 159)
>        $character = ("&#".strval($code).";");
> If I simply comment out that code, leaving a blank default case, the  
> XML is now valid and parses (and displays) exactly as expected. I  
> have NOT debugged the code to the extent where I can tell exactly
> what character's entity reference might be the exact cause of the
> problem. It's all complicated by the fact that I don't read
> Japanese, so debugging is that much harder.
> Any idea why the entity conversion is causing the XML to become  
> invalid? Is it feasible to leave off the
> There's an example page at 
> index.php with debugging turned on, but it'll only be valid for today
> (11/14/05 -0500), after which time the Japanese entry will no longer
> be part of the results. But I'd be happy to reproduce the problem
> upon request.
> Cheers,
> spud.
> -------------------------------------------------------------------
> a.h.s. boy
> spud(at)            "as yes is to if,love is to yes"
> -------------------------------------------------------------------
> _______________________________________________
> phpxmlrpc mailing list