Re: [phpxmlrpc] xmlrpc_encode_entitites causing parse error
I grabbed a copy from CVS, but I'm in the middle of a few days of hardcore iCalendar coding, so I'm focusing on that. I'll run some tests and offer comments as soon as I have the chance.

Thanks for the quick work!

Cheers,
spud.

On Nov 16, 2005, at 11:33 AM, Gaetano Giunta wrote:

> OK, code checked into CVS. Feel free to download and test it (I added a new test case for UTF-8 in the testsuite, but the more testing the better).
>
> I adopted the 'convert all to ASCII' way-of-life, and modified the function xmlrpc_encode_entities() to respect the value of $GLOBALS['xmlrpc_internalencoding'].
>
> As stated in my last post, more flexible usage patterns might make it into future releases. Right now escaping iso-8859-1 might be faster than it was previously, since I use str_replace instead of the hand-made algorithm, but escaping UTF8 will be dog slow. The lib is not built for speed anyway; if you're aiming for that, the php xmlrpc extension will surely serve you better.
>
> The main problems I see with that right now are:
> - turning xmlrpc_encode_entities() into a general charset transcoder might make it slower for the default case operation, unless the user has mbstring ON
> - how server and msg objs will communicate to xmlrpcval objs the desired charset for serialization (only solution I can think of: add an extra param in calls to serialize())
> - xmlrpc_encode_entities() is used when serializing server-added debug info. Since that info might come at the same time from user messages, client request (at debug lvl 3) and php error messages, there is a serious risk it will be a charset pot-pourri, i.e. there is no sure way that it will conform to ANY charset. I wonder if using a CDATA section instead of a comment to wrap debug info might help in solving this problem. The second solution is to just base64-encode the debug info, and let the client sort it out. Of course that would break any existing client that makes use of that undocumented info...
>
> Bye
> Gaetano
>
> > -----Original Message-----
> > From: a.h.s. boy (lists) [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, November 15, 2005 6:57 PM
> > To: Gaetano Giunta
> > Cc: phpxmlrpc@lists.usefulinc.com
> > Subject: Re: [phpxmlrpc] xmlrpc_encode_entitites causing parse error
> >
> > On Nov 15, 2005, at 11:31 AM, Gaetano Giunta wrote:
> >
> > > Very toughtful response.
> >
> > Man, I love cross-linguistic typos...makes great new English words: "toughtful" = "tough thoughtfulness". Brilliant.
> >
> > > UTF-8 everywhere is fine and dandy but for 2 aspects:
> > >
> > > - in fact XML-over-http without a charset declaration SHOULD be assumed to be ISO-8859-1 (there is a RFC somewhere about that, which I cannot recall now).
> >
> > Hmmm. The XML 1.0 spec (http://www.w3.org/TR/2000/REC-xml-20001006) reads:
> >
> >    Because each XML entity not accompanied by external encoding information and not in UTF-8 or UTF-16 encoding MUST begin with an XML encoding declaration, in which the first characters must be '<?xml', any conforming processor can detect, after two to four octets of input, which of the following cases apply.
> >
> > RFC 2376, however, offers suggestions for XML MIME-types sent over HTTP, but it reads (pardon the length):
> >
> >    Although listed as an optional parameter, the use of the charset parameter is STRONGLY RECOMMENDED, since this information can be used by XML processors to determine authoritatively the character encoding of the XML entity. The charset parameter can also be used to provide protocol-specific operations, such as charset-based content negotiation in HTTP. "UTF-8" [RFC-2279] is the recommended value, representing the UTF-8 charset. UTF-8 is supported by all conforming XML processors [REC-XML].
> >
> >    If the XML entity is transmitted via HTTP, which uses a MIME-like mechanism that is exempt from the restrictions on the text top-level type (see section 19.4.1 of HTTP 1.1 [RFC-2068]), "UTF-16" (Appendix C.3 of [UNICODE] and Amendment 1 of [ISO-10646]) is also recommended. UTF-16 is supported by all conforming XML processors [REC-XML]. Since the handling of CR, LF and NUL for text types in most MIME applications would cause undesired transformations of individual octets in UTF-16 multi-octet characters, gateways from HTTP to these MIME applications MUST transform the XML entity from a text/xml; charset="utf-16" to application/xml; charset="utf-16".
> >
> >    Conformant with [RFC-2046], if a text/xml entity is received with the charset parameter omitted, MIME processors and XML processors MUST use the default charset value of "us-ascii". In cases where the XML entity is transmitted via HTTP, the default charset value is still "us-ascii".
> >
> > ...which implies that us-ascii, not iso-8859-1, is the default (but not really a problem if you're encoding everything outside of ASCII).
> >
> > But I know that my RDFParser class, for example, defaults to "utf-8" and overrides that only if the encoding is specified as something else in the xml declaration. I assume I made that decision for good reasons, though I don't remember them now!
> >
> > Still, the number of factors affecting encoding and transmission are unbelievably complex. In my software, for example, there is:
> >
> > 1) Page encoding used when users submit data via a form (mine: UTF-8)
> >    a) Default charset header sent by Apache (mine: UTF-8)
> >    b) Default charset set in META tags (mine: UTF-8)
> >    c) Charset setting of client browser (no control!)
> > 2) Encoding of database (mine: MySQL 3.x, so limited to ISO-8859-1)
> > 3) Encoding of page used to display data (Irrelevant to XML-RPC transfers, but 1a,1b,1c apply)
> > 4) PHP
RE: [phpxmlrpc] xmlrpc_encode_entitites causing parse error
OK, code checked into CVS. Feel free to download and test it (I added a new test case for UTF-8 in the testsuite, but the more testing the better).

I adopted the 'convert all to ASCII' way-of-life, and modified the function xmlrpc_encode_entities() to respect the value of $GLOBALS['xmlrpc_internalencoding'].

As stated in my last post, more flexible usage patterns might make it into future releases. Right now escaping iso-8859-1 might be faster than it was previously, since I use str_replace instead of the hand-made algorithm, but escaping UTF8 will be dog slow. The lib is not built for speed anyway; if you're aiming for that, the php xmlrpc extension will surely serve you better.

The main problems I see with that right now are:
- turning xmlrpc_encode_entities() into a general charset transcoder might make it slower for the default case operation, unless the user has mbstring ON
- how server and msg objs will communicate to xmlrpcval objs the desired charset for serialization (only solution I can think of: add an extra param in calls to serialize())
- xmlrpc_encode_entities() is used when serializing server-added debug info. Since that info might come at the same time from user messages, client request (at debug lvl 3) and php error messages, there is a serious risk it will be a charset pot-pourri, i.e. there is no sure way that it will conform to ANY charset. I wonder if using a CDATA section instead of a comment to wrap debug info might help in solving this problem. The second solution is to just base64-encode the debug info, and let the client sort it out. Of course that would break any existing client that makes use of that undocumented info...

Bye
Gaetano

> -----Original Message-----
> From: a.h.s. boy (lists) [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, November 15, 2005 6:57 PM
> To: Gaetano Giunta
> Cc: phpxmlrpc@lists.usefulinc.com
> Subject: Re: [phpxmlrpc] xmlrpc_encode_entitites causing parse error
>
> On Nov 15, 2005, at 11:31 AM, Gaetano Giunta wrote:
>
> > Very toughtful response.
>
> Man, I love cross-linguistic typos...makes great new English words: "toughtful" = "tough thoughtfulness". Brilliant.
>
> > UTF-8 everywhere is fine and dandy but for 2 aspects:
> >
> > - in fact XML-over-http without a charset declaration SHOULD be assumed to be ISO-8859-1 (there is a RFC somewhere about that, which I cannot recall now).
>
> Hmmm. The XML 1.0 spec (http://www.w3.org/TR/2000/REC-xml-20001006) reads:
>
>    Because each XML entity not accompanied by external encoding information and not in UTF-8 or UTF-16 encoding MUST begin with an XML encoding declaration, in which the first characters must be '<?xml', any conforming processor can detect, after two to four octets of input, which of the following cases apply.
>
> RFC 2376, however, offers suggestions for XML MIME-types sent over HTTP, but it reads (pardon the length):
>
>    Although listed as an optional parameter, the use of the charset parameter is STRONGLY RECOMMENDED, since this information can be used by XML processors to determine authoritatively the character encoding of the XML entity. The charset parameter can also be used to provide protocol-specific operations, such as charset-based content negotiation in HTTP. "UTF-8" [RFC-2279] is the recommended value, representing the UTF-8 charset. UTF-8 is supported by all conforming XML processors [REC-XML].
>
>    If the XML entity is transmitted via HTTP, which uses a MIME-like mechanism that is exempt from the restrictions on the text top-level type (see section 19.4.1 of HTTP 1.1 [RFC-2068]), "UTF-16" (Appendix C.3 of [UNICODE] and Amendment 1 of [ISO-10646]) is also recommended. UTF-16 is supported by all conforming XML processors [REC-XML]. Since the handling of CR, LF and NUL for text types in most MIME applications would cause undesired transformations of individual octets in UTF-16 multi-octet characters, gateways from HTTP to these MIME applications MUST transform the XML entity from a text/xml; charset="utf-16" to application/xml; charset="utf-16".
>
>    Conformant with [RFC-2046], if a text/xml entity is received with the charset parameter omitted, MIME processors and XML processors MUST use the default charset value of "us-ascii". In cases where the XML entity is transmitted via HTTP, the default charset value is still "us-ascii".
>
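The last paragraph of the RFC text quoted above amounts to a simple rule for text/xml: use the charset parameter if present, otherwise assume us-ascii. A minimal sketch of that rule follows; `charset_from_content_type()` is a hypothetical helper written for this illustration, not a function of the XML-RPC library, and it only covers the text/xml case discussed in the quote.

```php
<?php
// Hypothetical helper (not part of the library): pick the charset of a
// text/xml body from the value of the HTTP Content-Type header.
// When the charset parameter is omitted, fall back to "us-ascii",
// as the RFC text quoted above requires for text/xml.
function charset_from_content_type(string $contentType): string
{
    // Accept both charset=utf-8 and charset="utf-8", case-insensitively.
    if (preg_match('/;\s*charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $contentType, $m)) {
        return strtolower($m[1]);
    }
    return 'us-ascii';
}

echo charset_from_content_type('text/xml; charset="utf-16"'), "\n"; // utf-16
echo charset_from_content_type('text/xml'), "\n";                   // us-ascii
```

Note that this looks only at the HTTP header; what to do with an encoding named in the XML declaration itself is a separate decision that this sketch does not address.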
RE: [phpxmlrpc] xmlrpc_encode_entitites causing parse error
Darn, just when I thought I had reached charset-encoding guru state, I discover I was mostly wrong. I really love to be a coder...

> ...
> On Nov 15, 2005, at 11:31 AM, Gaetano Giunta wrote:
>
> > Very toughtful response.
>
> Man, I love cross-linguistic typos...makes great new English words: "toughtful" = "tough thoughtfulness". Brilliant.

I can do a lot better if you wish, mixing up italian, french, english and php typos all in the same sentence ;)

> > UTF-8 everywhere is fine and dandy but for 2 aspects:
> >
> > - in fact XML-over-http without a charset declaration SHOULD be assumed to be ISO-8859-1 (there is a RFC somewhere about that, which I cannot recall now).
>
> Hmmm. The XML 1.0 spec (http://www.w3.org/TR/2000/REC-xml-20001006) reads:
>
> ...
>
> RFC 2376, however, offers suggestions for XML MIME-types sent over HTTP, but it reads (pardon the length):
>
> ...

OK, I'll admit I blew this one. I cannot figure out which RFC I (mis)read that convinced me that latin-1 was the way to go for text/xml over http, but RFC 3023 is definitely THE reference on this subject. And it states that
- a charset-encoding SHOULD be put in the http headers for interop's sake
- when that is unavailable, xml MUST be treated as US-ASCII (regardless of the xml prologue...)

> ...
> But I know that my RDFParser class, for example, defaults to "utf-8" and overrides that only if the encoding is specified as something else in the xml declaration. I assume I made that decision for good reasons, though I don't remember them now!

Most likely having bad sources of xml that send utf-8 stuff without declaring it explicitly. Very annoying, but quite common, at least a little while ago.

> Still, the number of factors affecting encoding and transmission are unbelievably complex.
> ...
> and...ugh! Sometimes I just want to kill myself.

Yup, I only had the chance to prove myself with an arabic website once. It was great fun, and a source of a lot of learning, but it never went online (and the translator refused to translate single phrases as I had specced, to be put in the translation engine db, but insisted on giving me back the 5 page translation document without hinting at any separation of paragraphs...)

> While I suppose that attempting to convert all data into us-ascii through entity encoding gives us the "least common denominator" solution -- make everything 7-bit! -- it obviously isn't working perfectly.

This is btw a 'road accident', not a by-design feature, and the previous situation was wrong anyway. The general solution (i.e. let the lib encode any internal charset to ascii) is a bit daunting to code in php, but adding the 80% case (i.e. utf8 to ascii) is, I think, quite easy. AND we are following the spec.

> So perhaps any solution that simply makes it work, regardless of whether or not it changes the use of $xmlrpc_internalencoding, would be good. I did wonder about the utf8_encode() function, and why you didn't simply use that instead of $character = ("&#".strval($code).";"); Won't that do all the right work for you?

Yes, provided that we added UTF-8 in the http headers. No, in the current situation.

> In any case, I think you should try to make the XMLRPC library follow as closely as possible the relevant spec/RFC "recommended" behavior, and let that be your guide.
> ...

What I am currently thinking about is something along these lines:

1 - add support for xmlrpc_internalencoding in xmlrpc_encode_entities(), ONLY for utf-8 to ascii, ascii-to-ascii and iso-8859-1 to ascii

2 - add support for specific charset encodings into xmlrpcmsg. If left unspecified, it defaults to us-ascii, as per the current behaviour. When specified, it will modify the http content-type header, and potentially save a lot of time by NOT encoding special chars into xml entities

3 - figure out whether the choice of response charset encoding should be left to the response object or to the server. Hint: the server can make intelligent decisions based on the client's http headers (Accept-Charset).

Bye
Gaetano
___
phpxmlrpc mailing list
phpxmlrpc@lists.usefulinc.com
http://lists.usefulinc.com/cgi-bin/mailman/listinfo/phpxmlrpc
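The "80% case" mentioned above (utf-8 to ascii) can be sketched without mbstring by decoding each UTF-8 sequence to its code point and emitting a numeric character reference. This is only an illustration under stated assumptions: `utf8_to_ascii_entities()` is a hypothetical name, not the function that was checked into CVS, and the sketch assumes well-formed UTF-8 input.

```php
<?php
// Hypothetical helper (not the library's code): convert a UTF-8 string to
// pure ASCII by replacing every multi-byte sequence with a numeric
// character reference. Assumes well-formed UTF-8; malformed input is
// not detected.
function utf8_to_ascii_entities(string $data): string
{
    $out = '';
    $len = strlen($data);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($data[$i]);
        if ($b < 0x80) {
            // Plain ASCII: escape only the XML-significant characters.
            switch ($data[$i]) {
                case '&': $out .= '&amp;'; break;
                case '<': $out .= '&lt;'; break;
                case '>': $out .= '&gt;'; break;
                case '"': $out .= '&quot;'; break;
                case "'": $out .= '&apos;'; break;
                default:  $out .= $data[$i];
            }
            continue;
        }
        // The lead byte tells how many continuation bytes follow.
        if (($b & 0xE0) === 0xC0) {
            $cp = $b & 0x1F; $extra = 1;
        } elseif (($b & 0xF0) === 0xE0) {
            $cp = $b & 0x0F; $extra = 2;
        } else {
            $cp = $b & 0x07; $extra = 3;
        }
        // Each continuation byte contributes its low 6 bits.
        for ($j = 0; $j < $extra && $i + 1 < $len; $j++) {
            $i++;
            $cp = ($cp << 6) | (ord($data[$i]) & 0x3F);
        }
        $out .= '&#' . $cp . ';';
    }
    return $out;
}

// "café <日>" given as raw UTF-8 bytes.
echo utf8_to_ascii_entities("caf\xC3\xA9 <\xE6\x97\xA5>"), "\n";
// prints: caf&#233; &lt;&#26085;&gt;
```

The output is 7-bit clean, so it stays valid whichever charset the receiving parser assumes, which is the property the 'convert all to ASCII' approach is after.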
Re: [phpxmlrpc] xmlrpc_encode_entitites causing parse error
On Nov 15, 2005, at 11:31 AM, Gaetano Giunta wrote:

> Very toughtful response.

Man, I love cross-linguistic typos...makes great new English words: "toughtful" = "tough thoughtfulness". Brilliant.

> UTF-8 everywhere is fine and dandy but for 2 aspects:
>
> - in fact XML-over-http without a charset declaration SHOULD be assumed to be ISO-8859-1 (there is a RFC somewhere about that, which I cannot recall now).

Hmmm. The XML 1.0 spec (http://www.w3.org/TR/2000/REC-xml-20001006) reads:

   Because each XML entity not accompanied by external encoding information and not in UTF-8 or UTF-16 encoding MUST begin with an XML encoding declaration, in which the first characters must be '<?xml', any conforming processor can detect, after two to four octets of input, which of the following cases apply.

RFC 2376, however, offers suggestions for XML MIME-types sent over HTTP, but it reads (pardon the length):

   Although listed as an optional parameter, the use of the charset parameter is STRONGLY RECOMMENDED, since this information can be used by XML processors to determine authoritatively the character encoding of the XML entity. The charset parameter can also be used to provide protocol-specific operations, such as charset-based content negotiation in HTTP. "UTF-8" [RFC-2279] is the recommended value, representing the UTF-8 charset. UTF-8 is supported by all conforming XML processors [REC-XML].

   If the XML entity is transmitted via HTTP, which uses a MIME-like mechanism that is exempt from the restrictions on the text top-level type (see section 19.4.1 of HTTP 1.1 [RFC-2068]), "UTF-16" (Appendix C.3 of [UNICODE] and Amendment 1 of [ISO-10646]) is also recommended. UTF-16 is supported by all conforming XML processors [REC-XML]. Since the handling of CR, LF and NUL for text types in most MIME applications would cause undesired transformations of individual octets in UTF-16 multi-octet characters, gateways from HTTP to these MIME applications MUST transform the XML entity from a text/xml; charset="utf-16" to application/xml; charset="utf-16".

   Conformant with [RFC-2046], if a text/xml entity is received with the charset parameter omitted, MIME processors and XML processors MUST use the default charset value of "us-ascii". In cases where the XML entity is transmitted via HTTP, the default charset value is still "us-ascii".

...which implies that us-ascii, not iso-8859-1, is the default (but not really a problem if you're encoding everything outside of ASCII).

But I know that my RDFParser class, for example, defaults to "utf-8" and overrides that only if the encoding is specified as something else in the xml declaration. I assume I made that decision for good reasons, though I don't remember them now!

Still, the number of factors affecting encoding and transmission are unbelievably complex. In my software, for example, there is:

1) Page encoding used when users submit data via a form (mine: UTF-8)
   a) Default charset header sent by Apache (mine: UTF-8)
   b) Default charset set in META tags (mine: UTF-8)
   c) Charset setting of client browser (no control!)
2) Encoding of database (mine: MySQL 3.x, so limited to ISO-8859-1)
3) Encoding of page used to display data (Irrelevant to XML-RPC transfers, but 1a,1b,1c apply)
4) PHP internal encoding
5) XMLRPC library internal encoding
6) XML declaration charset (optional, but highly recommended by spec)
7) text/xml MIME type charset declaration (optional, mine: text/xml;charset=utf-8)
8) application/xml MIME type charset declaration (optional)

...and since all of them could be set to different encodings, getting it all straight is a dizzying adventure. Add to that the complexity of handling things like users copying text from a Word document created in Windows-1252 and pasting into a form on a UTF-8 page, and...ugh! Sometimes I just want to kill myself.

While I suppose that attempting to convert all data into us-ascii through entity encoding gives us the "least common denominator" solution -- make everything 7-bit! -- it obviously isn't working perfectly. So perhaps any solution that simply makes it work, regardless of whether or not it changes the use of $xmlrpc_internalencoding, would be good. I did wonder about the utf8_encode() function, and why you didn't simply use that instead of

$character = ("&#".strval($code).";");

Won't that do all the right work for you?

In any case, I think you should try to make the XMLRPC library follow as closely as possible the relevant spec/RFC "recommended" behavior, and let that be your guide.

> Adding some extra settings to client/server objects is fine, but the casual user might not be used to using those, and backward compatibility is a primary concern to me. Translated into code that would probably mean adding some hacky
RE: [phpxmlrpc] xmlrpc_encode_entitites causing parse error
Very toughtful response.

UTF-8 everywhere is fine and dandy but for 2 aspects:

- in fact XML-over-http without a charset declaration SHOULD be assumed to be ISO-8859-1 (there is a RFC somewhere about that, which I cannot recall now). The xmlrpc lib got it wrong the first time around, but I never dared to change the global var to a more 'correct' default, as the only benefit I imagine would have been breaking a lot of people's scripts. This basically contradicts the argument that 'UTF-8' is universal: xmlrpc clients written in other languages might (correctly) make the assumption that the received xml charset is iso-8859-1 when unspecified, and dutifully choke on utf-8 characters.

- unless mbstring is enabled, all PHP processing is carried out in ISO-8859-1 (of course, this does not apply to data fetched from your DB directly in UTF-8 encoding)

Having said that, there is no guarantee that strings that the user gets out of his db are in fact utf-8, and sending some weird japanese charset using an utf-8 declaration is most likely wrong.

Adding some extra settings to client/server objects is fine, but the casual user might not be used to using those, and backward compatibility is a primary concern to me. Translated into code that would probably mean adding some hacky stuff of the sort "object default charset preference is undefined, and while still undefined use global variable, otherwise use object preference" (doable but ugly).

The tough part is letting the client object communicate the desired charset encoding to the xmlrpcval object, since the responsibility of creating serialized content is left to the xmlrpcval object itself (and I'm surely not changing that fundamental assumption).

I think I need a couple of days to sort out a good solution...

Bye
Gaetano

ps: the real (only ?) advantage of using variables instead of constants for things such as internal_encoding is that you can redefine them not inside the xmlrpc lib but just after its inclusion, eg. this way you do not have to change anything when updating...

> -----Original Message-----
> From: a.h.s. boy (lists) [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, November 15, 2005 4:34 PM
> To: Gaetano Giunta
> Cc: phpxmlrpc@lists.usefulinc.com
> Subject: Re: [phpxmlrpc] xmlrpc_encode_entitites causing parse error
>
> On Nov 15, 2005, at 4:11 AM, Gaetano Giunta wrote:
>
> > Brief analysis:
> >
> > - the lib tries to encode all chars outside of the ASCII range as 'XML character entity' when serializing
>
> I understand the theory, but one of the benefits to using UTF-8 in the first place is its ability to properly render all sorts of languages and character sets. Debugging becomes brutal when you're staring at a huge string of HTML entities.
>
> > - this has the main benefit that such an xml is valid regardless of the charset assumed by the parser, i.e. we do not need to add a 'charset' parameter to either the HTTP Content-type header or the XML prologue
>
> Well...apparently it isn't valid XML despite the lack of charset...or we wouldn't be having this discussion! ;-)
>
> > - it is also the best solution I could come up with to solve the long-standing problems with charset encodings (I also tried the other way round, e.g. explicitly stating the charset used for xml, in a private fork of the lib I use for personal projects, but I would rather stick with the current approach, as it solves the problem in a more elegant way)
>
> Believe me, I totally understand the issue of long-standing charset encoding problems! I've been developing a CMS that needs to handle multiple languages, alphabets, directionality, and XML-RPC/RSS feeds all on the same page! Not easy, especially if your own linguistic range is limited to English and Romance languages!
>
> But I'm also a fan of proper declarations...and I'd rather have an XML feed explicitly declare its charset encoding (and work) than try to be "universal" and fail. :-)
>
> I'll admit to not being fully familiar with all the XMLRPC library code -- only enough to debug a bit -- but it appears that $xmlrpc_internalencoding is declared as a global variable, though it is only used in object methods. Could it be changed to be a property of the xmlrpcmsg and xmlrpc_server classes? That way it could be set through scripting with
>
> $xmlrpcmsg->set_internalencoding($foo);
>
> or something similar? That would be more flexible, and since you _always_ know what the encoding is, you can send it in the XML prologue, which is what that parameter is desig
Re: [phpxmlrpc] xmlrpc_encode_entitites causing parse error
On Nov 15, 2005, at 4:11 AM, Gaetano Giunta wrote:

> Brief analysis:
>
> - the lib tries to encode all chars outside of the ASCII range as 'XML character entity' when serializing

I understand the theory, but one of the benefits to using UTF-8 in the first place is its ability to properly render all sorts of languages and character sets. Debugging becomes brutal when you're staring at a huge string of HTML entities.

> - this has the main benefit that such an xml is valid regardless of the charset assumed by the parser, i.e. we do not need to add a 'charset' parameter to either the HTTP Content-type header or the XML prologue

Well...apparently it isn't valid XML despite the lack of charset...or we wouldn't be having this discussion! ;-)

> - it is also the best solution I could come up with to solve the long-standing problems with charset encodings (I also tried the other way round, e.g. explicitly stating the charset used for xml, in a private fork of the lib I use for personal projects, but I would rather stick with the current approach, as it solves the problem in a more elegant way)

Believe me, I totally understand the issue of long-standing charset encoding problems! I've been developing a CMS that needs to handle multiple languages, alphabets, directionality, and XML-RPC/RSS feeds all on the same page! Not easy, especially if your own linguistic range is limited to English and Romance languages!

But I'm also a fan of proper declarations...and I'd rather have an XML feed explicitly declare its charset encoding (and work) than try to be "universal" and fail. :-)

I'll admit to not being fully familiar with all the XMLRPC library code -- only enough to debug a bit -- but it appears that $xmlrpc_internalencoding is declared as a global variable, though it is only used in object methods. Could it be changed to be a property of the xmlrpcmsg and xmlrpc_server classes? That way it could be set through scripting with

$xmlrpcmsg->set_internalencoding($foo);

or something similar? That would be more flexible, and since you _always_ know what the encoding is, you can send it in the XML prologue, which is what that parameter is designed for anyway.

> - basically, I see two options to extend the lib to make up for your problem:
>   + extend the xmlrpc_encode_entitites function to take into account the xmlrpc_internalencoding global var, and use 2 different parsing algorithms (better solution but slower)

Well...UTF-8 should only require converting "&", "<", and '"' explicitly, and the rest is assumed to be valid. So the only fork you'd need in the code is to convert additional entities for non-UTF-8 encodings. Shouldn't slow anything down...in fact, it would make UTF-8 faster, since it would skip additional processing.

In fact, I may be mistaken, but it seems like older versions of the library didn't even do the entity translation...at least, in the course of my own development, I know I included some entity conversion routines to process the data _before_ I sent it to the XMLRPC library (but it may have been redundant on my part). Though I admit I do like the idea that I can pass _anything_ to the XMLRPC library and have it properly encoded for me!

> Would you be willing to test the patches?

Absolutely...but I do think you should give some serious thought to making the internal encoding variable more scriptable so no one ever needs to hard-code changes in the script file. I hate having to remember to change the variable value whenever I upgrade the library...

Cheers,
spud.

---
a.h.s. boy
spud(at)nothingness.org          "as yes is to if,love is to yes"
http://www.nothingness.org/
---
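The set_internalencoding() suggestion above could coexist with the existing global variable if the object preference simply falls back to the global while unset. A minimal sketch of that fallback rule follows; the class `EncodingAwareMessage` and the method `effectiveEncoding()` are hypothetical names invented for this illustration, and only `set_internalencoding()` and `$xmlrpc_internalencoding` come from the discussion.

```php
<?php
// Stand-in for the library's global default discussed in this thread.
$GLOBALS['xmlrpc_internalencoding'] = 'ISO-8859-1';

// Hypothetical class, not part of the library: it shows the fallback rule only.
class EncodingAwareMessage
{
    /** @var string|null null means "no per-object preference set" */
    private $internalEncoding = null;

    public function set_internalencoding(string $encoding): void
    {
        $this->internalEncoding = $encoding;
    }

    public function effectiveEncoding(): string
    {
        // Object preference wins; otherwise fall back to the global variable.
        return $this->internalEncoding ?? $GLOBALS['xmlrpc_internalencoding'];
    }
}

// Existing scripts that only touch the global keep working...
$legacy = new EncodingAwareMessage();
echo $legacy->effectiveEncoding(), "\n"; // ISO-8859-1

// ...while new code can override the encoding per object.
$msg = new EncodingAwareMessage();
$msg->set_internalencoding('UTF-8');
echo $msg->effectiveEncoding(), "\n";    // UTF-8
```

Because the global is read at call time rather than copied at construction, a script that sets the global right after including the library still affects every object that has no preference of its own.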
RE: [phpxmlrpc] xmlrpc_encode_entitites causing parse error
Brief analysis:

- the lib tries to encode all chars outside of the ASCII range as 'XML character entity' when serializing

- this has the main benefit that such an xml is valid regardless of the charset assumed by the parser, i.e. we do not need to add a 'charset' parameter to either the HTTP Content-type header or the XML prologue

- it is also the best solution I could come up with to solve the long-standing problems with charset encodings (I also tried the other way round, e.g. explicitly stating the charset used for xml, in a private fork of the lib I use for personal projects, but I would rather stick with the current approach, as it solves the problem in a more elegant way)

- unfortunately, as I work with non-mbstring enabled installs by default, I assumed that internal string representation was iso-8859-1, and coded the xmlrpc_encode_entitites function accordingly

- I am now looking at the PHP man page for utf8_decode, and there are a few examples of correct utf8-to-xmlentities functions that might be of use

- basically, I see two options to extend the lib to make up for your problem:
  + extend the xmlrpc_encode_entitites function to take into account the xmlrpc_internalencoding global var, and use 2 different parsing algorithms (better solution but slower)
  + add a 'workaround' solution: a class var of server/client objects that will prevent the escaping of non-ascii chars from taking place.
  + note that both things could actually be combined...

Would you be willing to test the patches?

Bye
Gaetano

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of a.h.s. boy (lists)
> Sent: Tuesday, November 15, 2005 12:17 AM
> To: phpxmlrpc@lists.usefulinc.com
> Subject: [phpxmlrpc] xmlrpc_encode_entitites causing parse error
>
> I'm using the XML-RPC library to retrieve calendar listing records from a calendar website. Both the client and the server are using the latest XML-RPC library.
>
> Both client and server are using UTF-8 encoding all around, and I've adjusted $xmlrpc_internalencoding.
>
> Some of the calendar entries are in Japanese, input with UTF-8 encoding, and displayed on the site with UTF-8 encoding. (See http://www.radicalendar.org/calendar/index.php?view=month&group=imcjapan).
>
> If I make an XMLRPC request to retrieve some Japanese entries, the library chokes and returns an "Invalid token" error. After what seems like 90 hours of debugging (checking the strings and arrays at various stages of encoding and parsing), I tracked the problem down to the default case of xmlrpc_encode_entitites()
>
> default:
>     if ($code < 32 || $code > 159)
>         $character = ("&#".strval($code).";");
>
> If I simply comment out that code, leaving a blank default case, the XML is now valid and parses (and displays) exactly as expected. I have NOT debugged the code to the extent where I can tell exactly what character's entity reference might be the exact cause of the problem...it's all complicated by the fact that I don't read Japanese, so debugging is that much harder.
>
> Any idea why the entity conversion is causing the XML to become invalid? Is it feasible to leave off the
>
> There's an example page at http://dev.dadaimc.org/mod/calendar/index.php with debugging turned on, but it'll only be valid for today (11/14/05 -0500), after which time the Japanese entry will no longer be part of the results. But I'd be happy to reproduce the problem upon request.
>
> Cheers,
> spud.
>
> ---
> a.h.s. boy
> spud(at)nothingness.org          "as yes is to if,love is to yes"
> http://www.nothingness.org/
> ---
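One plausible reading of the report above can be sketched as follows, assuming the encoder walks the string one byte at a time as if it were ISO-8859-1. `encode_bytewise()` is a hypothetical stand-in written for this illustration, not the library's actual function; only the `$code < 32 || $code > 159` test is taken from the report.

```php
<?php
// Hypothetical stand-in: treats every byte as one ISO-8859-1 character and
// applies the range test quoted in the report above.
function encode_bytewise(string $data): string
{
    $out = '';
    for ($i = 0, $n = strlen($data); $i < $n; $i++) {
        $code = ord($data[$i]);
        if ($code < 32 || $code > 159) {
            $out .= '&#' . $code . ';';   // numeric character reference
        } else {
            $out .= $data[$i];            // copied through untouched
        }
    }
    return $out;
}

// U+65E5, a common Japanese character, is the byte sequence E6 97 A5 in UTF-8.
$utf8 = "\xE6\x97\xA5";
$encoded = encode_bytewise($utf8);

// 0xE6 (230) and 0xA5 (165) become references to unrelated Latin-1
// characters, while 0x97 (151) falls in the 128-159 gap and is emitted
// as a raw byte in the middle of the output.
var_dump($encoded === "&#230;\x97&#165;"); // bool(true)
```

The result is neither the original character nor clean ASCII: a lone 0x97 byte is not valid UTF-8, so a parser that treats the document as UTF-8 would be expected to reject it. Whether that is exactly what triggered the "Invalid token" error here is an assumption, since the thread does not identify the offending character.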