ID:               45996
Comment by:       mike at kogan dot org
Reported By:      phpbugs at colin dot guthr dot ie
Status:           Assigned
Bug Type:         XML related
Operating System: Mandriva Linux
PHP Version:      5.2.6
Assigned To:      rrichards
New Comment:

I have also run into this. We had some legacy PHP code using the xml_parser functions that ran fine on some CentOS 4 servers with PHP 4 and 5 under Apache 1.3. We've been debugging this failure for a day now on our new CentOS 5 server running PHP 5 and libxml2 2.7.2, and we can confirm the same problem. The character data handler is not called for the known (predefined) entities, so scripts that depend on it (RSS feed converters etc.) emit flawed HTML.

I agree there are much better ways to parse XML, but this is legacy stuff that's somewhat pervasive, and we didn't choose what these folks used for their apps. I'd love to rebuild their server with an older libxml2, but I'm not sure how to go backwards without causing some other problem. The customer has cPanel/WHM and all that hooey, and I'd rather not create a mess on their new server.

Hope y'all fix this soon, as it is an issue for cPanel users: cPanel has 2.7.2 in its stable branch for CentOS 5, and its updater is spreading it. If someone can confirm that a straight-up build and install of the old release code won't make things worse, I'll take a crack at moving their server back.


Previous Comments:
------------------------------------------------------------------------

[2008-10-08 09:50:16] phpbugs at colin dot guthr dot ie

Yes, I suspect that the comments left by ptn at post dot cz are incorrect when they say it is fixed in libxml. rrichards has given a very complete explanation of the problem, and it is more fundamental than a simple bug.

Compiling PHP with libexpat is the correct workaround for now.

------------------------------------------------------------------------

[2008-10-08 09:18:54] uraes at hot dot ee

Just tried libxml2-2.7.2 and 5.2.6-pl7-gentoo, and it is still broken.

Example PHP code:

<?php
// Sample feed. The description holds escaped markup built from predefined entities.
$data = "<?xml version = '1.0' encoding = 'UTF-8'?>
<rss version=\"2.0\" >
<channel>
<item>
<description>&lt;a href=&quot;http://www.google.com&quot;&gt;Google&lt;/a&gt;</description>
</item>
</channel>
</rss>
";

// Parse the feed into the $vals and $index arrays.
$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parse_into_struct($parser, $data, $vals, $index);
xml_parser_free($parser);

// Show the input next to what the parser produced.
echo "<pre>";
echo "<b>Original XML:</b><br>".htmlentities($data);
echo "<br><br><b>Parsed struct:</b><br>";
print_r($vals);
?>

.. parsed result is "a href=http://www.google.com>Google/a>"

------------------------------------------------------------------------

[2008-10-07 11:19:33] ptn at post dot cz

This bug seems to be fixed in libxml2-2.7.2:
http://svn.gnome.org/viewvc/libxml2?view=revision&revision=3798

------------------------------------------------------------------------

[2008-09-09 23:06:00] phpbugs at colin dot guthr dot ie

Comments by Daniel Veillard on the libxml ML:

The only thing I can think of is that libxml2 no longer asks through a SAX callback when looking up entity references if they are in the predefined set. This comes in essence from an old decision by the XML working group stating that user definitions for those 5 entities could not override the default predefined ones. So I guess that change is logical. Now what is done on top of SAX to result in that bug, I don't really know :-\

------------------------------------------------------------------------

[2008-09-06 15:43:29] [EMAIL PROTECTED]

Assigned to the maintainer (Rob, don't forget to change status too when you assign something to yourself :)

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view the rest of the comments, please view the bug report online at
http://bugs.php.net/45996
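
For anyone who wants to check whether a given server is affected, the following is a minimal sketch rather than anything taken from the report. It registers a character data handler, feeds it the same kind of escaped markup as the example above, and compares what the handler received with the expected text. The names collect_cdata and parsed_character_data are hypothetical helpers introduced here for illustration, and the script assumes the xml extension is loaded.

```php
<?php
// Hypothetical helper (not from the bug report): appends every chunk the
// parser hands to the character data handler to a global buffer.
function collect_cdata($parser, $data)
{
    $GLOBALS['collected_cdata'] .= $data;
}

// Hypothetical helper: parses $xml and returns the concatenation of all
// character data that the handler received.
function parsed_character_data($xml)
{
    $GLOBALS['collected_cdata'] = '';
    $parser = xml_parser_create('UTF-8');
    xml_set_character_data_handler($parser, 'collect_cdata');
    xml_parse($parser, $xml, true);
    xml_parser_free($parser);
    return $GLOBALS['collected_cdata'];
}

$xml = '<description>&lt;a href=&quot;http://www.google.com&quot;&gt;'
     . 'Google&lt;/a&gt;</description>';
$expected = '<a href="http://www.google.com">Google</a>';
$actual = parsed_character_data($xml);

// The libxml constant is only defined when the libxml extension is present.
if (defined('LIBXML_DOTTED_VERSION')) {
    echo "libxml2 version: " . LIBXML_DOTTED_VERSION . "\n";
}

if ($actual === $expected) {
    echo "OK: predefined entities reached the character data handler\n";
} else {
    echo "AFFECTED: handler received: " . $actual . "\n";
}
```

Run it from the command line with the same PHP binary the web server uses. Comparing the full string, rather than looking for a single character, also catches the partial loss shown in the parsed result above, where some of the entities survive and others are dropped.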