From: justin at jwd dot co dot uk
Operating system: Windows XP
PHP version: 5.0.2
PHP Bug Type: XML related
Bug description: PHP DOM functions output UTF-8 encoded regardless of input encoding

Description:
------------
When retrieving sections of text from an HTML page using the new DOM functions, the output is encoded as UTF-8 even though the input is correctly detected as ISO-8859-1. This means extra code is needed to convert the text back to the original charset of the input. Surely the DOM functions should either encode according to the detected input encoding or at least provide some mechanism for setting the output encoding? Or am I being stupid here?

Reproduce code:
---------------
<pre><?php
// Input document declares ISO-8859-1 in its meta tag
$xhtml = <<<HTML_END
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /></head>
<body><p class="test_paragraph">Test Paragraph</p></body>
HTML_END;

$in = new DomDocument();
$in->loadHTML($xhtml);
$xin = new DomXpath($in);

// Text as returned by the DOM functions
$text = $xin->query('//p[@class="test_paragraph"]/text()')->item(0)->nodeValue;
echo(htmlspecialchars($text)."\n"); // Outputs "Test Paragraph"

// Extra step: convert back to the charset of the input
$text = iconv("UTF-8", "ISO-8859-1", $text);
echo(htmlspecialchars($text)."\n"); // Outputs "Test Paragraph"
?></pre>

Expected result:
----------------
Test Paragraph
Test Paragraph

Actual result:
--------------
Test Paragraph
Test Paragraph

--
Edit bug report at http://bugs.php.net/?id=30975&edit=1