From:             justin at jwd dot co dot uk
Operating system: Windows XP
PHP version:      5.0.2
PHP Bug Type:     XML related
Bug description:  PHP DOM functions output UTF-8 encoded regardless of input encoding

Description:
------------
When retrieving sections of text from an HTML page using the new DOM
functions, the output is encoded as UTF-8 even though the input is
correctly detected as ISO-8859-1. This means extra code is needed to
convert the text back to the original charset of the input. Surely the DOM
functions should either encode according to the detected input encoding or
at least provide some mechanism for setting the output encoding? Or am I
being stupid here?
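
For illustration, the extra conversion step described above could be wrapped
in a small helper. This is only a sketch under stated assumptions: the
function name dom_text_to_charset is hypothetical and not part of the DOM
extension, the iconv extension is assumed to be available, and the target
charset has to be supplied by the caller rather than read from the document.

```php
<?php
// Hypothetical helper (not part of the DOM extension): convert text read
// from a DOM node, which arrives as UTF-8, into a caller-chosen charset.
function dom_text_to_charset($text, $charset)
{
    // iconv() returns false on failure; fall back to the UTF-8 text.
    $converted = iconv('UTF-8', $charset, $text);
    return $converted === false ? $text : $converted;
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<html><head><meta http-equiv="Content-Type" '
    . 'content="text/html; charset=ISO-8859-1"></head>'
    . '<body><p class="test_paragraph">Test&nbsp;Paragraph</p></body></html>'
);
$xpath = new DOMXPath($doc);

// nodeValue is UTF-8, so the non-breaking space occupies two bytes.
$utf8 = $xpath->query('//p[@class="test_paragraph"]/text()')->item(0)->nodeValue;

// After conversion the non-breaking space is the single byte 0xA0.
$latin1 = dom_text_to_charset($utf8, 'ISO-8859-1');

echo bin2hex($utf8), "\n";
echo bin2hex($latin1), "\n";
```

Printing the bytes as hex avoids depending on the charset of whatever
displays the output, which is what makes the difference hard to see in a
browser.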

Reproduce code:
---------------
<pre><?php
$xhtml= <<<HTML_END
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"
/></head>
<body><p class="test_paragraph">Test&nbsp;Paragraph</p></body>
HTML_END;

$in=new DomDocument();
$in->loadHTML($xhtml);
$xin=new DomXpath($in);

$text=$xin->query('//p[@class="test_paragraph"]/text()')->item(0)->nodeValue;

echo(htmlspecialchars($text)."\n"); // Outputs "Test Paragraph"

$text=iconv("UTF-8", "ISO-8859-1", $text);
echo(htmlspecialchars($text)."\n"); // Outputs "Test Paragraph"
?></pre>

Expected result:
----------------
Test Paragraph
Test Paragraph

Actual result:
--------------
Test Paragraph
Test Paragraph

-- 
Edit bug report at http://bugs.php.net/?id=30975&edit=1