ID:               41980
 Updated by:       [EMAIL PROTECTED]
 Reported By:      borys dot forytarz at gmail dot com
-Status:           Open
+Status:           Feedback
 Bug Type:         DOM XML related
 Operating System: Linux
 PHP Version:      5.2.3
 New Comment:

Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is at most 10-20 lines long, and does not require any external
resources such as databases. If the script requires a
database to demonstrate the issue, please make sure it creates
all necessary tables, stored procedures, etc.

Please avoid embedding huge scripts into the report.
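
As an illustration only, a script of the requested shape for this
report might look like the sketch below. It is not the reporter's
code: the inline strings are hypothetical stand-ins for the two
UTF-8 files, and loadHTML() on strings is assumed in place of
loadHtmlFile() so that no external files are needed.

```php
<?php
// Hypothetical stand-ins for the two UTF-8 files described in the report.
// Non-ASCII bytes are written as escapes so the script does not depend
// on the encoding of this source file.
$page = '<html><head><meta http-equiv="Content-Type" '
      . 'content="text/html; charset=UTF-8" /></head><body></body></html>';
$fragment = "<content id=\"something\"><h1>\xC5\xBC\xC3\xB3\xC5\x82\xC4\x87</h1></content>";

$main = new DOMDocument('1.0', 'UTF-8');
@$main->loadHTML($page);

$part = new DOMDocument('1.0', 'UTF-8');
@$part->loadHTML($fragment); // the fragment carries no charset declaration

// Import the <h1> from the fragment into the body of the full document.
$h1 = $part->getElementsByTagName('h1')->item(0);
$body = $main->getElementsByTagName('body')->item(0);
$body->appendChild($main->importNode($h1, true));

$output = $main->saveXML();
echo $output;
?>
```

Comparing the bytes of the <h1> text in the output with the bytes in
$fragment would show whether the import step changes the encoding.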




Previous Comments:
------------------------------------------------------------------------

[2007-07-12 19:24:45] borys dot forytarz at gmail dot com

Description:
------------
There is a problem with DOM and encoding. I have two separate files.
The first is a complete XHTML document (DTD, head, meta, body and
further content) saved in UTF-8; its meta declaration is UTF-8 and the
server sends it as UTF-8 too. The second is a simple file without any
DTD, head, meta or body, also saved in UTF-8. The problem is that when
I import nodes from the second file using importNode(), the output
contains incorrectly encoded characters (those that came from the
second file). This is strange because, as far as I have read, DOM works
in UTF-8, so there should be no such problem.

What is more, I inspected properties such as actualEncoding, and they
reported UTF-8.

If it's not a bug (though I think it is), how can I fix it? I can't add
a DTD, head and body elements to the second file.

Reproduce code:
---------------
$this->dom = new DOMDocument('1.0','UTF-8');
$this->dom->encoding = 'UTF-8';

$this->dom->formatOutput = self::$formatOutput;
$this->dom->preserveWhiteSpace = self::$preserveWhiteSpace;
@$this->dom->loadHtmlFile($html);

...

echo $this->dom->saveXML();

The above works well for the complete XHTML file. But when I load an
incomplete file (also encoded in UTF-8), the characters are not encoded
properly after I import nodes from the second document into the first
one.

I tried converting the whole output with iconv() and
mb_convert_encoding(), but it made no difference at all.

Expected result:
----------------
Properly encoded characters from both the complete XHTML file and the
second, "poor" file. The second file looks like this:

<content id="something">
   <h1>some string</h1>
</content>

Actual result:
--------------
Improperly encoded characters inside the <content> element.


------------------------------------------------------------------------

