Is there a meta tag that specifies the encoding? When loading HTML, that tag is also used to determine the encoding. I think I need to clarify the encoding issue: I'll bet that when the document is loaded, the encoding is being detected properly. When working with the elements, however, you are getting hung up on the UTF-8 factor.

You probably do something like the following:

$myelement = $doc->getElementById('someid');
print $myelement->textContent;

That right there will output the textual content in UTF-8 (hence the garbled characters). It does not take into consideration the encoding used in the original document. This is just how the XML functions work. Now...

You really need to do something like:

$text = $myelement->textContent;
print iconv("UTF-8", <output encoding>, $text);
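Here is a minimal sketch of that round trip. The sample markup, the id "someid", and ISO-8859-1 as the output encoding are assumptions for illustration only; substitute whatever your page and output actually use.

```php
<?php
// Hypothetical ISO-8859-1 page; \xFC and \xDF are the Latin-1 bytes for u-umlaut and sharp s.
$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>'
      . '</head><body>'
      . "<p id=\"someid\">Gr\xFC\xDFe</p>"
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);                 // encoding is picked up from the meta tag

$myelement = $doc->getElementById('someid');
$utf8 = $myelement->textContent;       // DOM properties hand back UTF-8

// Convert to the encoding the output is served in (assumed ISO-8859-1 here).
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);
print $latin1;
```

If the output page is itself served as UTF-8, the iconv() step is unnecessary and $utf8 can be printed directly.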

If the encoding is declared in a meta tag, it typically looks like this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

If you add the content to a DOM node, you do not change the encoding, since the functions all work on UTF-8. The document to which the content is being added, however, must be set to use the desired encoding. I am assuming you are doing what I previously explained though.
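To illustrate the point about adding content to a node, here is a small sketch under assumed names: the element name "root" and the sample text are made up, and ISO-8859-1 stands in for whatever encoding you want on output. The text goes in as UTF-8 and the document's encoding only takes effect when it is serialized.

```php
<?php
// Manually built document, so the constructor's encoding argument is honored.
$doc = new DOMDocument('1.0', 'ISO-8859-1');

$root = $doc->createElement('root');   // hypothetical element name
$doc->appendChild($root);

// Content handed to the DOM must be UTF-8, regardless of the document encoding.
$root->appendChild($doc->createTextNode("Gr\xC3\xBC\xC3\x9Fe"));

// Serialization converts to the document's encoding.
$xml = $doc->saveXML();
print $xml;
```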

Rob


Leonidas Safran wrote:
Hello Rob,

Thanks for answering (so fast)... :-)

Remember that most of the functionality, other than the saveXML() and saveHTML() functions, outputs UTF-8 (which you would need to convert to whatever encoding you need).

Well I did try before loadHTML call:

$doc = new DomDocument('1.0', 'iso-8859-1');

This does nothing. loadHTML() creates a new underlying document and replaces the one you created with the new DOMDocument() call. The constructor arguments are only pertinent when you are manually building a document.
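A quick way to see this for yourself, sketched with a made-up one-line page that declares UTF-8 in its meta tag: check the document's encoding property before and after the load.

```php
<?php
$doc = new DOMDocument('1.0', 'iso-8859-1');
$before = $doc->encoding;              // what the constructor was given

// Hypothetical page; loadHTML() swaps in a freshly parsed document.
$doc->loadHTML('<html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>'
    . '</head><body><p>x</p></body></html>');
$after = $doc->encoding;               // now reflects the parsed page, not the constructor

var_dump($before, $after);
```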


Maybe it's a problem that the source webpage I'm loading has no charset 
declaration. It solely uses:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="de" xmlns="http://www.w3.org/1999/xhtml">

Don't know if that has an influence...

How are you getting that output?

About the output I make, I don't use the saveHTML function because I just cut some parts 
of the source (grabbed with getElementById() and other related functions) and only need 
them, so I just "echo" them into a new document.


LS

