Hi,

I'm using MSXML 4 in combination with Perl 5.8.0. I have an XML file with some Unicode in it. Parsing the file goes OK, but when I try to read a text node containing Unicode, something strange happens: when I open the XML file in a text viewer I can clearly see 3 separate characters, yet when I count the characters in the variable holding the node value, only 1 character is left. Viewing this value in a Unicode-aware editor gives me the wrong representation. The file is UTF-8 encoded according to the XML declaration, and it looks fine when viewed in IE.

This is what I do:

    $tmp = $node->selectSingleNode('ID')->{'text'};   # puts the nodeValue in $tmp
    if ($node->selectSingleNode('ID[.="' . $tmp . '"]')) {
        # should return true,
        # but returns false, because $tmp and the nodeValue aren't the same anymore
    }

The character I'm trying to read is Unicode character U+2248, UTF-8 encoded as 0xe2 0x89 0x88. What MSXML/Perl returns to me is just a single character, 0x98 (a tilde).

Does anybody have any idea?

Hans
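
P.S. To make the byte versus character counts concrete, here is a minimal standalone sketch. It doesn't touch MSXML at all; it assumes the core Encode module that ships with Perl 5.8, and the variable names are just for illustration. It only shows what I would expect to get for this character, not what MSXML actually hands back.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The three raw bytes as they appear in the UTF-8 encoded file.
my $bytes = "\xe2\x89\x88";

# Decode the bytes into a Perl character string.
my $char = decode('UTF-8', $bytes);

# A text viewer that ignores the encoding shows 3 "characters" (the bytes);
# after decoding there is exactly one character, U+2248.
printf "bytes: %d, chars: %d, code point: U+%04X\n",
    length($bytes), length($char), ord($char);
# prints "bytes: 3, chars: 1, code point: U+2248"
```

So getting 1 character back would be fine if it were U+2248, but what I get instead is the single 0x98.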