Andreas,

> >>>>> On Wed, 09 Mar 2005 20:03:08 +0000, [EMAIL PROTECTED] said:
> > I don't understand what the [UTF8 "\x{c4}"]
>
> "\x{c4}" is valid perl notation for the Unicode character 0xc4.

Yes, but it isn't UTF-8. OK, I can live with the Devel::Peek label being incorrect.

> % perl -le '
> my $data = "\xC4";
> binmode STDOUT, ":utf8";
> print $data;
> ' | od -t x1
> 0000000 c3 84 0a
> 0000003

But my data is _not_ \xC4. My data is \xC3\x84, i.e. valid UTF-8. I expect that when I turn on the utf8 flag for that byte sequence, it is treated as UTF-8. For some strange reason it is being converted to \xC4, which isn't what I'd expect. I do admit to being a Unicode noob, so perhaps my expectations need adjusting :)

Here's the problem: I have the data in a DB. It is UTF-8 encoded, so it arrives in Perl as \xC3\x84. I turn on the UTF-8 flag and then output it as XML using the XML::LibXML module. XML::LibXML has two output methods, toFH and toString. If I generate XML from the above data with an encoding of UTF-8, I get two different files. One is correct (using toFH); the other isn't (it contains \xC4, which is invalid UTF-8). toFH does not use Perl's IO; toString does.

At first I thought the module might be at fault. However, when the XML created by toString is parsed in memory, it passes OK, i.e. the error occurs during output. Which means the module is OK.

Now, in spite of Devel::Peek's label, it seems that Perl's internal data is UTF-8. I am just curious as to why a :raw binmode would change the data, if indeed it does. I am, after all, just guessing here.

John
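A minimal Perl sketch of the behaviour discussed in this thread, assuming the database hands back the raw bytes \xC3\x84 with the UTF-8 flag off. It decodes them with Encode::decode (the documented way to get what turning the flag on by hand gives you for valid input), then writes the resulting string through a :raw layer and through an :encoding(UTF-8) layer. In-memory filehandles stand in for the real output file, and the hexdump helper is a hypothetical name used only for this example; nothing here reflects how XML::LibXML's toString works internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: one two-digit hex value per character of the string.
sub hexdump { join ' ', map { sprintf '%02x', ord } split //, shift }

# Assumption: two bytes as fetched from a UTF-8 database column, flag off.
my $bytes = "\xC3\x84";

# Decoding turns the two bytes into ONE character, U+00C4. Devel::Peek shows
# this as PV = "\303\204" [UTF8 "\x{c4}"]: the internal buffer holds c3 84,
# and the label names the character that buffer represents.
my $chars = decode('UTF-8', $bytes);

# Encoding the character string again yields the original UTF-8 bytes.
my $encoded = encode('UTF-8', $chars);

# Print the same character string through two different output layers,
# capturing each result in an in-memory scalar.
open my $raw_fh, '>:raw', \my $raw_out or die "open raw: $!";
print {$raw_fh} $chars;    # code point < 256, so it is written as one byte
close $raw_fh;

open my $utf8_fh, '>:encoding(UTF-8)', \my $utf8_out or die "open utf8: $!";
print {$utf8_fh} $chars;   # the layer encodes the character as UTF-8
close $utf8_fh;

print "bytes:       ", hexdump($bytes),    "\n";
print "chars:       ", hexdump($chars),    "\n";
print "encoded:     ", hexdump($encoded),  "\n";
print "raw layer:   ", hexdump($raw_out),  "\n";
print "UTF-8 layer: ", hexdump($utf8_out), "\n";
```

Run as-is, it should report c3 84 for the input bytes, c4 for the decoded string, c3 84 after encode, c4 through the :raw layer and c3 84 through the :encoding(UTF-8) layer. In other words, the same one-character string only comes out as valid UTF-8 when an encoding layer (or an explicit encode) is applied on output.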