I remember XML::LibXML doing funky things with the utf8 flag -- but in your
case, is it possible to try using a proper XML declaration?

i.e.:

    <?xml version="1.0" encoding="utf-8"?><p>Tomas ....</p>

This seems to produce the correct output for me (perl 5.12.1, XML::LibXML 1.70).
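
Something along these lines, as a rough sketch rather than your actual script:
the string is a stand-in for the feed content, and I'm assuming the input
reaches the parser as UTF-8 encoded bytes. The last part only illustrates that
"Ä" followed by an invisible character is what "č" looks like when its UTF-8
bytes are read as Latin-1; I can't say whether that is what happens in your
case.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use XML::LibXML;

# Stand-in for the feed content (an assumption): UTF-8 encoded bytes.
# \x{10d} is "č", which encodes to the two bytes C4 8D.
my $bytes = encode('UTF-8', "<p>Laurinavi\x{10d}ius</p>");

# Prepend an explicit declaration so the parser knows the encoding.
my $xml = qq{<?xml version="1.0" encoding="utf-8"?>} . $bytes;
my $doc = XML::LibXML->new->parse_string($xml);

# textContent returns a character string, so encode it on output.
my $text = $doc->documentElement->textContent;
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";

# For comparison: the same two bytes read as Latin-1 become
# U+00C4 ("Ä") followed by the control character U+008D.
my $mangled = decode('ISO-8859-1', encode('UTF-8', "\x{10d}"));
printf "U+%04X U+%04X\n", map { ord } split //, $mangled;
```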

--d

2010/6/16 David E. Wheeler <da...@kineticode.com>:
> Fellow Perlers,
>
> I'm parsing a lot of XML these days, and came upon a Yahoo! Pipes feed that 
> appears to mangle an originating Flickr feed. But the curious thing is, when 
> I pull the offending string out of the RSS and just stick it in a script, 
> Encode knows how to decode it properly, while XML::LibXML (and my 
> Unicode-aware editors) cannot.
>
> The attached script demonstrates. $str has the bogus-looking character. 
> Encode, however, seems to properly convert it to the "č" in "Laurinavičius" 
> in the output. XML::LibXML, OTOH, outputs it as "LaurinaviÄ ius" -- that is, 
> broken. (If things look truly borked in this email too, please look at the 
> attached script.)
>
> So my question is, what gives? Is this truly a broken representation of the 
> character and Encode just figures that out and fixes it? Or is there 
> something off with my editor and with XML::LibXML?
>
> FWIW, the character looks correct in my editor when I load it from the 
> original Flickr feed. It's only after processing by Yahoo! Pipes that it 
> comes out looking mangled.
>
> Any insights would be appreciated.
>
> Best,
>
> David
>
>
>
