On Jun 18, 2010, at 12:05 AM, John Delacour wrote:

> In this case all talk of iso-8859-1 and cp1252 is a red herring. I read
> several Italian websites where this same problem is manifest in external
> material such as ads. The news page proper is encoded properly and declared
> as utf-8 but I imagine the web designers have reckoned that the stuff they
> receive from the advertisers is most likely to be received as windows-1252
> and convert accordingly rather than bother to verify the encoding. As a
> result material that is received as utf-8 will undergo a superfluous encoding.
>
> Here's a way to get the file in question properly encoded:

Yep, that works for me, too. I guess XML::LibXML isn't using Encode in the same way to decode content, as it returns the string with the characters as \x{c4}\x{8d}.

Thanks for the help, everyone. I've got my code parsing all my feeds and emitting a valid UTF-8 feed of its own now.

Best,

David
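
For anyone following the thread later, here is a minimal sketch of the superfluous encoding described above and one way to undo it with Encode. It is not the code from the quoted message. It assumes the misread happened as iso-8859-1, which matches the \x{c4}\x{8d} characters seen here; feeds that were misread as windows-1252 may need that encoding name instead. The character U+010D and all variable names are illustrative assumptions.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Illustrative character: U+010D, whose UTF-8 form is the bytes C4 8D.
my $char  = "\x{10d}";
my $bytes = encode('UTF-8', $char);

# The superfluous step: valid UTF-8 bytes are treated as iso-8859-1
# (assumed here) and then encoded as UTF-8 a second time.
my $misread = decode('iso-8859-1', $bytes);   # two characters: U+00C4, U+008D
my $double  = encode('UTF-8', $misread);      # four bytes: C3 84 C2 8D

# The repair: strip the extra layer, then decode the real UTF-8.
my $inner = encode('iso-8859-1', decode('UTF-8', $double));
my $fixed = decode('UTF-8', $inner);

# Show code points in hex so nothing depends on terminal encoding.
sub hex_points { join ' ', map { sprintf '%04X', ord } split //, shift }

print "misread: ", hex_points($misread), "\n";
print "fixed:   ", hex_points($fixed),   "\n";
```

The repair is only safe on text known to be double-encoded; applying it to correctly decoded text would corrupt any characters above U+00FF.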