Marco Baroni <[EMAIL PROTECTED]> writes:
>Thanks for your advice... the output does look different, this time,
>but it still doesn't look like utf8... (I get the same error with
>recode).
>
>If somebody could suggest a way to convert to another encoding, or a
>better way to identify the encoding of each page, that would also be
>fine (once I have control over the encodings, I think I can find some
>way to convert back to utf8 (eg, via recode).

In my opinion Encode's from_to() isn't a natural interface. It converts
bytes in one encoding to bytes in another, in place, so neither the
original nor the result is in a form in which you can use perl's
character semantics.

It is much better IMHO to use ->decode directly. That is, use 'decode'
to convert the source from whatever encoding it is in (based on
'charset=' in this case) to Unicode characters. Then write the Unicode
out using binmode with :utf8 or the :encoding() layer of your choice.

If you must use from_to(), then the appropriate target for a :utf8
stream is to get the characters into internal Unicode form:

    from_to($text, $charset, 'Unicode')

I would prefer to use

    use Encode qw(find_encoding);

    binmode STDOUT, ":utf8";                  # encode to utf8 on output
    my $encoding = find_encoding($charset);   # from the page's 'charset='
    my $unicode  = $encoding->decode($text);  # bytes -> characters
    print $unicode;

>
>Thanks again,
>
>Marco
>
>On Saturday, May 8, 2004, at 05:16 Europe/Rome, Edward Batutis wrote:
>
>> Marco:
>>
>> I think you are converting twice:
>>
>>> # output will be utf8
>>> binmode(STDOUT, ":utf8");
>>> ...
>>> from_to($html_text,$charset,"utf8");
>>> ...
>>
>> Here, it will convert html_text to utf-8 again because of binmode with
>> utf-8:
>>
>>> print "CURRENT URL $url\n$html_text\n";
>>
>> I think you can just remove the binmode line and it will work.
>>
>>> Why do encodings always cause so much pain?
>>
>> I hope this helps today's pain, at least :-).
>>
>> Regards,
>>
>> =Ed
>>
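
For what it's worth, here is a self-contained sketch of the decode approach described above. The $charset and $bytes values are made-up stand-ins for what you would take from the page's 'charset=' and its body; everything else uses only find_encoding and ->decode from Encode. Treat it as an illustration under those assumptions, not as a drop-in for your script.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical input: the word "cafe" with e-acute, as ISO-8859-1 bytes.
# In a real script these would come from the fetched page.
my $charset = 'iso-8859-1';
my $bytes   = "caf\xE9";

# find_encoding returns undef for a name it does not know.
my $encoding = find_encoding($charset)
    or die "Unknown charset: $charset\n";

# Bytes in, characters out: from here on perl's character
# semantics (length, regexes, uc/lc) apply to $unicode.
my $unicode = $encoding->decode($bytes);

# Let the output layer do the single conversion to utf8.
binmode STDOUT, ':utf8';
print length($unicode), " characters\n";
print "$unicode\n";
```

The point of the design is that the text is converted exactly once on the way in (decode) and once on the way out (the :utf8 layer), so there is no chance of encoding already-encoded bytes a second time.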