Thanks! I will try the solution you propose, and I will let y'all know whether it works.
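
For the archives, here is a minimal sketch of the decode-based approach as I understand it. The helper name decode_page and the sample byte strings are my own assumptions for illustration; only find_encoding and ->decode come from Encode itself. It decodes each page according to its declared charset and writes everything out as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper (not part of Encode): turn raw bytes into Perl
# characters, using the charset name taken from the page's charset=... declaration.
sub decode_page {
    my ($bytes, $charset) = @_;
    my $enc = find_encoding($charset)
        or die "Unknown charset: $charset\n";
    return $enc->decode($bytes);
}

# Assumed sample input: the characters U+65E5 U+672C as Shift_JIS and EUC-JP bytes.
my %pages = (
    'shiftjis' => "\x93\xFA\x96\x7B",
    'euc-jp'   => "\xC6\xFC\xCB\xDC",
);

# Encode to UTF-8 only at output time; inside the program we work with characters.
binmode STDOUT, ':encoding(UTF-8)';
for my $charset (sort keys %pages) {
    my $text = decode_page($pages{$charset}, $charset);
    print length($text), " characters: $text\n";
}
```

If this works the way I expect, both inputs end up as the same Perl string, so there would be no need for per-charset output files or a separate recode pass.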

In the meantime, I had ``solved'' the problem by saving pages with different charset=... declarations to different output files (ofile.sjis, ofile.euc, etc.), and then using recode to convert everything to the same charset.

Unfortunately, this (moving the encoding processing outside perl) seems to be what I always end up doing when I have to deal with characters outside the latin1 range... As you said, from_to isn't a natural interface, at least for me!

Regards,
Marco

> In my opinion Encode's from_to isn't a natural interface.
> (With from_to neither the original nor the result is in a form
> in which you can use perl's character semantics.)
>
> It is much better IMHO to use ->decode directly.
>
> That is use 'decode' to convert (based on 'charset=' in this case)
> whatever encoding source is in to Unicode. Then write Unicode using
> binmode :utf8 or :encoding() of your choice.
>
> If you must use from_to() then appropriate target for a :utf8 stream
> is to get characters into internal Unicode form:
>
>   from_to($text, $charset, 'Unicode')
>
> I would prefer to use
>
>   binmode STDOUT,":utf8";
>   my $encoding = find_encoding($charset);
>   my $unicode = $encoding->decode($text);
>   print $unicode;
>

-- 
Marco Baroni
SSLMIT, University of Bologna
http://sslmit.unibo.it/~baroni