> > use Encode 'from_to';
> >
> > my $orjan = 'ÖRJAN';
> > my $lundstrom = 'LUNDSTRÖM';
> >
> > print $orjan . ' ' . $lundstrom . "\n";
> >
> > from_to $orjan,'latin1','utf-8';
> > from_to $lundstrom,'latin1','utf-8';
>
> It is my understanding that from_to is the wrong thing to use here.
Your understanding is correct.
> The usual sequence is:
> - you obtain some character data, for example by putting it literally in
> your script. If the script itself is in utf-8, it should contain
> "use utf8;". If not (like your script), perl will assume ISO-8859-1.
Or "use encoding 'whatever';". And without either, Perl actually assumes
your native encoding, whatever it is: ISO 8859-1, or -2, or CP1252, or
EBCDIC, or something else.
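A minimal sketch of the first case, assuming the script file itself is saved as UTF-8 (the names are just the ones from the original post): with "use utf8;" the literals become character strings, so length() counts Ö as one character rather than two bytes.

```perl
#!/usr/bin/perl
# Assumption: this file is saved as UTF-8, which is what "use utf8" declares.
use strict;
use warnings;
use utf8;                               # string literals below are UTF-8 source text
binmode STDOUT, ':encoding(UTF-8)';     # encode characters on the way out

my $orjan     = 'ÖRJAN';
my $lundstrom = 'LUNDSTRÖM';

# These are character strings: Ö counts as a single character.
printf "%s has %d characters\n", $orjan, length $orjan;
printf "%s has %d characters\n", $lundstrom, length $lundstrom;
```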
> A different source of data would be reading from a file, which is
> opened with the correct encoding specified (see Andreas' reply).
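A rough illustration of that second source, assuming a Unix-like system and File::Temp (a core module). This is not necessarily what Andreas suggested, only the general idea of putting the encoding on the open() call: the scratch file holds Latin-1 bytes, and the :encoding(ISO-8859-1) layer turns them into characters as they are read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file holding the raw Latin-1 bytes for "LUNDSTRÖM"
# (0xD6 is Ö in ISO-8859-1).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} "LUNDSTR\xD6M\n";
close $tmp or die "close: $!";

# Open it with the encoding the data is actually in; the layer decodes
# bytes to characters on read.
open my $in, '<:encoding(ISO-8859-1)', $path or die "open $path: $!";
chomp(my $name = <$in>);
close $in;

printf "read %d characters\n", length $name;
```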
>
> A third source would be by reading a file or a socket and obtaining raw
> bytes which can be interpreted as characters using decode().
In this case, for example:
    use Encode 'decode';
    $lundstrom = decode("latin-1", $lundstrom);
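The same decode() call works for any encoding Encode knows about, as long as you name the one the bytes are really in. A small sketch, with made-up byte strings standing in for what a socket or a :raw read might deliver:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw input: the same name in two different byte encodings.
my $latin1_bytes = "LUNDSTR\xD6M";          # Ö as one Latin-1 byte
my $utf8_bytes   = "LUNDSTR\xC3\x96M";      # Ö as two UTF-8 bytes

# decode() must be told which encoding the bytes are in;
# both calls yield the same 9-character string.
my $from_latin1 = decode('latin-1', $latin1_bytes);
my $from_utf8   = decode('UTF-8',   $utf8_bytes);

print $from_latin1 eq $from_utf8 ? "same string\n" : "different\n";
printf "%d characters\n", length $from_latin1;
```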
> - Manipulate the data using perl string operations
>
> - Output the data to a filehandle which is opened using the correct
> encoding.
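Put together, the three steps might look like the following sketch. The capitalisation is only a hypothetical example of "manipulate", and the output goes to a temporary file so the bytes actually written can be inspected afterwards.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # Unicode rules for lc/ucfirst on all strings
use Encode qw(decode);
use File::Temp qw(tempfile);

# 1. Obtain character data (here by decoding Latin-1 bytes).
my $name = decode('ISO-8859-1', "\xD6RJAN LUNDSTR\xD6M");

# 2. Manipulate it with ordinary string operations; they work on
#    characters, so lc() also lowercases the non-ASCII letters.
my $pretty = join ' ', map { ucfirst(lc($_)) } split / /, $name;

# 3. Output through a filehandle opened with the wanted encoding.
my (undef, $path) = tempfile(UNLINK => 1);
open my $out, '>:encoding(UTF-8)', $path or die "open $path: $!";
print {$out} "$pretty\n";
close $out or die "close: $!";

# Read the file back as raw bytes to see what was written.
open my $raw, '<:raw', $path or die "open $path: $!";
my $bytes = do { local $/; <$raw> };
close $raw;
printf "%d characters, %d bytes on disk\n", length $pretty, length $bytes;
```

With this input the string is 15 characters long but takes 18 bytes on disk: Ö and ö each become two bytes in UTF-8, plus the newline.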
>
> The from_to function looks enticing, particularly because everyone has
> heard about perl and utf8 strings, but it's almost always the wrong
> thing to use. And perl does not "use utf8" as such; it supports Unicode
> character semantics.
At least in the current Encode doc there is a section:
B<CAVEAT>: The following operations look the same but are not quite so;

    from_to($data, "iso-8859-1", "utf8"); #1
    $data = decode("iso-8859-1", $data);  #2

Both #1 and #2 make $data consist of a completely valid UTF-8 string
but only #2 turns utf8 flag on. #1 is equivalent to

    $data = encode("utf8", decode("iso-8859-1", $data));

See L</"The UTF-8 flag"> below.
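A quick way to see the caveat in action, assuming a reasonably recent Encode; the sample bytes are the Latin-1 spelling of the surname from the original post. utf8::is_utf8() is used only to display the flag; ordinary code should not need to look at it.

```perl
use strict;
use warnings;
use Encode qw(from_to decode encode);

my $data1 = "LUNDSTR\xD6M";   # Latin-1 bytes
my $data2 = "LUNDSTR\xD6M";

from_to($data1, 'iso-8859-1', 'utf8');     # #1: rewrites the bytes in place
$data2 = decode('iso-8859-1', $data2);     # #2: produces a character string

# #1 is still a byte string: Ö is now two bytes, so length is 10.
# #2 is a character string with the utf8 flag on: length is 9.
printf "#1: length %d, utf8 flag %s\n", length $data1, utf8::is_utf8($data1) ? 'on' : 'off';
printf "#2: length %d, utf8 flag %s\n", length $data2, utf8::is_utf8($data2) ? 'on' : 'off';

# As the caveat says, #1 equals encode("utf8", decode(...)).
my $same = encode('utf8', decode('iso-8859-1', "LUNDSTR\xD6M"));
print $data1 eq $same ? "#1 matches encode/decode\n" : "mismatch\n";
```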
> --
> Bart.
--
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen