On Wed, 3 Sep 2003, Jarkko Hietaniemi wrote:
> > > use Encode 'from_to';
> > >
> > > my $orjan = 'ÖRJAN';
> > > my $lundstrom = 'LUNDSTRÖM';
> > >
> > > print $orjan . ' ' . $lundstrom . "\n";
> > >
> > > from_to $orjan,'latin1','utf-8';
> > > from_to $lundstrom,'latin1','utf-8';
> >
> > It is my understanding that from_to is the wrong thing to use here. The
>
> Your understanding is correct.
It was me that didn't understand ;)
> > - you obtain some character data, for example by putting it literally in
> > your script. If the script itself is in utf-8, it should contain
> > "use utf8;". If not (like your script), perl will assume ISO-8859-1.
>
> Or "use encoding 'whatever';", and Perl actually assumes whatever is
> your native encoding, be it ISO 8859-1, or -2, or CP1252, or EBCDIC,
> or whatever.
>
> > A different source of data would be reading from a file, which is
> > opened with the correct encoding specified (see Andreas' reply).
> >
> > A third source would be by reading a file or a socket and obtaining raw
> > bytes which can be interpreted as characters using decode().
>
> In this case, e.g.:
>
> $lundstrom = decode("latin-1", $lundstrom);
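To make the difference between the two calls concrete, here is a minimal sketch. The Latin-1 octets are a stand-in for the surname in the example above, and the variable names are only illustrative: from_to() rewrites the octets in place and leaves a byte string, while decode() returns a character string.

```perl
use strict;
use warnings;
use Encode qw(decode from_to);

# Latin-1 octets for the surname from the example above (0xD6 is O-umlaut).
my $octets = "LUNDSTR\xD6M";

# from_to() converts the octets in place; the result is still a byte string,
# now holding the UTF-8 encoding (the umlaut takes two octets).
my $bytes = $octets;
from_to($bytes, "iso-8859-1", "utf-8");

# decode() returns a character string that Perl treats as text.
my $chars = decode("iso-8859-1", $octets);

printf "from_to: %d bytes, decode: %d characters\n",
    length $bytes, length $chars;
```

The byte string is one unit longer than the character string, which is why functions such as length, lc or regex matching only behave as expected on the decoded version.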
This starts to look like the application where I will use this stuff. I
use the university LDAP server for authentication and to get some
elementary info about authors of dissertations. The LDAP server returns
the stuff in uppercase UTF-8. I want to store it in a bibliographic
database, in a more typographically appealing form. I get my data from
the Net::LDAP module. The strings don't seem to be decoded...
but then ...
> from_to($data, "iso-8859-1", "utf8"); #1
> $data = decode("iso-8859-1", $data); #2
I added
binmode STDOUT, ":utf8";
at the top, and
$data_in_my_script = decode("utf8", $data_from_LDAP);
and by that I'm a much happier man than an hour ago!
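Roughly, the whole thing can be sketched like this. The byte string is a made-up stand-in for what Net::LDAP hands back, and the capitalisation step is just one assumed way of getting from uppercase to something nicer looking:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the raw UTF-8 octets returned by the LDAP server.
my $data_from_LDAP = "\xC3\x96RJAN LUNDSTR\xC3\x96M";

# Turn the UTF-8 octets into a Perl character string.
my $data_in_my_script = decode("utf8", $data_from_LDAP);

# Case mapping now works on characters rather than bytes:
# lowercase everything, then capitalise the first letter of each word.
(my $pretty = lc $data_in_my_script) =~ s/(\w+)/\u$1/g;

# Encode characters back to UTF-8 on the way out.
binmode STDOUT, ":utf8";
print "$pretty\n";
```

Without the decode() step, lc and \u would leave the umlauts alone, since they would see two separate bytes instead of one letter.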
Thanks again
Sigfrid