Re: UTF-8 case conversion

sigfrid . lundberg Wed, 03 Sep 2003 06:19:07 -0700

On Wed, 3 Sep 2003, Jarkko Hietaniemi wrote:

> > > use Encode 'from_to';
> > >
> > > my $orjan = 'ÖRJAN';
> > > my $lundstrom = 'LUNDSTRÖM';
> > >
> > > print $orjan . ' ' . $lundstrom . "\n";
> > >
> > > from_to $orjan,'latin1','utf-8';
> > > from_to  $lundstrom,'latin1','utf-8';
> >
> > It is my understanding that from_to is the wrong thing to use here. The
>
> Your understanding is correct.


It was me that didn't understand ;)

> > - you obtain some character data, for example by putting it literally in
> >   your script. If the script itself is in utf-8, it should contain
> >   "use utf8;". If not (like your script), perl will assume ISO-8859-1.
>
> Or "use encoding 'whatever';", and Perl actually assumes whatever is
> your native encoding, be it ISO 8859-1, or -2, or CP1252, or EBCDIC,
> or whatever.
>
> >   A different source of data would be reading from a file, which is
> >   opened with the correct encoding specified (see Andreas' reply).
> >
> >   A third source would be by reading a file or a socket and obtaining raw
> >   bytes which can be interpreted as characters using decode().
>
> In this case, e.g.:
>
> $lundstrom = decode("latin-1", $lundstrom);

This starts to look like the application where I will use this stuff. I
use the university ldap server for authentication and to get some
elementary info about authors of dissertations. The LDAP server returns
the stuff in uppercase utf-8. I want to store them in a bibliographic
database, in a more typographically appealing form. I get my data from
the Net::LDAP module. The strings don't seem to be decoded...

but then ...

>   from_to($data, "iso-8859-1", "utf8"); #1
>   $data = decode("iso-8859-1", $data);  #2

I added

binmode STDOUT, ":utf8";

at the top, and

$data_in_my_script = decode("utf8", $data_from_LDAP);

and by that I'm a much happier man than an hour ago!
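
For anyone finding this in the archive, here is a minimal sketch of the
steps above put together, assuming the incoming value is a string of raw
UTF-8 bytes. The helper name display_name and the sample bytes are
hypothetical, made up for illustration; they do not come from Net::LDAP
or from my real script.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: takes UTF-8 bytes holding an uppercase name and
# returns a character string with each word capitalised.
sub display_name {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);  # raw bytes -> Perl characters
    $chars = lc $chars;                   # case mapping now covers non-ASCII letters
    $chars =~ s/(\w+)/\u$1/g;             # uppercase the first letter of each word
    return $chars;
}

# Stand-in for a value from the LDAP server (assumption): the UTF-8
# bytes of an uppercase name with two non-ASCII letters.
my $data_from_ldap = "\xC3\x96RJAN LUNDSTR\xC3\x96M";

binmode STDOUT, ':utf8';                  # encode characters again on output
print display_name($data_from_ldap), "\n";
```

The point is the order: decode first, then change case. Without the
decode, lc sees the individual bytes and leaves the non-ASCII letters
in uppercase.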

Thanks again

Sigfrid
