Re: questions about encode/decode

Juerd Waalboer Mon, 15 Oct 2007 14:42:49 -0700

E R wrote 2007-10-15 16:25 (-0500):
> 1. What is the result of Encode::encode("iso-8559-1", $x) if $x is not
> a utf8 string (i.e. Encode::is_utf8($x) returns false.)


"utf8 string" is already confusing. It can mean either of the
following:

1. byte string with UTF8 encoded text
2. Perl Unicode string that at this point in time is encoded as UTF8
   *internally*

Encode::is_utf8 only tells you whether the latter is true, i.e. whether
the internal UTF8 flag is on. You should NOT have to peek at the status
of this internal flag, except for debugging perl itself.
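
To make the two meanings concrete, here is a minimal sketch. The
variable names and the sample character (e-acute, U+00E9) are my own
choice for illustration, not taken from the question:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Meaning 1: a byte string holding UTF-8 encoded text.
# U+00E9 takes two bytes in UTF-8.
my $bytes = "\xC3\xA9";

# Meaning 2: a Perl Unicode string holding the single character U+00E9.
my $text = decode("UTF-8", $bytes);

printf "bytes: length %d\n", length $bytes;   # bytes: length 2
printf "text:  length %d\n", length $text;    # text:  length 1
```

The same text is two bytes in the first form and one character in the
second, which is why "utf8 string" alone does not say which one is
meant.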

Encode::encode expects a Unicode string, which can be encoded as
ISO-8859-1 or UTF8 internally. If the Unicode string is ISO-8859-1
internally, is_utf8 returns false, and if it is UTF8 internally, it
returns true.

This internal state is how Encode::encode knows how to read the string
before converting it. Again: this is purely *internal*.

Assuming you meant 8859, not 8559, the answer to your question is: a
copy of $x is returned with its bytes unchanged, because the encoding
you asked for (ISO-8859-1) happens to equal the encoding that Perl used
internally for $x.
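
A small sketch of this, assuming a sample string of my own choosing.
utf8::upgrade is used here only to force the other internal
representation for the demonstration; normal code should not need it:

```perl
use strict;
use warnings;
use Encode qw(encode is_utf8);

my $x = "caf\xE9";     # stored as ISO-8859-1 internally
my $y = $x;
utf8::upgrade($y);     # same characters, now stored as UTF8 internally

# Debugging only: the internal flag differs...
printf "x flag: %s, y flag: %s\n",
    (is_utf8($x) ? "on" : "off"), (is_utf8($y) ? "on" : "off");

# ...but the strings are equal, and they encode to the same bytes.
my $from_x = encode("iso-8859-1", $x);
my $from_y = encode("iso-8859-1", $y);

print "same string\n" if $x eq $y;
print "same bytes\n"  if $from_x eq $from_y;
print "copy of x\n"   if $from_x eq $x;
```

Whichever way Perl happens to store the string, the result of encode is
the same, so your code should not have to care about the flag.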

> 2. What is the result of $string = decode("iso-8859-1", $octets) if
> $octets is a utf8 string?

Do not use Encode::decode on Unicode strings; use it on byte strings
only. With "iso-8859-1", every individual byte of the byte string is
seen as a single ISO-8859-1 character, so a multi-byte UTF8 sequence
will *not* be interpreted as a single character.
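
As a sketch of what that means in practice (the sample bytes are my own
example: the UTF-8 encoding of U+00E9), compare decoding the same byte
string as ISO-8859-1 and as UTF-8:

```perl
use strict;
use warnings;
use Encode qw(decode);

my $octets = "\xC3\xA9";   # byte string: UTF-8 encoding of U+00E9

my $as_latin1 = decode("iso-8859-1", $octets);
my $as_utf8   = decode("UTF-8", $octets);

# List the code points of each decoded string.
my $show = sub {
    join " ", map { sprintf "U+%04X", ord($_) } split //, $_[0];
};

print "iso-8859-1: ", $show->($as_latin1), "\n";   # iso-8859-1: U+00C3 U+00A9
print "UTF-8:      ", $show->($as_utf8), "\n";     # UTF-8:      U+00E9
```

Decoding with the wrong encoding does not fail here; it just gives you
two characters where one was intended.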

Perhaps helpful: http://tnx.nl/perlunitut,perlunifaq
-- 
Met vriendelijke groet,  Kind regards,  Korajn salutojn,

  Juerd Waalboer:  Perl hacker  <[EMAIL PROTECTED]>  <http://juerd.nl/sig>
  Convolution:     ICT solutions and consultancy <[EMAIL PROTECTED]>
