Thanks for the detailed response - it was very helpful!

As a follow-up, does anyone have any suggestions about optimizing a
routine such as this:

sub escapeHTML {
  my $x = shift;

  $x =~ s/&/&amp;/g;
  $x =~ s/</&lt;/g;
  ...
  Encode::encode("iso-8859-1", $x);
}

Basically, I'm concerned about the overhead of looking up the encoder
by name on every call, once for each fragment of HTML I need to escape.
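
One thing I could try, as a rough sketch only: fetch the encoding object
once with Encode::find_encoding and call its encode method directly, so
the by-name lookup happens a single time. The variable name $latin1 is
just my own choice here, and I have not measured whether this actually
saves anything noticeable.

```perl
use strict;
use warnings;
use Encode ();

# Look up the encoding object once, when the file is loaded.
# ($latin1 is a hypothetical name chosen for this sketch.)
my $latin1 = Encode::find_encoding("iso-8859-1")
    or die "iso-8859-1 encoding is not available";

sub escapeHTML {
    my $x = shift;

    $x =~ s/&/&amp;/g;
    $x =~ s/</&lt;/g;
    # ... remaining substitutions as in the routine above ...

    # Use the cached object instead of calling Encode::encode by name.
    return $latin1->encode($x);
}
```

Callers would use it exactly as before, e.g. print escapeHTML($fragment);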

Thanks...


On 10/15/07, Juerd Waalboer <[EMAIL PROTECTED]> wrote:
> E R skribis 2007-10-15 16:25 (-0500):
> > 1. What is the result of Encode::encode("iso-8559-1", $x) if $x is not
> > a utf8 string (i.e. Encode::is_utf8($x) returns false.)
>
> "utf8 string" is already confusing. It can be either one of the
> following:
>
> 1. byte string with UTF8 encoded text
> 2. Perl Unicode string that at this point in time is encoded as UTF8
>    *internally*
>
> Encode::is_utf8 indicates that the latter is true. You should NOT have
> to peek at the status of this internal flag, except for debugging perl
> itself.
>
> Encode::encode expects a Unicode string, which can be encoded as
> ISO-8859-1 or UTF8 internally. If the Unicode string is ISO-8859-1
> internally, is_utf8 returns false, and if it is UTF8 internally, it
> returns true.
>
> This is how Encode::encode knows, again: *internally*, how to convert
> the string.
>
> Assuming you meant 8859, not 8559, the answer to your question is: a
> copy of $x is returned, because the encoding you used happens to equal
> the encoding that Perl used internally.
>
> > 2. What is the result of $string = decode("iso-8859-1", $octets) if
> > $octets is a utf8 string?
>
> Do not use Encode::decode on unicode strings, but use it on bytestrings
> only. Every individual byte of the bytestring is seen as a single
> ISO-8859-1 character, so a multi-byte UTF8 sequence will *not* be
> interpreted as a single character.
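
To check my understanding of this point, here is a small sketch (the
variable names are mine). It decodes the same two bytes, which form the
UTF-8 encoding of U+00E9, once as ISO-8859-1 and once as UTF-8, and
compares the resulting lengths in characters:

```perl
use strict;
use warnings;
use Encode ();

# The two bytes 0xC3 0xA9 are the UTF-8 encoding of U+00E9.
my $octets = "\xc3\xa9";

# Decoded as ISO-8859-1, each byte becomes one character.
my $as_latin1 = Encode::decode("iso-8859-1", $octets);

# Decoded as UTF-8, the two bytes together become one character.
my $as_utf8 = Encode::decode("UTF-8", $octets);

printf "iso-8859-1: %d characters\n", length $as_latin1;
printf "UTF-8:      %d characters\n", length $as_utf8;
```

If I read the explanation correctly, the first length should be 2 and
the second 1.
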
>
> Perhaps helpful: http://tnx.nl/perlunitut,perlunifaq
> --
> Met vriendelijke groet,  Kind regards,  Korajn salutojn,
>
>   Juerd Waalboer:  Perl hacker  <[EMAIL PROTECTED]>  <http://juerd.nl/sig>
>   Convolution:     ICT solutions and consultancy <[EMAIL PROTECTED]>
>
