Thanks for the detailed response - it was very helpful! As a follow-up, does anyone have any suggestions about optimizing a routine such as this:

    sub escapeHTML {
        my $x = shift;
        $x =~ s/&/&amp;/g;
        $x =~ s/</&lt;/g;
        ...
        Encode::encode("iso-8859-1", $x);
    }

Basically, I'm concerned about the overhead of constantly looking up the
encoder sub for every fragment of HTML I need to escape.

Thanks...

On 10/15/07, Juerd Waalboer <[EMAIL PROTECTED]> wrote:
> E R skribis 2007-10-15 16:25 (-0500):
> > 1. What is the result of Encode::encode("iso-8559-1", $x) if $x is not
> > a utf8 string (i.e. Encode::is_utf8($x) returns false.)
>
> "utf8 string" is already confusing. It can be either one of the
> following:
>
> 1. byte string with UTF8 encoded text
> 2. Perl Unicode string that at this point in time is encoded as UTF8
>    *internally*
>
> Encode::is_utf8 indicates that the latter is true. You should NOT have
> to peek at the status of this internal flag, except for debugging perl
> itself.
>
> Encode::encode expects a Unicode string, which can be encoded as
> ISO-8859-1 or UTF8 internally. If the Unicode string is ISO-8859-1
> internally, is_utf8 returns false, and if it is UTF8 internally, it
> returns true.
>
> This is how Encode::encode knows, again: *internally*, how to convert
> the string.
>
> Assuming you meant 8859, not 8559, the answer to your question is: a
> copy of $x is returned, because the encoding you used happens to equal
> the encoding that Perl used internally.
>
> > 2. What is the result of $string = decode("iso-8859-1", $octets) if
> > $octets is a utf8 string?
>
> Do not use Encode::decode on unicode strings, but use it on bytestrings
> only. Every individual byte of the bytestring is seen as a single
> ISO-8859-1 character, so a multi-byte UTF8 sequence will *not* be
> interpreted as a single character.
>
> Perhaps helpful: http://tnx.nl/perlunitut,perlunifaq
> --
> Met vriendelijke groet, Kind regards, Korajn salutojn,
>
> Juerd Waalboer: Perl hacker <[EMAIL PROTECTED]> <http://juerd.nl/sig>
> Convolution: ICT solutions and consultancy <[EMAIL PROTECTED]>
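
On the escapeHTML question at the top of this message: one option that may reduce the per-call lookup is to resolve the encoding object once with Encode::find_encoding and call its encode method directly. The following is only a minimal sketch and has not been benchmarked. The %ENTITY table, the $LATIN1 variable, and the exact set of escaped characters are assumptions, since the original routine elides its remaining substitutions.

```perl
use strict;
use warnings;
use Encode ();

# Resolve the encoding object once at load time, rather than by name
# on every call to escapeHTML.
my $LATIN1 = Encode::find_encoding("iso-8859-1")
    or die "iso-8859-1 encoding not available";

# Hypothetical entity table: the original routine elides everything
# after the '&' and '<' substitutions, so this set is an assumption.
my %ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
);

sub escapeHTML {
    my $x = shift;
    # One pass over the string instead of one s///g per character.
    $x =~ s/([&<>"])/$ENTITY{$1}/g;
    # Encode the (copied) Unicode string to ISO-8859-1 bytes.
    return $LATIN1->encode($x);
}

print escapeHTML("a < b & c"), "\n";
```

Whether this is measurably faster than calling Encode::encode with the encoding name each time depends on the Encode version and the size of the fragments, so it would be worth timing both with the Benchmark module before switching.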