Re: questions about encode/decode

Juerd Waalboer Mon, 15 Oct 2007 15:29:56 -0700

In mailing lists, please write your reply below quotation, and cut
quotation to the minimum required for context. Thanks!



E R skribis 2007-10-15 17:01 (-0500):
> As a follow-up, does anyone have any suggestions about optimizing a
> routine such as this: sub escapeHTML {

Probably the best optimization is to use the freely available
HTML::Entities module, which is part of the HTML-Parser distribution
(a dependency of LWP, so you probably have it already).

>   $x =~ s/&/&amp;/g; $x =~ s/</&lt;/g;

Use a single regex, because each separate s/// has to scan the entire string.
See HTML::Entities for inspiration if you don't want to use the module
(e.g. if you don't want the full spectrum of entities that it supports).
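
A minimal sketch of the single-regex approach, assuming you only care
about the four characters below. The names %ENTITY and escape_html are
made up for illustration; this is not how HTML::Entities itself is
implemented.

```perl
use strict;
use warnings;

# Hypothetical lookup table: characters to escape and their entities.
my %ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
);

# One pass over the string. The character class must list exactly
# the keys of %ENTITY.
sub escape_html {
    my ($text) = @_;
    $text =~ s/([&<>"])/$ENTITY{$1}/g;
    return $text;
}

print escape_html('a < b && "c"'), "\n";
# prints: a &lt; b &amp;&amp; &quot;c&quot;
```

A side benefit of the single pass: the order of substitutions no longer
matters, so an "&" that was produced by escaping "<" cannot be escaped
a second time.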

>   Encode::encode("iso-8859-1", $x);

It's very probably better to standardize on UTF-8 for your output. Doing
that now saves a lot of trouble later, when you do need it. And sooner or
later, you will.

> Basically I'm concerned about the overhead to constantly look up the
> encoder sub for every fragment of HTML I need to escape.

Encode your output once, when outputting. PerlIO layers help to automate
this and save a lot of development time:

    binmode STDOUT, ":encoding(UTF-8)";
    print $foo;  # automatically encoded!
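
To see that the layer does the same job as calling Encode::encode
yourself, here is a small sketch that prints to an in-memory filehandle
instead of STDOUT, so the resulting bytes can be inspected. The
variable names and the sample string are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string: "café" plus a smiley (U+263A).
my $text = "caf\x{e9} \x{263a}";

# Explicit encoding, done by hand.
my $bytes = encode('UTF-8', $text);

# The same thing through a PerlIO layer, writing into $buffer.
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer or die "open: $!";
print {$out} $text;
close $out;

print length($text), " characters, ", length($buffer), " bytes\n";
print "same bytes as encode()\n" if $buffer eq $bytes;
```

With the layer in place, the rest of the program only ever handles
character strings, and the encoder is set up once when the handle is
opened or binmode is called, not once per fragment.
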
-- 
Met vriendelijke groet,  Kind regards,  Korajn salutojn,

  Juerd Waalboer:  Perl hacker  <[EMAIL PROTECTED]>  <http://juerd.nl/sig>
  Convolution:     ICT solutions and consultancy <[EMAIL PROTECTED]>
