A casual user won't understand that documentation. Hell, I'm not even sure I completely understand its implications, or when to use (or not use) escape_html based on it. I think an example is called for, but not in the POD. Maybe in the Guide?
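
A rough sketch of the kind of example that might help, in plain Perl. The two helper subs are hypothetical stand-ins, not the real code: escape_all_highbit only approximates what the patched escape_html is described as doing in the quoted message (escaping markup characters plus every byte with the 8th bit set), and escape_markup_only approximates the HTML::Entities call with the '<>&"' character list. The exact set of characters the real functions escape is an assumption here.

```perl
use strict;
use warnings;

# Markup characters both approaches escape (assumed set: < > & ").
my %named = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');

# Hypothetical helper: approximates the patched escape_html as described
# in the quoted message. Not the actual Util.xs implementation.
sub escape_all_highbit {
    my ($s) = @_;
    $s =~ s/([<>&"])/$named{$1}/g;                  # markup characters first
    $s =~ s/([\x80-\xFF])/'&#' . ord($1) . ';'/ge;  # then every high-bit byte
    return $s;
}

# Hypothetical helper: escapes markup characters only and leaves high-bit
# bytes alone, roughly what restricting HTML::Entities to '<>&"' does.
sub escape_markup_only {
    my ($s) = @_;
    $s =~ s/([<>&"])/$named{$1}/g;
    return $s;
}

# A short ISO-8859-1 byte string; \xE9 is e-acute in that encoding.
my $latin1 = "caf\xE9 <b>";

my $full    = escape_all_highbit($latin1);   # caf&#233; &lt;b&gt;
my $minimal = escape_markup_only($latin1);   # \xE9 left as a single byte

printf "input: %d bytes, all high-bit escaped: %d, markup only: %d\n",
    length($latin1), length($full), length($minimal);
```

The single accented byte grows to six bytes in the first case, which is the size cost being objected to. The second output is only correct if the page really is served with a matching charset in the Content-Type header.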
Issac

Eric Cholet wrote:
> --On Sunday, March 24, 2002 21:57:54 +0000 [EMAIL PROTECTED] wrote:
>
>> dougm       02/03/24 13:57:53
>>
>>   Modified:    .                 Changes STATUS
>>                src/modules/perl  Util.xs
>>                t/net/perl        util.pl
>>   Log:
>>   Submitted by: Geoff Young <[EMAIL PROTECTED]>
>>   Reviewed by:  dougm
>>   properly escape highbit chars in Apache::Utils::escape_html
>
> This is uncool for those of us using a non-ASCII encoding and sending
> out lots of characters with the 8th bit set, e.g. in a French page
> many accented characters will be replaced by 6-byte sequences.
> If I'm sending out "Content-type: text/html; charset=ISO-8859-1",
> and calling escape_html to escape '<', '>' and the like, I'm going
> to be serving quite a lot more bytes than before this patch.
>
> However escape_html () has no clue as to what the character set is,
> and whether it has been correctly specified in the Content-Type.
> It has also been mentioned here that escape_html is only valid for
> single-byte encodings.
>
> So this patch does the right thing to escape the odd 8 bit char in
> a mostly ASCII output, but users of other charsets should be warned
> not to use it. I use HTML::Entities::encode($_[0], '<>&"') myself.
>
> Therefore I propose a doc patch to clear this up:
>
> Index: Util.pm
> ===================================================================
> RCS file: /home/cvs/modperl/Util/Util.pm,v
> retrieving revision 1.8
> diff -u -r1.8 Util.pm
> --- Util.pm     4 Mar 2000 20:55:47 -0000       1.8
> +++ Util.pm     25 Mar 2002 18:19:37 -0000
> @@ -68,6 +68,13 @@
>
>       my $esc = Apache::Util::escape_html($html);
>
> +This function is unaware of its argument's character set and encoding.
> +It assumes a single-byte encoding and escapes all characters with the
> +8th bit set. Do not use it with multi-byte encodings such as utf8.
> +When using a single byte non-ASCII encoding such as ISO-8859-1,
> +consider specifying the character set in the Content-Type header,
> +and using HTML::Entities to avoid unnecessary escaping.
> +
>  =item escape_uri
>
>  This function replaces all unsafe characters in the $string with their
>
> --
> Eric Cholet