[EMAIL PROTECTED] said:
> CGI::Util has a couple of functions, escape() and unescape(), which
> URL-encode/decode strings. Unfortunately, I lose the utf8 flag on my
> scalar when I encode and then decode using those functions (see below).
> Should unescape() be setting the utf8 flag? Or is there no way for
> unescape() to know that it should set the utf8 flag?
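(The poster's own test script isn't reproduced here, but the behavior is
easy to demonstrate. The following is a minimal sketch, assuming Perl
5.8+ with the core Encode module; the string and variable names are
illustrative only:)

    use strict;
    use warnings;
    use CGI::Util qw(escape unescape);
    use Encode qw(is_utf8);

    # "caf\x{e9}" contains LATIN SMALL LETTER E WITH ACUTE (U+00E9);
    # utf8::upgrade() forces the utf8 flag on for the demonstration.
    my $str = "caf\x{e9}";
    utf8::upgrade($str);
    print is_utf8($str) ? "before: utf8 on\n" : "before: utf8 off\n";

    # Round trip through escape()/unescape(): the string comes back
    # as bytes, with the utf8 flag turned off.
    my $roundtrip = unescape(escape($str));
    print is_utf8($roundtrip) ? "after: utf8 on\n" : "after: utf8 off\n";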
Looking at the source for CGI::Util, it appears that disabling the utf8
flag is intended as a feature, not a bug:

    # URL-encode data
    sub escape {
      shift() if @_ > 1 and ( ref($_[0]) ||
                  (defined $_[1] && $_[0] eq $CGI::DefaultClass));
      my $toencode = shift;
      return undef unless defined($toencode);
      # force bytes while preserving backward compatibility -- dankogai
      $toencode = pack("C*", unpack("C*", $toencode));
      if ($EBCDIC) {
        $toencode =~ s/([^a-zA-Z0-9_.-])/uc sprintf("%%%02x",$E2A[ord($1)])/eg;
      } else {
        $toencode =~ s/([^a-zA-Z0-9_.-])/uc sprintf("%%%02x",ord($1))/eg;
      }
      return $toencode;
    }

Seeing how this and the unescape() function are set up, I would guess
that there is no way for unescape() to "know" when a given input string
should be decoded as utf8 data. Only the calling app can know that, and
it should apply the conversion to the output of unescape(). CGI::Util is
way too "general purpose" to make assumptions about character encodings.

Since Dan Kogai is a frequent contributor to this list, he might have
more to say on this.

David Graff
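(For reference, a minimal sketch of the approach David describes: treat
escape()/unescape() as pure byte-level operations and do the character
conversion in the calling code with the core Encode module. The variable
names and the choice of UTF-8 are illustrative assumptions:)

    use strict;
    use warnings;
    use CGI::Util qw(escape unescape);
    use Encode qw(encode decode is_utf8);

    my $str = "caf\x{e9}";    # a character string

    # Encode characters to UTF-8 octets before escaping, so escape()
    # only ever sees bytes...
    my $escaped = escape(encode('UTF-8', $str));

    # ...and decode the unescaped octets back into characters in the
    # calling app, since unescape() cannot know the intended encoding.
    my $decoded = decode('UTF-8', unescape($escaped));

    print is_utf8($decoded) ? "utf8 on\n" : "utf8 off\n";   # utf8 on
    print $decoded eq $str ? "round trip ok\n" : "mismatch\n";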