# New Ticket Created by Christian Jaeger
# Please include the string: [perl #37170]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org/rt3/Ticket/Display.html?id=37170 >

This is a bug report for perl from [EMAIL PROTECTED],
generated with the help of perlbug 1.35 running under perl v5.8.7.

-----------------------------------------------------------------
[Please enter your report here]

I'm in the process of "porting" a perl web app (fastcgi, running with
the -T flag) from perl 5.005_03 to current releases.

I first had problems with 5.8.4: when I read in a block of data using
read, about like this:

    use Encode;
    open F,"some/file/containing_utf8_text" or die $!;
    my $buf;
    read F,$buf,10,1000 or die $!;
    my $str= Encode::decode_utf8($buf);

the result was a $str which still had the utf8 byte sequences as
characters (and

    print "utf8?: ", Encode::is_utf8($str) ? "yes" : "no", "\n";

gave "no" iirc). (I'm actually using my own wrappers around open and
read, so I didn't test the exact code as above.)

I narrowed this down to the usage of the -T flag. I found that any one
of the following would make the decoding work correctly:

 - switching off tainting mode
 - detainting $buf before decoding it, like:
       $buf=~ /(.*)/s or die;
       my $str= Encode::decode_utf8($1);
 - upgrading to perl 5.8.7 (5.8.7-3 from Debian testing)

"Fine, it has been fixed" I thought. But now I have realized that
something else still doesn't work under taint mode. Sorry that I'm a
bit vague below, I'm under pressure to finish the project; please
contact me if you need more information. For now I'm simply turning
off taint mode.

(What I'm doing is, I write a list of strings to one file, first
writing the length of each, so that I know how to split the file
contents into the strings again when reading back in:

    my $d= [ list of strings or string refs ];
    my $f= ... filehandle to new output file, blessed to a class which has an xprint method.
    my @is_utf8;
    for(@$d) {
        my $rft;
        my $is_utf8; #
        if (defined($rft=Scalar::Util::reftype($_)) and $rft eq "SCALAR") {
            $is_utf8= Encode::is_utf8($$_);
            Eile->log("reference ".($is_utf8 ? "is" : "is not")." utf8");
            Encode::_utf8_off($$_) if $is_utf8;
            $f->xprint(pack('l',length($$_)), ($is_utf8 ? "1" : "0") );
        } else {
            $is_utf8= Encode::is_utf8($_);
            Eile->log("string ".($is_utf8 ? "is" : "is not")." utf8");
            Encode::_utf8_off($_) if $is_utf8;
            $f->xprint(pack('l',length($_)), ($is_utf8 ? "1" : "0") );
        }
        push @is_utf8,$is_utf8;
    }
    $f->xprint(pack('l',-1),"|");# "|" is chosen arbitrarily, it's not used anywhere.
    for(@$d) {
        my $is_utf8= shift @is_utf8;
        my $rft;
        if (defined($rft=Scalar::Util::reftype($_)) and $rft eq "SCALAR") {
            $f->xprint($$_);
            Encode::_utf8_on($$_) if $is_utf8;
        } else {
            $f->xprint($_);
            Encode::_utf8_on($_) if $is_utf8;
        }
    }
)

The problem is that sometimes Encode::is_utf8 reports false on a
string, even when I know it must contain unicode characters:

 - the file being written to disk *does* contain utf8 sequences.
 - the flag being written to disk is false. (Encode::is_utf8 gave false)
 - the length being written into the header is too short (which means
   that the length builtin reported the length in unicode code points,
   not bytes -- how can this be if Encode::is_utf8 is false?).

As I said, again switching off taint mode seems to make it work fine.
(The strings being written above were coming from LWP (from HTTP get
requests) -- maybe they were tainted for this reason.)

Thanks for your work,
Christian.

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=low
---
Site configuration information for perl v5.8.7:

Configured by Debian Project at Thu Jun 9 00:28:22 EST 2005.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=linux, osvers=2.4.27-ti1211, archname=i386-linux-thread-multi
    uname='linux kosh 2.4.27-ti1211 #1 sun sep 19 18:17:45 est 2004 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.7 -Dsitearch=/usr/local/lib/perl/5.8.7 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.7 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='3.3.6 (Debian 1:3.3.6-6)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so.5.8.7
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

---
@INC for perl v5.8.7:
    /etc/perl
    /usr/local/lib/perl/5.8.7
    /usr/local/share/perl/5.8.7
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    /usr/local/lib/perl/5.8.4
    /usr/local/share/perl/5.8.4
    /usr/local/lib/perl/5.8.3
    /usr/local/share/perl/5.8.3
    .

---
Environment for perl v5.8.7:
    HOME=/home/chris
    LANG=de_CH
    LANGUAGE (unset)
    LC_CTYPE=de_CH
    LC_NUMERIC=C
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/Gambit-C/bin:/opt/j2sdk_nb/j2sdk1.4.2/bin/:/home/chris/local/bin:/home/chris/bin:/root/local/bin:/root/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/bin/X11:/usr/local/sbin:/usr/local/bin:/sbin:/usr/sbin:/usr/local/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash
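
---
Illustrative sketch of the length-prefixed format described in the
report (one packed length plus a "1"/"0" utf8 flag per string, then a
-1 length with "|" as terminator, then the string bodies). This is not
the reporter's code: the names pack_strings and unpack_strings are
hypothetical, it works on an in-memory buffer instead of the xprint
filehandle, and it swaps Encode::_utf8_off for Encode::encode_utf8 on a
copy, so that the header length is always taken from a plain byte
string. It has not been tested under -T and is not claimed to work
around the behaviour reported above.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: serialize a list of strings into one byte buffer.
sub pack_strings {
    my @strings = @_;
    my ($header, $body) = ('', '');
    for my $s (@strings) {
        my $is_utf8 = Encode::is_utf8($s) ? 1 : 0;
        # Encode a copy; the caller's string and its flag stay untouched.
        my $bytes = $is_utf8 ? Encode::encode_utf8($s) : $s;
        # length() of a byte string is its byte count.
        $header .= pack('l', length($bytes)) . $is_utf8;
        $body   .= $bytes;
    }
    # A length of -1 followed by "|" terminates the header.
    return $header . pack('l', -1) . '|' . $body;
}

# Hypothetical helper: the inverse of pack_strings.
sub unpack_strings {
    my ($buf) = @_;
    my $rec = length(pack('l', 0)) + 1;    # packed length + one flag byte
    my $pos = 0;
    my @meta;
    while (1) {
        die "truncated header" if $pos + $rec > length $buf;
        my ($len, $flag) = unpack('l a', substr($buf, $pos, $rec));
        $pos += $rec;
        last if $len < 0;                  # terminator record reached
        push @meta, [$len, $flag];
    }
    my @out;
    for my $m (@meta) {
        my ($len, $flag) = @$m;
        die "truncated body" if $pos + $len > length $buf;
        my $bytes = substr($buf, $pos, $len);
        $pos += $len;
        # Only strings that were flagged on writing are decoded again.
        push @out, $flag eq '1' ? Encode::decode_utf8($bytes) : $bytes;
    }
    return @out;
}
```

A round trip would look like
my @copy = unpack_strings(pack_strings(@strings));
with each element of @copy comparing eq to the corresponding input.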