# New Ticket Created by  Christian Jaeger 
# Please include the string:  [perl #37170]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/rt3/Ticket/Display.html?id=37170 >



This is a bug report for perl from [EMAIL PROTECTED],
generated with the help of perlbug 1.35 running under perl v5.8.7.


-----------------------------------------------------------------
[Please enter your report here]

I'm in the process of "porting" a perl web app (fastcgi, running with
-T flag) from perl 5.005_03 to current releases.

I first had problems with 5.8.4: when I read in a block of data using
read, about like this:

 use Encode;
 open F,"some/file/containing_utf8_text" or die $!;
 my $buf;
 # read up to 1000 bytes; read returns undef on error and 0 at EOF
 defined(read(F,$buf,1000)) or die $!;
 my $str= Encode::decode_utf8($buf);

the resulting $str still held the UTF-8 byte sequences as separate characters

(and
 print "utf8?: ", Encode::is_utf8($str) ? "yes" : "no", "\n";
gave "no" iirc)

(I'm actually using my own wrappers around open and read, so I didn't
test the exact code as above).

I narrowed those down to the use of the -T flag. I found that any one
of the following makes the decoding work correctly:

 - switching off tainting mode
 - detainting $buf before decoding it, like:
     $buf=~ /(.*)/s or die;
     my $str= Encode::decode_utf8($1);
 - upgrading to perl 5.8.7 (5.8.7-3 from Debian testing)
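
A minimal sketch of a standalone test for the above, which I did not run
in this exact form. The temporary file and its contents are stand-ins for
the real input, and the script is meant to be run twice, once as
"perl -T script.pl" and once without -T, to compare the output. Note that
data read from a file is tainted under -T, which is what $buf relies on.

```perl
use strict;
use warnings;
use Encode ();
use File::Temp qw(tempfile);

# Stand-in input: "gr", U+00FC, "n" encoded as UTF-8 (5 bytes, 4 characters).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "gr\xC3\xBCn";
close $out or die "close: $!";

# Read the raw bytes back, as in the snippet above.
open my $in, '<', $path or die "open: $!";
binmode $in;
my $buf;
defined(read($in, $buf, 1000)) or die "read: $!";
close $in;

my $str = Encode::decode_utf8($buf);

# ${^TAINT} reports whether the interpreter was started with -T.
printf "taint mode: %s\n", ${^TAINT} ? "on" : "off";
printf "utf8?: %s\n", Encode::is_utf8($str) ? "yes" : "no";
printf "bytes read: %d, characters decoded: %d\n",
    length($buf), length($str);
```

If decoding works, the last line should show fewer characters than bytes;
equal counts would match the failure described above.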

"Fine, it has been fixed" I thought.

But now I realized that something else still doesn't work under taint
mode. Sorry that I'm a bit vague below, I'm under pressure to finish
the project; please contact me if you need more information. For now
I'm simply turning off taint mode.

(What I'm doing is, I write a list of strings to one file, first
writing the lengths of each, so that I know how to split the file
contents into the strings again when reading back in:

           my $d= [ ... ]; # list of strings or string refs
           my $f= ...;     # filehandle to new output file, blessed to a
                           # class which has an xprint method.

           my @is_utf8;
           for(@$d) {
               my $rft;
               my $is_utf8;
               # 
               if (defined($rft=Scalar::Util::reftype($_)) and $rft eq "SCALAR") {
                   $is_utf8= Encode::is_utf8($$_);
                   Eile->log("reference ".($is_utf8 ? "is" : "is not")." utf8");
                   Encode::_utf8_off($$_) if $is_utf8;
                   $f->xprint(pack('l',length($$_)),
                              ($is_utf8 ? "1" : "0")
                             );
               } else {
                   $is_utf8= Encode::is_utf8($_);
                   Eile->log("string ".($is_utf8 ? "is" : "is not")." utf8");
                   Encode::_utf8_off($_) if $is_utf8;
                   $f->xprint(pack('l',length($_)),
                              ($is_utf8 ? "1" : "0")
                             );
               }
               push @is_utf8,$is_utf8;
           }
           $f->xprint(pack('l',-1),"|"); # "|" is chosen arbitrarily, it's not used anywhere.
           for(@$d) {
               my $is_utf8= shift @is_utf8;
               my $rft;
               if (defined($rft=Scalar::Util::reftype($_)) and $rft eq "SCALAR") {
                   $f->xprint($$_);
                   Encode::_utf8_on($$_) if $is_utf8;
               } else {
                   $f->xprint($_);
                   Encode::_utf8_on($_) if $is_utf8;
               }
           }

)
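
For reference, a sketch of how such a file could be split again when
reading it back. This is not my actual reader: split_records is a
hypothetical name, the whole file is assumed to be in memory as a byte
string, and Encode::decode_utf8 is used here as the checked counterpart of
the writer's _utf8_off.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: takes the file contents as a byte string and
# returns the list of strings stored in it.
sub split_records {
    my ($blob) = @_;
    my $lsize = length pack('l', 0);    # size of one packed 'l' value
    my $pos   = 0;
    my @header;

    # Header: one (length, flag) record per string, ended by length -1.
    while (1) {
        die "truncated header" if $pos + $lsize + 1 > length($blob);
        my ($len, $flag) = unpack 'la', substr($blob, $pos, $lsize + 1);
        $pos += $lsize + 1;
        last if $len == -1;             # terminator: pack('l',-1),"|"
        push @header, [ $len, $flag eq '1' ];
    }

    # Body: the strings themselves, back to back, lengths in bytes.
    my @strings;
    for my $h (@header) {
        my ($len, $is_utf8) = @$h;
        die "truncated body" if $pos + $len > length($blob);
        my $s = substr($blob, $pos, $len);
        $pos += $len;
        $s = Encode::decode_utf8($s) if $is_utf8;
        push @strings, $s;
    }
    return @strings;
}
```

This only works if the lengths in the header are byte lengths, which is
why a length reported in characters breaks the format.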

The problem is that sometimes Encode::is_utf8 reports false on a
string, even when I know it must contain unicode characters:

 - the file being written to disk *does* contain utf8 sequences.
 - the flag being written to disk is false. (Encode::is_utf8 gave false)
 - the length being written into the header is too short (which
   means that the length builtin reported the length in unicode code
   points, not bytes -- how can this be if Encode::is_utf8 is false?).
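
A sketch of a check that could be dropped in before the xprint calls to
capture the inconsistency. describe_string is a hypothetical helper, not
part of my code; it only uses Encode::is_utf8, Scalar::Util::tainted and
the bytes pragma. For a string without the utf8 flag, the character count
and the byte count must be equal, so "consistent" being false would
show the problem directly.

```perl
use strict;
use warnings;
use Encode ();
use Scalar::Util ();

# Hypothetical helper: collect the facts that should agree with each other.
# It works on $_[0] directly so that no copy of the string is made.
sub describe_string {
    my $is_utf8 = Encode::is_utf8($_[0]) ? 1 : 0;
    my $chars   = length($_[0]);
    my $bytes   = do { use bytes; length($_[0]) };  # internal buffer size
    return {
        tainted    => Scalar::Util::tainted($_[0]) ? 1 : 0,
        is_utf8    => $is_utf8,
        chars      => $chars,
        bytes      => $bytes,
        # without the utf8 flag, characters and bytes must coincide
        consistent => ($is_utf8 || $chars == $bytes) ? 1 : 0,
    };
}

my $info = describe_string("gr\x{FC}n\x{263A}");
printf "%s=%d\n", $_, $info->{$_} for sort keys %$info;
```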

Again, switching off taint mode seems to make it work fine.
(The strings being written above were coming from LWP (from HTTP get
requests) -- maybe they were tainted for this reason.)

Thanks for your work,
Christian.


[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=low
---
Site configuration information for perl v5.8.7:

Configured by Debian Project at Thu Jun  9 00:28:22 EST 2005.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=linux, osvers=2.4.27-ti1211, archname=i386-linux-thread-multi
    uname='linux kosh 2.4.27-ti1211 #1 sun sep 19 18:17:45 est 2004 i686 
gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN 
-Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr 
-Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr 
-Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 
-Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.7 
-Dsitearch=/usr/local/lib/perl/5.8.7 -Dman1dir=/usr/share/man/man1 
-Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 
-Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl 
-Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib 
-Dlibperl=libperl.so.5.8.7 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define 
usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN 
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE 
-D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN 
-fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='3.3.6 (Debian 1:3.3.6-6)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', 
lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so.5.8.7
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    

---
@INC for perl v5.8.7:
    /etc/perl
    /usr/local/lib/perl/5.8.7
    /usr/local/share/perl/5.8.7
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    /usr/local/lib/perl/5.8.4
    /usr/local/share/perl/5.8.4
    /usr/local/lib/perl/5.8.3
    /usr/local/share/perl/5.8.3
    .

---
Environment for perl v5.8.7:
    HOME=/home/chris
    LANG=de_CH
    LANGUAGE (unset)
    LC_CTYPE=de_CH
    LC_NUMERIC=C
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/Gambit-C/bin:/opt/j2sdk_nb/j2sdk1.4.2/bin/:/home/chris/local/bin:/home/chris/bin:/root/local/bin:/root/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/bin/X11:/usr/local/sbin:/usr/local/bin:/sbin:/usr/sbin:/usr/local/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash
