Hello all,

I was wondering if someone could help me out with this little problem. A large part of it is probably down to my ignorance. Anyway, I have the following small script:

    #!/usr/bin/perl -w
    use Encode qw(is_utf8 _utf8_on encode_utf8 decode_utf8 decode encode);
    use Devel::Peek;

    # Two raw bytes that form a valid UTF-8 sequence
    my $data = "\xC3\x84";
    # Switch on the internal UTF8 flag without changing the bytes
    _utf8_on($data);

    print 'IS: ', is_utf8($data)?1:0,"\n",'ORD: ', ord $data, "\n";
    print 'LENGTH: ', length $data, "\n";
    print 'PEEK: ', Dump($data);

    # Write the string out with no encoding layer on the handle
    open FH1, "> file";
    binmode FH1, ":raw";
    print FH1 $data ;
    close FH1;

Basically, I have the bytes xC3 x84 and let Perl think they are UTF-8. They are valid UTF-8, i.e. A with diaeresis. This is the output, including what Devel::Peek produces:

    IS: 1
    ORD: 196
    LENGTH: 1
    SV = PVMG(0x80ae27c) at 0x805af24
      REFCNT = 1
      FLAGS = (PADBUSY,PADMY,SMG,POK,pPOK,UTF8)
      IV = 0
      NV = 0
      PV = 0x8051118 "\303\204"\0 [UTF8 "\x{c4}"]
      CUR = 2
      LEN = 3
      MAGIC = 0x804ee78
        MG_VIRTUAL = &PL_vtbl_utf8
        MG_TYPE = PERL_MAGIC_utf8(w)
        MG_LEN = 1

I don't understand what the [UTF8 "\x{c4}"] is telling me. xc4 on its own is not valid UTF-8. It is, however, valid Unicode, as xc4 is a precomposed character. What's worse is that the output file contains the single byte xc4 and not the two-byte UTF-8 sequence I expected.

Could one of you kind souls give me some clue, please?

Thx
John

Here's my perl -V:

    Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
      Platform:
        osname=linux, osvers=2.6.10, archname=i686-linux-ld
        uname='linux silent-running 2.6.10 #1 sat feb 19 23:23:07 gmt 2005 i686 unknown '
        config_args=''
        hint=recommended, useposix=true, d_sigaction=define
        usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
        useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
        use64bitint=undef use64bitall=undef uselongdouble=define
        usemymalloc=n, bincompat5005=undef
      Compiler:
        cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
        optimize='-O2',
        cppflags='-fno-strict-aliasing -pipe -I/usr/include'
        ccversion='', gccversion='3.2.2', gccosandvers=''
        intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
        ivtype='long', ivsize=4, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
        alignbytes=4, prototype=define
      Linker and Libraries:
        ld='cc', ldflags ='-L/usr/lib'
        libpth=/usr/local/lib /lib /usr/lib
        libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
        perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
        libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so
        gnulibc_version='2.3.2'
      Dynamic Linking:
        dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.6/i686-linux-ld/CORE'
        cccdlflags='-fpic', lddlflags='-shared -L/usr/lib'

    Characteristics of this binary (from libperl):
      Compile-time options: USE_LONG_DOUBLE USE_LARGE_FILES
      Built under linux
      Compiled at Mar 2 2005 15:03:34
      @INC:
        /usr/lib/perl5/5.8.6/i686-linux-ld
        /usr/lib/perl5/5.8.6
        /usr/lib/perl5/site_perl/5.8.6/i686-linux-ld
        /usr/lib/perl5/site_perl/5.8.6
        /usr/lib/perl5/site_perl/5.8.0
        /usr/lib/perl5/site_perl
        .