I haven't used the encoding options in TT, but I've read the related code. Also, I'm not an encoding expert, just curious enough about this to play around a bit. With those disclaimers, here's what I think.
The symptoms as they appear to me: The first UTF-8 string does not have the utf8 flag set. It is then being encode()'d as UTF-8.

encode() and decode() are only called in a few places in the 2.19 libs (according to grep). The only place that encode() is called is Template/Filters.pm: uri_filter() and url_filter(). But if the same templates are being used in both tests, then I don't think filtering should apply.

For reference, the only place that decode() is called is Template::Provider::_decode_unicode(), which will:

  - Do nothing if the content is already flagged "utf8"
  - Encode::decode(...) if the content has a supported BOM
  - Encode::decode($tt->{ENCODING}) if that option is set

Perl could also encode at output time if it sees a utf8-flagged value, or if the output handle is using the ":utf8" output layer. This output layer can be set in TT like so:

  $tt->process($infile, $vars, $outfile, binmode => ':utf8')

So, if any part of the output has been decoded into a utf8-flagged string, then the entire output will be encoded as though all of it were decoded characters. Any raw UTF-8 bytes mixed into that output get encoded a second time, as the last two tests below show. You might check your error logs for "Wide character in print".

Here are some tests I ran:

  # Show that we can send the string unmodified
  $ perl -e '
      $s = "\x{e2}\x{80}\x{9c}";
      print $s
    ' | hexdump -C
  e2 80 9c

  # Encode string at output by setting the ":utf8" output layer
  $ perl -e '
      binmode(STDOUT, ":utf8");
      $s = "\x{e2}\x{80}\x{9c}";
      print $s
    ' | hexdump -C
  c3 a2 c2 80 c2 9c

  # Show that we can send decoded string unmodified
  $ perl -MEncode -e '
      $s = "\x{e2}\x{80}\x{9c}";
      $d = Encode::decode("UTF-8", $s);
      print $d
    ' | hexdump -C
  Wide character in print at -e line 1.
  e2 80 9c

  # Fix output warning by setting the ":utf8" output layer
  $ perl -MEncode -e '
      binmode(STDOUT, ":utf8");
      $s = "\x{e2}\x{80}\x{9c}";
      $d = Encode::decode("UTF-8", $s);
      print $d
    ' | hexdump -C
  e2 80 9c

  # Show perl encoding $s because $d has the "utf8" flag set
  $ perl -MEncode -e '
      $s = "\x{e2}\x{80}\x{9c}";
      $d = Encode::decode("UTF-8", $s);
      print "$s $d"
    ' | hexdump -C
  Wide character in print at -e line 1.
  c3 a2 c2 80 c2 9c 20 e2 80 9c
                    ^^
                    the space

  # Fix output warning by setting the ":utf8" output layer
  $ perl -MEncode -e '
      binmode(STDOUT, ":utf8");
      $s = "\x{e2}\x{80}\x{9c}";
      $d = Encode::decode("UTF-8", $s);
      print "$s $d"
    ' | hexdump -C
  c3 a2 c2 80 c2 9c 20 e2 80 9c
                    ^^
                    the space

---
Rodney Broom

----- Original Message -----
From: "Clinton Gormley" <[EMAIL PROTECTED]>
To: "Todd Freeman" <[EMAIL PROTECTED]>
Cc: <templates@template-toolkit.org>
Sent: Friday, February 15, 2008 03:25
Subject: Re: [Templates] Apache::Template, Plugins and UTF-8 OH MY!

> Hi Todd
>
> I'm not sure if this is the problem, but it may be at least PART of the
> problem.
>
> As I understand it, TT either needs to be told explicitly to use UTF8 in
> the process() call, or it needs a UTF8-BOM at the beginning of the
> template to know that it should treat the template (and the included
> UTF8 data) as UTF8.
>
> Basically, I got some fairly funky renderings until I added the BOM.
>
> I have a little script which I use to add the BOM (once and only once)
> to all of my templates - may be of use to you (works in linux, will need
> a few changes to work on Windows):
>
>
> #!/usr/bin/perl
> use strict;
> use warnings FATAL => 'all';
>
> our $root = '/PATH/TO/TEMPLATES';
> our $bom  = "\x{EF}\x{BB}\x{BF}";
>
> process_dir($root);
> $| = 1;
>
> sub process_dir {
>     my $dir   = shift;
>     my @files = glob( $dir . "/*" );
>     foreach my $file (@files) {
>         if ( -f $file && $file =~ /\.tt$/ ) {
>             process_file($file);
>         }
>         elsif ( -d $file && $file !~ m|/\.svn| ) {
>             process_dir($file);
>         }
>     }
> }
>
> sub process_file {
>     my $name = my $file = shift;
>     $name =~ s/^$root//;
>     print sprintf( "Processing : %-50s", $name );
>     local ( *FH, $/ );
>     open( FH, '<:bytes', $file )
>         or die "can't open $file: $!";
>     my $a = <FH>;
>     close FH;
>
>     my $b = $a;
>     $a =~ s/$bom//g;
>     $a = $bom . $a;
>     if ( $a ne $b ) {
>         open( FH, '>:bytes', $file )
>             or die "can't write to $file : $!";
>         print FH $a;
>         close FH
>             or die "can't close file $file";
>         print " ...Updated\n";
>     }
>     else {
>         print "\n";
>     }
> }
>
>
> _______________________________________________
> templates mailing list
> templates@template-toolkit.org
> http://mail.template-toolkit.org/mailman/listinfo/templates