Bill Moseley wrote:
> On Wed, Jan 8, 2014 at 9:22 AM, Dave Howorth
> <dhowo...@mrc-lmb.cam.ac.uk> wrote:
> 
>> I have some templates that generate web pages. The templates themselves
>> only contain ASCII*.
> 
> ASCII is a subset of UTF-8, so I'd use:
> 
>   ENCODING => 'UTF-8'.
> 
> Then Template::Provider will decode the templates when loading.

Hi Bill, thanks for the detailed reply. I'd spotted this option, but the
templates aren't actually UTF-8, so I figured I shouldn't need it, though
it's an obvious thing to experiment with to see what effect it has. I
don't like blind guessing, so I hadn't tried it yet.

I have now tried it and it appears to make no difference.

>> They include some data that is extracted from a
>> database and some of the data values are now Unicode UTF-8...
> 
> Then, as you mentioned in your DBIx::Class post, you need to tell the DBD
> driver that your database is in UTF-8.  Then the DBD driver will decode
> when reading.
> 
> Then you have "characters" inside of Perl.

Indeed so, and I think I had done that.

>> What is the best way to persuade TT to generate a Unicode file?
> 
> I would do something like this:
> 
> $tt->process( $template, \%vars, \$output );
> $bytes = Encode::encode_utf8( $output );
> 
> Or have $output be a filehandle with a utf-8 output layer.

This is the bit I have trouble with. My $output is just a filename, so TT
is responsible for the output and should therefore be responsible for
the encoding, IMHO. The content of the file is made by joining some
ASCII strings (the template etc.) to some UTF-8 strings (the template
variables obtained from the database). AIUI, Perl should automagically
make the resulting content string a UTF-8 internal string value. So
when asked to print it to a file, TT ought to encode it appropriately, in
my view. Instead it outputs a file containing both UTF-8 and
windows-1252 byte sequences. Obviously I'm missing some principle.
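
A minimal sketch of one way such a mix could arise, using only core Perl
and not TT itself. It assumes a hypothetical database value (a copyright
sign) that has been decoded to a Perl character, and prints it to an
in-memory file with and without an encoding layer. The helper name
bytes_written is invented for illustration. Without a layer, a character
below U+0100 goes out as a single Latin-1 byte, which is easy to mistake
for windows-1252; with a UTF-8 layer the same character goes out as two
bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical database value: the UTF-8 bytes of a copyright sign,
# decoded into a single Perl character (U+00A9).
my $raw   = "\xC2\xA9";
my $chars = decode('UTF-8', $raw);

# Print a string to an in-memory file through the given I/O layer
# (an empty string means no layer) and return the bytes written as hex.
sub bytes_written {
    my ($string, $layer) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer) or die "open failed: $!";
    print {$fh} $string;
    close($fh) or die "close failed: $!";
    return unpack('H*', $buffer);
}

my $no_layer   = bytes_written($chars, '');
my $utf8_layer = bytes_written($chars, ':encoding(UTF-8)');

print "no layer:   $no_layer\n";    # a9   (one Latin-1 byte)
print "utf8 layer: $utf8_layer\n";  # c2a9 (UTF-8)
```

If some values reach the output as undecoded UTF-8 bytes while others are
decoded characters below U+0100 written without a layer, the file would
plausibly contain both kinds of byte sequence.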

(BTW, I should have said that I'm entirely on Linux, there are no
Windows machines involved).

I hadn't tried your suggestions or adding a binmode option argument to
process() for two reasons:

(1) I'm not seeing any wide character errors, which I think I ought to
if binmode is a problem.

(2) I've got a lot of calls to process(), so hacking them all is
unattractive. I'd much rather find a single central change that gets the
behaviour I want, if possible.
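
On point (1), a small sketch suggesting that the absence of warnings may
not prove much. Perl only emits "Wide character in print" for code points
above U+00FF, so a copyright sign (U+00A9) written to a handle with no
encoding layer is silently output as one byte. The helper name
warnings_from_print is invented for illustration, and the sketch uses an
in-memory file rather than TT's own output code.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# Print one string to an in-memory file with no encoding layer and
# return how many warnings that single print produced.
sub warnings_from_print {
    my ($string) = @_;
    @warnings = ();
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "open failed: $!";
    print {$fh} $string;
    close($fh) or die "close failed: $!";
    return scalar @warnings;
}

my $narrow = warnings_from_print("\x{00A9}");  # copyright sign
my $wide   = warnings_from_print("\x{2018}");  # left single quote

print "U+00A9 warnings: $narrow\n";  # 0
print "U+2018 warnings: $wide\n";    # 1
```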

I have now tried it and it makes things worse! The copyright symbol
seems to be double encoded and the left quote becomes <U+0091> i.e. PU1.
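
A minimal sketch that reproduces both symptoms with core Encode, under two
assumptions that are guesses rather than anything confirmed here: that
the copyright sign reaches the encoding step as undecoded UTF-8 bytes, and
that the left quote is stored as the single windows-1252 byte 0x91.

```perl
use strict;
use warnings;
use Encode qw(decode encode_utf8);

# Assumption: the value is still raw UTF-8 bytes (never decoded).
my $utf8_bytes = "\xC2\xA9";                 # copyright sign as UTF-8
my $double     = encode_utf8($utf8_bytes);   # each byte is encoded again
my $double_hex = unpack('H*', $double);

# Assumption: a left quote stored as the windows-1252 byte 0x91.
my $quote_byte = "\x91";
my $as_latin1  = decode('ISO-8859-1', $quote_byte);  # U+0091, the PU1 control
my $as_cp1252  = decode('cp1252', $quote_byte);      # U+2018, the left quote

printf "double encoded: %s\n", $double_hex;          # c382c2a9
printf "latin-1: U+%04X, cp1252: U+%04X\n",
    ord($as_latin1), ord($as_cp1252);                # U+0091, U+2018
```

If that second assumption holds, the data itself would contain
windows-1252 bytes even though no Windows machine is involved now.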

So I'm still completely confused. I'm obviously missing something but
I've no idea where to look.

Cheers, Dave

_______________________________________________
templates mailing list
templates@template-toolkit.org
http://mail.template-toolkit.org/mailman/listinfo/templates
