Re: [Templates] UTF-8 and detecting encoding.

Bill Moseley Wed, 25 Jan 2006 17:06:57 -0800

On Wed, Jan 25, 2006 at 12:37:54PM -0800, Tatsuhiko Miyagawa wrote:
> 
> This is not a direct answer to your question, but my
> Template::Provider::Encoding module will help your situation. When you
> use it with Stash::ForceUTF8, you don't have to care about UTF-8
> auto-upgrading problem.


Thanks.  I like this idea as it forces the encoding to be defined in
the templates.  Might be nice to be able to specify something other
than utf8 as the default encoding if no encoding is specified in the
template, though.

Since Template::Provider::Encoding calls Template::Provider::_load,
should it not check for the utf8 flag before trying to decode the
result?  If a template had a BOM then the returned data would already
be decoded.

The other option would be to set UNICODE => 0, but that would not
handle the case of a scalar being passed in that was already utf8.


Now, if there was a module that prevented people from pasting from MS
Word.



A few comments/questions about TT's handling of encoding.  Please
correct me if I'm wrong about anything.


Template::Provider will attempt to determine the encoding by BOM for
templates supplied by file name or by handle.  Scalar refs are not
touched, so they need to be correctly decoded before being passed to
process().

This BOM detection happens automatically for perl > 5.007.

There's a "UNICODE" option to the provider, so this feature can be
disabled.  It seems that this option is not currently documented
(going by my quick grep).

Obviously, you need an editor or some way to write the BOM to all the
template files to use this feature.


Now:

- If a BOM is not found then the text is left alone.  It might be nice
to specify a default encoding so that if no BOM is found then the
text is still decoded instead of left as raw data.

So, in my case I could specify cp1252 and if UTF8 is not detected by
BOM then it is assumed that it's 1252 and then converted to a perl
string.

- I also wonder if _decode_unicode should just return if the input
text is already flagged as utf8.  This would be useful when supplying
a file handle that already has a PerlIO layer set.  Currently if you
pass in a file handle with a :utf8 layer set, and the file also
contains a BOM, you will get:

  Cannot decode string with wide characters at /usr/lib/perl/5.8/Encode.pm line 166, <$fh> chunk 1.
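
To make the two suggestions above concrete, here is a rough sketch of
the behaviour I have in mind.  It is not TT's actual _decode_unicode
code; decode_template_text() and its $default argument are made-up
names for illustration, and it only uses the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper, not part of Template Toolkit: decode template
# text by BOM, falling back to a caller-supplied default encoding.
sub decode_template_text {
    my ($text, $default) = @_;

    # Already a character string (e.g. read through a :utf8 layer):
    # do not decode again, just drop a leading BOM character if present.
    if (utf8::is_utf8($text)) {
        $text =~ s/^\x{FEFF}//;
        return $text;
    }

    # BOM signatures, longest first so that the UTF-32LE BOM is not
    # mistaken for the UTF-16LE one.
    my @boms = (
        [ "\x00\x00\xFE\xFF" => 'UTF-32BE' ],
        [ "\xFF\xFE\x00\x00" => 'UTF-32LE' ],
        [ "\xEF\xBB\xBF"     => 'UTF-8'    ],
        [ "\xFE\xFF"         => 'UTF-16BE' ],
        [ "\xFF\xFE"         => 'UTF-16LE' ],
    );

    for my $entry (@boms) {
        my ($bom, $encoding) = @$entry;
        if (index($text, $bom) == 0) {
            # Strip the BOM bytes, then decode the rest.
            my $body = substr($text, length $bom);
            return decode($encoding, $body);
        }
    }

    # No BOM found: decode with the default encoding if one was given,
    # otherwise leave the raw bytes alone (the current behaviour).
    return defined $default ? decode($default, $text) : $text;
}
```

With something like this, decode_template_text($bytes, 'cp1252') would
treat BOM-less input as cp1252, and text that is already flagged as
utf8 would pass through without triggering the "wide characters"
error.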



Oh, BTW.  Isn't this supposed to be correct according to the IO::File
docs?

    $ perl -MIO::File -le "IO::File->new('utf8.html', 'r')->binmode(':utf8')"
    usage $fh->binmode([LAYER]) at -e line 1

This works, though:

    binmode($fh, ':utf8')
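
For completeness, a small self-contained sketch of that workaround:
open the file with IO::File, then apply the layer with the binmode
builtin rather than the method.  The temporary file and its contents
are just made up for the example.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);

# Write a small UTF-8 encoded file to read back (temporary file,
# removed automatically at exit).
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);                     # write raw bytes
print {$out} "caf\xC3\xA9\n";      # "cafe" with e-acute, as UTF-8 octets
close $out or die "close failed: $!";

# Open with IO::File, then set the layer via the builtin.
my $fh = IO::File->new($path, 'r') or die "Cannot open $path: $!";
binmode($fh, ':utf8');

# The line now comes back as a character string, not raw octets.
my $line = <$fh>;
chomp $line;
$fh->close;
```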



-- 
Bill Moseley
[EMAIL PROTECTED]


_______________________________________________
templates mailing list
[email protected]
http://lists.template-toolkit.org/mailman/listinfo/templates
