On Tue, Aug 10, 2010 at 12:44 AM, Johan Vromans <[email protected]> wrote:

> Bill Moseley <[email protected]> writes:
>
> > I'm curious about that. What are the portability issues? Are you
> > rendering to browsers that do not support, say, utf8?
>
> No, it's a server problem, depending on version and configuration
> options. See e.g. the Apache AddDefaultCharset config option. As a
> result, HTML served may be provided with a 'charset=iso-8859-1' or
> 'charset=utf-8'. Or none. Sometimes the charset <meta> tag is obeyed,
> sometimes it is not.
>

It's a server configuration error.  If you are serving static files, make
sure they are encoded correctly and then set AddDefaultCharset to match.
That can be done in .htaccess if you don't have access to the server
config.  For dynamic content, set a Content-Type header with charset.

If you have characters in Perl and then you send them over the wire as
octets then they have to be encoded into something, right?  And if you
encode you must say what the charset is or else the octets are just a string
of bits to the client.

In other words, your documents are encoded in something, so you need to get
the web server to tell the client what that encoding is.
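To make the point about octets concrete, here is a minimal sketch (the
sample string and the hex_octets helper are just illustrations, not
anything from your setup).  It shows the same characters turning into
different octets depending on the encoding, which is why the client has
to be told which one was used:

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string with one non-ASCII character (U+00E9, e-acute).
my $chars = "caf\x{e9}";

# The same characters become different octets under different encodings.
my $utf8   = encode('UTF-8',      $chars);
my $latin1 = encode('ISO-8859-1', $chars);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_octets {
    my ($octets) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $octets;
}

print 'UTF-8:      ', hex_octets($utf8),   "\n";   # 63 61 66 c3 a9
print 'ISO-8859-1: ', hex_octets($latin1), "\n";   # 63 61 66 e9
```

A client told "iso-8859-1" but handed the first set of octets would show
two junk characters where the e-acute should be.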

Anyway, I'm wondering if the template is the correct place to do what you're
asking.  It makes sense to "escape" < and > in the templates as they have
special meaning, but it seems like you really want to *encode* the entire HTML
response content into a given charset (which you should always do anyway).

So, after calling process(), you Encode the output into the encoding you want
to send, which must agree with what the web server is saying.  Encode will
even do your entities, if you really want to encode to ASCII:

$ echo "hello is привет" | perl -MEncode -lne 'print Encode::encode(
"ascii", Encode::decode_utf8($_), Encode::HTMLCREF )'
hello is &#1087;&#1088;&#1080;&#1074;&#1077;&#1090;

But, again, the client needs to know what encoding your content is encoded
in, so you might as well encode to utf8 and just tell the client it's utf8
instead of ascii.

$ echo "hello is привет" | perl -MEncode -lne 'print Encode::encode( "utf8",
Encode::decode_utf8($_), Encode::HTMLCREF )'
hello is привет
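
Put together as a script, the idea is roughly the following.  This is
only a sketch: render_page is a made-up stand-in for whatever character
string process() gives you, and build_response is likewise a
hypothetical name.  The point is that the encode step happens once, on
the finished page, and the header names the same charset:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for the template step: in a real application this
# would be the character string that process() produced.
sub render_page {
    my ($word) = @_;
    return "<p>hello is $word</p>";
}

# Hypothetical helper: encode the finished page once and declare the same
# charset in the header.  Encode::HTMLCREF turns characters the target
# charset cannot represent into numeric entities.
sub build_response {
    my ($html, $charset) = @_;
    my $body   = encode($charset, $html, Encode::HTMLCREF());
    my $header = "Content-Type: text/html; charset=$charset";
    return ($header, $body);
}

# "privet" in Cyrillic, written with escapes so the source stays ASCII.
my $word = "\x{43f}\x{440}\x{438}\x{432}\x{435}\x{442}";

for my $charset ('UTF-8', 'ascii') {
    my ($header, $body) = build_response(render_page($word), $charset);
    print "$header\n\n$body\n\n";
}
```

With 'ascii' the body comes out with &#1087; style entities; with
'UTF-8' nothing needs a fallback and the characters go out as UTF-8
octets.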


> RedHat started adding 'AddDefaultCharset UTF-8' a couple of years ago to
> the distributed server configs. Not funny.
>

That seems like a reasonable default.  If files are ASCII on disk then they
are fine.  And utf8 would be a good bet otherwise, as the locale was probably
utf8, too.


>
> Therefore I adopted the habit to always use &entities; for anything
> non-ASCII.
>

Seems so last century. ;)




-- 
Bill Moseley
[email protected]
_______________________________________________
templates mailing list
[email protected]
http://mail.template-toolkit.org/mailman/listinfo/templates
