Aloha!
I've done some searching online regarding character encoding and
UTF-8 support within mod_python, but I haven't been able to get it
working correctly.
Here's the situation: I have changed my site.py in Python 2.4.3 to
use "utf-8" as the default encoding. I have a database with correct
Unicode data in it. When I run my routines from the interpreter, I
get correct unicode objects out of the database.
When I run these exact routines from inside a PSP page, the unicode
object comes back as if the UTF-8 bytes had been decoded as latin1.
Note that, as the examples below show, I am using identical MySQLdb
connection settings in both cases.
I am still unclear as to where exactly this is happening inside
mod_python, and any advice toward a solution would be greatly
appreciated. Developers need to be able to provide UTF-8 support
for mod_python to gain traction in enterprise applications.
If this is a user error on my part, I'd greatly appreciate being
pointed to a proper solution.
Best,
earle.
------ THIS WORKS FROM WITHIN THE INTERPRETER:
(conn, cursor) = util.DBConnect(MySQLdb.cursors.DictCursor)
cursor.execute("SELECT * from unicode_test")
items = cursor.fetchall()
for item in items:
    print item,
# RESULTS: correct unicode:
# ([EMAIL PROTECTED] 14:55 266) python utest.py
# {'data': u'\u9577\u5ca1', 'id': 35L}
# {'data': u'\u9577\u5ca1', 'id': 36L}
------- THIS DOES NOT WORK FROM .PSP: it produces a unicode object
whose code points are the UTF-8 bytes of the correct string, decoded
as latin1 (see below):
<%
req.content_type = 'text/html; charset=UTF-8'
%>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<%@ include file="include/webglobals.psps" %>
<%
(conn, cursor) = util.DBConnect(MySQLdb.cursors.DictCursor)
req.write("MYSQL CONNECTION CHARSET: ")
req.write(conn.character_set_name())
req.write("<p/>")
req.write("SYS.DEFAULTENCODING: ")
req.write(sys.getdefaultencoding())
req.write("<p/>")
res = cursor.execute("SELECT * from unicode_test")
items = cursor.fetchall()
for i in items:
    #
    req.write("DATA: ")
    req.write(i['data'])
    req.write(", item: ")
%>
<%= i %>
<%
    req.write(", BYTES: ")
    req.write(i['data'].encode('unicode_escape'))
    req.write("<p/>")
    #
# end: items
req.write("SHOULD LOOK LIKE THIS: %s" % ( u'\u9577\u5ca1', ))
%>
</body>
</html>
---- RESULTS:
MYSQL CONNECTION CHARSET: utf8
SYS.DEFAULTENCODING: utf-8
DATA: 長岡, item: {'data': u'\xe9\x95\xb7\xe5\xb2\xa1', 'id': 35L} , BYTES: \xe9\x95\xb7\xe5\xb2\xa1
DATA: 長岡, item: {'data': u'\xe9\x95\xb7\xe5\xb2\xa1', 'id': 36L} , BYTES: \xe9\x95\xb7\xe5\xb2\xa1
SHOULD LOOK LIKE THIS: 長岡
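To make the difference between the two results easier to see, here is
a minimal sketch that lists the code points of each value. The helper
name code_points is made up for illustration; the two literals are
copied from the interpreter and PSP outputs in this post. The correct
value has two code points above U+00FF, while the PSP value has six
code points that are all at or below U+00FF, which is what raw bytes
look like after being read as latin1.

```python
def code_points(text):
    """Return the code points of a unicode string as 'U+XXXX' labels."""
    # code_points is a hypothetical helper, not part of mod_python or MySQLdb.
    return ['U+%04X' % ord(ch) for ch in text]

expected = u'\u9577\u5ca1'               # value returned in the interpreter
observed = u'\xe9\x95\xb7\xe5\xb2\xa1'   # value seen inside the PSP page

print(code_points(expected))
# ['U+9577', 'U+5CA1']
print(code_points(observed))
# ['U+00E9', 'U+0095', 'U+00B7', 'U+00E5', 'U+00B2', 'U+00A1']
```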
------- Notice that if I latin1 decode the -correct- unicode object,
I get exactly the unicode object that appears inside the PSP page:
>>> u'\u9577\u5ca1'.decode('latin1')
u'\xe9\x95\xb7\xe5\xb2\xa1'
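The one-liner above only works because my site.py sets the default
encoding to utf-8: calling .decode() on a unicode object in Python 2
first encodes it with the default encoding, and with the stock ascii
default it would raise UnicodeEncodeError instead. Here is a sketch of
the same round trip with the encode step spelled out, plus the reverse
direction. The function names mangle and repair are invented for this
example, and the reverse step is only a way to confirm the diagnosis,
not a claim about where mod_python or MySQLdb goes wrong.

```python
def mangle(text):
    """Encode to UTF-8, then misread those bytes as latin1."""
    # Hypothetical helper reproducing the suspected mis-decoding.
    return text.encode('utf-8').decode('latin-1')

def repair(text):
    """Undo mangle(): recover the bytes via latin1, then decode as UTF-8."""
    # Hypothetical helper; only valid for strings that were mangled this way.
    return text.encode('latin-1').decode('utf-8')

correct = u'\u9577\u5ca1'
mangled = mangle(correct)

# Same object that shows up inside the PSP page.
assert mangled == u'\xe9\x95\xb7\xe5\xb2\xa1'
# Reversing the two steps gets the original string back.
assert repair(mangled) == correct
```

If repair() returns the expected string for the values coming out of
the PSP page, that would suggest the database bytes arrive intact and
are only being decoded with the wrong charset somewhere along the way.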