Hello!

I was trying to "unicodify" TurboGears, starting with the wiki20 tutorial.
Here are the results.
(Please note that I follow the "unicode everywhere" principle here.)

Feel free to do what you want with my code (submit relevant tickets,
patches etc.)


Wiki20 code itself
==================
1. content = content.encode('utf8') must be removed, because Kid handles
the unicode->utf8 encoding itself.
2. The WikiWords regular expression obviously doesn't work with non-Latin
alphabets, but I didn't have time to fix it. I created a test page with
a unicode name and data manually, but that is probably not enough to
catch all unicode-related bugs...
3. wiki20.tgz is not in sync with the latest TurboGears and tutorial!
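
For item 2, here is a minimal sketch of one way a WikiWord check could be
made alphabet-independent. It is not the tutorial's code and is written for
Python 3 (where every str is unicode); the names is_wikiword and
find_wikiwords are made up for illustration. Instead of an [A-Z]-style
character class it relies on str.isupper() and str.islower(), which know
about Cyrillic and other scripts.

```python
import re

# \w is Unicode-aware for str patterns in Python 3, so this splits text
# into candidate words in any alphabet.
WORD = re.compile(r"\w+")


def is_wikiword(word):
    """Return True if word is made of two or more "humps".

    A hump is one uppercase letter followed by at least one lowercase
    letter, e.g. "Front" + "Page".
    """
    humps = 0
    i = 0
    n = len(word)
    while i < n:
        # Each hump must start with an uppercase letter.
        if not word[i].isupper():
            return False
        i += 1
        start = i
        # Consume the lowercase tail of the hump.
        while i < n and word[i].islower():
            i += 1
        if i == start:
            return False  # uppercase letter with no lowercase tail
        humps += 1
    return humps >= 2


def find_wikiwords(text):
    """Return all WikiWords found in text, in order of appearance."""
    return [w for w in WORD.findall(text) if is_wikiword(w)]
```

This definition is stricter than some wikis use (digits and runs of capitals
are rejected), so treat it as a starting point only.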


Kid
===
Kid was the easiest to deal with. It just accepted unicode and
automatically encoded html to utf8.


CherryPy
========

Decoding incoming arguments
---------------------------
I had to add the following lines to dev.cfg (so that CP would decode
incoming parameters to unicode):
    decodingFilter.on = True
    decodingFilter.encoding = "utf8"
(Note: encodingFilter should NOT be enabled, because of Kid.)

Decoding and encoding urls
--------------------------
Percent-encoded urls (like
http://localhost:8080/%D0%A2%D0%B5%D1%81%D1%82) were processed from/to
utf8 by CP automatically. But the unicode<->utf8 conversion had to be done manually:
...
    def default(self, pagename):
        pagename = unicode(pagename,'utf8')
        return self.index(pagename)
...
and
...
        raise cherrypy.HTTPRedirect("/%s" % pagename.encode('utf8'))
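
To make the two conversion layers explicit, here is a small standalone
sketch of the round trip for the example url above. It is written for
Python 3 and uses only urllib.parse, not CherryPy; the function names are
hypothetical. In the setup described above CP already did the
percent-decoding step, so only the utf8 decode/encode had to be added by hand.

```python
from urllib.parse import quote, unquote_to_bytes


def path_to_pagename(path_segment):
    """Percent-decode a url path segment, then decode the utf8 bytes."""
    raw = unquote_to_bytes(path_segment)  # e.g. b'\xd0\xa2\xd0\xb5...'
    return raw.decode("utf8")


def pagename_to_path(pagename):
    """Encode a unicode page name to utf8, then percent-encode it."""
    return "/" + quote(pagename.encode("utf8"))
```

With these, path_to_pagename("%D0%A2%D0%B5%D1%81%D1%82") gives the page name
"Тест", and pagename_to_path("Тест") gives back the same percent-encoded path.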

Decoding and encoding cookies (turbogears.flash)
------------------------------------------------
I wanted to translate "Changes saved!" to Russian. To do this I had to
make turbogears.flash work with unicode parameters. I patched
TurboGears, because just encoding the message before calling flash
doesn't work; see
http://groups.google.com/group/turbogears/browse_frm/thread/13539eb82b6fa60c.
Also take a look at http://www.cherrypy.org/ticket/353.

--- controllers.py.original     Thu Oct 20 04:12:42 2005
+++ controllers.py      Thu Oct 20 04:16:14 2005
@@ -128,6 +128,7 @@

 def flash(message):
     """Set a message to be displayed in the browser on next page display"""
+    message = message.encode('utf8')
     cherrypy.response.simpleCookie['tg_flash'] = message
     cherrypy.response.simpleCookie['tg_flash']['path'] = '/'

@@ -136,6 +137,7 @@
     after retrieval."""
     try:
         message = cherrypy.request.simpleCookie["tg_flash"].value
+        message = unicode(message,'utf8')
         cherrypy.response.simpleCookie["tg_flash"] = ""
         cherrypy.response.simpleCookie["tg_flash"]['expires'] = 0
         cherrypy.response.simpleCookie['tg_flash']['path'] = '/'
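
The same encode-on-write, decode-on-read idea can be tried outside
TurboGears. The sketch below is an assumption-laden variant, not the patch
above: it is written for Python 3 with http.cookies.SimpleCookie, and it
percent-encodes the utf8 bytes so that the cookie value stays pure ASCII
(the patch stores the raw utf8 bytes instead). set_flash and get_flash are
hypothetical names.

```python
from http.cookies import SimpleCookie
from urllib.parse import quote, unquote


def set_flash(cookie, message):
    """Store a unicode message in the tg_flash cookie as ASCII."""
    # utf8-encode first, then percent-encode so only ASCII is stored.
    cookie["tg_flash"] = quote(message.encode("utf8"))
    cookie["tg_flash"]["path"] = "/"


def get_flash(cookie):
    """Read the tg_flash cookie back into a unicode message."""
    return unquote(cookie["tg_flash"].value, encoding="utf8")


# Simulate one response/request cycle.
message = "Изменения сохранены!"
response = SimpleCookie()
set_flash(response, message)
header_value = response["tg_flash"].OutputString()

request = SimpleCookie()
request.load(header_value)
restored = get_flash(request)
```

After the cycle, restored should equal the original message, and
header_value contains only ASCII characters.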


SQLObject
=========
Changing StringCol to UnicodeCol doesn't help, because SQLObject won't
automatically encode queries (e.g. Page.byPagename wouldn't work
properly).

Instead I patched SQLObject as suggested by Stuart Bishop:
http://article.gmane.org/gmane.comp.python.sqlobject/2027

--- dbconnection.py.original    Thu Oct 20 04:43:08 2005
+++ dbconnection.py     Wed Oct 19 23:38:14 2005
@@ -292,6 +292,11 @@
     def _executeRetry(self, conn, cursor, query):
         if self.debug:
             self.printDebug(conn, query, 'QueryR')
+        if isinstance(query, unicode):
+            query = query.encode('utf8')
+        else:
+            # raise UnicodeError if it is not valid utf8 already
+            query.decode('utf8')
         return cursor.execute(query)

     def _query(self, conn, s):


--- col.py.original     Thu Oct 20 04:45:30 2005
+++ col.py      Thu Oct 20 04:46:50 2005

@@ -503,17 +485,15 @@
     def to_python(self, value, state):
         if value is None:
             return None
-        if isinstance(value, unicode):
-           return value.encode("ascii")
+        if isinstance(value, str):
+           return unicode(value,"utf8")
         return value

     def from_python(self, value, state):
         if value is None:
             return None
-        if isinstance(value, str):
-            return value
         if isinstance(value, unicode):
-            return value.encode("ascii")
+            return value.encode("utf8")
         return value

 class SOStringCol(SOStringLikeCol):

This is probably not a very good way to fix SQLObject, because _every_
query is encoded into utf8. There was a discussion on the mailing list,
but it didn't reach any conclusion:
http://thread.gmane.org/gmane.comp.python.sqlobject/2156
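
For experimenting with the query-encoding rule from the _executeRetry patch
without a database, here is a standalone sketch. It is written for Python 3,
where the old unicode/str pair corresponds to str/bytes, and prepare_query is
a made-up name, not an SQLObject function.

```python
def prepare_query(query):
    """Return query as utf8 bytes.

    Text queries are encoded; byte queries are passed through, but only
    after checking that they already are valid utf8.
    """
    if isinstance(query, str):
        return query.encode("utf8")
    # Raises UnicodeDecodeError (a UnicodeError) on invalid utf8.
    query.decode("utf8")
    return query
```

The validation branch is what makes a wrongly encoded byte string fail
loudly before it reaches the database instead of being stored as garbage.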


FormEncode
==========
validators.StringBoolean doesn't work when parameters coming from the
browser are decoded using CherryPy's decodingFilter. I patched it:

--- validators.py.original      Thu Oct 20 04:29:48 2005
+++ validators.py       Thu Oct 20 04:31:08 2005
@@ -1516,7 +1516,7 @@
     messages = { "string" : "Value should be %(true)r or %(false)r" }

     def _to_python(self, value, state):
-        if isinstance(value, str):
+        if isinstance(value, basestring):
             value = value.strip().lower()
             if value in self.true_values:
                 return True



Alexey
