The first thing is to see what MySQL thinks its server and client
character sets are:

$ paster shell development.ini
>>> import pylons.database
>>> engine = pylons.database.create_engine()
>>> engine.execute("SHOW VARIABLES").fetchall()

Look for the variables that start with "character_set_" and
"collation_".  On my workstation they show:

 (u'character_set_client', u'utf8'),
 (u'character_set_connection', u'utf8'),
 (u'character_set_database', u'utf8'),
 (u'character_set_filesystem', u'binary'),
 (u'character_set_results', u'utf8'),
 (u'character_set_server', u'utf8'),
 (u'character_set_system', u'utf8'),
 (u'character_sets_dir', u'/usr/share/mysql/charsets/'),
 (u'collation_connection', u'utf8_general_ci'),
 (u'collation_database', u'utf8_general_ci'),
 (u'collation_server', u'utf8_general_ci'),

Then you have a choice of "?use_unicode=1" (a MySQLdb option in the DSN) or:

    create_engine(dsn, convert_unicode=True, encoding="utf8")

SQLAlchemy wants to standardize on 'convert_unicode' because it's the
same for all engines, but currently there are some problems with that:

1) You have to know what character set the MySQL connection is.

2) If you're using pylons.database.session_context, you can't add
arguments to the create_engine call without hacking pylons.database or
reimplementing the functions yourself.  Hopefully this will be
improved in a future version of Pylons -- as soon as we figure out
what to do.

3) convert_unicode works at the Table level, so raw engine.execute()
queries will still be 'str' strings.

* * * *
I have MySQLdb 1.2.1 because Kubuntu is still on the older version and
I can't get the newer one to compile:

$ easy_install MySQL-python
  File "/tmp/easy_install-p8zEac/MySQL-python-1.2.2/setup_posix.py",
line 24, in mysql_config
EnvironmentError: mysql_config not found

Thank you, now what is "mysql_config"?

* * * *
With autoloaded tables you'll get unicode values if you use either
use_unicode or convert_unicode, or str values if you don't.  With
declared columns you'll also have to use the Unicode column type.

I had to use a my.cnf file to get the client and server charsets to
match.  Otherwise I was getting:

 (u'character_set_client', u'latin1'),
 (u'character_set_connection', u'latin1'),
 (u'character_set_database', u'utf8'),
 (u'character_set_filesystem', u'binary'),
 (u'character_set_results', u'latin1'),
 (u'character_set_server', u'utf8'),
 (u'character_set_system', u'utf8'),
 (u'character_sets_dir', u'/usr/share/mysql/charsets/'),
 (u'collation_connection', u'latin1_swedish_ci'),
 (u'collation_database', u'utf8_general_ci'),
 (u'collation_server', u'utf8_general_ci'),

The my.cnf is a symlink to /etc/mysql/my.cnf and contains among other things:

    [client]
    default-character-set  = utf8

    [mysqld]
    character-set-server = utf8
    default-character-set = utf8

The DSN looks like this:

    sqlalchemy.dbapi =
mysql://...?read_default_file=%(here)s/my.cnf&use_unicode=1

Unfortunately MySQLdb does not raise an exception if the file is
missing, so it's worth creating a unit test that checks whether all
the character sets are correct.  I haven't tried that yet.

* * * *
If you have existing records in the database which are in a different
charset than what the column purports to be, and you want to keep the
actual values as is, change the column type to BINARY in MySQL, then
change it to the desired charset:

http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html

If you want to change the data to the charset the column purports to
be, write a Python program that connects WITHOUT Unicode support, read
the values, decode them from the source charset, encode them into the
destination charset, and write them back to the database.

With "mysql://...?use_unicode=1" and reflected tables, I get Unicode
values.  Without use_unicode, I get string values.  If I hack
pylons.database to pass 'convert_unicode=True' to create_engine() in a
way that it'll be propagated to the session_context,  I get Unicode
values again.  Curiously, in all three cases the column instances are
MSString with convert_unicode=False:

Column(u'name',MSString(length=255,convert_unicode=False),nullable=False)

* * * *
The following didn't work:

"mysql://...?use_unicode=1&charset=utf8"

sqlalchemy.exceptions.DBAPIError: (Connection failed)
(OperationalError) (2019, 'Can\'t initialize character set utf8"
(path: /usr/share/mysql/charsets/)')

This may be because I'm using the older MySQLdb, or maybe it's a bug
Ubuntu's packaging of MySQL.

-- 
Mike Orr <[EMAIL PROTECTED]>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"pylons-discuss" group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to