Re: SQLAlchemy + MySQL + UTF-8 support - how?

David Smith Tue, 29 May 2007 19:03:06 -0700

Hi Mike,

Thank you for all the details. I've seen the merits of the arguments and I
guess the conclusion is these points:


1) The MySQL specific option of use_unicode is not needed anymore. The MySQL
specific charset option also is deprecated.

2) The SQLAlchemy convert_unicode option is recommended, and is needed when
you're autoloading tables and you want to get unicode objects. It obsoletes
the use_unicode option. Correspondingly, the SQLAlchemy encoding option
obsoletes the MySQL specific charset option.

3) No options will help you when doing raw engine.execute operations; you
must convert for yourself.

4) You should use Unicode fields inside of your application in order for
your internal encoding/decoding logic to not depend on the database
configuration. I.e., whether you're using convert_unicode or not, when
using Unicode field you will always get a unicode object from SQLAlchemy.

5) Pylons does not properly pass the convert_unicode option to
create_engine. This has been discussed on this list in many forms, most
recently by Mike himself, and everyone seems to be agreed it needs fixing
and soon. I've created ticket #250 for this.
 
There was one point you made that I believe is inaccurate and so did not
include it in the above. Where you said that if you want to use the
convert_unicode option, you must know what encoding the database is in, but
my tests don't suggest that. My MySQL settings are currently:

character_set_client latin1
character_set_connection latin1
character_set_database latin1
character_set_filesystem binary
character_set_results latin1
character_set_server latin1
character_set_system utf8
character_sets_dir /usr/share/mysql/charsets/
collation_connection latin1_swedish_ci
collation_database latin1_swedish_ci
collation_server latin1_swedish_ci

which are the defaults on Debian, and I'm happy in my ignorance. AFAIK as
long as the character_set_system is utf-8, the data itself is stored in
utf-8, and other variables like client and server encoding can be changed
at will to force MySQL (client or server respectively) to convert from
utf-8 to whatever else.

I've verified that with the above MySQL settings and only using Unicode
field types, no covert_unicode or use_unicode or any other settings, the
data in the DB is utf-8 and I get unicode objects and everything is groovy.

That suggests to me that if you really care, you should set the backend
encoding to utf-8 or whatever with the SQLAlchemy encoding settings, but it
is not necessary nor especially recommended.

Does that tie up the loose ends on this discussion?

Thanks again,
David

Mike Orr wrote:

> 
> The first thing is to see what MySQL thinks its server and client
> character sets are:
> 
> $ paster shell development.ini
>>>> import pylons.database
>>>> engine = pylons.database.create_engine()
>>>> engine.execute("SHOW VARIABLES").fetchall()
> 
> Look for the variables that start with "character_set_" and
> "collation_".  On my workstation they show:
> 
>  (u'character_set_client', u'utf8'),
>  (u'character_set_connection', u'utf8'),
>  (u'character_set_database', u'utf8'),
>  (u'character_set_filesystem', u'binary'),
>  (u'character_set_results', u'utf8'),
>  (u'character_set_server', u'utf8'),
>  (u'character_set_system', u'utf8'),
>  (u'character_sets_dir', u'/usr/share/mysql/charsets/'),
>  (u'collation_connection', u'utf8_general_ci'),
>  (u'collation_database', u'utf8_general_ci'),
>  (u'collation_server', u'utf8_general_ci'),
> 
> Then you have a choice of "?use_unicode=1" (a MySQLdb option in the DSN)
> or:
> 
>     create_engine(dsn, convert_unicode=True, encoding="utf8")
> 
> SQLAlchemy wants to standardize on 'convert_unicode' because it's the
> same for all engines, but currently there are some problems with that:
> 
> 1) You have to know what character set the MySQL connection is.
> 
> 2) If you're using pylons.database.session_context, you can't add
> arguments to the create_engine call without hacking pylons.database or
> reimplementing the functions yourself.  Hopefully this will be
> improved in a future version of Pylons -- as soon as we figure out
> what to do.
> 
> 3) convert_unicode works at the Table level, so raw engine.execute()
> queries will still be 'str' strings.
> 
> * * * *
> I have MySQLdb 1.2.1 because Kubuntu is still on the older version and
> I can't get the newer one to compile:
> 
> $ easy_install MySQL-python
>   File "/tmp/easy_install-p8zEac/MySQL-python-1.2.2/setup_posix.py",
> line 24, in mysql_config
> EnvironmentError: mysql_config not found
> 
> Thank you, now what is "mysql_config"?
> 
> * * * *
> With autoloaded tables you'll get unicode values if you use either
> use_unicode or convert_unicode, or str values if you don't.  With
> declared columns you'll also have to use the Unicode column type.
> 
> I had to use a my.cnf file to get the client and server charsets to
> match.  Otherwise I was getting:
> 
>  (u'character_set_client', u'latin1'),
>  (u'character_set_connection', u'latin1'),
>  (u'character_set_database', u'utf8'),
>  (u'character_set_filesystem', u'binary'),
>  (u'character_set_results', u'latin1'),
>  (u'character_set_server', u'utf8'),
>  (u'character_set_system', u'utf8'),
>  (u'character_sets_dir', u'/usr/share/mysql/charsets/'),
>  (u'collation_connection', u'latin1_swedish_ci'),
>  (u'collation_database', u'utf8_general_ci'),
>  (u'collation_server', u'utf8_general_ci'),
> 
> The my.cnf is a symlink to /etc/mysql/my.cnf and contains among other
> things:
> 
>     [client]
>     default-character-set  = utf8
> 
>     [mysqld]
>     character-set-server = utf8
>     default-character-set = utf8
> 
> The DSN looks like this:
> 
>     sqlalchemy.dbapi =
> mysql://...?read_default_file=%(here)s/my.cnf&use_unicode=1
> 
> Unfortunately MySQLdb does not raise an exception if the file is
> missing, so it's worth creating a unit test that checks whether all
> the character sets are correct.  I haven't tried that yet.
> 
> * * * *
> If you have existing records in the database which are in a different
> charset than what the column purports to be, and you want to keep the
> actual values as is, change the column type to BINARY in MySQL, then
> change it to the desired charset:
> 
> http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html
> 
> If you want to change the data to the charset the column purports to
> be, write a Python program that connects WITHOUT Unicode support, read
> the values, decode them from the source charset, encode them into the
> destination charset, and write them back to the database.
> 
> With "mysql://...?use_unicode=1" and reflected tables, I get Unicode
> values.  Without use_unicode, I get string values.  If I hack
> pylons.database to pass 'convert_unicode=True' to create_engine() in a
> way that it'll be propagated to the session_context,  I get Unicode
> values again.  Curiously, in all three cases the column instances are
> MSString with convert_unicode=False:
> 
> Column(u'name',MSString(length=255,convert_unicode=False),nullable=False)
> 
> * * * *
> The following didn't work:
> 
> "mysql://...?use_unicode=1&charset=utf8"
> 
> sqlalchemy.exceptions.DBAPIError: (Connection failed)
> (OperationalError) (2019, 'Can\'t initialize character set utf8"
> (path: /usr/share/mysql/charsets/)')
> 
> This may be because I'm using the older MySQLdb, or maybe it's a bug
> Ubuntu's packaging of MySQL.
> 



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"pylons-discuss" group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: SQLAlchemy + MySQL + UTF-8 support - how?

Reply via email to