Re: SQLAlchemy + MySQL + UTF-8 support - how?
@mike orr $ sudo apt-get install command-not-found $ mysql_config The program 'mysql_config' is currently not installed. You can install it by typing: apt-get install libmysqlclient15-dev $ sudo apt-get install libmysqlclient15-dev or $ sudo apt-get install apt-file $ apt-file update $ apt-file search mysql_config libmysqlclient15-dev: usr/bin/mysql_config ... The problem is that a default repository install of mysql doesn't include the development files, and mysql_config is really just a program to point to the right configuration options and development headers for things that need to link to them (like the compilation of mysqldb). Would be nicer if it said couldn't find mysql_config; do you have the mysql client development headers installed? but I'm not sure it's worth submitting a bugreport/patch just for that. ... I have MySQLdb 1.2.1 because Kubuntu is still on the older version and I can't get the newer one to compile: $ easy_install MySQL-python File /tmp/easy_install-p8zEac/MySQL-python-1.2.2/setup_posix.py, line 24, in mysql_config EnvironmentError: mysql_config not found Thank you, now what is mysql_config? ... Mike Orr [EMAIL PROTECTED] --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups pylons-discuss group. To post to this group, send email to pylons-discuss@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en -~--~~~~--~~--~--~---
Re: SQLAlchemy + MySQL + UTF-8 support - how?
On 5/29/07, David Smith [EMAIL PROTECTED] wrote: There was one point you made that I believe is inaccurate and so did not include it in the above. Where you said that if you want to use the convert_unicode option, you must know what encoding the database is in, but my tests don't suggest that. Oh, you're right. SQLAlchemy checks the character set of the connection. I thought of that after I sent my email. 2007-05-31 14:11:35,356 INFO sqlalchemy.engine.base.Engine.0x..8c show variables like 'character_set_results' 1) The MySQL specific option of use_unicode is not needed anymore. The MySQL specific charset option also is deprecated. I suppose. I'm still holding on to use_unicode for now though. The important thing is to use either convert_unicode or use_unicode. The MySQL charset option wouldn't work for me at all; it didn't recognize utf8 as a valid choice. -- Mike Orr [EMAIL PROTECTED] --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups pylons-discuss group. To post to this group, send email to pylons-discuss@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en -~--~~~~--~~--~--~---
Re: SQLAlchemy + MySQL + UTF-8 support - how?
jj, I actually agree strongly with you regarding the potential dangers. My counter argument is that if you specify the encoding of the database in the config file to a different encoding than is in the mysql config file, don't you risk double-encoding anyway? Now that the bug is fixed in MySQLdb, and no other database needs such options, I'd prefer to promote the Unicode field as the proper way forward. SQLAlchemy should, via the MySQLdb module, determine at runtime the encoding of the database and translate from that encoding to unicode objects as necessary (as should any other ORM). Cheers, David Shannon -jj Behrens wrote: I'm not necessarily disagreeing with you, but I do think it's really bad practice to have your database confused about the encoding of the data that it's storing. At one point, I had things messed up and the data in the database was double encoded. Sure, it worked as long as I was using SQLAlchemy, but if I ever decided to use something else, I would have been in really bad shape. Fortunately, they fixed that bug in MySQLdb. Happy Hacking! -jj --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups pylons-discuss group. To post to this group, send email to pylons-discuss@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en -~--~~~~--~~--~--~---
Re: SQLAlchemy + MySQL + UTF-8 support - how?
Hi David, On 5/29/07, David Smith [EMAIL PROTECTED] wrote: Max, I have no problems by only using Unicode field-types in my model; that is, my sqlalchemy connection string does not contain anything related to UTF-8. I haven't investigated my MySQL setup so thoroughly but it seems to work with MySQL 4 and 5 and stable and development versions of mysqldb. It is quite possible that the data being stored in the database is in iso-8859-1 or whatever MySQL uses by default but SQLAlchemy is automatically manipulating the binary stirngs into Unicode objects based on the encoding of the database. My exprience shows that it may be necessarily to specify encoding for mysql in dburi. The reason is that mysql server may be configured with default encoding that doesn't match the database you're connecting to. E.g. if you open mysql console and type \s you can see something like latin1 while database is utf8-encoded. In a situation like this you do need to tell db encoding explicitly. So I guess it makes sense to always be explicit here. Ideally, I'd like to see a best practice of, basically, stop worrying specifically about MySQL database encoding and all of its nastiness and simply use Unicode fields instead of String fields for data that holds natural language characters. You surely want to use Unicode instead of String. It will do the right thing (converting to and from unicode objects). Also, if the setup is wrong, you'll discover it sooner when it'll try to decode utf8 data assuming latin1 charset. Max. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups pylons-discuss group. To post to this group, send email to pylons-discuss@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en -~--~~~~--~~--~--~---
Re: SQLAlchemy + MySQL + UTF-8 support - how?
Hi Max, thanks for the quick response, Max Ischenko wrote: My exprience shows that it may be necessarily to specify encoding for mysql in dburi. The reason is that mysql server may be configured with default encoding that doesn't match the database you're connecting to. E.g. if you open mysql console and type \s you can see something like latin1 while database is utf8-encoded. In a situation like this you do need to tell db encoding explicitly. So I guess it makes sense to always be explicit here. I guess I should have been more explicit. The point is that it shouldn't matter if your database uses e.g. iso-8859-1 internally, as long as it can hold the bytes. UTF-8 is just another encoding, and the point is that it should be the job of SQLAlchemy (or insert your other platform's ORM) to convert into the database's encoding and back out to unencoded Python unicode objects. Thus, if you want the DB internal data to be stored in UTF-8, you should configure that in your database server settings, not in your application. Ideally, I'd like to see a best practice of, basically, stop worrying specifically about MySQL database encoding and all of its nastiness and simply use Unicode fields instead of String fields for data that holds natural language characters. You surely want to use Unicode instead of String. It will do the right thing (converting to and from unicode objects). Also, if the setup is wrong, you'll discover it sooner when it'll try to decode utf8 data assuming latin1 charset. This again is the same point as above; SQLAlchemy should determine that the DB is not utf-8 and is indeed latin1 or iso-8859-1 or whatever. Cheers, David --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups pylons-discuss group. To post to this group, send email to pylons-discuss@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en -~--~~~~--~~--~--~---
Re: SQLAlchemy + MySQL + UTF-8 support - how?
On 5/29/07, David Smith [EMAIL PROTECTED] wrote: jj, I actually agree strongly with you regarding the potential dangers. My counter argument is that if you specify the encoding of the database in the config file to a different encoding than is in the mysql config file, don't you risk double-encoding anyway? Double encoding should *theoretically* never happen because MySQLdb knows if it has a unicode object or a UTF-8 string. Now that the bug is fixed in MySQLdb, and no other database needs such options, I'd prefer to promote the Unicode field as the proper way forward. SQLAlchemy should, via the MySQLdb module, determine at runtime the encoding of the database and translate from that encoding to unicode objects as necessary (as should any other ORM). To tell you the truth, I don't feel strong either way. As long as *I* the application developer don't need to do the encoding, and as long as the database and I agree about the encoding, it doesn't matter to me who does it. All the database drivers SHOULD support encoding for you. After all, not everyone uses SQLAlchemy. Personally, I think this should have gone in the DBAPI spec. However, if someone wants to let SQLAlchemy do the work, I don't have any objections. Happy Hacking! -jj Shannon -jj Behrens wrote: I'm not necessarily disagreeing with you, but I do think it's really bad practice to have your database confused about the encoding of the data that it's storing. At one point, I had things messed up and the data in the database was double encoded. Sure, it worked as long as I was using SQLAlchemy, but if I ever decided to use something else, I would have been in really bad shape. Fortunately, they fixed that bug in MySQLdb. Happy Hacking! -jj -- http://jjinux.blogspot.com/ --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups pylons-discuss group. To post to this group, send email to pylons-discuss@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en -~--~~~~--~~--~--~---
Re: SQLAlchemy + MySQL + UTF-8 support - how?
On 5/29/07, David Smith [EMAIL PROTECTED] wrote: My exprience shows that it may be necessarily to specify encoding for mysql in dburi. The reason is that mysql server may be configured with default encoding that doesn't match the database you're connecting to. E.g. if you open mysql console and type \s you can see something like latin1 while database is utf8-encoded. In a situation like this you do need to tell db encoding explicitly. So I guess it makes sense to always be explicit here. I guess I should have been more explicit. The point is that it shouldn't matter if your database uses e.g. iso-8859-1 internally, as long as it can hold the bytes. UTF-8 is just another encoding, and the point is that it should be the job of SQLAlchemy (or insert your other platform's ORM) to convert into the database's encoding and back out to unencoded Python unicode objects. Thus, if you want the DB internal data to be stored in UTF-8, you should configure that in your database server settings, not in your application. Hmm. Not sure I agree completely. One database server supports multiple clients / databases which means every one may have different settings. Encoding spec is part of dburi syntax in sqlalchemy, along with other connections settings, such as username. This seems right to me. OTOH, the application doesn't need to know database encoding, it's just an implementation detail. UnicodeString gives you precisely this. You surely want to use Unicode instead of String. It will do the right thing (converting to and from unicode objects). Also, if the setup is wrong, you'll discover it sooner when it'll try to decode utf8 data assuming latin1 charset. This again is the same point as above; SQLAlchemy should determine that the DB is not utf-8 and is indeed latin1 or iso-8859-1 or whatever. I don't want the tool to outsmart me. ;) Max. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups pylons-discuss group. To post to this group, send email to pylons-discuss@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en -~--~~~~--~~--~--~---
Re: SQLAlchemy + MySQL + UTF-8 support - how?
The first thing is to see what MySQL thinks its server and client character sets are: $ paster shell development.ini import pylons.database engine = pylons.database.create_engine() engine.execute(SHOW VARIABLES).fetchall() Look for the variables that start with character_set_ and collation_. On my workstation they show: (u'character_set_client', u'utf8'), (u'character_set_connection', u'utf8'), (u'character_set_database', u'utf8'), (u'character_set_filesystem', u'binary'), (u'character_set_results', u'utf8'), (u'character_set_server', u'utf8'), (u'character_set_system', u'utf8'), (u'character_sets_dir', u'/usr/share/mysql/charsets/'), (u'collation_connection', u'utf8_general_ci'), (u'collation_database', u'utf8_general_ci'), (u'collation_server', u'utf8_general_ci'), Then you have a choice of ?use_unicode=1 (a MySQLdb option in the DSN) or: create_engine(dsn, convert_unicode=True, encoding=utf8) SQLAlchemy wants to standardize on 'convert_unicode' because it's the same for all engines, but currently there are some problems with that: 1) You have to know what character set the MySQL connection is. 2) If you're using pylons.database.session_context, you can't add arguments to the create_engine call without hacking pylons.database or reimplementing the functions yourself. Hopefully this will be improved in a future version of Pylons -- as soon as we figure out what to do. 3) convert_unicode works at the Table level, so raw engine.execute() queries will still be 'str' strings. * * * * I have MySQLdb 1.2.1 because Kubuntu is still on the older version and I can't get the newer one to compile: $ easy_install MySQL-python File /tmp/easy_install-p8zEac/MySQL-python-1.2.2/setup_posix.py, line 24, in mysql_config EnvironmentError: mysql_config not found Thank you, now what is mysql_config? * * * * With autoloaded tables you'll get unicode values if you use either use_unicode or convert_unicode, or str values if you don't. With declared columns you'll also have to use the Unicode column type. I had to use a my.cnf file to get the client and server charsets to match. Otherwise I was getting: (u'character_set_client', u'latin1'), (u'character_set_connection', u'latin1'), (u'character_set_database', u'utf8'), (u'character_set_filesystem', u'binary'), (u'character_set_results', u'latin1'), (u'character_set_server', u'utf8'), (u'character_set_system', u'utf8'), (u'character_sets_dir', u'/usr/share/mysql/charsets/'), (u'collation_connection', u'latin1_swedish_ci'), (u'collation_database', u'utf8_general_ci'), (u'collation_server', u'utf8_general_ci'), The my.cnf is a symlink to /etc/mysql/my.cnf and contains among other things: [client] default-character-set = utf8 [mysqld] character-set-server = utf8 default-character-set = utf8 The DSN looks like this: sqlalchemy.dbapi = mysql://...?read_default_file=%(here)s/my.cnfuse_unicode=1 Unfortunately MySQLdb does not raise an exception if the file is missing, so it's worth creating a unit test that checks whether all the character sets are correct. I haven't tried that yet. * * * * If you have existing records in the database which are in a different charset than what the column purports to be, and you want to keep the actual values as is, change the column type to BINARY in MySQL, then change it to the desired charset: http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html If you want to change the data to the charset the column purports to be, write a Python program that connects WITHOUT Unicode support, read the values, decode them from the source charset, encode them into the destination charset, and write them back to the database. With mysql://...?use_unicode=1 and reflected tables, I get Unicode values. Without use_unicode, I get string values. If I hack pylons.database to pass 'convert_unicode=True' to create_engine() in a way that it'll be propagated to the session_context, I get Unicode values again. Curiously, in all three cases the column instances are MSString with convert_unicode=False: Column(u'name',MSString(length=255,convert_unicode=False),nullable=False) * * * * The following didn't work: mysql://...?use_unicode=1charset=utf8 sqlalchemy.exceptions.DBAPIError: (Connection failed) (OperationalError) (2019, 'Can\'t initialize character set utf8 (path: /usr/share/mysql/charsets/)') This may be because I'm using the older MySQLdb, or maybe it's a bug Ubuntu's packaging of MySQL. -- Mike Orr [EMAIL PROTECTED] --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups pylons-discuss group. To post to this group, send email to pylons-discuss@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en -~--~~~~--~~--~--~---
Re: SQLAlchemy + MySQL + UTF-8 support - how?
Hi Mike, Thank you for all the details. I've seen the merits of the arguments and I guess the conclusion is these points: 1) The MySQL specific option of use_unicode is not needed anymore. The MySQL specific charset option also is deprecated. 2) The SQLAlchemy convert_unicode option is recommended, and is needed when you're autoloading tables and you want to get unicode objects. It obsoletes the use_unicode option. Correspondingly, the SQLAlchemy encoding option obsoletes the MySQL specific charset option. 3) No options will help you when doing raw engine.execute operations; you must convert for yourself. 4) You should use Unicode fields inside of your application in order for your internal encoding/decoding logic to not depend on the database configuration. I.e., whether you're using convert_unicode or not, when using Unicode field you will always get a unicode object from SQLAlchemy. 5) Pylons does not properly pass the convert_unicode option to create_engine. This has been discussed on this list in many forms, most recently by Mike himself, and everyone seems to be agreed it needs fixing and soon. I've created ticket #250 for this. There was one point you made that I believe is inaccurate and so did not include it in the above. Where you said that if you want to use the convert_unicode option, you must know what encoding the database is in, but my tests don't suggest that. My MySQL settings are currently: character_set_client latin1 character_set_connection latin1 character_set_database latin1 character_set_filesystem binary character_set_results latin1 character_set_server latin1 character_set_system utf8 character_sets_dir /usr/share/mysql/charsets/ collation_connection latin1_swedish_ci collation_database latin1_swedish_ci collation_server latin1_swedish_ci which are the defaults on Debian, and I'm happy in my ignorance. AFAIK as long as the character_set_system is utf-8, the data itself is stored in utf-8, and other variables like client and server encoding can be changed at will to force MySQL (client or server respectively) to convert from utf-8 to whatever else. I've verified that with the above MySQL settings and only using Unicode field types, no covert_unicode or use_unicode or any other settings, the data in the DB is utf-8 and I get unicode objects and everything is groovy. That suggests to me that if you really care, you should set the backend encoding to utf-8 or whatever with the SQLAlchemy encoding settings, but it is not necessary nor especially recommended. Does that tie up the loose ends on this discussion? Thanks again, David Mike Orr wrote: The first thing is to see what MySQL thinks its server and client character sets are: $ paster shell development.ini import pylons.database engine = pylons.database.create_engine() engine.execute(SHOW VARIABLES).fetchall() Look for the variables that start with character_set_ and collation_. On my workstation they show: (u'character_set_client', u'utf8'), (u'character_set_connection', u'utf8'), (u'character_set_database', u'utf8'), (u'character_set_filesystem', u'binary'), (u'character_set_results', u'utf8'), (u'character_set_server', u'utf8'), (u'character_set_system', u'utf8'), (u'character_sets_dir', u'/usr/share/mysql/charsets/'), (u'collation_connection', u'utf8_general_ci'), (u'collation_database', u'utf8_general_ci'), (u'collation_server', u'utf8_general_ci'), Then you have a choice of ?use_unicode=1 (a MySQLdb option in the DSN) or: create_engine(dsn, convert_unicode=True, encoding=utf8) SQLAlchemy wants to standardize on 'convert_unicode' because it's the same for all engines, but currently there are some problems with that: 1) You have to know what character set the MySQL connection is. 2) If you're using pylons.database.session_context, you can't add arguments to the create_engine call without hacking pylons.database or reimplementing the functions yourself. Hopefully this will be improved in a future version of Pylons -- as soon as we figure out what to do. 3) convert_unicode works at the Table level, so raw engine.execute() queries will still be 'str' strings. * * * * I have MySQLdb 1.2.1 because Kubuntu is still on the older version and I can't get the newer one to compile: $ easy_install MySQL-python File /tmp/easy_install-p8zEac/MySQL-python-1.2.2/setup_posix.py, line 24, in mysql_config EnvironmentError: mysql_config not found Thank you, now what is mysql_config? * * * * With autoloaded tables you'll get unicode values if you use either use_unicode or convert_unicode, or str values if you don't. With declared columns you'll also have to use the Unicode column type. I had to use a my.cnf file to get the client and server charsets to match. Otherwise I was getting: (u'character_set_client', u'latin1'), (u'character_set_connection', u'latin1'), (u'character_set_database', u'utf8'),
Re: SQLAlchemy + MySQL + UTF-8 support - how?
On 5/11/07, andyprog [EMAIL PROTECTED] wrote: Hi all, (I'm using Pylons for the first time, so pls bear with me...) I'm setting up a Pylons app with Mako + SQLAlchemy + MySQL + UTF-8 support -- but I've had trouble finding one place with clear instructions on how to do this, so I've gathered pieces of info from here and there, but this, coupled with the fact that some instructions are outdated, and others not so clear, has left me quite confused... :( It is outdated indeed. If you have the latest stable version of Pylons, SA and MySQLdb (1.2.2) the following should be enough: sqlalchemy.dburi = mysql://...?charset=utf8 And you should get properly decoded unicode strings from SA. Btw, make sure your mysql db really contains utf8-encoded data (use mysql console in utf8 locale to poke around and show full columns to check collation. Max. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups pylons-discuss group. To post to this group, send email to pylons-discuss@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en -~--~~~~--~~--~--~---
Re: SQLAlchemy + MySQL + UTF-8 support - how?
Thank you for your answers, Max and BJPirt. I'm going to set things up and I'll let you know how it went. Andy P --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups pylons-discuss group. To post to this group, send email to pylons-discuss@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en -~--~~~~--~~--~--~---