I've been trying to get to the bottom of unicode issues with MySQL and the various tests that exercise this functionality - test_blob.py and test_unicode.py.
I'm no expert on all of these issues, but I've gained a little understanding over the last few days. I'm using release 0.7.1, at revision 1954, MySQLdb 1.2.1+, MySQL 5.0, Python 2.4.3.

The suggestion, when test_blob.py and test_unicode.py failed, was to add the following to my connection URI:

    charset=utf8&sqlobject_encoding=utf-8

but I found that even this didn't work, generating essentially the same error. (This is from test_blob.py; test_unicode.py is similar.)

With no extra connection URI settings:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 167: ordinal not in range(128)

With the extra connection URI settings:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 167: unexpected code byte

The problem is, I believe, that Python strings (<type 'str'>) are not guaranteed to be either utf-8 or ascii. The Python string s = chr(128) is not valid ascii _or_ utf-8, but SQLObject (and in particular its tests) effectively assumes it is.

chr(128) is not valid ascii because it sets the 8th bit. chr(128) is not valid utf-8 because the byte 0x80 on its own is a continuation byte; the character with code point 128 should be represented with two bytes, the first of which begins with the bits 110. (See http://en.wikipedia.org/wiki/UTF-8).

My solution (this is where my ignorance comes in) is to change the default sqlobject_encoding from 'ascii' to 'latin-1' in mysql/mysqlconnection.py. latin-1 is a single-byte encoding that assigns a character to every one of the 256 byte values, so it can decode any Python string instance. With this change, all of SQLObject's unicode and blob tests pass again, with no connection URI settings or special magic at the MySQL database end.

On the other hand, in the process I've discovered a similar problem with sqlite's unicode handling. In test_unicode.py, a UnicodeCol is made an alternateID, which implies unique. For MySQL, unique implies a key, which requires a length.
So,

    col1 = UnicodeCol(alternateID=True)

becomes

    col1 = UnicodeCol(alternateID=True, length=100)

and the test passes for MySQL, handling all the Unicode stuff correctly.

However, with the length argument, sqlite now fails this test. With no length argument, sqlite uses a TEXT type, while with a length argument it uses a VARCHAR type. The TEXT type works (incorrectly!!! I believe) because it returns Python strings rather than unicode strings. The VARCHAR type doesn't work (correctly!!! I believe) because it tries to coerce a Python string into a unicode string, and a similar codec error is encountered.

So, I think the use of the 'ascii' encoding where 'latin-1' is actually what is required is a bug that should be fixed. The MySQL driver is the only place this is done explicitly, but the problem with sqlite's VARCHAR makes me think that this bug is present implicitly in a variety of other places.

Cheers!

nathan

--
Nathan Edwards, Ph.D.
Center for Bioinformatics and Computational Biology
3119 Biomolecular Sciences Bldg. #296
University of Maryland, College Park, MD 20742
Phone: +1 301-405-9901
Email: [EMAIL PROTECTED]
WWWeb: http://www.umiacs.umd.edu/~nedwards
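
For anyone who wants to check the chr(128) claim, here is a minimal sketch. It assumes Python 3, where the bytes type plays the role that <type 'str'> plays in Python 2.4; the names raw, can_decode and results are made up for illustration and are not part of SQLObject or MySQLdb.

```python
# Python 3 sketch: `bytes` stands in for the Python 2 `str` type.
# All names here are hypothetical and exist only for this illustration.

raw = bytes([128])  # the single byte 0x80, i.e. chr(128) in Python 2


def can_decode(data, encoding):
    """Return True if `data` decodes without error under `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Try the two encodings that fail in the tests, plus the proposed default.
results = {enc: can_decode(raw, enc) for enc in ("ascii", "utf-8", "latin-1")}
print(results)

# latin-1 assigns a code point to every byte value 0..255, so decoding and
# re-encoding an arbitrary byte string gives back the original bytes.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes
```

Only the latin-1 decode succeeds for the byte 0x80. Whether 'latin-1' is the right default for SQLObject as a whole is a separate question; the sketch only shows that it is the one of the three codecs that accepts every byte value.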