On Tue, 2007-11-27 at 15:17 +0100, Markus Gritsch wrote:
> On 27/11/2007, Oleg Broytmann <[EMAIL PROTECTED]> wrote:
> > On Tue, Nov 27, 2007 at 02:56:59PM +0100, Markus Gritsch wrote:
> > > But if a BLOB contains just bytes which do *not* trigger the
> > > Exception, the BLOB in the query *does* get converted to unicode, and
> > > I doubt this would be the desired behavior.
> >
> >    The absence of try/except in question wouldn't prevent anything in this
> > case, right?
> 
> Right, but the presence of this try/except does swallow Exceptions
> which might be helpful in some other cases.  Having this try/except
> here unconditionally for all cases just prevents one from getting an
> exception when converting to unicode, e.g. if one has specified a wrong
> encoding.  This is dangerous.  Instead of getting an exception, the
> code just continues to run, issuing a query containing the wrong
> characters for the DB.
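To make the concern in the quoted text concrete, here is a minimal Python 3 sketch contrasting strict decoding with a blanket try/except. The helpers decode_strict and decode_swallowing are hypothetical and are not SQLObject's actual code; they only model the two behaviours being debated.

```python
# Hypothetical helpers (NOT SQLObject's real code) modelling the two
# behaviours under discussion: strict decoding vs. a blanket try/except.

def decode_strict(raw: bytes, encoding: str) -> str:
    # A wrong encoding or non-text data surfaces as UnicodeDecodeError.
    return raw.decode(encoding)


def decode_swallowing(raw: bytes, encoding: str):
    # Any decode failure is silently ignored and the original bytes are
    # used instead, so nobody learns that something went wrong.
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw


euro_blob = b'\xe2\x82\xac'    # BLOB bytes that happen to be valid UTF-8
binary_blob = b'\xff\x00\x10'  # BLOB bytes that are not valid UTF-8

# Valid UTF-8 bytes are turned into text even though the column is a BLOB.
print(decode_swallowing(euro_blob, 'utf-8') == '\u20ac')  # True

# Invalid bytes: the error disappears and execution simply continues.
print(decode_swallowing(binary_blob, 'utf-8'))            # b'\xff\x00\x10'

# Strict decoding reports the problem instead of hiding it.
try:
    decode_strict(binary_blob, 'utf-8')
except UnicodeDecodeError as exc:
    print('caught:', exc.reason)
```

With the swallowing version, BLOB bytes that happen to be valid UTF-8 silently become text, and bytes that are not valid UTF-8 pass through with no error at all.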

But SQLObject already sends the wrong characters to MySQL by default.

By default, MySQL expects data sent over the wire to be encoded in latin-1.
SQLObject by default sends Unicode data encoded in UTF-8.  Since any byte
sequence is valid latin-1, this doesn't cause any exceptions.  But since MySQL
thinks the data is latin-1, it returns wrong results for some queries:

Consider a connection and table like this:

from sqlobject import SQLObject, UnicodeCol, BLOBCol, connectionForURI, sqlhub

sqlhub.threadConnection = connectionForURI('mysql://test:[EMAIL PROTECTED]/test?use_unicode=1')

class Person(SQLObject):
    name = UnicodeCol()
    sname = BLOBCol()
Person.createTable()
# \u20ac is the 'Euro symbol'.
p = Person(name=u'\u20ac', sname=u'\u20ac'.encode('utf-8'))
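The byte-level mismatch behind this can be sketched in plain Python 3. This assumes Python's 'latin-1' codec (ISO-8859-1) as a stand-in for MySQL's latin1, which is really cp1252; the two differ only in the 0x80-0x9F range, and neither fails on the Euro sign's UTF-8 bytes.

```python
# Sketch: UTF-8 bytes read back as latin-1 never raise, but change length.
euro = u'\u20ac'                   # the Euro sign used in the example
wire_bytes = euro.encode('utf-8')  # what SQLObject sends by default
print(wire_bytes)                  # b'\xe2\x82\xac'

# Every byte value 0x00-0xFF maps to some latin-1 character,
# so decoding arbitrary bytes as latin-1 cannot fail.
assert len(bytes(range(256)).decode('latin-1')) == 256

# The one-character Euro sign comes back as three characters.
as_latin1 = wire_bytes.decode('latin-1')
print(len(euro), len(as_latin1))   # 1 3
```

Three characters instead of one is consistent with length(name) returning 3 rather than the expected 1.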

If you go to the mysql command line, you would expect 
select length(name), length(sname) from person;
to return 1, 3 (3 is the length of \u20ac encoded in utf-8).  In fact 
it returns 3, 3.  So then you think you'll specify the charset:

sqlhub.threadConnection = connectionForURI('mysql://test:[EMAIL PROTECTED]/test?use_unicode=1&charset=utf8&sqlobject_encoding=utf-8')

Now run that same query:
select length(name), length(sname) from person;
It returns 1, 1.
But wait: if the length of the BLOB column is 1, is the BLOB being treated as
UTF-8?  No.  It's being mangled.

assert Person.select()[0].sname == u'\u20ac'.encode('utf-8')  # this assertion fails
In fact, sname comes back as '\x80'.
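'\x80' is not an arbitrary byte: it is the Euro sign in cp1252, which is what MySQL calls latin1. The Python 3 sketch below checks that, and shows one plausible path to the mangled value, assumed rather than confirmed from SQLObject or MySQL internals: the BLOB bytes are decoded as UTF-8 text and then re-encoded in cp1252.

```python
euro = '\u20ac'
expected_blob = euro.encode('utf-8')  # b'\xe2\x82\xac', what was stored
observed_blob = b'\x80'               # what the failing assert saw

# In cp1252 (MySQL's "latin1"), the Euro sign is the single byte 0x80.
assert euro.encode('cp1252') == observed_blob

# Assumed path (not confirmed): bytes -> text via UTF-8 -> bytes via cp1252.
mangled = expected_blob.decode('utf-8').encode('cp1252')
print(mangled == observed_blob)          # True
print(len(expected_blob), len(mangled))  # 3 1
```

If this is what happens, it would also account for length(sname) returning 1.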

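Conclusion: SQLObject is doing the wrong thing on several levels for MySQL.  By
default it sends UTF-8 to a server that expects latin-1, and once the charset is
set to utf8, BLOB data gets mangled.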


_______________________________________________
sqlobject-discuss mailing list
sqlobject-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss
