On 11/4/06, Michael Bayer <[EMAIL PROTECTED]> wrote:
> Shannon -jj Behrens wrote:
> > I'm using convert_unicode=True. Everything is fine as long as I'm the
> > one reading and writing the data. However, if I look at what's
> > actually being stored in the database, it's like the data has been
> > encoded twice. If I switch to use_unicode=True, which I believe is
> > MySQL specific, things work just fine and what's being stored in the
> > database looks correct.
>
> yes, if the mysql client lib is encoding, and SA is also encoding, the
> data will get encoded twice. im not familiar with how i could look at
> the encoded data to tell if it was already encoded (and not sure if i
> should be...the unicode encoding option should only be enabled in one
> place, not two)
If it's a unicode object, you should encode it. If it's a str object, you should assume it's already encoded. If it's a str object and .encode() gets called on it, nothing happens as long as it's pure ASCII; anything else raises an exception. All of this is fine. I just don't understand why I'm getting something from MySQLdb (not from SQLAlchemy) that says it's a unicode object but wasn't actually decoded :-/

> > I started looking through the SQLAlchemy code, and I came across this:
> >
> >     def convert_bind_param(self, value, dialect):
> >         if not dialect.convert_unicode or value is None or not \
> >                 isinstance(value, unicode):
> >             return value
> >         else:
> >             return value.encode(dialect.encoding)
> >
> >     def convert_result_value(self, value, dialect):
> >         if not dialect.convert_unicode or value is None or \
> >                 isinstance(value, unicode):
> >             return value
> >         else:
> >             return value.decode(dialect.encoding)
> >
> > The logic looks backwards. It says, "If it's not a unicode object,
> > return it. Otherwise, encode it." Later, "If it is a unicode object,
> > return it. Otherwise decode it."
>
> sending unicode values to databases whose client APIs dont handle
> unicode involves taking a python unicode object from the application,
> encoding it into an encoded series of bytes, and sending it to the
> database. receiving a result value involves taking the encoded series
> of bytes and decoding into a unicode object. so you have *non* unicode
> instances going into the DB, and *non* unicode coming out - the DBAPI
> is assumed to not have any idea what a python unicode object is (such
> as psycopg's).

That sounds fine.

> We've been doing the unicode thing for a while now, and you should
> notice that we have unit tests for just about every function in SA,

Yeah, I definitely think this is not SA's fault.

> especially important ones like this.
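To convince myself the logic really isn't backwards, here's the same pair of conversions sketched in isolation, translated to Python 3 terms (str for unicode, bytes for the encoded value; FakeDialect is just a stand-in for the attributes the real dialect object carries, not SA's actual class):

```python
class FakeDialect:
    # Stand-in for the real dialect: just the two attributes
    # the converters consult.
    convert_unicode = True
    encoding = "utf-8"

def convert_bind_param(value, dialect):
    # Going *into* the DB: text gets encoded to bytes;
    # None and already-encoded values pass through untouched.
    if not dialect.convert_unicode or value is None or not isinstance(value, str):
        return value
    return value.encode(dialect.encoding)

def convert_result_value(value, dialect):
    # Coming *out of* the DB: bytes get decoded back to text;
    # None and already-text values pass through untouched.
    if not dialect.convert_unicode or value is None or isinstance(value, str):
        return value
    return value.decode(dialect.encoding)

d = FakeDialect()
stored = convert_bind_param("café", d)       # bytes on the way in
restored = convert_result_value(stored, d)   # text on the way out
assert stored == "café".encode("utf-8")
assert restored == "café"
```

So the asymmetry in the two isinstance checks is exactly the point: text is the application-side type, bytes is the wire-side type, and each function only converts values that are still in the "wrong" representation for its direction.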
> the unicode unit test runs
> unicode and raw encoded values in and out in numerous ways, which pass
> for at least mysql, sqlite, postgres, oracle, and ms-sql. we have had
> some people having issues with MySQL specifically, which seems to be
> because some folks have a mysql config that is stuck in "convert
> unicode" mode and experience the double-encoding issue.

I think you're onto something.

> the one
> improvement that could be made here is for MySQL to provide a
> subclassed unicode type that disables conversion if the dialect is
> known to have convert_unicode=True already....then again i sort of like
> that this forces people to understand their database config.

Thanks, Mike.

-jj

--
http://jjinux.blogspot.com/