On 11/4/06, Michael Bayer <[EMAIL PROTECTED]> wrote:
> Shannon -jj Behrens wrote:
> > I'm using convert_unicode=True.  Everything is fine as long as I'm the
> > one reading and writing the data.  However, if I look at what's
> > actually being stored in the database, it's like the data has been
> > encoded twice.  If I switch to use_unicode=True, which I believe is
> > MySQL-specific, things work just fine, and what's being stored in the
> > database looks correct.
>
> yes, if the mysql client lib is encoding, and SA is also encoding, the data
> will get encoded twice.  I'm not familiar with how I could look at the
> encoded data to tell if it was already encoded (and I'm not sure I
> should be...the unicode encoding option should only be enabled in one
> place, not two)

If it's a unicode object, you should encode it.  If it's a str object,
you should assume it's already encoded.  If .encode gets called on a
str object that's pure ASCII, nothing happens; anything else raises an
exception.  All of this is fine.  I just don't understand why I'm
getting something from MySQLdb (not from SQLAlchemy) that says it's a
unicode object but wasn't actually decoded :-/
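For illustration, here's a minimal sketch of what the double encode does to a value, written in Python 3 terms (where `str` plays the role of the old `unicode` type and `bytes` the old `str`); the latin-1 middle step stands in for whichever second layer re-encodes the already-encoded bytes:

```python
# Minimal sketch of the double-encoding problem, in Python 3 terms
# (str ~ Python 2 unicode, bytes ~ Python 2 str).

text = "café"

# Correct: encode exactly once on the way into the database.
encoded_once = text.encode("utf-8")

# Buggy: a second layer (e.g. the client lib) treats those bytes as
# latin-1 text and encodes them to UTF-8 again.
encoded_twice = encoded_once.decode("latin-1").encode("utf-8")

print(encoded_once)   # b'caf\xc3\xa9'
print(encoded_twice)  # b'caf\xc3\x83\xc2\xa9'

# What a reader of the raw table sees after the double encode:
print(encoded_twice.decode("utf-8"))  # 'cafÃ©' -- classic mojibake
```

That garbled `Ã©` pattern is exactly what shows up when you inspect rows stored by a doubly-encoding pipeline.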

> > I started looking through the SQLAlchemy code, and I came across this:
> >
> >     def convert_bind_param(self, value, dialect):
> >         if not dialect.convert_unicode or value is None or \
> >                 not isinstance(value, unicode):
> >             return value
> >         else:
> >             return value.encode(dialect.encoding)
> >
> >     def convert_result_value(self, value, dialect):
> >         if not dialect.convert_unicode or value is None or \
> >                 isinstance(value, unicode):
> >             return value
> >         else:
> >             return value.decode(dialect.encoding)
> >
> > The logic looks backwards.  It says, "If it's not a unicode object,
> > return it.  Otherwise, encode it."  Later, "If it is a unicode object,
> > return it.  Otherwise decode it."
>
> sending unicode values to databases whose client APIs don't handle
> unicode involves taking a python unicode object from the application,
> encoding it into an encoded series of bytes, and sending it to the
> database.  receiving a result value involves taking the encoded series
> of bytes and decoding it into a unicode object.  so you have *non*-unicode
> instances going into the DB, and *non*-unicode coming out - the DBAPI
> is assumed to not have any idea what a python unicode object is (such
> as psycopg's).

That sounds fine.
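To make the round trip Mike describes concrete, here is a sketch of that bind-param/result-value logic translated to Python 3 types (str in the application, bytes on the wire); this is an illustration of the pattern, not SQLAlchemy's actual code:

```python
# Sketch of the bind-param / result-value round trip: the app side
# holds text (str), the DBAPI side sees only encoded bytes.

ENCODING = "utf-8"  # stands in for dialect.encoding

def convert_bind_param(value):
    """App -> DB: encode text to bytes; pass anything else through."""
    if value is None or not isinstance(value, str):
        return value
    return value.encode(ENCODING)

def convert_result_value(value):
    """DB -> app: decode bytes to text; pass anything else through."""
    if value is None or isinstance(value, str):
        return value
    return value.decode(ENCODING)

original = "héllo"
wire = convert_bind_param(original)     # bytes going into the DB
roundtrip = convert_result_value(wire)  # str coming back out
assert wire == b"h\xc3\xa9llo"
assert roundtrip == original
```

Note the symmetry: each direction passes through values that are already in the target form, which is why the `isinstance` checks point in opposite directions in the two functions.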

> We've been doing the unicode thing for a while now, and you should
> notice that we have unit tests for just about every function in SA,

Yeah, I definitely think this is not SA's fault.

> especially important ones like this.  the unicode unit test runs
> unicode and raw encoded values in and out in numerous ways, which pass
> for at least mysql, sqlite, postgres, oracle, and ms-sql.  we have had
> some people having issues with MySQL specifically, which seems to be
> because some folks have a mysql config that is stuck in "convert
> unicode" mode and experience the double-encoding issue.

I think you're onto something.
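The practical takeaway from the thread is to enable the conversion in exactly one layer. A hedged sketch of the two mutually exclusive setups, using the flag names as they appear in this thread (`convert_unicode`, `use_unicode`); the `charset` keyword and the connection URL are assumptions about the era's APIs, not something confirmed here:

```python
# Option A: let SQLAlchemy do the conversion; the DBAPI passes raw bytes.
engine = create_engine("mysql://user:pw@localhost/db",
                       convert_unicode=True)

# Option B: let MySQLdb do the conversion; leave SA's flag off.
conn = MySQLdb.connect(host="localhost", db="db",
                       use_unicode=True, charset="utf8")

# Enabling both layers at once is what produces the double-encoded rows.
```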

>  the one
> improvement that could be made here is for MySQL to provide a
> subclassed unicode type that disables conversion if the dialect is
> known to have convert_unicode=True already....then again I sort of like
> that this forces people to understand their database config.

Thanks, Mike.

-jj

-- 
http://jjinux.blogspot.com/
