Hello Michael,

I know there is a database engine parameter "encoding". It tells
SQLAlchemy which encoding to use when Unicode objects are saved to the
database.

I suggest adding another encoding parameter, say "client_encoding", which
would be used when convert_unicode is True and the user assigns a plain
string object to an object attribute. Currently, even if convert_unicode
is set to True, strings go to the database as-is, bypassing conversion to
Unicode.
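
To illustrate the problem outside of SQLAlchemy, here is a small sketch.
It is written so that it runs on a current Python, where bytes stands in
for the plain string type and str for unicode. The read_back helper and
the utf8 engine encoding are assumptions made for the example, not
SQLAlchemy code.

```python
# Assumption: the engine encoding is utf8 and the client machine uses cp1251.
ENGINE_ENCODING = "utf-8"

text = u"Какой-то текст"             # "Some text" in Russian
raw_cp1251 = text.encode("cp1251")   # what a cp1251 client assigns as a plain string


def read_back(stored_bytes, encoding=ENGINE_ENCODING):
    """Mimic the read side: decode stored bytes using the engine encoding."""
    return stored_bytes.decode(encoding)


# Plain ASCII survives, because ASCII is a subset of utf8.
ascii_ok = read_back(b"Some text in ascii")

# cp1251 bytes stored as-is are not valid utf8, so decoding them fails.
try:
    read_back(raw_cp1251)
    failed = False
except UnicodeDecodeError:
    failed = True
```

So ASCII strings only appear to work, while anything in a national
encoding breaks when it is read back.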

This option would allow strings in national/platform-specific encodings,
such as cp1251, to be assigned straight to object attributes and be
properly converted to the database encoding (engine.encoding).
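
Roughly, the conversion I have in mind looks like the sketch below. It is
not the attached diff: to_db_bytes is a made-up name, and the default
encodings are only assumptions for the example. Again it is written for a
current Python, where bytes plays the role of the plain string type.

```python
def to_db_bytes(value, engine_encoding="utf-8", client_encoding="cp1251"):
    """Hypothetical helper: bring any text value to the engine encoding.

    Plain byte strings are first decoded with the client encoding;
    unicode values are encoded directly, as happens today.
    """
    if isinstance(value, bytes):
        value = value.decode(client_encoding)
    return value.encode(engine_encoding)


text = u"Какой-то текст в кодировке cp1251"  # "Some text in cp1251 encoding"

from_unicode = to_db_bytes(text)                  # unicode input
from_cp1251 = to_db_bytes(text.encode("cp1251"))  # cp1251 byte-string input
```

Both calls produce the same utf8 bytes, so it no longer matters whether
the user assigned a unicode object or a string in the client encoding.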


The point is that the encoding on the client machine may differ from the
encoding in the database. You can see the changes I suggest in the
attached diff.

The suggested changes would make life a little easier for users of
multilingual/multi-encoding environments without affecting other users
of SQLAlchemy.

MB> On Apr 17, 2006, at 5:47 AM, Vasily Sulatskov wrote:

>> In my opinion that's a bug and that behaviour should be changed to
>> something like this:
>> 1. If the object is unicode, then convert it to the engine-specified
>> encoding (like utf8), as happens now.
>> 2. If it's a string, then convert it to unicode using another
>> specified encoding (it should be added to the engine parameters). This
>> encoding specifies the client-side encoding. It's often handy to have
>> different encodings in the database and on client machines (at least
>> for people with "alternate languages" :-)


MB> there already is an encoding parameter for the engine.

MB> http://www.sqlalchemy.org/docs/dbengine.myt#database_options

MB> does that solve your problem ?

-- 
Best regards,
 Vasily                            mailto:[EMAIL PROTECTED]

Attachment: encodings.diff
Description: Binary data

# -*- coding: cp1251 -*-

import sqlalchemy

db = sqlalchemy.create_engine('sqlite://', echo=False, echo_uow=False,
    convert_unicode=True, client_encoding='cp1251')

# a table to store companies
companies = sqlalchemy.Table('companies', db, 
    sqlalchemy.Column('company_id', sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column('name', sqlalchemy.Unicode(50)))
    
class Company(object):
    pass

sqlalchemy.assign_mapper(Company, companies)

companies.create()

# Company(name=u'Some text in cp1251 encoding')
# This line works perfectly: the unicode object is automatically encoded to
# utf8 before going to the database
Company(name=u'Какой-то текст в кодировке cp1251')

# This line still works fine:
# It goes to database as is, i.e. as a string and when decoded
# it is a valid utf8 that can be converted to unicode without
# problems
Company(name='Some text in ascii')

# And this line causes problems:
# It goes to the database as is, i.e. as a string in cp1251, and when
# it is read back it cannot be decoded as utf8 (see Company.get(3) below)
Company(name='Какой-то текст в кодировке cp1251')

sqlalchemy.objectstore.commit()

sqlalchemy.objectstore.clear()


c = Company.get(1)
print type(c.name)


c = Company.get(2)
# Now we get something funny. We specified name as a string during
# object creation and get it out of database as Unicode.
print type(c.name)

# And this line will crash the interpreter because sqlalchemy tries to convert
# the name to Unicode as if it were utf8, and it is not. It is still in cp1251
# encoding
c2 = Company.get(3)

print c2.name.encode('cp866')

print 'Done'
